5612 Commits

Author SHA1 Message Date
Rajalakshmi Srinivasaraghavan 09d47af2c0 Optimize zscal function for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
2021-03-10 17:15:33 -06:00
Martin Kroeker ef0238ba2b
Merge pull request #3130 from martin-frbg/issue3128
Replace spurious AVX512 requirement in the Haswell srot microkernel with an AVX2/FMA3 guard
2021-03-06 19:15:53 +01:00
Martin Kroeker a9f6f7ad39
Remove spurious AVX512 requirement and add AVX2/FMA3 guard 2021-03-06 14:35:49 +01:00
Martin Kroeker 1d254d321b
Merge pull request #3129 from RajalakshmiSR/asum_p10
Optimize s/dasum function for POWER10
2021-03-06 09:13:59 +01:00
Rajalakshmi Srinivasaraghavan 41646ed006 Optimize s/dasum function for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
2021-03-05 16:22:36 -06:00
Martin Kroeker 3679781872
Merge pull request #3126 from martin-frbg/m1bench
Support timing Apple M1 in the benchmarks
2021-03-02 21:27:21 +01:00
Martin Kroeker 38dcf3454b
Support timing Apple M1 2021-03-02 17:50:55 +01:00
Martin Kroeker e34d57ca90
Merge pull request #3125 from martin-frbg/issue3123
Fix AMD AOCC compiler detection
2021-03-02 09:58:40 +01:00
Martin Kroeker 20f492c298
Fix AMD AOCC compiler detection 2021-03-01 21:00:10 +01:00
Martin Kroeker c7c82be1c3
Merge pull request #3122 from martin-frbg/xeigtstz
Fix unusual stack size requirements of the LAPACK EIG tests (from Reference-LAPACK PR 335)
2021-02-28 22:13:09 +01:00
Martin Kroeker 9564f688c4
Adjust build rules for ?chkee.F 2021-02-28 18:57:05 +01:00
Martin Kroeker 90c1776c86
Adjust build rules for ?chkee.F 2021-02-28 18:53:20 +01:00
Martin Kroeker 9cf861e8fa
Add rewritten cchkee.F from Reference-LAPACK PR335 2021-02-28 18:51:03 +01:00
Martin Kroeker 9b7b1da133
Add rewritten dchkee.F from Reference-LAPACK PR335 2021-02-28 18:50:26 +01:00
Martin Kroeker a5ab891292
Add rewritten schkee.F from Reference-LAPACK PR335 2021-02-28 18:49:50 +01:00
Martin Kroeker 90bb4ac821
Add rewritten zchkee.F from Reference-LAPACK PR335 2021-02-28 18:49:10 +01:00
Martin Kroeker 23a0d1bc1f
Delete zchkee.f 2021-02-28 18:47:06 +01:00
Martin Kroeker 0e96c378fd
Delete schkee.f 2021-02-28 18:46:52 +01:00
Martin Kroeker ee16efff3c
Delete dchkee.f 2021-02-28 18:46:38 +01:00
Martin Kroeker 0197519dd7
Delete cchkee.f 2021-02-28 18:46:08 +01:00
Martin Kroeker 865829cfac
Merge pull request #3121 from RajalakshmiSR/mmarename
POWER10: Rename mma builtins
2021-02-27 19:15:49 +01:00
Rajalakshmi Srinivasaraghavan 0571c3187b POWER10: Rename mma builtins
The LLVM and GCC teams agreed to rename the __builtin_mma_assemble_pair and
__builtin_mma_disassemble_pair built-ins to __builtin_vsx_assemble_pair and
__builtin_vsx_disassemble_pair respectively. This patch makes the
corresponding changes in the dgemm kernel. It also changes the
inputs to those builtins to avoid potential typecasting issues.

Reference GCC commit ID: 77ef995c1fbcab76a2a69b9f4700bcfd005d8e62
2021-02-26 20:56:34 -06:00
Martin Kroeker d12a2d0d04
Merge pull request #3120 from martin-frbg/3118-x
Fix use of undefined CC variable in f_check
2021-02-26 11:50:47 +01:00
Martin Kroeker 2d369bd916
fix undefined CC variable 2021-02-26 09:09:43 +01:00
Martin Kroeker 93843c55b6
Merge pull request #15 from xianyi/develop
rebase
2021-02-26 09:06:25 +01:00
Martin Kroeker e3a6132e12
Merge pull request #3119 from xianyi/revert-3118-issue3018-2
Revert "Fix undefined CC in f_check (again)"
2021-02-26 04:18:33 +01:00
Martin Kroeker 736f0146c3
Revert "Fix undefined CC in f_check (again)" 2021-02-26 04:18:04 +01:00
Martin Kroeker 897fc2b6ef
Merge pull request #3118 from martin-frbg/issue3018-2
Fix undefined CC in f_check (again)
2021-02-25 13:48:41 +01:00
Martin Kroeker 441c116105
fix undefined CC again 2021-02-25 13:47:34 +01:00
Martin Kroeker 8ecd80a34a
Merge pull request #14 from xianyi/develop
rebase
2021-02-25 13:45:27 +01:00
Martin Kroeker 4ba53db0da
Merge pull request #3117 from haampie/fix-perl
use /usr/bin/env perl
2021-02-24 18:39:28 +01:00
Martin Kroeker 6c365ff648
Merge pull request #3114 from martin-frbg/issue3113
Fix dll_callback and p_process_term signatures for USE_TLS on Windows x64
2021-02-24 18:38:25 +01:00
Martin Kroeker e33bcdbb7b
Merge pull request #3115 from martin-frbg/issue2532
Replace unoptimized OMATCOPY_RT with 4x4 blocked version
2021-02-24 18:37:36 +01:00
Harmen Stoppels ec6b354c32 use /usr/bin/env perl 2021-02-24 14:07:20 +01:00
Martin Kroeker 292d1af1a0
Update omatcopy_rt.c 2021-02-24 09:34:14 +01:00
Martin Kroeker 325b398e3c
Update omatcopy_rt.c 2021-02-24 09:13:12 +01:00
Martin Kroeker 6f5667b4d4
Enable optimized S/D OMATCOPY_RT 2021-02-24 09:03:41 +01:00
Martin Kroeker cceeee7806
Add optimized omatcopy_rt 2021-02-24 09:00:54 +01:00
Martin Kroeker 0a4546b742
Typo fix 2021-02-23 13:14:35 +01:00
Martin Kroeker b1eed27a54
Replace naive omatcopy_rt with 4x4 blocked implementation
as suggested by MigMuc in issue 2532
2021-02-22 21:35:42 +01:00
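The blocking idea named in this commit can be illustrated with a minimal sketch. It is not the code from omatcopy_rt.c: the function name `omatcopy_rt_blocked`, the use of `double`, and the `size_t` index types are assumptions made for illustration. It computes B = alpha * A^T for a row-major A by walking 4x4 tiles, so that reads from A and writes to B both stay within a small working set.

```c
#include <assert.h>
#include <stddef.h>

#define TILE 4  /* tile edge; 4 matches the "4x4 blocked" wording of the commit */

/* Hypothetical sketch: B = alpha * A^T.
 * A is rows x cols, row-major, leading dimension lda (>= cols).
 * B is cols x rows, row-major, leading dimension ldb (>= rows). */
void omatcopy_rt_blocked(size_t rows, size_t cols, double alpha,
                         const double *a, size_t lda,
                         double *b, size_t ldb)
{
    assert(lda >= cols && ldb >= rows);

    for (size_t i0 = 0; i0 < rows; i0 += TILE) {
        /* Clamp the tile at the bottom edge when rows is not a multiple of 4. */
        size_t imax = (i0 + TILE < rows) ? i0 + TILE : rows;

        for (size_t j0 = 0; j0 < cols; j0 += TILE) {
            /* Clamp the tile at the right edge likewise. */
            size_t jmax = (j0 + TILE < cols) ? j0 + TILE : cols;

            /* Transpose and scale one tile. */
            for (size_t i = i0; i < imax; i++) {
                for (size_t j = j0; j < jmax; j++) {
                    b[j * ldb + i] = alpha * a[i * lda + j];
                }
            }
        }
    }
}
```

A naive version would run the two inner loops over the whole matrix, which makes one of the two arrays be accessed with a large stride on every step. Tiling keeps each 4x4 block of both arrays close together in memory while it is being processed.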
Martin Kroeker 1a3ad4b670
Fix signatures of the TLS-mode dll_callback and p_process_term functions for Win64 2021-02-22 19:40:36 +01:00
Martin Kroeker 86a5f98e4a
Merge pull request #13 from xianyi/develop
rebase
2021-02-22 19:31:41 +01:00
Martin Kroeker 1caa44bea9
Merge pull request #3111 from hawkinsp/forkrace
Fix race in blas_thread_shutdown.
2021-02-19 09:57:18 +01:00
Peter Hawkins dbbf92c1d1 Fix race in blas_thread_shutdown.
blas_server_avail was read without holding server_lock. If multiple threads call blas_thread_shutdown simultaneously, for example, by calling fork(), then they can attempt to shut down multiple times. This can lead to a segmentation fault.
2021-02-18 13:46:50 -05:00
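The race described in this commit message can be sketched as follows. This is an illustration under stated assumptions, not the OpenBLAS implementation: `blas_server_avail` and `server_lock` are reused from the message as stand-in globals, while `thread_shutdown_sketch`, `run_concurrent_shutdowns`, and `teardown_count` are hypothetical names. The point is that the availability flag is tested and cleared while the lock is held, so only one of several concurrent callers performs the teardown.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_CALLERS 16

/* Stand-ins for the globals named in the commit message. */
static pthread_mutex_t server_lock = PTHREAD_MUTEX_INITIALIZER;
static int blas_server_avail = 1;

/* Illustration only: counts how many callers actually tore the server down. */
static int teardown_count = 0;

/* Hypothetical shutdown routine: the flag is read and cleared under the lock. */
static void thread_shutdown_sketch(void)
{
    pthread_mutex_lock(&server_lock);
    if (blas_server_avail) {
        /* ... a real implementation would stop workers and free buffers here ... */
        teardown_count++;
        blas_server_avail = 0;
    }
    pthread_mutex_unlock(&server_lock);
}

static void *caller(void *arg)
{
    (void)arg;
    thread_shutdown_sketch();
    return NULL;
}

/* Launch nthreads concurrent shutdown calls; return the number of real teardowns. */
int run_concurrent_shutdowns(int nthreads)
{
    pthread_t tid[MAX_CALLERS];
    assert(nthreads > 0 && nthreads <= MAX_CALLERS);

    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&tid[i], NULL, caller, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < nthreads; i++) {
        pthread_join(tid[i], NULL);
    }
    return teardown_count;
}
```

If the flag were tested before taking the lock, two callers could both see it set and both run the teardown, which matches the double shutdown the message describes. Build with `-pthread` where the toolchain requires it.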
Martin Kroeker cb429d6b12
Merge pull request #3110 from martin-frbg/issue3108
Fix get_num_procs() in the USE_TLS branch for non-glibc systems
2021-02-18 15:45:25 +01:00
Martin Kroeker b0bded3f2f
Fix get_num_procs() in the USE_TLS branch for non-glibc systems 2021-02-18 11:14:05 +01:00
Martin Kroeker f9aaf22fc3
Merge pull request #3105 from martin-frbg/tigerlake
Recognize Intel Tiger Lake CPUID as SkylakeX
2021-02-12 13:29:53 +01:00
Martin Kroeker 35ff3c731d
Merge pull request #3106 from RajalakshmiSR/ppcbe
Fix build issue on POWER8 with DYNAMIC_ARCH
2021-02-12 13:29:23 +01:00
Rajalakshmi Srinivasaraghavan 63fa6c832e Fix build issue on POWER8 with DYNAMIC_ARCH
Running make DYNAMIC_ARCH=1 on big-endian POWER8 with GCC 10.2 fails with
the following error because of the difference in UNROLL_M/N:
"No rule to make target 'dgemm_incopy_POWER10.o', needed by kernel"
2021-02-11 21:28:03 -06:00
Martin Kroeker e4e5042e38
Recognize Intel Tiger Lake as SkylakeX 2021-02-11 20:17:11 +01:00