Martin Kroeker
90c1776c86
Adjust build rules for ?chkee.F
2021-02-28 18:53:20 +01:00
Martin Kroeker
9cf861e8fa
Add rewritten cchkee.F from Reference-LAPACK PR335
2021-02-28 18:51:03 +01:00
Martin Kroeker
9b7b1da133
Add rewritten dchkee.F from Reference-LAPACK PR335
2021-02-28 18:50:26 +01:00
Martin Kroeker
a5ab891292
Add rewritten schkee.F from Reference-LAPACK PR335
2021-02-28 18:49:50 +01:00
Martin Kroeker
90bb4ac821
Add rewritten zchkee.F from Reference-LAPACK PR335
2021-02-28 18:49:10 +01:00
Martin Kroeker
23a0d1bc1f
Delete zchkee.f
2021-02-28 18:47:06 +01:00
Martin Kroeker
0e96c378fd
Delete schkee.f
2021-02-28 18:46:52 +01:00
Martin Kroeker
ee16efff3c
Delete dchkee.f
2021-02-28 18:46:38 +01:00
Martin Kroeker
0197519dd7
Delete cchkee.f
2021-02-28 18:46:08 +01:00
Martin Kroeker
865829cfac
Merge pull request #3121 from RajalakshmiSR/mmarename
...
POWER10: Rename mma builtins
2021-02-27 19:15:49 +01:00
Rajalakshmi Srinivasaraghavan
0571c3187b
POWER10: Rename mma builtins
...
The LLVM and GCC teams agreed to rename the __builtin_mma_assemble_pair and
__builtin_mma_disassemble_pair built-ins to __builtin_vsx_assemble_pair and
__builtin_vsx_disassemble_pair, respectively. This patch makes the
corresponding changes in the dgemm kernel. It also changes the
inputs to those built-ins to avoid potential typecasting issues.
Reference GCC commit ID: 77ef995c1fbcab76a2a69b9f4700bcfd005d8e62
2021-02-26 20:56:34 -06:00
Martin Kroeker
d12a2d0d04
Merge pull request #3120 from martin-frbg/3118-x
...
Fix use of undefined CC variable in f_check
2021-02-26 11:50:47 +01:00
Martin Kroeker
2d369bd916
fix undefined CC variable
2021-02-26 09:09:43 +01:00
Martin Kroeker
93843c55b6
Merge pull request #15 from xianyi/develop
...
rebase
2021-02-26 09:06:25 +01:00
Martin Kroeker
e3a6132e12
Merge pull request #3119 from xianyi/revert-3118-issue3018-2
...
Revert "Fix undefined CC in f_check (again)"
2021-02-26 04:18:33 +01:00
Martin Kroeker
736f0146c3
Revert "Fix undefined CC in f_check (again)"
2021-02-26 04:18:04 +01:00
Martin Kroeker
897fc2b6ef
Merge pull request #3118 from martin-frbg/issue3018-2
...
Fix undefined CC in f_check (again)
2021-02-25 13:48:41 +01:00
Martin Kroeker
441c116105
fix undefined CC again
2021-02-25 13:47:34 +01:00
Martin Kroeker
8ecd80a34a
Merge pull request #14 from xianyi/develop
...
rebase
2021-02-25 13:45:27 +01:00
Martin Kroeker
4ba53db0da
Merge pull request #3117 from haampie/fix-perl
...
use /usr/bin/env perl
2021-02-24 18:39:28 +01:00
Martin Kroeker
6c365ff648
Merge pull request #3114 from martin-frbg/issue3113
...
Fix dll_callback and p_process_term signatures for USE_TLS on Windows x64
2021-02-24 18:38:25 +01:00
Martin Kroeker
e33bcdbb7b
Merge pull request #3115 from martin-frbg/issue2532
...
Replace unoptimized OMATCOPY_RT with 4x4 blocked version
2021-02-24 18:37:36 +01:00
Harmen Stoppels
ec6b354c32
use /usr/bin/env perl
2021-02-24 14:07:20 +01:00
Martin Kroeker
292d1af1a0
Update omatcopy_rt.c
2021-02-24 09:34:14 +01:00
Martin Kroeker
325b398e3c
Update omatcopy_rt.c
2021-02-24 09:13:12 +01:00
Martin Kroeker
6f5667b4d4
Enable optimized S/D OMATCOPY_RT
2021-02-24 09:03:41 +01:00
Martin Kroeker
cceeee7806
Add optimized omatcopy_rt
2021-02-24 09:00:54 +01:00
Martin Kroeker
0a4546b742
Typo fix
2021-02-23 13:14:35 +01:00
Martin Kroeker
b1eed27a54
Replace naive omatcopy_rt with 4x4 blocked implementation
...
as suggested by MigMuc in issue 2532
2021-02-22 21:35:42 +01:00
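The idea behind the 4x4 blocked omatcopy_rt change above (b1eed27a54) can be illustrated with a minimal sketch. This is not the kernel from the commit: the function name omatcopy_rt_blocked, its parameter list, and the assumption that "RT" means a row-major copy computing B = alpha * A^T are all illustrative. Walking the matrix in small tiles keeps the strided writes to B within a few rows at a time instead of striding across the whole output for every input row.

```c
#include <stddef.h>

/*
 * Hypothetical sketch (not the OpenBLAS kernel): B = alpha * A^T for
 * row-major matrices, processed in 4x4 tiles.
 * A is rows x cols with leading dimension lda;
 * B is cols x rows with leading dimension ldb.
 */
void omatcopy_rt_blocked(size_t rows, size_t cols, double alpha,
                         const double *a, size_t lda,
                         double *b, size_t ldb)
{
    const size_t bs = 4; /* tile edge; 4 matches the commit subject */

    for (size_t i0 = 0; i0 < rows; i0 += bs) {
        /* clamp the tile at the bottom edge of A */
        size_t imax = (i0 + bs < rows) ? i0 + bs : rows;
        for (size_t j0 = 0; j0 < cols; j0 += bs) {
            /* clamp the tile at the right edge of A */
            size_t jmax = (j0 + bs < cols) ? j0 + bs : cols;
            for (size_t i = i0; i < imax; i++) {
                for (size_t j = j0; j < jmax; j++) {
                    /* element (i, j) of A lands at (j, i) of B */
                    b[j * ldb + i] = alpha * a[i * lda + j];
                }
            }
        }
    }
}
```

The clamping of imax and jmax handles matrices whose dimensions are not multiples of 4, so no separate remainder loop is needed in this sketch.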
Martin Kroeker
1a3ad4b670
Fix signatures of the TLS-mode dll_callback and p_process_term functions for Win64
2021-02-22 19:40:36 +01:00
Martin Kroeker
86a5f98e4a
Merge pull request #13 from xianyi/develop
...
rebase
2021-02-22 19:31:41 +01:00
Martin Kroeker
1caa44bea9
Merge pull request #3111 from hawkinsp/forkrace
...
Fix race in blas_thread_shutdown.
2021-02-19 09:57:18 +01:00
Peter Hawkins
dbbf92c1d1
Fix race in blas_thread_shutdown.
...
blas_server_avail was read without holding server_lock. If multiple threads call blas_thread_shutdown simultaneously, for example, by calling fork(), then they can attempt to shut down multiple times. This can lead to a segmentation fault.
2021-02-18 13:46:50 -05:00
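The race described in the entry above (dbbf92c1d1) follows a common pattern: a flag is tested outside the lock that protects it, so two callers can both see it set and both run the teardown. A minimal sketch of the check-under-lock fix follows. Only the names blas_server_avail and server_lock come from the commit message; the function name blas_thread_shutdown_sketch, the teardown_count counter, and the function body are assumptions for illustration, not the OpenBLAS code.

```c
#include <pthread.h>

/* Names taken from the commit message; definitions here are stand-ins. */
static pthread_mutex_t server_lock = PTHREAD_MUTEX_INITIALIZER;
static int blas_server_avail = 1;

/* Hypothetical counter, present only to make the effect observable. */
static int teardown_count = 0;

/* Sketch: read and clear the flag while holding the lock. */
int blas_thread_shutdown_sketch(void)
{
    pthread_mutex_lock(&server_lock);
    if (blas_server_avail) {
        /* The real code would stop and join worker threads here. */
        teardown_count++;
        blas_server_avail = 0;
    }
    pthread_mutex_unlock(&server_lock);
    return 0;
}
```

Because the test and the update of blas_server_avail happen inside the same critical section, a second concurrent caller sees the flag already cleared and skips the teardown.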
Martin Kroeker
cb429d6b12
Merge pull request #3110 from martin-frbg/issue3108
...
Fix get_num_procs() in the USE_TLS branch for non-glibc systems
2021-02-18 15:45:25 +01:00
Martin Kroeker
b0bded3f2f
Fix get_num_procs() in the USE_TLS branch for non-glibc systems
2021-02-18 11:14:05 +01:00
Martin Kroeker
f9aaf22fc3
Merge pull request #3105 from martin-frbg/tigerlake
...
Recognize Intel Tiger Lake CPUID as SkylakeX
2021-02-12 13:29:53 +01:00
Martin Kroeker
35ff3c731d
Merge pull request #3106 from RajalakshmiSR/ppcbe
...
Fix build issue on POWER8 with DYNAMIC_ARCH
2021-02-12 13:29:23 +01:00
Rajalakshmi Srinivasaraghavan
63fa6c832e
Fix build issue on POWER8 with DYNAMIC_ARCH
...
Running make DYNAMIC_ARCH=1 on POWER8 BE with GCC 10.2 gives
the following error, caused by the difference in UNROLL_M/N:
'No rule to make target 'dgemm_incopy_POWER10.o', needed by kernel'
2021-02-11 21:28:03 -06:00
Martin Kroeker
e4e5042e38
Recognize Intel Tiger Lake as SkylakeX
2021-02-11 20:17:11 +01:00
Martin Kroeker
ae53e3e233
Recognize Intel Tiger Lake as SkylakeX
2021-02-11 20:16:27 +01:00
Martin Kroeker
074d9bff7f
Merge pull request #3104 from martin-frbg/issue3103
...
Enable optimized Haswell/AVX2 kernels for sasum/dasum and srot/drot on Ryzen
2021-02-11 15:42:47 +01:00
Martin Kroeker
f36862603a
Merge pull request #3101 from jake-arkinstall/issue-3100
...
Addressed issue #3100 - removing an unnecessary write to the include directory
2021-02-11 15:42:18 +01:00
Martin Kroeker
47691c031f
Use Haswell optimizations for Zen as well
2021-02-11 09:26:15 +01:00
Martin Kroeker
ce7ddd8921
Use Haswell optimizations for Zen as well
2021-02-11 09:25:36 +01:00
Martin Kroeker
950c047b49
Use Haswell optimizations for Zen as well
2021-02-11 09:24:51 +01:00
Martin Kroeker
46509953a9
Use Haswell optimizations for Zen as well
2021-02-11 09:24:16 +01:00
Martin Kroeker
db348dcff2
Enable optimized srot/drot kernels from Haswell
2021-02-11 09:23:05 +01:00
Martin Kroeker
a33f471065
Merge pull request #3102 from martin-frbg/issue3099
...
Strip pkgversion info from compiler version string before comparing
2021-02-11 08:56:46 +01:00
Martin Kroeker
ece3ce581e
Strip parenthesized (pkgversion) data from GCC version string to avoid misinterpretation
2021-02-10 14:22:59 +01:00
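The stripping step named in the entry above (ece3ce581e) can be sketched as follows. Distribution builds of GCC may embed a parenthesized package string in the version banner, and digits inside it could be mistaken for the compiler version. The function name strip_parenthesized and the sample banner in the comment are assumptions for illustration; the change in the commit itself is not reproduced here and may work differently.

```c
#include <stddef.h>

/*
 * Hypothetical helper: remove every "(...)" span from s in place,
 * e.g. "gcc (Debian 10.2.1-6) 10.2.1" becomes "gcc  10.2.1".
 * Nested parentheses are handled by tracking depth; a stray ')'
 * with no matching '(' is dropped.
 */
void strip_parenthesized(char *s)
{
    char *out = s;
    int depth = 0;

    for (; *s != '\0'; s++) {
        if (*s == '(') {
            depth++;
        } else if (*s == ')') {
            if (depth > 0)
                depth--;
        } else if (depth == 0) {
            *out++ = *s; /* keep characters outside parentheses */
        }
    }
    *out = '\0';
}
```

After stripping, the remaining tokens can be scanned for the first dotted number without risk of picking up a version-like string from the package annotation.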
Martin Kroeker
8189a98d85
Merge pull request #12 from xianyi/develop
...
rebase
2021-02-10 14:17:24 +01:00