Commit Graph

5465 Commits

Author SHA1 Message Date
Martin Kroeker
cb61d3b46b Add DYNAMIC_LIST support for ARM64 2021-01-25 13:13:20 +01:00
Martin Kroeker
deb2e66bcc Add DYNAMIC_LIST support for ARM64 2021-01-24 23:18:52 +01:00
Martin Kroeker
9b2d69aa80 Add DYNAMIC_LIST option for ARM64 2021-01-24 23:18:01 +01:00
Martin Kroeker
e3ff4cdd23 Merge pull request #9 from xianyi/develop
rebase
2021-01-24 23:14:45 +01:00
Martin Kroeker
0745ba43a4 Merge pull request #3082 from RajalakshmiSR/scalp10
Optimize s/dscal function for POWER10
2021-01-24 19:03:40 +01:00
Rajalakshmi Srinivasaraghavan
3ede843d50 Optimize s/dscal function for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
2021-01-24 07:48:28 -06:00
Martin Kroeker
69a5558203 Merge pull request #3059 from Guobing-Chen/BF16_gemm
Initial code for Cooperlake BF16 GEMM kernel
2021-01-23 19:08:05 +01:00
Martin Kroeker
d6905403e3 Merge pull request #3068 from alexhenrie/scan-build
scan-build fixes
2021-01-23 19:06:29 +01:00
Martin Kroeker
411926b572 Merge pull request #3079 from RajalakshmiSR/rotp10
Optimize s/drot function for POWER10
2021-01-22 08:26:00 +01:00
Rajalakshmi Srinivasaraghavan
439b93f6d2 Optimize s/drot function for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
2021-01-21 13:24:45 -06:00
Martin Kroeker
d6cf67778c Merge pull request #3075 from martin-frbg/issue3074
Fix DYNAMIC_ARCH compilation on POWER with gcc <11
2021-01-21 08:51:30 +01:00
Martin Kroeker
b94dab5250 patch to support power10 in builtin_cpu_is was backported to gcc 10.2, so allow that as well 2021-01-20 21:34:36 +01:00
Martin Kroeker
63fa3c3f8f Require gcc 11 for builtin_cpu_is(power10)
fixes #3074
2021-01-20 15:41:04 +01:00
Martin Kroeker
3612d9a57a Merge pull request #8 from xianyi/develop
rebase
2021-01-20 15:38:30 +01:00
Martin Kroeker
16dddb760e Merge pull request #3070 from RajalakshmiSR/cdot
Optimize cdot function for POWER10
2021-01-16 15:47:34 +01:00
Rajalakshmi Srinivasaraghavan
eff7c9166e Optimize cdot function for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
2021-01-15 13:40:34 -06:00
Alex Henrie
f1bf2603e6 Remove dead assignment to dflag in rotmg functions 2021-01-14 19:40:32 -07:00
Alex Henrie
6f32991eae Don't define the mode variable when not needed in gemm functions 2021-01-14 19:40:31 -07:00
Alex Henrie
202fc9e8ed Fix uninitialized argument value in dasum_k 2021-01-14 19:40:31 -07:00
Martin Kroeker
e378b24487 Merge pull request #3067 from albertziegenhagel/fix-generic-cmake
Fix building "generic" TRMM kernel with CMake
2021-01-14 21:35:19 +01:00
Martin Kroeker
3628b22d49 Merge pull request #3064 from martin-frbg/issue3063
Add cblas_crotg, cblas_zrotg, cblas_csrot and cblas_zdrot
2021-01-14 16:47:59 +01:00
Martin Kroeker
af2b0d0205 Merge pull request #3066 from martin-frbg/buffsizefix
Fix compile-time setting of the GEMM buffer size for gmake builds
2021-01-14 16:00:38 +01:00
Martin Kroeker
4bf988959a Merge pull request #3062 from austinpagan/GemmPreferedSize3
Added definitions for GEMM_PREFERED_SIZE and SWITCH_RATIO to the POWE…
2021-01-14 15:59:53 +01:00
Martin Kroeker
a0e4fb3a28 Merge pull request #3061 from martin-frbg/arm64-pgi
Support NVIDIA HPC SDK on ARM64
2021-01-14 15:59:21 +01:00
Martin Kroeker
2c445be8ba Merge pull request #3051 from martin-frbg/rocketlake
Add CPUID information for Intel Rocket Lake
2021-01-14 15:56:25 +01:00
Albert Ziegenhagel
e3f4063683 Fix building "generic" TRMM kernel with CMake
The CMake "TARGET_CORE" variable stores the "generic" target name in all lowercase letters, but it is compared to an all-uppercase string, which results in the wrong TRMM kernel being selected.
This commit converts TARGET_CORE to all uppercase before comparing its value, so that case mismatches can no longer cause this problem.
2021-01-14 10:00:49 +01:00
Martin Kroeker
6bbe6d5b92 Make compile-time BUFFERSIZE setting actually reach the compiler/preprocessor 2021-01-13 22:36:04 +01:00
Martin Kroeker
89ae305e11 Workaround for cmake having its own C_COMPILER variable 2021-01-13 12:30:26 +01:00
Martin Kroeker
da8d7f09f1 try to work around gcc update problems 2021-01-13 09:46:53 +01:00
Martin Kroeker
25c986db5a Add prototypes for CBLAS_CROTG and CBLAS_ZROTG 2021-01-13 00:30:27 +01:00
Martin Kroeker
a8f249458d Build CBLAS interfaces for CROTG and ZROTG as well 2021-01-13 00:29:38 +01:00
Martin Kroeker
bc5b35367f restore Makefile after accidental overwrite 2021-01-13 00:28:43 +01:00
Martin Kroeker
930aff2c2e Build CBLAS interfaces for CROTG and ZROTG as well 2021-01-13 00:27:42 +01:00
Martin Kroeker
ac3e2a3fdd Add CBLAS interfaces for csrot and zdrot 2021-01-12 23:22:00 +01:00
Martin Kroeker
9ccb12b031 Add prototypes for cblas_csrot and cblas_zdrot 2021-01-12 23:20:07 +01:00
Martin Kroeker
e18a2c22db Merge pull request #3060 from martin-frbg/dyn_arm64
Label the assembly part of the ARMV8 dynamic arch detection as volatile
2021-01-12 23:02:05 +01:00
Martin Kroeker
b716c0ef01 Add workaround for NVIDIA HPC 2021-01-12 16:51:35 +01:00
Martin Kroeker
2efa3b70dc Add workaround for NVIDIA HPC 2021-01-12 16:49:39 +01:00
Martin Kroeker
49959d4f1c Add workaround for NVIDIA HPC 2021-01-12 16:47:15 +01:00
Martin Kroeker
0f27a03607 Add workaround for NVIDIA HPC mishandling of the asm DOT kernels 2021-01-12 16:39:35 +01:00
Martin Kroeker
c2a8ebfe69 Add workaround for NVIDIA HPC mishandling of the asm DOT kernels 2021-01-12 16:38:51 +01:00
Martin Kroeker
43aac5bacc Support NVIDIA HPC compiler 2021-01-12 16:36:12 +01:00
Martin Kroeker
bff2b7c94d Support compilation with NVIDIA HPC compilers (which do not take gcc-style arch options) 2021-01-12 16:34:18 +01:00
Martin Kroeker
2d45a262d9 Support compilation with nvfortran 2021-01-12 16:32:29 +01:00
Gordon Fossum
ed652d8136 Added definitions for GEMM_PREFERED_SIZE and SWITCH_RATIO to the POWER9 and POWER10 specific sections of param.h. 2021-01-11 21:13:53 -05:00
Martin Kroeker
6fe0f1fab9 Label get_cpu_ftr as volatile to keep gcc from rearranging the code 2021-01-11 19:05:29 +01:00
Chen, Guobing
b0beb0b1ca Initial code for Cooperlake BF16 GEMM kernel 2021-01-11 02:15:21 +08:00
Martin Kroeker
018dec8588 Merge pull request #7 from xianyi/develop
rebase
2021-01-10 17:09:46 +01:00
Martin Kroeker
5d6209e1f9 Merge pull request #3055 from RajalakshmiSR/swapp10
Optimize swap function for POWER10
2021-01-09 00:11:44 +01:00
Rajalakshmi Srinivasaraghavan
601b711c78 Optimize swap function for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
2021-01-08 08:01:36 -06:00