Alex Henrie
6f32991eae
Don't define the mode variable when not needed in gemm functions
2021-01-14 19:40:31 -07:00
Alex Henrie
202fc9e8ed
Fix uninitialized argument value in dasum_k
2021-01-14 19:40:31 -07:00
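For reference, dasum computes the sum of absolute values of a strided vector. The sketch below is a hypothetical portable version (`dasum_ref` is not the OpenBLAS kernel, and positive `incx` is assumed). It shows every value initialized before it is read or returned, which is the general concern behind the commit title. The actual fix in OpenBLAS may involve a different variable.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical reference for dasum semantics: sum of |x[i*incx]| for
 * i = 0..n-1. Assumes incx > 0; incx == 0 is treated as "nothing to do". */
double dasum_ref(size_t n, const double *x, size_t incx)
{
    double sum = 0.0;              /* initialized before first use */
    if (n == 0 || incx == 0)
        return sum;
    for (size_t i = 0; i < n; i++)
        sum += fabs(x[i * incx]);
    return sum;
}
```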
Martin Kroeker
e378b24487
Merge pull request #3067 from albertziegenhagel/fix-generic-cmake
Fix building "generic" TRMM kernel with CMake
2021-01-14 21:35:19 +01:00
Martin Kroeker
3628b22d49
Merge pull request #3064 from martin-frbg/issue3063
Add cblas_crotg, cblas_zrotg, cblas_csrot and cblas_zdrot
2021-01-14 16:47:59 +01:00
Martin Kroeker
af2b0d0205
Merge pull request #3066 from martin-frbg/buffsizefix
Fix compile-time setting of the GEMM buffer size for gmake builds
2021-01-14 16:00:38 +01:00
Martin Kroeker
4bf988959a
Merge pull request #3062 from austinpagan/GemmPreferedSize3
Added definitions for GEMM_PREFERED_SIZE and SWITCH_RATIO to the POWE…
2021-01-14 15:59:53 +01:00
Martin Kroeker
a0e4fb3a28
Merge pull request #3061 from martin-frbg/arm64-pgi
Support NVIDIA HPC SDK on ARM64
2021-01-14 15:59:21 +01:00
Martin Kroeker
2c445be8ba
Merge pull request #3051 from martin-frbg/rocketlake
Add CPUID information for Intel Rocket Lake
2021-01-14 15:56:25 +01:00
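x86 CPU identification typically starts from the EAX value of CPUID leaf 1, which encodes family, model and stepping. The sketch below decodes that signature. The helper names are hypothetical. Model 0xA7 (family 6) is the value commonly listed for Rocket Lake and should be checked against Intel's documentation before use. `__get_cpuid` comes from the `<cpuid.h>` header shipped with GCC and Clang and is only used on x86.

```c
#include <stdint.h>
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#include <cpuid.h>
#endif

/* Display family from the CPUID leaf-1 EAX signature. */
unsigned cpu_family(uint32_t eax)
{
    unsigned family = (eax >> 8) & 0xf;
    if (family == 0xf)
        family += (eax >> 20) & 0xff;          /* extended family */
    return family;
}

/* Display model: the extended model bits apply to families 6 and 15. */
unsigned cpu_model(uint32_t eax)
{
    unsigned family = (eax >> 8) & 0xf;
    unsigned model = (eax >> 4) & 0xf;
    if (family == 0x6 || family == 0xf)
        model += ((eax >> 16) & 0xf) << 4;     /* extended model */
    return model;
}

/* Hypothetical check: family 6, model 0xA7 (assumed Rocket Lake). */
int is_rocket_lake(uint32_t eax)
{
    return cpu_family(eax) == 6 && cpu_model(eax) == 0xA7;
}

/* Leaf-1 signature on x86 with GCC/Clang; 0 elsewhere. */
uint32_t cpuid_signature(void)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    unsigned eax, ebx, ecx, edx;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return eax;
#endif
    return 0;
}
```

For example, the signature `0x000A0671` decodes to family 6, model 0xA7.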
Albert Ziegenhagel
e3f4063683
Fix building "generic" TRMM kernel with CMake
The CMake "TARGET_CORE" variable stores the "generic" target name in all lowercase letters, but it is compared to an all-uppercase string, so the wrong TRMM kernel gets selected.
This commit converts TARGET_CORE to uppercase before the comparison so that case mismatches can no longer cause this problem.
2021-01-14 10:00:49 +01:00
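The same pitfall appears whenever a configuration string is compared verbatim. The following sketch is in C rather than CMake. It normalizes both sides to uppercase before comparing, which is the idea the commit applies to TARGET_CORE. `target_is` is a hypothetical helper, and it normalizes both operands rather than only one.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical helper: compare two target names ignoring ASCII case,
 * so "generic" matches "GENERIC". */
bool target_is(const char *target_core, const char *expected)
{
    while (*target_core && *expected) {
        if (toupper((unsigned char)*target_core) !=
            toupper((unsigned char)*expected))
            return false;
        target_core++;
        expected++;
    }
    /* Equal only if both strings ended at the same position. */
    return *target_core == '\0' && *expected == '\0';
}
```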
Martin Kroeker
6bbe6d5b92
Make compile-time BUFFERSIZE setting actually reach the compiler/preprocessor
2021-01-13 22:36:04 +01:00
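A compile-time setting only takes effect if the value passed to make ends up as a `-D` definition that the preprocessor actually sees. A common C pattern for this is a default guarded by `#ifndef`. In the sketch below, treating BUFFERSIZE as the log2 of the size in bytes, the default of 25 (32 MiB), and the names `GEMM_BUFFER_BYTES` and `gemm_buffer_bytes` are all assumptions for illustration.

```c
#include <stddef.h>

/* Hypothetical: BUFFERSIZE is log2 of the buffer size in bytes and can be
 * overridden on the command line, e.g. cc -DBUFFERSIZE=26 ... */
#ifndef BUFFERSIZE
#define BUFFERSIZE 25              /* default: 2^25 bytes = 32 MiB */
#endif

#define GEMM_BUFFER_BYTES ((size_t)1 << BUFFERSIZE)

size_t gemm_buffer_bytes(void)
{
    return GEMM_BUFFER_BYTES;
}
```

If the build system never forwards the flag to the compiler, this sketch silently falls back to its default.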
Martin Kroeker
89ae305e11
Workaround for cmake having its own C_COMPILER variable
2021-01-13 12:30:26 +01:00
Martin Kroeker
da8d7f09f1
try to work around gcc update problems
2021-01-13 09:46:53 +01:00
Martin Kroeker
25c986db5a
Add prototypes for CBLAS_CROTG and CBLAS_ZROTG
2021-01-13 00:30:27 +01:00
Martin Kroeker
a8f249458d
Build CBLAS interfaces for CROTG and ZROTG as well
2021-01-13 00:29:38 +01:00
Martin Kroeker
bc5b35367f
restore Makefile after accidental overwrite
2021-01-13 00:28:43 +01:00
Martin Kroeker
930aff2c2e
Build CBLAS interfaces for CROTG and ZROTG as well
2021-01-13 00:27:42 +01:00
Martin Kroeker
ac3e2a3fdd
Add CBLAS interfaces for csrot and zdrot
2021-01-12 23:22:00 +01:00
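csrot and zdrot apply a real plane rotation (c, s) to a pair of complex vectors. The hypothetical sketch below shows the double-precision case with unit stride. The single-precision variant is identical except that it uses `float complex` and `float`.

```c
#include <complex.h>
#include <stddef.h>

/* Hypothetical zdrot semantics: for each i,
 *   x[i] <- c*x[i] + s*y[i]
 *   y[i] <- c*y[i] - s*x[i]   (using the old x[i]).
 * Unit stride assumed for brevity. */
void zdrot_sketch(size_t n, double complex *x, double complex *y,
                  double c, double s)
{
    for (size_t i = 0; i < n; i++) {
        double complex xi = x[i], yi = y[i];
        x[i] = c * xi + s * yi;
        y[i] = c * yi - s * xi;
    }
}
```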
Martin Kroeker
9ccb12b031
Add prototypes for cblas_csrot and cblas_zdrot
2021-01-12 23:20:07 +01:00
Martin Kroeker
e18a2c22db
Merge pull request #3060 from martin-frbg/dyn_arm64
Label the assembly part of the ARMV8 dynamic arch detection as volatile
2021-01-12 23:02:05 +01:00
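On Linux/arm64, dynamic-arch detection can read ID registers such as MIDR_EL1 with an `mrs` instruction, which the kernel emulates for user space when it advertises HWCAP_CPUID. Marking the inline assembly `volatile` tells GCC that the statement must not be discarded, merged, or hoisted as if it were a pure computation. The sketch below illustrates that pattern and decodes the implementer and part-number fields. The helper names are hypothetical. On other architectures, `read_midr` returns 0.

```c
#include <stdint.h>

/* Read MIDR_EL1 on Linux/arm64; the asm is volatile so the compiler keeps
 * it as written. Returns 0 where the register is not read. */
uint64_t read_midr(void)
{
    uint64_t midr = 0;
#if defined(__aarch64__) && defined(__linux__)
    __asm__ __volatile__("mrs %0, midr_el1" : "=r"(midr));
#endif
    return midr;
}

/* MIDR_EL1 layout: implementer [31:24], variant [23:20],
 * architecture [19:16], part number [15:4], revision [3:0]. */
unsigned midr_implementer(uint64_t midr) { return (unsigned)(midr >> 24) & 0xffu; }
unsigned midr_partnum(uint64_t midr)     { return (unsigned)(midr >> 4) & 0xfffu; }
```

For example, a MIDR value of `0x413FD0C1` decodes to implementer 0x41 (Arm) and part number 0xD0C (Neoverse N1).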
Martin Kroeker
b716c0ef01
Add workaround for NVIDIA HPC
2021-01-12 16:51:35 +01:00
Martin Kroeker
2efa3b70dc
Add workaround for NVIDIA HPC
2021-01-12 16:49:39 +01:00
Martin Kroeker
49959d4f1c
Add workaround for NVIDIA HPC
2021-01-12 16:47:15 +01:00
Martin Kroeker
0f27a03607
Add workaround for NVIDIA HPC mishandling of the asm DOT kernels
2021-01-12 16:39:35 +01:00
Martin Kroeker
c2a8ebfe69
Add workaround for NVIDIA HPC mishandling of the asm DOT kernels
2021-01-12 16:38:51 +01:00
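OpenBLAS selects kernels in its build files. The sketch below expresses the same kind of workaround at the C preprocessor level: when the NVIDIA HPC compilers are detected through their predefined `__NVCOMPILER` macro, a portable C dot kernel is chosen instead of an assembly one. `DOT_USE_C_KERNEL`, `ddot_c` and `dot_kernel_choice` are hypothetical names. This is not the actual change.

```c
#include <stddef.h>

/* Hypothetical switch: the NVIDIA HPC compilers predefine __NVCOMPILER. */
#if defined(__NVCOMPILER)
#define DOT_USE_C_KERNEL 1
#else
#define DOT_USE_C_KERNEL 0
#endif

/* Portable strided dot product usable as a fallback kernel.
 * Assumes incx, incy > 0. */
double ddot_c(size_t n, const double *x, size_t incx,
              const double *y, size_t incy)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += x[i * incx] * y[i * incy];
    return sum;
}

/* Reports which kernel family this translation unit would pick. */
const char *dot_kernel_choice(void)
{
    return DOT_USE_C_KERNEL ? "portable C" : "assembly";
}
```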
Martin Kroeker
43aac5bacc
Support NVIDIA HPC compiler
2021-01-12 16:36:12 +01:00
Martin Kroeker
bff2b7c94d
Support compilation with NVIDIA HPC compilers (which do not take gcc-style arch options)
2021-01-12 16:34:18 +01:00
Martin Kroeker
2d45a262d9
Support compilation with nvfortran
2021-01-12 16:32:29 +01:00
Gordon Fossum
ed652d8136
Added definitions for GEMM_PREFERED_SIZE and SWITCH_RATIO to the POWER9 and POWER10 specific sections of param.h.
2021-01-11 21:13:53 -05:00
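One plausible use of a "preferred size" parameter in threaded GEMM is to round each thread's share of the matrix up to a multiple of the kernel's blocking, so that threads avoid ragged partial blocks. The helper below is a hypothetical sketch under that assumption. It does not reproduce how OpenBLAS uses GEMM_PREFERED_SIZE or SWITCH_RATIO.

```c
/* Hypothetical: split n columns across nthreads and round each share up to
 * a multiple of `preferred`. Returns the width for the first thread;
 * later threads would take what remains. */
int partition_width(int n, int nthreads, int preferred)
{
    if (nthreads <= 1 || preferred <= 0)
        return n;
    int width = (n + nthreads - 1) / nthreads;                 /* ceil(n / nthreads) */
    width = ((width + preferred - 1) / preferred) * preferred; /* round up */
    return width < n ? width : n;                              /* never exceed n */
}
```

For example, 100 columns on 4 threads with a preferred multiple of 16 gives a first share of 32 instead of 25.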
Martin Kroeker
6fe0f1fab9
Label get_cpu_ftr as volatile to keep gcc from rearranging the code
2021-01-11 19:05:29 +01:00
Chen, Guobing
b0beb0b1ca
Initial code for Cooperlake BF16 GEMM kernel
2021-01-11 02:15:21 +08:00
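As background, bfloat16 keeps the upper 16 bits of an IEEE single-precision value. Cooper Lake's AVX-512 BF16 instructions multiply bf16 pairs and accumulate in single precision. The portable sketch below shows the numeric contract such a kernel is expected to meet. It includes float-to-bf16 conversion with round-to-nearest-even and a naive row-major GEMM with float accumulation. All names are hypothetical, and this is not the Cooperlake kernel.

```c
#include <stdint.h>
#include <string.h>

/* float -> bf16 with round-to-nearest-even; NaNs stay quiet NaNs. */
uint16_t float_to_bf16(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    if ((u & 0x7fffffffu) > 0x7f800000u)        /* NaN */
        return (uint16_t)((u >> 16) | 0x0040u);
    u += 0x7fffu + ((u >> 16) & 1u);            /* round to nearest even */
    return (uint16_t)(u >> 16);
}

/* bf16 -> float is exact: place the bits in the upper half. */
float bf16_to_float(uint16_t h)
{
    uint32_t u = (uint32_t)h << 16;
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Naive reference: C (m x n) += A (m x k) * B (k x n), row-major,
 * bf16 inputs with float accumulation. */
void sbgemm_ref(int m, int n, int k, const uint16_t *a, const uint16_t *b,
                float *c)
{
    for (int i = 0; i < m; i++)
        for (int j = 0; j < n; j++) {
            float acc = 0.0f;
            for (int p = 0; p < k; p++)
                acc += bf16_to_float(a[i * k + p]) * bf16_to_float(b[p * n + j]);
            c[i * n + j] += acc;
        }
}
```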
Martin Kroeker
018dec8588
Merge pull request #7 from xianyi/develop
rebase
2021-01-10 17:09:46 +01:00
Martin Kroeker
5d6209e1f9
Merge pull request #3055 from RajalakshmiSR/swapp10
Optimize swap function for POWER10
2021-01-09 00:11:44 +01:00
Rajalakshmi Srinivasaraghavan
601b711c78
Optimize swap function for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
2021-01-08 08:01:36 -06:00
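The operation itself is simple. The POWER10 work changes how many elements each load and store moves. As a portable point of reference, here are the plain C semantics of a double-precision swap for positive strides. The name is hypothetical. The unit-stride path handles two elements per iteration only to loosely mirror a vector kernel. The vector-pair instructions themselves are not shown.

```c
#include <stddef.h>

/* Reference swap: exchange x[i*incx] and y[i*incy] for i = 0..n-1.
 * Assumes positive strides. */
void dswap_ref(size_t n, double *x, size_t incx, double *y, size_t incy)
{
    if (incx == 1 && incy == 1) {
        size_t i = 0;
        for (; i + 1 < n; i += 2) {             /* two elements per step */
            double x0 = x[i], x1 = x[i + 1];
            x[i] = y[i];
            x[i + 1] = y[i + 1];
            y[i] = x0;
            y[i + 1] = x1;
        }
        if (i < n) {                            /* odd tail element */
            double t = x[i];
            x[i] = y[i];
            y[i] = t;
        }
        return;
    }
    for (size_t i = 0; i < n; i++) {
        double t = x[i * incx];
        x[i * incx] = y[i * incy];
        y[i * incy] = t;
    }
}
```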
Martin Kroeker
78702753f2
Merge pull request #3053 from pkubaj/patch-1
Fix build on FreeBSD/powerpc64le
2021-01-02 16:14:07 +01:00
pkubaj
7aa1ff8ff6
Fix build on FreeBSD/powerpc64le
2021-01-01 21:19:57 +00:00
Martin Kroeker
d6c97cf010
Merge pull request #3052 from ashwinyes/arm64_fix_nrm2
arm64: Fix nrm2 for input vectors with Inf
2021-01-01 15:51:07 +01:00
Ashwin Sekhar T K
1b2508362b
arm64: Fix nrm2 for input vectors with Inf
Fix double precision nrm2 kernels returning NaN when the
input vectors contain Inf/-Inf.
2021-01-01 02:49:37 -08:00
Martin Kroeker
cd898af59f
Merge pull request #3050 from aurel32/riscv64-openblas-supported
getarch.c: define OPENBLAS_SUPPORTED for riscv64
2020-12-29 21:59:40 +01:00
Aurelien Jarno
0a535e58d8
getarch.c: define OPENBLAS_SUPPORTED for riscv64
2020-12-29 12:06:39 +00:00
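Defining OPENBLAS_SUPPORTED marks an architecture as recognized by getarch.c. The hypothetical, heavily reduced sketch below shows the general pattern of gating on compiler-predefined macros. For 64-bit RISC-V, that means `__riscv` together with `__riscv_xlen == 64`. The real getarch.c covers many more targets and does more than this.

```c
/* Hypothetical reduced architecture gating via predefined macros. */
#if defined(__x86_64__) || defined(_M_X64)
#define ARCH_NAME "x86_64"
#define OPENBLAS_SUPPORTED
#elif defined(__aarch64__)
#define ARCH_NAME "arm64"
#define OPENBLAS_SUPPORTED
#elif defined(__riscv) && defined(__riscv_xlen) && __riscv_xlen == 64
#define ARCH_NAME "riscv64"
#define OPENBLAS_SUPPORTED
#else
#define ARCH_NAME "unknown"
#endif

/* 1 if this sketch recognizes the build architecture, else 0. */
int arch_supported(void)
{
#ifdef OPENBLAS_SUPPORTED
    return 1;
#else
    return 0;
#endif
}

const char *arch_name(void) { return ARCH_NAME; }
```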
Martin Kroeker
9ce9e295fe
Merge pull request #3049 from martin-frbg/readme
Expand the introductory paragraph of the README with links to netlib docs and linear algebra lecture videos
2020-12-27 22:54:20 +01:00
Martin Kroeker
9a38592c79
Add pointers to the netlib documentation and Gilbert Strang's linear algebra primers
2020-12-27 21:55:08 +01:00
Martin Kroeker
9b3965b08c
Merge pull request #6 from xianyi/develop
rebase
2020-12-27 21:28:10 +01:00
Martin Kroeker
531cb4f673
Merge pull request #3035 from Joshua-Ashton/patch-1
Define BLAS acronym in README
2020-12-27 21:26:52 +01:00
Martin Kroeker
3559c5d7a2
Merge pull request #3048 from martin-frbg/issue2998
Temporarily revert to the old NRM2 kernels for ThunderX2/3 and NeoverseN1
2020-12-21 13:30:08 +01:00
Martin Kroeker
8631e2976a
Temporarily revert to the old nrm2 kernels
2020-12-21 07:45:13 +01:00
Martin Kroeker
2768bc1764
Temporarily revert to the old nrm2 kernels
2020-12-21 07:42:51 +01:00
Martin Kroeker
6f4698ee1f
Temporarily revert to the old nrm2 kernel
2020-12-21 07:41:18 +01:00
Martin Kroeker
85e5165e98
Merge pull request #3046 from martin-frbg/nvidiasdk-ppc
Support NVIDIA HPC SDK on POWERPC
2020-12-20 11:55:53 +01:00
Martin Kroeker
17c16f2a71
Implement builtin_cpu_is and limit cpu choices to P8 and P9 for NVIDIA compilers
2020-12-19 23:21:22 +01:00
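GCC's `__builtin_cpu_is("power9")` is not available with the NVIDIA compilers, so a replacement is needed. One way to provide one on Linux is to read the AT_PLATFORM auxiliary-vector entry with `getauxval`. On POWER, the kernel reports this entry as a string such as "power8" or "power9". The sketch below uses that approach. The helper names, the AT_PLATFORM approach, and mapping "power10" down to POWER9 (mirroring the commit's limit to P8 and P9) are assumptions, not necessarily what OpenBLAS implemented.

```c
#include <string.h>
#if defined(__linux__)
#include <sys/auxv.h>
#endif

enum core { CORE_GENERIC, CORE_POWER8, CORE_POWER9 };

/* Hypothetical stand-in for __builtin_cpu_is on Linux: compare the
 * platform string from the auxiliary vector. */
int cpu_is(const char *name)
{
#if defined(__linux__) && defined(AT_PLATFORM)
    const char *platform = (const char *)getauxval(AT_PLATFORM);
    return platform != NULL && strcmp(platform, name) == 0;
#else
    (void)name;
    return 0;
#endif
}

/* Map a platform name to a core, limiting choices to POWER8 and POWER9. */
enum core choose_core(const char *platform)
{
    if (strcmp(platform, "power8") == 0)
        return CORE_POWER8;
    if (strcmp(platform, "power9") == 0 || strcmp(platform, "power10") == 0)
        return CORE_POWER9;                   /* no POWER10 target assumed */
    return CORE_GENERIC;
}
```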
Martin Kroeker
91c3f86c2b
NVIDIA compiler does not yet support POWER10
2020-12-19 23:19:05 +01:00