Author | Commit | Message | Date
Martin Kroeker | 6bbe6d5b92 | Make compile-time BUFFERSIZE setting actually reach the compiler/preprocessor | 2021-01-13 22:36:04 +01:00
Martin Kroeker | 89ae305e11 | Workaround for cmake having its own C_COMPILER variable | 2021-01-13 12:30:26 +01:00
Martin Kroeker | da8d7f09f1 | try to work around gcc update problems | 2021-01-13 09:46:53 +01:00
Martin Kroeker | 25c986db5a | Add prototypes for CBLAS_CROTG and CBLAS_ZROTG | 2021-01-13 00:30:27 +01:00
Martin Kroeker | a8f249458d | Build CBLAS interfaces for CROTG and ZROTG as well | 2021-01-13 00:29:38 +01:00
Martin Kroeker | bc5b35367f | restore Makefile after accidental overwrite | 2021-01-13 00:28:43 +01:00
Martin Kroeker | 930aff2c2e | Build CBLAS interfaces for CROTG and ZROTG as well | 2021-01-13 00:27:42 +01:00
Martin Kroeker | ac3e2a3fdd | Add CBLAS interfaces for csrot and zdrot | 2021-01-12 23:22:00 +01:00
Martin Kroeker | 9ccb12b031 | Add prototypes for cblas_csrot and cblas_zdrot | 2021-01-12 23:20:07 +01:00
Martin Kroeker | e18a2c22db | Merge pull request #3060 from martin-frbg/dyn_arm64: Label the assembly part of the ARMV8 dynamic arch detection as volatile | 2021-01-12 23:02:05 +01:00
Martin Kroeker | b716c0ef01 | Add workaround for NVIDIA HPC | 2021-01-12 16:51:35 +01:00
Martin Kroeker | 2efa3b70dc | Add workaround for NVIDIA HPC | 2021-01-12 16:49:39 +01:00
Martin Kroeker | 49959d4f1c | Add workaround for NVIDIA HPC | 2021-01-12 16:47:15 +01:00
Martin Kroeker | 0f27a03607 | Add workaround for NVIDIA HPC mishandling of the asm DOT kernels | 2021-01-12 16:39:35 +01:00
Martin Kroeker | c2a8ebfe69 | Add workaround for NVIDIA HPC mishandling of the asm DOT kernels | 2021-01-12 16:38:51 +01:00
Martin Kroeker | 43aac5bacc | Support NVIDIA HPC compiler | 2021-01-12 16:36:12 +01:00
Martin Kroeker | bff2b7c94d | Support compilation with NVIDIA HPC compilers (which do not take gcc-style arch options) | 2021-01-12 16:34:18 +01:00
Martin Kroeker | 2d45a262d9 | Support compilation with nvfortran | 2021-01-12 16:32:29 +01:00
Gordon Fossum | ed652d8136 | Added definitions for GEMM_PREFERED_SIZE and SWITCH_RATIO to the POWER9 and POWER10 specific sections of param.h. | 2021-01-11 21:13:53 -05:00
Martin Kroeker | 6fe0f1fab9 | Label get_cpu_ftr as volatile to keep gcc from rearranging the code | 2021-01-11 19:05:29 +01:00
Chen, Guobing | b0beb0b1ca | Initial code for Cooperlake BF16 GEMM kernel | 2021-01-11 02:15:21 +08:00
Martin Kroeker | 018dec8588 | Merge pull request #7 from xianyi/develop: rebase | 2021-01-10 17:09:46 +01:00
Martin Kroeker | 5d6209e1f9 | Merge pull request #3055 from RajalakshmiSR/swapp10: Optimize swap function for POWER10 | 2021-01-09 00:11:44 +01:00
Rajalakshmi Srinivasaraghavan | 601b711c78 | Optimize swap function for POWER10. This patch makes use of new POWER10 vector pair instructions for loads and stores. | 2021-01-08 08:01:36 -06:00
Martin Kroeker | 78702753f2 | Merge pull request #3053 from pkubaj/patch-1: Fix build on FreeBSD/powerpc64le | 2021-01-02 16:14:07 +01:00
pkubaj | 7aa1ff8ff6 | Fix build on FreeBSD/powerpc64le | 2021-01-01 21:19:57 +00:00
Martin Kroeker | d6c97cf010 | Merge pull request #3052 from ashwinyes/arm64_fix_nrm2: arm64: Fix nrm2 for input vectors with Inf | 2021-01-01 15:51:07 +01:00
Ashwin Sekhar T K | 1b2508362b | arm64: Fix nrm2 for input vectors with Inf. Fix double precision nrm2 kernels returning NaN when the input vectors contain Inf/-Inf. | 2021-01-01 02:49:37 -08:00
Martin Kroeker | cd898af59f | Merge pull request #3050 from aurel32/riscv64-openblas-supported: getarch.c: define OPENBLAS_SUPPORTED for riscv64 | 2020-12-29 21:59:40 +01:00
Aurelien Jarno | 0a535e58d8 | getarch.c: define OPENBLAS_SUPPORTED for riscv64 | 2020-12-29 12:06:39 +00:00
Martin Kroeker | 9ce9e295fe | Merge pull request #3049 from martin-frbg/readme: Expand the introductory paragraph of the README with links to netlib docs and linear algebra lecture videos | 2020-12-27 22:54:20 +01:00
Martin Kroeker | 9a38592c79 | Add pointers to the netlib documentation and Gilbert Strang's linear algebra primers | 2020-12-27 21:55:08 +01:00
Martin Kroeker | 9b3965b08c | Merge pull request #6 from xianyi/develop: rebase | 2020-12-27 21:28:10 +01:00
Martin Kroeker | 531cb4f673 | Merge pull request #3035 from Joshua-Ashton/patch-1: Define BLAS acronym in README | 2020-12-27 21:26:52 +01:00
Martin Kroeker | 3559c5d7a2 | Merge pull request #3048 from martin-frbg/issue2998: Temporarily revert to the old NRM2 kernels for ThunderX2/3 and NeoverseN1 | 2020-12-21 13:30:08 +01:00
Martin Kroeker | 8631e2976a | Temporarily revert to the old nrm2 kernels | 2020-12-21 07:45:13 +01:00
Martin Kroeker | 2768bc1764 | Temporarily revert to the old nrm2 kernels | 2020-12-21 07:42:51 +01:00
Martin Kroeker | 6f4698ee1f | Temporarily revert to the old nrm2 kernel | 2020-12-21 07:41:18 +01:00
Martin Kroeker | 85e5165e98 | Merge pull request #3046 from martin-frbg/nvidiasdk-ppc: Support NVIDIA HPC SDK on POWERPC | 2020-12-20 11:55:53 +01:00
Martin Kroeker | 17c16f2a71 | Implement builtin_cpu_is and limit cpu choices to P8 and P9 for NVIDIA compilers | 2020-12-19 23:21:22 +01:00
Martin Kroeker | 91c3f86c2b | NVIDIA compiler does not yet support POWER10 | 2020-12-19 23:19:05 +01:00
Martin Kroeker | 75b1f3becc | Limit POWERPC DYNAMIC_CORE list to P8 and P9 for NVIDIA compilers | 2020-12-19 23:17:40 +01:00
Martin Kroeker | 07c5e549b2 | Merge pull request #3045 from martin-frbg/nvidiasdk: Support NVIDIA HPC SDK 20.11 compilers on x86_64 | 2020-12-19 23:14:02 +01:00
Martin Kroeker | 114eb159a4 | Disable FMA intrinsics in the srot kernel when the compiler is PGI/NVIDIA | 2020-12-19 22:15:58 +01:00
Martin Kroeker | 005cce5507 | Amend SkylakeX options to support the NVIDIA compiler | 2020-12-19 22:11:49 +01:00
Martin Kroeker | b859b6e79d | Add nvfortran | 2020-12-19 22:09:57 +01:00
Martin Kroeker | b212a2fb9f | Add/modify "PGI" compiler options for NVIDIA SDK 20.11 | 2020-12-19 22:08:37 +01:00
Martin Kroeker | e40416567a | Add version printout for PGI/NVIDIA compiler | 2020-12-19 22:06:56 +01:00
Martin Kroeker | b37e5fa2f8 | Merge pull request #5 from xianyi/develop: rebase | 2020-12-19 20:11:06 +01:00
Martin Kroeker | 326469ef4a | Merge pull request #3042 from martin-frbg/develop: Move FMA3 option setting to the kernel makefile | 2020-12-19 20:04:19 +01:00