Martin Kroeker | b716c0ef01 | Add workaround for NVIDIA HPC | 2021-01-12 16:51:35 +01:00 |
Martin Kroeker | 2efa3b70dc | Add workaround for NVIDIA HPC | 2021-01-12 16:49:39 +01:00 |
Martin Kroeker | 49959d4f1c | Add workaround for NVIDIA HPC | 2021-01-12 16:47:15 +01:00 |
Martin Kroeker | 0f27a03607 | Add workaround for NVIDIA HPC mishandling of the asm DOT kernels | 2021-01-12 16:39:35 +01:00 |
Martin Kroeker | c2a8ebfe69 | Add workaround for NVIDIA HPC mishandling of the asm DOT kernels | 2021-01-12 16:38:51 +01:00 |
Martin Kroeker | 43aac5bacc | Support NVIDIA HPC compiler | 2021-01-12 16:36:12 +01:00 |
Martin Kroeker | bff2b7c94d | Support compilation with NVIDIA HPC compilers (which do not take gcc-style arch options) | 2021-01-12 16:34:18 +01:00 |
Martin Kroeker | 2d45a262d9 | Support compilation with nvfortran | 2021-01-12 16:32:29 +01:00 |
Gordon Fossum | ed652d8136 | Added definitions for GEMM_PREFERED_SIZE and SWITCH_RATIO to the POWER9 and POWER10 specific sections of param.h. | 2021-01-11 21:13:53 -05:00 |
Martin Kroeker | 6fe0f1fab9 | Label get_cpu_ftr as volatile to keep gcc from rearranging the code | 2021-01-11 19:05:29 +01:00 |
Chen, Guobing | b0beb0b1ca | Initial code for Cooperlake BF16 GEMM kernel | 2021-01-11 02:15:21 +08:00 |
Martin Kroeker | 018dec8588 | Merge pull request #7 from xianyi/develop: rebase | 2021-01-10 17:09:46 +01:00 |
Martin Kroeker | 5d6209e1f9 | Merge pull request #3055 from RajalakshmiSR/swapp10: Optimize swap function for POWER10 | 2021-01-09 00:11:44 +01:00 |
Rajalakshmi Srinivasaraghavan | 601b711c78 | Optimize swap function for POWER10. This patch makes use of new POWER10 vector pair instructions for loads and stores. | 2021-01-08 08:01:36 -06:00 |
Martin Kroeker | 78702753f2 | Merge pull request #3053 from pkubaj/patch-1: Fix build on FreeBSD/powerpc64le | 2021-01-02 16:14:07 +01:00 |
pkubaj | 7aa1ff8ff6 | Fix build on FreeBSD/powerpc64le | 2021-01-01 21:19:57 +00:00 |
Martin Kroeker | d6c97cf010 | Merge pull request #3052 from ashwinyes/arm64_fix_nrm2: arm64: Fix nrm2 for input vectors with Inf | 2021-01-01 15:51:07 +01:00 |
Ashwin Sekhar T K | 1b2508362b | arm64: Fix nrm2 for input vectors with Inf. Fix double precision nrm2 kernels returning NaN when the input vectors contain Inf/-Inf. | 2021-01-01 02:49:37 -08:00 |
Martin Kroeker | cd898af59f | Merge pull request #3050 from aurel32/riscv64-openblas-supported: getarch.c: define OPENBLAS_SUPPORTED for riscv64 | 2020-12-29 21:59:40 +01:00 |
Aurelien Jarno | 0a535e58d8 | getarch.c: define OPENBLAS_SUPPORTED for riscv64 | 2020-12-29 12:06:39 +00:00 |
Martin Kroeker | 9ce9e295fe | Merge pull request #3049 from martin-frbg/readme: Expand the introductory paragraph of the README with links to netlib docs and linear algebra lecture videos | 2020-12-27 22:54:20 +01:00 |
Martin Kroeker | 9a38592c79 | Add pointers to the netlib documentation and Gilbert Strang's linear algebra primers | 2020-12-27 21:55:08 +01:00 |
Martin Kroeker | 9b3965b08c | Merge pull request #6 from xianyi/develop: rebase | 2020-12-27 21:28:10 +01:00 |
Martin Kroeker | 531cb4f673 | Merge pull request #3035 from Joshua-Ashton/patch-1: Define BLAS acronym in README | 2020-12-27 21:26:52 +01:00 |
Martin Kroeker | 3559c5d7a2 | Merge pull request #3048 from martin-frbg/issue2998: Temporarily revert to the old NRM2 kernels for ThunderX2/3 and NeoverseN1 | 2020-12-21 13:30:08 +01:00 |
Martin Kroeker | 8631e2976a | Temporarily revert to the old nrm2 kernels | 2020-12-21 07:45:13 +01:00 |
Martin Kroeker | 2768bc1764 | Temporarily revert to the old nrm2 kernels | 2020-12-21 07:42:51 +01:00 |
Martin Kroeker | 6f4698ee1f | Temporarily revert to the old nrm2 kernel | 2020-12-21 07:41:18 +01:00 |
Martin Kroeker | 85e5165e98 | Merge pull request #3046 from martin-frbg/nvidiasdk-ppc: Support NVIDIA HPC SDK on POWERPC | 2020-12-20 11:55:53 +01:00 |
Martin Kroeker | 17c16f2a71 | Implement builtin_cpu_is and limit cpu choices to P8 and P9 for NVIDIA compilers | 2020-12-19 23:21:22 +01:00 |
Martin Kroeker | 91c3f86c2b | NVIDIA compiler does not yet support POWER10 | 2020-12-19 23:19:05 +01:00 |
Martin Kroeker | 75b1f3becc | Limit POWERPC DYNAMIC_CORE list to P8 and P9 for NVIDIA compilers | 2020-12-19 23:17:40 +01:00 |
Martin Kroeker | 07c5e549b2 | Merge pull request #3045 from martin-frbg/nvidiasdk: Support NVIDIA HPC SDK 20.11 compilers on x86_64 | 2020-12-19 23:14:02 +01:00 |
Martin Kroeker | 114eb159a4 | Disable FMA intrinsics in the srot kernel when the compiler is PGI/NVIDIA | 2020-12-19 22:15:58 +01:00 |
Martin Kroeker | 005cce5507 | Amend SkylakeX options to support the NVIDIA compiler | 2020-12-19 22:11:49 +01:00 |
Martin Kroeker | b859b6e79d | Add nvfortran | 2020-12-19 22:09:57 +01:00 |
Martin Kroeker | b212a2fb9f | Add/modify "PGI" compiler options for NVIDIA SDK 20.11 | 2020-12-19 22:08:37 +01:00 |
Martin Kroeker | e40416567a | Add version printout for PGI/NVIDIA compiler | 2020-12-19 22:06:56 +01:00 |
Martin Kroeker | b37e5fa2f8 | Merge pull request #5 from xianyi/develop: rebase | 2020-12-19 20:11:06 +01:00 |
Martin Kroeker | 326469ef4a | Merge pull request #3042 from martin-frbg/develop: Move FMA3 option setting to the kernel makefile | 2020-12-19 20:04:19 +01:00 |
Xianyi Zhang | a3cac9cca0 | Update sgemm kernel 1x4 for C910. | 2020-12-18 11:53:23 +08:00 |
Martin Kroeker | c73d8ee40d | Conditionally add -mfma to compiler options where needed | 2020-12-17 11:34:05 +01:00 |
Martin Kroeker | abef2ea770 | Move -fma option setting to kernel/Makefile.L1 | 2020-12-17 11:32:27 +01:00 |
Martin Kroeker | b26e32c3af | Merge pull request #3040 from martin-frbg/fixfcheck: Fix undefined CC variable in check for clang+gfortran combo | 2020-12-16 00:05:04 +01:00 |
Martin Kroeker | 7822eff936 | Merge pull request #3038 from martin-frbg/issue3037: Fix spurious assumption of cross-compilation on some architectures | 2020-12-16 00:04:45 +01:00 |
Martin Kroeker | 865676682d | Add Intel Rocket Lake | 2020-12-14 22:40:23 +01:00 |
Martin Kroeker | 0f7776af0b | Add Intel Rocket Lake | 2020-12-14 22:30:36 +01:00 |
Martin Kroeker | b03dc011be | Fix undefined CC variable in clang check | 2020-12-14 19:21:52 +01:00 |
Martin Kroeker | 00ce35336e | Fix spurious removal of a trailing character from the hostarch string on x86_64 | 2020-12-13 21:28:01 +01:00 |
Martin Kroeker | 723776ddf7 | Merge pull request #4 from xianyi/develop: rebase | 2020-12-13 21:22:41 +01:00 |