
6065 Commits

Author SHA1 Message Date
Ashwin Sekhar T K 1b2508362b arm64: Fix nrm2 for input vectors with Inf
Fix double precision nrm2 kernels returning NaN when the
input vectors contain Inf/-Inf.
2021-01-01 02:49:37 -08:00
Martin Kroeker cd898af59f
Merge pull request #3050 from aurel32/riscv64-openblas-supported
getarch.c: define OPENBLAS_SUPPORTED for riscv64
2020-12-29 21:59:40 +01:00
Aurelien Jarno 0a535e58d8 getarch.c: define OPENBLAS_SUPPORTED for riscv64 2020-12-29 12:06:39 +00:00
Martin Kroeker 9ce9e295fe
Merge pull request #3049 from martin-frbg/readme
Expand the introductory paragraph of the README with links to netlib docs and linear algebra lecture videos
2020-12-27 22:54:20 +01:00
Martin Kroeker 9a38592c79
Add pointers to the netlib documentation and Gilbert Strang's linear algebra primers 2020-12-27 21:55:08 +01:00
Martin Kroeker 9b3965b08c
Merge pull request #6 from xianyi/develop
rebase
2020-12-27 21:28:10 +01:00
Martin Kroeker 531cb4f673
Merge pull request #3035 from Joshua-Ashton/patch-1
Define BLAS acronym in README
2020-12-27 21:26:52 +01:00
Martin Kroeker 3559c5d7a2
Merge pull request #3048 from martin-frbg/issue2998
Temporarily revert to the old NRM2 kernels for ThunderX2/3 and NeoverseN1
2020-12-21 13:30:08 +01:00
Martin Kroeker 8631e2976a
Temporarily revert to the old nrm2 kernels 2020-12-21 07:45:13 +01:00
Martin Kroeker 2768bc1764
Temporarily revert to the old nrm2 kernels 2020-12-21 07:42:51 +01:00
Martin Kroeker 6f4698ee1f
Temporarily revert to the old nrm2 kernel 2020-12-21 07:41:18 +01:00
Martin Kroeker 85e5165e98
Merge pull request #3046 from martin-frbg/nvidiasdk-ppc
Support NVIDIA HPC SDK on POWERPC
2020-12-20 11:55:53 +01:00
Martin Kroeker 17c16f2a71
Implement builtin_cpu_is and limit cpu choices to P8 and P9 for NVIDIA compilers 2020-12-19 23:21:22 +01:00
Martin Kroeker 91c3f86c2b
NVIDIA compiler does not yet support POWER10 2020-12-19 23:19:05 +01:00
Martin Kroeker 75b1f3becc
Limit POWERPC DYNAMIC_CORE list to P8 and P9 for NVIDIA compilers 2020-12-19 23:17:40 +01:00
Martin Kroeker 07c5e549b2
Merge pull request #3045 from martin-frbg/nvidiasdk
Support NVIDIA HPC SDK 20.11 compilers on x86_64
2020-12-19 23:14:02 +01:00
Martin Kroeker 114eb159a4
Disable FMA intrinsics in the srot kernel when the compiler is PGI/NVIDIA 2020-12-19 22:15:58 +01:00
Martin Kroeker 005cce5507
Amend SkylakeX options to support the NVIDIA compiler 2020-12-19 22:11:49 +01:00
Martin Kroeker b859b6e79d
Add nvfortran 2020-12-19 22:09:57 +01:00
Martin Kroeker b212a2fb9f
Add/modify "PGI" compiler options for NVIDIA SDK 20.11 2020-12-19 22:08:37 +01:00
Martin Kroeker e40416567a
Add version printout for PGI/NVIDIA compiler 2020-12-19 22:06:56 +01:00
Martin Kroeker b37e5fa2f8
Merge pull request #5 from xianyi/develop
rebase
2020-12-19 20:11:06 +01:00
Martin Kroeker 326469ef4a
Merge pull request #3042 from martin-frbg/develop
Move FMA3 option setting to the kernel makefile
2020-12-19 20:04:19 +01:00
Martin Kroeker c73d8ee40d
Conditionally add -mfma to compiler options where needed 2020-12-17 11:34:05 +01:00
Martin Kroeker abef2ea770
Move -fma option setting to kernel/Makefile.L1 2020-12-17 11:32:27 +01:00
Martin Kroeker b26e32c3af
Merge pull request #3040 from martin-frbg/fixfcheck
Fix undefined CC variable in check for clang+gfortran combo
2020-12-16 00:05:04 +01:00
Martin Kroeker 7822eff936
Merge pull request #3038 from martin-frbg/issue3037
Fix spurious assumption of cross-compilation on some architectures
2020-12-16 00:04:45 +01:00
Martin Kroeker 865676682d
Add Intel Rocket Lake 2020-12-14 22:40:23 +01:00
Martin Kroeker 0f7776af0b
Add Intel Rocket Lake 2020-12-14 22:30:36 +01:00
Martin Kroeker b03dc011be
Fix undefined CC variable in clang check 2020-12-14 19:21:52 +01:00
Martin Kroeker 00ce35336e
Fix spurious removal of a trailing character from the hostarch string on x86_64 2020-12-13 21:28:01 +01:00
Martin Kroeker 723776ddf7
Merge pull request #4 from xianyi/develop
rebase
2020-12-13 21:22:41 +01:00
Martin Kroeker 5a77ec7f1c
Merge pull request #3036 from RajalakshmiSR/p10copyalign
POWER10: Improve copy performance
2020-12-13 21:21:34 +01:00
Rajalakshmi Srinivasaraghavan 2fb11f873b POWER10: Improve copy performance
This patch aligns the stores to 32 byte boundary for scopy and dcopy
before entering into vector pair loop. For ccopy, changed the store
instructions to stxv to improve performance of unaligned cases.
2020-12-13 10:41:45 -06:00
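The POWER10 copy change describes aligning stores to a 32-byte boundary before entering the main vector loop. The plain C sketch below shows only that loop structure (scalar prologue, aligned main loop, scalar tail). It is not the POWER10 kernel: it uses no vector-pair instructions, and the function name dcopy_align_stores is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical copy of n doubles from x to y, structured so that the main
   loop's stores begin on 32-byte boundaries of the destination. */
static void dcopy_align_stores(size_t n, const double *x, double *y) {
    assert(n == 0 || (x != NULL && y != NULL));
    size_t i = 0;

    /* Prologue: copy single elements until y + i is 32-byte aligned.
       A correctly aligned double* reaches this within 3 steps. */
    while (i < n && ((uintptr_t)(y + i) % 32u) != 0) {
        y[i] = x[i];
        i++;
    }

    /* Main loop: 4 doubles (32 bytes) per iteration, each block stored
       at an aligned destination address; loads from x may be unaligned. */
    for (; i + 4 <= n; i += 4) {
        y[i]     = x[i];
        y[i + 1] = x[i + 1];
        y[i + 2] = x[i + 2];
        y[i + 3] = x[i + 3];
    }

    /* Tail: remaining 0..3 elements. */
    for (; i < n; i++) {
        y[i] = x[i];
    }
}
```

Peeling on the destination rather than the source reflects the commit's focus on stores. When x and y have different alignments, only one of the two streams can be aligned this way.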
Joshie ad63647446
Define BLAS acronym in README 2020-12-13 09:06:14 +00:00
Martin Kroeker 87315e8a8d
Update version to 0.3.13.dev 2020-12-12 23:28:49 +01:00
Martin Kroeker 9031ebd7d5
Update version to 0.3.13.dev 2020-12-12 23:28:20 +01:00
Martin Kroeker 12b41d5598
Merge pull request #3034 from xianyi/release-0.3.0
Merge back the release branch into develop to copy tag
2020-12-12 23:27:40 +01:00
Martin Kroeker d2b11c4777
Merge pull request #3033 from xianyi/develop
Update branch from develop to release 0.3.13
2020-12-12 18:19:29 +01:00
Martin Kroeker 7bc0e4a2e0
Update version to 0.3.13 for release 2020-12-12 18:15:33 +01:00
Martin Kroeker d3ec787f77
Update version to 0.3.13 for release 2020-12-12 18:14:49 +01:00
Martin Kroeker 2c309c235d
Merge pull request #3031 from martin-frbg/changelog13
Update Changelog.txt
2020-12-12 18:13:23 +01:00
Martin Kroeker 3dec81200c
Update Changelog.txt
Co-authored-by: h-vetinari <h.vetinari@gmx.com>
2020-12-12 14:27:37 +01:00
Martin Kroeker 737724607f
Merge pull request #3030 from martin-frbg/fix2994
Make fallback from POWER10 to POWER9 depend on new enough compiler
2020-12-12 10:01:45 +01:00
Martin Kroeker 77edf82c7f
Update Changelog.txt for 0.3.13 2020-12-12 01:25:20 +01:00
Martin Kroeker 6232237dba
Make fallback from P10 to P9 conditional on suitable compiler 2020-12-11 23:41:17 +01:00
Martin Kroeker 7d81acc762
Merge pull request #3 from xianyi/develop
rebase
2020-12-11 23:38:42 +01:00
Martin Kroeker 18d8a67485
Merge pull request #2994 from antonblanchard/power10-fixes
Power10 fixes
2020-12-11 23:37:30 +01:00
Martin Kroeker 043128cbe5
Merge pull request #3029 from RajalakshmiSR/axpyp10
POWER10: Improve axpy performance
2020-12-10 22:49:28 +01:00
Martin Kroeker 3331ca492d
Merge pull request #3021 from austinpagan/trsm_p10
POWER: Added special unrolled vectorized versions of "Solve" for specific si…
2020-12-10 19:42:54 +01:00