Martin Kroeker
75b1f3becc
Limit POWERPC DYNAMIC_CORE list to P8 and P9 for NVIDIA compilers
2020-12-19 23:17:40 +01:00
Martin Kroeker
07c5e549b2
Merge pull request #3045 from martin-frbg/nvidiasdk
...
Support NVIDIA HPC SDK 20.11 compilers on x86_64
2020-12-19 23:14:02 +01:00
Martin Kroeker
114eb159a4
Disable FMA intrinsics in the srot kernel when the compiler is PGI/NVIDIA
2020-12-19 22:15:58 +01:00
Martin Kroeker
005cce5507
Amend SkylakeX options to support the NVIDIA compiler
2020-12-19 22:11:49 +01:00
Martin Kroeker
b859b6e79d
Add nvfortran
2020-12-19 22:09:57 +01:00
Martin Kroeker
b212a2fb9f
Add/modify "PGI" compiler options for NVIDIA SDK 20.11
2020-12-19 22:08:37 +01:00
Martin Kroeker
e40416567a
Add version printout for PGI/NVIDIA compiler
2020-12-19 22:06:56 +01:00
Martin Kroeker
b37e5fa2f8
Merge pull request #5 from xianyi/develop
...
rebase
2020-12-19 20:11:06 +01:00
Martin Kroeker
326469ef4a
Merge pull request #3042 from martin-frbg/develop
...
Move FMA3 option setting to the kernel makefile
2020-12-19 20:04:19 +01:00
Xianyi Zhang
a3cac9cca0
Update sgemm kernel 1x4 for C910.
2020-12-18 11:53:23 +08:00
Martin Kroeker
c73d8ee40d
Conditionally add -mfma to compiler options where needed
2020-12-17 11:34:05 +01:00
Martin Kroeker
abef2ea770
Move -fma option setting to kernel/Makefile.L1
2020-12-17 11:32:27 +01:00
Martin Kroeker
b26e32c3af
Merge pull request #3040 from martin-frbg/fixfcheck
...
Fix undefined CC variable in check for clang+gfortran combo
2020-12-16 00:05:04 +01:00
Martin Kroeker
7822eff936
Merge pull request #3038 from martin-frbg/issue3037
...
Fix spurious assumption of cross-compilation on some architectures
2020-12-16 00:04:45 +01:00
Martin Kroeker
865676682d
Add Intel Rocket Lake
2020-12-14 22:40:23 +01:00
Martin Kroeker
0f7776af0b
Add Intel Rocket Lake
2020-12-14 22:30:36 +01:00
Martin Kroeker
b03dc011be
Fix undefined CC variable in clang check
2020-12-14 19:21:52 +01:00
Martin Kroeker
00ce35336e
Fix spurious removal of a trailing character from the hostarch string on x86_64
2020-12-13 21:28:01 +01:00
Martin Kroeker
723776ddf7
Merge pull request #4 from xianyi/develop
...
rebase
2020-12-13 21:22:41 +01:00
Martin Kroeker
5a77ec7f1c
Merge pull request #3036 from RajalakshmiSR/p10copyalign
...
POWER10: Improve copy performance
2020-12-13 21:21:34 +01:00
Rajalakshmi Srinivasaraghavan
2fb11f873b
POWER10: Improve copy performance
...
This patch aligns the stores to a 32-byte boundary for scopy and dcopy
before entering the vector pair loop. For ccopy, the store instructions
are changed to stxv to improve performance in unaligned cases.
2020-12-13 10:41:45 -06:00
Joshie
ad63647446
Define BLAS acronym in README
2020-12-13 09:06:14 +00:00
Martin Kroeker
87315e8a8d
Update version to 0.3.13.dev
2020-12-12 23:28:49 +01:00
Martin Kroeker
9031ebd7d5
Update version to 0.3.13.dev
2020-12-12 23:28:20 +01:00
Martin Kroeker
12b41d5598
Merge pull request #3034 from xianyi/release-0.3.0
...
Merge back the release branch into develop to copy tag
2020-12-12 23:27:40 +01:00
Martin Kroeker
d2b11c4777
Merge pull request #3033 from xianyi/develop
...
Update branch from develop to release 0.3.13
2020-12-12 18:19:29 +01:00
Martin Kroeker
7bc0e4a2e0
Update version to 0.3.13 for release
2020-12-12 18:15:33 +01:00
Martin Kroeker
d3ec787f77
Update version to 0.3.13 for release
2020-12-12 18:14:49 +01:00
Martin Kroeker
2c309c235d
Merge pull request #3031 from martin-frbg/changelog13
...
Update Changelog.txt
2020-12-12 18:13:23 +01:00
Martin Kroeker
3dec81200c
Update Changelog.txt
...
Co-authored-by: h-vetinari <h.vetinari@gmx.com>
2020-12-12 14:27:37 +01:00
Martin Kroeker
737724607f
Merge pull request #3030 from martin-frbg/fix2994
...
Make fallback from POWER10 to POWER9 depend on new enough compiler
2020-12-12 10:01:45 +01:00
Martin Kroeker
77edf82c7f
Update Changelog.txt for 0.3.13
2020-12-12 01:25:20 +01:00
Martin Kroeker
6232237dba
Make fallback from P10 to P9 conditional on suitable compiler
2020-12-11 23:41:17 +01:00
Martin Kroeker
7d81acc762
Merge pull request #3 from xianyi/develop
...
rebase
2020-12-11 23:38:42 +01:00
Martin Kroeker
18d8a67485
Merge pull request #2994 from antonblanchard/power10-fixes
...
Power10 fixes
2020-12-11 23:37:30 +01:00
Martin Kroeker
043128cbe5
Merge pull request #3029 from RajalakshmiSR/axpyp10
...
POWER10: Improve axpy performance
2020-12-10 22:49:28 +01:00
Martin Kroeker
3331ca492d
Merge pull request #3021 from austinpagan/trsm_p10
...
POWER: Added special unrolled vectorized versions of "Solve" for specific si…
2020-12-10 19:42:54 +01:00
Rajalakshmi Srinivasaraghavan
346e30a46a
POWER10: Improve axpy performance
...
This patch aligns the stores to a 32-byte boundary for saxpy and daxpy
before entering the vector pair loop. For caxpy, the store instructions
are changed to stxv to improve performance in unaligned cases.
2020-12-10 11:51:42 -06:00
Martin Kroeker
83de62c20d
Merge pull request #3026 from martin-frbg/revert747
...
Revert PR747 - SYRK parameter changes for Haswell and related targets
2020-12-10 16:29:41 +01:00
Martin Kroeker
658da9a769
Merge pull request #3027 from gxw-loongson/develop
...
Add msa support for loongson
2020-12-10 16:27:30 +01:00
gxw
be24c66a7c
Keep LOONGSON3A and LOONGSON3B for loongson
2020-12-10 10:53:13 +08:00
gxw
4b548857d6
Add msa support for loongson
...
1. Using core loongson3r3 and loongson3r4 for loongson
2. Add DYNAMIC_ARCH for loongson
Change-Id: I1c6b54dbeca3a0cc31d1222af36a7e9bd6ab54c1
2020-12-09 10:28:46 +08:00
Martin Kroeker
d71fe4ed4e
Remove GEMM_DEFAULT_UNROLL_MN parameters for Haswell and ZEN (introduced in PR747)
2020-12-08 21:07:57 +01:00
Martin Kroeker
a554712439
remove extra/intermediate size step for min_jj introduced in PR747
2020-12-08 21:01:36 +01:00
Martin Kroeker
5d26223f4a
remove extra/intermediate size step of min_jj from PR747
2020-12-08 20:59:56 +01:00
Martin Kroeker
980ab349bc
Merge pull request #2 from xianyi/develop
...
rebase
2020-12-08 20:53:35 +01:00
gxw
d67babf345
Remove the '-msched-weight' option, which gcc does not recognize, when checking for MSA
2020-12-08 19:16:39 +08:00
Martin Kroeker
7f11e33e8d
Merge pull request #3025 from TiredNotTear/develop
...
MIPS: Fix two bugs
2020-12-08 09:39:27 +01:00
Xianyi Zhang
7834c10e2f
Add PingTouGe contribution credit.
2020-12-07 16:55:05 +08:00
Martin Kroeker
53e0837809
Merge pull request #3022 from jinboson/develop
...
Fix test errors reported by cblas_cgemm & cblas_ctrmm
2020-12-07 08:09:11 +01:00