Martin Kroeker | 7e9cb39a25 | 2020-11-16 08:40:46 +01:00
Merge pull request #2981 from Qiyu8/fix-sum
Fix sum optimize issues
Martin Kroeker | be075d53cf | 2020-11-16 08:38:37 +01:00
Merge pull request #2983 from Qiyu8/optimize-srot
Optimize the performance of rot by using universal intrinsics
Qiyu8 | b00a0de132 | 2020-11-16 09:14:56 +08:00
remove the -mfma flag when the host has AVX.
Martin Kroeker | d341a0fea0 | 2020-11-13 12:35:09 +01:00
Merge pull request #2989 from martin-frbg/cmake-fma
Fix missing -mfma compiler flag in cmake builds without DYNAMIC_ARCH
Martin Kroeker | ec4d77c47c | 2020-11-13 09:16:34 +01:00
Add -mfma for HAVE_FMA3 in the non-DYNAMIC_ARCH case as well
Martin Kroeker | 02699226d0 | 2020-11-13 09:14:23 +01:00
Merge pull request #111 from xianyi/develop
rebase
Qiyu8 | ae0b1dea19 | 2020-11-13 10:20:24 +08:00
modify system.cmake to enable fma flag
Qiyu8 | e0dac6b53b | 2020-11-12 20:31:03 +08:00
fix the CI failure caused by a target-specific option mismatch
Qiyu8 | e5c2ceb675 | 2020-11-12 17:35:17 +08:00
fix the CI failure caused by a missing header
Qiyu8 | a87e537b8c | 2020-11-11 15:53:48 +08:00
modify macro
Qiyu8 | 5bc0a7583f | 2020-11-11 15:18:01 +08:00
only FMA3 and vectors wider than 128 bits have a positive effect.
Qiyu8 | 8c0b206d4c | 2020-11-11 14:33:12 +08:00
Optimize the performance of rot by using universal intrinsics
Qiyu8 | c4c591ac5a | 2020-11-10 16:16:38 +08:00
fix sum optimize issues
Martin Kroeker | ff16329cb7 | 2020-11-08 22:43:00 +01:00
Merge pull request #2972 from xiegengxin/rot-intrinsic
Improve the performance of rot by using AVX512 and AVX2 intrinsics
Martin Kroeker | 433637ccd8 | 2020-11-08 17:39:05 +01:00
Merge pull request #2980 from martin-frbg/fixgetarch
Fix missing AVX2 and FMA3 capabilities in FORCE_target mode
Martin Kroeker | ec088bf33a | 2020-11-08 13:15:40 +01:00
Fix missing AVX2 and FMA3 capabilities in FORCE_target mode
Martin Kroeker | 110c7a6de0 | 2020-11-08 10:19:34 +01:00
Merge pull request #2979 from RajalakshmiSR/dot_power10
Optimize sdot/ddot for POWER10
Martin Kroeker | d2faa1be4e | 2020-11-08 10:19:17 +01:00
Merge pull request #2978 from martin-frbg/fixdynfeatures
Fix handling of cpu capability flags in DYNAMIC_ARCH builds
Martin Kroeker | 1c4cfdc139 | 2020-11-08 00:12:55 +01:00
Stay compatible with old gmake that did not support undefine
Martin Kroeker | f6a57d8f63 | 2020-11-08 00:01:36 +01:00
Update Makefile.system
Martin Kroeker | f4b7ba12b7 | 2020-11-07 23:37:21 +01:00
Update Makefile.system
Rajalakshmi Srinivasaraghavan | 6e364981a8 | 2020-11-07 15:21:58 -06:00
Optimize sdot/ddot for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
Martin Kroeker | b976a0bf40 | 2020-11-07 20:39:56 +01:00
Remove previous workaround for compiler flags related to cpu capabilities in x86_64 DYNAMIC_ARCH builds
Martin Kroeker | a04f532edf | 2020-11-07 20:37:03 +01:00
Reset cpu property flags between build cycles in DYNAMIC_ARCH mode
Martin Kroeker | ccb9731c7b | 2020-11-07 20:30:15 +01:00
Fix propagation of cpu properties to compiler options
Martin Kroeker | a29338aaa6 | 2020-11-07 20:27:42 +01:00
Remove extraneous quotes that caused a cmake policy warning
Martin Kroeker | 438a8e5624 | 2020-11-07 20:26:12 +01:00
Fix placement of getarch call and spurious cpu property accumulation in DYNAMIC_ARCH builds
Martin Kroeker | e5967810b7 | 2020-11-07 20:22:41 +01:00
Merge pull request #110 from xianyi/develop
rebase
Martin Kroeker | ff74319ea5 | 2020-11-07 14:41:34 +01:00
Merge pull request #2977 from martin-frbg/issue2976
Fix macro name used in ifdef for POWERPC/PGI
Martin Kroeker | 28d2dfe2b3 | 2020-11-07 12:17:49 +01:00
Fix macro name used in ifdef
Gengxin Xie | 725ffbf041 | 2020-11-05 16:25:17 +08:00
fix typo
Gengxin Xie | d9ba49165a | 2020-11-05 15:12:36 +08:00
Improve the performance of rot by using AVX512 and AVX2 intrinsics
Martin Kroeker | 60ab9c783f | 2020-11-04 16:02:46 +01:00
Merge pull request #2966 from martin-frbg/issue2964
Ensure that EXPRECISION is disabled for DYNAMIC_ARCH with TARGET=GENERIC and fix CMAKE DYNAMIC_ARCH builds
Martin Kroeker | 8cc73fee98 | 2020-11-03 23:47:04 +01:00
Export NO_EXPRECISION after overriding for DYNAMIC_ARCH with GENERIC target
Martin Kroeker | 0155cd53a3 | 2020-11-03 23:45:49 +01:00
Add -msse3 where needed for DYNAMIC_ARCH builds
Martin Kroeker | a9f9354296 | 2020-11-02 23:17:46 +01:00
Fix target test
Martin Kroeker | b9bc76aec4 | 2020-11-02 22:43:50 +01:00
Add files via upload
Martin Kroeker | f071245939 | 2020-11-02 18:54:36 +01:00
Merge pull request #2967 from RajalakshmiSR/dgemm88
POWER10: Change dgemm unroll factors
Martin Kroeker | e5f8c2bf8a | 2020-11-01 22:25:43 +01:00
typo fix
Martin Kroeker | 6baf8af658 | 2020-11-01 22:11:48 +01:00
Disable EXPRECISION for the combination of DYNAMIC_CORE and GENERIC target
Martin Kroeker | 40a93c232b | 2020-11-01 21:58:26 +01:00
Disable EXPRECISION for DYNAMIC_ARCH in combination with TARGET=GENERIC
EXPRECISION is already disabled for the GENERIC target, so prevent mixing with code parts that use a different float size by default
Martin Kroeker | fab952bee4 | 2020-11-01 14:24:40 +01:00
Merge pull request #2962 from brada4/develop
add openbsd 6.8+ gfortran name
Martin Kroeker | 1cf04a6f0e | 2020-11-01 09:14:54 +01:00
Merge pull request #2963 from martin-frbg/issue2959
Reunify default BUFFER_SIZE on ARM64 to avoid crashes in DYNAMIC_ARCH mode
Rajalakshmi Srinivasaraghavan | dd7a9cc5bf | 2020-10-31 18:28:57 -05:00
POWER10: Change dgemm unroll factors
Changing the unroll factors for dgemm to 8 shows improved performance with
POWER10 MMA feature. Also made some minor changes in sgemm for edge cases.
Martin Kroeker | 7f26be4802 | 2020-11-01 00:00:43 +01:00
Reunify BUFFERSIZE across arm64 platforms to avoid segfaults in DYNAMIC_ARCH
User User-User | 9fab65e90a | 2020-11-01 00:38:08 +02:00
add openbsd gfortran
Martin Kroeker | 9efc3f0815 | 2020-10-31 22:33:52 +01:00
Merge pull request #109 from xianyi/develop
rebase
Martin Kroeker | aa21cb5217 | 2020-10-31 20:24:21 +01:00
Merge pull request #2960 from thrasibule/avx2_detection
fix avx2 detection
Guillaume Horel | 1f564d729b | 2020-10-31 10:00:48 -04:00
fix avx2 detection
reword commits to make it clearer
Martin Kroeker | 9349dcd206 | 2020-10-30 08:54:10 +01:00
Merge pull request #2956 from RajalakshmiSR/caxpy_p10
Optimize caxpy for POWER10