Commit Graph

5477 Commits

Author SHA1 Message Date
Martin Kroeker
be075d53cf Merge pull request #2983 from Qiyu8/optimize-srot
Optimize the performance of rot by using universal intrinsics
2020-11-16 08:38:37 +01:00
Qiyu8
b00a0de132 remove the -mfma flag when the host has AVX. 2020-11-16 09:14:56 +08:00
Martin Kroeker
d341a0fea0 Merge pull request #2989 from martin-frbg/cmake-fma
Fix missing -mfma compiler flag in cmake builds without DYNAMIC_ARCH
2020-11-13 12:35:09 +01:00
Martin Kroeker
ec4d77c47c Add -mfma for HAVE_FMA3 in the non-DYNAMIC_ARCH case as well 2020-11-13 09:16:34 +01:00
Martin Kroeker
02699226d0 Merge pull request #111 from xianyi/develop
rebase
2020-11-13 09:14:23 +01:00
Gengxin Xie
d6e7e05bb3 Improve the performance of dasum and sasum when SMP is defined 2020-11-13 14:20:52 +08:00
Qiyu8
ae0b1dea19 modify system.cmake to enable fma flag 2020-11-13 10:20:24 +08:00
Qiyu8
e0dac6b53b fix the CI failure of target specific option mismatch 2020-11-12 20:31:03 +08:00
Qiyu8
e5c2ceb675 fix the CI failure of lack the head 2020-11-12 17:35:17 +08:00
Qiyu8
a87e537b8c modify macro 2020-11-11 15:53:48 +08:00
Qiyu8
5bc0a7583f only FMA3 and vector larger than 128 have positive effects. 2020-11-11 15:18:01 +08:00
Qiyu8
8c0b206d4c Optimize the performance of rot by using universal intrinsics 2020-11-11 14:33:12 +08:00
Qiyu8
c4c591ac5a fix sum optimize issues 2020-11-10 16:16:38 +08:00
Xianyi Zhang
1ea6cfefdb Refs #2899. Merge branch 'damonyu1989-openblas-open-910' into risc-v 2020-11-10 09:38:43 +08:00
Xianyi Zhang
fc35b72ae1 Refs #2899
Merge branch 'openblas-open-910' of git://github.com/damonyu1989/OpenBLAS into damonyu1989-openblas-open-910
2020-11-10 09:38:04 +08:00
Xianyi Zhang
913cc9a4ca Merge branch 'develop' into risc-v 2020-11-10 09:18:25 +08:00
Martin Kroeker
ff16329cb7 Merge pull request #2972 from xiegengxin/rot-intrinsic
Improve the performance of rot by using AVX512 and AVX2 intrinsic
2020-11-08 22:43:00 +01:00
Martin Kroeker
433637ccd8 Merge pull request #2980 from martin-frbg/fixgetarch
Fix missing AVX2 and FMA3 capabilities in FORCE_target mode
2020-11-08 17:39:05 +01:00
Martin Kroeker
ec088bf33a Fix missing AVX2 and FMA3 capabilities in FORCE_target mode 2020-11-08 13:15:40 +01:00
Martin Kroeker
110c7a6de0 Merge pull request #2979 from RajalakshmiSR/dot_power10
Optimize sdot/ddot for POWER10
2020-11-08 10:19:34 +01:00
Martin Kroeker
d2faa1be4e Merge pull request #2978 from martin-frbg/fixdynfeatures
Fix handling of cpu capability flags in DYNAMIC_ARCH builds
2020-11-08 10:19:17 +01:00
Martin Kroeker
1c4cfdc139 Stay compatible with old gmake that did not support undefine 2020-11-08 00:12:55 +01:00
Martin Kroeker
f6a57d8f63 Update Makefile.system 2020-11-08 00:01:36 +01:00
Martin Kroeker
f4b7ba12b7 Update Makefile.system 2020-11-07 23:37:21 +01:00
Rajalakshmi Srinivasaraghavan
6e364981a8 Optimize sdot/ddot for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
2020-11-07 15:21:58 -06:00
Martin Kroeker
b976a0bf40 Remove previous workaround for compiler flags related to cpu capabilities in x86_64 DYNAMIC_ARCH builds 2020-11-07 20:39:56 +01:00
Martin Kroeker
a04f532edf Reset cpu property flags between build cycles in DYNAMIC_ARCH mode 2020-11-07 20:37:03 +01:00
Martin Kroeker
ccb9731c7b Fix propagation of cpu properties to compiler options 2020-11-07 20:30:15 +01:00
Martin Kroeker
a29338aaa6 Remove extraneous quotes that caused a cmake policy warning 2020-11-07 20:27:42 +01:00
Martin Kroeker
438a8e5624 Fix placement of getarch call and spurious cpu property accumulation in DYNAMIC_ARCH builds 2020-11-07 20:26:12 +01:00
Martin Kroeker
e5967810b7 Merge pull request #110 from xianyi/develop
rebase
2020-11-07 20:22:41 +01:00
Martin Kroeker
ff74319ea5 Merge pull request #2977 from martin-frbg/issue2976
Fix macro name used in ifdef for POWERPC/PGI
2020-11-07 14:41:34 +01:00
Martin Kroeker
28d2dfe2b3 Fix macro name used in ifdef 2020-11-07 12:17:49 +01:00
Gengxin Xie
725ffbf041 fix typo 2020-11-05 16:25:17 +08:00
Gengxin Xie
d9ba49165a Improve the performance of rot by using AVX512 and AVX2 intrinsic 2020-11-05 15:12:36 +08:00
Martin Kroeker
60ab9c783f Merge pull request #2966 from martin-frbg/issue2964
Ensure that EXPRECISION is disabled for DYNAMIC_ARCH with TARGET=GENERIC and fix CMAKE DYNAMIC_ARCH builds
2020-11-04 16:02:46 +01:00
Martin Kroeker
8cc73fee98 Export NO_EXPRECISION after overriding for DYNAMIC_ARCH with GENERIC target 2020-11-03 23:47:04 +01:00
Martin Kroeker
0155cd53a3 Add -msse3 where needed for DYNAMIC_ARCH builds 2020-11-03 23:45:49 +01:00
Martin Kroeker
a9f9354296 Fix target test 2020-11-02 23:17:46 +01:00
Martin Kroeker
b9bc76aec4 Add files via upload 2020-11-02 22:43:50 +01:00
Martin Kroeker
f071245939 Merge pull request #2967 from RajalakshmiSR/dgemm88
POWER10: Change dgemm unroll factors
2020-11-02 18:54:36 +01:00
Aisha Tammy
60997ddd73 allow setting soname without suffix or prefix
Allows to create a library with a different
SONAME without the need to add suffixes to symbols
Backwards compatible and should have no effect
on the workflow and previous users.
Useful for allowing INTERFACE64 library alongside
the standard library without file conflicts
2020-11-02 13:04:53 +00:00
Martin Kroeker
e5f8c2bf8a typo fix 2020-11-01 22:25:43 +01:00
Martin Kroeker
6baf8af658 Disable EXPRECISION for the combination of DYNAMIC_CORE and GENERIC target 2020-11-01 22:11:48 +01:00
Martin Kroeker
40a93c232b Disable EXPRECISION for DYNAMIC_ARCH in combination with TARGET=GENERIC
NO_EXPRECISION is disabled for the GENERIC_TARGET already, so prevent mixing with code parts that use a different float size by default
2020-11-01 21:58:26 +01:00
Martin Kroeker
fab952bee4 Merge pull request #2962 from brada4/develop
add openbsd 68+ gfortran name
2020-11-01 14:24:40 +01:00
Martin Kroeker
1cf04a6f0e Merge pull request #2963 from martin-frbg/issue2959
Reunify default BUFFER_SIZE on ARM64 to avoid crashes in DYNAMIC_ARCH mode
2020-11-01 09:14:54 +01:00
Rajalakshmi Srinivasaraghavan
dd7a9cc5bf POWER10: Change dgemm unroll factors
Changing the unroll factors for dgemm to 8 shows improved performance with
POWER10 MMA feature. Also made some minor changes in sgemm for edge cases.
2020-10-31 18:28:57 -05:00
Martin Kroeker
7f26be4802 Reunify BUFFERSIZE across arm64 platforms to avoid segfaults in DYNAMIC_ARCH 2020-11-01 00:00:43 +01:00
User User-User
9fab65e90a add openbsd gfortran 2020-11-01 00:38:08 +02:00