Commit Graph

5262 Commits

Author SHA1 Message Date
Alexander Grund
60005eb47b Don't overwrite blas_thread_buffer if already set
After a fork it is possible that blas_thread_buffer has already
allocated memory buffers: goto_set_num_threads does allocate those
already and it may be called by num_cpu_avail in case the OpenBLAS
NUM_THREADS differ from the OMP num threads.
This leads to a memory leak which can cause subsequent execution of BLAS
kernels to fail.

Fixes #2993
2020-11-19 14:51:51 +01:00
Martin Kroeker
7e9cb39a25 Merge pull request #2981 from Qiyu8/fix-sum
Fix sum optimize issues
2020-11-16 08:40:46 +01:00
Martin Kroeker
be075d53cf Merge pull request #2983 from Qiyu8/optimize-srot
Optimize the performance of rot by using universal intrinsics
2020-11-16 08:38:37 +01:00
Qiyu8
b00a0de132 remove the -mfma flag when the host has AVX. 2020-11-16 09:14:56 +08:00
Martin Kroeker
d341a0fea0 Merge pull request #2989 from martin-frbg/cmake-fma
Fix missing -mfma compiler flag in cmake builds without DYNAMIC_ARCH
2020-11-13 12:35:09 +01:00
Martin Kroeker
ec4d77c47c Add -mfma for HAVE_FMA3 in the non-DYNAMIC_ARCH case as well 2020-11-13 09:16:34 +01:00
Martin Kroeker
02699226d0 Merge pull request #111 from xianyi/develop
rebase
2020-11-13 09:14:23 +01:00
Qiyu8
ae0b1dea19 modify system.cmake to enable fma flag 2020-11-13 10:20:24 +08:00
Qiyu8
e0dac6b53b fix the CI failure of target specific option mismatch 2020-11-12 20:31:03 +08:00
Qiyu8
e5c2ceb675 fix the CI failure caused by a missing header 2020-11-12 17:35:17 +08:00
Qiyu8
a87e537b8c modify macro 2020-11-11 15:53:48 +08:00
Qiyu8
5bc0a7583f only FMA3 and vectors wider than 128 bits have a positive effect. 2020-11-11 15:18:01 +08:00
Qiyu8
8c0b206d4c Optimize the performance of rot by using universal intrinsics 2020-11-11 14:33:12 +08:00
Qiyu8
c4c591ac5a fix sum optimize issues 2020-11-10 16:16:38 +08:00
Martin Kroeker
ff16329cb7 Merge pull request #2972 from xiegengxin/rot-intrinsic
Improve the performance of rot by using AVX512 and AVX2 intrinsic
2020-11-08 22:43:00 +01:00
Martin Kroeker
433637ccd8 Merge pull request #2980 from martin-frbg/fixgetarch
Fix missing AVX2 and FMA3 capabilities in FORCE_target mode
2020-11-08 17:39:05 +01:00
Martin Kroeker
ec088bf33a Fix missing AVX2 and FMA3 capabilities in FORCE_target mode 2020-11-08 13:15:40 +01:00
Martin Kroeker
110c7a6de0 Merge pull request #2979 from RajalakshmiSR/dot_power10
Optimize sdot/ddot for POWER10
2020-11-08 10:19:34 +01:00
Martin Kroeker
d2faa1be4e Merge pull request #2978 from martin-frbg/fixdynfeatures
Fix handling of cpu capability flags in DYNAMIC_ARCH builds
2020-11-08 10:19:17 +01:00
Martin Kroeker
1c4cfdc139 Stay compatible with old gmake that did not support undefine 2020-11-08 00:12:55 +01:00
Martin Kroeker
f6a57d8f63 Update Makefile.system 2020-11-08 00:01:36 +01:00
Martin Kroeker
f4b7ba12b7 Update Makefile.system 2020-11-07 23:37:21 +01:00
Rajalakshmi Srinivasaraghavan
6e364981a8 Optimize sdot/ddot for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
2020-11-07 15:21:58 -06:00
Martin Kroeker
b976a0bf40 Remove previous workaround for compiler flags related to cpu capabilities in x86_64 DYNAMIC_ARCH builds 2020-11-07 20:39:56 +01:00
Martin Kroeker
a04f532edf Reset cpu property flags between build cycles in DYNAMIC_ARCH mode 2020-11-07 20:37:03 +01:00
Martin Kroeker
ccb9731c7b Fix propagation of cpu properties to compiler options 2020-11-07 20:30:15 +01:00
Martin Kroeker
a29338aaa6 Remove extraneous quotes that caused a cmake policy warning 2020-11-07 20:27:42 +01:00
Martin Kroeker
438a8e5624 Fix placement of getarch call and spurious cpu property accumulation in DYNAMIC_ARCH builds 2020-11-07 20:26:12 +01:00
Martin Kroeker
e5967810b7 Merge pull request #110 from xianyi/develop
rebase
2020-11-07 20:22:41 +01:00
Martin Kroeker
ff74319ea5 Merge pull request #2977 from martin-frbg/issue2976
Fix macro name used in ifdef for POWERPC/PGI
2020-11-07 14:41:34 +01:00
Martin Kroeker
28d2dfe2b3 Fix macro name used in ifdef 2020-11-07 12:17:49 +01:00
Gengxin Xie
725ffbf041 fix typo 2020-11-05 16:25:17 +08:00
Gengxin Xie
d9ba49165a Improve the performance of rot by using AVX512 and AVX2 intrinsic 2020-11-05 15:12:36 +08:00
Martin Kroeker
60ab9c783f Merge pull request #2966 from martin-frbg/issue2964
Ensure that EXPRECISION is disabled for DYNAMIC_ARCH with TARGET=GENERIC and fix CMAKE DYNAMIC_ARCH builds
2020-11-04 16:02:46 +01:00
Martin Kroeker
8cc73fee98 Export NO_EXPRECISION after overriding for DYNAMIC_ARCH with GENERIC target 2020-11-03 23:47:04 +01:00
Martin Kroeker
0155cd53a3 Add -msse3 where needed for DYNAMIC_ARCH builds 2020-11-03 23:45:49 +01:00
Martin Kroeker
a9f9354296 Fix target test 2020-11-02 23:17:46 +01:00
Martin Kroeker
b9bc76aec4 Add files via upload 2020-11-02 22:43:50 +01:00
Martin Kroeker
f071245939 Merge pull request #2967 from RajalakshmiSR/dgemm88
POWER10: Change dgemm unroll factors
2020-11-02 18:54:36 +01:00
Martin Kroeker
e5f8c2bf8a typo fix 2020-11-01 22:25:43 +01:00
Martin Kroeker
6baf8af658 Disable EXPRECISION for the combination of DYNAMIC_CORE and GENERIC target 2020-11-01 22:11:48 +01:00
Martin Kroeker
40a93c232b Disable EXPRECISION for DYNAMIC_ARCH in combination with TARGET=GENERIC
NO_EXPRECISION is disabled for the GENERIC_TARGET already, so prevent mixing with code parts that use a different float size by default
2020-11-01 21:58:26 +01:00
Martin Kroeker
fab952bee4 Merge pull request #2962 from brada4/develop
add openbsd 68+ gfortran name
2020-11-01 14:24:40 +01:00
Martin Kroeker
1cf04a6f0e Merge pull request #2963 from martin-frbg/issue2959
Reunify default BUFFER_SIZE on ARM64 to avoid crashes in DYNAMIC_ARCH mode
2020-11-01 09:14:54 +01:00
Rajalakshmi Srinivasaraghavan
dd7a9cc5bf POWER10: Change dgemm unroll factors
Changing the unroll factors for dgemm to 8 shows improved performance with the
POWER10 MMA feature. Also made some minor changes in sgemm for edge cases.
2020-10-31 18:28:57 -05:00
Martin Kroeker
7f26be4802 Reunify BUFFERSIZE across arm64 platforms to avoid segfaults in DYNAMIC_ARCH 2020-11-01 00:00:43 +01:00
User User-User
9fab65e90a add openbsd gfortran 2020-11-01 00:38:08 +02:00
Martin Kroeker
9efc3f0815 Merge pull request #109 from xianyi/develop
rebase
2020-10-31 22:33:52 +01:00
Martin Kroeker
aa21cb5217 Merge pull request #2960 from thrasibule/avx2_detection
fix avx2 detection
2020-10-31 20:24:21 +01:00
Guillaume Horel
1f564d729b fix avx2 detection
reword commits to make it clearer
2020-10-31 10:00:48 -04:00