Martin Kroeker
6dd71af0c3
Merge pull request #2995 from Flamefire/fix_thread_buffer_init
...
Don't overwrite blas_thread_buffer if already set
2020-11-20 09:42:10 +01:00
Alexander Grund
a05dc6e62b
Add reproducer test for crash after fork
...
See #2993 for an analysis
2020-11-19 15:46:37 +01:00
Alexander Grund
60005eb47b
Don't overwrite blas_thread_buffer if already set
...
After a fork, blas_thread_buffer may already hold allocated memory
buffers: goto_set_num_threads allocates them, and it may be called by
num_cpu_avail when the OpenBLAS NUM_THREADS differs from the OMP number
of threads.
Overwriting those buffers leaks memory, which can cause subsequent
execution of BLAS kernels to fail.
Fixes #2993
2020-11-19 14:51:51 +01:00
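The guard described in the commit above can be sketched in C roughly as follows. This is a minimal illustration, not the OpenBLAS source: `thread_buffer`, `MAX_THREADS`, `BUFFER_SIZE` and `ensure_thread_buffers` are hypothetical names standing in for `blas_thread_buffer` and its allocation path.

```c
#include <stddef.h>
#include <stdlib.h>

#define MAX_THREADS 4     /* hypothetical upper bound on threads */
#define BUFFER_SIZE 1024  /* hypothetical per-thread buffer size */

/* Hypothetical stand-in for blas_thread_buffer: one slot per thread. */
static void *thread_buffer[MAX_THREADS];

/*
 * Make sure slots [0, nthreads) hold a buffer.
 * A slot that is already set is left alone, so an earlier allocation
 * is neither overwritten nor leaked.
 * Returns the number of newly allocated slots, or -1 if malloc fails.
 */
static int ensure_thread_buffers(int nthreads) {
    int fresh = 0;
    if (nthreads > MAX_THREADS) nthreads = MAX_THREADS;
    for (int i = 0; i < nthreads; i++) {
        if (thread_buffer[i] != NULL) continue;  /* already set: keep it */
        thread_buffer[i] = malloc(BUFFER_SIZE);
        if (thread_buffer[i] == NULL) return -1;
        fresh++;
    }
    return fresh;
}
```

Calling `ensure_thread_buffers(2)` and then `ensure_thread_buffers(4)` allocates only the two missing slots on the second call; the pointers stored by the first call stay unchanged.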
Anton Blanchard
043f3d6faa
POWER10: Use POWER9 as a fallback
...
If the toolchain is too old, or the mma feature isn't set on a POWER10,
fall back to the POWER9 loops.
2020-11-19 21:04:10 +11:00
Anton Blanchard
fdf71d66b3
POWER10: Fix ld version detection
...
LDVERSIONGTEQ35 needs to escape the '>' character.
LDVERSIONGTEQ35 checks the system ld version, which may differ from
that of the toolchain being used to compile OpenBLAS. We don't have a
path to the linker in our Makefiles, so (ab)use gcc -Wl,--version to get
the version of ld in our toolchain.
2020-11-19 20:50:42 +11:00
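The actual check in the commit above lives in the Makefiles and runs in the shell. Purely to illustrate the comparison it performs, here is a C sketch that takes one line of `ld --version` style output and tests whether the reported version is at least a given major.minor. The function name `ld_version_at_least` and the assumption that the version is the last space-separated token on the line are hypothetical, not taken from OpenBLAS.

```c
#include <stdio.h>
#include <string.h>

/*
 * Return 1 if the version on `line` is >= want_major.want_minor,
 * 0 if it is lower, and -1 if no "major.minor" pair can be parsed.
 * Assumption: the version is the last space-separated token,
 * e.g. "GNU ld (GNU Binutils) 2.35.1".
 */
static int ld_version_at_least(const char *line, int want_major, int want_minor) {
    const char *tok = strrchr(line, ' ');
    int major = 0, minor = 0;

    tok = tok ? tok + 1 : line;  /* no space: treat the whole line as the token */
    if (sscanf(tok, "%d.%d", &major, &minor) != 2) return -1;
    if (major != want_major) return major > want_major;
    return minor >= want_minor;
}
```

For example, `ld_version_at_least("GNU ld (GNU Binutils) 2.35.1", 2, 35)` returns 1, while a line ending in `2.34` returns 0.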
Martin Kroeker
7e9cb39a25
Merge pull request #2981 from Qiyu8/fix-sum
...
Fix sum optimization issues
2020-11-16 08:40:46 +01:00
Martin Kroeker
be075d53cf
Merge pull request #2983 from Qiyu8/optimize-srot
...
Optimize the performance of rot by using universal intrinsics
2020-11-16 08:38:37 +01:00
Qiyu8
b00a0de132
remove the -mfma flag when the host has AVX.
2020-11-16 09:14:56 +08:00
Martin Kroeker
d341a0fea0
Merge pull request #2989 from martin-frbg/cmake-fma
...
Fix missing -mfma compiler flag in cmake builds without DYNAMIC_ARCH
2020-11-13 12:35:09 +01:00
Martin Kroeker
ec4d77c47c
Add -mfma for HAVE_FMA3 in the non-DYNAMIC_ARCH case as well
2020-11-13 09:16:34 +01:00
Martin Kroeker
02699226d0
Merge pull request #111 from xianyi/develop
...
rebase
2020-11-13 09:14:23 +01:00
Gengxin Xie
d6e7e05bb3
Improve the performance of dasum and sasum when SMP is defined
2020-11-13 14:20:52 +08:00
Qiyu8
ae0b1dea19
modify system.cmake to enable fma flag
2020-11-13 10:20:24 +08:00
Qiyu8
e0dac6b53b
fix the CI failure caused by a target-specific option mismatch
2020-11-12 20:31:03 +08:00
Qiyu8
e5c2ceb675
fix the CI failure caused by a missing header
2020-11-12 17:35:17 +08:00
Qiyu8
a87e537b8c
modify macro
2020-11-11 15:53:48 +08:00
Qiyu8
5bc0a7583f
only FMA3 and vectors larger than 128 bits have positive effects.
2020-11-11 15:18:01 +08:00
Qiyu8
8c0b206d4c
Optimize the performance of rot by using universal intrinsics
2020-11-11 14:33:12 +08:00
Qiyu8
c4c591ac5a
fix sum optimization issues
2020-11-10 16:16:38 +08:00
Xianyi Zhang
1ea6cfefdb
Refs #2899 . Merge branch 'damonyu1989-openblas-open-910' into risc-v
2020-11-10 09:38:43 +08:00
Xianyi Zhang
fc35b72ae1
Refs #2899
...
Merge branch 'openblas-open-910' of git://github.com/damonyu1989/OpenBLAS into damonyu1989-openblas-open-910
2020-11-10 09:38:04 +08:00
Xianyi Zhang
913cc9a4ca
Merge branch 'develop' into risc-v
2020-11-10 09:18:25 +08:00
Martin Kroeker
ff16329cb7
Merge pull request #2972 from xiegengxin/rot-intrinsic
...
Improve the performance of rot by using AVX512 and AVX2 intrinsics
2020-11-08 22:43:00 +01:00
Martin Kroeker
433637ccd8
Merge pull request #2980 from martin-frbg/fixgetarch
...
Fix missing AVX2 and FMA3 capabilities in FORCE_target mode
2020-11-08 17:39:05 +01:00
Martin Kroeker
ec088bf33a
Fix missing AVX2 and FMA3 capabilities in FORCE_target mode
2020-11-08 13:15:40 +01:00
Martin Kroeker
110c7a6de0
Merge pull request #2979 from RajalakshmiSR/dot_power10
...
Optimize sdot/ddot for POWER10
2020-11-08 10:19:34 +01:00
Martin Kroeker
d2faa1be4e
Merge pull request #2978 from martin-frbg/fixdynfeatures
...
Fix handling of cpu capability flags in DYNAMIC_ARCH builds
2020-11-08 10:19:17 +01:00
Martin Kroeker
1c4cfdc139
Stay compatible with old gmake that did not support undefine
2020-11-08 00:12:55 +01:00
Martin Kroeker
f6a57d8f63
Update Makefile.system
2020-11-08 00:01:36 +01:00
Martin Kroeker
f4b7ba12b7
Update Makefile.system
2020-11-07 23:37:21 +01:00
Rajalakshmi Srinivasaraghavan
6e364981a8
Optimize sdot/ddot for POWER10
...
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
2020-11-07 15:21:58 -06:00
Martin Kroeker
b976a0bf40
Remove previous workaround for compiler flags related to cpu capabilities in x86_64 DYNAMIC_ARCH builds
2020-11-07 20:39:56 +01:00
Martin Kroeker
a04f532edf
Reset cpu property flags between build cycles in DYNAMIC_ARCH mode
2020-11-07 20:37:03 +01:00
Martin Kroeker
ccb9731c7b
Fix propagation of cpu properties to compiler options
2020-11-07 20:30:15 +01:00
Martin Kroeker
a29338aaa6
Remove extraneous quotes that caused a cmake policy warning
2020-11-07 20:27:42 +01:00
Martin Kroeker
438a8e5624
Fix placement of getarch call and spurious cpu property accumulation in DYNAMIC_ARCH builds
2020-11-07 20:26:12 +01:00
Martin Kroeker
e5967810b7
Merge pull request #110 from xianyi/develop
...
rebase
2020-11-07 20:22:41 +01:00
Martin Kroeker
ff74319ea5
Merge pull request #2977 from martin-frbg/issue2976
...
Fix macro name used in ifdef for POWERPC/PGI
2020-11-07 14:41:34 +01:00
Martin Kroeker
28d2dfe2b3
Fix macro name used in ifdef
2020-11-07 12:17:49 +01:00
Gengxin Xie
725ffbf041
fix typo
2020-11-05 16:25:17 +08:00
Gengxin Xie
d9ba49165a
Improve the performance of rot by using AVX512 and AVX2 intrinsics
2020-11-05 15:12:36 +08:00
Martin Kroeker
60ab9c783f
Merge pull request #2966 from martin-frbg/issue2964
...
Ensure that EXPRECISION is disabled for DYNAMIC_ARCH with TARGET=GENERIC and fix CMAKE DYNAMIC_ARCH builds
2020-11-04 16:02:46 +01:00
Martin Kroeker
8cc73fee98
Export NO_EXPRECISION after overriding for DYNAMIC_ARCH with GENERIC target
2020-11-03 23:47:04 +01:00
Martin Kroeker
0155cd53a3
Add -msse3 where needed for DYNAMIC_ARCH builds
2020-11-03 23:45:49 +01:00
Martin Kroeker
a9f9354296
Fix target test
2020-11-02 23:17:46 +01:00
Martin Kroeker
b9bc76aec4
Add files via upload
2020-11-02 22:43:50 +01:00
Martin Kroeker
f071245939
Merge pull request #2967 from RajalakshmiSR/dgemm88
...
POWER10: Change dgemm unroll factors
2020-11-02 18:54:36 +01:00
Aisha Tammy
60997ddd73
allow setting soname without suffix or prefix
...
Allows creating a library with a different
SONAME without the need to add suffixes to symbols.
Backwards compatible and should have no effect
on existing workflows or previous users.
Useful for installing an INTERFACE64 library alongside
the standard library without file conflicts.
2020-11-02 13:04:53 +00:00
Martin Kroeker
e5f8c2bf8a
typo fix
2020-11-01 22:25:43 +01:00
Martin Kroeker
6baf8af658
Disable EXPRECISION for the combination of DYNAMIC_CORE and GENERIC target
2020-11-01 22:11:48 +01:00