7452 Commits

Author SHA1 Message Date
Martin Kroeker 0256294921
Fix syntax mixup 2020-11-22 17:41:44 +01:00
Martin Kroeker 2b114c3f30
Restore proper Makefile 2020-11-22 17:16:22 +01:00
Martin Kroeker 60e1fddca7
Ensure that the same (large) BUFFERSIZE is used for all cpus in DYNAMIC_ARCH builds 2020-11-22 16:48:22 +01:00
Martin Kroeker ebb8788696
Use ifneq instead of ifdef for CROSS option 2020-11-22 16:33:34 +01:00
Martin Kroeker 857afcc41d
Use ifeq instead of ifdef for user-definable build options 2020-11-22 16:31:44 +01:00
Martin Kroeker 5fa305172a
Use ifeq instead of ifdef for user-definable options 2020-11-22 16:29:56 +01:00
Martin Kroeker d3ff1f889f
Convert ifndefs to ifneq 2020-11-22 16:27:17 +01:00
Martin Kroeker 65eb7afaf4
Change ifndef CROSS to ifneq 2020-11-22 16:25:36 +01:00
Martin Kroeker 8a6b17f97d
Change ifndefs to ifneq 2020-11-22 16:19:31 +01:00
Martin Kroeker 0f863f96e4
Merge pull request #112 from xianyi/develop
rebase
2020-11-22 16:17:19 +01:00
Martin Kroeker 437702e0e1
Merge pull request #2965 from epsilon-0/develop
allow setting soname without suffix or prefix
2020-11-22 12:25:33 +01:00
Martin Kroeker f1bf040b25
Merge pull request #2988 from xiegengxin/smp-asum
Improve the performance of dasum and sasum when SMP is defined
2020-11-22 12:24:13 +01:00
Martin Kroeker 613e3b2baf
Merge pull request #2997 from Flamefire/reproduce_crash
Add reproducer test for crash after fork
2020-11-22 12:22:57 +01:00
Xianyi Zhang 05a0ea2340 Merge branch 'risc-v' into develop 2020-11-22 16:05:32 +08:00
Xianyi Zhang 7037849498 Merge branch 'develop' into risc-v 2020-11-22 16:04:50 +08:00
Xianyi Zhang c6c9c24d1b Update doc for C910. 2020-11-22 16:02:19 +08:00
Martin Kroeker 6dd71af0c3
Merge pull request #2995 from Flamefire/fix_thread_buffer_init
Don't overwrite blas_thread_buffer if already set
2020-11-20 09:42:10 +01:00
Alexander Grund a05dc6e62b
Add reproducer test for crash after fork
See #2993 for an analysis
2020-11-19 15:46:37 +01:00
Alexander Grund 60005eb47b
Don't overwrite blas_thread_buffer if already set
After a fork it is possible that blas_thread_buffer already holds
allocated memory buffers: goto_set_num_threads allocates them, and it
may be called by num_cpu_avail when the OpenBLAS NUM_THREADS differs
from the OMP num threads.
Overwriting those buffers leaks memory, which can cause subsequent
execution of BLAS kernels to fail.

Fixes #2993
2020-11-19 14:51:51 +01:00
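
The guard described in commit 60005eb47b can be illustrated with a minimal sketch. This is not the OpenBLAS source: `thread_buffer`, `init_thread_buffers`, `MAX_THREADS`, and `BUFFER_BYTES` are hypothetical names and values chosen for illustration, and plain `malloc` stands in for whatever allocator the library actually uses. The only point shown is that a slot which already holds a buffer is left alone rather than overwritten.

```c
#include <stddef.h>
#include <stdlib.h>

#define MAX_THREADS 4      /* hypothetical limit, not OpenBLAS's real value */
#define BUFFER_BYTES 1024  /* hypothetical buffer size */

/* Hypothetical stand-in for the per-thread buffer table named in the
   commit message. Static storage, so every slot starts out NULL. */
static void *thread_buffer[MAX_THREADS];

/* Give each of the first n slots a buffer, skipping slots that already
   hold one. Returns the number of fresh allocations, or -1 if an
   allocation fails. */
static int init_thread_buffers(int n)
{
    int fresh = 0;
    for (int i = 0; i < n && i < MAX_THREADS; i++) {
        if (thread_buffer[i] != NULL)
            continue;  /* already set: overwriting the pointer would leak it */
        thread_buffer[i] = malloc(BUFFER_BYTES);
        if (thread_buffer[i] == NULL)
            return -1;
        fresh++;
    }
    return fresh;
}
```

Calling `init_thread_buffers(2)` and then `init_thread_buffers(4)` allocates two buffers each time under these assumptions; the second call leaves the first two pointers untouched, which is the behavior the commit message asks for.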
Anton Blanchard 043f3d6faa POWER10: Use POWER9 as a fallback
If the toolchain is too old, or the mma feature isn't set on a POWER10,
fall back to the POWER9 loops.
2020-11-19 21:04:10 +11:00
Anton Blanchard fdf71d66b3 POWER10: Fix ld version detection
LDVERSIONGTEQ35 needs to escape the '>' character.

LDVERSIONGTEQ35 is checking the system ld version which may be different
to the toolchain being used to compile OpenBLAS. We don't have a path
to the linker in our Makefiles, so (ab)use gcc -Wl,--version to get the
version of ld in our toolchain.
2020-11-19 20:50:42 +11:00
Martin Kroeker 7e9cb39a25
Merge pull request #2981 from Qiyu8/fix-sum
Fix sum optimize issues
2020-11-16 08:40:46 +01:00
Martin Kroeker be075d53cf
Merge pull request #2983 from Qiyu8/optimize-srot
Optimize the performance of rot by using universal intrinsics
2020-11-16 08:38:37 +01:00
Qiyu8 b00a0de132 remove the -mfma flag when the host has AVX. 2020-11-16 09:14:56 +08:00
Martin Kroeker d341a0fea0
Merge pull request #2989 from martin-frbg/cmake-fma
Fix missing -mfma compiler flag in cmake builds without DYNAMIC_ARCH
2020-11-13 12:35:09 +01:00
Martin Kroeker ec4d77c47c
Add -mfma for HAVE_FMA3 in the non-DYNAMIC_ARCH case as well 2020-11-13 09:16:34 +01:00
Martin Kroeker 02699226d0
Merge pull request #111 from xianyi/develop
rebase
2020-11-13 09:14:23 +01:00
Gengxin Xie d6e7e05bb3 Improve the performance of dasum and sasum when SMP is defined 2020-11-13 14:20:52 +08:00
Qiyu8 ae0b1dea19 modify system.cmake to enable fma flag 2020-11-13 10:20:24 +08:00
Qiyu8 e0dac6b53b fix the CI failure of target specific option mismatch 2020-11-12 20:31:03 +08:00
Qiyu8 e5c2ceb675 fix the CI failure of lack the head 2020-11-12 17:35:17 +08:00
Qiyu8 a87e537b8c modify macro 2020-11-11 15:53:48 +08:00
Qiyu8 5bc0a7583f only FMA3 and vector larger than 128 have positive effects. 2020-11-11 15:18:01 +08:00
Qiyu8 8c0b206d4c Optimize the performance of rot by using universal intrinsics 2020-11-11 14:33:12 +08:00
Qiyu8 c4c591ac5a fix sum optimize issues 2020-11-10 16:16:38 +08:00
Xianyi Zhang 1ea6cfefdb Refs #2899. Merge branch 'damonyu1989-openblas-open-910' into risc-v 2020-11-10 09:38:43 +08:00
Xianyi Zhang fc35b72ae1 Refs #2899
Merge branch 'openblas-open-910' of git://github.com/damonyu1989/OpenBLAS into damonyu1989-openblas-open-910
2020-11-10 09:38:04 +08:00
Xianyi Zhang 913cc9a4ca Merge branch 'develop' into risc-v 2020-11-10 09:18:25 +08:00
Martin Kroeker ff16329cb7
Merge pull request #2972 from xiegengxin/rot-intrinsic
Improve the performance of rot by using AVX512 and AVX2 intrinsic
2020-11-08 22:43:00 +01:00
Martin Kroeker 433637ccd8
Merge pull request #2980 from martin-frbg/fixgetarch
Fix missing AVX2 and FMA3 capabilities in FORCE_target mode
2020-11-08 17:39:05 +01:00
Martin Kroeker ec088bf33a
Fix missing AVX2 and FMA3 capabilities in FORCE_target mode 2020-11-08 13:15:40 +01:00
Martin Kroeker 110c7a6de0
Merge pull request #2979 from RajalakshmiSR/dot_power10
Optimize sdot/ddot for POWER10
2020-11-08 10:19:34 +01:00
Martin Kroeker d2faa1be4e
Merge pull request #2978 from martin-frbg/fixdynfeatures
Fix handling of cpu capability flags in DYNAMIC_ARCH builds
2020-11-08 10:19:17 +01:00
Martin Kroeker 1c4cfdc139
Stay compatible with old gmake that did not support undefine 2020-11-08 00:12:55 +01:00
Martin Kroeker f6a57d8f63
Update Makefile.system 2020-11-08 00:01:36 +01:00
Martin Kroeker f4b7ba12b7
Update Makefile.system 2020-11-07 23:37:21 +01:00
Rajalakshmi Srinivasaraghavan 6e364981a8 Optimize sdot/ddot for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
2020-11-07 15:21:58 -06:00
Martin Kroeker b976a0bf40
Remove previous workaround for compiler flags related to cpu capabilities in x86_64 DYNAMIC_ARCH builds 2020-11-07 20:39:56 +01:00
Martin Kroeker a04f532edf
Reset cpu property flags between build cycles in DYNAMIC_ARCH mode 2020-11-07 20:37:03 +01:00
Martin Kroeker ccb9731c7b
Fix propagation of cpu properties to compiler options 2020-11-07 20:30:15 +01:00