5536 Commits

Author SHA1 Message Date
Gordon Fossum 213c0e7abb Added special unrolled vectorized versions of "Solve" for specific sizes,
in DTRSM and STRSM, to improve performance in Power9 and Power10.
2020-12-04 17:07:06 -06:00
Martin Kroeker f21618684b
Merge pull request #3018 from martin-frbg/issue3015
Avoid concurrent inclusion of libgomp and libomp in clang+gfortran builds
2020-12-04 22:08:17 +01:00
Martin Kroeker 441c08c9ff
Merge pull request #3016 from xiegengxin/complex-asum
Improve the performance of zasum and casum with AVX512 intrinsic
2020-12-04 22:07:16 +01:00
Martin Kroeker 66302b3c06
Merge pull request #3013 from martin-frbg/gcc46
Fix 32bit x86 builds and add workaround for x86_64 miscompilations by gcc 4.6 (including our Travis setup)
2020-12-04 08:54:11 +01:00
Martin Kroeker 07e9a12349
Merge pull request #3011 from cyyever/fix_link
link math lib on FreeBSD
2020-12-04 08:50:59 +01:00
Martin Kroeker dd1adbdec4
Merge pull request #3019 from RajalakshmiSR/dgemm_param
POWER10: Update param.h
2020-12-04 08:49:28 +01:00
Martin Kroeker a1eecccda2
Update f_check 2020-12-03 23:43:17 +01:00
Rajalakshmi Srinivasaraghavan 41fe6e864e POWER10: Update param.h
Increasing the values of DGEMM_DEFAULT_P and DGEMM_DEFAULT_Q helps
in improving performance ~10% for DGEMM.
2020-12-03 14:40:11 -06:00
Martin Kroeker 74b5850581
Add libomp to the LAPACK(-test) dependencies in clang/gfortran builds 2020-12-03 21:28:10 +01:00
Martin Kroeker da0c94c76f
Avoid linking both GNU libgomp and LLVM libomp in clang/gfortran builds 2020-12-03 21:25:57 +01:00
Martin Kroeker a6692dc129
use gfortran-10 with xcode 12 2020-12-03 14:32:21 +01:00
Martin Kroeker 72a553f5bc
Update .travis.yml 2020-12-03 09:17:27 +01:00
Martin Kroeker dcbb3b5ef1
fix misplaced lines 2020-12-02 23:13:13 +01:00
Martin Kroeker 57456c248b
fix gfortran requirement in osx interface64 test 2020-12-02 15:56:21 +01:00
Martin Kroeker c361313564
Disable deprecated 32bit xcode 2020-12-02 07:49:43 +01:00
Gengxin Xie 0cb7a403b2 fix error declare function blas_level1_thread_with_return_value 2020-12-02 09:51:52 +08:00
Martin Kroeker 77a538d4ba
Update an overlooked instance of xcode 10.0 as well 2020-12-01 22:05:35 +01:00
Martin Kroeker 9621062eba
Update OSX xcode version to 11.5 2020-12-01 12:23:30 +01:00
Gengxin Xie b766c1e9bb Improve the performance of zasum and casum with AVX512 intrinsic 2020-12-01 16:49:26 +08:00
Martin Kroeker 22574b474e
Suppress -mfma as well for gcc 4.6 2020-11-30 21:41:51 +01:00
Martin Kroeker f662022994
Move the version check to avoid overwriting unprocessed compiler data 2020-11-30 17:24:27 +01:00
Martin Kroeker 5e81e81478
Merge pull request #3014 from RajalakshmiSR/dgemvnp10
POWER10: Optimize dgemv_n
2020-11-30 08:18:24 +01:00
Rajalakshmi Srinivasaraghavan 7d46e31de1 POWER10: Optimize dgemv_n
Handling as 4x8 with vector pairs gives better performance than
existing code in POWER10.
2020-11-29 15:28:28 -06:00
Martin Kroeker 62a2eb884f
Add SSE flags for x86 2020-11-29 15:33:07 +01:00
Martin Kroeker 2e99e2699b
Add workaround for gcc 4.6 miscompiling assembly kernels with -mavx 2020-11-29 15:32:17 +01:00
Martin Kroeker 006b13299f
Merge pull request #3012 from martin-frbg/restore-getarch
Restore RISCV entries accidentally trashed by my PR 3005
2020-11-29 13:27:47 +01:00
Martin Kroeker ca17d3dc3d
Restore RISCV entries accidentally trashed by my PR 3005 2020-11-29 13:19:51 +01:00
Martin Kroeker 52ed2741c5
Merge pull request #3010 from ggouaillardet/topic/fj_compilers
add Fujitsu compilers
2020-11-29 11:36:43 +01:00
cyy 3b4c016110 link math lib on FreeBSD 2020-11-29 17:17:35 +08:00
Gilles Gouaillardet 358100ec15 add Fujitsu compilers
Co-authored-by: Tomoki Karatsu <karatsu.spack@gmail.com>
2020-11-29 14:35:42 +09:00
Martin Kroeker 3788b6d156
Merge pull request #3005 from martin-frbg/ssefix
Add -msse for x86 and silence build warning in getarch
2020-11-23 08:35:32 +01:00
Martin Kroeker bc5b1ddf0d
Merge pull request #3004 from martin-frbg/bsd_getauxval
ARM64 DYNAMIC_ARCH build fix for BSD/OSX
2020-11-23 08:35:12 +01:00
Martin Kroeker 2f42d23104
Merge pull request #3002 from martin-frbg/issue3000
Ensure that all targets in a DYNAMIC_ARCH build on POWER use the same buffer size
2020-11-22 22:51:26 +01:00
Martin Kroeker b72dd007dc
Merge pull request #3001 from martin-frbg/issue2996
Fix ambiguous ifdefs in tests for user-defined options in Makefiles
2020-11-22 22:50:41 +01:00
Martin Kroeker 11ebe5fa25
Avoid redefinition warning 2020-11-22 21:16:07 +01:00
Martin Kroeker 01f01dae98
Add -msse if supported 2020-11-22 21:15:08 +01:00
Martin Kroeker e7bf8ced6c
Build fix for systems that do not support getauxval 2020-11-22 20:20:28 +01:00
Martin Kroeker 0256294921
Fix syntax mixup 2020-11-22 17:41:44 +01:00
Martin Kroeker 2b114c3f30
Restore proper Makefile 2020-11-22 17:16:22 +01:00
Martin Kroeker 60e1fddca7
Ensure that the same (large) BUFFERSIZE is used for all cpus in DYNAMIC_ARCH builds 2020-11-22 16:48:22 +01:00
Martin Kroeker ebb8788696
Use ifneq instead of ifdef for CROSS option 2020-11-22 16:33:34 +01:00
Martin Kroeker 857afcc41d
Use ifeq instead of ifdef for user-definable build options 2020-11-22 16:31:44 +01:00
Martin Kroeker 5fa305172a
Use ifeq instead of ifdef for user-definable options 2020-11-22 16:29:56 +01:00
Martin Kroeker d3ff1f889f
Convert ifndefs to ifneq 2020-11-22 16:27:17 +01:00
Martin Kroeker 65eb7afaf4
Change ifndef CROSS to ifneq 2020-11-22 16:25:36 +01:00
Martin Kroeker 8a6b17f97d
Change ifndefs to ifneq 2020-11-22 16:19:31 +01:00
Martin Kroeker 0f863f96e4
Merge pull request #112 from xianyi/develop
rebase
2020-11-22 16:17:19 +01:00
Martin Kroeker 437702e0e1
Merge pull request #2965 from epsilon-0/develop
allow setting soname without suffix or prefix
2020-11-22 12:25:33 +01:00
Martin Kroeker f1bf040b25
Merge pull request #2988 from xiegengxin/smp-asum
Improve the performance of dasum and sasum when SMP is defined
2020-11-22 12:24:13 +01:00
Martin Kroeker 613e3b2baf
Merge pull request #2997 from Flamefire/reproduce_crash
Add reproducer test for crash after fork
2020-11-22 12:22:57 +01:00