Hao Chen
ad38bd0e89
Fix failed cgemv and zgemv test case after using msa optimization
...
On MIPS, the cgemv and zgemv test cases call the kernels in cgemv_n/t_msa.c and zgemv_n/t_msa.c.
When the macro CONJ is defined, the calculation result is wrong because OP2 is defined incorrectly.
This patch corrects the definition of OP2 so that the corresponding tests pass.
2020-12-07 10:25:01 +08:00
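The commit above concerns the sign convention used when CONJ is defined. As a rough illustration only, the sketch below shows how conjugating the matrix element changes the signs in a complex multiply-accumulate. The names cfloat and cmadd and the runtime flag conj_a are hypothetical; the flag stands in for the compile-time CONJ macro, and none of this is the OP2 definition from the MSA kernels.

```c
/* Illustrative sketch, not the cgemv/zgemv MSA kernel code.
 * cfloat, cmadd and conj_a are hypothetical names; conj_a stands in
 * for the compile-time CONJ macro mentioned in the commit message. */
typedef struct {
    float re;
    float im;
} cfloat;

/* acc += a * x        when conj_a == 0
 * acc += conj(a) * x  when conj_a != 0 */
static void cmadd(cfloat *acc, cfloat a, cfloat x, int conj_a)
{
    /* Conjugating a negates a.im, which flips the sign of both
     * products that involve a.im. */
    float sign = conj_a ? -1.0f : 1.0f;

    acc->re += a.re * x.re - sign * (a.im * x.im);
    acc->im += a.re * x.im + sign * (a.im * x.re);
}
```

With a = 1+2i and x = 3+4i, the plain product is -5+10i and the conjugated product is 11-2i. A wrong sign on either a.im term gives a result that looks plausible but fails the comparison against a reference implementation.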
Hao Chen
47b639cc9b
Fix failed sswap and dswap case by using msa optimization
...
On MIPS, the swap test case calls the kernels in sswap_msa.c and dswap_msa.c.
When inc_x or inc_y is equal to zero, both functions produce wrong results.
This patch adds handling for inc_x or inc_y equal to zero, and the swap test case now passes.
2020-12-07 10:24:49 +08:00
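To make the zero-increment case concrete, here is a minimal scalar sketch of a strided swap. The name swap_ref is hypothetical and this is not the code from sswap_msa.c; it only shows the element-by-element behaviour that a plain loop produces. A vectorized kernel can fall back to such a loop when an increment is zero.

```c
#include <stddef.h>

/* Illustrative scalar swap; swap_ref is a hypothetical name and this is
 * not the MSA kernel. Elements are exchanged one at a time, so a zero
 * increment simply revisits the same element on every iteration. */
static void swap_ref(ptrdiff_t n, float *x, ptrdiff_t inc_x,
                     float *y, ptrdiff_t inc_y)
{
    if (n <= 0) {
        return;
    }

    /* For negative increments, start at the far end of the vector. */
    ptrdiff_t ix = (inc_x < 0) ? (1 - n) * inc_x : 0;
    ptrdiff_t iy = (inc_y < 0) ? (1 - n) * inc_y : 0;

    for (ptrdiff_t i = 0; i < n; i++) {
        float tmp = x[ix];
        x[ix] = y[iy];
        y[iy] = tmp;
        ix += inc_x;
        iy += inc_y;
    }
}
```

For example, with n = 3, x = {1}, inc_x = 0, y = {2, 3, 4} and inc_y = 1, the loop leaves x = {4} and y = {1, 2, 3}. Code that loads several elements at once under the assumption of distinct addresses would not reproduce this sequence.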
Martin Kroeker
8fef5876d1
Merge pull request #3024 from martin-frbg/sparc
...
Fix 32 and 64bit builds on SPARC with SolarisStudio compilers
2020-12-06 22:34:36 +01:00
Martin Kroeker
6c7d557a16
Fix compiler options for 32 and 64bit SPARC builds with SolarisStudio
2020-12-06 19:20:50 +01:00
Martin Kroeker
b660008c7e
Work around DOT and SWAP test failures
2020-12-06 19:15:37 +01:00
Martin Kroeker
f8346603cf
Fix compilation with SolarisStudio
2020-12-06 19:14:16 +01:00
Martin Kroeker
93473174d6
Fix utest build with SolarisStudio compilers
2020-12-06 19:12:56 +01:00
Martin Kroeker
b0b14f4e9b
Change comments to C style for compatibility
2020-12-06 19:12:02 +01:00
Martin Kroeker
3a1b1b7c8c
Fix complex ABI for 32bit SolarisStudio builds
2020-12-06 19:08:43 +01:00
Martin Kroeker
da6d5d675c
Fix hostarch detection for sparc
2020-12-06 19:07:45 +01:00
Martin Kroeker
04fa17322c
Fix build options for SolarisStudio compilers
2020-12-06 19:05:27 +01:00
Martin Kroeker
3853014ea1
Merge pull request #1 from xianyi/develop
...
rebase
2020-12-06 18:52:51 +01:00
Jin Bo
65de6f5957
Fix test errors reported by cblas_cgemm & cblas_ctrmm
...
The file cgemm_kernel_8x4_msa.c contains the MSA-optimized
code for cblas_cgemm and cblas_ctrmm. It defines two
macros: CGEMM_SCALE_1X2 and CGEMM_TRMM_SCALE_1X2. The pc1
array indices used in these two macros should be 0 and 1.
2020-12-05 15:08:17 +08:00
Gordon Fossum
213c0e7abb
Added special unrolled vectorized versions of "Solve" for specific sizes,
...
in DTRSM and STRSM, to improve performance in Power9 and Power10.
2020-12-04 17:07:06 -06:00
Martin Kroeker
f21618684b
Merge pull request #3018 from martin-frbg/issue3015
...
Avoid concurrent inclusion of libgomp and libomp in clang+gfortran builds
2020-12-04 22:08:17 +01:00
Martin Kroeker
441c08c9ff
Merge pull request #3016 from xiegengxin/complex-asum
...
Improve the performance of zasum and casum with AVX512 intrinsic
2020-12-04 22:07:16 +01:00
Martin Kroeker
66302b3c06
Merge pull request #3013 from martin-frbg/gcc46
...
Fix 32bit x86 builds and add workaround for x86_64 miscompilations by gcc 4.6 (including our Travis setup)
2020-12-04 08:54:11 +01:00
Martin Kroeker
07e9a12349
Merge pull request #3011 from cyyever/fix_link
...
link math lib on FreeBSD
2020-12-04 08:50:59 +01:00
Martin Kroeker
dd1adbdec4
Merge pull request #3019 from RajalakshmiSR/dgemm_param
...
POWER10: Update param.h
2020-12-04 08:49:28 +01:00
Martin Kroeker
a1eecccda2
Update f_check
2020-12-03 23:43:17 +01:00
Rajalakshmi Srinivasaraghavan
41fe6e864e
POWER10: Update param.h
...
Increasing the values of DGEMM_DEFAULT_P and DGEMM_DEFAULT_Q
improves DGEMM performance by ~10%.
2020-12-03 14:40:11 -06:00
Martin Kroeker
74b5850581
Add libomp to the LAPACK(-test) dependencies in clang/gfortran builds
2020-12-03 21:28:10 +01:00
Martin Kroeker
da0c94c76f
Avoid linking both GNU libgomp and LLVM libomp in clang/gfortran builds
2020-12-03 21:25:57 +01:00
Martin Kroeker
a6692dc129
use gfortran-10 with xcode 12
2020-12-03 14:32:21 +01:00
Martin Kroeker
72a553f5bc
Update .travis.yml
2020-12-03 09:17:27 +01:00
Martin Kroeker
dcbb3b5ef1
fix misplaced lines
2020-12-02 23:13:13 +01:00
Martin Kroeker
57456c248b
fix gfortran requirement in osx interface64 test
2020-12-02 15:56:21 +01:00
Martin Kroeker
c361313564
Disable deprecated 32bit xcode
2020-12-02 07:49:43 +01:00
Gengxin Xie
0cb7a403b2
Fix erroneous declaration of function blas_level1_thread_with_return_value
2020-12-02 09:51:52 +08:00
Martin Kroeker
77a538d4ba
Update an overlooked instance of xcode 10.0 as well
2020-12-01 22:05:35 +01:00
Martin Kroeker
9621062eba
Update OSX xcode version to 11.5
2020-12-01 12:23:30 +01:00
Gengxin Xie
b766c1e9bb
Improve the performance of zasum and casum with AVX512 intrinsic
2020-12-01 16:49:26 +08:00
Martin Kroeker
22574b474e
Suppress -mfma as well for gcc 4.6
2020-11-30 21:41:51 +01:00
Martin Kroeker
f662022994
Move the version check to avoid overwriting unprocessed compiler data
2020-11-30 17:24:27 +01:00
Martin Kroeker
5e81e81478
Merge pull request #3014 from RajalakshmiSR/dgemvnp10
...
POWER10: Optimize dgemv_n
2020-11-30 08:18:24 +01:00
Rajalakshmi Srinivasaraghavan
7d46e31de1
POWER10: Optimize dgemv_n
...
Handling as 4x8 with vector pairs gives better performance than
the existing code on POWER10.
2020-11-29 15:28:28 -06:00
Martin Kroeker
62a2eb884f
Add SSE flags for x86
2020-11-29 15:33:07 +01:00
Martin Kroeker
2e99e2699b
Add workaround for gcc 4.6 miscompiling assembly kernels with -mavx
2020-11-29 15:32:17 +01:00
Martin Kroeker
006b13299f
Merge pull request #3012 from martin-frbg/restore-getarch
...
Restore RISCV entries accidentally trashed by my PR 3005
2020-11-29 13:27:47 +01:00
Martin Kroeker
ca17d3dc3d
Restore RISCV entries accidentally trashed by my PR 3005
2020-11-29 13:19:51 +01:00
Martin Kroeker
52ed2741c5
Merge pull request #3010 from ggouaillardet/topic/fj_compilers
...
add Fujitsu compilers
2020-11-29 11:36:43 +01:00
cyy
3b4c016110
link math lib on FreeBSD
2020-11-29 17:17:35 +08:00
Gilles Gouaillardet
358100ec15
add Fujitsu compilers
...
Co-authored-by: Tomoki Karatsu <karatsu.spack@gmail.com>
2020-11-29 14:35:42 +09:00
Martin Kroeker
3788b6d156
Merge pull request #3005 from martin-frbg/ssefix
...
Add -msse for x86 and silence build warning in getarch
2020-11-23 08:35:32 +01:00
Martin Kroeker
bc5b1ddf0d
Merge pull request #3004 from martin-frbg/bsd_getauxval
...
ARM64 DYNAMIC_ARCH build fix for BSD/OSX
2020-11-23 08:35:12 +01:00
Martin Kroeker
2f42d23104
Merge pull request #3002 from martin-frbg/issue3000
...
Ensure that all targets in a DYNAMIC_ARCH build on POWER use the same buffer size
2020-11-22 22:51:26 +01:00
Martin Kroeker
b72dd007dc
Merge pull request #3001 from martin-frbg/issue2996
...
Fix ambiguous ifdefs in tests for user-defined options in Makefiles
2020-11-22 22:50:41 +01:00
Martin Kroeker
11ebe5fa25
Avoid redefinition warning
2020-11-22 21:16:07 +01:00
Martin Kroeker
01f01dae98
Add -msse if supported
2020-11-22 21:15:08 +01:00
Martin Kroeker
e7bf8ced6c
Build fix for systems that do not support getauxval
2020-11-22 20:20:28 +01:00