5876 Commits

Author SHA1 Message Date
Martin Kroeker 7bc0e4a2e0
Update version to 0.3.13 for release 2020-12-12 18:15:33 +01:00
Martin Kroeker d3ec787f77
Update version to 0.3.13 for release 2020-12-12 18:14:49 +01:00
Martin Kroeker 2c309c235d
Merge pull request #3031 from martin-frbg/changelog13
Update Changelog.txt
2020-12-12 18:13:23 +01:00
Martin Kroeker 3dec81200c
Update Changelog.txt
Co-authored-by: h-vetinari <h.vetinari@gmx.com>
2020-12-12 14:27:37 +01:00
Martin Kroeker 737724607f
Merge pull request #3030 from martin-frbg/fix2994
Make fallback from POWER10 to POWER9 depend on new enough compiler
2020-12-12 10:01:45 +01:00
Martin Kroeker 77edf82c7f
Update Changelog.txt for 0.3.13 2020-12-12 01:25:20 +01:00
Martin Kroeker 6232237dba
Make fallback from P10 to P9 conditional on suitable compiler 2020-12-11 23:41:17 +01:00
Martin Kroeker 7d81acc762
Merge pull request #3 from xianyi/develop
rebase
2020-12-11 23:38:42 +01:00
Martin Kroeker 18d8a67485
Merge pull request #2994 from antonblanchard/power10-fixes
Power10 fixes
2020-12-11 23:37:30 +01:00
Martin Kroeker 043128cbe5
Merge pull request #3029 from RajalakshmiSR/axpyp10
POWER10: Improve axpy performance
2020-12-10 22:49:28 +01:00
Martin Kroeker 3331ca492d
Merge pull request #3021 from austinpagan/trsm_p10
POWER: Added special unrolled vectorized versions of "Solve" for specific si…
2020-12-10 19:42:54 +01:00
Rajalakshmi Srinivasaraghavan 346e30a46a POWER10: Improve axpy performance
This patch aligns the stores to 32 byte boundary for saxpy and daxpy
before entering into vector pair loop. For caxpy, changed the store
instructions to stxv to improve performance of unaligned cases.
2020-12-10 11:51:42 -06:00
Martin Kroeker 83de62c20d
Merge pull request #3026 from martin-frbg/revert747
Revert PR747 - SYRK parameter changes for Haswell and related targets
2020-12-10 16:29:41 +01:00
Martin Kroeker 658da9a769
Merge pull request #3027 from gxw-loongson/develop
Add msa support for loongson
2020-12-10 16:27:30 +01:00
gxw be24c66a7c Keep LOONGSON3A and LOONGSON3B for loongson 2020-12-10 10:53:13 +08:00
gxw 4b548857d6 Add msa support for loongson
1. Using core loongson3r3 and loongson3r4 for loongson
2. Add DYNAMIC_ARCH for loongson

Change-Id: I1c6b54dbeca3a0cc31d1222af36a7e9bd6ab54c1
2020-12-09 10:28:46 +08:00
Martin Kroeker d71fe4ed4e
Remove GEMM_DEFAULT_UNROLL_MN parameters for Haswell and ZEN (introduced in PR747) 2020-12-08 21:07:57 +01:00
Martin Kroeker a554712439
remove extra/intermediate size step for min_jj introduced in PR747 2020-12-08 21:01:36 +01:00
Martin Kroeker 5d26223f4a
remove extra/intermediate size step of min_jj from PR747 2020-12-08 20:59:56 +01:00
Martin Kroeker 980ab349bc
Merge pull request #2 from xianyi/develop
rebase
2020-12-08 20:53:35 +01:00
gxw d67babf345 Remove gcc unrecognized option '-msched-weight' when check msa 2020-12-08 19:16:39 +08:00
Martin Kroeker 7f11e33e8d
Merge pull request #3025 from TiredNotTear/develop
MIPS: Fix two bugs
2020-12-08 09:39:27 +01:00
Xianyi Zhang 7834c10e2f Add PingTouGe contribution credit. 2020-12-07 16:55:05 +08:00
Martin Kroeker 53e0837809
Merge pull request #3022 from jinboson/develop
Fix test errors reported by cblas_cgemm & cblas_ctrmm
2020-12-07 08:09:11 +01:00
Hao Chen ad38bd0e89 Fix failed cgemv and zgemv test case after using msa optimization
The cgemv and zgemv test case will call cgemv_n/t_msa.c zgemv_n/t_msa.c files in MIPS environment.
When the macro CONJ is defined, the calculation result will be wrong due to the wrong definition of OP2.
This patch updates the value of OP2 and passes the corresponding test.
2020-12-07 10:25:01 +08:00
Hao Chen 47b639cc9b Fix failed sswap and dswap case by using msa optimization
The swap test case will call sswap_msa.c and dswap_msa.c files in MIPS environment.
When inc_x or inc_y is equal to zero, the calculation result of the two functions will be wrong.
This patch adds the processing of inc_x or inc_y equal to zero, and the swap test case has passed.
2020-12-07 10:24:49 +08:00
Martin Kroeker 8fef5876d1
Merge pull request #3024 from martin-frbg/sparc
Fix 32 and 64bit builds on SPARC with SolarisStudio compilers
2020-12-06 22:34:36 +01:00
Martin Kroeker 6c7d557a16
Fix compiler options for 32 and 64bit SPARC builds with SolarisStudio 2020-12-06 19:20:50 +01:00
Martin Kroeker b660008c7e
Work around DOT and SWAP test failures 2020-12-06 19:15:37 +01:00
Martin Kroeker f8346603cf
Fix compilation with SolarisStudio 2020-12-06 19:14:16 +01:00
Martin Kroeker 93473174d6
Fix utest build with SolarisStudio compilers 2020-12-06 19:12:56 +01:00
Martin Kroeker b0b14f4e9b
Change comments to C style for compatibility 2020-12-06 19:12:02 +01:00
Martin Kroeker 3a1b1b7c8c
Fix complex ABI for 32bit SolarisStudio builds 2020-12-06 19:08:43 +01:00
Martin Kroeker da6d5d675c
Fix hostarch detection for sparc 2020-12-06 19:07:45 +01:00
Martin Kroeker 04fa17322c
Fix build options for SolarisStudio compilers 2020-12-06 19:05:27 +01:00
Martin Kroeker 3853014ea1
Merge pull request #1 from xianyi/develop
rebase
2020-12-06 18:52:51 +01:00
Jin Bo 65de6f5957 Fix test errors reported by cblas_cgemm & cblas_ctrmm
The file cgemm_kernel_8x4_msa.c holds the MSA optimization
codes of cblas_cgemm and cblas_ctrmm. It defines two
macros: CGEMM_SCALE_1X2 and CGEMM_TRMM_SCALE_1X2. The pc1
array index in the two macros should be 0 and 1.
2020-12-05 15:08:17 +08:00
Gordon Fossum 213c0e7abb Added special unrolled vectorized versions of "Solve" for specific sizes,
in DTRSM and STRSM, to improve performance in Power9 and Power10.
2020-12-04 17:07:06 -06:00
Martin Kroeker f21618684b
Merge pull request #3018 from martin-frbg/issue3015
Avoid concurrent inclusion of libgomp and libomp in clang+gfortran builds
2020-12-04 22:08:17 +01:00
Martin Kroeker 441c08c9ff
Merge pull request #3016 from xiegengxin/complex-asum
Improve the performance of zasum and casum with AVX512 intrinsic
2020-12-04 22:07:16 +01:00
Martin Kroeker 66302b3c06
Merge pull request #3013 from martin-frbg/gcc46
Fix 32bit x86 builds and add workaround for x86_64 miscompilations by gcc 4.6 (including our Travis setup)
2020-12-04 08:54:11 +01:00
Martin Kroeker 07e9a12349
Merge pull request #3011 from cyyever/fix_link
link math lib on FreeBSD
2020-12-04 08:50:59 +01:00
Martin Kroeker dd1adbdec4
Merge pull request #3019 from RajalakshmiSR/dgemm_param
POWER10: Update param.h
2020-12-04 08:49:28 +01:00
Martin Kroeker a1eecccda2
Update f_check 2020-12-03 23:43:17 +01:00
Rajalakshmi Srinivasaraghavan 41fe6e864e POWER10: Update param.h
Increasing the values of DGEMM_DEFAULT_P and DGEMM_DEFAULT_Q helps
in improving performance ~10% for DGEMM.
2020-12-03 14:40:11 -06:00
Martin Kroeker 74b5850581
Add libomp to the LAPACK(-test) dependencies in clang/gfortran builds 2020-12-03 21:28:10 +01:00
Martin Kroeker da0c94c76f
Avoid linking both GNU libgomp and LLVM libomp in clang/gfortran builds 2020-12-03 21:25:57 +01:00
Martin Kroeker a6692dc129
use gfortran-10 with xcode 12 2020-12-03 14:32:21 +01:00
Martin Kroeker 72a553f5bc
Update .travis.yml 2020-12-03 09:17:27 +01:00
Martin Kroeker dcbb3b5ef1
fix misplaced lines 2020-12-02 23:13:13 +01:00