Commit Graph

5388 Commits

Author SHA1 Message Date
Martin Kroeker
b37e5fa2f8 Merge pull request #5 from xianyi/develop
rebase
2020-12-19 20:11:06 +01:00
Martin Kroeker
326469ef4a Merge pull request #3042 from martin-frbg/develop
Move FMA3 option setting to the kernel makefile
2020-12-19 20:04:19 +01:00
Martin Kroeker
c73d8ee40d Conditionally add -mfma to compiler options where needed 2020-12-17 11:34:05 +01:00
Martin Kroeker
abef2ea770 Move -fma option setting to kernel/Makefile.L1 2020-12-17 11:32:27 +01:00
Martin Kroeker
b26e32c3af Merge pull request #3040 from martin-frbg/fixfcheck
Fix undefined CC variable in check for clang+gfortran combo
2020-12-16 00:05:04 +01:00
Martin Kroeker
7822eff936 Merge pull request #3038 from martin-frbg/issue3037
Fix spurious assumption of cross-compilation on some architectures
2020-12-16 00:04:45 +01:00
Martin Kroeker
b03dc011be Fix undefined CC variable in clang check 2020-12-14 19:21:52 +01:00
Martin Kroeker
00ce35336e Fix spurious removal of a trailing character from the hostarch string on x86_64 2020-12-13 21:28:01 +01:00
Martin Kroeker
723776ddf7 Merge pull request #4 from xianyi/develop
rebase
2020-12-13 21:22:41 +01:00
Martin Kroeker
5a77ec7f1c Merge pull request #3036 from RajalakshmiSR/p10copyalign
POWER10: Improve copy performance
2020-12-13 21:21:34 +01:00
Rajalakshmi Srinivasaraghavan
2fb11f873b POWER10: Improve copy performance
This patch aligns the stores to 32 byte boundary for scopy and dcopy
before entering into vector pair loop. For ccopy, changed the store
instructions to stxv to improve performance of unaligned cases.
2020-12-13 10:41:45 -06:00
Martin Kroeker
87315e8a8d Update version to 0.3.13.dev 2020-12-12 23:28:49 +01:00
Martin Kroeker
9031ebd7d5 Update version to 0.3.13.dev 2020-12-12 23:28:20 +01:00
Martin Kroeker
12b41d5598 Merge pull request #3034 from xianyi/release-0.3.0
Merge back the release branch into develop to copy tag
2020-12-12 23:27:40 +01:00
Martin Kroeker
d2b11c4777 Merge pull request #3033 from xianyi/develop
Update branch from develop to release 0.3.13
v0.3.13
2020-12-12 18:19:29 +01:00
Martin Kroeker
7bc0e4a2e0 Update version to 0.3.13 for release 2020-12-12 18:15:33 +01:00
Martin Kroeker
d3ec787f77 Update version to 0.3.13 for release 2020-12-12 18:14:49 +01:00
Martin Kroeker
2c309c235d Merge pull request #3031 from martin-frbg/changelog13
Update Changelog.txt
2020-12-12 18:13:23 +01:00
Martin Kroeker
3dec81200c Update Changelog.txt
Co-authored-by: h-vetinari <h.vetinari@gmx.com>
2020-12-12 14:27:37 +01:00
Martin Kroeker
737724607f Merge pull request #3030 from martin-frbg/fix2994
Make fallback from POWER10 to POWER9 depend on new enough compiler
2020-12-12 10:01:45 +01:00
Martin Kroeker
77edf82c7f Update Changelog.txt for 0.3.13 2020-12-12 01:25:20 +01:00
Martin Kroeker
6232237dba Make fallback from P10 to P9 conditional on suitable compiler 2020-12-11 23:41:17 +01:00
Martin Kroeker
7d81acc762 Merge pull request #3 from xianyi/develop
rebase
2020-12-11 23:38:42 +01:00
Martin Kroeker
18d8a67485 Merge pull request #2994 from antonblanchard/power10-fixes
Power10 fixes
2020-12-11 23:37:30 +01:00
Martin Kroeker
043128cbe5 Merge pull request #3029 from RajalakshmiSR/axpyp10
POWER10: Improve axpy performance
2020-12-10 22:49:28 +01:00
Martin Kroeker
3331ca492d Merge pull request #3021 from austinpagan/trsm_p10
POWER: Added special unrolled vectorized versions of "Solve" for specific si…
2020-12-10 19:42:54 +01:00
Rajalakshmi Srinivasaraghavan
346e30a46a POWER10: Improve axpy performance
This patch aligns the stores to 32 byte boundary for saxpy and daxpy
before entering into vector pair loop. For caxpy, changed the store
instructions to stxv to improve performance of unaligned cases.
2020-12-10 11:51:42 -06:00
Martin Kroeker
83de62c20d Merge pull request #3026 from martin-frbg/revert747
Revert PR747 - SYRK parameter changes for Haswell and related targets
2020-12-10 16:29:41 +01:00
Martin Kroeker
658da9a769 Merge pull request #3027 from gxw-loongson/develop
Add msa support for loongson
2020-12-10 16:27:30 +01:00
gxw
be24c66a7c Keep LOONGSON3A and LOONGSON3B for loongson 2020-12-10 10:53:13 +08:00
gxw
4b548857d6 Add msa support for loongson
1. Use cores loongson3r3 and loongson3r4 for loongson
2. Add DYNAMIC_ARCH for loongson

Change-Id: I1c6b54dbeca3a0cc31d1222af36a7e9bd6ab54c1
2020-12-09 10:28:46 +08:00
Martin Kroeker
d71fe4ed4e Remove GEMM_DEFAULT_UNROLL_MN parameters for Haswell and ZEN (introduced in PR747) 2020-12-08 21:07:57 +01:00
Martin Kroeker
a554712439 remove extra/intermediate size step for min_jj introduced in PR747 2020-12-08 21:01:36 +01:00
Martin Kroeker
5d26223f4a remove extra/intermediate size step of min_jj from PR747 2020-12-08 20:59:56 +01:00
Martin Kroeker
980ab349bc Merge pull request #2 from xianyi/develop
rebase
2020-12-08 20:53:35 +01:00
gxw
d67babf345 Remove gcc unrecognized option '-msched-weight' when checking msa 2020-12-08 19:16:39 +08:00
Martin Kroeker
7f11e33e8d Merge pull request #3025 from TiredNotTear/develop
MIPS: Fix two bugs
2020-12-08 09:39:27 +01:00
Xianyi Zhang
7834c10e2f Add PingTouGe contribution credit. 2020-12-07 16:55:05 +08:00
Martin Kroeker
53e0837809 Merge pull request #3022 from jinboson/develop
Fix test errors reported by cblas_cgemm & cblas_ctrmm
2020-12-07 08:09:11 +01:00
Hao Chen
ad38bd0e89 Fix failed cgemv and zgemv test case after using msa optimization
The cgemv and zgemv test cases call the cgemv_n/t_msa.c and zgemv_n/t_msa.c files in a MIPS environment.
When the macro CONJ is defined, the calculation result is wrong because OP2 is defined incorrectly.
This patch updates the value of OP2 and passes the corresponding test.
2020-12-07 10:25:01 +08:00
Hao Chen
47b639cc9b Fix failed sswap and dswap case by using msa optimization
The swap test case calls the sswap_msa.c and dswap_msa.c files in a MIPS environment.
When inc_x or inc_y is equal to zero, the calculation result of the two functions is wrong.
This patch adds handling for inc_x or inc_y equal to zero, and the swap test case now passes.
2020-12-07 10:24:49 +08:00
Martin Kroeker
8fef5876d1 Merge pull request #3024 from martin-frbg/sparc
Fix 32 and 64bit builds on SPARC with SolarisStudio compilers
2020-12-06 22:34:36 +01:00
Martin Kroeker
6c7d557a16 Fix compiler options for 32 and 64bit SPARC builds with SolarisStudio 2020-12-06 19:20:50 +01:00
Martin Kroeker
b660008c7e Work around DOT and SWAP test failures 2020-12-06 19:15:37 +01:00
Martin Kroeker
f8346603cf Fix compilation with SolarisStudio 2020-12-06 19:14:16 +01:00
Martin Kroeker
93473174d6 Fix utest build with SolarisStudio compilers 2020-12-06 19:12:56 +01:00
Martin Kroeker
b0b14f4e9b Change comments to C style for compatibility 2020-12-06 19:12:02 +01:00
Martin Kroeker
3a1b1b7c8c Fix complex ABI for 32bit SolarisStudio builds 2020-12-06 19:08:43 +01:00
Martin Kroeker
da6d5d675c Fix hostarch detection for sparc 2020-12-06 19:07:45 +01:00
Martin Kroeker
04fa17322c Fix build options for SolarisStudio compilers 2020-12-06 19:05:27 +01:00