Commit Graph

7041 Commits

Author SHA1 Message Date
Chris Sidebottom
ec334e69dc Use SVE kernel for SGEMM/DGEMM on Arm(R) Neoverse(TM) V1
This re-spins #3869 with some additional copy unrolling which helps maintain SYRK performance.

After #3868, the SVE kernels represent a pretty good boost.

This re-uses ARMV8SVE as a base and I'm going to incrementally move everything to use ARMV8SVE in additional patches (as well as fix up anything that's not already in ARMV8SVE).
2023-04-17 17:38:42 +01:00
Martin Kroeker
a5e1fdd525 Merge pull request #4007 from Mousius/update-contributors
Add Chris Sidebottom to CONTRIBUTORS.md
2023-04-17 15:45:39 +02:00
Chris Sidebottom
bfc20c2e97 Add Chris Sidebottom to CONTRIBUTORS.md 2023-04-17 11:53:31 +01:00
Martin Kroeker
a44422f0d5 Merge pull request #3983 from thrasibule/makeflags
parallel build fixes
2023-04-16 13:49:05 +02:00
Martin Kroeker
73e6fcb925 Merge pull request #4006 from martin-frbg/issue4005
Fix ?GEMMT implementation
2023-04-16 13:30:17 +02:00
Martin Kroeker
38d7a7b562 Fix ?GEMMT 2023-04-16 00:07:58 +02:00
Martin Kroeker
4eac244c9a Merge pull request #4004 from martin-frbg/ccheckif
fix missing blank in c_check
2023-04-14 22:57:18 +02:00
Martin Kroeker
970e611e00 fix missing blank in test 2023-04-14 19:42:34 +02:00
Martin Kroeker
6f759a9ce9 Merge pull request #4002 from imzhuhl/spr_detect
Fix x86 detection error
2023-04-13 13:18:39 +02:00
Honglin Zhu
ac650225c1 Fix x86 detection error 2023-04-13 00:08:27 +08:00
Martin Kroeker
58de28f332 Merge pull request #3999 from martin-frbg/issue3998
Convert CMAKE booleans to 0/1 values for gensymbol
2023-04-12 10:38:27 +02:00
Martin Kroeker
2ea00788c2 Add ?GEMMT 2023-04-11 22:46:51 +02:00
Martin Kroeker
6c45c98083 Add (only) the GEMMT functions 2023-04-11 22:41:18 +02:00
Martin Kroeker
cd8eb33a9c Expose BUILD_LAPACK_DEPRECATED 2023-04-11 22:39:53 +02:00
Martin Kroeker
57bdc36c84 add conditionals for BUILD_LAPACK_DEPRECATED 2023-04-11 22:38:38 +02:00
Martin Kroeker
e0f8b4fef4 Merge pull request #4000 from martin-frbg/applem2
Support Apple A15/M2 cpus through the existing VORTEX target
2023-04-11 08:28:44 +02:00
Martin Kroeker
caa2945138 Support Apple A15/M2 cpus through the existing VORTEX target 2023-04-11 00:04:09 +02:00
Martin Kroeker
d5fbec7c20 Export ?MIN/?MAX, ?AMIN/?AMAX, CDOT/ZDOT and ?GEMMT 2023-04-10 23:49:35 +02:00
Martin Kroeker
fd20a2e8c6 Convert CMAKE booleans to 0/1 values for gensymbol 2023-04-10 22:28:00 +02:00
Martin Kroeker
326b200b08 Merge pull request #3996 from martin-frbg/issue3989
Protect CROSS_SUFFIX against spurious linebreaks from isolated dashes
2023-04-07 23:31:51 +02:00
Martin Kroeker
3effdc1505 Protect CROSS_PATH against spurious addition of linebreaks from isolated dashes
fix for #3989
2023-04-07 19:32:22 +02:00
Martin Kroeker
654d87d73a Merge pull request #3994 from rgommers/fix-ssyconvf-export
Export `ssyconvf` symbol
2023-04-07 18:15:14 +02:00
Martin Kroeker
d677214570 Remove the badge for the dead drone.io service and add Cirrus CI in its place 2023-04-07 14:11:16 +02:00
Ralf Gommers
a4ee1c84f0 Export ssyconvf symbol
This was apparently missed in commit a836fe8ec when adding the
LAPACK 3.7.0 symbols. We noticed when adding wrappers for 3.7.0
routines in SciPy. For more details, see
https://github.com/rgommers/scipy/issues/143
2023-04-07 12:50:36 +01:00
Martin Kroeker
ca8544be6d Merge pull request #3991 from martin-frbg/lapack808
Refactor ?GEBAL for readability (Reference-LAPACK PR 808)
2023-04-04 15:27:17 +02:00
Martin Kroeker
d175b8f56f Refactor ?GEBAL (Reference-LAPACK PR 808) 2023-04-03 15:02:10 +02:00
Martin Kroeker
5f1fb27c40 Rename cirrus.yml to .cirrus.yml 2023-04-03 11:00:17 +02:00
Zhang Xianyi
ab0755590f Merge pull request #3990 from martin-frbg/cirrus
Add Apple M1 testing via Cirrus CI
2023-04-03 16:54:40 +08:00
Martin Kroeker
65b7bf9f3e Add Apple M1 testing via Cirrus CI 2023-04-03 10:51:38 +02:00
Martin Kroeker
516f22b8ca Update version to 0.3.23.dev 2023-04-01 22:25:55 +02:00
Martin Kroeker
3e8f51e7cf Update version to 0.3.23.dev 2023-04-01 22:25:07 +02:00
Martin Kroeker
f9a701b6dd Merge pull request #3988 from xianyi/release-0.3.0
Merge back from release branch into develop to copy tag
2023-04-01 22:24:26 +02:00
Martin Kroeker
394a9fbafe Increment version to 0.3.23 (tag: v0.3.23) 2023-04-01 22:18:01 +02:00
Martin Kroeker
8f32384633 Increment version to 0.3.23 2023-04-01 22:17:27 +02:00
Martin Kroeker
af3606d9fb Merge pull request #3987 from xianyi/develop
Merge from develop branch for 0.3.23
2023-04-01 22:16:24 +02:00
Martin Kroeker
cd2e80ca2e Merge branch 'release-0.3.0' into develop 2023-04-01 22:15:52 +02:00
Martin Kroeker
e2614eb6ce Merge pull request #3986 from martin-frbg/changelog0323
Update with 0.3.23 changes
2023-04-01 22:08:43 +02:00
Martin Kroeker
1f70481384 Update with 0.3.23 changes 2023-04-01 20:33:31 +02:00
Martin Kroeker
eb0793bfd0 Merge pull request #3984 from martin-frbg/develop
Fix logic bug in single-threaded C/Z SPR
2023-04-01 11:35:52 +02:00
Martin Kroeker
36fcb52094 Fix logic - we want real OR imaginary part of X to be nonzero here 2023-04-01 00:02:54 +02:00
Guillaume Horel
397108fba2 serialize shared prerequisites 2023-03-31 09:25:51 -04:00
Guillaume Horel
281e834566 do not pass -j flag to the MAKE variable 2023-03-31 09:25:51 -04:00
Martin Kroeker
d708951375 Merge pull request #3980 from martin-frbg/fix3941-2
Split and improve test criteria in LU computation (?GETF2)
2023-03-30 06:56:05 +02:00
Martin Kroeker
6c431239da Split test condition in LU computation - non-denormal for computation, exact zero for reporting singularity 2023-03-29 22:14:21 +02:00
Martin Kroeker
23f2c4ca5b Merge pull request #3978 from martin-frbg/fix3941
fix division-by-zero guard in zgetf2
2023-03-29 16:22:27 +02:00
Martin Kroeker
12aabb9f9b fix conditional 2023-03-29 09:44:33 +02:00
Martin Kroeker
fd0614cbc0 Merge pull request #3975 from martin-frbg/issue3974
Fix build failures with NO_LAPACK
2023-03-28 22:57:27 +02:00
Martin Kroeker
912d713b52 redo lost edit 2023-03-28 18:31:04 +02:00
Martin Kroeker
dc15c18efc Fix build failures seen with the NO_LAPACK option - cspr/csymv/csyr belong on the LAPACK list 2023-03-28 16:33:09 +02:00
Martin Kroeker
5d9d382e36 Merge pull request #3970 from linouxis9/develop
Improve Intel Raptor Lake detection
2023-03-28 16:22:27 +02:00