Commit Graph

7452 Commits

Author SHA1 Message Date
Martin Kroeker 479509bb37
Remove any stray trailing dash from CROSS_SUFFIX (as would result from clang -arch) 2023-04-17 21:57:25 +02:00
Chris Sidebottom ec334e69dc Use SVE kernel for SGEMM/DGEMM on Arm(R) Neoverse(TM) V1
This re-spins #3869 with some additional copy unrolling which helps maintain SYRK performance.

After #3868, the SVE kernels represent a pretty good boost.

This re-uses ARMV8SVE as a base and I'm going to incrementally move everything to use ARMV8SVE in additional patches (as well as fix up anything that's not already in ARMV8SVE).
2023-04-17 17:38:42 +01:00
Chris Sidebottom 5b165420b5 SWITCH_RATIO for Arm(R) Neoverse(TM) architecture
This seems like a good balance of values for reasonably sized matrices. With `SWITCH_RATIO=16` the DGEMM scales better to bigger sizes, but the better solution would be some kind of thread throttling, so I've gone with `SWITCH_RATIO=8`.
2023-04-17 15:42:55 +01:00
Chris Sidebottom 32f2fafde7 Propagate SWITCH_RATIO to DYNAMIC_ARCH builds
Previously dynamic builds were either using the default SWITCH_RATIO
or one from the higher level architecture; this patch ensures the
dynamic builds can use this parameter as well.
2023-04-17 15:34:12 +01:00
Martin Kroeker a5e1fdd525
Merge pull request #4007 from Mousius/update-contributors
Add Chris Sidebottom to CONTRIBUTORS.md
2023-04-17 15:45:39 +02:00
Martin Kroeker 44164e3a3d
revert "move alpha out of register 18" (out of PR scope, no SVE on Apple hw) 2023-04-17 14:23:13 +02:00
Chris Sidebottom bfc20c2e97 Add Chris Sidebottom to CONTRIBUTORS.md 2023-04-17 11:53:31 +01:00
Martin Kroeker a44422f0d5
Merge pull request #3983 from thrasibule/makeflags
parallel build fixes
2023-04-16 13:49:05 +02:00
Martin Kroeker 73e6fcb925
Merge pull request #4006 from martin-frbg/issue4005
Fix ?GEMMT implementation
2023-04-16 13:30:17 +02:00
Martin Kroeker 38d7a7b562
Fix ?GEMMT 2023-04-16 00:07:58 +02:00
Martin Kroeker 8be68fa7f4
move declaration of sca to really keep the compiler from throwing it out (for now) 2023-04-15 12:02:39 +02:00
Martin Kroeker 4eac244c9a
Merge pull request #4004 from martin-frbg/ccheckif
fix missing blank in c_check
2023-04-14 22:57:18 +02:00
Martin Kroeker 970e611e00
fix missing blank in test 2023-04-14 19:42:34 +02:00
Martin Kroeker f096a339e4
Use long value fields for cpu ident on OSX 2023-04-13 18:16:09 +02:00
Martin Kroeker 3727672a74
Improve workaround and keep compilers from optimizing it out 2023-04-13 18:07:52 +02:00
Martin Kroeker 108a21e47a
Move ALPHA out of register 18 (reserved on OSX) 2023-04-13 18:05:14 +02:00
Martin Kroeker 0b1acb0ba3
Move ALPHA_I out of register 18 (reserved on OSX) 2023-04-13 18:03:35 +02:00
Martin Kroeker c7bbad09ad
Move ALPHA_I out of register 18 (reserved on OSX) 2023-04-13 18:00:47 +02:00
Martin Kroeker cda29633a3
move ALPHA_I out of register 18 (reserved on OSX) 2023-04-13 17:59:48 +02:00
Martin Kroeker 6f759a9ce9
Merge pull request #4002 from imzhuhl/spr_detect
Fix x86 detection error
2023-04-13 13:18:39 +02:00
Honglin Zhu ac650225c1 Fix x86 detection error 2023-04-13 00:08:27 +08:00
Martin Kroeker 58de28f332
Merge pull request #3999 from martin-frbg/issue3998
Convert CMAKE booleans to 0/1 values for gensymbol
2023-04-12 10:38:27 +02:00
Martin Kroeker 2ea00788c2
Add ?GEMMT 2023-04-11 22:46:51 +02:00
Martin Kroeker 6c45c98083
Add (only) the GEMMT functions 2023-04-11 22:41:18 +02:00
Martin Kroeker cd8eb33a9c
Expose BUILD_LAPACK_DEPRECATED 2023-04-11 22:39:53 +02:00
Martin Kroeker 57bdc36c84
add conditionals for BUILD_LAPACK_DEPRECATED 2023-04-11 22:38:38 +02:00
Martin Kroeker e0f8b4fef4
Merge pull request #4000 from martin-frbg/applem2
Support Apple A15/M2 cpus through the existing VORTEX target
2023-04-11 08:28:44 +02:00
Martin Kroeker caa2945138
Support Apple A15/M2 cpus through the existing VORTEX target 2023-04-11 00:04:09 +02:00
Martin Kroeker d5fbec7c20
Export ?MIN/?MAX, ?AMIN/?AMAX, CDOT/ZDOT and ?GEMMT 2023-04-10 23:49:35 +02:00
Martin Kroeker fd20a2e8c6
Convert CMAKE booleans to 0/1 values for gensymbol 2023-04-10 22:28:00 +02:00
Martin Kroeker 326b200b08
Merge pull request #3996 from martin-frbg/issue3989
Protect CROSS_SUFFIX against spurious linebreaks from isolated dashes
2023-04-07 23:31:51 +02:00
Martin Kroeker 3effdc1505
Protect CROSS_PATH against spurious addition of linebreaks from isolated dashes
fix for #3989
2023-04-07 19:32:22 +02:00
Martin Kroeker 654d87d73a
Merge pull request #3994 from rgommers/fix-ssyconvf-export
Export `ssyconvf` symbol
2023-04-07 18:15:14 +02:00
Martin Kroeker d677214570
Remove the badge for the dead drone.io service and add Cirrus CI in its place 2023-04-07 14:11:16 +02:00
Ralf Gommers a4ee1c84f0 Export `ssyconvf` symbol
This was apparently missed in commit a836fe8ec when adding the
LAPACK 3.7.0 symbols. We noticed when adding wrappers for 3.7.0
routines in SciPy. For more details, see
https://github.com/rgommers/scipy/issues/143
2023-04-07 12:50:36 +01:00
Martin Kroeker ca8544be6d
Merge pull request #3991 from martin-frbg/lapack808
Refactor ?GEBAL for readability (Reference-LAPACK PR 808)
2023-04-04 15:27:17 +02:00
Martin Kroeker d175b8f56f
Refactor ?GEBAL (Reference-LAPACK PR 808) 2023-04-03 15:02:10 +02:00
Martin Kroeker 5f1fb27c40
Rename cirrus.yml to .cirrus.yml 2023-04-03 11:00:17 +02:00
Zhang Xianyi ab0755590f
Merge pull request #3990 from martin-frbg/cirrus
Add Apple M1 testing via Cirrus CI
2023-04-03 16:54:40 +08:00
Martin Kroeker 65b7bf9f3e
Add Apple M1 testing via Cirrus CI 2023-04-03 10:51:38 +02:00
Martin Kroeker 516f22b8ca
Update version to 0.3.23.dev 2023-04-01 22:25:55 +02:00
Martin Kroeker 3e8f51e7cf
Update version to 0.3.23.dev 2023-04-01 22:25:07 +02:00
Martin Kroeker f9a701b6dd
Merge pull request #3988 from xianyi/release-0.3.0
Merge back from release branch into develop to copy tag
2023-04-01 22:24:26 +02:00
Martin Kroeker 394a9fbafe
Increment version to 0.3.23 2023-04-01 22:18:01 +02:00
Martin Kroeker 8f32384633
Increment version to 0.3.23 2023-04-01 22:17:27 +02:00
Martin Kroeker af3606d9fb
Merge pull request #3987 from xianyi/develop
Merge from develop branch for 0.3.23
2023-04-01 22:16:24 +02:00
Martin Kroeker cd2e80ca2e
Merge branch 'release-0.3.0' into develop 2023-04-01 22:15:52 +02:00
Martin Kroeker e2614eb6ce
Merge pull request #3986 from martin-frbg/changelog0323
Update with 0.3.23 changes
2023-04-01 22:08:43 +02:00
Martin Kroeker 1f70481384
Update with 0.3.23 changes 2023-04-01 20:33:31 +02:00
Martin Kroeker eb0793bfd0
Merge pull request #3984 from martin-frbg/develop
Fix logic bug in single-threaded C/Z SPR
2023-04-01 11:35:52 +02:00