Commit Graph

7061 Commits

Author SHA1 Message Date
Martin Kroeker
c2fe9cb91f Disable gcc's tree-vectorizer pass on all operating systems 2023-04-19 23:45:14 +02:00
Martin Kroeker
66b39b835c Disable gcc's tree-vectorizer pass on all operating systems 2023-04-19 23:44:45 +02:00
Martin Kroeker
bb6d6735bf Disable gcc's tree-vectorizer pass on all operating systems 2023-04-19 23:44:15 +02:00
Martin Kroeker
d18efaed20 Disable gcc's tree-vectorizer pass on all operating systems 2023-04-19 23:43:43 +02:00
Martin Kroeker
99f6d31ed5 Disable gcc's tree-vectorizer pass on all operating systems 2023-04-19 23:42:55 +02:00
Martin Kroeker
7de9335c56 Disable gcc's tree-vectorizer pass on all operating systems 2023-04-19 23:42:09 +02:00
Martin Kroeker
437c0bf2b4 Merge pull request #3843 from Mousius/switch-ratio
Propagate SWITCH_RATIO to DYNAMIC_ARCH builds
2023-04-19 11:51:54 +02:00
Martin Kroeker
c628030669 Merge pull request #3855 from Mousius/more-switch-ratio-tuning
SWITCH_RATIO for Arm(R) Neoverse(TM) architecture
2023-04-18 22:45:51 +02:00
Martin Kroeker
efcf71255a Merge pull request #4003 from martin-frbg/issue3995
Fix instabilities in CGEMM/CTRMM/DNRM2 on Apple M1/M2 under OSX
2023-04-18 14:55:23 +02:00
Martin Kroeker
51dd1339e7 Merge pull request #4010 from martin-frbg/issue3989-2
Remove any stray trailing dash from CROSS_SUFFIX
2023-04-18 14:55:02 +02:00
Martin Kroeker
479509bb37 Remove any stray trailing dash from CROSS_SUFFIX (as would result from clang -arch) 2023-04-17 21:57:25 +02:00
Chris Sidebottom
5b165420b5 SWITCH_RATIO for Arm(R) Neoverse(TM) architecture
This seems like a good balance of values for reasonably sized matrices. With `SWITCH_RATIO=16` the DGEMM scales better to bigger sizes but the better solution would be some kind of thread throttling so I've gone with `SWITCH_RATIO=8`.
2023-04-17 15:42:55 +01:00
Chris Sidebottom
32f2fafde7 Propagate SWITCH_RATIO to DYNAMIC_ARCH builds
Previously dynamic builds were either using the default SWITCH_RATIO
or one from the higher level architecture; this patch ensures the
dynamic builds can use this parameter as well.
2023-04-17 15:34:12 +01:00
Martin Kroeker
a5e1fdd525 Merge pull request #4007 from Mousius/update-contributors
Add Chris Sidebottom to CONTRIBUTORS.md
2023-04-17 15:45:39 +02:00
Martin Kroeker
44164e3a3d revert "move alpha out of register 18" (out of PR scope, no SVE on Apple hw) 2023-04-17 14:23:13 +02:00
Chris Sidebottom
bfc20c2e97 Add Chris Sidebottom to CONTRIBUTORS.md 2023-04-17 11:53:31 +01:00
Martin Kroeker
a44422f0d5 Merge pull request #3983 from thrasibule/makeflags
parallel build fixes
2023-04-16 13:49:05 +02:00
Martin Kroeker
73e6fcb925 Merge pull request #4006 from martin-frbg/issue4005
Fix ?GEMMT implementation
2023-04-16 13:30:17 +02:00
Martin Kroeker
38d7a7b562 Fix ?GEMMT 2023-04-16 00:07:58 +02:00
Martin Kroeker
8be68fa7f4 move declaration of sca to really keep the compiler from throwing it out (for now) 2023-04-15 12:02:39 +02:00
Martin Kroeker
4eac244c9a Merge pull request #4004 from martin-frbg/ccheckif
fix missing blank in c_check
2023-04-14 22:57:18 +02:00
Martin Kroeker
970e611e00 fix missing blank in test 2023-04-14 19:42:34 +02:00
Martin Kroeker
f096a339e4 Use long value fields for cpu ident on OSX 2023-04-13 18:16:09 +02:00
Martin Kroeker
3727672a74 Improve workaround and keep compilers from optimizing it out 2023-04-13 18:07:52 +02:00
Martin Kroeker
108a21e47a Move ALPHA out of register 18 (reserved on OSX) 2023-04-13 18:05:14 +02:00
Martin Kroeker
0b1acb0ba3 Move ALPHA_I out of register 18 (reserved on OSX) 2023-04-13 18:03:35 +02:00
Martin Kroeker
c7bbad09ad Move ALPHA_I out of register 18 (reserved on OSX) 2023-04-13 18:00:47 +02:00
Martin Kroeker
cda29633a3 move ALPHA_I out of register 18 (reserved on OSX) 2023-04-13 17:59:48 +02:00
Martin Kroeker
6f759a9ce9 Merge pull request #4002 from imzhuhl/spr_detect
Fix x86 detection error
2023-04-13 13:18:39 +02:00
Honglin Zhu
ac650225c1 Fix x86 detection error 2023-04-13 00:08:27 +08:00
Martin Kroeker
58de28f332 Merge pull request #3999 from martin-frbg/issue3998
Convert CMAKE booleans to 0/1 values for gensymbol
2023-04-12 10:38:27 +02:00
Martin Kroeker
2ea00788c2 Add ?GEMMT 2023-04-11 22:46:51 +02:00
Martin Kroeker
6c45c98083 Add (only) the GEMMT functions 2023-04-11 22:41:18 +02:00
Martin Kroeker
cd8eb33a9c Expose BUILD_LAPACK_DEPRECATED 2023-04-11 22:39:53 +02:00
Martin Kroeker
57bdc36c84 add conditionals for BUILD_LAPACK_DEPRECATED 2023-04-11 22:38:38 +02:00
Martin Kroeker
e0f8b4fef4 Merge pull request #4000 from martin-frbg/applem2
Support Apple A15/M2 cpus through the existing VORTEX target
2023-04-11 08:28:44 +02:00
Martin Kroeker
caa2945138 Support Apple A15/M2 cpus through the existing VORTEX target 2023-04-11 00:04:09 +02:00
Martin Kroeker
d5fbec7c20 Export ?MIN/?MAX, ?AMIN/?AMAX, CDOT/ZDOT and ?GEMMT 2023-04-10 23:49:35 +02:00
Martin Kroeker
fd20a2e8c6 Convert CMAKE booleans to 0/1 values for gensymbol 2023-04-10 22:28:00 +02:00
Martin Kroeker
326b200b08 Merge pull request #3996 from martin-frbg/issue3989
Protect CROSS_SUFFIX against spurious linebreaks from isolated dashes
2023-04-07 23:31:51 +02:00
Martin Kroeker
3effdc1505 Protect CROSS_PATH against spurious addition of linebreaks from isolated dashes
fix for #3989
2023-04-07 19:32:22 +02:00
Martin Kroeker
654d87d73a Merge pull request #3994 from rgommers/fix-ssyconvf-export
Export `ssyconvf` symbol
2023-04-07 18:15:14 +02:00
Martin Kroeker
d677214570 Remove the badge for the dead drone.io service and add Cirrus CI in its place 2023-04-07 14:11:16 +02:00
Ralf Gommers
a4ee1c84f0 Export ssyconvf symbol
This was apparently missed in commit a836fe8ec when adding the
LAPACK 3.7.0 symbols. We noticed when adding wrappers for 3.7.0
routines in SciPy. For more details, see
https://github.com/rgommers/scipy/issues/143
2023-04-07 12:50:36 +01:00
Martin Kroeker
ca8544be6d Merge pull request #3991 from martin-frbg/lapack808
Refactor ?GEBAL for readability (Reference-LAPACK PR 808)
2023-04-04 15:27:17 +02:00
Martin Kroeker
d175b8f56f Refactor ?GEBAL (Reference-LAPACK PR 808) 2023-04-03 15:02:10 +02:00
Martin Kroeker
5f1fb27c40 Rename cirrus.yml to .cirrus.yml 2023-04-03 11:00:17 +02:00
Zhang Xianyi
ab0755590f Merge pull request #3990 from martin-frbg/cirrus
Add Apple M1 testing via Cirrus CI
2023-04-03 16:54:40 +08:00
Martin Kroeker
65b7bf9f3e Add Apple M1 testing via Cirrus CI 2023-04-03 10:51:38 +02:00
Martin Kroeker
516f22b8ca Update version to 0.3.23.dev 2023-04-01 22:25:55 +02:00