Martin Kroeker
84bcf6639f
Disable gcc's tree-vectorizer pass on all operating systems
2023-04-20 23:24:52 +02:00
Martin Kroeker
30a0ccbd14
Merge pull request #4014 from martin-frbg/issue4013
...
Generally disable gcc's tree-vectorizer in x86_64 SGEMV,SSYMV,ZGEMV,C/ZDOT
2023-04-20 10:45:15 +02:00
Martin Kroeker
c9174ae8d7
Disable gcc's tree-vectorizer pass on all operating systems
2023-04-19 23:45:44 +02:00
Martin Kroeker
c2fe9cb91f
Disable gcc's tree-vectorizer pass on all operating systems
2023-04-19 23:45:14 +02:00
Martin Kroeker
66b39b835c
Disable gcc's tree-vectorizer pass on all operating systems
2023-04-19 23:44:45 +02:00
Martin Kroeker
bb6d6735bf
Disable gcc's tree-vectorizer pass on all operating systems
2023-04-19 23:44:15 +02:00
Martin Kroeker
d18efaed20
Disable gcc's tree-vectorizer pass on all operating systems
2023-04-19 23:43:43 +02:00
Martin Kroeker
99f6d31ed5
Disable gcc's tree-vectorizer pass on all operating systems
2023-04-19 23:42:55 +02:00
Martin Kroeker
7de9335c56
Disable gcc's tree-vectorizer pass on all operating systems
2023-04-19 23:42:09 +02:00
Martin Kroeker
437c0bf2b4
Merge pull request #3843 from Mousius/switch-ratio
...
Propagate SWITCH_RATIO to DYNAMIC_ARCH builds
2023-04-19 11:51:54 +02:00
Martin Kroeker
c628030669
Merge pull request #3855 from Mousius/more-switch-ratio-tuning
...
SWITCH_RATIO for Arm(R) Neoverse(TM) architecture
2023-04-18 22:45:51 +02:00
Martin Kroeker
efcf71255a
Merge pull request #4003 from martin-frbg/issue3995
...
Fix instabilities in CGEMM/CTRMM/DNRM2 on Apple M1/M2 under OSX
2023-04-18 14:55:23 +02:00
Martin Kroeker
51dd1339e7
Merge pull request #4010 from martin-frbg/issue3989-2
...
Remove any stray trailing dash from CROSS_SUFFIX
2023-04-18 14:55:02 +02:00
Martin Kroeker
479509bb37
Remove any stray trailing dash from CROSS_SUFFIX (as would result from clang -arch)
2023-04-17 21:57:25 +02:00
Chris Sidebottom
ec334e69dc
Use SVE kernel for SGEMM/DGEMM on Arm(R) Neoverse(TM) V1
...
This re-spins #3869 with some additional copy unrolling which helps maintain SYRK performance.
After #3868, the SVE kernels represent a pretty good boost.
This re-uses ARMV8SVE as a base and I'm going to incrementally move everything to use ARMV8SVE in additional patches (as well as fix up anything that's not already in ARMV8SVE).
2023-04-17 17:38:42 +01:00
Chris Sidebottom
5b165420b5
SWITCH_RATIO for Arm(R) Neoverse(TM) architecture
...
This seems like a good balance of values for reasonably sized matrices. With `SWITCH_RATIO=16` the DGEMM scales better to bigger sizes but the better solution would be some kind of thread throttling so I've gone with `SWITCH_RATIO=8`.
2023-04-17 15:42:55 +01:00
Chris Sidebottom
32f2fafde7
Propagate SWITCH_RATIO to DYNAMIC_ARCH builds
...
Previously dynamic builds were either using the default SWITCH_RATIO
or one from the higher level architecture; this patch ensures the
dynamic builds can use this parameter as well.
2023-04-17 15:34:12 +01:00
Martin Kroeker
a5e1fdd525
Merge pull request #4007 from Mousius/update-contributors
...
Add Chris Sidebottom to CONTRIBUTORS.md
2023-04-17 15:45:39 +02:00
Martin Kroeker
44164e3a3d
revert "move alpha out of register 18" (out of PR scope, no SVE on Apple hw)
2023-04-17 14:23:13 +02:00
Chris Sidebottom
bfc20c2e97
Add Chris Sidebottom to CONTRIBUTORS.md
2023-04-17 11:53:31 +01:00
Martin Kroeker
a44422f0d5
Merge pull request #3983 from thrasibule/makeflags
...
parallel build fixes
2023-04-16 13:49:05 +02:00
Martin Kroeker
73e6fcb925
Merge pull request #4006 from martin-frbg/issue4005
...
Fix ?GEMMT implementation
2023-04-16 13:30:17 +02:00
Martin Kroeker
38d7a7b562
Fix ?GEMMT
2023-04-16 00:07:58 +02:00
Martin Kroeker
8be68fa7f4
move declaration of sca to really keep the compiler from throwing it out (for now)
2023-04-15 12:02:39 +02:00
Martin Kroeker
4eac244c9a
Merge pull request #4004 from martin-frbg/ccheckif
...
fix missing blank in c_check
2023-04-14 22:57:18 +02:00
Martin Kroeker
970e611e00
fix missing blank in test
2023-04-14 19:42:34 +02:00
Martin Kroeker
f096a339e4
Use long value fields for cpu ident on OSX
2023-04-13 18:16:09 +02:00
Martin Kroeker
3727672a74
Improve workaround and keep compilers from optimizing it out
2023-04-13 18:07:52 +02:00
Martin Kroeker
108a21e47a
Move ALPHA out of register 18 (reserved on OSX)
2023-04-13 18:05:14 +02:00
Martin Kroeker
0b1acb0ba3
Move ALPHA_I out of register 18 (reserved on OSX)
2023-04-13 18:03:35 +02:00
Martin Kroeker
c7bbad09ad
Move ALPHA_I out of register 18 (reserved on OSX)
2023-04-13 18:00:47 +02:00
Martin Kroeker
cda29633a3
move ALPHA_I out of register 18 (reserved on OSX)
2023-04-13 17:59:48 +02:00
Martin Kroeker
6f759a9ce9
Merge pull request #4002 from imzhuhl/spr_detect
...
Fix x86 detection error
2023-04-13 13:18:39 +02:00
Honglin Zhu
ac650225c1
Fix x86 detection error
2023-04-13 00:08:27 +08:00
Martin Kroeker
58de28f332
Merge pull request #3999 from martin-frbg/issue3998
...
Convert CMAKE booleans to 0/1 values for gensymbol
2023-04-12 10:38:27 +02:00
Martin Kroeker
2ea00788c2
Add ?GEMMT
2023-04-11 22:46:51 +02:00
Martin Kroeker
6c45c98083
Add (only) the GEMMT functions
2023-04-11 22:41:18 +02:00
Martin Kroeker
cd8eb33a9c
Expose BUILD_LAPACK_DEPRECATED
2023-04-11 22:39:53 +02:00
Martin Kroeker
57bdc36c84
add conditionals for BUILD_LAPACK_DEPRECATED
2023-04-11 22:38:38 +02:00
Martin Kroeker
e0f8b4fef4
Merge pull request #4000 from martin-frbg/applem2
...
Support Apple A15/M2 cpus through the existing VORTEX target
2023-04-11 08:28:44 +02:00
Martin Kroeker
caa2945138
Support Apple A15/M2 cpus through the existing VORTEX target
2023-04-11 00:04:09 +02:00
Martin Kroeker
d5fbec7c20
Export ?MIN/?MAX, ?AMIN/?AMAX, CDOT/ZDOT and ?GEMMT
2023-04-10 23:49:35 +02:00
Martin Kroeker
fd20a2e8c6
Convert CMAKE booleans to 0/1 values for gensymbol
2023-04-10 22:28:00 +02:00
Martin Kroeker
326b200b08
Merge pull request #3996 from martin-frbg/issue3989
...
Protect CROSS_SUFFIX against spurious linebreaks from isolated dashes
2023-04-07 23:31:51 +02:00
Martin Kroeker
3effdc1505
Protect CROSS_PATH against spurious addition of linebreaks from isolated dashes
...
fix for #3989
2023-04-07 19:32:22 +02:00
Martin Kroeker
654d87d73a
Merge pull request #3994 from rgommers/fix-ssyconvf-export
...
Export `ssyconvf` symbol
2023-04-07 18:15:14 +02:00
Martin Kroeker
d677214570
Remove the badge for the dead drone.io service and add Cirrus CI in its place
2023-04-07 14:11:16 +02:00
Ralf Gommers
a4ee1c84f0
Export `ssyconvf` symbol
...
This was apparently missed in commit a836fe8ec when adding the
LAPACK 3.7.0 symbols. We noticed when adding wrappers for 3.7.0
routines in SciPy. For more details, see
https://github.com/rgommers/scipy/issues/143
2023-04-07 12:50:36 +01:00
Martin Kroeker
ca8544be6d
Merge pull request #3991 from martin-frbg/lapack808
...
Refactor ?GEBAL for readability (Reference-LAPACK PR 808)
2023-04-04 15:27:17 +02:00
Martin Kroeker
d175b8f56f
Refactor ?GEBAL (Reference-LAPACK PR 808)
2023-04-03 15:02:10 +02:00