7433 Commits

Author SHA1 Message Date
Martin Kroeker c74ee11376
Add an M1-based OSX crossbuild and a NeoverseN1 build to CIRRUS CI (#3997)
* Add an M1-based OSX crossbuild and a NeoverseN1 build (plus Windows//LLVM commented out for now)
2023-05-08 14:24:38 +02:00
Martin Kroeker 65a7941aa5
Merge pull request #4036 from martin-frbg/issue4020
Mark cblas_xerbla's arguments as const in cblas.h
2023-05-08 12:54:30 +02:00
Martin Kroeker c2078b2356
Mark xerbla's arguments as const 2023-05-07 20:15:13 +02:00
Martin Kroeker d6a42ed574
Merge pull request #4035 from martin-frbg/issue4034
Fix (redundant) lapack-runtest target in toplevel Makefile
2023-05-06 15:51:07 +02:00
Martin Kroeker 60226b35e1
Fix (redundant) lapack-runtest target 2023-05-06 12:44:38 +02:00
Martin Kroeker 4e597ae00b
Merge pull request #4031 from martin-frbg/issue4026
Add suggestions to NUM_THREADS/auxiliary buffer message
2023-05-05 09:32:32 +02:00
Martin Kroeker e5538a62cb
Add suggestions to NUM_THREADS/auxiliary buffer message 2023-05-04 22:56:39 +02:00
Martin Kroeker 6f38a946e8
Merge pull request #4028 from catap/mktemp-fix
Do not requires GNU mktemp
2023-05-03 11:25:25 +02:00
Martin Kroeker 29c717050f
Merge pull request #4022 from martin-frbg/gemmtm
fix cblas_?gemmt
2023-05-03 11:24:54 +02:00
Kirill A. Korinsky b1781ad338
Do not requires GNU mktemp
Historically, GNU mktemp was the first implementation that did not require
`-t` to create a directory.

Here I've introduced a fallback for when `-t` is required.

For example, MacPorts contains a similar patch: bbe8abfe26/math/OpenBLAS/files/patch-MacOSX-mktemp.diff
2023-04-29 11:13:26 +02:00
Han Gao 7b16c4c051 CI (C910V): add test
Signed-off-by: Han Gao <gaohan@iscas.ac.cn>
2023-04-28 04:32:06 +00:00
Martin Kroeker 1f6f7328eb
remove redundant declaration 2023-04-27 09:14:12 +02:00
Martin Kroeker 7152d6b06d
fix cblas_gemmt 2023-04-27 08:36:20 +02:00
Martin Kroeker e9a8d5b45f
Merge pull request #4015 from martin-frbg/issue4013-2
[WIP] Disable gcc's tree-vectorizer for x86_64 CGEMV
2023-04-23 18:51:12 +02:00
Martin Kroeker 72caceb324
Merge pull request #4009 from Mousius/sve-gemm
Use SVE kernel for SGEMM/DGEMM on Arm(R) Neoverse(TM) V1
2023-04-22 13:56:45 +02:00
Martin Kroeker d1b631899b
Merge pull request #4018 from mmuetzel/ci
Adapt CI rules for MSYS2 for updated ccache
2023-04-21 23:52:13 +02:00
Markus Mützel e27e9a50b1 CI (MSYS2): Save ccache before running tests. 2023-04-21 14:10:40 +02:00
Markus Mützel 67d33e5b98 CI (MSYS2): Update location of compiler cache. 2023-04-21 13:02:23 +02:00
Martin Kroeker 84bcf6639f
Disable gcc's tree-vectorizer pass on all operating systems 2023-04-20 23:24:52 +02:00
Martin Kroeker 30a0ccbd14
Merge pull request #4014 from martin-frbg/issue4013
Generally disable gcc's tree-vectorizer in x86_64 SGEMV,SSYMV,ZGEMV,C/ZDOT
2023-04-20 10:45:15 +02:00
Martin Kroeker c9174ae8d7
Disable gcc's tree-vectorizer pass on all operating systems 2023-04-19 23:45:44 +02:00
Martin Kroeker c2fe9cb91f
Disable gcc's tree-vectorizer pass on all operating systems 2023-04-19 23:45:14 +02:00
Martin Kroeker 66b39b835c
Disable gcc's tree-vectorizer pass on all operating systems 2023-04-19 23:44:45 +02:00
Martin Kroeker bb6d6735bf
Disable gcc's tree-vectorizer pass on all operating systems 2023-04-19 23:44:15 +02:00
Martin Kroeker d18efaed20
Disable gcc's tree-vectorizer pass on all operating systems 2023-04-19 23:43:43 +02:00
Martin Kroeker 99f6d31ed5
Disable gcc's tree-vectorizer pass on all operating systems 2023-04-19 23:42:55 +02:00
Martin Kroeker 7de9335c56
Disable gcc's tree-vectorizer pass on all operating systems 2023-04-19 23:42:09 +02:00
Martin Kroeker 437c0bf2b4
Merge pull request #3843 from Mousius/switch-ratio
Propagate SWITCH_RATIO to DYNAMIC_ARCH builds
2023-04-19 11:51:54 +02:00
Martin Kroeker c628030669
Merge pull request #3855 from Mousius/more-switch-ratio-tuning
SWITCH_RATIO for Arm(R) Neoverse(TM) architecture
2023-04-18 22:45:51 +02:00
Martin Kroeker efcf71255a
Merge pull request #4003 from martin-frbg/issue3995
Fix instabilities in CGEMM/CTRMM/DNRM2 on Apple M1/M2 under OSX
2023-04-18 14:55:23 +02:00
Martin Kroeker 51dd1339e7
Merge pull request #4010 from martin-frbg/issue3989-2
Remove any stray trailing dash from CROSS_SUFFIX
2023-04-18 14:55:02 +02:00
Martin Kroeker 479509bb37
Remove any stray trailing dash from CROSS_SUFFIX (as would result from clang -arch) 2023-04-17 21:57:25 +02:00
Chris Sidebottom ec334e69dc Use SVE kernel for SGEMM/DGEMM on Arm(R) Neoverse(TM) V1
This re-spins #3869 with some additional copy unrolling which helps maintain SYRK performance.

After #3868, the SVE kernels represent a pretty good boost.

This re-uses ARMV8SVE as a base and I'm going to incrementally move everything to use ARMV8SVE in additional patches (as well as fix up anything that's not already in ARMV8SVE).
2023-04-17 17:38:42 +01:00
Chris Sidebottom 5b165420b5 SWITCH_RATIO for Arm(R) Neoverse(TM) architecture
This seems like a good balance of values for reasonably sized matrices. With `SWITCH_RATIO=16` the DGEMM scales better to bigger sizes, but the better solution would be some kind of thread throttling, so I've gone with `SWITCH_RATIO=8`.
2023-04-17 15:42:55 +01:00
Chris Sidebottom 32f2fafde7 Propagate SWITCH_RATIO to DYNAMIC_ARCH builds
Previously dynamic builds were either using the default SWITCH_RATIO
or one from the higher level architecture; this patch ensures the
dynamic builds can use this parameter as well.
2023-04-17 15:34:12 +01:00
Martin Kroeker a5e1fdd525
Merge pull request #4007 from Mousius/update-contributors
Add Chris Sidebottom to CONTRIBUTORS.md
2023-04-17 15:45:39 +02:00
Martin Kroeker 44164e3a3d
revert "move alpha out of register 18" (out of PR scope, no SVE on Apple hw) 2023-04-17 14:23:13 +02:00
Chris Sidebottom bfc20c2e97 Add Chris Sidebottom to CONTRIBUTORS.md 2023-04-17 11:53:31 +01:00
Martin Kroeker a44422f0d5
Merge pull request #3983 from thrasibule/makeflags
parallel build fixes
2023-04-16 13:49:05 +02:00
Martin Kroeker 73e6fcb925
Merge pull request #4006 from martin-frbg/issue4005
Fix ?GEMMT implementation
2023-04-16 13:30:17 +02:00
Martin Kroeker 38d7a7b562
Fix ?GEMMT 2023-04-16 00:07:58 +02:00
Martin Kroeker 8be68fa7f4
move declaration of sca to really keep the compiler from throwing it out (for now) 2023-04-15 12:02:39 +02:00
Martin Kroeker 4eac244c9a
Merge pull request #4004 from martin-frbg/ccheckif
fix missing blank in c_check
2023-04-14 22:57:18 +02:00
Martin Kroeker 970e611e00
fix missing blank in test 2023-04-14 19:42:34 +02:00
Martin Kroeker f096a339e4
Use long value fields for cpu ident on OSX 2023-04-13 18:16:09 +02:00
Martin Kroeker 3727672a74
Improve workaround and keep compilers from optimizing it out 2023-04-13 18:07:52 +02:00
Martin Kroeker 108a21e47a
Move ALPHA out of register 18 (reserved on OSX) 2023-04-13 18:05:14 +02:00
Martin Kroeker 0b1acb0ba3
Move ALPHA_I out of register 18 (reserved on OSX) 2023-04-13 18:03:35 +02:00
Martin Kroeker c7bbad09ad
Move ALPHA_I out of register 18 (reserved on OSX) 2023-04-13 18:00:47 +02:00
Martin Kroeker cda29633a3
move ALPHA_I out of register 18 (reserved on OSX) 2023-04-13 17:59:48 +02:00
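
Note on the "Disable gcc's tree-vectorizer" commits listed above: the log does not show how the pass is switched off, so the following is only a minimal sketch of one way to do it per source file, using GCC's documented `#pragma GCC optimize` mechanism. The function `toy_sgemv_n` and its argument layout are hypothetical stand-ins for a GEMV-style kernel and are not taken from the OpenBLAS sources; the compiler guard is likewise an assumption.

```c
#include <stddef.h>

/* Assumption: only GCC proper should see the pragma. clang also defines
 * __GNUC__, so it is excluded explicitly. */
#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC push_options
#pragma GCC optimize("no-tree-vectorize") /* same effect as -fno-tree-vectorize */
#endif

/* Hypothetical GEMV-style kernel: y += alpha * A * x, where A is stored
 * column-major with m rows, n columns and leading dimension lda. */
static void toy_sgemv_n(int m, int n, float alpha,
                        const float *a, int lda,
                        const float *x, float *y)
{
    for (int j = 0; j < n; j++) {
        float t = alpha * x[j];
        for (int i = 0; i < m; i++) {
            y[i] += t * a[(size_t)i + (size_t)j * (size_t)lda];
        }
    }
}

#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC pop_options /* restore the command-line optimization settings */
#endif
```

Wrapping the kernel in `push_options` / `pop_options` keeps the restriction local, so code later in the same file is still compiled with the optimization flags given on the command line. Other compilers skip the guarded pragmas entirely and compile the loop as usual.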