Commit Graph

1964 Commits

Author SHA1 Message Date
Martin Kroeker
c3a2d407a0 Merge pull request #4048 from imzhuhl/spr_sbgemm_fix
Sapphire Rapids sbgemm fix
2023-06-17 20:47:09 +02:00
Manjul Mohan
58b88aa5f0 POWER10: Fix compiler warnings
This patch removes the warning messages related to unused variables in
sbgemm_kernel_power10.c.

Signed-off-by: Manjul Mohan <manjul@linux.vnet.ibm.com>
2023-06-12 01:08:59 -04:00
Honglin Zhu
9e80a194d6 Fix dynamic_list build and gcc version check error 2023-05-21 19:52:58 +08:00
Honglin Zhu
a76afdc047 Compatible with older version of GNU make 2023-05-20 13:58:23 +08:00
Honglin Zhu
90f041e348 Invoke the syscall to allow the use of amx tiles 2023-05-19 10:48:18 +08:00
Honglin Zhu
0b83088887 spr dynamic arch support 2023-05-19 10:48:18 +08:00
Honglin Zhu
f249ccb741 Fix spr sbgemm error 2023-05-19 10:48:18 +08:00
Martin Kroeker
e9a8d5b45f Merge pull request #4015 from martin-frbg/issue4013-2
[WIP] Disable gcc's tree-vectorizer for x86_64 CGEMV
2023-04-23 18:51:12 +02:00
Martin Kroeker
72caceb324 Merge pull request #4009 from Mousius/sve-gemm
Use SVE kernel for SGEMM/DGEMM on Arm(R) Neoverse(TM) V1
2023-04-22 13:56:45 +02:00
Martin Kroeker
84bcf6639f Disable gcc's tree-vectorizer pass on all operating systems 2023-04-20 23:24:52 +02:00
Martin Kroeker
c9174ae8d7 Disable gcc's tree-vectorizer pass on all operating systems 2023-04-19 23:45:44 +02:00
Martin Kroeker
c2fe9cb91f Disable gcc's tree-vectorizer pass on all operating systems 2023-04-19 23:45:14 +02:00
Martin Kroeker
66b39b835c Disable gcc's tree-vectorizer pass on all operating systems 2023-04-19 23:44:45 +02:00
Martin Kroeker
bb6d6735bf Disable gcc's tree-vectorizer pass on all operating systems 2023-04-19 23:44:15 +02:00
Martin Kroeker
d18efaed20 Disable gcc's tree-vectorizer pass on all operating systems 2023-04-19 23:43:43 +02:00
Martin Kroeker
99f6d31ed5 Disable gcc's tree-vectorizer pass on all operating systems 2023-04-19 23:42:55 +02:00
Martin Kroeker
7de9335c56 Disable gcc's tree-vectorizer pass on all operating systems 2023-04-19 23:42:09 +02:00
Martin Kroeker
437c0bf2b4 Merge pull request #3843 from Mousius/switch-ratio
Propagate SWITCH_RATIO to DYNAMIC_ARCH builds
2023-04-19 11:51:54 +02:00
Chris Sidebottom
ec334e69dc Use SVE kernel for SGEMM/DGEMM on Arm(R) Neoverse(TM) V1
This re-spins #3869 with some additional copy unrolling which helps maintain SYRK performance.

After #3868, the SVE kernels represent a pretty good boost.

This re-uses ARMV8SVE as a base and I'm going to incrementally move everything to use ARMV8SVE in additional patches (as well as fix up anything that's not already in ARMV8SVE).
2023-04-17 17:38:42 +01:00
Chris Sidebottom
32f2fafde7 Propagate SWITCH_RATIO to DYNAMIC_ARCH builds
Previously dynamic builds were either using the default SWITCH_RATIO
or one from the higher level architecture; this patch ensures the
dynamic builds can use this parameter as well.
2023-04-17 15:34:12 +01:00
Martin Kroeker
44164e3a3d revert "move alpha out of register 18" (out of PR scope, no SVE on Apple hw) 2023-04-17 14:23:13 +02:00
Martin Kroeker
8be68fa7f4 move declaration of sca to really keep the compiler from throwing it out (for now) 2023-04-15 12:02:39 +02:00
Martin Kroeker
3727672a74 Improve workaround and keep compilers from optimizing it out 2023-04-13 18:07:52 +02:00
Martin Kroeker
108a21e47a Move ALPHA out of register 18 (reserved on OSX) 2023-04-13 18:05:14 +02:00
Martin Kroeker
0b1acb0ba3 Move ALPHA_I out of register 18 (reserved on OSX) 2023-04-13 18:03:35 +02:00
Martin Kroeker
c7bbad09ad Move ALPHA_I out of register 18 (reserved on OSX) 2023-04-13 18:00:47 +02:00
Martin Kroeker
cda29633a3 move ALPHA_I out of register 18 (reserved on OSX) 2023-04-13 17:59:48 +02:00
Martin Kroeker
09ace3cf23 Merge pull request #3846 from lilh9598/sbgemm_opt
Improve the performance of sbgemm_tcopy on neoversen2
2023-03-26 19:04:57 +02:00
Sergei Lewis
cb0a70e0e2 dot.c early bail fix 2023-03-02 09:51:10 +00:00
Martin Kroeker
38d6fb4225 Fix dependencies in builds with specified subsets of precision types 2023-02-23 23:12:06 +01:00
Martin Kroeker
e412bee313 fix GEMM kernel dependencies in builds that use only a subset of precisions 2023-02-22 00:37:14 +01:00
Martin Kroeker
d80adf253e make SSYMV available to BUILD_DOUBLE-only builds 2023-02-22 00:30:20 +01:00
Martin Kroeker
5481c328e8 fix DYNAMIC_ARCH builds that use only a subset of precisions 2023-02-22 00:28:25 +01:00
Martin Kroeker
5a9cd87794 Merge pull request #3868 from Mousius/sve-prefetch
Remove prefetches from SVE kernels
2022-12-24 10:52:29 +01:00
Chris Sidebottom
1361229291 Remove prefetches from SVE kernels
This is a precursor to enabling the SVE kernels for Arm(R) Neoverse(TM)
V1 which has 256-bit SVE. Testing revealed that the SVE kernel was
actually worse in some cases than the existing kernel which seemed odd -
removing these prefetches the underlying architecture seems to do a better job
😸
2022-12-16 14:43:09 +00:00
Bart Oldeman
60e49b851c Fix typo in clobber list, should be xmm14 instead of ymm14. 2022-12-06 16:30:46 -05:00
Bart Oldeman
4afe1439a1 Fix skylake fallback kernel name for old compilers. 2022-12-06 16:09:54 -05:00
Bart Oldeman
5ceca1a4d8 Add sscal.c + microkernels for Haswell, Zen, Skylake and newer.
Unlike [dcz]scal, sscal still used the original GotoBLAS SSE code from scal_sse.S.
This code follows dscal as closely as possible, except for the inc_x > 1 code
for which a plain C loop is used much like the one in cscal.c, instead of an
adaptation of the SSE2 asm code of dscal.c (I tried but the performance wasn't
better than the plain C loop).
2022-12-06 14:05:49 -05:00
lilianhuang
729af6406f bugfix for sbgemm_ncopy_8_neoversen2 2022-12-05 05:10:18 -05:00
Martin Kroeker
042e3c0e7c Merge pull request #3848 from bartoldeman/dscal-haswell-ymm
dscal: use ymm registers in Haswell microkernel
2022-12-05 08:56:08 +01:00
Bart Oldeman
5c3169ecd8 dscal: use ymm registers in Haswell microkernel
Using 256-bit registers in dscal makes this microkernel consistent with
cscal and zscal, and generally doubles performance if the vector fits in
L1 cache.
2022-12-01 07:48:05 -05:00
Chris Sidebottom
eea006a688 Wrap SVE header with __has_include check 2022-12-01 12:07:55 +00:00
Chris Sidebottom
fd4f52c797 Add SVE implementation for sdot/ddot
This adds an SVE implementation to sdot/ddot when available, falling back to the previous Advanced SIMD kernel where there's no SVE implementation for the kernel.

All the targets were essentially treating `dot_thunderx2t99.c` as the Advanced SIMD implementation so I've renamed it to better fit with the feature detection.
2022-12-01 12:07:50 +00:00
lilianhuang
fdac8a97c1 Add sbgemm_ncopy_8 and sbgemm_tcopy_4 2022-11-29 04:46:14 -05:00
lilianhuang
135718eafc Improve the performance of sbgemm_tcopy on neoversen2 2022-11-28 04:17:54 -05:00
Chris Sidebottom
4f7b77e08a Remove unnecessary instructions from Advanced SIMD dot
The existing kernel was issuing extra instructions to organise the arguments into the same registers they would usually be in and similarly to put the result into the appropriate register.

This has an impact on smaller sized dots and seemed like a quick fix
2022-11-25 16:19:03 +00:00
Martin Kroeker
f73cfb7e2c change line endings from CRLF to LF 2022-11-17 09:39:56 +01:00
Martin Kroeker
1688c7da43 change line endings from CRLF to LF 2022-11-16 22:24:01 +01:00
Bart Oldeman
6c1043eb41 Add [cz]scal microkernels for SKYLAKEX
These are as similar to dscal_microk_skylakex-2.c as possible
for consistency.

Note that before this change SKYLAKEX+ uses generic C functions for
cscal/zscal via commit 2271c350 from #2610 (which is masked by
commit 086d87a30). However now #3799 disables FMAs (in turn enabled
by `-march=skylake-avx512`) in the plain C code which fixes excessive
LAPACK test failures more nicely.
2022-11-09 08:57:03 -05:00
Martin Kroeker
c9d78dc3b2 Remove excess initializer (leftover from rework of PR 3793) 2022-10-31 16:57:03 +01:00