Commit Graph

2007 Commits

Author SHA1 Message Date
Ian McInerney
8a8a8479be Fix cooperlake and sapphire rapids march flags on clang
The march=cooperlake and march=sapphirerapids flags were never getting
added when building with Clang targetting those architectures. Instead
it was falling back to the skylake AVX512 implementation.

Clang added support for these two architectures in Clang 9 and Clang 12,
so introduce new checks for those versions to enable the appropriate
march flag, and fallback to skylake otherwise.
2023-08-14 16:12:35 +01:00
Martin Kroeker
34da1a067d Allow negative INCX (API change from version 3.10 of the reference implementation) 2023-08-10 17:01:50 +02:00
Martin Kroeker
07e32c4cb8 Allow negative INCX (API change from version 3.10 of the reference implementation) 2023-08-10 17:00:18 +02:00
Martin Kroeker
c211da0688 Allow negative INCX (API change from version 3.10 of the reference implementation) 2023-08-10 16:58:57 +02:00
Martin Kroeker
a34a0a7abc Allow negative INCX (API change from version 3.10 of the reference implementation) 2023-08-10 16:56:52 +02:00
Martin Kroeker
54d3246fc6 Allow negative INCX (API change from version 3.10 of the reference implementation) 2023-08-10 16:55:17 +02:00
Martin Kroeker
7dd441d5db Allow negative INCX (API change from version 3.10 of the reference implementation) 2023-08-10 16:53:33 +02:00
Martin Kroeker
f692178792 Allow negative INCX (API change from version 3.10 of the reference implementation) 2023-08-10 16:52:09 +02:00
Martin Kroeker
d15ffb7fdf Allow negative INCX (API change from version 3.10 of the reference implementation) 2023-08-10 16:50:44 +02:00
Martin Kroeker
a2d867f4d1 Allow negative iNCX (API change from version 3.10 of the reference implementation) 2023-08-10 16:49:05 +02:00
Martin Kroeker
afdc56a421 Merge pull request #4158 from XiWeiGu/loongarch64_update_dgemm_kernel
LoongArch64: Update dgemm kernel
2023-08-07 12:44:09 +02:00
gxw
e8b571d245 LoongArch64: Add dgemv_t_8_lasx.S and dgemv_n_8_lasx.S V2 2023-08-07 11:20:42 +08:00
gxw
71fcee6eef LoongArch64: Update dgemm kernel 2023-08-07 11:06:52 +08:00
Martin Kroeker
0f521ece25 Merge pull request #4183 from martin-frbg/issue4181
Apply USE_TRMM to MIPS64_GENERIC as to GENERIC in gmake builds
2023-08-06 18:59:50 +02:00
Martin Kroeker
41c31bc1d4 Revert "LoongArch64: Add dgemv_t_8_lasx.S and dgemv_n_8_lasx.S" 2023-08-06 16:00:03 +02:00
Martin Kroeker
61d803547a Apply USE_TRMM to MIPS64_GENERIC as to GENERIC 2023-08-06 15:17:38 +02:00
Martin Kroeker
f8ee309402 Merge pull request #4153 from XiWeiGu/dgemv
LoongArch64: Add dgemv_t_8_lasx.S and dgemv_n_8_lasx.S
2023-08-06 08:49:16 +02:00
gxw
ec1e96aac8 LoongArch64: Add dgemv_t_8_lasx.S and dgemv_n_8_lasx.S 2023-08-05 10:24:17 +08:00
gxw
d46772e037 LoongArch64: Add compiler feature checks 2023-08-05 10:21:43 +08:00
Martin Kroeker
4664b57e6e use shortcut only when both incx and incy are zero 2023-08-04 12:25:34 +02:00
Martin Kroeker
09131f79a6 Merge pull request #4164 from martin-frbg/issue4162
Enable use of AVX512 microkernels with NVIDIA HPC from version 22.3
2023-07-29 15:07:20 +02:00
Martin Kroeker
6a428b5629 Update casum_microk_skylakex-2.c 2023-07-29 12:24:30 +02:00
Martin Kroeker
ebb447e32e Update zasum_microk_skylakex-2.c 2023-07-29 12:23:57 +02:00
Martin Kroeker
9f6847583a nvc currently miscompiles this, hopefully fixed in release 23.09 2023-07-29 11:50:16 +02:00
Martin Kroeker
fe54ee3d15 nvc currently miscompiles this, hopefully fixed in release 23.09 2023-07-29 11:48:38 +02:00
Martin Kroeker
5720fa02c5 Merge pull request #4168 from Mousius/sve-zgemm-cgemm
Use SVE zgemm/cgemm on Arm(R) Neoverse(TM) V1 core
2023-07-27 17:41:45 +02:00
Chris Sidebottom
84a268b6ca Use SVE zgemm/cgemm on Arm(R) Neoverse(TM) V1 core
This patch removes the prefetches from cgemm/zgemm which improves the performance similar to sgemm/dgemm did in #3868, this means I'm happy to enable this on any applicable cores.

I also replicated the unrolling the copies from sgemm and dgemm.
2023-07-27 14:12:20 +01:00
Chris Sidebottom
730ca04b48 Fix ZHEMM copy for SVE
Whilst disambiguating whilelt, I inadvertantly used the wrong datatype
for offsets, which can be negative. This rectifies that.
2023-07-27 13:27:28 +01:00
Martin Kroeker
2a62d2df96 Enable use of AVX512 microkernels with NVIDIA HPC from version 22.3 2023-07-26 19:39:11 +02:00
Martin Kroeker
849c8806b8 Merge pull request #4161 from Mousius/non-sve-kernels
Use latest non-SVE kernels in ARMV8SVE
2023-07-26 15:49:40 +02:00
Chris Sidebottom
24586bc4ff Disambiguate whilelt 2023-07-25 20:15:44 +01:00
Chris Sidebottom
aea2a4622b Use latest non-SVE kernels in ARMV8SVE
These are generally better and, in some cases, include threading which helps in the cores we're targeting here.
2023-07-25 14:12:26 +01:00
martin-frbg
7976deff80 Fix file permissions (issue 4095) 2023-07-23 20:37:07 +02:00
Martin Kroeker
76ef1672f8 Override DSDOT with generic code to get rid of qemu precision error 2023-07-19 22:31:07 +02:00
Martin Kroeker
49077e7bde Merge pull request #4145 from martin-frbg/issue4144
Restore zero-initialization of variables in generic ztrsm_utcopy
2023-07-14 12:44:05 +02:00
Martin Kroeker
3d31191b0f Work around Clang failing to disambiguate SVE intrinsics and add AppleClang crossbuild to MacOS/arm64 DYNAMIC_ARCH in AzureCI (#4140)
* Add AppleClang crossbuild to MacOS/arm64 DYNAMIC_ARCH

* add casts to disambiguate svwhilelt for clang
2023-07-14 11:06:48 +02:00
Martin Kroeker
cfa0a80664 Restore initialization of data variables 2023-07-13 23:23:12 +02:00
Martin Kroeker
9567305e4c Restore initialization of data01,data02 2023-07-13 23:21:18 +02:00
Xianyi Zhang
e14a025bb1 Temporily walk around zaxpy vector kernel bug. 2023-06-28 11:17:38 +00:00
Martin Kroeker
772b0cc715 Fix early bailout 2023-06-27 16:12:27 +02:00
Martin Kroeker
d6be5036d7 Fix IDAMAX 2023-06-26 21:19:33 +02:00
Martin Kroeker
1fe96f8da7 Fix failures to handle increments of zero 2023-06-25 22:36:57 +02:00
Martin Kroeker
73b30b1dec Fix VLEV_FLOAT/VSEV_FLOAT macros to compile with t-head 2.6.1 2023-06-18 17:46:29 +02:00
Martin Kroeker
c3a2d407a0 Merge pull request #4048 from imzhuhl/spr_sbgemm_fix
Sapphire Rapids sbgemm fix
2023-06-17 20:47:09 +02:00
Manjul Mohan
58b88aa5f0 POWER10: Fix compiler warnings
This patch removes the warning messages related to unused variables in
sbgemm_kernel_power10.c.

Signed-off-by: Manjul Mohan <manjul@linux.vnet.ibm.com>
2023-06-12 01:08:59 -04:00
Honglin Zhu
9e80a194d6 Fix dynamic_list build and gcc version check error 2023-05-21 19:52:58 +08:00
Honglin Zhu
a76afdc047 Compatible with older version of GNU make 2023-05-20 13:58:23 +08:00
Honglin Zhu
90f041e348 Invoke the syscall to allow the use of amx tiles 2023-05-19 10:48:18 +08:00
Honglin Zhu
0b83088887 spr dynamic arch support 2023-05-19 10:48:18 +08:00
Honglin Zhu
f249ccb741 Fix spr sbgemm error 2023-05-19 10:48:18 +08:00