OpenBLAS/kernel/arm64
Martin Kroeker fc8894dd98 Workaround miscompilation by NVIDIA nvc 2023-08-26 00:30:17 +02:00
KERNEL
KERNEL.A64FX add sve ztrsm 2022-01-15 22:27:25 +01:00
KERNEL.ARMV8 Add workaround for NVIDIA HPC mishandling of the asm DOT kernels 2021-01-12 16:38:51 +01:00
KERNEL.ARMV8SVE Use SVE zgemm/cgemm on Arm(R) Neoverse(TM) V1 core 2023-07-27 14:12:20 +01:00
KERNEL.CORTEXA53 optimize cgemm on ARM cortex A53 & cortex A55 2021-12-12 17:22:52 +08:00
KERNEL.CORTEXA55 optimize cgemm on ARM cortex A53 & cortex A55 2021-12-12 17:22:52 +08:00
KERNEL.CORTEXA57 Add workaround for NVIDIA HPC mishandling of the asm DOT kernels 2021-01-12 16:39:35 +01:00
KERNEL.CORTEXA72
KERNEL.CORTEXA73
KERNEL.CORTEXA510 Add initial support for Phytium FT2000 series and ARMV9 Cortex 510/710/X1/X2 2022-03-27 15:29:20 +02:00
KERNEL.CORTEXA710 Add initial support for Phytium FT2000 series and ARMV9 Cortex 510/710/X1/X2 2022-03-27 15:29:20 +02:00
KERNEL.CORTEXX1 CortexX1 is ARMV8 like A7x 2022-03-28 17:28:29 +02:00
KERNEL.CORTEXX2 Add initial support for Phytium FT2000 series and ARMV9 Cortex 510/710/X1/X2 2022-03-27 15:29:20 +02:00
KERNEL.EMAG8180
KERNEL.FALKOR
KERNEL.FT2000 Add initial support for Phytium FT2000 series and ARMV9 Cortex 510/710/X1/X2 2022-03-27 15:29:20 +02:00
KERNEL.NEOVERSEN1 Add SVE implementation for sdot/ddot 2022-12-01 12:07:50 +00:00
KERNEL.NEOVERSEN2 Merge pull request #3846 from lilh9598/sbgemm_opt 2023-03-26 19:04:57 +02:00
KERNEL.NEOVERSEV1 Use SVE zgemm/cgemm on Arm(R) Neoverse(TM) V1 core 2023-07-27 14:12:20 +01:00
KERNEL.THUNDERX Add workaround for NVIDIA HPC 2021-01-12 16:49:39 +01:00
KERNEL.THUNDERX2T99 Add SVE implementation for sdot/ddot 2022-12-01 12:07:50 +00:00
KERNEL.THUNDERX3T110 Add SVE implementation for sdot/ddot 2022-12-01 12:07:50 +00:00
KERNEL.TSV110 Add workaround for NVIDIA HPC 2021-01-12 16:51:35 +01:00
KERNEL.VORTEX Use Neoverse's current mix of ThunderX2 kernels for Vortex as well 2021-10-06 11:06:43 +02:00
KERNEL.generic Fix MSVC ARM64 build. Add generic kernel for ARM64 2022-06-02 16:53:54 +02:00
Makefile
amax.S
asum.S
axpy.S
casum.S
casum_thunderx2t99.c
cgemm_kernel_4x4.S
cgemm_kernel_8x4.S move ALPHA_I out of register 18 (reserved on OSX) 2023-04-13 17:59:48 +02:00
cgemm_kernel_8x4_cortexa53.c optimize cgemm on ARM cortex A53 & cortex A55 2021-12-12 17:22:52 +08:00
cgemm_kernel_8x4_thunderx2t99.S Move ALPHA_I out of register 18 (reserved on OSX) 2023-04-13 18:00:47 +02:00
cgemm_kernel_sve_v1x4.S Use SVE zgemm/cgemm on Arm(R) Neoverse(TM) V1 core 2023-07-27 14:12:20 +01:00
cgemm_ncopy_sve_v1.c Disambiguate whilelt 2023-07-25 20:15:44 +01:00
cgemm_tcopy_sve_v1.c Disambiguate whilelt 2023-07-25 20:15:44 +01:00
copy.S
copy_thunderx2t99.c
csum.S
ctrmm_kernel_4x4.S
ctrmm_kernel_8x4.S Move ALPHA_I out of register 18 (reserved on OSX) 2023-04-13 18:03:35 +02:00
ctrmm_kernel_sve_v1x4.S add cgemm ctrmm sve kernels 2022-01-05 09:09:18 +01:00
dasum_thunderx2t99.c
daxpy_thunderx.c aarch64 fix std=c18 compilation 2020-10-03 18:00:34 +03:00
daxpy_thunderx2t99.S
ddot_thunderx.c
dgemm_beta.S
dgemm_kernel_4x4.S
dgemm_kernel_4x4_cortexa53.c MOD: optimize normal DGEMM on ARMV8 cortex-A53 & cortex-A55 2021-11-18 21:14:43 +08:00
dgemm_kernel_4x8.S
dgemm_kernel_8x4.S
dgemm_kernel_8x4_thunderx2t99.S
dgemm_kernel_sve_v1x8.S some clean-up & commentary 2021-11-21 14:56:27 +01:00
dgemm_kernel_sve_v2x8.S Remove prefetches from SVE kernels 2022-12-16 14:43:09 +00:00
dgemm_ncopy_4.S
dgemm_ncopy_8.S
dgemm_tcopy_4.S
dgemm_tcopy_8.S Remove unused TEMP2 and reshuffle to leave x18 unused (reserved on OSX) 2021-09-17 09:18:25 +02:00
dot.S
dot.c Wrap SVE header with __has_include check 2022-12-01 12:07:55 +00:00
dot_kernel_asimd.c Add SVE implementation for sdot/ddot 2022-12-01 12:07:50 +00:00
dot_kernel_sve.c Work around Clang failing to disambiguate SVE intrinsics and add AppleClang crossbuild to MacOS/arm64 DYNAMIC_ARCH in AzureCI (#4140) 2023-07-14 11:06:48 +02:00
dot_thunderx.c
dtrmm_kernel_4x4.S
dtrmm_kernel_4x8.S
dtrmm_kernel_8x4.S Move temp to x21 to leave x18 unused (reserved on OSX) 2021-09-17 09:24:11 +02:00
dtrmm_kernel_sve_v1x8.S some clean-up & commentary 2021-11-21 14:56:27 +01:00
dznrm2_thunderx2t99.c move declaration of sca to really keep the compiler from throwing it out (for now) 2023-04-15 12:02:39 +02:00
dznrm2_thunderx2t99_fast.c
gemm_ncopy_complex_sve_v1x4.c Use SVE zgemm/cgemm on Arm(R) Neoverse(TM) V1 core 2023-07-27 14:12:20 +01:00
gemm_ncopy_sve_v1x8.c Work around Clang failing to disambiguate SVE intrinsics and add AppleClang crossbuild to MacOS/arm64 DYNAMIC_ARCH in AzureCI (#4140) 2023-07-14 11:06:48 +02:00
gemm_tcopy_complex_sve_v1x4.c Use SVE zgemm/cgemm on Arm(R) Neoverse(TM) V1 core 2023-07-27 14:12:20 +01:00
gemm_tcopy_sve_v1x8.c Work around Clang failing to disambiguate SVE intrinsics and add AppleClang crossbuild to MacOS/arm64 DYNAMIC_ARCH in AzureCI (#4140) 2023-07-14 11:06:48 +02:00
gemv_n.S
gemv_t.S
iamax.S
iamax_thunderx2t99.c
izamax.S
izamax_thunderx2t99.c
nrm2.S
rot.S
sasum_thunderx2t99.c
sbgemm_beta_neoversen2.c neoverse n2 sbgemm: init file 2022-06-29 10:14:21 +08:00
sbgemm_kernel_8x4_neoversen2.c Change file name to match the norm and delete useless code. 2022-10-28 17:09:39 +08:00
sbgemm_kernel_8x4_neoversen2_impl.c Change file name to match the norm and delete useless code. 2022-10-28 17:09:39 +08:00
sbgemm_ncopy_4_neoversen2.c Change file name to match the norm and delete useless code. 2022-10-28 17:09:39 +08:00
sbgemm_ncopy_8_neoversen2.c bugfix for sbgemm_ncopy_8_neoversen2 2022-12-05 05:10:18 -05:00
sbgemm_tcopy_4_neoversen2.c Add sbgemm_ncopy_8 and sbgemm_tcopy_4 2022-11-29 04:46:14 -05:00
sbgemm_tcopy_8_neoversen2.c Improve the performance of sbgemm_tcopy on neoversen2 2022-11-28 04:17:54 -05:00
scal.S
scnrm2_thunderx2t99.c
sgemm_beta.S Fix file permissions (issue 4095) 2023-07-23 20:37:07 +02:00
sgemm_kernel_4x4.S
sgemm_kernel_8x8.S
sgemm_kernel_8x8_cortexa53.S fix INIT8x4 2020-06-10 01:01:16 +08:00
sgemm_kernel_16x4.S
sgemm_kernel_16x4_thunderx2t99.S
sgemm_kernel_sve_v1x8.S add sgemm kernel and copy functions for sgemm and ssymm 2021-11-28 18:12:47 +01:00
sgemm_kernel_sve_v2x8.S revert "move alpha out of register 18" (out of PR scope, no SVE on Apple hw) 2023-04-17 14:23:13 +02:00
sgemm_ncopy_4.S change line endings from CRLF to LF 2022-11-16 22:24:01 +01:00
sgemm_ncopy_8.S
sgemm_tcopy_8.S
sgemm_tcopy_16.S change line endings from CRLF to LF 2022-11-16 22:24:01 +01:00
strmm_kernel_4x4.S
strmm_kernel_8x8.S
strmm_kernel_8x8_cortexa53.S
strmm_kernel_16x4.S Move temp to x21 to leave x18 unused (reserved on OSX) 2021-09-17 09:28:19 +02:00
strmm_kernel_sve_v1x8.S strmm sve v1x8 kernel 2021-12-05 14:03:08 +01:00
sum.S
swap.S
swap_thunderx2t99.S
symm_lcopy_sve.c Disambiguate whilelt 2023-07-25 20:15:44 +01:00
symm_ucopy_sve.c Disambiguate whilelt 2023-07-25 20:15:44 +01:00
trmm_lncopy_sve_v1.c Work around Clang failing to disambiguate SVE intrinsics and add AppleClang crossbuild to MacOS/arm64 DYNAMIC_ARCH in AzureCI (#4140) 2023-07-14 11:06:48 +02:00
trmm_ltcopy_sve_v1.c Work around Clang failing to disambiguate SVE intrinsics and add AppleClang crossbuild to MacOS/arm64 DYNAMIC_ARCH in AzureCI (#4140) 2023-07-14 11:06:48 +02:00
trmm_uncopy_sve_v1.c Work around Clang failing to disambiguate SVE intrinsics and add AppleClang crossbuild to MacOS/arm64 DYNAMIC_ARCH in AzureCI (#4140) 2023-07-14 11:06:48 +02:00
trmm_utcopy_sve_v1.c Work around Clang failing to disambiguate SVE intrinsics and add AppleClang crossbuild to MacOS/arm64 DYNAMIC_ARCH in AzureCI (#4140) 2023-07-14 11:06:48 +02:00
trsm_kernel_LN_sve.c add sve ztrsm 2022-01-15 22:27:25 +01:00
trsm_kernel_LT_sve.c add sve ztrsm 2022-01-15 22:27:25 +01:00
trsm_kernel_RN_sve.c add sve ztrsm 2022-01-15 22:27:25 +01:00
trsm_kernel_RT_sve.c add sve ztrsm 2022-01-15 22:27:25 +01:00
trsm_lncopy_sve.c Work around Clang failing to disambiguate SVE intrinsics and add AppleClang crossbuild to MacOS/arm64 DYNAMIC_ARCH in AzureCI (#4140) 2023-07-14 11:06:48 +02:00
trsm_ltcopy_sve.c Disambiguate whilelt 2023-07-25 20:15:44 +01:00
trsm_uncopy_sve.c Disambiguate whilelt 2023-07-25 20:15:44 +01:00
trsm_utcopy_sve.c Disambiguate whilelt 2023-07-25 20:15:44 +01:00
zamax.S
zasum.S
zasum_thunderx2t99.c
zaxpy.S
zdot.S
zdot_thunderx2t99.c Workaround miscompilation by NVIDIA nvc 2023-08-26 00:30:17 +02:00
zgemm_kernel_4x4.S move alpha to x19/x20 to leave x18 unused for OSX 2021-09-17 09:42:17 +02:00
zgemm_kernel_4x4_cortexa53.c MOD: add comments to a53 zgemm kernel 2021-11-25 22:48:48 +08:00
zgemm_kernel_4x4_thunderx2t99.S
zgemm_kernel_sve_v1x4.S Use SVE zgemm/cgemm on Arm(R) Neoverse(TM) V1 core 2023-07-27 14:12:20 +01:00
zgemm_ncopy_sve_v1.c Disambiguate whilelt 2023-07-25 20:15:44 +01:00
zgemm_tcopy_sve_v1.c Disambiguate whilelt 2023-07-25 20:15:44 +01:00
zgemv_n.S
zgemv_t.S
zhemm_ltcopy_sve.c Fix ZHEMM copy for SVE 2023-07-27 13:27:28 +01:00
zhemm_utcopy_sve.c Fix ZHEMM copy for SVE 2023-07-27 13:27:28 +01:00
znrm2.S
zrot.S
zscal.S
zsum.S
zsymm_lcopy_sve.c Disambiguate whilelt 2023-07-25 20:15:44 +01:00
zsymm_ucopy_sve.c Disambiguate whilelt 2023-07-25 20:15:44 +01:00
ztrmm_kernel_4x4.S Move alphaI to x22 to leave x18 unused (reserved on OSX) 2021-09-17 09:53:18 +02:00
ztrmm_kernel_sve_v1x4.S fix sve ztrmm kernel 2022-01-04 14:42:07 +01:00
ztrmm_lncopy_sve_v1.c Disambiguate whilelt 2023-07-25 20:15:44 +01:00
ztrmm_ltcopy_sve_v1.c Disambiguate whilelt 2023-07-25 20:15:44 +01:00
ztrmm_uncopy_sve_v1.c Disambiguate whilelt 2023-07-25 20:15:44 +01:00
ztrmm_utcopy_sve_v1.c Disambiguate whilelt 2023-07-25 20:15:44 +01:00
ztrsm_lncopy_sve.c Disambiguate whilelt 2023-07-25 20:15:44 +01:00
ztrsm_ltcopy_sve.c Disambiguate whilelt 2023-07-25 20:15:44 +01:00
ztrsm_uncopy_sve.c Disambiguate whilelt 2023-07-25 20:15:44 +01:00
ztrsm_utcopy_sve.c Disambiguate whilelt 2023-07-25 20:15:44 +01:00