KERNEL
|
fix assignment of default CSUM kernel
|
2024-02-25 17:57:11 +01:00 |
KERNEL.A64FX
|
A64FX: Add support for SVE to SGEMV/DGEMV kernels.
|
2024-07-16 17:31:33 +09:00 |
KERNEL.ARMV8
|
Add workaround for NVIDIA HPC mishandling of the asm DOT kernels
|
2021-01-12 16:38:51 +01:00 |
KERNEL.ARMV8SVE
|
Small GEMM for AArch64
|
2024-03-04 15:48:47 +00:00 |
KERNEL.CORTEXA53
|
optimize cgemm on ARM cortex A53 & cortex A55
|
2021-12-12 17:22:52 +08:00 |
KERNEL.CORTEXA55
|
Reduce duplication in kernel definitions
|
2023-12-23 12:39:53 +00:00 |
KERNEL.CORTEXA57
|
Add workaround for NVIDIA HPC mishandling of the asm DOT kernels
|
2021-01-12 16:39:35 +01:00 |
KERNEL.CORTEXA72
|
Simplifying ARMv8 build parameters
|
2018-11-19 16:41:49 +00:00 |
KERNEL.CORTEXA73
|
Simplifying ARMv8 build parameters
|
2018-11-19 16:41:49 +00:00 |
KERNEL.CORTEXA76
|
Add support for Cortex-A76
|
2024-04-02 19:41:44 +02:00 |
KERNEL.CORTEXA510
|
Fix outdated SVE kernel definitions for Cortex cpus by aliasing to ARMV8SVE
|
2023-11-03 14:55:31 +01:00 |
KERNEL.CORTEXA710
|
Fix outdated SVE kernel definitions for Cortex cpus by aliasing to ARMV8SVE
|
2023-11-03 14:55:31 +01:00 |
KERNEL.CORTEXX1
|
CortexX1 is ARMV8 like A7x
|
2022-03-28 17:28:29 +02:00 |
KERNEL.CORTEXX2
|
Fix outdated SVE kernel definitions for Cortex cpus by aliasing to ARMV8SVE
|
2023-11-03 14:55:31 +01:00 |
KERNEL.EMAG8180
|
Add preliminary support for EMAG8180
|
2020-02-19 19:00:28 +01:00 |
KERNEL.FALKOR
|
Simplifying ARMv8 build parameters
|
2018-11-19 16:41:49 +00:00 |
KERNEL.FT2000
|
Add initial support for Phytium FT2000 series and ARMV9 Cortex 510/710/X1/X2
|
2022-03-27 15:29:20 +02:00 |
KERNEL.NEOVERSEN1
|
revert the C/Z NRM2 kernels to the base NEON kernel as well
|
2024-04-12 15:34:04 +02:00 |
KERNEL.NEOVERSEN2
|
Merge pull request #3846 from lilh9598/sbgemm_opt
|
2023-03-26 19:04:57 +02:00 |
KERNEL.NEOVERSEV1
|
Add accumulators to AArch64 GEMV Kernels
|
2024-07-31 13:09:14 +01:00 |
KERNEL.NEOVERSEV2
|
Correctly detect ARM Neoverse V2 CPUs.
|
2024-05-16 09:59:52 +00:00 |
KERNEL.THUNDERX
|
Add workaround for NVIDIA HPC
|
2021-01-12 16:49:39 +01:00 |
KERNEL.THUNDERX2T99
|
Add SVE implementation for sdot/ddot
|
2022-12-01 12:07:50 +00:00 |
KERNEL.THUNDERX3T110
|
Reduce duplication in kernel definitions
|
2023-12-23 12:39:53 +00:00 |
KERNEL.TSV110
|
Add workaround for NVIDIA HPC
|
2021-01-12 16:51:35 +01:00 |
KERNEL.VORTEX
|
Use Neoverse's current mix of ThunderX2 kernels for Vortex as well
|
2021-10-06 11:06:43 +02:00 |
KERNEL.generic
|
Fix MSVC ARM64 build. Add generic kernel for ARM64
|
2022-06-02 16:53:54 +02:00 |
Makefile
|
added experimental support for ARMV8
|
2013-11-24 15:47:00 +01:00 |
amax.S
|
ARM64: Convert all labels to local labels
|
2017-10-24 11:40:05 +00:00 |
asum.S
|
ARM64: Convert all labels to local labels
|
2017-10-24 11:40:05 +00:00 |
axpy.S
|
ARM64: Convert all labels to local labels
|
2017-10-24 11:40:05 +00:00 |
casum.S
|
ARM64: Convert all labels to local labels
|
2017-10-24 11:40:05 +00:00 |
casum_thunderx2t99.c
|
Fixed a few more unnecessary calls to num_cpu_avail.
|
2018-06-11 10:17:16 +01:00 |
cgemm_kernel_4x4.S
|
ARM64: Convert all labels to local labels
|
2017-10-24 11:40:05 +00:00 |
cgemm_kernel_8x4.S
|
move ALPHA_I out of register 18 (reserved on OSX)
|
2023-04-13 17:59:48 +02:00 |
cgemm_kernel_8x4_cortexa53.c
|
optimize cgemm on ARM cortex A53 & cortex A55
|
2021-12-12 17:22:52 +08:00 |
cgemm_kernel_8x4_thunderx2t99.S
|
Move ALPHA_I out of register 18 (reserved on OSX)
|
2023-04-13 18:00:47 +02:00 |
cgemm_kernel_sve_v1x4.S
|
Use SVE zgemm/cgemm on Arm(R) Neoverse(TM) V1 core
|
2023-07-27 14:12:20 +01:00 |
cgemm_ncopy_sve_v1.c
|
Disambiguate whilelt
|
2023-07-25 20:15:44 +01:00 |
cgemm_tcopy_sve_v1.c
|
Disambiguate whilelt
|
2023-07-25 20:15:44 +01:00 |
copy.S
|
ARM64: Convert all labels to local labels
|
2017-10-24 11:40:05 +00:00 |
copy_thunderx2t99.c
|
Fixed a few more unnecessary calls to num_cpu_avail.
|
2018-06-11 10:17:16 +01:00 |
csum.S
|
Add ARM64 implementations of ?sum
|
2019-03-30 22:13:36 +01:00 |
csum_thunderx2t99.c
|
add csum/zsum kernels (trivially derived from the asum ones)
|
2024-02-25 17:55:36 +01:00 |
ctrmm_kernel_4x4.S
|
ARM64: Convert all labels to local labels
|
2017-10-24 11:40:05 +00:00 |
ctrmm_kernel_8x4.S
|
Move ALPHA_I out of register 18 (reserved on OSX)
|
2023-04-13 18:03:35 +02:00 |
ctrmm_kernel_sve_v1x4.S
|
add cgemm ctrmm sve kernels
|
2022-01-05 09:09:18 +01:00 |
dasum_thunderx2t99.c
|
Fixed a few more unnecessary calls to num_cpu_avail.
|
2018-06-11 10:17:16 +01:00 |
daxpy_thunderx.c
|
aarch64 fix std=c18 compilation
|
2020-10-03 18:00:34 +03:00 |
daxpy_thunderx2t99.S
|
ARM64: Improve DAXPY for ThunderX2
|
2020-05-07 09:22:50 -07:00 |
ddot_thunderx.c
|
ARM64: Rename kernel files to have consistent naming
|
2017-01-24 14:53:34 +05:30 |
dgemm_beta.S
|
Fix zero initialization for beta=0 case
|
2020-03-31 00:21:02 +02:00 |
dgemm_kernel_4x4.S
|
ARM64: Convert all labels to local labels
|
2017-10-24 11:40:05 +00:00 |
dgemm_kernel_4x4_cortexa53.c
|
MOD: optimize normal DGEMM on ARMV8 cortex-A53 & cortex-A55
|
2021-11-18 21:14:43 +08:00 |
dgemm_kernel_4x8.S
|
ARM64: Convert all labels to local labels
|
2017-10-24 11:40:05 +00:00 |
dgemm_kernel_8x4.S
|
ARM64: Convert all labels to local labels
|
2017-10-24 11:40:05 +00:00 |
dgemm_kernel_8x4_thunderx2t99.S
|
ARM64: Move parameters from parameter.c to param.h
|
2018-10-22 01:45:51 -07:00 |
dgemm_kernel_sve_v1x8.S
|
some clean-up & commentary
|
2021-11-21 14:56:27 +01:00 |
dgemm_kernel_sve_v2x8.S
|
Remove prefetches from SVE kernels
|
2022-12-16 14:43:09 +00:00 |
dgemm_ncopy_4.S
|
ARM64: Convert all labels to local labels
|
2017-10-24 11:40:05 +00:00 |
dgemm_ncopy_8.S
|
ARM64: Convert all labels to local labels
|
2017-10-24 11:40:05 +00:00 |
dgemm_small_kernel_nn_sve.c
|
Better header guard around bridge
|
2024-07-20 14:39:57 +01:00 |
dgemm_small_kernel_nt_sve.c
|
Better header guard around bridge
|
2024-07-20 14:39:57 +01:00 |
dgemm_small_kernel_tn_sve.c
|
Improve TN case with further unrolling
|
2024-09-02 22:22:49 +05:30 |
dgemm_small_kernel_tt_sve.c
|
Better header guard around bridge
|
2024-07-20 14:39:57 +01:00 |
dgemm_tcopy_4.S
|
ARM64: Convert all labels to local labels
|
2017-10-24 11:40:05 +00:00 |
dgemm_tcopy_8.S
|
Remove unused TEMP2 and reshuffle to leave x18 unused (reserved on OSX)
|
2021-09-17 09:18:25 +02:00 |
dot.S
|
ARM64: Fix utest dsdot errors
|
2018-02-27 10:47:55 +00:00 |
dot.c
|
Wrap SVE header with __has_include check
|
2022-12-01 12:07:55 +00:00 |
dot_kernel_asimd.c
|
Add SVE implementation for sdot/ddot
|
2022-12-01 12:07:50 +00:00 |
dot_kernel_sve.c
|
add clobber list
|
2024-06-14 22:07:44 +02:00 |
dot_thunderx.c
|
ARM64: Rename kernel files to have consistent naming
|
2017-01-24 14:53:34 +05:30 |
dtrmm_kernel_4x4.S
|
ARM64: Convert all labels to local labels
|
2017-10-24 11:40:05 +00:00 |
dtrmm_kernel_4x8.S
|
ARM64: Convert all labels to local labels
|
2017-10-24 11:40:05 +00:00 |
dtrmm_kernel_8x4.S
|
Move temp to x21 to leave x18 unused (reserved on OSX)
|
2021-09-17 09:24:11 +02:00 |
dtrmm_kernel_sve_v1x8.S
|
some clean-up & commentary
|
2021-11-21 14:56:27 +01:00 |
dznrm2_thunderx2t99.c
|
remove another early exit for incx < 0
|
2024-03-12 18:47:00 +01:00 |
dznrm2_thunderx2t99_fast.c
|
Fixed a few more unnecessary calls to num_cpu_avail.
|
2018-06-11 10:17:16 +01:00 |
gemm_ncopy_complex_sve_v1x4.c
|
Use SVE zgemm/cgemm on Arm(R) Neoverse(TM) V1 core
|
2023-07-27 14:12:20 +01:00 |
gemm_ncopy_sve_v1x8.c
|
Work around Clang failing to disambiguate SVE intrinsics and add AppleClang crossbuild to MacOS/arm64 DYNAMIC_ARCH in AzureCI (#4140)
|
2023-07-14 11:06:48 +02:00 |
gemm_small_kernel_permit_sve.c
|
Remove k2 loop from DGEMM TN and use a more conservative heuristic for SGEMM
|
2024-07-18 17:37:18 +01:00 |
gemm_tcopy_complex_sve_v1x4.c
|
Use SVE zgemm/cgemm on Arm(R) Neoverse(TM) V1 core
|
2023-07-27 14:12:20 +01:00 |
gemm_tcopy_sve_v1x8.c
|
Work around Clang failing to disambiguate SVE intrinsics and add AppleClang crossbuild to MacOS/arm64 DYNAMIC_ARCH in AzureCI (#4140)
|
2023-07-14 11:06:48 +02:00 |
gemv_n.S
|
ARM64: Convert all labels to local labels
|
2017-10-24 11:40:05 +00:00 |
gemv_n_sve.c
|
Fix ambiguous error on Mac OS
|
2024-07-25 22:43:13 +09:00 |
gemv_t.S
|
Add accumulators to AArch64 GEMV Kernels
|
2024-07-31 13:09:14 +01:00 |
gemv_t_sve.c
|
Add accumulators to AArch64 GEMV Kernels
|
2024-07-31 13:09:14 +01:00 |
iamax.S
|
ARM64: Convert all labels to local labels
|
2017-10-24 11:40:05 +00:00 |
iamax_thunderx2t99.c
|
Fixed a few more unnecessary calls to num_cpu_avail.
|
2018-06-11 10:17:16 +01:00 |
izamax.S
|
ARM64: Convert all labels to local labels
|
2017-10-24 11:40:05 +00:00 |
izamax_thunderx2t99.c
|
Fixed a few more unnecessary calls to num_cpu_avail.
|
2018-06-11 10:17:16 +01:00 |
nrm2.S
|
Fix accidental duplication of jump instruction
|
2019-10-08 08:09:26 +02:00 |
rot.S
|
ARM64: Convert all labels to local labels
|
2017-10-24 11:40:05 +00:00 |
sasum_thunderx2t99.c
|
Fixed a few more unnecessary calls to num_cpu_avail.
|
2018-06-11 10:17:16 +01:00 |
sbgemm_beta_neoversen2.c
|
neoverse n2 sbgemm: init file
|
2022-06-29 10:14:21 +08:00 |
sbgemm_kernel_8x4_neoversen2.c
|
Change file name to match the norm and delete useless code.
|
2022-10-28 17:09:39 +08:00 |
sbgemm_kernel_8x4_neoversen2_impl.c
|
Change file name to match the norm and delete useless code.
|
2022-10-28 17:09:39 +08:00 |
sbgemm_ncopy_4_neoversen2.c
|
Change file name to match the norm and delete useless code.
|
2022-10-28 17:09:39 +08:00 |
sbgemm_ncopy_8_neoversen2.c
|
bugfix for sbgemm_ncopy_8_neoversen2
|
2022-12-05 05:10:18 -05:00 |
sbgemm_tcopy_4_neoversen2.c
|
Add sbgemm_ncopy_8 and sbgemm_tcopy_4
|
2022-11-29 04:46:14 -05:00 |
sbgemm_tcopy_8_neoversen2.c
|
Improve the performance of sbgemm_tcopy on neoversen2
|
2022-11-28 04:17:54 -05:00 |
scal.S
|
make NAN handling depend on the dummy2 parameter
|
2024-07-17 23:24:19 +02:00 |
scnrm2_thunderx2t99.c
|
remove another early exit for incx < 0
|
2024-03-12 18:49:27 +01:00 |
sgemm_beta.S
|
Fix file permissions (issue 4095)
|
2023-07-23 20:37:07 +02:00 |
sgemm_kernel_4x4.S
|
ARM64: Convert all labels to local labels
|
2017-10-24 11:40:05 +00:00 |
sgemm_kernel_8x8.S
|
ARM64: Convert all labels to local labels
|
2017-10-24 11:40:05 +00:00 |
sgemm_kernel_8x8_cortexa53.S
|
fix INIT8x4
|
2020-06-10 01:01:16 +08:00 |
sgemm_kernel_16x4.S
|
ARM64: Convert all labels to local labels
|
2017-10-24 11:40:05 +00:00 |
sgemm_kernel_16x4_thunderx2t99.S
|
ARM64: Convert all labels to local labels
|
2017-10-24 11:40:05 +00:00 |
sgemm_kernel_sve_v1x8.S
|
add sgemm kernel and copy functions for sgemm and ssymm
|
2021-11-28 18:12:47 +01:00 |
sgemm_kernel_sve_v2x8.S
|
revert "move alpha out of register 18" (out of PR scope, no SVE on Apple hw)
|
2023-04-17 14:23:13 +02:00 |
sgemm_ncopy_4.S
|
change line endings from CRLF to LF
|
2022-11-16 22:24:01 +01:00 |
sgemm_ncopy_8.S
|
sgemm copy source init
|
2020-06-04 02:10:45 +08:00 |
sgemm_small_kernel_nn_sve.c
|
Better header guard around bridge
|
2024-07-20 14:39:57 +01:00 |
sgemm_small_kernel_nt_sve.c
|
Better header guard around bridge
|
2024-07-20 14:39:57 +01:00 |
sgemm_small_kernel_tn_sve.c
|
Better header guard around bridge
|
2024-07-20 14:39:57 +01:00 |
sgemm_small_kernel_tt_sve.c
|
Better header guard around bridge
|
2024-07-20 14:39:57 +01:00 |
sgemm_tcopy_8.S
|
sgemm copy source init
|
2020-06-04 02:10:45 +08:00 |
sgemm_tcopy_16.S
|
change line endings from CRLF to LF
|
2022-11-16 22:24:01 +01:00 |
strmm_kernel_4x4.S
|
ARM64: Convert all labels to local labels
|
2017-10-24 11:40:05 +00:00 |
strmm_kernel_8x8.S
|
ARM64: Convert all labels to local labels
|
2017-10-24 11:40:05 +00:00 |
strmm_kernel_8x8_cortexa53.S
|
use general register to speedup
|
2020-05-20 22:26:58 +08:00 |
strmm_kernel_16x4.S
|
Move temp to x21 to leave x18 unused (reserved on OSX)
|
2021-09-17 09:28:19 +02:00 |
strmm_kernel_sve_v1x8.S
|
strmm sve v1x8 kernel
|
2021-12-05 14:03:08 +01:00 |
sum.S
|
Add ARM64 implementations of ?sum
|
2019-03-30 22:13:36 +01:00 |
swap.S
|
ARM64: Convert all labels to local labels
|
2017-10-24 11:40:05 +00:00 |
swap_thunderx2t99.S
|
THUNDERX2T99: Add optimized S/D/C/Z SWAP Implementations
|
2017-02-03 03:55:06 -08:00 |
symm_lcopy_sve.c
|
Disambiguate whilelt
|
2023-07-25 20:15:44 +01:00 |
symm_ucopy_sve.c
|
Disambiguate whilelt
|
2023-07-25 20:15:44 +01:00 |
trmm_lncopy_sve_v1.c
|
Work around Clang failing to disambiguate SVE intrinsics and add AppleClang crossbuild to MacOS/arm64 DYNAMIC_ARCH in AzureCI (#4140)
|
2023-07-14 11:06:48 +02:00 |
trmm_ltcopy_sve_v1.c
|
Work around Clang failing to disambiguate SVE intrinsics and add AppleClang crossbuild to MacOS/arm64 DYNAMIC_ARCH in AzureCI (#4140)
|
2023-07-14 11:06:48 +02:00 |
trmm_uncopy_sve_v1.c
|
Work around Clang failing to disambiguate SVE intrinsics and add AppleClang crossbuild to MacOS/arm64 DYNAMIC_ARCH in AzureCI (#4140)
|
2023-07-14 11:06:48 +02:00 |
trmm_utcopy_sve_v1.c
|
Work around Clang failing to disambiguate SVE intrinsics and add AppleClang crossbuild to MacOS/arm64 DYNAMIC_ARCH in AzureCI (#4140)
|
2023-07-14 11:06:48 +02:00 |
trsm_kernel_LN_sve.c
|
add sve ztrsm
|
2022-01-15 22:27:25 +01:00 |
trsm_kernel_LT_sve.c
|
add sve ztrsm
|
2022-01-15 22:27:25 +01:00 |
trsm_kernel_RN_sve.c
|
add sve ztrsm
|
2022-01-15 22:27:25 +01:00 |
trsm_kernel_RT_sve.c
|
add sve ztrsm
|
2022-01-15 22:27:25 +01:00 |
trsm_lncopy_sve.c
|
Work around Clang failing to disambiguate SVE intrinsics and add AppleClang crossbuild to MacOS/arm64 DYNAMIC_ARCH in AzureCI (#4140)
|
2023-07-14 11:06:48 +02:00 |
trsm_ltcopy_sve.c
|
Disambiguate whilelt
|
2023-07-25 20:15:44 +01:00 |
trsm_uncopy_sve.c
|
Disambiguate whilelt
|
2023-07-25 20:15:44 +01:00 |
trsm_utcopy_sve.c
|
Disambiguate whilelt
|
2023-07-25 20:15:44 +01:00 |
zamax.S
|
Fix the functional bugs for zamax.
|
2020-03-09 15:36:50 +08:00 |
zasum.S
|
ARM64: Convert all labels to local labels
|
2017-10-24 11:40:05 +00:00 |
zasum_thunderx2t99.c
|
Fixed a few more unnecessary calls to num_cpu_avail.
|
2018-06-11 10:17:16 +01:00 |
zaxpy.S
|
ARM64: Convert all labels to local labels
|
2017-10-24 11:40:05 +00:00 |
zdot.S
|
ARM64: Convert all labels to local labels
|
2017-10-24 11:40:05 +00:00 |
zdot_thunderx2t99.c
|
Add a clobber list to fix utest errors seen with gcc13 on Apple M
|
2024-06-20 16:19:32 +02:00 |
zgemm_kernel_4x4.S
|
move alpha to x19/x20 to leave x18 unused for OSX
|
2021-09-17 09:42:17 +02:00 |
zgemm_kernel_4x4_cortexa53.c
|
MOD: add comments to a53 zgemm kernel
|
2021-11-25 22:48:48 +08:00 |
zgemm_kernel_4x4_thunderx2t99.S
|
ARM64: Convert all labels to local labels
|
2017-10-24 11:40:05 +00:00 |
zgemm_kernel_sve_v1x4.S
|
Use SVE zgemm/cgemm on Arm(R) Neoverse(TM) V1 core
|
2023-07-27 14:12:20 +01:00 |
zgemm_ncopy_sve_v1.c
|
Disambiguate whilelt
|
2023-07-25 20:15:44 +01:00 |
zgemm_tcopy_sve_v1.c
|
Disambiguate whilelt
|
2023-07-25 20:15:44 +01:00 |
zgemv_n.S
|
ARM64: Convert all labels to local labels
|
2017-10-24 11:40:05 +00:00 |
zgemv_t.S
|
ARM64: Convert all labels to local labels
|
2017-10-24 11:40:05 +00:00 |
zhemm_ltcopy_sve.c
|
Fix ZHEMM copy for SVE
|
2023-07-27 13:27:28 +01:00 |
zhemm_utcopy_sve.c
|
Fix ZHEMM copy for SVE
|
2023-07-27 13:27:28 +01:00 |
znrm2.S
|
Remove automatic label postfixes from macro included only once
|
2019-10-08 08:37:50 +02:00 |
zrot.S
|
ARM64: Convert all labels to local labels
|
2017-10-24 11:40:05 +00:00 |
zscal.S
|
Fix handling of NAN
|
2024-01-07 17:49:40 +01:00 |
zsum.S
|
Add ARM64 implementations of ?sum
|
2019-03-30 22:13:36 +01:00 |
zsum_thunderx2t99.c
|
add csum/zsum kernels (trivially derived from the asum ones)
|
2024-02-25 17:55:36 +01:00 |
zsymm_lcopy_sve.c
|
Disambiguate whilelt
|
2023-07-25 20:15:44 +01:00 |
zsymm_ucopy_sve.c
|
Disambiguate whilelt
|
2023-07-25 20:15:44 +01:00 |
ztrmm_kernel_4x4.S
|
Move alphaI to x22 to leave x18 unused (reserved on OSX)
|
2021-09-17 09:53:18 +02:00 |
ztrmm_kernel_sve_v1x4.S
|
fix sve ztrmm kernel
|
2022-01-04 14:42:07 +01:00 |
ztrmm_lncopy_sve_v1.c
|
Disambiguate whilelt
|
2023-07-25 20:15:44 +01:00 |
ztrmm_ltcopy_sve_v1.c
|
Disambiguate whilelt
|
2023-07-25 20:15:44 +01:00 |
ztrmm_uncopy_sve_v1.c
|
Disambiguate whilelt
|
2023-07-25 20:15:44 +01:00 |
ztrmm_utcopy_sve_v1.c
|
Disambiguate whilelt
|
2023-07-25 20:15:44 +01:00 |
ztrsm_lncopy_sve.c
|
Disambiguate whilelt
|
2023-07-25 20:15:44 +01:00 |
ztrsm_ltcopy_sve.c
|
Disambiguate whilelt
|
2023-07-25 20:15:44 +01:00 |
ztrsm_uncopy_sve.c
|
Disambiguate whilelt
|
2023-07-25 20:15:44 +01:00 |
ztrsm_utcopy_sve.c
|
Disambiguate whilelt
|
2023-07-25 20:15:44 +01:00 |