KERNEL
declare DGEMM_BETA in KERNEL.ARMV8 rather than the generic KERNEL
2019-12-20 10:11:50 +08:00
KERNEL.A64FX
add sve ztrsm
2022-01-15 22:27:25 +01:00
KERNEL.ARMV8
Add workaround for NVIDIA HPC mishandling of the asm DOT kernels
2021-01-12 16:38:51 +01:00
KERNEL.ARMV8SVE
update armv8sve + contributors
2022-01-18 08:28:31 +01:00
KERNEL.CORTEXA53
optimize cgemm on ARM cortex A53 & cortex A55
2021-12-12 17:22:52 +08:00
KERNEL.CORTEXA55
optimize cgemm on ARM cortex A53 & cortex A55
2021-12-12 17:22:52 +08:00
KERNEL.CORTEXA57
Add workaround for NVIDIA HPC mishandling of the asm DOT kernels
2021-01-12 16:39:35 +01:00
KERNEL.CORTEXA72
Simplifying ARMv8 build parameters
2018-11-19 16:41:49 +00:00
KERNEL.CORTEXA73
Simplifying ARMv8 build parameters
2018-11-19 16:41:49 +00:00
KERNEL.CORTEXA510
Add initial support for Phytium FT2000 series and ARMV9 Cortex 510/710/X1/X2
2022-03-27 15:29:20 +02:00
KERNEL.CORTEXA710
Add initial support for Phytium FT2000 series and ARMV9 Cortex 510/710/X1/X2
2022-03-27 15:29:20 +02:00
KERNEL.CORTEXX1
CortexX1 is ARMV8 like A7x
2022-03-28 17:28:29 +02:00
KERNEL.CORTEXX2
Add initial support for Phytium FT2000 series and ARMV9 Cortex 510/710/X1/X2
2022-03-27 15:29:20 +02:00
KERNEL.EMAG8180
Add preliminary support for EMAG8180
2020-02-19 19:00:28 +01:00
KERNEL.FALKOR
Simplifying ARMv8 build parameters
2018-11-19 16:41:49 +00:00
KERNEL.FT2000
Add initial support for Phytium FT2000 series and ARMV9 Cortex 510/710/X1/X2
2022-03-27 15:29:20 +02:00
KERNEL.NEOVERSEN1
Add SVE implementation for sdot/ddot
2022-12-01 12:07:50 +00:00
KERNEL.NEOVERSEN2
Add SVE implementation for sdot/ddot
2022-12-01 12:07:50 +00:00
KERNEL.NEOVERSEV1
Add SVE implementation for sdot/ddot
2022-12-01 12:07:50 +00:00
KERNEL.THUNDERX
Add workaround for NVIDIA HPC
2021-01-12 16:49:39 +01:00
KERNEL.THUNDERX2T99
Add SVE implementation for sdot/ddot
2022-12-01 12:07:50 +00:00
KERNEL.THUNDERX3T110
Add SVE implementation for sdot/ddot
2022-12-01 12:07:50 +00:00
KERNEL.TSV110
Add workaround for NVIDIA HPC
2021-01-12 16:51:35 +01:00
KERNEL.VORTEX
Use Neoverse's current mix of ThunderX2 kernels for Vortex as well
2021-10-06 11:06:43 +02:00
KERNEL.generic
Fix MSVC ARM64 build. Add generic kernel for ARM64
2022-06-02 16:53:54 +02:00
Makefile
added experimental support for ARMV8
2013-11-24 15:47:00 +01:00
amax.S
ARM64: Convert all labels to local labels
2017-10-24 11:40:05 +00:00
asum.S
ARM64: Convert all labels to local labels
2017-10-24 11:40:05 +00:00
axpy.S
ARM64: Convert all labels to local labels
2017-10-24 11:40:05 +00:00
casum.S
ARM64: Convert all labels to local labels
2017-10-24 11:40:05 +00:00
casum_thunderx2t99.c
Fixed a few more unnecessary calls to num_cpu_avail.
2018-06-11 10:17:16 +01:00
cgemm_kernel_4x4.S
ARM64: Convert all labels to local labels
2017-10-24 11:40:05 +00:00
cgemm_kernel_8x4.S
ARM64: Convert all labels to local labels
2017-10-24 11:40:05 +00:00
cgemm_kernel_8x4_cortexa53.c
optimize cgemm on ARM cortex A53 & cortex A55
2021-12-12 17:22:52 +08:00
cgemm_kernel_8x4_thunderx2t99.S
ARM64: Convert all labels to local labels
2017-10-24 11:40:05 +00:00
cgemm_kernel_sve_v1x4.S
add cgemm ctrmm sve kernels
2022-01-05 09:09:18 +01:00
cgemm_ncopy_sve_v1.c
sve copy functions for cgemm chemm zsymm
2022-01-05 09:12:22 +01:00
cgemm_tcopy_sve_v1.c
sve copy functions for cgemm chemm zsymm
2022-01-05 09:12:22 +01:00
copy.S
ARM64: Convert all labels to local labels
2017-10-24 11:40:05 +00:00
copy_thunderx2t99.c
Fixed a few more unnecessary calls to num_cpu_avail.
2018-06-11 10:17:16 +01:00
csum.S
Add ARM64 implementations of ?sum
2019-03-30 22:13:36 +01:00
ctrmm_kernel_4x4.S
ARM64: Convert all labels to local labels
2017-10-24 11:40:05 +00:00
ctrmm_kernel_8x4.S
ARM64: Convert all labels to local labels
2017-10-24 11:40:05 +00:00
ctrmm_kernel_sve_v1x4.S
add cgemm ctrmm sve kernels
2022-01-05 09:09:18 +01:00
dasum_thunderx2t99.c
Fixed a few more unnecessary calls to num_cpu_avail.
2018-06-11 10:17:16 +01:00
daxpy_thunderx.c
aarch64 fix std=c18 compilation
2020-10-03 18:00:34 +03:00
daxpy_thunderx2t99.S
ARM64: Improve DAXPY for ThunderX2
2020-05-07 09:22:50 -07:00
ddot_thunderx.c
ARM64: Rename kernel files to have consistent naming
2017-01-24 14:53:34 +05:30
dgemm_beta.S
Fix zero initialization for beta=0 case
2020-03-31 00:21:02 +02:00
dgemm_kernel_4x4.S
ARM64: Convert all labels to local labels
2017-10-24 11:40:05 +00:00
dgemm_kernel_4x4_cortexa53.c
MOD: optimize normal DGEMM on ARMV8 cortex-A53 & cortex-A55
2021-11-18 21:14:43 +08:00
dgemm_kernel_4x8.S
ARM64: Convert all labels to local labels
2017-10-24 11:40:05 +00:00
dgemm_kernel_8x4.S
ARM64: Convert all labels to local labels
2017-10-24 11:40:05 +00:00
dgemm_kernel_8x4_thunderx2t99.S
ARM64: Move parameters from parameter.c to param.h
2018-10-22 01:45:51 -07:00
dgemm_kernel_sve_v1x8.S
some clean-up & commentary
2021-11-21 14:56:27 +01:00
dgemm_kernel_sve_v2x8.S
Remove prefetches from SVE kernels
2022-12-16 14:43:09 +00:00
dgemm_ncopy_4.S
ARM64: Convert all labels to local labels
2017-10-24 11:40:05 +00:00
dgemm_ncopy_8.S
ARM64: Convert all labels to local labels
2017-10-24 11:40:05 +00:00
dgemm_ncopy_sve_v1.c
some clean-up & commentary
2021-11-21 14:56:27 +01:00
dgemm_tcopy_4.S
ARM64: Convert all labels to local labels
2017-10-24 11:40:05 +00:00
dgemm_tcopy_8.S
Remove unused TEMP2 and reshuffle to leave x18 unused (reserved on OSX)
2021-09-17 09:18:25 +02:00
dgemm_tcopy_sve_v1.c
some clean-up & commentary
2021-11-21 14:56:27 +01:00
dot.S
ARM64: Fix utest dsdot errors
2018-02-27 10:47:55 +00:00
dot.c
Wrap SVE header with __has_include check
2022-12-01 12:07:55 +00:00
dot_kernel_asimd.c
Add SVE implementation for sdot/ddot
2022-12-01 12:07:50 +00:00
dot_kernel_sve.c
Add SVE implementation for sdot/ddot
2022-12-01 12:07:50 +00:00
dot_thunderx.c
ARM64: Rename kernel files to have consistent naming
2017-01-24 14:53:34 +05:30
dtrmm_kernel_4x4.S
ARM64: Convert all labels to local labels
2017-10-24 11:40:05 +00:00
dtrmm_kernel_4x8.S
ARM64: Convert all labels to local labels
2017-10-24 11:40:05 +00:00
dtrmm_kernel_8x4.S
Move temp to x21 to leave x18 unused (reserved on OSX)
2021-09-17 09:24:11 +02:00
dtrmm_kernel_sve_v1x8.S
some clean-up & commentary
2021-11-21 14:56:27 +01:00
dznrm2_thunderx2t99.c
workaround fault with ssq=inf,scale=0
2022-07-02 23:47:17 +02:00
dznrm2_thunderx2t99_fast.c
Fixed a few more unnecessary calls to num_cpu_avail.
2018-06-11 10:17:16 +01:00
gemv_n.S
ARM64: Convert all labels to local labels
2017-10-24 11:40:05 +00:00
gemv_t.S
ARM64: Convert all labels to local labels
2017-10-24 11:40:05 +00:00
iamax.S
ARM64: Convert all labels to local labels
2017-10-24 11:40:05 +00:00
iamax_thunderx2t99.c
Fixed a few more unnecessary calls to num_cpu_avail.
2018-06-11 10:17:16 +01:00
izamax.S
ARM64: Convert all labels to local labels
2017-10-24 11:40:05 +00:00
izamax_thunderx2t99.c
Fixed a few more unnecessary calls to num_cpu_avail.
2018-06-11 10:17:16 +01:00
nrm2.S
Fix accidental duplication of jump instruction
2019-10-08 08:09:26 +02:00
rot.S
ARM64: Convert all labels to local labels
2017-10-24 11:40:05 +00:00
sasum_thunderx2t99.c
Fixed a few more unnecessary calls to num_cpu_avail.
2018-06-11 10:17:16 +01:00
sbgemm_beta_neoversen2.c
neoverse n2 sbgemm: init file
2022-06-29 10:14:21 +08:00
sbgemm_kernel_8x4_neoversen2.c
Change file name to match the norm and delete useless code.
2022-10-28 17:09:39 +08:00
sbgemm_kernel_8x4_neoversen2_impl.c
Change file name to match the norm and delete useless code.
2022-10-28 17:09:39 +08:00
sbgemm_ncopy_4_neoversen2.c
Change file name to match the norm and delete useless code.
2022-10-28 17:09:39 +08:00
sbgemm_tcopy_8_neoversen2.c
Change file name to match the norm and delete useless code.
2022-10-28 17:09:39 +08:00
scal.S
ARM64: Convert all labels to local labels
2017-10-24 11:40:05 +00:00
scnrm2_thunderx2t99.c
Fixed a few more unnecessary calls to num_cpu_avail.
2018-06-11 10:17:16 +01:00
sgemm_beta.S
fix initialization to zero in the NEON SGEMM_BETA kernel as well
2020-03-31 16:53:56 +02:00
sgemm_kernel_4x4.S
ARM64: Convert all labels to local labels
2017-10-24 11:40:05 +00:00
sgemm_kernel_8x8.S
ARM64: Convert all labels to local labels
2017-10-24 11:40:05 +00:00
sgemm_kernel_8x8_cortexa53.S
fix INIT8x4
2020-06-10 01:01:16 +08:00
sgemm_kernel_16x4.S
ARM64: Convert all labels to local labels
2017-10-24 11:40:05 +00:00
sgemm_kernel_16x4_thunderx2t99.S
ARM64: Convert all labels to local labels
2017-10-24 11:40:05 +00:00
sgemm_kernel_sve_v1x8.S
add sgemm kernel and copy functions for sgemm and ssymm
2021-11-28 18:12:47 +01:00
sgemm_kernel_sve_v2x8.S
Remove prefetches from SVE kernels
2022-12-16 14:43:09 +00:00
sgemm_ncopy_4.S
change line endings from CRLF to LF
2022-11-16 22:24:01 +01:00
sgemm_ncopy_8.S
sgemm copy source init
2020-06-04 02:10:45 +08:00
sgemm_ncopy_sve_v1.c
add sgemm kernel and copy functions for sgemm and ssymm
2021-11-28 18:12:47 +01:00
sgemm_tcopy_8.S
sgemm copy source init
2020-06-04 02:10:45 +08:00
sgemm_tcopy_16.S
change line endings from CRLF to LF
2022-11-16 22:24:01 +01:00
sgemm_tcopy_sve_v1.c
add sgemm kernel and copy functions for sgemm and ssymm
2021-11-28 18:12:47 +01:00
strmm_kernel_4x4.S
ARM64: Convert all labels to local labels
2017-10-24 11:40:05 +00:00
strmm_kernel_8x8.S
ARM64: Convert all labels to local labels
2017-10-24 11:40:05 +00:00
strmm_kernel_8x8_cortexa53.S
use general register to speedup
2020-05-20 22:26:58 +08:00
strmm_kernel_16x4.S
Move temp to x21 to leave x18 unused (reserved on OSX)
2021-09-17 09:28:19 +02:00
strmm_kernel_sve_v1x8.S
strmm sve v1x8 kernel
2021-12-05 14:03:08 +01:00
sum.S
Add ARM64 implementations of ?sum
2019-03-30 22:13:36 +01:00
swap.S
ARM64: Convert all labels to local labels
2017-10-24 11:40:05 +00:00
swap_thunderx2t99.S
THUNDERX2T99: Add optimized S/D/C/Z SWAP Implementations
2017-02-03 03:55:06 -08:00
symm_lcopy_sve.c
add sgemm kernel and copy functions for sgemm and ssymm
2021-11-28 18:12:47 +01:00
symm_ucopy_sve.c
add sgemm kernel and copy functions for sgemm and ssymm
2021-11-28 18:12:47 +01:00
trmm_lncopy_sve_v1.c
trmm sve copy functions for single precision
2021-11-29 21:25:05 +01:00
trmm_ltcopy_sve_v1.c
trmm sve copy functions for single precision
2021-11-29 21:25:05 +01:00
trmm_uncopy_sve_v1.c
trmm sve copy functions for single precision
2021-11-29 21:25:05 +01:00
trmm_utcopy_sve_v1.c
trmm sve copy functions for single precision
2021-11-29 21:25:05 +01:00
trsm_kernel_LN_sve.c
add sve ztrsm
2022-01-15 22:27:25 +01:00
trsm_kernel_LT_sve.c
add sve ztrsm
2022-01-15 22:27:25 +01:00
trsm_kernel_RN_sve.c
add sve ztrsm
2022-01-15 22:27:25 +01:00
trsm_kernel_RT_sve.c
add sve ztrsm
2022-01-15 22:27:25 +01:00
trsm_lncopy_sve.c
add sve ztrsm
2022-01-15 22:27:25 +01:00
trsm_ltcopy_sve.c
add sve ztrsm
2022-01-15 22:27:25 +01:00
trsm_uncopy_sve.c
add sve ztrsm
2022-01-15 22:27:25 +01:00
trsm_utcopy_sve.c
add sve ztrsm
2022-01-15 22:27:25 +01:00
zamax.S
Fix the functional bugs for zamax.
2020-03-09 15:36:50 +08:00
zasum.S
ARM64: Convert all labels to local labels
2017-10-24 11:40:05 +00:00
zasum_thunderx2t99.c
Fixed a few more unnecessary calls to num_cpu_avail.
2018-06-11 10:17:16 +01:00
zaxpy.S
ARM64: Convert all labels to local labels
2017-10-24 11:40:05 +00:00
zdot.S
ARM64: Convert all labels to local labels
2017-10-24 11:40:05 +00:00
zdot_thunderx2t99.c
Eliminate uses of CREAL on left-hand side of assignments
2022-07-05 00:01:09 +02:00
zgemm_kernel_4x4.S
move alpha to x19/x20 to leave x18 unused for OSX
2021-09-17 09:42:17 +02:00
zgemm_kernel_4x4_cortexa53.c
MOD: add comments to a53 zgemm kernel
2021-11-25 22:48:48 +08:00
zgemm_kernel_4x4_thunderx2t99.S
ARM64: Convert all labels to local labels
2017-10-24 11:40:05 +00:00
zgemm_kernel_sve_v1x4.S
fix zgemm kernel
2021-12-29 11:42:04 +01:00
zgemm_ncopy_sve_v1.c
modify sve zgemmcopy kernels
2022-01-05 09:07:28 +01:00
zgemm_tcopy_sve_v1.c
modify sve zgemmcopy kernels
2022-01-05 09:07:28 +01:00
zgemv_n.S
ARM64: Convert all labels to local labels
2017-10-24 11:40:05 +00:00
zgemv_t.S
ARM64: Convert all labels to local labels
2017-10-24 11:40:05 +00:00
zhemm_ltcopy_sve.c
combine zchemm into single file
2022-01-05 14:42:37 +01:00
zhemm_utcopy_sve.c
combine zchemm into single file
2022-01-05 14:42:37 +01:00
znrm2.S
Remove automatic label postfixes from macro included only once
2019-10-08 08:37:50 +02:00
zrot.S
ARM64: Convert all labels to local labels
2017-10-24 11:40:05 +00:00
zscal.S
ARM64: Convert all labels to local labels
2017-10-24 11:40:05 +00:00
zsum.S
Add ARM64 implementations of ?sum
2019-03-30 22:13:36 +01:00
zsymm_lcopy_sve.c
sve copy functions for cgemm chemm zsymm
2022-01-05 09:12:22 +01:00
zsymm_ucopy_sve.c
sve copy functions for cgemm chemm zsymm
2022-01-05 09:12:22 +01:00
ztrmm_kernel_4x4.S
Move alphaI to x22 to leave x18 unused (reserved on OSX)
2021-09-17 09:53:18 +02:00
ztrmm_kernel_sve_v1x4.S
fix sve ztrmm kernel
2022-01-04 14:42:07 +01:00
ztrmm_lncopy_sve_v1.c
ztrmm sve copy functions
2022-01-04 14:40:59 +01:00
ztrmm_ltcopy_sve_v1.c
ztrmm sve copy functions
2022-01-04 14:40:59 +01:00
ztrmm_uncopy_sve_v1.c
ztrmm sve copy functions
2022-01-04 14:40:59 +01:00
ztrmm_utcopy_sve_v1.c
ztrmm sve copy functions
2022-01-04 14:40:59 +01:00
ztrsm_lncopy_sve.c
add sve ztrsm
2022-01-15 22:27:25 +01:00
ztrsm_ltcopy_sve.c
fix ztrsm lt/ut copy
2022-01-16 21:39:57 +01:00
ztrsm_uncopy_sve.c
add sve ztrsm
2022-01-15 22:27:25 +01:00
ztrsm_utcopy_sve.c
fix ztrsm lt/ut copy
2022-01-16 21:39:57 +01:00