OpenBLAS/kernel/arm64
Martin Kroeker e1b7123bbe
Merge pull request #2867 from Qiyu8/usimd-floatdot
Optimize the performance of dot by using universal intrinsics in X86/ARM
2020-10-10 12:10:25 +02:00
KERNEL declare DGEMM_BETA in KERNEL.ARMV8 rather than the generic KERNEL 2019-12-20 10:11:50 +08:00
KERNEL.ARMV8 Adapt ARM architect 2020-09-29 16:36:14 +08:00
KERNEL.CORTEXA53 Adapt ARM architect 2020-09-29 16:36:14 +08:00
KERNEL.CORTEXA57 Adapt ARM architect 2020-09-29 16:36:14 +08:00
KERNEL.CORTEXA72 Simplifying ARMv8 build parameters 2018-11-19 16:41:49 +00:00
KERNEL.CORTEXA73 Simplifying ARMv8 build parameters 2018-11-19 16:41:49 +00:00
KERNEL.EMAG8180 Add preliminary support for EMAG8180 2020-02-19 19:00:28 +01:00
KERNEL.FALKOR Simplifying ARMv8 build parameters 2018-11-19 16:41:49 +00:00
KERNEL.NEOVERSEN1 Add Neoverse-N1 core 2020-02-29 03:22:04 +00:00
KERNEL.THUNDERX
KERNEL.THUNDERX2T99
KERNEL.THUNDERX3T110 ARM64: Add THUNDERX3T110 Target 2020-07-26 23:32:24 -07:00
KERNEL.TSV110 update 2020-01-02 11:01:57 +08:00
KERNEL.VORTEX Rename KERNEL.SILICON to KERNEL.VORTEX 2020-09-03 08:44:20 +02:00
Makefile
amax.S
asum.S
axpy.S
casum.S
casum_thunderx2t99.c
cgemm_kernel_4x4.S
cgemm_kernel_8x4.S
cgemm_kernel_8x4_thunderx2t99.S
copy.S
copy_thunderx2t99.c
csum.S Add ARM64 implementations of ?sum 2019-03-30 22:13:36 +01:00
ctrmm_kernel_4x4.S
ctrmm_kernel_8x4.S
dasum_thunderx2t99.c
daxpy_thunderx.c aarch64 fix std=c18 compilation 2020-10-03 18:00:34 +03:00
daxpy_thunderx2t99.S ARM64: Improve DAXPY for ThunderX2 2020-05-07 09:22:50 -07:00
ddot_thunderx.c
dgemm_beta.S Fix zero initialization for beta=0 case 2020-03-31 00:21:02 +02:00
dgemm_kernel_4x4.S
dgemm_kernel_4x8.S
dgemm_kernel_8x4.S
dgemm_kernel_8x4_thunderx2t99.S
dgemm_ncopy_4.S
dgemm_ncopy_8.S
dgemm_tcopy_4.S
dgemm_tcopy_8.S
dot.S
dot_thunderx.c
dot_thunderx2t99.c
dtrmm_kernel_4x4.S
dtrmm_kernel_4x8.S
dtrmm_kernel_8x4.S
dznrm2_thunderx2t99.c
dznrm2_thunderx2t99_fast.c
gemv_n.S
gemv_t.S
iamax.S
iamax_thunderx2t99.c
izamax.S
izamax_thunderx2t99.c
nrm2.S Fix accidental duplication of jump instruction 2019-10-08 08:09:26 +02:00
rot.S
sasum_thunderx2t99.c
scal.S
scnrm2_thunderx2t99.c
sgemm_beta.S fix initialization to zero in the NEON SGEMM_BETA kernel as well 2020-03-31 16:53:56 +02:00
sgemm_kernel_4x4.S
sgemm_kernel_8x8.S
sgemm_kernel_8x8_cortexa53.S fix INIT8x4 2020-06-10 01:01:16 +08:00
sgemm_kernel_16x4.S
sgemm_kernel_16x4_thunderx2t99.S
sgemm_ncopy_4.S Use arm neon instructions to optimize ncopy operation 2019-12-31 17:06:35 +08:00
sgemm_ncopy_8.S sgemm copy source init 2020-06-04 02:10:45 +08:00
sgemm_tcopy_8.S sgemm copy source init 2020-06-04 02:10:45 +08:00
sgemm_tcopy_16.S [WIP] Use arm neon instructions to optimize tcopy operation 2019-12-31 10:21:23 +08:00
strmm_kernel_4x4.S
strmm_kernel_8x8.S
strmm_kernel_8x8_cortexa53.S use general register to speedup 2020-05-20 22:26:58 +08:00
strmm_kernel_16x4.S
sum.S Add ARM64 implementations of ?sum 2019-03-30 22:13:36 +01:00
swap.S
swap_thunderx2t99.S
zamax.S Fix the functional bugs for zamax. 2020-03-09 15:36:50 +08:00
zasum.S
zasum_thunderx2t99.c
zaxpy.S
zdot.S
zdot_thunderx2t99.c
zgemm_kernel_4x4.S
zgemm_kernel_4x4_thunderx2t99.S
zgemv_n.S
zgemv_t.S
znrm2.S Remove automatic label postfixes from macro included only once 2019-10-08 08:37:50 +02:00
zrot.S
zscal.S
zsum.S Add ARM64 implementations of ?sum 2019-03-30 22:13:36 +01:00
ztrmm_kernel_4x4.S