OpenBLAS/kernel/x86_64
Bart Oldeman c34e2cf380 Use _mm_set1_epi{32,64x} to init mask in x86-64 [cz]asum
for skylake kernels. This is the same method as used in [sd]asum.
_mm_set1_epi64x was commented out for zasum, but has the advantage
of avoiding possible undefined behaviour (using an uninitialized
variable), which NVHPC and icx optimize out. The new code works
fine with those compilers.

For GCC 12.3 the generated code is identical: no matter which method
you use, the compiler folds the mask into a compile-time
constant, so there is no performance benefit to using _mm_cmpeq_epi8,
since the corresponding instruction (VPCMPEQB) isn't actually
generated!
2023-11-19 21:28:35 +00:00
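
A minimal sketch of the mask initialisation the commit message above describes, not the actual kernel source. It builds the "clear the sign bit" mask directly with _mm_set1_epi32 / _mm_set1_epi64x, so no uninitialized variable is ever read. The helper names (abs_mask_ps, abs_mask_pd, abs_sum4_ps, abs_sum2_pd) are hypothetical, and the sketch uses 128-bit SSE2 only so that it builds on any x86-64 compiler without extra flags; the real skylakex microkernels work on wider registers.

```c
#include <emmintrin.h>  /* SSE2 intrinsics, baseline on x86-64 */

/* Hypothetical helper: mask with every bit set except the sign bit of
   each 32-bit lane. The value is a plain constant, so nothing here
   depends on the contents of an uninitialized register. */
static inline __m128 abs_mask_ps(void)
{
    return _mm_castsi128_ps(_mm_set1_epi32(0x7fffffff));
}

/* Hypothetical helper: the same idea for 64-bit (double) lanes. */
static inline __m128d abs_mask_pd(void)
{
    return _mm_castsi128_pd(_mm_set1_epi64x(0x7fffffffffffffffLL));
}

/* Sum of absolute values of four floats: AND with the mask clears the
   sign bits, then the four lanes are added in scalar code. */
float abs_sum4_ps(const float *x)
{
    float out[4];
    __m128 v = _mm_and_ps(_mm_loadu_ps(x), abs_mask_ps());
    _mm_storeu_ps(out, v);
    return out[0] + out[1] + out[2] + out[3];
}

/* Sum of absolute values of two doubles, using the 64-bit mask. */
double abs_sum2_pd(const double *x)
{
    double out[2];
    __m128d v = _mm_and_pd(_mm_loadu_pd(x), abs_mask_pd());
    _mm_storeu_pd(out, v);
    return out[0] + out[1];
}
```

Calling abs_sum4_ps on {-1, 2, -3, 4} or abs_sum2_pd on {-1.5, 2.5} exercises both masks. Whether a given compiler emits the constant as a memory load or materialises it some other way is up to its optimizer.
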
..
KERNEL Remove premature entry for DOMATCOPY_RT 2021-03-18 21:53:50 +01:00
KERNEL.ATOM Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
KERNEL.BARCELONA Bugfix for ztrmv 2016-03-07 09:39:34 +01:00
KERNEL.BOBCAT Fixed #395. Enable optimized cgemm for Sandybridge. Added optimized sdot kernel. 2014-06-29 10:34:51 +08:00
KERNEL.BULLDOZER Add trivially optimized dsdot based on sdot 2017-11-24 20:02:28 +01:00
KERNEL.COOPERLAKE sbgemm: cooperlake: change kernel size to 16x4 2021-09-07 21:30:45 +08:00
KERNEL.CORE2 Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
KERNEL.DUNNINGTON Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
KERNEL.EXCAVATOR Add trivially optimized dsdot based on sdot 2017-11-24 20:03:40 +01:00
KERNEL.HASWELL Add sscal.c + microkernels for Haswell, Zen, Skylake and newer. 2022-12-06 14:05:49 -05:00
KERNEL.NANO Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
KERNEL.NEHALEM Add trivially optimized dsdot based on sdot 2017-11-24 19:59:28 +01:00
KERNEL.OPTERON Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
KERNEL.OPTERON_SSE3 Fixed #395. Enable optimized cgemm for Sandybridge. Added optimized sdot kernel. 2014-06-29 10:34:51 +08:00
KERNEL.PENRYN Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
KERNEL.PILEDRIVER Add trivially optimized dsdot based on sdot 2017-11-24 20:04:29 +01:00
KERNEL.PRESCOTT fallback to zgemm_kernel_4x2_sse.S 2014-07-06 11:05:28 +02:00
KERNEL.SANDYBRIDGE Add trivially optimized dsdot based on sdot 2017-11-24 20:00:23 +01:00
KERNEL.SAPPHIRERAPIDS Compatible with older version of GNU make 2023-05-20 13:58:23 +08:00
KERNEL.SKYLAKEX Add [cz]scal microkernels for SKYLAKEX 2022-11-09 08:57:03 -05:00
KERNEL.STEAMROLLER Add trivially optimized dsdot based on sdot 2017-11-24 20:01:42 +01:00
KERNEL.ZEN Add sscal.c + microkernels for Haswell, Zen, Skylake and newer. 2022-12-06 14:05:49 -05:00
KERNEL.generic Add ?sum definitions for generic kernel 2019-03-31 13:55:49 +02:00
Makefile Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
amax.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
amax_atom.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
amax_sse.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
amax_sse2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
asum.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
asum_atom.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
asum_sse.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
asum_sse2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
axpy.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
axpy_atom.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
axpy_sse.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
axpy_sse2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
bf16_common_macros.h x86_64: BFLOAT16: fix build warning 2021-09-28 18:30:06 +08:00
bf16to.c Add bfloat16 based dot and conversion with single/double 2020-09-04 02:31:25 +08:00
builtin_stinit.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
cabs.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
casum.c Fix casum fallback kernel. 2023-11-17 23:53:56 +00:00
casum_microk_skylakex-2.c Use _mm_set1_epi{32,64x} to init mask in x86-64 [cz]asum 2023-11-19 21:28:35 +00:00
caxpy.c initial support for Sapphire Rapids platform 2021-10-12 01:30:40 -07:00
caxpy_microk_bulldozer-2.c x86_64: clobber all xmm registers after vzeroupper 2020-10-20 02:16:47 +00:00
caxpy_microk_haswell-2.c x86_64: clobber all xmm registers after vzeroupper 2020-10-20 02:16:47 +00:00
caxpy_microk_sandy-2.c x86_64: clobber all xmm registers after vzeroupper 2020-10-20 02:16:47 +00:00
caxpy_microk_steamroller-2.c x86_64: clobber all xmm registers after vzeroupper 2020-10-20 02:16:47 +00:00
cdot.c initial support for Sapphire Rapids platform 2021-10-12 01:30:40 -07:00
cdot_microk_bulldozer-2.c Fix declaration of input arguments in the x86_64 microkernels for DOT and AXPY (#1965) 2019-01-17 23:20:32 +01:00
cdot_microk_haswell-2.c Fix declaration of input arguments in the x86_64 microkernels for DOT and AXPY (#1965) 2019-01-17 23:20:32 +01:00
cdot_microk_sandy-2.c Fix declaration of input arguments in the x86_64 microkernels for DOT and AXPY (#1965) 2019-01-17 23:20:32 +01:00
cdot_microk_steamroller-2.c Fix declaration of input arguments in the x86_64 microkernels for DOT and AXPY (#1965) 2019-01-17 23:20:32 +01:00
cgemm3m_kernel_8x4_haswell.c Update cgemm3m_kernel_8x4_haswell.c 2019-12-27 18:23:29 +08:00
cgemm_kernel_4x2_bulldozer.S change line endings from CRLF to LF 2022-11-17 09:39:56 +01:00
cgemm_kernel_4x2_piledriver.S change line endings from CRLF to LF 2022-11-17 09:39:56 +01:00
cgemm_kernel_4x8_sandy.S Update organization info. 2014-11-25 15:28:58 +08:00
cgemm_kernel_8x2_haswell.S modification for clang compiler 2014-08-27 09:00:20 +02:00
cgemm_kernel_8x2_haswell.c Update cgemm_kernel_8x2_haswell.c 2020-02-27 22:26:15 +08:00
cgemm_kernel_8x2_sandy.S change line endings from CRLF to LF 2022-11-17 09:39:56 +01:00
cgemm_kernel_8x2_skylakex.c AVX512 CGEMM & ZGEMM kernels 2019-11-11 20:04:52 +08:00
cgemv_n.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
cgemv_n_4.c Disable gcc's tree-vectorizer pass on all operating systems 2023-04-20 23:24:52 +02:00
cgemv_n_microk_bulldozer-4.c Tag %1 and %2 as both input and output operands 2017-12-31 18:03:36 +01:00
cgemv_n_microk_haswell-4.c Tag %1 and %2 as both input and output 2017-12-29 23:56:41 +01:00
cgemv_t.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
cgemv_t_4.c Disable gcc's tree-vectorizer pass on all operating systems 2023-04-20 23:24:52 +02:00
cgemv_t_microk_bulldozer-4.c Tag %1 and %2 as both input and output operands 2017-12-31 18:03:36 +01:00
cgemv_t_microk_haswell-4.c Tag %1 and %2 as both input and output 2017-12-29 23:56:41 +01:00
copy.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
copy_sse.S Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
copy_sse2.S Convert aligned moves to unaligned 2020-04-13 14:58:52 +02:00
cscal.c Add [cz]scal microkernels for SKYLAKEX 2022-11-09 08:57:03 -05:00
cscal_microk_bulldozer-2.c Fix declaration of input arguments in the x86_64 SCAL microkernels (#1966) 2019-01-18 08:11:07 +01:00
cscal_microk_haswell-2.c Fix declaration of input arguments in the x86_64 SCAL microkernels (#1966) 2019-01-18 08:11:07 +01:00
cscal_microk_skylakex-2.c Enable use of AVX512 microkernels with NVIDIA HPC from version 22.3 2023-07-26 19:39:11 +02:00
cscal_microk_steamroller-2.c Fix declaration of input arguments in the x86_64 SCAL microkernels (#1966) 2019-01-18 08:11:07 +01:00
ctrsm_kernel_LN_bulldozer.c added optimized trsm_kernels 2016-01-05 13:05:05 +01:00
ctrsm_kernel_LT_bulldozer.c added optimized trsm_kernels 2016-01-05 13:05:05 +01:00
ctrsm_kernel_RN_bulldozer.c added optimized trsm_kernels 2016-01-05 13:05:05 +01:00
ctrsm_kernel_RT_bulldozer.c added optimized trsm_kernels 2016-01-05 13:05:05 +01:00
dasum.c Use SkylakeX ?ASUM microkernel for Cooperlake/Sapphirerapids as well 2023-11-04 22:10:06 +01:00
dasum_microk_haswell-2.c Enable use of AVX512 microkernels with NVIDIA HPC from version 22.3 2023-07-26 19:39:11 +02:00
dasum_microk_skylakex-2.c Enable use of AVX512 microkernels with NVIDIA HPC from version 22.3 2023-07-26 19:39:11 +02:00
daxpy.c initial support for Sapphire Rapids platform 2021-10-12 01:30:40 -07:00
daxpy_bulldozer.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
daxpy_microk_bulldozer-2.c Fix declaration of input arguments in the x86_64 microkernels for DOT and AXPY (#1965) 2019-01-17 23:20:32 +01:00
daxpy_microk_haswell-2.c x86_64: clobber all xmm registers after vzeroupper 2020-10-20 02:16:47 +00:00
daxpy_microk_nehalem-2.c Fix declaration of input arguments in the x86_64 microkernels for DOT and AXPY (#1965) 2019-01-17 23:20:32 +01:00
daxpy_microk_piledriver-2.c Fix declaration of input arguments in the x86_64 microkernels for DOT and AXPY (#1965) 2019-01-17 23:20:32 +01:00
daxpy_microk_sandy-2.c Fix declaration of input arguments in the x86_64 microkernels for DOT and AXPY (#1965) 2019-01-17 23:20:32 +01:00
daxpy_microk_skylakex-2.c Enable use of AVX512 microkernels with NVIDIA HPC from version 22.3 2023-07-26 19:39:11 +02:00
daxpy_microk_steamroller-2.c Fix declaration of input arguments in the x86_64 microkernels for DOT and AXPY (#1965) 2019-01-17 23:20:32 +01:00
dcopy_bulldozer.S added dcopy_bulldozer.S 2013-06-21 16:06:51 +02:00
ddot.c fix improper function prototypes (empty parentheses) 2023-09-30 12:56:38 +02:00
ddot_bulldozer.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
ddot_microk_bulldozer-2.c Fix declaration of input arguments in the x86_64 microkernels for DOT and AXPY (#1965) 2019-01-17 23:20:32 +01:00
ddot_microk_haswell-2.c x86_64: clobber all xmm registers after vzeroupper 2020-10-20 02:16:47 +00:00
ddot_microk_nehalem-2.c Fix declaration of input arguments in the x86_64 microkernels for DOT and AXPY (#1965) 2019-01-17 23:20:32 +01:00
ddot_microk_piledriver-2.c x86_64: clobber all xmm registers after vzeroupper 2020-10-20 02:16:47 +00:00
ddot_microk_sandy-2.c x86_64: clobber all xmm registers after vzeroupper 2020-10-20 02:16:47 +00:00
ddot_microk_skylakex-2.c Enable use of AVX512 microkernels with NVIDIA HPC from version 22.3 2023-07-26 19:39:11 +02:00
ddot_microk_steamroller-2.c x86_64: clobber all xmm registers after vzeroupper 2020-10-20 02:16:47 +00:00
dgemm_beta_skylakex.c Fix thinko in skylake beta handling 2018-12-24 18:49:50 +00:00
dgemm_kernel_4x4_haswell.S change line endings from CRLF to LF 2022-11-17 09:39:56 +01:00
dgemm_kernel_4x8_haswell.S change line endings from CRLF to LF 2022-11-17 09:39:56 +01:00
dgemm_kernel_4x8_sandy.S Change file comments to work around clang 3.9 assembler bug 2016-10-13 16:51:08 +02:00
dgemm_kernel_4x8_skylakex.c Use p2align instead of align for OSX compatibility 2018-12-03 13:06:43 +01:00
dgemm_kernel_4x8_skylakex_2.c change line endings from CRLF to LF 2022-11-17 09:39:56 +01:00
dgemm_kernel_6x4_piledriver.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
dgemm_kernel_8x2_bulldozer.S change line endings from CRLF to LF 2022-11-17 09:39:56 +01:00
dgemm_kernel_8x2_piledriver.S change line endings from CRLF to LF 2022-11-17 09:39:56 +01:00
dgemm_kernel_8x8_skylakex.c Update dgemm_kernel_8x8_skylakex.c 2019-10-18 15:00:17 +08:00
dgemm_kernel_16x2_haswell.S change line endings from CRLF to LF 2022-11-17 09:39:56 +01:00
dgemm_kernel_16x2_skylakex.S Use AVX512 also for DGEMM 2018-06-03 22:17:27 +00:00
dgemm_kernel_16x2_skylakex.c GEMM: skylake: improve the performance when m is small 2021-04-28 13:56:06 +00:00
dgemm_ncopy_2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
dgemm_ncopy_4.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
dgemm_ncopy_8.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
dgemm_ncopy_8_bulldozer.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
dgemm_ncopy_8_skylakex.c Fix warnings 2022-09-15 09:19:19 +02:00
dgemm_small_kernel_nn_skylakex.c Enable use of AVX512 microkernels with NVIDIA HPC from version 22.3 2023-07-26 19:39:11 +02:00
dgemm_small_kernel_nt_skylakex.c Small Matrix: use proper inline asm input constraint for AVX512 mask 2022-02-28 03:22:31 +00:00
dgemm_small_kernel_permit_skylakex.c Small Matrix: skylakex: add DGEMM_SMALL_M_PERMIT and tune for TN kernel 2021-08-02 07:06:54 +00:00
dgemm_small_kernel_tn_skylakex.c Enable use of AVX512 microkernels with NVIDIA HPC from version 22.3 2023-07-26 19:39:11 +02:00
dgemm_small_kernel_tt_skylakex.c Small Matrix: skylakex: fix build error in old compiler 2021-08-05 04:43:47 +00:00
dgemm_tcopy_2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
dgemm_tcopy_4.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
dgemm_tcopy_8.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
dgemm_tcopy_8_bulldozer.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
dgemm_tcopy_8_skylakex.c Add optimized *copy versions for skylakex 2018-10-06 13:51:44 +00:00
dgemm_tcopy_16_skylakex.c Fix build with -Werror=return-type 2020-10-21 08:43:39 +02:00
dgemv_n.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
dgemv_n_4.c initial support for Sapphire Rapids platform 2021-10-12 01:30:40 -07:00
dgemv_n_atom.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
dgemv_n_bulldozer.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
dgemv_n_microk_haswell-4.c x86_64: clobber all xmm registers after vzeroupper 2020-10-20 02:16:47 +00:00
dgemv_n_microk_nehalem-4.c Replace .align with .p2align in the Nehalem microkernels 2018-02-26 20:58:33 +01:00
dgemv_n_microk_piledriver-4.c x86_64: clobber all xmm registers after vzeroupper 2020-10-20 02:16:47 +00:00
dgemv_n_microk_skylakex-4.c Enable use of AVX512 microkernels with NVIDIA HPC from version 22.3 2023-07-26 19:39:11 +02:00
dgemv_t.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
dgemv_t_4.c initial support for Sapphire Rapids platform 2021-10-12 01:30:40 -07:00
dgemv_t_atom.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
dgemv_t_bulldozer.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
dgemv_t_microk_haswell-4.c x86_64: clobber all xmm registers after vzeroupper 2020-10-20 02:16:47 +00:00
dger.c optimized dger kernel for sandybridge 2015-04-28 16:58:11 +02:00
dger_microk_sandy-2.c Fix declaration of input arguments in the Sandybridge GER microkernels (#1967) 2019-01-18 08:11:39 +01:00
dot.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
dot_atom.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
dot_sse.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
dot_sse2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
drot.c fix improper function prototypes (empty parentheses) 2023-09-30 12:56:38 +02:00
drot_microk_haswell-2.c replace spurious avx512 requirement with fma check 2021-04-26 21:55:30 +02:00
drot_microk_skylakex-2.c Enable use of AVX512 microkernels with NVIDIA HPC from version 22.3 2023-07-26 19:39:11 +02:00
dscal.c initial support for Sapphire Rapids platform 2021-10-12 01:30:40 -07:00
dscal_microk_bulldozer-2.c Fix declaration of input arguments in the x86_64 SCAL microkernels (#1966) 2019-01-18 08:11:07 +01:00
dscal_microk_haswell-2.c dscal: use ymm registers in Haswell microkernel 2022-12-01 07:48:05 -05:00
dscal_microk_sandy-2.c Fix declaration of input arguments in the x86_64 SCAL microkernels (#1966) 2019-01-18 08:11:07 +01:00
dscal_microk_skylakex-2.c Enable use of AVX512 microkernels with NVIDIA HPC from version 22.3 2023-07-26 19:39:11 +02:00
dsymv_L.c initial support for Sapphire Rapids platform 2021-10-12 01:30:40 -07:00
dsymv_L_microk_bulldozer-2.c Fix declaration of arguments in inline assembly 2019-02-12 16:14:02 +01:00
dsymv_L_microk_haswell-2.c Fix declaration of arguments in inline assembly 2019-02-12 16:14:02 +01:00
dsymv_L_microk_nehalem-2.c Fix declaration of arguments in inline assembly 2019-02-12 16:14:02 +01:00
dsymv_L_microk_sandy-2.c Fix declaration of arguments in inline assembly 2019-02-12 16:14:02 +01:00
dsymv_L_microk_skylakex-2.c Enable use of AVX512 microkernels with NVIDIA HPC from version 22.3 2023-07-26 19:39:11 +02:00
dsymv_U.c initial support for Sapphire Rapids platform 2021-10-12 01:30:40 -07:00
dsymv_U_microk_bulldozer-2.c Fix declaration of assembly arguments in SSYMV and DSYMV microkernels 2019-02-12 16:00:18 +01:00
dsymv_U_microk_haswell-2.c Fix declaration of assembly arguments in SSYMV and DSYMV microkernels 2019-02-12 16:00:18 +01:00
dsymv_U_microk_nehalem-2.c Fix declaration of assembly arguments in SSYMV and DSYMV microkernels 2019-02-12 16:00:18 +01:00
dsymv_U_microk_sandy-2.c Fix declaration of assembly arguments in SSYMV and DSYMV microkernels 2019-02-12 16:00:18 +01:00
dtobf16_microk_cooperlake.c Enable use of AVX512 microkernels with NVIDIA HPC from version 22.3 2023-07-26 19:39:11 +02:00
dtrmm_kernel_4x8_haswell.c Replace vpermpd with vpermilpd in the Haswell DTRMM kernel 2019-07-28 23:17:28 +02:00
dtrsm_kernel_LN_bulldozer.c Remove unused variables from Haswell dtrmm and Bulldozer dtrsm 2017-11-14 23:35:10 +01:00
dtrsm_kernel_LT_8x2_bulldozer.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
dtrsm_kernel_RN_8x2_bulldozer.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
dtrsm_kernel_RN_haswell.c Replace most vpermpd calls in the Haswell DTRSM_RN kernel 2019-08-03 12:40:13 +02:00
dtrsm_kernel_RT_bulldozer.c Fix inline assembly constraints in Bulldozer TRSM kernels 2019-02-16 20:06:48 +01:00
gemm_beta.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
gemm_kernel_2x8_nehalem.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
gemm_kernel_4x2_atom.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
gemm_kernel_4x4_barcelona.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
gemm_kernel_4x4_core2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
gemm_kernel_4x4_penryn.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
gemm_kernel_4x4_sse2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
gemm_kernel_4x4_sse3.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
gemm_kernel_4x8_nano.S Fix crash in sgemm SSE/nano kernel on x86_64 2019-03-07 16:55:13 +01:00
gemm_kernel_4x8_nehalem.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
gemm_kernel_8x4_barcelona.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
gemm_kernel_8x4_core2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
gemm_kernel_8x4_penryn.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
gemm_kernel_8x4_sse.S Fix crash in sgemm SSE/nano kernel on x86_64 2019-03-07 16:55:13 +01:00
gemm_kernel_8x4_sse3.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
gemm_ncopy_2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
gemm_ncopy_2_bulldozer.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
gemm_ncopy_4.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
gemm_ncopy_4_opteron.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
gemm_tcopy_2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
gemm_tcopy_2_bulldozer.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
gemm_tcopy_4.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
gemm_tcopy_4_opteron.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
iamax.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
iamax_sse.S Silence a redefinition warning 2020-10-15 19:08:12 +02:00
iamax_sse2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
izamax.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
izamax_sse.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
izamax_sse2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
lsame.S Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
mcount.S Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
nrm2.S Allow negative INCX (API change from version 3.10 of the reference implementation) 2023-08-10 17:01:50 +02:00
nrm2_sse.S Allow negative INCX (API change from version 3.10 of the reference implementation) 2023-08-10 17:01:50 +02:00
omatcopy_rt.c Fix warnings 2022-09-15 09:19:19 +02:00
qconjg.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
qdot.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
qgemm_kernel_2x2.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
qgemv_n.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
qgemv_t.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
qtrsm_kernel_LN_2x2.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
qtrsm_kernel_LT_2x2.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
qtrsm_kernel_RT_2x2.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
rot.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
rot_sse.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
rot_sse2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
sasum.c Use SkylakeX ?ASUM microkernel for Cooperlake/Sapphirerapids as well 2023-11-04 22:10:06 +01:00
sasum_microk_haswell-2.c Enable use of AVX512 microkernels with NVIDIA HPC from version 22.3 2023-07-26 19:39:11 +02:00
sasum_microk_skylakex-2.c Enable use of AVX512 microkernels with NVIDIA HPC from version 22.3 2023-07-26 19:39:11 +02:00
saxpy.c initial support for Sapphire Rapids platform 2021-10-12 01:30:40 -07:00
saxpy_microk_haswell-2.c x86_64: clobber all xmm registers after vzeroupper 2020-10-20 02:16:47 +00:00
saxpy_microk_nehalem-2.c Fix declaration of input arguments in the x86_64 microkernels for DOT and AXPY (#1965) 2019-01-17 23:20:32 +01:00
saxpy_microk_piledriver-2.c x86_64: clobber all xmm registers after vzeroupper 2020-10-20 02:16:47 +00:00
saxpy_microk_sandy-2.c Fix declaration of input arguments in the x86_64 microkernels for DOT and AXPY (#1965) 2019-01-17 23:20:32 +01:00
saxpy_microk_skylakex-2.c Enable use of AVX512 microkernels with NVIDIA HPC from version 22.3 2023-07-26 19:39:11 +02:00
sbdot.c initial support for Sapphire Rapids platform 2021-10-12 01:30:40 -07:00
sbdot_microk_cooperlake.c Enable use of AVX512 microkernels with NVIDIA HPC from version 22.3 2023-07-26 19:39:11 +02:00
sbgemm_block_microk_cooperlake.c x86_64: BFLOAT16: fix build warning 2021-09-28 18:30:06 +08:00
sbgemm_kernel_16x4_cooperlake.c Prevent compiler attempts to use k0 as mask register 2022-02-23 20:12:20 +01:00
sbgemm_kernel_16x16_spr.c sbgemm: spr: kernel handle alpha != 1.0 2021-10-17 19:08:03 -07:00
sbgemm_kernel_16x16_spr_tmpl.c Fix spr sbgemm error 2023-05-19 10:48:18 +08:00
sbgemm_microk_cooperlake_template.c really fix definition of SHUFFLE_MAGIC_NO 2022-02-25 15:36:02 +01:00
sbgemm_ncopy_4_cooperlake.c sbgemm: cooperlake: kernel works for NN 2021-09-07 21:30:45 +08:00
sbgemm_ncopy_16_cooperlake.c Fix non-portable u_int64_t 2022-02-23 20:10:59 +01:00
sbgemm_oncopy_16_spr.c sbgemm: spr: oncopy: use tile load/store instead 2021-10-17 19:08:03 -07:00
sbgemm_otcopy_16_spr.c sbgemm: spr: implement otcopy_16 2021-10-17 19:08:03 -07:00
sbgemm_small_kernel_nn_cooperlake.c sbgemm: cooperlake: enable SBGEMM by small matrix path 2021-08-30 17:40:30 +08:00
sbgemm_small_kernel_nt_cooperlake.c sbgemm: cooperlake: enable SBGEMM by small matrix path 2021-08-30 17:40:30 +08:00
sbgemm_small_kernel_permit_cooperlake.c sbgemm: cooperlake: tuning for small matrix 2021-09-07 21:30:46 +08:00
sbgemm_small_kernel_permit_spr.c sbgemm: spr: disable small matrix path by default 2021-10-17 19:08:03 -07:00
sbgemm_small_kernel_template_cooperlake.c sbgemm: cooperlake: make sure hot buffer aligned to 64 2021-08-30 17:40:30 +08:00
sbgemm_small_kernel_tn_cooperlake.c sbgemm: cooperlake: enable SBGEMM by small matrix path 2021-08-30 17:40:30 +08:00
sbgemm_small_kernel_tt_cooperlake.c sbgemm: cooperlake: enable SBGEMM by small matrix path 2021-08-30 17:40:30 +08:00
sbgemm_tcopy_4_cooperlake.c sbgemm: cooperlake: add n24 kernel for tcopy_4 2021-09-07 21:30:46 +08:00
sbgemm_tcopy_16_cooperlake.c sbgemm: cooperlake: implement tcopy_4 2021-09-07 21:30:46 +08:00
sbgemv_n.c initial support for Sapphire Rapids platform 2021-10-12 01:30:40 -07:00
sbgemv_n_microk_cooperlake.c Enable use of AVX512 microkernels with NVIDIA HPC from version 22.3 2023-07-26 19:39:11 +02:00
sbgemv_n_microk_cooperlake_template.c x86_64: BFLOAT16: fix build warning 2021-09-28 18:30:06 +08:00
sbgemv_t.c initial support for Sapphire Rapids platform 2021-10-12 01:30:40 -07:00
sbgemv_t_microk_cooperlake.c Enable use of AVX512 microkernels with NVIDIA HPC from version 22.3 2023-07-26 19:39:11 +02:00
sbgemv_t_microk_cooperlake_template.c x86_64: BFLOAT16: fix build warning 2021-09-28 18:30:06 +08:00
scal.S Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
scal_atom.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
scal_sse.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
scal_sse2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
sdot.c initial support for Sapphire Rapids platform 2021-10-12 01:30:40 -07:00
sdot_microk_bulldozer-2.c Fix declaration of input arguments in the x86_64 microkernels for DOT and AXPY (#1965) 2019-01-17 23:20:32 +01:00
sdot_microk_haswell-2.c x86_64: clobber all xmm registers after vzeroupper 2020-10-20 02:16:47 +00:00
sdot_microk_nehalem-2.c Fix declaration of input arguments in the x86_64 microkernels for DOT and AXPY (#1965) 2019-01-17 23:20:32 +01:00
sdot_microk_sandy-2.c x86_64: clobber all xmm registers after vzeroupper 2020-10-20 02:16:47 +00:00
sdot_microk_skylakex-2.c Enable use of AVX512 microkernels with NVIDIA HPC from version 22.3 2023-07-26 19:39:11 +02:00
sdot_microk_steamroller-2.c Fix declaration of input arguments in the x86_64 microkernels for DOT and AXPY (#1965) 2019-01-17 23:20:32 +01:00
sgemm_beta_skylakex.c sbgemm: cooperlake: add dummy source files 2021-09-07 21:30:45 +08:00
sgemm_direct_performant.c [WIP] Refactor the driver code for direct SGEMM (#2782) 2020-08-19 14:51:09 +02:00
sgemm_direct_skylakex.c initial support for Sapphire Rapids platform 2021-10-12 01:30:40 -07:00
sgemm_kernel_8x4_bulldozer.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
sgemm_kernel_8x4_haswell.c Update sgemm_kernel_8x4_haswell.c 2020-02-06 01:47:46 +00:00
sgemm_kernel_8x4_haswell_2.c Strip UTF8 byte order marker from source 2020-06-26 09:00:43 +02:00
sgemm_kernel_8x8_sandy.S Update organization info. 2014-11-25 15:28:58 +08:00
sgemm_kernel_16x2_bulldozer.S change line endings from CRLF to LF 2022-11-17 09:39:56 +01:00
sgemm_kernel_16x2_piledriver.S change line endings from CRLF to LF 2022-11-17 09:39:56 +01:00
sgemm_kernel_16x4_haswell.S change line endings from CRLF to LF 2022-11-16 22:24:01 +01:00
sgemm_kernel_16x4_sandy.S change line endings from CRLF to LF 2022-11-17 09:39:56 +01:00
sgemm_kernel_16x4_skylakex.S Use AVX512 also for DGEMM 2018-06-03 22:17:27 +00:00
sgemm_kernel_16x4_skylakex.c make skylakex sgemm code more friendly for readers 2020-01-13 16:28:41 +08:00
sgemm_kernel_16x4_skylakex_2.c AVX512 STRMM kernel 2020-02-16 22:58:00 +08:00
sgemm_kernel_16x4_skylakex_3.c Use "old" compute(24) function with clang due to register limitations 2021-04-06 19:58:32 +02:00
sgemm_ncopy_4_skylakex.c Use sgemm_ncopy_4_skylakex.c also for Haswell 2018-12-15 13:49:19 +00:00
sgemm_small_kernel_nn_skylakex.c Enable use of AVX512 microkernels with NVIDIA HPC from version 22.3 2023-07-26 19:39:11 +02:00
sgemm_small_kernel_nt_skylakex.c Small Matrix: use proper inline asm input constraint for AVX512 mask 2022-02-28 03:22:31 +00:00
sgemm_small_kernel_permit_skylakex.c Small Matrix: skylakex: add sgemm tt kernel 2021-08-02 07:06:54 +00:00
sgemm_small_kernel_tn_skylakex.c Enable use of AVX512 microkernels with NVIDIA HPC from version 22.3 2023-07-26 19:39:11 +02:00
sgemm_small_kernel_tt_skylakex.c Small Matrix: skylakex: fix build error in old compiler 2021-08-05 04:43:47 +00:00
sgemm_tcopy_16_skylakex.c Add a C+intrinsics version of the SGEMM/skylakex kernel 2018-10-10 01:49:22 +00:00
sgemv_n.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
sgemv_n.c removed obsolete gemv kernel files 2014-09-14 11:00:53 +02:00
sgemv_n_4.c Disable gcc's tree-vectorizer pass on all operating systems 2023-04-19 23:42:09 +02:00
sgemv_n_microk_bulldozer-4.c Fix inline assembly constraints 2019-02-16 18:46:17 +01:00
sgemv_n_microk_haswell-4.c x86_64: clobber all xmm registers after vzeroupper 2020-10-20 02:16:47 +00:00
sgemv_n_microk_nehalem-4.c Fix inline assembly constraints 2019-02-16 18:24:11 +01:00
sgemv_n_microk_sandy-4.c Fix inline assembly constraints 2019-02-16 18:36:39 +01:00
sgemv_n_microk_skylakex-8.c Enable use of AVX512 microkernels with NVIDIA HPC from version 22.3 2023-07-26 19:39:11 +02:00
sgemv_t.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
sgemv_t.c removed obsolete gemv kernel files 2014-09-14 11:00:53 +02:00
sgemv_t_4.c Disable gcc's tree-vectorizer pass on all operating systems 2023-04-19 23:42:55 +02:00
sgemv_t_microk_bulldozer-4.c Tag %1 and %2 as both input and output operands 2017-12-31 18:03:36 +01:00
sgemv_t_microk_haswell-4.c x86_64: clobber all xmm registers after vzeroupper 2020-10-20 02:16:47 +00:00
sgemv_t_microk_nehalem-4.c Replace .align with .p2align in the Nehalem microkernels 2018-02-26 20:58:33 +01:00
sgemv_t_microk_sandy-4.c Use .p2align instead of .align for compatibility on Sandybridge as well 2018-02-24 19:43:15 +01:00
sgemv_t_microk_skylakex.c Enable use of AVX512 microkernels with NVIDIA HPC from version 22.3 2023-07-26 19:39:11 +02:00
sgemv_t_microk_skylakex_template.c sgemv: skylakex: fix build warning 2021-08-25 07:13:00 +00:00
sger.c added optimized sger kernel for sandybridge 2015-04-28 15:33:38 +02:00
sger_microk_sandy-2.c Fix declaration of input arguments in the Sandybridge GER microkernels (#1967) 2019-01-18 08:11:39 +01:00
srot.c fix improper function prototypes (empty parentheses) 2023-09-30 12:56:38 +02:00
srot_microk_haswell-2.c Remove spurious AVX512 requirement and add AVX2/FMA3 guard 2021-03-06 14:35:49 +01:00
srot_microk_skylakex-2.c Enable use of AVX512 microkernels with NVIDIA HPC from version 22.3 2023-07-26 19:39:11 +02:00
sscal.c Add sscal.c + microkernels for Haswell, Zen, Skylake and newer. 2022-12-06 14:05:49 -05:00
sscal_microk_haswell-2.c Fix typo in clobber list, should be xmm14 instead of ymm14. 2022-12-06 16:30:46 -05:00
sscal_microk_skylakex-2.c Enable use of AVX512 microkernels with NVIDIA HPC from version 22.3 2023-07-26 19:39:11 +02:00
ssymv_L.c Disable gcc's tree-vectorizer pass on all operating systems 2023-04-19 23:43:43 +02:00
ssymv_L_microk_bulldozer-2.c Fix declaration of arguments in inline assembly 2019-02-12 16:14:02 +01:00
ssymv_L_microk_haswell-2.c Fix declaration of arguments in inline assembly 2019-02-12 16:14:02 +01:00
ssymv_L_microk_nehalem-2.c Fix declaration of arguments in inline assembly 2019-02-12 16:14:02 +01:00
ssymv_L_microk_sandy-2.c Fix declaration of arguments in inline assembly 2019-02-12 16:14:02 +01:00
ssymv_U.c Disable gcc's tree-vectorizer pass on all operating systems 2023-04-19 23:44:15 +02:00
ssymv_U_microk_bulldozer-2.c Fix declaration of assembly arguments in SSYMV and DSYMV microkernels 2019-02-12 16:00:18 +01:00
ssymv_U_microk_haswell-2.c Fix declaration of assembly arguments in SSYMV and DSYMV microkernels 2019-02-12 16:00:18 +01:00
ssymv_U_microk_nehalem-2.c Fix declaration of assembly arguments in SSYMV and DSYMV microkernels 2019-02-12 16:00:18 +01:00
ssymv_U_microk_sandy-2.c Fix declaration of assembly arguments in SSYMV and DSYMV microkernels 2019-02-12 16:00:18 +01:00
staticbuffer.S Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
stobf16_microk_cooperlake.c Enable use of AVX512 microkernels with NVIDIA HPC from version 22.3 2023-07-26 19:39:11 +02:00
strsm_kernel_8x4_haswell_LN.c Strip UTF8 byte order marker from source 2020-06-26 09:00:43 +02:00
strsm_kernel_8x4_haswell_LT.c AVX2 STRSM kernel 2020-03-17 00:34:08 +08:00
strsm_kernel_8x4_haswell_L_common.h Strip UTF8 byte order marker from source 2020-06-26 09:00:43 +02:00
strsm_kernel_8x4_haswell_RN.c change line endings from CRLF to LF 2022-11-17 09:39:56 +01:00
strsm_kernel_8x4_haswell_RT.c change line endings from CRLF to LF 2022-11-17 09:39:56 +01:00
strsm_kernel_8x4_haswell_R_common.h change line endings from CRLF to LF 2022-11-16 22:24:01 +01:00
strsm_kernel_LN_bulldozer.c Fix inline assembly constraints in Bulldozer TRSM kernels 2019-02-16 20:06:48 +01:00
strsm_kernel_LT_bulldozer.c Fix inline assembly constraints in Bulldozer TRSM kernels 2019-02-16 20:06:48 +01:00
strsm_kernel_RN_bulldozer.c Fix inline assembly constraints in Bulldozer TRSM kernels 2019-02-16 20:06:48 +01:00
strsm_kernel_RT_bulldozer.c Fix inline assembly constraints in Bulldozer TRSM kernels 2019-02-16 20:06:48 +01:00
sum.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
swap.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
swap_sse.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
swap_sse2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
symv_L_sse.S initial support for Sapphire Rapids platform 2021-10-12 01:30:40 -07:00
symv_L_sse2.S initial support for Sapphire Rapids platform 2021-10-12 01:30:40 -07:00
symv_U_sse.S initial support for Sapphire Rapids platform 2021-10-12 01:30:40 -07:00
symv_U_sse2.S initial support for Sapphire Rapids platform 2021-10-12 01:30:40 -07:00
tobf16.c initial support for Sapphire Rapids platform 2021-10-12 01:30:40 -07:00
trsm_kernel_LN_2x8_nehalem.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_LN_4x2_atom.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_LN_4x4_barcelona.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_LN_4x4_core2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_LN_4x4_penryn.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_LN_4x4_sse2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_LN_4x4_sse3.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_LN_4x8_nehalem.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_LN_8x4_sse.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_LT_2x8_nehalem.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_LT_4x2_atom.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_LT_4x4_barcelona.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_LT_4x4_core2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_LT_4x4_penryn.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_LT_4x4_sse2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_LT_4x4_sse3.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_LT_4x8_nehalem.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_LT_8x4_sse.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_RT_2x8_nehalem.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_RT_4x2_atom.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_RT_4x4_barcelona.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_RT_4x4_core2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_RT_4x4_penryn.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_RT_4x4_sse2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_RT_4x4_sse3.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_RT_4x8_nehalem.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_RT_8x4_sse.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
xdot.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
xgemm3m_kernel_2x2.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
xgemm_kernel_1x1.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
xgemv_n.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
xgemv_t.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
xtrsm_kernel_LT_1x1.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
zamax.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
zamax_atom.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zamax_sse.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zamax_sse2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zasum.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
zasum.c Use SkylakeX ?ASUM microkernel for Cooperlake/Sapphirerapids as well 2023-11-04 22:10:06 +01:00
zasum_atom.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zasum_microk_skylakex-2.c Use _mm_set1_epi{32,64x} to init mask in x86-64 [cz]asum 2023-11-19 21:28:35 +00:00
zasum_sse.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zasum_sse2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zaxpy.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zaxpy.c initial support for Sapphire Rapids platform 2021-10-12 01:30:40 -07:00
zaxpy_atom.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zaxpy_microk_bulldozer-2.c x86_64: clobber all xmm registers after vzeroupper 2020-10-20 02:16:47 +00:00
zaxpy_microk_haswell-2.c x86_64: clobber all xmm registers after vzeroupper 2020-10-20 02:16:47 +00:00
zaxpy_microk_sandy-2.c x86_64: clobber all xmm registers after vzeroupper 2020-10-20 02:16:47 +00:00
zaxpy_microk_steamroller-2.c x86_64: clobber all xmm registers after vzeroupper 2020-10-20 02:16:47 +00:00
zaxpy_sse.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zaxpy_sse2.S use shortcut only when both incx and incy are zero 2023-08-04 12:25:34 +02:00
zcopy.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zcopy_sse.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zcopy_sse2.S Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
zdot.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
zdot.c fix improper function prototypes (empty parentheses) 2023-09-30 12:56:38 +02:00
zdot_atom.S Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
zdot_microk_bulldozer-2.c Fix declaration of input arguments in the x86_64 microkernels for DOT and AXPY (#1965) 2019-01-17 23:20:32 +01:00
zdot_microk_haswell-2.c Replace vpermpd with vpermilpd 2019-07-22 08:28:16 +02:00
zdot_microk_sandy-2.c Fix declaration of input arguments in the x86_64 microkernels for DOT and AXPY (#1965) 2019-01-17 23:20:32 +01:00
zdot_microk_steamroller-2.c Fix declaration of input arguments in the x86_64 microkernels for DOT and AXPY (#1965) 2019-01-17 23:20:32 +01:00
zdot_sse.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zdot_sse2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm3m_kernel_2x8_nehalem.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm3m_kernel_4x2_atom.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm3m_kernel_4x4_barcelona.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm3m_kernel_4x4_core2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm3m_kernel_4x4_haswell.c Update zgemm3m_kernel_4x4_haswell.c 2019-12-30 17:33:42 +08:00
zgemm3m_kernel_4x4_penryn.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm3m_kernel_4x4_sse2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm3m_kernel_4x4_sse3.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm3m_kernel_4x8_nehalem.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm3m_kernel_8x4_barcelona.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm3m_kernel_8x4_core2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm3m_kernel_8x4_penryn.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm3m_kernel_8x4_sse.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm3m_kernel_8x4_sse3.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm_beta.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm_kernel_1x4_nehalem.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm_kernel_2x1_atom.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm_kernel_2x2_barcelona.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm_kernel_2x2_bulldozer.S change line endings from CRLF to LF 2022-11-16 22:24:01 +01:00
zgemm_kernel_2x2_core2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm_kernel_2x2_penryn.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm_kernel_2x2_piledriver.S change line endings from CRLF to LF 2022-11-16 22:24:01 +01:00
zgemm_kernel_2x2_sse2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm_kernel_2x2_sse3.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm_kernel_2x4_nehalem.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm_kernel_4x2_barcelona.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm_kernel_4x2_core2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm_kernel_4x2_haswell.S change line endings from CRLF to LF 2022-11-16 22:24:01 +01:00
zgemm_kernel_4x2_haswell.c Update zgemm_kernel_4x2_haswell.c 2020-02-27 22:25:19 +08:00
zgemm_kernel_4x2_penryn.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm_kernel_4x2_skylakex.c AVX512 CGEMM & ZGEMM kernels 2019-11-11 20:04:52 +08:00
zgemm_kernel_4x2_sse.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm_kernel_4x2_sse3.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm_kernel_4x4_sandy.S Update organization info. 2014-11-25 15:28:58 +08:00
zgemm_ncopy_1.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm_ncopy_2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm_tcopy_1.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm_tcopy_2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemv_n.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemv_n_4.c Disable gcc's tree-vectorizer pass on all operating systems 2023-04-19 23:45:14 +02:00
zgemv_n_atom.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemv_n_dup.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemv_n_microk_bulldozer-4.c Tag %1 and %2 as both input and output operands 2017-12-31 18:03:36 +01:00
zgemv_n_microk_haswell-4.c Tag %1 and %2 as both input and output 2017-12-29 23:56:41 +01:00
zgemv_n_microk_sandy-4.c Use .p2align instead of .align for compatibility on Sandybridge as well 2018-02-24 19:43:15 +01:00
zgemv_t.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemv_t_4.c Disable gcc's tree-vectorizer pass on all operating systems 2023-04-19 23:45:44 +02:00
zgemv_t_atom.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemv_t_dup.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemv_t_microk_bulldozer-4.c Tag %1 and %2 as both input and output operands 2017-12-31 18:03:36 +01:00
zgemv_t_microk_haswell-4.c Tag %1 and %2 as both input and output 2017-12-29 23:56:41 +01:00
znrm2.S Allow negative INCX (API change from version 3.10 of the reference implementation) 2023-08-10 17:01:50 +02:00
znrm2_sse.S Allow negative INCX (API change from version 3.10 of the reference implementation) 2023-08-10 17:01:50 +02:00
zrot.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zrot_sse.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zrot_sse2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zscal.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
zscal.c Add [cz]scal microkernels for SKYLAKEX 2022-11-09 08:57:03 -05:00
zscal_atom.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zscal_microk_bulldozer-2.c Fix declaration of input arguments in the x86_64 SCAL microkernels (#1966) 2019-01-18 08:11:07 +01:00
zscal_microk_haswell-2.c Fix declaration of input arguments in the x86_64 SCAL microkernels (#1966) 2019-01-18 08:11:07 +01:00
zscal_microk_skylakex-2.c Enable use of AVX512 microkernels with NVIDIA HPC from version 22.3 2023-07-26 19:39:11 +02:00
zscal_microk_steamroller-2.c Fix declaration of input arguments in the x86_64 SCAL microkernels (#1966) 2019-01-18 08:11:07 +01:00
zscal_sse.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zscal_sse2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zsum.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
zswap.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zswap_sse.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zswap_sse2.S Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
zsymv_L_sse.S initial support for Sapphire Rapids platform 2021-10-12 01:30:40 -07:00
zsymv_L_sse2.S fix unsafe read of Y in assembly kernel 2022-03-11 11:56:33 -06:00
zsymv_U_sse.S initial support for Sapphire Rapids platform 2021-10-12 01:30:40 -07:00
zsymv_U_sse2.S initial support for Sapphire Rapids platform 2021-10-12 01:30:40 -07:00
ztrsm_kernel_LN_2x1_atom.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
ztrsm_kernel_LN_2x2_core2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
ztrsm_kernel_LN_2x2_penryn.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
ztrsm_kernel_LN_2x2_sse2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
ztrsm_kernel_LN_2x2_sse3.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
ztrsm_kernel_LN_2x4_nehalem.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
ztrsm_kernel_LN_4x2_sse.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
ztrsm_kernel_LN_bulldozer.c added optimized trsm_kernels 2016-01-05 13:05:05 +01:00
ztrsm_kernel_LT_1x4_nehalem.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
ztrsm_kernel_LT_2x1_atom.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
ztrsm_kernel_LT_2x2_core2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
ztrsm_kernel_LT_2x2_penryn.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
ztrsm_kernel_LT_2x2_sse2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
ztrsm_kernel_LT_2x2_sse3.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
ztrsm_kernel_LT_2x4_nehalem.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
ztrsm_kernel_LT_4x2_sse.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
ztrsm_kernel_LT_bulldozer.c added optimized trsm_kernels 2016-01-05 13:05:05 +01:00
ztrsm_kernel_RN_bulldozer.c added optimized trsm_kernels 2016-01-05 13:05:05 +01:00
ztrsm_kernel_RT_1x4_nehalem.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
ztrsm_kernel_RT_2x2_core2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
ztrsm_kernel_RT_2x2_penryn.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
ztrsm_kernel_RT_2x2_sse2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
ztrsm_kernel_RT_2x2_sse3.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
ztrsm_kernel_RT_2x4_nehalem.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
ztrsm_kernel_RT_4x2_sse.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
ztrsm_kernel_RT_bulldozer.c added optimized trsm_kernels 2016-01-05 13:05:05 +01:00