OpenBLAS/kernel/x86_64
Bart Oldeman b073d759d0 x86_64: clobber all xmm registers after vzeroupper
As observed with GCC 10 using -march=native -ftree-vectorize
on Knights Landing, the compiler is now smart enough to find clobbers
inside non-inlined static functions.

In particular, sgemv counted on a kernel to preserve the whole
%ymm2 register (since it was not in the clobber list), but the top
part was destroyed by vzeroupper. This caused many tests to fail.

This patch makes sure all xmm (and, by extension, ymm/zmm) registers
are listed as clobbered so that this cannot happen; most kernels
already did this correctly.
2020-10-20 02:16:47 +00:00
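The clobber rule described in the commit message above can be illustrated with a minimal sketch. It is not taken from any OpenBLAS source file: the function name `saxpy_kernel_4`, its signature, and the SSE loop body are assumptions made for illustration. The point is the clobber list at the end of the `asm` statement. Every xmm register is named, so the compiler does not assume any vector register (or its ymm/zmm upper part) survives the block, whether or not `vzeroupper` is emitted. The `vzeroupper` line is only compiled in when the translation unit is built with AVX enabled, and a plain C loop is used on non-x86_64 targets.

```c
#include <stddef.h>

/* Hypothetical microkernel (not an actual OpenBLAS file):
 * computes y[i] += alpha * x[i]; n must be a positive multiple of 4. */
void saxpy_kernel_4(size_t n, const float *x, float *y, const float *alpha)
{
    if (n == 0) return;
#if defined(__x86_64__) && defined(__GNUC__)
    size_t i = 0;
    __asm__ __volatile__ (
        "movss   (%[alpha]), %%xmm0     \n\t" /* load alpha              */
        "shufps  $0, %%xmm0, %%xmm0     \n\t" /* broadcast to all lanes  */
        "1:                             \n\t"
        "movups  (%[x],%[i],4), %%xmm1  \n\t" /* load 4 floats of x      */
        "movups  (%[y],%[i],4), %%xmm2  \n\t" /* load 4 floats of y      */
        "mulps   %%xmm0, %%xmm1         \n\t" /* alpha * x               */
        "addps   %%xmm1, %%xmm2         \n\t" /* y + alpha * x           */
        "movups  %%xmm2, (%[y],%[i],4)  \n\t" /* store result into y     */
        "addq    $4, %[i]               \n\t"
        "cmpq    %[n], %[i]             \n\t"
        "jb      1b                     \n\t"
#if defined(__AVX__)
        /* Zeroes the upper halves of ALL ymm registers, including
         * ones this kernel never touches explicitly. */
        "vzeroupper                     \n\t"
#endif
        : [i] "+r" (i)
        : [n] "r" (n), [x] "r" (x), [y] "r" (y), [alpha] "r" (alpha)
        : "cc", "memory",
          /* List every xmm register, not only the ones used above. */
          "xmm0", "xmm1", "xmm2", "xmm3",
          "xmm4", "xmm5", "xmm6", "xmm7",
          "xmm8", "xmm9", "xmm10", "xmm11",
          "xmm12", "xmm13", "xmm14", "xmm15"
    );
#else
    /* Portable fallback for other targets or compilers. */
    for (size_t i = 0; i < n; i++) y[i] += (*alpha) * x[i];
#endif
}
```

Because all sixteen xmm registers are clobbered, no operand can use an `"x"` constraint here, which is why `alpha` is passed by pointer and loaded inside the block. A caller would invoke it as `saxpy_kernel_4(8, x, y, &alpha)` with `x` and `y` each holding at least 8 floats.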
KERNEL Merge pull request #2890 from martin-frbg/s-d-sum 2020-10-14 09:02:03 +02:00
KERNEL.ATOM Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
KERNEL.BARCELONA Bugfix for ztrmv 2016-03-07 09:39:34 +01:00
KERNEL.BOBCAT Fixed #395. Enable optimized cgemm for Sandybridge. Added optimized sdot kernel. 2014-06-29 10:34:51 +08:00
KERNEL.BULLDOZER Add trivially optimized dsdot based on sdot 2017-11-24 20:02:28 +01:00
KERNEL.COOPERLAKE Enable COOPERLAKE build target 2020-08-13 06:18:00 +08:00
KERNEL.CORE2 Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
KERNEL.DUNNINGTON Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
KERNEL.EXCAVATOR Add trivially optimized dsdot based on sdot 2017-11-24 20:03:40 +01:00
KERNEL.HASWELL Implementation of dasum, sasum with AVX2 & AVX512 intrinsic 2020-08-31 11:44:08 +08:00
KERNEL.NANO Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
KERNEL.NEHALEM Add trivially optimized dsdot based on sdot 2017-11-24 19:59:28 +01:00
KERNEL.OPTERON Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
KERNEL.OPTERON_SSE3 Fixed #395. Enable optimized cgemm for Sandybridge. Added optimized sdot kernel. 2014-06-29 10:34:51 +08:00
KERNEL.PENRYN Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
KERNEL.PILEDRIVER Add trivially optimized dsdot based on sdot 2017-11-24 20:04:29 +01:00
KERNEL.PRESCOTT fallback to zgemm_kernel_4x2_sse.S 2014-07-06 11:05:28 +02:00
KERNEL.SANDYBRIDGE Add trivially optimized dsdot based on sdot 2017-11-24 20:00:23 +01:00
KERNEL.SKYLAKEX AVX512 dgemm tcopy_16 function 2020-06-20 00:07:43 +08:00
KERNEL.STEAMROLLER Add trivially optimized dsdot based on sdot 2017-11-24 20:01:42 +01:00
KERNEL.ZEN Update KERNEL.ZEN 2020-03-16 16:39:37 +00:00
KERNEL.generic Add ?sum definitions for generic kernel 2019-03-31 13:55:49 +02:00
Makefile Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
amax.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
amax_atom.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
amax_sse.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
amax_sse2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
asum.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
asum_atom.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
asum_sse.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
asum_sse2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
axpy.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
axpy_atom.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
axpy_sse.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
axpy_sse2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
bf16to.c Add bfloat16 based dot and conversion with single/double 2020-09-04 02:31:25 +08:00
builtin_stinit.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
cabs.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
caxpy.c Enable COOPERLAKE build target 2020-08-13 06:18:00 +08:00
caxpy_microk_bulldozer-2.c x86_64: clobber all xmm registers after vzeroupper 2020-10-20 02:16:47 +00:00
caxpy_microk_haswell-2.c x86_64: clobber all xmm registers after vzeroupper 2020-10-20 02:16:47 +00:00
caxpy_microk_sandy-2.c x86_64: clobber all xmm registers after vzeroupper 2020-10-20 02:16:47 +00:00
caxpy_microk_steamroller-2.c x86_64: clobber all xmm registers after vzeroupper 2020-10-20 02:16:47 +00:00
cdot.c Enable COOPERLAKE build target 2020-08-13 06:18:00 +08:00
cdot_microk_bulldozer-2.c Fix declaration of input arguments in the x86_64 microkernels for DOT and AXPY (#1965) 2019-01-17 23:20:32 +01:00
cdot_microk_haswell-2.c Fix declaration of input arguments in the x86_64 microkernels for DOT and AXPY (#1965) 2019-01-17 23:20:32 +01:00
cdot_microk_sandy-2.c Fix declaration of input arguments in the x86_64 microkernels for DOT and AXPY (#1965) 2019-01-17 23:20:32 +01:00
cdot_microk_steamroller-2.c Fix declaration of input arguments in the x86_64 microkernels for DOT and AXPY (#1965) 2019-01-17 23:20:32 +01:00
cgemm3m_kernel_8x4_haswell.c Update cgemm3m_kernel_8x4_haswell.c 2019-12-27 18:23:29 +08:00
cgemm_kernel_4x2_bulldozer.S bugfix for bulldozer cgemm-, zgemm- and zgemv-kernel 2014-06-28 12:16:20 +02:00
cgemm_kernel_4x2_piledriver.S bugfix for piledriver cgemm-, zgemm- and zgemv-kernel 2014-06-28 11:46:58 +02:00
cgemm_kernel_4x8_sandy.S Update organization info. 2014-11-25 15:28:58 +08:00
cgemm_kernel_8x2_haswell.S modification for clang compiler 2014-08-27 09:00:20 +02:00
cgemm_kernel_8x2_haswell.c Update cgemm_kernel_8x2_haswell.c 2020-02-27 22:26:15 +08:00
cgemm_kernel_8x2_sandy.S optimization of sandybridge cgemm-kernel 2014-07-29 19:07:21 +02:00
cgemm_kernel_8x2_skylakex.c AVX512 CGEMM & ZGEMM kernels 2019-11-11 20:04:52 +08:00
cgemv_n.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
cgemv_n_4.c Enable COOPERLAKE build target 2020-08-13 06:18:00 +08:00
cgemv_n_microk_bulldozer-4.c Tag %1 and %2 as both input and output operands 2017-12-31 18:03:36 +01:00
cgemv_n_microk_haswell-4.c Tag %1 and %2 as both input and output 2017-12-29 23:56:41 +01:00
cgemv_t.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
cgemv_t_4.c Enable COOPERLAKE build target 2020-08-13 06:18:00 +08:00
cgemv_t_microk_bulldozer-4.c Tag %1 and %2 as both input and output operands 2017-12-31 18:03:36 +01:00
cgemv_t_microk_haswell-4.c Tag %1 and %2 as both input and output 2017-12-29 23:56:41 +01:00
copy.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
copy_sse.S Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
copy_sse2.S Convert aligned moves to unaligned 2020-04-13 14:58:52 +02:00
cscal.c Enable COOPERLAKE build target 2020-08-13 06:18:00 +08:00
cscal_microk_bulldozer-2.c Fix declaration of input arguments in the x86_64 SCAL microkernels (#1966) 2019-01-18 08:11:07 +01:00
cscal_microk_haswell-2.c Fix declaration of input arguments in the x86_64 SCAL microkernels (#1966) 2019-01-18 08:11:07 +01:00
cscal_microk_steamroller-2.c Fix declaration of input arguments in the x86_64 SCAL microkernels (#1966) 2019-01-18 08:11:07 +01:00
ctrsm_kernel_LN_bulldozer.c added optimized trsm_kernels 2016-01-05 13:05:05 +01:00
ctrsm_kernel_LT_bulldozer.c added optimized trsm_kernels 2016-01-05 13:05:05 +01:00
ctrsm_kernel_RN_bulldozer.c added optimized trsm_kernels 2016-01-05 13:05:05 +01:00
ctrsm_kernel_RT_bulldozer.c added optimized trsm_kernels 2016-01-05 13:05:05 +01:00
dasum.c align to 64, using SSE when input size is small 2020-09-03 14:25:54 +08:00
dasum_microk_haswell-2.c align to 64, using SSE when input size is small 2020-09-03 14:25:54 +08:00
dasum_microk_skylakex-2.c align to 64, using SSE when input size is small 2020-09-03 14:25:54 +08:00
daxpy.c Add double precision universal intrinsics for X86/ARM 2020-10-15 10:29:42 +08:00
daxpy_bulldozer.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
daxpy_microk_bulldozer-2.c Fix declaration of input arguments in the x86_64 microkernels for DOT and AXPY (#1965) 2019-01-17 23:20:32 +01:00
daxpy_microk_haswell-2.c x86_64: clobber all xmm registers after vzeroupper 2020-10-20 02:16:47 +00:00
daxpy_microk_nehalem-2.c Fix declaration of input arguments in the x86_64 microkernels for DOT and AXPY (#1965) 2019-01-17 23:20:32 +01:00
daxpy_microk_piledriver-2.c Fix declaration of input arguments in the x86_64 microkernels for DOT and AXPY (#1965) 2019-01-17 23:20:32 +01:00
daxpy_microk_sandy-2.c Fix declaration of input arguments in the x86_64 microkernels for DOT and AXPY (#1965) 2019-01-17 23:20:32 +01:00
daxpy_microk_skylakex-2.c Add AVX512 enabled SAXPY/DAXPY functions 2018-08-10 02:58:32 +00:00
daxpy_microk_steamroller-2.c Fix declaration of input arguments in the x86_64 microkernels for DOT and AXPY (#1965) 2019-01-17 23:20:32 +01:00
dcopy_bulldozer.S added dcopy_bulldozer.S 2013-06-21 16:06:51 +02:00
ddot.c Enable COOPERLAKE build target 2020-08-13 06:18:00 +08:00
ddot_bulldozer.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
ddot_microk_bulldozer-2.c Fix declaration of input arguments in the x86_64 microkernels for DOT and AXPY (#1965) 2019-01-17 23:20:32 +01:00
ddot_microk_haswell-2.c x86_64: clobber all xmm registers after vzeroupper 2020-10-20 02:16:47 +00:00
ddot_microk_nehalem-2.c Fix declaration of input arguments in the x86_64 microkernels for DOT and AXPY (#1965) 2019-01-17 23:20:32 +01:00
ddot_microk_piledriver-2.c x86_64: clobber all xmm registers after vzeroupper 2020-10-20 02:16:47 +00:00
ddot_microk_sandy-2.c x86_64: clobber all xmm registers after vzeroupper 2020-10-20 02:16:47 +00:00
ddot_microk_skylakex-2.c Add an AVX512 enabled DDOT function 2018-08-09 03:55:52 +00:00
ddot_microk_steamroller-2.c x86_64: clobber all xmm registers after vzeroupper 2020-10-20 02:16:47 +00:00
dgemm_beta_skylakex.c Fix thinko in skylake beta handling 2018-12-24 18:49:50 +00:00
dgemm_kernel_4x4_haswell.S small optimization on dgemm_kernel for N=1 2014-12-18 20:35:51 +01:00
dgemm_kernel_4x8_haswell.S Add files via upload 2019-07-28 07:39:09 +08:00
dgemm_kernel_4x8_sandy.S Change file comments to work around clang 3.9 assembler bug 2016-10-13 16:51:08 +02:00
dgemm_kernel_4x8_skylakex.c Use p2align instead of align for OSX compatibility 2018-12-03 13:06:43 +01:00
dgemm_kernel_4x8_skylakex_2.c Update dgemm_kernel_4x8_skylakex_2.c 2019-11-28 19:56:35 +08:00
dgemm_kernel_6x4_piledriver.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
dgemm_kernel_8x2_bulldozer.S Ref #380: lowered stack usage for piledriver and bulldozer kernels 2014-06-19 14:02:14 +02:00
dgemm_kernel_8x2_piledriver.S Ref #380: lowered stack usage for piledriver and bulldozer kernels 2014-06-19 14:02:14 +02:00
dgemm_kernel_8x8_skylakex.c Update dgemm_kernel_8x8_skylakex.c 2019-10-18 15:00:17 +08:00
dgemm_kernel_16x2_haswell.S Refs #330. Fixed the compatibility issue with clang on Mac OSX. 2013-12-16 20:31:17 +08:00
dgemm_kernel_16x2_skylakex.S Use AVX512 also for DGEMM 2018-06-03 22:17:27 +00:00
dgemm_kernel_16x2_skylakex.c Add files via upload 2020-06-06 14:56:57 +08:00
dgemm_ncopy_2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
dgemm_ncopy_4.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
dgemm_ncopy_8.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
dgemm_ncopy_8_bulldozer.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
dgemm_ncopy_8_skylakex.c Add vector optimizations for ncopy as well for dgemm/skylakex 2018-10-06 21:18:12 +00:00
dgemm_tcopy_2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
dgemm_tcopy_4.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
dgemm_tcopy_8.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
dgemm_tcopy_8_bulldozer.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
dgemm_tcopy_8_skylakex.c Add optimized *copy versions for skylakex 2018-10-06 13:51:44 +00:00
dgemm_tcopy_16_skylakex.c AVX512 dgemm tcopy_16 function 2020-06-20 00:07:43 +08:00
dgemv_n.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
dgemv_n_4.c Enable COOPERLAKE build target 2020-08-13 06:18:00 +08:00
dgemv_n_atom.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
dgemv_n_bulldozer.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
dgemv_n_microk_haswell-4.c x86_64: clobber all xmm registers after vzeroupper 2020-10-20 02:16:47 +00:00
dgemv_n_microk_nehalem-4.c Replace .align with .p2align in the Nehalem microkernels 2018-02-26 20:58:33 +01:00
dgemv_n_microk_piledriver-4.c x86_64: clobber all xmm registers after vzeroupper 2020-10-20 02:16:47 +00:00
dgemv_n_microk_skylakex-4.c Add an AVX512 enabled DGEMV (n) function 2018-08-11 17:38:12 +00:00
dgemv_t.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
dgemv_t_4.c Enable COOPERLAKE build target 2020-08-13 06:18:00 +08:00
dgemv_t_atom.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
dgemv_t_bulldozer.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
dgemv_t_microk_haswell-4.c x86_64: clobber all xmm registers after vzeroupper 2020-10-20 02:16:47 +00:00
dger.c optimized dger kernel for sandybridge 2015-04-28 16:58:11 +02:00
dger_microk_sandy-2.c Fix declaration of input arguments in the Sandybridge GER microkernels (#1967) 2019-01-18 08:11:39 +01:00
dot.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
dot_atom.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
dot_sse.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
dot_sse2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
dscal.c Enable COOPERLAKE build target 2020-08-13 06:18:00 +08:00
dscal_microk_bulldozer-2.c Fix declaration of input arguments in the x86_64 SCAL microkernels (#1966) 2019-01-18 08:11:07 +01:00
dscal_microk_haswell-2.c Fix declaration of input arguments in the x86_64 SCAL microkernels (#1966) 2019-01-18 08:11:07 +01:00
dscal_microk_sandy-2.c Fix declaration of input arguments in the x86_64 SCAL microkernels (#1966) 2019-01-18 08:11:07 +01:00
dscal_microk_skylakex-2.c Add an AVX512 enabled DSCAL function 2018-08-11 17:14:57 +00:00
dsymv_L.c Enable COOPERLAKE build target 2020-08-13 06:18:00 +08:00
dsymv_L_microk_bulldozer-2.c Fix declaration of arguments in inline assembly 2019-02-12 16:14:02 +01:00
dsymv_L_microk_haswell-2.c Fix declaration of arguments in inline assembly 2019-02-12 16:14:02 +01:00
dsymv_L_microk_nehalem-2.c Fix declaration of arguments in inline assembly 2019-02-12 16:14:02 +01:00
dsymv_L_microk_sandy-2.c Fix declaration of arguments in inline assembly 2019-02-12 16:14:02 +01:00
dsymv_L_microk_skylakex-2.c Duplicate earlier Clang 9.0.0 workaround for corresponding Apple Clang version 2020-05-05 10:44:50 +02:00
dsymv_U.c Enable COOPERLAKE build target 2020-08-13 06:18:00 +08:00
dsymv_U_microk_bulldozer-2.c Fix declaration of assembly arguments in SSYMV and DSYMV microkernels 2019-02-12 16:00:18 +01:00
dsymv_U_microk_haswell-2.c Fix declaration of assembly arguments in SSYMV and DSYMV microkernels 2019-02-12 16:00:18 +01:00
dsymv_U_microk_nehalem-2.c Fix declaration of assembly arguments in SSYMV and DSYMV microkernels 2019-02-12 16:00:18 +01:00
dsymv_U_microk_sandy-2.c Fix declaration of assembly arguments in SSYMV and DSYMV microkernels 2019-02-12 16:00:18 +01:00
dtobf16_microk_cooperlake.c Add bfloat16 based dot and conversion with single/double 2020-09-04 02:31:25 +08:00
dtrmm_kernel_4x8_haswell.c Replace vpermpd with vpermilpd in the Haswell DTRMM kernel 2019-07-28 23:17:28 +02:00
dtrsm_kernel_LN_bulldozer.c Remove unused variables from Haswell dtrmm and Bulldozer dtrsm 2017-11-14 23:35:10 +01:00
dtrsm_kernel_LT_8x2_bulldozer.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
dtrsm_kernel_RN_8x2_bulldozer.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
dtrsm_kernel_RN_haswell.c Replace most vpermpd calls in the Haswell DTRSM_RN kernel 2019-08-03 12:40:13 +02:00
dtrsm_kernel_RT_bulldozer.c Fix inline assembly constraints in Bulldozer TRSM kernels 2019-02-16 20:06:48 +01:00
gemm_beta.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
gemm_kernel_2x8_nehalem.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
gemm_kernel_4x2_atom.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
gemm_kernel_4x4_barcelona.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
gemm_kernel_4x4_core2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
gemm_kernel_4x4_penryn.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
gemm_kernel_4x4_sse2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
gemm_kernel_4x4_sse3.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
gemm_kernel_4x8_nano.S Fix crash in sgemm SSE/nano kernel on x86_64 2019-03-07 16:55:13 +01:00
gemm_kernel_4x8_nehalem.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
gemm_kernel_8x4_barcelona.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
gemm_kernel_8x4_core2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
gemm_kernel_8x4_penryn.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
gemm_kernel_8x4_sse.S Fix crash in sgemm SSE/nano kernel on x86_64 2019-03-07 16:55:13 +01:00
gemm_kernel_8x4_sse3.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
gemm_ncopy_2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
gemm_ncopy_2_bulldozer.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
gemm_ncopy_4.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
gemm_ncopy_4_opteron.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
gemm_tcopy_2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
gemm_tcopy_2_bulldozer.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
gemm_tcopy_4.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
gemm_tcopy_4_opteron.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
iamax.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
iamax_sse.S Silence a redefinition warning 2020-10-15 19:08:12 +02:00
iamax_sse2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
izamax.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
izamax_sse.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
izamax_sse2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
lsame.S Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
mcount.S Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
nrm2.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
nrm2_sse.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
qconjg.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
qdot.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
qgemm_kernel_2x2.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
qgemv_n.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
qgemv_t.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
qtrsm_kernel_LN_2x2.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
qtrsm_kernel_LT_2x2.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
qtrsm_kernel_RT_2x2.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
rot.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
rot_sse.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
rot_sse2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
sasum.c align to 64, using SSE when input size is small 2020-09-03 14:25:54 +08:00
sasum_microk_haswell-2.c align to 64, using SSE when input size is small 2020-09-03 14:25:54 +08:00
sasum_microk_skylakex-2.c align to 64, using SSE when input size is small 2020-09-03 14:25:54 +08:00
saxpy.c Enable COOPERLAKE build target 2020-08-13 06:18:00 +08:00
saxpy_microk_haswell-2.c x86_64: clobber all xmm registers after vzeroupper 2020-10-20 02:16:47 +00:00
saxpy_microk_nehalem-2.c Fix declaration of input arguments in the x86_64 microkernels for DOT and AXPY (#1965) 2019-01-17 23:20:32 +01:00
saxpy_microk_piledriver-2.c x86_64: clobber all xmm registers after vzeroupper 2020-10-20 02:16:47 +00:00
saxpy_microk_sandy-2.c Fix declaration of input arguments in the x86_64 microkernels for DOT and AXPY (#1965) 2019-01-17 23:20:32 +01:00
saxpy_microk_skylakex-2.c Add AVX512 enabled SAXPY/DAXPY functions 2018-08-10 02:58:32 +00:00
sbdot.c Rename "HALF" and "sh" to "BFLOAT16" and "sb" 2020-10-11 23:42:07 +02:00
sbdot_microk_cooperlake.c Rename "HALF" and "sh" to "BFLOAT16" and "sb" 2020-10-11 23:42:07 +02:00
scal.S Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
scal_atom.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
scal_sse.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
scal_sse2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
sdot.c Enable COOPERLAKE build target 2020-08-13 06:18:00 +08:00
sdot_microk_bulldozer-2.c Fix declaration of input arguments in the x86_64 microkernels for DOT and AXPY (#1965) 2019-01-17 23:20:32 +01:00
sdot_microk_haswell-2.c x86_64: clobber all xmm registers after vzeroupper 2020-10-20 02:16:47 +00:00
sdot_microk_nehalem-2.c Fix declaration of input arguments in the x86_64 microkernels for DOT and AXPY (#1965) 2019-01-17 23:20:32 +01:00
sdot_microk_sandy-2.c x86_64: clobber all xmm registers after vzeroupper 2020-10-20 02:16:47 +00:00
sdot_microk_skylakex-2.c Fix typo in sdot function 2018-08-11 17:16:45 +00:00
sdot_microk_steamroller-2.c Fix declaration of input arguments in the x86_64 microkernels for DOT and AXPY (#1965) 2019-01-17 23:20:32 +01:00
sgemm_beta_skylakex.c Fix thinko in skylake beta handling 2018-12-24 18:49:50 +00:00
sgemm_direct_performant.c [WIP] Refactor the driver code for direct SGEMM (#2782) 2020-08-19 14:51:09 +02:00
sgemm_direct_skylakex.c sgemm_direct_skylakex: fix 75eeb26 regression. 2020-10-18 19:58:07 +00:00
sgemm_kernel_8x4_bulldozer.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
sgemm_kernel_8x4_haswell.c Update sgemm_kernel_8x4_haswell.c 2020-02-06 01:47:46 +00:00
sgemm_kernel_8x4_haswell_2.c Strip UTF8 byte order marker from source 2020-06-26 09:00:43 +02:00
sgemm_kernel_8x8_sandy.S Update organization info. 2014-11-25 15:28:58 +08:00
sgemm_kernel_16x2_bulldozer.S Ref #380: lowered stack usage for piledriver and bulldozer kernels 2014-06-19 14:02:14 +02:00
sgemm_kernel_16x2_piledriver.S Ref #380: lowered stack usage for piledriver and bulldozer kernels 2014-06-19 14:02:14 +02:00
sgemm_kernel_16x4_haswell.S modification for clang compiler 2014-08-27 09:00:20 +02:00
sgemm_kernel_16x4_sandy.S Refs #535. Fix the wrong vector instruction in sgemm sandy bridge kernel. 2015-04-08 03:55:49 +08:00
sgemm_kernel_16x4_skylakex.S Use AVX512 also for DGEMM 2018-06-03 22:17:27 +00:00
sgemm_kernel_16x4_skylakex.c make skylakex sgemm code more friendly for readers 2020-01-13 16:28:41 +08:00
sgemm_kernel_16x4_skylakex_2.c AVX512 STRMM kernel 2020-02-16 22:58:00 +08:00
sgemm_kernel_16x4_skylakex_3.c [WIP] Refactor the driver code for direct SGEMM (#2782) 2020-08-19 14:51:09 +02:00
sgemm_ncopy_4_skylakex.c Use sgemm_ncopy_4_skylakex.c also for Haswell 2018-12-15 13:49:19 +00:00
sgemm_tcopy_16_skylakex.c Add a C+intrinsics version of the SGEMM/skylakex kernel 2018-10-10 01:49:22 +00:00
sgemv_n.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
sgemv_n.c removed obsolete gemv kernel files 2014-09-14 11:00:53 +02:00
sgemv_n_4.c Enable COOPERLAKE build target 2020-08-13 06:18:00 +08:00
sgemv_n_microk_bulldozer-4.c Fix inline assembly constraints 2019-02-16 18:46:17 +01:00
sgemv_n_microk_haswell-4.c x86_64: clobber all xmm registers after vzeroupper 2020-10-20 02:16:47 +00:00
sgemv_n_microk_nehalem-4.c Fix inline assembly constraints 2019-02-16 18:24:11 +01:00
sgemv_n_microk_sandy-4.c Fix inline assembly constraints 2019-02-16 18:36:39 +01:00
sgemv_t.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
sgemv_t.c removed obsolete gemv kernel files 2014-09-14 11:00:53 +02:00
sgemv_t_4.c Enable COOPERLAKE build target 2020-08-13 06:18:00 +08:00
sgemv_t_microk_bulldozer-4.c Tag %1 and %2 as both input and output operands 2017-12-31 18:03:36 +01:00
sgemv_t_microk_haswell-4.c x86_64: clobber all xmm registers after vzeroupper 2020-10-20 02:16:47 +00:00
sgemv_t_microk_nehalem-4.c Replace .align with .p2align in the Nehalem microkernels 2018-02-26 20:58:33 +01:00
sgemv_t_microk_sandy-4.c Use .p2align instead of .align for compatibility on Sandybridge as well 2018-02-24 19:43:15 +01:00
sger.c added optimized sger kernel for sandybridge 2015-04-28 15:33:38 +02:00
sger_microk_sandy-2.c Fix declaration of input arguments in the Sandybridge GER microkernels (#1967) 2019-01-18 08:11:39 +01:00
ssymv_L.c Enable COOPERLAKE build target 2020-08-13 06:18:00 +08:00
ssymv_L_microk_bulldozer-2.c Fix declaration of arguments in inline assembly 2019-02-12 16:14:02 +01:00
ssymv_L_microk_haswell-2.c Fix declaration of arguments in inline assembly 2019-02-12 16:14:02 +01:00
ssymv_L_microk_nehalem-2.c Fix declaration of arguments in inline assembly 2019-02-12 16:14:02 +01:00
ssymv_L_microk_sandy-2.c Fix declaration of arguments in inline assembly 2019-02-12 16:14:02 +01:00
ssymv_U.c Enable COOPERLAKE build target 2020-08-13 06:18:00 +08:00
ssymv_U_microk_bulldozer-2.c Fix declaration of assembly arguments in SSYMV and DSYMV microkernels 2019-02-12 16:00:18 +01:00
ssymv_U_microk_haswell-2.c Fix declaration of assembly arguments in SSYMV and DSYMV microkernels 2019-02-12 16:00:18 +01:00
ssymv_U_microk_nehalem-2.c Fix declaration of assembly arguments in SSYMV and DSYMV microkernels 2019-02-12 16:00:18 +01:00
ssymv_U_microk_sandy-2.c Fix declaration of assembly arguments in SSYMV and DSYMV microkernels 2019-02-12 16:00:18 +01:00
staticbuffer.S Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
stobf16_microk_cooperlake.c Add bfloat16 based dot and conversion with single/double 2020-09-04 02:31:25 +08:00
strsm_kernel_8x4_haswell_LN.c Strip UTF8 byte order marker from source 2020-06-26 09:00:43 +02:00
strsm_kernel_8x4_haswell_LT.c AVX2 STRSM kernel 2020-03-17 00:34:08 +08:00
strsm_kernel_8x4_haswell_L_common.h Strip UTF8 byte order marker from source 2020-06-26 09:00:43 +02:00
strsm_kernel_8x4_haswell_RN.c AVX2 STRSM kernel 2020-03-17 00:34:08 +08:00
strsm_kernel_8x4_haswell_RT.c AVX2 STRSM kernel 2020-03-17 00:34:08 +08:00
strsm_kernel_8x4_haswell_R_common.h AVX2 STRSM kernel 2020-03-17 00:34:08 +08:00
strsm_kernel_LN_bulldozer.c Fix inline assembly constraints in Bulldozer TRSM kernels 2019-02-16 20:06:48 +01:00
strsm_kernel_LT_bulldozer.c Fix inline assembly constraints in Bulldozer TRSM kernels 2019-02-16 20:06:48 +01:00
strsm_kernel_RN_bulldozer.c Fix inline assembly constraints in Bulldozer TRSM kernels 2019-02-16 20:06:48 +01:00
strsm_kernel_RT_bulldozer.c Fix inline assembly constraints in Bulldozer TRSM kernels 2019-02-16 20:06:48 +01:00
sum.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
swap.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
swap_sse.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
swap_sse2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
symv_L_sse.S Enable COOPERLAKE build target 2020-08-13 06:18:00 +08:00
symv_L_sse2.S Enable COOPERLAKE build target 2020-08-13 06:18:00 +08:00
symv_U_sse.S Enable COOPERLAKE build target 2020-08-13 06:18:00 +08:00
symv_U_sse2.S Enable COOPERLAKE build target 2020-08-13 06:18:00 +08:00
tobf16.c Add bfloat16 based dot and conversion with single/double 2020-09-04 02:31:25 +08:00
trsm_kernel_LN_2x8_nehalem.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_LN_4x2_atom.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_LN_4x4_barcelona.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_LN_4x4_core2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_LN_4x4_penryn.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_LN_4x4_sse2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_LN_4x4_sse3.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_LN_4x8_nehalem.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_LN_8x4_sse.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_LT_2x8_nehalem.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_LT_4x2_atom.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_LT_4x4_barcelona.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_LT_4x4_core2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_LT_4x4_penryn.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_LT_4x4_sse2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_LT_4x4_sse3.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_LT_4x8_nehalem.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_LT_8x4_sse.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_RT_2x8_nehalem.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_RT_4x2_atom.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_RT_4x4_barcelona.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_RT_4x4_core2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_RT_4x4_penryn.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_RT_4x4_sse2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_RT_4x4_sse3.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_RT_4x8_nehalem.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_RT_8x4_sse.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
xdot.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
xgemm3m_kernel_2x2.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
xgemm_kernel_1x1.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
xgemv_n.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
xgemv_t.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
xtrsm_kernel_LT_1x1.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
zamax.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
zamax_atom.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zamax_sse.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zamax_sse2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zasum.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
zasum_atom.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zasum_sse.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zasum_sse2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zaxpy.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zaxpy.c Enable COOPERLAKE build target 2020-08-13 06:18:00 +08:00
zaxpy_atom.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zaxpy_microk_bulldozer-2.c x86_64: clobber all xmm registers after vzeroupper 2020-10-20 02:16:47 +00:00
zaxpy_microk_haswell-2.c x86_64: clobber all xmm registers after vzeroupper 2020-10-20 02:16:47 +00:00
zaxpy_microk_sandy-2.c x86_64: clobber all xmm registers after vzeroupper 2020-10-20 02:16:47 +00:00
zaxpy_microk_steamroller-2.c x86_64: clobber all xmm registers after vzeroupper 2020-10-20 02:16:47 +00:00
zaxpy_sse.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zaxpy_sse2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zcopy.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zcopy_sse.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zcopy_sse2.S Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
zdot.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
zdot.c Fix missing dummy parameter (imag part of alpha) of zdot_thread_function 2020-08-23 15:08:16 +02:00
zdot_atom.S Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
zdot_microk_bulldozer-2.c Fix declaration of input arguments in the x86_64 microkernels for DOT and AXPY (#1965) 2019-01-17 23:20:32 +01:00
zdot_microk_haswell-2.c Replace vpermpd with vpermilpd 2019-07-22 08:28:16 +02:00
zdot_microk_sandy-2.c Fix declaration of input arguments in the x86_64 microkernels for DOT and AXPY (#1965) 2019-01-17 23:20:32 +01:00
zdot_microk_steamroller-2.c Fix declaration of input arguments in the x86_64 microkernels for DOT and AXPY (#1965) 2019-01-17 23:20:32 +01:00
zdot_sse.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zdot_sse2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm3m_kernel_2x8_nehalem.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm3m_kernel_4x2_atom.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm3m_kernel_4x4_barcelona.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm3m_kernel_4x4_core2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm3m_kernel_4x4_haswell.c Update zgemm3m_kernel_4x4_haswell.c 2019-12-30 17:33:42 +08:00
zgemm3m_kernel_4x4_penryn.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm3m_kernel_4x4_sse2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm3m_kernel_4x4_sse3.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm3m_kernel_4x8_nehalem.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm3m_kernel_8x4_barcelona.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm3m_kernel_8x4_core2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm3m_kernel_8x4_penryn.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm3m_kernel_8x4_sse.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm3m_kernel_8x4_sse3.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm_beta.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm_kernel_1x4_nehalem.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm_kernel_2x1_atom.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm_kernel_2x2_barcelona.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm_kernel_2x2_bulldozer.S bugfix for bulldozer cgemm-, zgemm- and zgemv-kernel 2014-06-28 12:16:20 +02:00
zgemm_kernel_2x2_core2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm_kernel_2x2_penryn.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm_kernel_2x2_piledriver.S bugfix for piledriver cgemm-, zgemm- and zgemv-kernel 2014-06-28 11:46:58 +02:00
zgemm_kernel_2x2_sse2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm_kernel_2x2_sse3.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm_kernel_2x4_nehalem.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm_kernel_4x2_barcelona.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm_kernel_4x2_core2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm_kernel_4x2_haswell.S modification for clang compiler 2014-08-27 09:00:20 +02:00
zgemm_kernel_4x2_haswell.c Update zgemm_kernel_4x2_haswell.c 2020-02-27 22:25:19 +08:00
zgemm_kernel_4x2_penryn.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm_kernel_4x2_skylakex.c AVX512 CGEMM & ZGEMM kernels 2019-11-11 20:04:52 +08:00
zgemm_kernel_4x2_sse.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm_kernel_4x2_sse3.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm_kernel_4x4_sandy.S Update organization info. 2014-11-25 15:28:58 +08:00
zgemm_ncopy_1.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm_ncopy_2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm_tcopy_1.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm_tcopy_2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemv_n.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemv_n_4.c Enable COOPERLAKE build target 2020-08-13 06:18:00 +08:00
zgemv_n_atom.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemv_n_dup.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemv_n_microk_bulldozer-4.c Tag %1 and %2 as both input and output operands 2017-12-31 18:03:36 +01:00
zgemv_n_microk_haswell-4.c Tag %1 and %2 as both input and output 2017-12-29 23:56:41 +01:00
zgemv_n_microk_sandy-4.c Use .p2align instead of .align for compatibility on Sandybridge as well 2018-02-24 19:43:15 +01:00
zgemv_t.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemv_t_4.c Enable COOPERLAKE build target 2020-08-13 06:18:00 +08:00
zgemv_t_atom.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemv_t_dup.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemv_t_microk_bulldozer-4.c Tag %1 and %2 as both input and output operands 2017-12-31 18:03:36 +01:00
zgemv_t_microk_haswell-4.c Tag %1 and %2 as both input and output 2017-12-29 23:56:41 +01:00
znrm2.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
znrm2_sse.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zrot.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zrot_sse.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zrot_sse2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zscal.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
zscal.c Enable COOPERLAKE build target 2020-08-13 06:18:00 +08:00
zscal_atom.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zscal_microk_bulldozer-2.c Fix declaration of input arguments in the x86_64 SCAL microkernels (#1966) 2019-01-18 08:11:07 +01:00
zscal_microk_haswell-2.c Fix declaration of input arguments in the x86_64 SCAL microkernels (#1966) 2019-01-18 08:11:07 +01:00
zscal_microk_steamroller-2.c Fix declaration of input arguments in the x86_64 SCAL microkernels (#1966) 2019-01-18 08:11:07 +01:00
zscal_sse.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zscal_sse2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zsum.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
zswap.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zswap_sse.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zswap_sse2.S Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
zsymv_L_sse.S Enable COOPERLAKE build target 2020-08-13 06:18:00 +08:00
zsymv_L_sse2.S Enable COOPERLAKE build target 2020-08-13 06:18:00 +08:00
zsymv_U_sse.S Enable COOPERLAKE build target 2020-08-13 06:18:00 +08:00
zsymv_U_sse2.S Enable COOPERLAKE build target 2020-08-13 06:18:00 +08:00
ztrsm_kernel_LN_2x1_atom.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
ztrsm_kernel_LN_2x2_core2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
ztrsm_kernel_LN_2x2_penryn.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
ztrsm_kernel_LN_2x2_sse2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
ztrsm_kernel_LN_2x2_sse3.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
ztrsm_kernel_LN_2x4_nehalem.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
ztrsm_kernel_LN_4x2_sse.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
ztrsm_kernel_LN_bulldozer.c added optimized trsm_kernels 2016-01-05 13:05:05 +01:00
ztrsm_kernel_LT_1x4_nehalem.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
ztrsm_kernel_LT_2x1_atom.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
ztrsm_kernel_LT_2x2_core2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
ztrsm_kernel_LT_2x2_penryn.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
ztrsm_kernel_LT_2x2_sse2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
ztrsm_kernel_LT_2x2_sse3.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
ztrsm_kernel_LT_2x4_nehalem.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
ztrsm_kernel_LT_4x2_sse.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
ztrsm_kernel_LT_bulldozer.c added optimized trsm_kernels 2016-01-05 13:05:05 +01:00
ztrsm_kernel_RN_bulldozer.c added optimized trsm_kernels 2016-01-05 13:05:05 +01:00
ztrsm_kernel_RT_1x4_nehalem.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
ztrsm_kernel_RT_2x2_core2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
ztrsm_kernel_RT_2x2_penryn.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
ztrsm_kernel_RT_2x2_sse2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
ztrsm_kernel_RT_2x2_sse3.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
ztrsm_kernel_RT_2x4_nehalem.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
ztrsm_kernel_RT_4x2_sse.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
ztrsm_kernel_RT_bulldozer.c added optimized trsm_kernels 2016-01-05 13:05:05 +01:00