OpenBLAS/kernel/x86_64
Bart Oldeman e7e3aa2948 x86_64: prevent GCC and Clang from generating FMAs in cscal/zscal.
If, for example, -march=haswell is set in CFLAGS, GCC generates FMAs by default,
which is inconsistent with the microkernels, none of which use FMAs. These
inconsistencies cause a few failures in the LAPACK testcases that compare
eigenvalue results computed with and without eigenvectors.

Moreover, using FMAs for multiplication of complex numbers can give surprising
results; see 22aa81f for more information.

This uses the same syntax as 22aa81f uses for zarch (s390x).
2022-10-27 18:16:43 -04:00
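
The "surprising results" mentioned in the commit message can be illustrated with a small sketch. It is not the code from cscal.c or from 22aa81f: the function names are hypothetical, and the two pragmas are only one plausible way to ask Clang and GCC not to contract a*b+c into an FMA, so the exact syntax used in the commit may differ. The example takes the imaginary part of z * conj(z), which is mathematically zero. With separately rounded products it is exactly zero; when one product is kept exact, as an FMA would do, the rounding error of the other product leaks through. The fused variant is emulated in double precision, which works here because the product of two floats is exactly representable in a double.

```c
/* Assumption: one way to disable FMA contraction for this file.
   The spelling used in the actual OpenBLAS commit may differ. */
#if defined(__clang__)
#pragma clang fp contract(off)
#elif defined(__GNUC__)
#pragma GCC optimize("fp-contract=off")
#endif

/* Imaginary part of (re + im*i) * (re - im*i) with both products rounded
   to float before the subtraction. The two rounded products are equal,
   so the result is exactly zero. */
static float conj_imag_unfused(float re, float im)
{
    float p = im * re;   /* rounded to float */
    float q = re * im;   /* rounded to float, same value as p */
    return p - q;
}

/* The same expression as a fused multiply-add would evaluate it:
   one product is rounded to float, the other enters the subtraction
   unrounded. The product of two floats fits exactly in a double, so
   double arithmetic stands in for the hardware FMA in this sketch. */
static float conj_imag_fused(float re, float im)
{
    float p = im * re;                       /* rounded to float */
    double exact = (double)re * (double)im;  /* exact product */
    return (float)((double)p - exact);       /* rounding error of p */
}
```

For re = im = 1 + 2^-13 the exact product is 1 + 2^-12 + 2^-26, which rounds to 1 + 2^-12 in single precision. The unfused version therefore returns 0, while the fused version returns -2^-26, a nonzero imaginary part for a product that should be purely real.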
..
KERNEL Remove premature entry for DOMATCOPY_RT 2021-03-18 21:53:50 +01:00
KERNEL.ATOM Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
KERNEL.BARCELONA Bugfix for ztrmv 2016-03-07 09:39:34 +01:00
KERNEL.BOBCAT Fixed #395. Enable optimized cgemm for Sandybridge. Added optimized sdot kernel. 2014-06-29 10:34:51 +08:00
KERNEL.BULLDOZER Add trivially optimized dsdot based on sdot 2017-11-24 20:02:28 +01:00
KERNEL.COOPERLAKE sbgemm: cooperlake: change kernel size to 16x4 2021-09-07 21:30:45 +08:00
KERNEL.CORE2 Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
KERNEL.DUNNINGTON Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
KERNEL.EXCAVATOR Add trivially optimized dsdot based on sdot 2017-11-24 20:03:40 +01:00
KERNEL.HASWELL Improve the performance of rot by using AVX512 and AVX2 intrinsic 2020-11-05 15:12:36 +08:00
KERNEL.NANO Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
KERNEL.NEHALEM Add trivially optimized dsdot based on sdot 2017-11-24 19:59:28 +01:00
KERNEL.OPTERON Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
KERNEL.OPTERON_SSE3 Fixed #395. Enable optimized cgemm for Sandybridge. Added optimized sdot kernel. 2014-06-29 10:34:51 +08:00
KERNEL.PENRYN Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
KERNEL.PILEDRIVER Add trivially optimized dsdot based on sdot 2017-11-24 20:04:29 +01:00
KERNEL.PRESCOTT fallback to zgemm_kernel_4x2_sse.S 2014-07-06 11:05:28 +02:00
KERNEL.SANDYBRIDGE Add trivially optimized dsdot based on sdot 2017-11-24 20:00:23 +01:00
KERNEL.SAPPHIRERAPIDS sbgemm: spr: disable small matrix path by default 2021-10-17 19:08:03 -07:00
KERNEL.SKYLAKEX Revert "roll back DGEMM kernel ... for DYNAMIC_ARCH" 2022-05-20 11:23:30 +02:00
KERNEL.STEAMROLLER Add trivially optimized dsdot based on sdot 2017-11-24 20:01:42 +01:00
KERNEL.ZEN Enable optimized srot/drot kernels from Haswell 2021-02-11 09:23:05 +01:00
KERNEL.generic Add ?sum definitions for generic kernel 2019-03-31 13:55:49 +02:00
Makefile Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
amax.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
amax_atom.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
amax_sse.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
amax_sse2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
asum.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
asum_atom.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
asum_sse.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
asum_sse2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
axpy.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
axpy_atom.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
axpy_sse.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
axpy_sse2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
bf16_common_macros.h x86_64: BFLOAT16: fix build warning 2021-09-28 18:30:06 +08:00
bf16to.c Add bfloat16 based dot and conversion with single/double 2020-09-04 02:31:25 +08:00
builtin_stinit.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
cabs.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
casum.c fix function typecast 2021-12-24 20:00:50 +01:00
casum_microk_skylakex-2.c Initialize abs_mask1 with itself to silence a gcc warning 2021-09-15 22:10:43 +02:00
caxpy.c initial support for Sapphire Rapids platform 2021-10-12 01:30:40 -07:00
caxpy_microk_bulldozer-2.c x86_64: clobber all xmm registers after vzeroupper 2020-10-20 02:16:47 +00:00
caxpy_microk_haswell-2.c x86_64: clobber all xmm registers after vzeroupper 2020-10-20 02:16:47 +00:00
caxpy_microk_sandy-2.c x86_64: clobber all xmm registers after vzeroupper 2020-10-20 02:16:47 +00:00
caxpy_microk_steamroller-2.c x86_64: clobber all xmm registers after vzeroupper 2020-10-20 02:16:47 +00:00
cdot.c initial support for Sapphire Rapids platform 2021-10-12 01:30:40 -07:00
cdot_microk_bulldozer-2.c Fix declaration of input arguments in the x86_64 microkernels for DOT and AXPY (#1965) 2019-01-17 23:20:32 +01:00
cdot_microk_haswell-2.c Fix declaration of input arguments in the x86_64 microkernels for DOT and AXPY (#1965) 2019-01-17 23:20:32 +01:00
cdot_microk_sandy-2.c Fix declaration of input arguments in the x86_64 microkernels for DOT and AXPY (#1965) 2019-01-17 23:20:32 +01:00
cdot_microk_steamroller-2.c Fix declaration of input arguments in the x86_64 microkernels for DOT and AXPY (#1965) 2019-01-17 23:20:32 +01:00
cgemm3m_kernel_8x4_haswell.c Update cgemm3m_kernel_8x4_haswell.c 2019-12-27 18:23:29 +08:00
cgemm_kernel_4x2_bulldozer.S bugfix for bulldozer cgemm-, zgemm- and zgemv-kernel 2014-06-28 12:16:20 +02:00
cgemm_kernel_4x2_piledriver.S bugfix for piledriver cgemm-, zgemm- and zgemv-kernel 2014-06-28 11:46:58 +02:00
cgemm_kernel_4x8_sandy.S Update organization info. 2014-11-25 15:28:58 +08:00
cgemm_kernel_8x2_haswell.S modification for clang compiler 2014-08-27 09:00:20 +02:00
cgemm_kernel_8x2_haswell.c Update cgemm_kernel_8x2_haswell.c 2020-02-27 22:26:15 +08:00
cgemm_kernel_8x2_sandy.S optimization of sandybridge cgemm-kernel 2014-07-29 19:07:21 +02:00
cgemm_kernel_8x2_skylakex.c AVX512 CGEMM & ZGEMM kernels 2019-11-11 20:04:52 +08:00
cgemv_n.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
cgemv_n_4.c initial support for Sapphire Rapids platform 2021-10-12 01:30:40 -07:00
cgemv_n_microk_bulldozer-4.c Tag %1 and %2 as both input and output operands 2017-12-31 18:03:36 +01:00
cgemv_n_microk_haswell-4.c Tag %1 and %2 as both input and output 2017-12-29 23:56:41 +01:00
cgemv_t.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
cgemv_t_4.c initial support for Sapphire Rapids platform 2021-10-12 01:30:40 -07:00
cgemv_t_microk_bulldozer-4.c Tag %1 and %2 as both input and output operands 2017-12-31 18:03:36 +01:00
cgemv_t_microk_haswell-4.c Tag %1 and %2 as both input and output 2017-12-29 23:56:41 +01:00
copy.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
copy_sse.S Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
copy_sse2.S Convert aligned moves to unaligned 2020-04-13 14:58:52 +02:00
cscal.c x86_64: prevent GCC and Clang from generating FMAs in cscal/zscal. 2022-10-27 18:16:43 -04:00
cscal_microk_bulldozer-2.c Fix declaration of input arguments in the x86_64 SCAL microkernels (#1966) 2019-01-18 08:11:07 +01:00
cscal_microk_haswell-2.c Fix declaration of input arguments in the x86_64 SCAL microkernels (#1966) 2019-01-18 08:11:07 +01:00
cscal_microk_steamroller-2.c Fix declaration of input arguments in the x86_64 SCAL microkernels (#1966) 2019-01-18 08:11:07 +01:00
ctrsm_kernel_LN_bulldozer.c added optimized trsm_kernels 2016-01-05 13:05:05 +01:00
ctrsm_kernel_LT_bulldozer.c added optimized trsm_kernels 2016-01-05 13:05:05 +01:00
ctrsm_kernel_RN_bulldozer.c added optimized trsm_kernels 2016-01-05 13:05:05 +01:00
ctrsm_kernel_RT_bulldozer.c added optimized trsm_kernels 2016-01-05 13:05:05 +01:00
dasum.c fix function typecasts 2021-12-21 18:45:28 +01:00
dasum_microk_haswell-2.c Add casts 2021-09-11 13:38:28 +02:00
dasum_microk_skylakex-2.c Add casts 2021-09-14 21:41:53 +02:00
daxpy.c initial support for Sapphire Rapids platform 2021-10-12 01:30:40 -07:00
daxpy_bulldozer.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
daxpy_microk_bulldozer-2.c Fix declaration of input arguments in the x86_64 microkernels for DOT and AXPY (#1965) 2019-01-17 23:20:32 +01:00
daxpy_microk_haswell-2.c x86_64: clobber all xmm registers after vzeroupper 2020-10-20 02:16:47 +00:00
daxpy_microk_nehalem-2.c Fix declaration of input arguments in the x86_64 microkernels for DOT and AXPY (#1965) 2019-01-17 23:20:32 +01:00
daxpy_microk_piledriver-2.c Fix declaration of input arguments in the x86_64 microkernels for DOT and AXPY (#1965) 2019-01-17 23:20:32 +01:00
daxpy_microk_sandy-2.c Fix declaration of input arguments in the x86_64 microkernels for DOT and AXPY (#1965) 2019-01-17 23:20:32 +01:00
daxpy_microk_skylakex-2.c Add a AVX512 enabled SAXPY/DAXPY functions 2018-08-10 02:58:32 +00:00
daxpy_microk_steamroller-2.c Fix declaration of input arguments in the x86_64 microkernels for DOT and AXPY (#1965) 2019-01-17 23:20:32 +01:00
dcopy_bulldozer.S added dcopy_bulldozer.S 2013-06-21 16:06:51 +02:00
ddot.c fix function typecasts 2021-12-21 18:45:28 +01:00
ddot_bulldozer.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
ddot_microk_bulldozer-2.c Fix declaration of input arguments in the x86_64 microkernels for DOT and AXPY (#1965) 2019-01-17 23:20:32 +01:00
ddot_microk_haswell-2.c x86_64: clobber all xmm registers after vzeroupper 2020-10-20 02:16:47 +00:00
ddot_microk_nehalem-2.c Fix declaration of input arguments in the x86_64 microkernels for DOT and AXPY (#1965) 2019-01-17 23:20:32 +01:00
ddot_microk_piledriver-2.c x86_64: clobber all xmm registers after vzeroupper 2020-10-20 02:16:47 +00:00
ddot_microk_sandy-2.c x86_64: clobber all xmm registers after vzeroupper 2020-10-20 02:16:47 +00:00
ddot_microk_skylakex-2.c Add an AVX512 enabled DDOT function 2018-08-09 03:55:52 +00:00
ddot_microk_steamroller-2.c x86_64: clobber all xmm registers after vzeroupper 2020-10-20 02:16:47 +00:00
dgemm_beta_skylakex.c Fix thinko in skylake beta handling 2018-12-24 18:49:50 +00:00
dgemm_kernel_4x4_haswell.S small optimization on dgemm_kernel for N=1 2014-12-18 20:35:51 +01:00
dgemm_kernel_4x8_haswell.S Add files via upload 2019-07-28 07:39:09 +08:00
dgemm_kernel_4x8_sandy.S Change file comments to work around clang 3.9 assembler bug 2016-10-13 16:51:08 +02:00
dgemm_kernel_4x8_skylakex.c Use p2align instead of align for OSX compatibility 2018-12-03 13:06:43 +01:00
dgemm_kernel_4x8_skylakex_2.c Update dgemm_kernel_4x8_skylakex_2.c 2019-11-28 19:56:35 +08:00
dgemm_kernel_6x4_piledriver.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
dgemm_kernel_8x2_bulldozer.S Ref #380: lowered stack usage for piledriver and bulldozer kernels 2014-06-19 14:02:14 +02:00
dgemm_kernel_8x2_piledriver.S Ref #380: lowered stack usage for piledriver and bulldozer kernels 2014-06-19 14:02:14 +02:00
dgemm_kernel_8x8_skylakex.c Update dgemm_kernel_8x8_skylakex.c 2019-10-18 15:00:17 +08:00
dgemm_kernel_16x2_haswell.S Refs #330. Fixed the compatible issue with clang on Mac OSX. 2013-12-16 20:31:17 +08:00
dgemm_kernel_16x2_skylakex.S Use AVX512 also for DGEMM 2018-06-03 22:17:27 +00:00
dgemm_kernel_16x2_skylakex.c GEMM: skylake: improve the performance when m is small 2021-04-28 13:56:06 +00:00
dgemm_ncopy_2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
dgemm_ncopy_4.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
dgemm_ncopy_8.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
dgemm_ncopy_8_bulldozer.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
dgemm_ncopy_8_skylakex.c Fix warnings 2022-09-15 09:19:19 +02:00
dgemm_small_kernel_nn_skylakex.c Small Matrix: use proper inline asm input constraint for AVX512 mask 2022-02-28 03:22:31 +00:00
dgemm_small_kernel_nt_skylakex.c Small Matrix: use proper inline asm input constraint for AVX512 mask 2022-02-28 03:22:31 +00:00
dgemm_small_kernel_permit_skylakex.c Small Matrix: skylakex: add DGEMM_SMALL_M_PERMIT and tune for TN kernel 2021-08-02 07:06:54 +00:00
dgemm_small_kernel_tn_skylakex.c Fix compilation of Skylake AVX512 kernels with GCC 6 2022-02-23 22:51:59 +00:00
dgemm_small_kernel_tt_skylakex.c Small Matrix: skylakex: fix build error in old compiler 2021-08-05 04:43:47 +00:00
dgemm_tcopy_2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
dgemm_tcopy_4.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
dgemm_tcopy_8.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
dgemm_tcopy_8_bulldozer.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
dgemm_tcopy_8_skylakex.c Add optimized *copy versions for skylakex 2018-10-06 13:51:44 +00:00
dgemm_tcopy_16_skylakex.c Fix build with -Werror=return-type 2020-10-21 08:43:39 +02:00
dgemv_n.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
dgemv_n_4.c initial support for Sapphire Rapids platform 2021-10-12 01:30:40 -07:00
dgemv_n_atom.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
dgemv_n_bulldozer.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
dgemv_n_microk_haswell-4.c x86_64: clobber all xmm registers after vzeroupper 2020-10-20 02:16:47 +00:00
dgemv_n_microk_nehalem-4.c Replace .align with .p2align in the Nehalem microkernels 2018-02-26 20:58:33 +01:00
dgemv_n_microk_piledriver-4.c x86_64: clobber all xmm registers after vzeroupper 2020-10-20 02:16:47 +00:00
dgemv_n_microk_skylakex-4.c Add an AVX512 enabled DGEMV (n) function 2018-08-11 17:38:12 +00:00
dgemv_t.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
dgemv_t_4.c initial support for Sapphire Rapids platform 2021-10-12 01:30:40 -07:00
dgemv_t_atom.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
dgemv_t_bulldozer.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
dgemv_t_microk_haswell-4.c x86_64: clobber all xmm registers after vzeroupper 2020-10-20 02:16:47 +00:00
dger.c optimized dger kernel for sandybridge 2015-04-28 16:58:11 +02:00
dger_microk_sandy-2.c Fix declaration of input arguments in the Sandybridge GER microkernels (#1967) 2019-01-18 08:11:39 +01:00
dot.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
dot_atom.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
dot_sse.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
dot_sse2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
drot.c fix function typecasts 2021-12-21 18:45:28 +01:00
drot_microk_haswell-2.c replace spurious avx512 requirement with fma check 2021-04-26 21:55:30 +02:00
drot_microk_skylakex-2.c Improve the performance of rot by using AVX512 and AVX2 intrinsic 2020-11-05 15:12:36 +08:00
dscal.c initial support for Sapphire Rapids platform 2021-10-12 01:30:40 -07:00
dscal_microk_bulldozer-2.c Fix declaration of input arguments in the x86_64 SCAL microkernels (#1966) 2019-01-18 08:11:07 +01:00
dscal_microk_haswell-2.c Fix declaration of input arguments in the x86_64 SCAL microkernels (#1966) 2019-01-18 08:11:07 +01:00
dscal_microk_sandy-2.c Fix declaration of input arguments in the x86_64 SCAL microkernels (#1966) 2019-01-18 08:11:07 +01:00
dscal_microk_skylakex-2.c Add an AVX512 enabled DSCAL function 2018-08-11 17:14:57 +00:00
dsymv_L.c initial support for Sapphire Rapids platform 2021-10-12 01:30:40 -07:00
dsymv_L_microk_bulldozer-2.c Fix declaration of arguments in inline assembly 2019-02-12 16:14:02 +01:00
dsymv_L_microk_haswell-2.c Fix declaration of arguments in inline assembly 2019-02-12 16:14:02 +01:00
dsymv_L_microk_nehalem-2.c Fix declaration of arguments in inline assembly 2019-02-12 16:14:02 +01:00
dsymv_L_microk_sandy-2.c Fix declaration of arguments in inline assembly 2019-02-12 16:14:02 +01:00
dsymv_L_microk_skylakex-2.c Duplicate earlier Clang 9.0.0 workaround for corresponding Apple Clang version 2020-05-05 10:44:50 +02:00
dsymv_U.c initial support for Sapphire Rapids platform 2021-10-12 01:30:40 -07:00
dsymv_U_microk_bulldozer-2.c Fix declaration of assembly arguments in SSYMV and DSYMV microkernels 2019-02-12 16:00:18 +01:00
dsymv_U_microk_haswell-2.c Fix declaration of assembly arguments in SSYMV and DSYMV microkernels 2019-02-12 16:00:18 +01:00
dsymv_U_microk_nehalem-2.c Fix declaration of assembly arguments in SSYMV and DSYMV microkernels 2019-02-12 16:00:18 +01:00
dsymv_U_microk_sandy-2.c Fix declaration of assembly arguments in SSYMV and DSYMV microkernels 2019-02-12 16:00:18 +01:00
dtobf16_microk_cooperlake.c Add bfloat16 based dot and conversion with single/double 2020-09-04 02:31:25 +08:00
dtrmm_kernel_4x8_haswell.c Replace vpermpd with vpermilpd in the Haswell DTRMM kernel 2019-07-28 23:17:28 +02:00
dtrsm_kernel_LN_bulldozer.c Remove unused variables from Haswell dtrmm and Bulldozer dtrsm 2017-11-14 23:35:10 +01:00
dtrsm_kernel_LT_8x2_bulldozer.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
dtrsm_kernel_RN_8x2_bulldozer.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
dtrsm_kernel_RN_haswell.c Replace most vpermpd calls in the Haswell DTRSM_RN kernel 2019-08-03 12:40:13 +02:00
dtrsm_kernel_RT_bulldozer.c Fix inline assembly constraints in Bulldozer TRSM kernels 2019-02-16 20:06:48 +01:00
gemm_beta.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
gemm_kernel_2x8_nehalem.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
gemm_kernel_4x2_atom.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
gemm_kernel_4x4_barcelona.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
gemm_kernel_4x4_core2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
gemm_kernel_4x4_penryn.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
gemm_kernel_4x4_sse2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
gemm_kernel_4x4_sse3.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
gemm_kernel_4x8_nano.S Fix crash in sgemm SSE/nano kernel on x86_64 2019-03-07 16:55:13 +01:00
gemm_kernel_4x8_nehalem.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
gemm_kernel_8x4_barcelona.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
gemm_kernel_8x4_core2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
gemm_kernel_8x4_penryn.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
gemm_kernel_8x4_sse.S Fix crash in sgemm SSE/nano kernel on x86_64 2019-03-07 16:55:13 +01:00
gemm_kernel_8x4_sse3.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
gemm_ncopy_2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
gemm_ncopy_2_bulldozer.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
gemm_ncopy_4.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
gemm_ncopy_4_opteron.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
gemm_tcopy_2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
gemm_tcopy_2_bulldozer.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
gemm_tcopy_4.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
gemm_tcopy_4_opteron.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
iamax.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
iamax_sse.S Silence a redefinition warning 2020-10-15 19:08:12 +02:00
iamax_sse2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
izamax.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
izamax_sse.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
izamax_sse2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
lsame.S Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
mcount.S Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
nrm2.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
nrm2_sse.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
omatcopy_rt.c Fix warnings 2022-09-15 09:19:19 +02:00
qconjg.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
qdot.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
qgemm_kernel_2x2.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
qgemv_n.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
qgemv_t.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
qtrsm_kernel_LN_2x2.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
qtrsm_kernel_LT_2x2.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
qtrsm_kernel_RT_2x2.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
rot.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
rot_sse.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
rot_sse2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
sasum.c fix function typecasts 2021-12-21 18:45:28 +01:00
sasum_microk_haswell-2.c Add casts 2021-09-11 13:38:28 +02:00
sasum_microk_skylakex-2.c Add casts 2021-09-14 21:41:53 +02:00
saxpy.c initial support for Sapphire Rapids platform 2021-10-12 01:30:40 -07:00
saxpy_microk_haswell-2.c x86_64: clobber all xmm registers after vzeroupper 2020-10-20 02:16:47 +00:00
saxpy_microk_nehalem-2.c Fix declaration of input arguments in the x86_64 microkernels for DOT and AXPY (#1965) 2019-01-17 23:20:32 +01:00
saxpy_microk_piledriver-2.c x86_64: clobber all xmm registers after vzeroupper 2020-10-20 02:16:47 +00:00
saxpy_microk_sandy-2.c Fix declaration of input arguments in the x86_64 microkernels for DOT and AXPY (#1965) 2019-01-17 23:20:32 +01:00
saxpy_microk_skylakex-2.c Add a AVX512 enabled SAXPY/DAXPY functions 2018-08-10 02:58:32 +00:00
sbdot.c initial support for Sapphire Rapids platform 2021-10-12 01:30:40 -07:00
sbdot_microk_cooperlake.c x86_64: BFLOAT16: fix build warning 2021-09-28 18:30:06 +08:00
sbgemm_block_microk_cooperlake.c x86_64: BFLOAT16: fix build warning 2021-09-28 18:30:06 +08:00
sbgemm_kernel_16x4_cooperlake.c Prevent compiler attempts to use k0 as mask register 2022-02-23 20:12:20 +01:00
sbgemm_kernel_16x16_spr.c sbgemm: spr: kernel handle alpha != 1.0 2021-10-17 19:08:03 -07:00
sbgemm_kernel_16x16_spr_tmpl.c sbgemm: spr: optimization for tmp_c buffer 2021-10-17 19:08:03 -07:00
sbgemm_microk_cooperlake_template.c really fix definition of SHUFFLE_MAGIC_NO 2022-02-25 15:36:02 +01:00
sbgemm_ncopy_4_cooperlake.c sbgemm: cooperlake: kernel works for NN 2021-09-07 21:30:45 +08:00
sbgemm_ncopy_16_cooperlake.c Fix non-portable u_int64_t 2022-02-23 20:10:59 +01:00
sbgemm_oncopy_16_spr.c sbgemm: spr: oncopy: use tile load/store instead 2021-10-17 19:08:03 -07:00
sbgemm_otcopy_16_spr.c sbgemm: spr: implement otcopy_16 2021-10-17 19:08:03 -07:00
sbgemm_small_kernel_nn_cooperlake.c sbgemm: cooperlake: enable SBGEMM by small matrix path 2021-08-30 17:40:30 +08:00
sbgemm_small_kernel_nt_cooperlake.c sbgemm: cooperlake: enable SBGEMM by small matrix path 2021-08-30 17:40:30 +08:00
sbgemm_small_kernel_permit_cooperlake.c sbgemm: cooperlake: tuning for small matrix 2021-09-07 21:30:46 +08:00
sbgemm_small_kernel_permit_spr.c sbgemm: spr: disable small matrix path by default 2021-10-17 19:08:03 -07:00
sbgemm_small_kernel_template_cooperlake.c sbgemm: cooperlake: make sure hot buffer aligned to 64 2021-08-30 17:40:30 +08:00
sbgemm_small_kernel_tn_cooperlake.c sbgemm: cooperlake: enable SBGEMM by small matrix path 2021-08-30 17:40:30 +08:00
sbgemm_small_kernel_tt_cooperlake.c sbgemm: cooperlake: enable SBGEMM by small matrix path 2021-08-30 17:40:30 +08:00
sbgemm_tcopy_4_cooperlake.c sbgemm: cooperlake: add n24 kernel for tcopy_4 2021-09-07 21:30:46 +08:00
sbgemm_tcopy_16_cooperlake.c sbgemm: cooperlake: implement tcopy_4 2021-09-07 21:30:46 +08:00
sbgemv_n.c initial support for Sapphire Rapids platform 2021-10-12 01:30:40 -07:00
sbgemv_n_microk_cooperlake.c Implementation of BF16 based gemv 2020-10-29 02:08:23 +08:00
sbgemv_n_microk_cooperlake_template.c x86_64: BFLOAT16: fix build warning 2021-09-28 18:30:06 +08:00
sbgemv_t.c initial support for Sapphire Rapids platform 2021-10-12 01:30:40 -07:00
sbgemv_t_microk_cooperlake.c Implementation of BF16 based gemv 2020-10-29 02:08:23 +08:00
sbgemv_t_microk_cooperlake_template.c x86_64: BFLOAT16: fix build warning 2021-09-28 18:30:06 +08:00
scal.S Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
scal_atom.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
scal_sse.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
scal_sse2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
sdot.c initial support for Sapphire Rapids platform 2021-10-12 01:30:40 -07:00
sdot_microk_bulldozer-2.c Fix declaration of input arguments in the x86_64 microkernels for DOT and AXPY (#1965) 2019-01-17 23:20:32 +01:00
sdot_microk_haswell-2.c x86_64: clobber all xmm registers after vzeroupper 2020-10-20 02:16:47 +00:00
sdot_microk_nehalem-2.c Fix declaration of input arguments in the x86_64 microkernels for DOT and AXPY (#1965) 2019-01-17 23:20:32 +01:00
sdot_microk_sandy-2.c x86_64: clobber all xmm registers after vzeroupper 2020-10-20 02:16:47 +00:00
sdot_microk_skylakex-2.c Fix typo in sdot function 2018-08-11 17:16:45 +00:00
sdot_microk_steamroller-2.c Fix declaration of input arguments in the x86_64 microkernels for DOT and AXPY (#1965) 2019-01-17 23:20:32 +01:00
sgemm_beta_skylakex.c sbgemm: cooperlake: add dummy source files 2021-09-07 21:30:45 +08:00
sgemm_direct_performant.c [WIP] Refactor the driver code for direct SGEMM (#2782) 2020-08-19 14:51:09 +02:00
sgemm_direct_skylakex.c initial support for Sapphire Rapids platform 2021-10-12 01:30:40 -07:00
sgemm_kernel_8x4_bulldozer.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
sgemm_kernel_8x4_haswell.c Update sgemm_kernel_8x4_haswell.c 2020-02-06 01:47:46 +00:00
sgemm_kernel_8x4_haswell_2.c Strip UTF8 byte order marker from source 2020-06-26 09:00:43 +02:00
sgemm_kernel_8x8_sandy.S Update organization info. 2014-11-25 15:28:58 +08:00
sgemm_kernel_16x2_bulldozer.S Ref #380: lowered stack usage for piledriver and bulldozer kernels 2014-06-19 14:02:14 +02:00
sgemm_kernel_16x2_piledriver.S Ref #380: lowered stack usage for piledriver and bulldozer kernels 2014-06-19 14:02:14 +02:00
sgemm_kernel_16x4_haswell.S modification for clang compiler 2014-08-27 09:00:20 +02:00
sgemm_kernel_16x4_sandy.S Refs #535. Fix the wrong vector instruction in sgemm sandy bridge kernel. 2015-04-08 03:55:49 +08:00
sgemm_kernel_16x4_skylakex.S Use AVX512 also for DGEMM 2018-06-03 22:17:27 +00:00
sgemm_kernel_16x4_skylakex.c make skylakex sgemm code more friendly for readers 2020-01-13 16:28:41 +08:00
sgemm_kernel_16x4_skylakex_2.c AVX512 STRMM kernel 2020-02-16 22:58:00 +08:00
sgemm_kernel_16x4_skylakex_3.c Use "old" compute(24) function with clang due to register limitations 2021-04-06 19:58:32 +02:00
sgemm_ncopy_4_skylakex.c Use sgemm_ncopy_4_skylakex.c also for Haswell 2018-12-15 13:49:19 +00:00
sgemm_small_kernel_nn_skylakex.c Small Matrix: use proper inline asm input constraint for AVX512 mask 2022-02-28 03:22:31 +00:00
sgemm_small_kernel_nt_skylakex.c Small Matrix: use proper inline asm input constraint for AVX512 mask 2022-02-28 03:22:31 +00:00
sgemm_small_kernel_permit_skylakex.c Small Matrix: skylakex: add sgemm tt kernel 2021-08-02 07:06:54 +00:00
sgemm_small_kernel_tn_skylakex.c Fix compilation of Skylake AVX512 kernels with GCC 6 2022-02-23 22:51:59 +00:00
sgemm_small_kernel_tt_skylakex.c Small Matrix: skylakex: fix build error in old compiler 2021-08-05 04:43:47 +00:00
sgemm_tcopy_16_skylakex.c Add a C+intrinsics version of the SGEMM/skylakex kernel 2018-10-10 01:49:22 +00:00
sgemv_n.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
sgemv_n.c removed obsolete gemv kernel files 2014-09-14 11:00:53 +02:00
sgemv_n_4.c Work around windows/osx gcc12 x86_64 tree-optimizer problem and add an osx/gcc12 build to Azure CI (#3745) 2022-09-03 15:01:22 +02:00
sgemv_n_microk_bulldozer-4.c Fix inline assembly constraints 2019-02-16 18:46:17 +01:00
sgemv_n_microk_haswell-4.c x86_64: clobber all xmm registers after vzeroupper 2020-10-20 02:16:47 +00:00
sgemv_n_microk_nehalem-4.c Fix inline assembly constraints 2019-02-16 18:24:11 +01:00
sgemv_n_microk_sandy-4.c Fix inline assembly constraints 2019-02-16 18:36:39 +01:00
sgemv_n_microk_skylakex-8.c optimize on sgemv_n for small n 2021-04-30 12:14:58 -04:00
sgemv_t.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
sgemv_t.c removed obsolete gemv kernel files 2014-09-14 11:00:53 +02:00
sgemv_t_4.c Work around windows/osx gcc12 x86_64 tree-optimizer problem and add an osx/gcc12 build to Azure CI (#3745) 2022-09-03 15:01:22 +02:00
sgemv_t_microk_bulldozer-4.c Tag %1 and %2 as both input and output operands 2017-12-31 18:03:36 +01:00
sgemv_t_microk_haswell-4.c x86_64: clobber all xmm registers after vzeroupper 2020-10-20 02:16:47 +00:00
sgemv_t_microk_nehalem-4.c Replace .align with .p2align in the Nehalem microkernels 2018-02-26 20:58:33 +01:00
sgemv_t_microk_sandy-4.c Use .p2align instead of .align for compatibility on Sandybridge as well 2018-02-24 19:43:15 +01:00
sgemv_t_microk_skylakex.c Optimized sgemv_t for small N based on AVX512 2021-06-08 15:08:28 -04:00
sgemv_t_microk_skylakex_template.c sgemv: skylakex: fix build warning 2021-08-25 07:13:00 +00:00
sger.c added optimized sger kernel for sandybridge 2015-04-28 15:33:38 +02:00
sger_microk_sandy-2.c Fix declaration of input arguments in the Sandybridge GER microkernels (#1967) 2019-01-18 08:11:39 +01:00
srot.c fix function typecasts 2021-12-21 18:45:28 +01:00
srot_microk_haswell-2.c Remove spurious AVX512 requirement and add AVX2/FMA3 guard 2021-03-06 14:35:49 +01:00
srot_microk_skylakex-2.c Improve the performance of rot by using AVX512 and AVX2 intrinsic 2020-11-05 15:12:36 +08:00
ssymv_L.c Work around windows/osx gcc12 x86_64 tree-optimizer problem and add an osx/gcc12 build to Azure CI (#3745) 2022-09-03 15:01:22 +02:00
ssymv_L_microk_bulldozer-2.c Fix declaration of arguments in inline assembly 2019-02-12 16:14:02 +01:00
ssymv_L_microk_haswell-2.c Fix declaration of arguments in inline assembly 2019-02-12 16:14:02 +01:00
ssymv_L_microk_nehalem-2.c Fix declaration of arguments in inline assembly 2019-02-12 16:14:02 +01:00
ssymv_L_microk_sandy-2.c Fix declaration of arguments in inline assembly 2019-02-12 16:14:02 +01:00
ssymv_U.c Work around windows/osx gcc12 x86_64 tree-optimizer problem and add an osx/gcc12 build to Azure CI (#3745) 2022-09-03 15:01:22 +02:00
ssymv_U_microk_bulldozer-2.c Fix declaration of assembly arguments in SSYMV and DSYMV microkernels 2019-02-12 16:00:18 +01:00
ssymv_U_microk_haswell-2.c Fix declaration of assembly arguments in SSYMV and DSYMV microkernels 2019-02-12 16:00:18 +01:00
ssymv_U_microk_nehalem-2.c Fix declaration of assembly arguments in SSYMV and DSYMV microkernels 2019-02-12 16:00:18 +01:00
ssymv_U_microk_sandy-2.c Fix declaration of assembly arguments in SSYMV and DSYMV microkernels 2019-02-12 16:00:18 +01:00
staticbuffer.S Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
stobf16_microk_cooperlake.c Add bfloat16 based dot and conversion with single/double 2020-09-04 02:31:25 +08:00
strsm_kernel_8x4_haswell_LN.c Strip UTF8 byte order marker from source 2020-06-26 09:00:43 +02:00
strsm_kernel_8x4_haswell_LT.c AVX2 STRSM kernel 2020-03-17 00:34:08 +08:00
strsm_kernel_8x4_haswell_L_common.h Strip UTF8 byte order marker from source 2020-06-26 09:00:43 +02:00
strsm_kernel_8x4_haswell_RN.c AVX2 STRSM kernel 2020-03-17 00:34:08 +08:00
strsm_kernel_8x4_haswell_RT.c AVX2 STRSM kernel 2020-03-17 00:34:08 +08:00
strsm_kernel_8x4_haswell_R_common.h AVX2 STRSM kernel 2020-03-17 00:34:08 +08:00
strsm_kernel_LN_bulldozer.c Fix inline assembly constraints in Bulldozer TRSM kernels 2019-02-16 20:06:48 +01:00
strsm_kernel_LT_bulldozer.c Fix inline assembly constraints in Bulldozer TRSM kernels 2019-02-16 20:06:48 +01:00
strsm_kernel_RN_bulldozer.c Fix inline assembly constraints in Bulldozer TRSM kernels 2019-02-16 20:06:48 +01:00
strsm_kernel_RT_bulldozer.c Fix inline assembly constraints in Bulldozer TRSM kernels 2019-02-16 20:06:48 +01:00
sum.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
swap.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
swap_sse.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
swap_sse2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
symv_L_sse.S initial support for Sapphire Rapids platform 2021-10-12 01:30:40 -07:00
symv_L_sse2.S initial support for Sapphire Rapids platform 2021-10-12 01:30:40 -07:00
symv_U_sse.S initial support for Sapphire Rapids platform 2021-10-12 01:30:40 -07:00
symv_U_sse2.S initial support for Sapphire Rapids platform 2021-10-12 01:30:40 -07:00
tobf16.c initial support for Sapphire Rapids platform 2021-10-12 01:30:40 -07:00
trsm_kernel_LN_2x8_nehalem.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_LN_4x2_atom.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_LN_4x4_barcelona.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_LN_4x4_core2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_LN_4x4_penryn.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_LN_4x4_sse2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_LN_4x4_sse3.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_LN_4x8_nehalem.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_LN_8x4_sse.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_LT_2x8_nehalem.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_LT_4x2_atom.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_LT_4x4_barcelona.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_LT_4x4_core2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_LT_4x4_penryn.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_LT_4x4_sse2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_LT_4x4_sse3.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_LT_4x8_nehalem.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_LT_8x4_sse.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_RT_2x8_nehalem.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_RT_4x2_atom.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_RT_4x4_barcelona.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_RT_4x4_core2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_RT_4x4_penryn.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_RT_4x4_sse2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_RT_4x4_sse3.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_RT_4x8_nehalem.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trsm_kernel_RT_8x4_sse.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
xdot.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
xgemm3m_kernel_2x2.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
xgemm_kernel_1x1.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
xgemv_n.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
xgemv_t.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
xtrsm_kernel_LT_1x1.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
zamax.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
zamax_atom.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zamax_sse.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zamax_sse2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zasum.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
zasum.c fix function typecast 2021-12-24 20:01:52 +01:00
zasum_atom.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zasum_microk_skylakex-2.c Initialize abs_mask1 with itself to silence a gcc warning 2021-09-15 22:11:35 +02:00
zasum_sse.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zasum_sse2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zaxpy.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zaxpy.c initial support for Sapphire Rapids platform 2021-10-12 01:30:40 -07:00
zaxpy_atom.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zaxpy_microk_bulldozer-2.c x86_64: clobber all xmm registers after vzeroupper 2020-10-20 02:16:47 +00:00
zaxpy_microk_haswell-2.c x86_64: clobber all xmm registers after vzeroupper 2020-10-20 02:16:47 +00:00
zaxpy_microk_sandy-2.c x86_64: clobber all xmm registers after vzeroupper 2020-10-20 02:16:47 +00:00
zaxpy_microk_steamroller-2.c x86_64: clobber all xmm registers after vzeroupper 2020-10-20 02:16:47 +00:00
zaxpy_sse.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zaxpy_sse2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zcopy.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zcopy_sse.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zcopy_sse2.S Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
zdot.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
zdot.c Work around windows/osx gcc12 x86_64 tree-optimizer problem and add an osx/gcc12 build to Azure CI (#3745) 2022-09-03 15:01:22 +02:00
zdot_atom.S Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
zdot_microk_bulldozer-2.c Fix declaration of input arguments in the x86_64 microkernels for DOT and AXPY (#1965) 2019-01-17 23:20:32 +01:00
zdot_microk_haswell-2.c Replace vpermpd with vpermilpd 2019-07-22 08:28:16 +02:00
zdot_microk_sandy-2.c Fix declaration of input arguments in the x86_64 microkernels for DOT and AXPY (#1965) 2019-01-17 23:20:32 +01:00
zdot_microk_steamroller-2.c Fix declaration of input arguments in the x86_64 microkernels for DOT and AXPY (#1965) 2019-01-17 23:20:32 +01:00
zdot_sse.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zdot_sse2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm3m_kernel_2x8_nehalem.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm3m_kernel_4x2_atom.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm3m_kernel_4x4_barcelona.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm3m_kernel_4x4_core2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm3m_kernel_4x4_haswell.c Update zgemm3m_kernel_4x4_haswell.c 2019-12-30 17:33:42 +08:00
zgemm3m_kernel_4x4_penryn.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm3m_kernel_4x4_sse2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm3m_kernel_4x4_sse3.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm3m_kernel_4x8_nehalem.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm3m_kernel_8x4_barcelona.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm3m_kernel_8x4_core2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm3m_kernel_8x4_penryn.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm3m_kernel_8x4_sse.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm3m_kernel_8x4_sse3.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm_beta.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm_kernel_1x4_nehalem.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm_kernel_2x1_atom.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm_kernel_2x2_barcelona.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm_kernel_2x2_bulldozer.S bugfix for bulldozer cgemm-, zgemm- and zgemv-kernel 2014-06-28 12:16:20 +02:00
zgemm_kernel_2x2_core2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm_kernel_2x2_penryn.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm_kernel_2x2_piledriver.S bugfix for piledriver cgemm-, zgemm- and zgemv-kernel 2014-06-28 11:46:58 +02:00
zgemm_kernel_2x2_sse2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm_kernel_2x2_sse3.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm_kernel_2x4_nehalem.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm_kernel_4x2_barcelona.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm_kernel_4x2_core2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm_kernel_4x2_haswell.S modification for clang compiler 2014-08-27 09:00:20 +02:00
zgemm_kernel_4x2_haswell.c Update zgemm_kernel_4x2_haswell.c 2020-02-27 22:25:19 +08:00
zgemm_kernel_4x2_penryn.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm_kernel_4x2_skylakex.c AVX512 CGEMM & ZGEMM kernels 2019-11-11 20:04:52 +08:00
zgemm_kernel_4x2_sse.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm_kernel_4x2_sse3.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm_kernel_4x4_sandy.S Update organization info. 2014-11-25 15:28:58 +08:00
zgemm_ncopy_1.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm_ncopy_2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm_tcopy_1.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemm_tcopy_2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemv_n.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemv_n_4.c Work around windows/osx gcc12 x86_64 tree-optimizer problem and add an osx/gcc12 build to Azure CI (#3745) 2022-09-03 15:01:22 +02:00
zgemv_n_atom.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemv_n_dup.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemv_n_microk_bulldozer-4.c Tag %1 and %2 as both input and output operands 2017-12-31 18:03:36 +01:00
zgemv_n_microk_haswell-4.c Tag %1 and %2 as both input and output 2017-12-29 23:56:41 +01:00
zgemv_n_microk_sandy-4.c Use .p2align instead of .align for compatibility on Sandybridge as well 2018-02-24 19:43:15 +01:00
zgemv_t.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemv_t_4.c Work around windows/osx gcc12 x86_64 tree-optimizer problem and add an osx/gcc12 build to Azure CI (#3745) 2022-09-03 15:01:22 +02:00
zgemv_t_atom.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemv_t_dup.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zgemv_t_microk_bulldozer-4.c Tag %1 and %2 as both input and output operands 2017-12-31 18:03:36 +01:00
zgemv_t_microk_haswell-4.c Tag %1 and %2 as both input and output 2017-12-29 23:56:41 +01:00
znrm2.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
znrm2_sse.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zrot.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zrot_sse.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zrot_sse2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zscal.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
zscal.c x86_64: prevent GCC and Clang from generating FMAs in cscal/zscal. 2022-10-27 18:16:43 -04:00
zscal_atom.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zscal_microk_bulldozer-2.c Fix declaration of input arguments in the x86_64 SCAL microkernels (#1966) 2019-01-18 08:11:07 +01:00
zscal_microk_haswell-2.c Fix declaration of input arguments in the x86_64 SCAL microkernels (#1966) 2019-01-18 08:11:07 +01:00
zscal_microk_steamroller-2.c Fix declaration of input arguments in the x86_64 SCAL microkernels (#1966) 2019-01-18 08:11:07 +01:00
zscal_sse.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zscal_sse2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zsum.S use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
zswap.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zswap_sse.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zswap_sse2.S Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
zsymv_L_sse.S initial support for Sapphire Rapids platform 2021-10-12 01:30:40 -07:00
zsymv_L_sse2.S fix unsafe read of Y in assembly kernel 2022-03-11 11:56:33 -06:00
zsymv_U_sse.S initial support for Sapphire Rapids platform 2021-10-12 01:30:40 -07:00
zsymv_U_sse2.S initial support for Sapphire Rapids platform 2021-10-12 01:30:40 -07:00
ztrsm_kernel_LN_2x1_atom.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
ztrsm_kernel_LN_2x2_core2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
ztrsm_kernel_LN_2x2_penryn.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
ztrsm_kernel_LN_2x2_sse2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
ztrsm_kernel_LN_2x2_sse3.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
ztrsm_kernel_LN_2x4_nehalem.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
ztrsm_kernel_LN_4x2_sse.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
ztrsm_kernel_LN_bulldozer.c added optimized trsm_kernels 2016-01-05 13:05:05 +01:00
ztrsm_kernel_LT_1x4_nehalem.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
ztrsm_kernel_LT_2x1_atom.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
ztrsm_kernel_LT_2x2_core2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
ztrsm_kernel_LT_2x2_penryn.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
ztrsm_kernel_LT_2x2_sse2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
ztrsm_kernel_LT_2x2_sse3.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
ztrsm_kernel_LT_2x4_nehalem.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
ztrsm_kernel_LT_4x2_sse.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
ztrsm_kernel_LT_bulldozer.c added optimized trsm_kernels 2016-01-05 13:05:05 +01:00
ztrsm_kernel_RN_bulldozer.c added optimized trsm_kernels 2016-01-05 13:05:05 +01:00
ztrsm_kernel_RT_1x4_nehalem.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
ztrsm_kernel_RT_2x2_core2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
ztrsm_kernel_RT_2x2_penryn.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
ztrsm_kernel_RT_2x2_sse2.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
ztrsm_kernel_RT_2x2_sse3.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
ztrsm_kernel_RT_2x4_nehalem.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
ztrsm_kernel_RT_4x2_sse.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
ztrsm_kernel_RT_bulldozer.c added optimized trsm_kernels 2016-01-05 13:05:05 +01:00