Commit Graph

648 Commits

Author SHA1 Message Date
Martin Kroeker 3d511f0e66
replace spurious avx512 requirement with fma check 2021-04-26 21:55:30 +02:00
Martin Kroeker 2dfb24730d
Use "old" compute(24) function with clang due to register limitations 2021-04-06 19:58:32 +02:00
Martin Kroeker 7b8f580941
Merge pull request #3156 from martin-frbg/omatcopy_d
Move x86_64 DOMATCOPY_RT back to the C implementation
2021-03-19 15:22:48 +01:00
Martin Kroeker 0f5e86a0d9
Remove premature entry for DOMATCOPY_RT 2021-03-18 21:53:50 +01:00
Martin Kroeker 7b294a99fd
Move common.h back to the top of the file so that SKYLAKEX (from config.h) is defined in time 2021-03-18 21:28:19 +01:00
Martin Kroeker 0934568d9c
Move includes under the ifdef for compilers w/o intrinsics support 2021-03-12 12:42:05 +01:00
Martin Kroeker a9f6f7ad39
Remove spurious AVX512 requirement and add AVX2/FMA3 guard 2021-03-06 14:35:49 +01:00
Martin Kroeker 292d1af1a0
Update omatcopy_rt.c 2021-02-24 09:34:14 +01:00
Martin Kroeker 325b398e3c
Update omatcopy_rt.c 2021-02-24 09:13:12 +01:00
Martin Kroeker 6f5667b4d4
Enable optimized S/D OMATCOPY_RT 2021-02-24 09:03:41 +01:00
Martin Kroeker cceeee7806
Add optimized omatcopy_rt 2021-02-24 09:00:54 +01:00
Martin Kroeker 47691c031f
Use Haswell optimizations for Zen as well 2021-02-11 09:26:15 +01:00
Martin Kroeker ce7ddd8921
Use Haswell optimizations for Zen as well 2021-02-11 09:25:36 +01:00
Martin Kroeker 950c047b49
Use Haswell optimizations for Zen as well 2021-02-11 09:24:51 +01:00
Martin Kroeker 46509953a9
Use Haswell optimizations for Zen as well 2021-02-11 09:24:16 +01:00
Martin Kroeker db348dcff2
Enable optimized srot/drot kernels from Haswell 2021-02-11 09:23:05 +01:00
Martin Kroeker 69a5558203
Merge pull request #3059 from Guobing-Chen/BF16_gemm
Initial code for Cooperlake BF16 GEMM kernel
2021-01-23 19:08:05 +01:00
Alex Henrie 202fc9e8ed Fix uninitialized argument value in dasum_k 2021-01-14 19:40:31 -07:00
Chen, Guobing b0beb0b1ca Initial code for Cooperlake BF16 GEMM kernel 2021-01-11 02:15:21 +08:00
Martin Kroeker 114eb159a4
Disable FMA intrinsics in the srot kernel when the compiler is PGI/NVIDIA 2020-12-19 22:15:58 +01:00
Martin Kroeker 441c08c9ff
Merge pull request #3016 from xiegengxin/complex-asum
Improve the performance of zasum and casum with AVX512 intrinsic
2020-12-04 22:07:16 +01:00
Gengxin Xie 0cb7a403b2 fix error declare function blas_level1_thread_with_return_value 2020-12-02 09:51:52 +08:00
Gengxin Xie b766c1e9bb Improve the performance of zasum and casum with AVX512 intrinsic 2020-12-01 16:49:26 +08:00
Martin Kroeker f1bf040b25
Merge pull request #2988 from xiegengxin/smp-asum
Improve the performance of dasum and sasum when SMP is defined
2020-11-22 12:24:13 +01:00
Gengxin Xie d6e7e05bb3 Improve the performance of dasum and sasum when SMP is defined 2020-11-13 14:20:52 +08:00
Qiyu8 a87e537b8c modify macro 2020-11-11 15:53:48 +08:00
Qiyu8 5bc0a7583f only FMA3 and vector larger than 128 have positive effects. 2020-11-11 15:18:01 +08:00
Qiyu8 8c0b206d4c Optimize the performance of rot by using universal intrinsics 2020-11-11 14:33:12 +08:00
Martin Kroeker ff16329cb7
Merge pull request #2972 from xiegengxin/rot-intrinsic
Improve the performance of rot by using AVX512 and AVX2 intrinsic
2020-11-08 22:43:00 +01:00
Gengxin Xie 725ffbf041 fix typo 2020-11-05 16:25:17 +08:00
Gengxin Xie d9ba49165a Improve the performance of rot by using AVX512 and AVX2 intrinsic 2020-11-05 15:12:36 +08:00
Chen, Guobing a7b1f9b1bb Implementation of BF16 based gemv
1. Add a new API -- sbgemv to support bfloat16 based gemv
2. Implement a generic kernel for sbgemv
3. Implement an avx512-bf16 based kernel for sbgemv

Signed-off-by: Chen, Guobing <guobing.chen@intel.com>
2020-10-29 02:08:23 +08:00
İsmail Dönmez 4a1d00f589
Fix build with -Werror=return-type
dgemm_tcopy_16_skylakex.c CNAME function should return an int, add a
return 0 similar to other files.
2020-10-21 08:43:39 +02:00
Bart Oldeman b073d759d0 x86_64: clobber all xmm registers after vzeroupper
As observed using GCC 10 using -march=native -ftree-vectorize
on Knights Landing, it is now smart enough to find clobbers inside
non-inlined static functions.

In particular, sgemv counted on a kernel to preserve the whole
%ymm2 register (since it was not in the clobber list), but the top
part was destroyed by vzeroupper. This caused many tests to fail.

This patch makes sure all xmm (and ymm/zmm by extension) registers
are listed as clobbered to avoid this happening, as most kernels
already did correctly in fact.
2020-10-20 02:16:47 +00:00
Bart Oldeman 03e781b766 sgemm_direct_skylakex: fix 75eeb26 regression.
The
`#if defined(SKYLAKEX) || defined (COOPERLAKE)`
from that commit was before #include "common.h" so caused the
compiled function to be empty, returning garbage results for
qualifying sgemm's on those architectures.

Closes #2914
2020-10-18 19:58:07 +00:00
Martin Kroeker c339c40c01
Silence a redefinition warning 2020-10-15 19:08:12 +02:00
Qiyu8 bfdf4b56da Add double precision universal intrinsics for X86/ARM 2020-10-15 10:29:42 +08:00
Martin Kroeker 756802df61
Merge pull request #2890 from martin-frbg/s-d-sum
Revert special handling of Windows xNRM2 and enable C+intrinsics kern…
2020-10-14 09:02:03 +02:00
Martin Kroeker 8d2df7d066
Revert special handling of Windows xNRM2 and enable C+intrinsics kernel for SSUM/DSUM 2020-10-13 00:14:29 +02:00
Martin Kroeker 08929430cd
Merge pull request #2886 from martin-frbg/issue_2767
Rename "HALF" precision functions (sh prefix) to "BFLOAT16" with "sb" prefix
2020-10-13 00:04:35 +02:00
Martin Kroeker 0c84ffe05f
Merge pull request #2881 from mattip/fninit
add fninit to reset fpu registers before assembler routines
2020-10-12 23:50:41 +02:00
Matti Picus 403eb513a0 use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
Martin Kroeker dc8a1afa63
Rename "HALF" and "sh" to "BFLOAT16" and "sb" 2020-10-11 23:53:50 +02:00
Martin Kroeker fd94236042
Rename "HALF" and "sh" to "BFLOAT16" and "sb" 2020-10-11 23:42:07 +02:00
Martin Kroeker 68ce719fac
Rename shdot_microk_cooperlake.c to sbdot_microk_cooperlake.c 2020-10-11 23:41:13 +02:00
Martin Kroeker d7dd9b396c
Rename shdot.c to sbdot.c 2020-10-11 23:40:43 +02:00
Martin Kroeker 7812486091
Use generic C for D/Z nrm2 kernels on Windows to work around fpu exception bug 2020-10-06 21:33:16 +02:00
Matti Picus a5b164946c add fninit to reset fpu registers before assembler routines 2020-10-05 22:13:25 +03:00
Qiyu8 14f7dad3b7 performance improved 2020-09-22 16:52:15 +08:00
Qiyu8 325b539c26 Optimize the performance of daxpy by using universal intrinsics 2020-09-22 10:38:35 +08:00