10 Commits

Author SHA1 Message Date
Bart Oldeman
5ceca1a4d8 Add sscal.c + microkernels for Haswell, Zen, Skylake and newer.
Unlike [dcz]scal, sscal still used the original GotoBLAS SSE code from scal_sse.S.
This code follows dscal as closely as possible, except for the inc_x > 1 code
for which a plain C loop is used much like the one in cscal.c, instead of an
adaptation of the SSE2 asm code of dscal.c (I tried but the performance wasn't
better than the plain C loop).
2022-12-06 14:05:49 -05:00
Martin Kroeker
db348dcff2 Enable optimized srot/drot kernels from Haswell 2021-02-11 09:23:05 +01:00
wjc404
cdc0e9011e Update KERNEL.ZEN 2020-03-16 16:39:37 +00:00
wjc404
a2ff577a30 Update KERNEL.ZEN 2020-02-22 23:39:43 +08:00
wjc404
92b10212de optimize AVX2 SGEMM 2020-01-06 12:11:21 +08:00
wjc404
f60840c420 Update KERNEL.ZEN 2019-12-30 16:04:23 +08:00
wjc404
3a66c8cac1 Update KERNEL.ZEN 2019-12-27 18:04:08 +08:00
wjc404
2cd9306bb5 Update KERNEL.ZEN 2019-12-23 23:42:30 +08:00
Martin Kroeker
c92cd6d162 Add trivially optimized dsdot based on sdot 2017-11-24 20:05:27 +01:00
Denis Steckelmacher
c9ff735da6 Add ZEN support (tested for auto-detected static backend) 2017-03-19 15:32:50 +01:00