Commit Graph

10 Commits

Author SHA1 Message Date
Bart Oldeman 5ceca1a4d8 Add sscal.c + microkernels for Haswell, Zen, Skylake and newer.
Unlike [dcz]scal, sscal still used the original GotoBLAS SSE code from scal_sse.S.
This code follows dscal as closely as possible, except for the inc_x > 1 code
for which a plain C loop is used much like the one in cscal.c, instead of an
adaptation of the SSE2 asm code of dscal.c (I tried but the performance wasn't
better than the plain C loop).
2022-12-06 14:05:49 -05:00
Martin Kroeker db348dcff2
Enable optimized srot/drot kernels from Haswell 2021-02-11 09:23:05 +01:00
wjc404 cdc0e9011e
Update KERNEL.ZEN 2020-03-16 16:39:37 +00:00
wjc404 a2ff577a30
Update KERNEL.ZEN 2020-02-22 23:39:43 +08:00
wjc404 92b10212de
optimize AVX2 SGEMM 2020-01-06 12:11:21 +08:00
wjc404 f60840c420
Update KERNEL.ZEN 2019-12-30 16:04:23 +08:00
wjc404 3a66c8cac1
Update KERNEL.ZEN 2019-12-27 18:04:08 +08:00
wjc404 2cd9306bb5
Update KERNEL.ZEN 2019-12-23 23:42:30 +08:00
Martin Kroeker c92cd6d162
Add trivially optimized dsdot based on sdot 2017-11-24 20:05:27 +01:00
Denis Steckelmacher c9ff735da6 Add ZEN support (tested for auto-detected static backend) 2017-03-19 15:32:50 +01:00