Commit Graph

4 Commits

Author SHA1 Message Date
Bart Oldeman 5c3169ecd8 dscal: use ymm registers in Haswell microkernel
Using 256-bit registers in dscal makes this microkernel consistent with
cscal and zscal, and generally doubles performance if the vector fits in
L1 cache.
2022-12-01 07:48:05 -05:00
Martin Kroeker b495e54310
Fix declaration of input arguments in the x86_64 SCAL microkernels (#1966)
* Tag arguments 0 and 1 as both input and output (see #1964)
2019-01-18 08:11:07 +01:00
Martin Kroeker 2359c7c1a9
Use .p2align instead of .align for portability
The OSX assembler apparently mishandles the argument to decimal .align, leading to a significant loss of performance 
as observed in #730, #901 and most recently #1470
2018-02-24 17:50:13 +01:00
Werner Saar 02e772c7e4 added optimized dscal kernel for haswell 2015-05-12 17:19:58 +02:00