OpenBLAS

Author	SHA1	Message	Date
Bart Oldeman	5c3169ecd8	dscal: use ymm registers in Haswell microkernel. Using 256-bit registers in dscal makes this microkernel consistent with cscal and zscal, and generally doubles performance if the vector fits in L1 cache.	2022-12-01 07:48:05 -05:00
Martin Kroeker	b495e54310	Fix declaration of input arguments in the x86_64 SCAL microkernels (#1966). Tag arguments 0 and 1 as both input and output (see #1964).	2019-01-18 08:11:07 +01:00
Martin Kroeker	2359c7c1a9	Use .p2align instead of .align for portability. The OSX assembler apparently mishandles the argument to decimal .align, leading to a significant loss of performance as observed in #730, #901 and most recently #1470.	2018-02-24 17:50:13 +01:00
Werner Saar	02e772c7e4	added optimized dscal kernel for haswell	2015-05-12 17:19:58 +02:00
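
The constraint and alignment changes in the log above can be illustrated with a small sketch. This is not the OpenBLAS kernel: `dscal_sketch` is a hypothetical helper, it uses 128-bit xmm registers (SSE2, the x86_64 baseline) rather than the 256-bit ymm registers of the Haswell microkernel, and it assumes a GCC-compatible compiler, with a plain C loop as fallback elsewhere. It shows why the counter and pointer operands are tagged `"+r"` (the loop modifies both, so they are inputs and outputs) and where `.p2align` sits in front of the loop label.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not the OpenBLAS microkernel: x[i] *= alpha
 * for i in [0, n). Assumes n is a positive multiple of 2. */
static void dscal_sketch(long long n, double alpha, double *x)
{
    assert(n > 0 && n % 2 == 0);
#if defined(__GNUC__) && defined(__x86_64__)
    __asm__ __volatile__(
        "movsd    %2, %%xmm0      \n\t" /* load alpha into the low lane   */
        "unpcklpd %%xmm0, %%xmm0  \n\t" /* broadcast alpha to both lanes  */
        ".p2align 4               \n\t" /* 2^4 = 16-byte loop alignment   */
        "1:                       \n\t"
        "movupd   (%1), %%xmm1    \n\t" /* load two doubles               */
        "mulpd    %%xmm0, %%xmm1  \n\t" /* scale them                     */
        "movupd   %%xmm1, (%1)    \n\t" /* store them back                */
        "addq     $16, %1         \n\t" /* advance the pointer: x changes */
        "subq     $2, %0          \n\t" /* decrement the counter: n too   */
        "jnz      1b              \n\t"
        : "+r"(n), "+r"(x)              /* operands 0 and 1: read AND written */
        : "m"(alpha)                    /* operand 2: read only           */
        : "cc", "memory", "xmm0", "xmm1");
#else
    /* Portable fallback so the sketch also builds on other targets. */
    for (long long i = 0; i < n; i++)
        x[i] *= alpha;
#endif
}
```

Declaring `n` and `x` as input-only operands would let the compiler assume their registers still hold the original values after the asm statement, which is the class of bug the `#1966` entry describes. `.p2align 4` requests alignment as a power of two, so its meaning does not depend on how a given assembler interprets the argument of `.align`.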