OpenBLAS

Commit Graph

Author	SHA1	Message	Date
Bart Oldeman	b073d759d0	x86_64: clobber all xmm registers after vzeroupper. As observed using GCC 10 with -march=native -ftree-vectorize on Knights Landing, it is now smart enough to find clobbers inside non-inlined static functions. In particular, sgemv counted on a kernel to preserve the whole %ymm2 register (since it was not in the clobber list), but the top part was destroyed by vzeroupper. This caused many tests to fail. This patch makes sure all xmm (and ymm/zmm by extension) registers are listed as clobbered to avoid this happening, as most kernels already did correctly in fact.	2020-10-20 02:16:47 +00:00
Martin Kroeker	4255a58cd2	Rename operands to put lda on the input/output constraint list	2019-02-15 10:10:04 +01:00
Martin Kroeker	46e415b140	Save and restore input argument 8 (lda4). Fixes miscompilation with gcc9 -ftree-vectorize (related to issue #2009)	2019-02-14 22:43:18 +01:00
Martin Kroeker	723f396a20	Tag %1 and %2 as both input and output. The inline assembly modifies its input operands, so mark them as output to avoid surprises with optimization. Fixes #1292	2017-12-29 23:56:41 +01:00
Zhang Xianyi	6e7be06e07	Refs JuliaLang/julia#5728. Fix gemv performance bug on Haswell Mac OS X. On Mac OS X, it should use .align 4 (equal to .align 16 on Linux). I didn't get the performance benefit from .align. Thus, I deleted it.	2016-02-19 17:56:07 -05:00
Werner Saar	bc5fff7085	changed inline assembler labels to short form	2014-12-07 12:38:54 +01:00
wernsaar	7c0a94ff47	bugfix in sgemv_n_microk_haswell-4.c	2014-09-08 10:54:33 +02:00
wernsaar	cbbc80aad3	added optimized sgemv_t kernel for haswell	2014-09-08 10:13:39 +02:00
wernsaar	cf5544b417	optimization for small size	2014-09-06 13:17:56 +02:00
wernsaar	d143f84dd2	added optimized sgemv_n kernel for haswell	2014-09-06 12:08:48 +02:00
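Several of the messages above (723f396a20, 46e415b140, b073d759d0) concern GCC extended-asm constraints: operands that the assembly modifies must be declared as outputs, and every register the assembly overwrites must appear in the clobber list. The following is a minimal sketch of that pattern, not code from the OpenBLAS kernels. The function name `add4_advance` is hypothetical, and it uses plain SSE instructions so that it runs on any x86_64 CPU. In an AVX kernel that ends with vzeroupper, the clobber list would need to name every xmm register (xmm0 through xmm15), as the b073d759d0 message describes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper for illustration only (not part of OpenBLAS):
 * computes y[0..3] += x[0..3] and advances both pointers by four floats.
 * The pointer advance happens inside the asm, so both operands are
 * declared "+r" (read and written) rather than as plain inputs. */
static void add4_advance(const float **x, float **y)
{
    assert(x != NULL && y != NULL && *x != NULL && *y != NULL);

    const float *xp = *x;
    float *yp = *y;

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    __asm__ __volatile__(
        "movups (%0), %%xmm0   \n\t"  /* load 4 floats from x          */
        "movups (%1), %%xmm1   \n\t"  /* load 4 floats from y          */
        "addps  %%xmm0, %%xmm1 \n\t"  /* xmm1 = y + x                  */
        "movups %%xmm1, (%1)   \n\t"  /* store the sums back to y      */
        "addq   $16, %0        \n\t"  /* the asm modifies operand 0... */
        "addq   $16, %1        \n\t"  /* ...and operand 1              */
        : "+r"(xp), "+r"(yp)          /* both input and output         */
        :
        : "cc", "memory",
          "xmm0", "xmm1"              /* every register the asm writes */
    );
#else
    /* Portable fallback so the sketch still builds on other targets. */
    for (int i = 0; i < 4; i++)
        yp[i] += xp[i];
    xp += 4;
    yp += 4;
#endif

    *x = xp;
    *y = yp;
}
```

If the pointers were declared as inputs only, or if xmm0 and xmm1 were left out of the clobber list, the compiler would be free to assume they still hold their old values after the asm statement. That is the kind of assumption the commit messages report breaking under -ftree-vectorize.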

10 Commits