Martin Kroeker
|
4255a58cd2
|
Rename operands to put lda on the input/output constraint list
|
2019-02-15 10:10:04 +01:00 |
Martin Kroeker
|
46e415b140
|
Save and restore input argument 8 (lda4)
Fixes miscompilation with gcc9 -ftree-vectorize (related to issue #2009)
|
2019-02-14 22:43:18 +01:00 |
Martin Kroeker
|
723f396a20
|
Tag %1 and %2 as both input and output
The inline assembly modifies its input operands, so mark them as output to avoid surprises with optimization. Fixes #1292
|
2017-12-29 23:56:41 +01:00 |
Zhang Xianyi
|
6e7be06e07
|
Refs JuliaLang/julia#5728. Fix gemv performance bug on Haswell Mac OSX.
On Mac OS X, it should use .align 4 (equal to .align 16 on Linux).
I didn't get the performance benefit from .align. Thus, I deleted it.
|
2016-02-19 17:56:07 -05:00 |
Werner Saar
|
bc5fff7085
|
changed inline assembler labels to short form
|
2014-12-07 12:38:54 +01:00 |
wernsaar
|
7c0a94ff47
|
bugfix in sgemv_n_microk_haswell-4.c
|
2014-09-08 10:54:33 +02:00 |
wernsaar
|
cbbc80aad3
|
added optimized sgemv_t kernel for haswell
|
2014-09-08 10:13:39 +02:00 |
wernsaar
|
cf5544b417
|
optimization for small size
|
2014-09-06 13:17:56 +02:00 |
wernsaar
|
d143f84dd2
|
added optimized sgemv_n kernel for haswell
|
2014-09-06 12:08:48 +02:00 |
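
The commit "Tag %1 and %2 as both input and output" (723f396a20) notes that inline assembly which modifies its input operands should declare them as outputs too. The following is a minimal sketch of that idea, not the kernel's actual code: the function `sum_i64` and its operand names are hypothetical and chosen only for illustration. It assumes GCC or Clang on x86-64 and falls back to plain C elsewhere. The pointer and the counter are both changed inside the asm, so they carry the `"+r"` (read-write) constraint rather than appearing on the input-only list. The loop also uses short-form numeric local labels (`1:`, `2:`) of the kind mentioned in bc5fff7085.

```c
#include <stdint.h>

/* Hypothetical example: sum n 64-bit integers starting at p. */
int64_t sum_i64(const int64_t *p, int64_t n)
{
    int64_t acc = 0;
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    __asm__ volatile (
        "testq %[n], %[n]      \n\t"  /* nothing to do for n == 0        */
        "jz    2f              \n\t"
        "1:                    \n\t"  /* short-form local label          */
        "addq  (%[p]), %[acc]  \n\t"  /* acc += *p                       */
        "addq  $8, %[p]        \n\t"  /* the asm advances the pointer    */
        "decq  %[n]            \n\t"  /* ...and decrements the counter   */
        "jnz   1b              \n\t"
        "2:                    \n\t"
        /* All three operands are read AND written by the asm, so they
           are declared "+r". Listing p and n as plain inputs would let
           the compiler assume their registers are unchanged afterwards. */
        : [acc] "+r" (acc), [p] "+r" (p), [n] "+r" (n)
        : /* no input-only operands */
        : "cc", "memory"
    );
#else
    /* Portable fallback with the same behavior. */
    while (n-- > 0)
        acc += *p++;
#endif
    return acc;
}
```

Because `p` and `n` are local copies of the arguments, the caller's values are unaffected. The constraint only tells the compiler that the registers holding them no longer contain the original values once the asm statement finishes.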