On 32-bit x86, gcc 4.4.3 generated wrong code (movsd) from movlps in zdot_sse2.S line 191.
This caused zdotu and zdotc failures. Use movlpd instead to work around it. Fixed #8. Fixed #9.
parent 44acb7503e
commit 36016fe349
@@ -13,6 +13,9 @@ common:
 	* Imported GotoBLAS2 1.13 BSD version
 
 x86/x86_64:
+	* On 32-bit x86, gcc 4.4.3 generated wrong code (movsd) from movlps
+	  in zdot_sse2.S line 191. This would cause zdotu & zdotc failures.
+	  Instead, work around it. (Refs issue #8 #9 on github)
 	* Modified ?axpy functions to return same netlib BLAS results
 	  when incx==0 or incy==0 (Refs issue #7 on github)
 	* Modified ?swap functions to return same netlib BLAS results
@@ -1188,8 +1188,8 @@
 	testl	$1, N
 	jle	.L48
 
-	movlps	-16 * SIZE(X), %xmm4
-	movlps	-16 * SIZE(Y), %xmm6
+	movlpd	-16 * SIZE(X), %xmm4
+	movlpd	-16 * SIZE(Y), %xmm6
 
 	pshufd	$0x4e, %xmm6, %xmm3
 	mulpd	%xmm4, %xmm6
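For readers unfamiliar with the three instructions involved: per the x86 instruction set reference, a movsd load from memory clears the upper 64 bits of the destination XMM register, while movlps and movlpd loads write only the lower 64 bits and leave the upper half unchanged. The commit does not state which property led to the zdotu/zdotc failures, so the plain-C model below only illustrates that documented difference. It is a sketch, not code from this repository, and the type and function names (`xmm_model`, `load_low_keep_high`, `load_low_zero_high`, `check_models`) are hypothetical.

```c
#include <assert.h>

/* Hypothetical model of a 128-bit XMM register as two 64-bit double lanes. */
typedef struct {
    double lo; /* bits 0..63   */
    double hi; /* bits 64..127 */
} xmm_model;

/* Models a movlpd/movlps load from memory:
 * the low lane is overwritten, the high lane is preserved. */
static xmm_model load_low_keep_high(xmm_model reg, const double *mem) {
    reg.lo = *mem;
    return reg;
}

/* Models a movsd load from memory:
 * the low lane is overwritten, the high lane is cleared to zero. */
static xmm_model load_low_zero_high(xmm_model reg, const double *mem) {
    reg.lo = *mem;
    reg.hi = 0.0;
    return reg;
}

/* Small self-check showing where the two models diverge. */
static void check_models(void) {
    const double value = 3.0;
    const xmm_model before = { 1.0, 2.0 };

    xmm_model kept = load_low_keep_high(before, &value);
    xmm_model zeroed = load_low_zero_high(before, &value);

    assert(kept.lo == 3.0 && kept.hi == 2.0);     /* high lane survives */
    assert(zeroed.lo == 3.0 && zeroed.hi == 0.0); /* high lane is lost  */
}
```

Under this model, any code that relies on the upper lane still holding earlier data would see a different register state if a movsd load were substituted for a movlps or movlpd load.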