On 32-bit x86, gcc 4.4.3 generated the wrong instruction (movsd) from movlps in zdot_sse2.S line 191.

This would cause zdotu & zdotc failures. Use movlpd instead to work around it. Fixed #8. Fixed #9.
Xianyi 2011-03-02 18:45:30 +08:00
parent 44acb7503e
commit 36016fe349
2 changed files with 5 additions and 2 deletions


@@ -13,6 +13,9 @@ common:
     * Imported GotoBLAS2 1.13 BSD version
 x86/x86 64:
+    * On 32-bit x86, gcc 4.4.3 generated the wrong instruction (movsd) from movlps
+      in zdot_sse2.S line 191. This would cause zdotu & zdotc failures.
+      Use movlpd instead to work around it. (Refs issue #8 #9 on github)
     * Modified ?axpy functions to return same netlib BLAS results
       when incx==0 or incy==0 (Refs issue #7 on github)
     * Modified ?swap functions to return same netlib BLAS results


@@ -1188,8 +1188,8 @@
     testl   $1, N
     jle     .L48
-    movlps  -16 * SIZE(X), %xmm4
-    movlps  -16 * SIZE(Y), %xmm6
+    movlpd  -16 * SIZE(X), %xmm4
+    movlpd  -16 * SIZE(Y), %xmm6
     pshufd  $0x4e, %xmm6, %xmm3
     mulpd   %xmm4, %xmm6
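
Why a substituted movsd can matter: in the x86 instruction set, a memory-to-register movsd writes the low 64 bits of the XMM register and zeroes the high 64 bits, while movlps and movlpd write the low 64 bits and leave the high 64 bits unchanged. The following is a minimal sketch of that difference as a toy Python model, not OpenBLAS code. The register is modeled as a (low, high) tuple of doubles; the helper names, the sample value, and the load order are assumptions made for illustration and are not taken from zdot_sse2.S.

```python
# Toy model of an SSE2 XMM register holding two doubles: (low, high).
# It mirrors only the documented load semantics of the instructions.
# All names and values below are hypothetical, for illustration only.


def movlpd_load(reg, value):
    """Model of `movlpd mem, xmm` (movlps moves the same 64 bits):
    replace the low half, keep the high half."""
    _low, high = reg
    return (value, high)


def movhpd_load(reg, value):
    """Model of `movhpd mem, xmm`: replace the high half, keep the low half."""
    low, _high = reg
    return (low, value)


def movsd_load(reg, value):
    """Model of `movsd mem, xmm`: replace the low half, zero the high half."""
    return (value, 0.0)


# Hypothetical complex double x = 3.0 + 4.0i, stored as [real, imag].
x = [3.0, 4.0]

# Assumed load order: the high half is filled first, then the low half.
reg = movhpd_load((0.0, 0.0), x[1])

kept = movlpd_load(reg, x[0])       # high half survives the low load
clobbered = movsd_load(reg, x[0])   # high half is zeroed by the low load

print("movlpd:", kept)
print("movsd: ", clobbered)
```

In this model the movlpd path keeps both halves of the value, while the movsd path loses whatever the high half held before the load. Code that relies on the high half being preserved therefore changes behavior if the assembler emits movsd in place of movlps.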