On 32-bit x86, gcc 4.4.3 generated wrong code (movsd) for the movlps at line 191 of zdot_sse2.S.
This caused zdotu and zdotc failures. Use movlpd instead to work around it. Fixed #8. Fixed #9.
@@ -1188,8 +1188,8 @@
 	testl	$1, N
 	jle	.L48

-	movlps	-16 * SIZE(X), %xmm4
-	movlps	-16 * SIZE(Y), %xmm6
+	movlpd	-16 * SIZE(X), %xmm4
+	movlpd	-16 * SIZE(Y), %xmm6

 	pshufd	$0x4e, %xmm6, %xmm3
 	mulpd	%xmm4, %xmm6
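For readers unfamiliar with these loads, the sketch below models in plain C how the three instructions named in the commit message treat a 128-bit register when the source is memory: movlps and movlpd write the low 64 bits and leave the high 64 bits unchanged, while movsd writes the low 64 bits and zeroes the high 64 bits. The type `xmm_model` and the function names are hypothetical, introduced only for illustration; this is not code from zdot_sse2.S, and the commit does not say which register contents actually triggered the zdotu and zdotc failures.

```c
#include <assert.h>

/* Hypothetical model of one XMM register viewed as two doubles. */
typedef struct {
    double lo; /* bits 63:0   */
    double hi; /* bits 127:64 */
} xmm_model;

/* movlps / movlpd with a memory source:
 * replace the low 64 bits, keep the high 64 bits as they were. */
xmm_model load_low_keep_high(xmm_model reg, const double *mem) {
    reg.lo = *mem;
    return reg;
}

/* movsd with a memory source:
 * replace the low 64 bits, clear the high 64 bits. */
xmm_model load_low_zero_high(xmm_model reg, const double *mem) {
    (void)reg; /* the previous register contents are discarded */
    xmm_model out = { *mem, 0.0 };
    return out;
}

/* Small self-check showing where the two behaviours diverge. */
void model_self_check(void) {
    const double x = 3.0;
    const xmm_model before = { 1.0, 2.0 };

    xmm_model kept = load_low_keep_high(before, &x);
    xmm_model zeroed = load_low_zero_high(before, &x);

    assert(kept.lo == 3.0 && kept.hi == 2.0);     /* high half survives */
    assert(zeroed.lo == 3.0 && zeroed.hi == 0.0); /* high half is lost  */
}
```

In this model the two loads agree on the low half and differ only in the high half, so swapping one for the other is harmless only when nothing later reads the old high 64 bits.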