51 Commits

Author SHA1 Message Date
Martin Kroeker
100d94f94e Add ?sum 2019-03-31 13:55:05 +02:00
Martin Kroeker
e3bc83f2a8 Add x86 implementation of ?sum
as trivial copy of ?asum with the fabs calls removed
2019-03-30 22:26:10 +01:00
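
The relationship between ?asum and ?sum described in the commit above can be sketched in plain C. This is only an illustration, not the x86 kernel itself: the names `asum_sketch` and `sum_sketch` are hypothetical, and the early return for non-positive `n` or `incx` follows the reference BLAS convention for ?asum, which is assumed here to carry over to ?sum.

```c
#include <math.h>
#include <stddef.h>

/* Sum of absolute values over a strided vector (hypothetical name). */
double asum_sketch(ptrdiff_t n, const double *x, ptrdiff_t incx)
{
    double acc = 0.0;
    if (n <= 0 || incx <= 0)
        return 0.0;               /* reference-BLAS style early exit */
    for (ptrdiff_t i = 0; i < n; i++)
        acc += fabs(x[i * incx]); /* the only line that differs */
    return acc;
}

/* Plain sum: the same loop with the fabs call removed (hypothetical name). */
double sum_sketch(ptrdiff_t n, const double *x, ptrdiff_t incx)
{
    double acc = 0.0;
    if (n <= 0 || incx <= 0)      /* assumption: same guard as asum */
        return 0.0;
    for (ptrdiff_t i = 0; i < n; i++)
        acc += x[i * incx];
    return acc;
}
```

For the vector {1, -2, 3, -4} with unit stride, the first function accumulates the magnitudes and the second keeps the signs, so the two results differ whenever the input contains negative entries.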
Martin Kroeker
0023515733 Typo fix (misplaced parenthesis) 2018-06-03 13:22:59 +02:00
Arjan van de Ven
99c7bba8e4 Initial support for SkylakeX / AVX512
This patch adds the basic infrastructure for adding the SkylakeX (Intel Skylake server)
target. The SkylakeX target will use the AVX512 (AVX512VL level) instruction set,
which brings 2 basic things:
1) 512 bit wide SIMD (2x width of AVX2)
2) 32 SIMD registers (2x the number on AVX2)

This initial patch only contains a trivial transformation of the Haswell SGEMM kernel
to AVX512VL; more will follow later but this patch aims to get the infrastructure
in place for this "later".

Full performance tuning has not been done yet; with more registers and wider SIMD
it's in theory possible to retune the kernels but even without that there's an
interesting enough performance increase (30-40% range) with just this change.
2018-06-03 07:58:52 +00:00
Martin Kroeker
7df8c4f76f typo fix 2018-05-31 17:23:08 +02:00
Martin Kroeker
2fc748bf72 Restore optimized swap kernel now that we have a proper fix 2018-05-31 13:41:12 +02:00
Martin Kroeker
d1b7be14aa Handle INCX=0,INCY=0 case
Fixes #1575 (sswap/dswap failing the swap utest on x86) as suggested by atsampson.
2018-05-31 12:52:04 +02:00
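
To see why INCX=0, INCY=0 is a corner case for swap, here is a plain strided loop in the style of the reference BLAS. It is a sketch, not the fix that was committed: `swap_sketch` is a hypothetical name, and the start-index handling for negative increments follows the reference BLAS convention. With both increments zero, every iteration touches the same pair of elements, which an unrolled or vectorized kernel that assumes distinct addresses may not reproduce.

```c
#include <stddef.h>

/* Strided swap of x and y (hypothetical name, double precision only). */
void swap_sketch(ptrdiff_t n, double *x, ptrdiff_t incx,
                 double *y, ptrdiff_t incy)
{
    if (n <= 0)
        return;
    /* Reference-BLAS convention: negative strides start from the far end. */
    ptrdiff_t ix = incx < 0 ? (1 - n) * incx : 0;
    ptrdiff_t iy = incy < 0 ? (1 - n) * incy : 0;
    for (ptrdiff_t i = 0; i < n; i++) {
        double tmp = x[ix];
        x[ix] = y[iy];
        y[iy] = tmp;
        ix += incx;   /* with incx == 0 the index never moves */
        iy += incy;
    }
}
```

In this loop, calling the function with zero increments swaps x[0] and y[0] n times, so the pair ends up exchanged for odd n and unchanged for even n.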
Martin Kroeker
28ac9ea5a6 Use generic/dot.c instead of the inferior arm/dot.c for x86 DSDOT
to resolve dsdot utest failure seen in #1492
2018-03-17 13:49:15 +01:00
Martin Kroeker
e7366a4161 Restore the remaining utests (#1462)
* Restore the remaining utests

* Try fork test on Cygwin and Linux only, it hangs on at least ARMv8/Android as well

* Use generic sswap/dswap kernels for NEHALEM 32bit to fix fault found by the restored swap utest

* Disable zdotu test for MS cl to work around runtime error -1073741819 on AppVeyor for now
(probably coding error in the initialization of the complex numbers or wrong choice of zdotu API)
2018-02-20 10:07:17 +01:00
Denis Steckelmacher
c9ff735da6 Add ZEN support (tested for auto-detected static backend) 2017-03-19 15:32:50 +01:00
Zhang Xianyi
53b6023a6c Fix cmake bug on MSVC 32-bit. 2015-10-26 14:52:13 -05:00
Zhang Xianyi
7df0820160 Use C kernels for s/dgemv on x86. 2015-08-19 08:07:47 -05:00
Zhang Xianyi
1cf2b10224 Use pure C generic target on x86 and x86_64.
make TARGET=GENERIC

?gemm3m is unimplemented on generic target.
2015-08-03 23:55:56 -05:00
wernsaar
0884b73c69 Lapack-test Windows 32bit now error free 2014-07-10 11:01:47 +02:00
wernsaar
9bd9472ae9 Lapack-test: cleanup of x86 32bit KERNEL file 2014-07-09 16:08:19 +02:00
Timothy Gu
6c2ead30f0 Remove all trailing whitespace except lapack-netlib
Signed-off-by: Timothy Gu <timothygu99@gmail.com>
2014-06-27 12:05:18 -07:00
wernsaar
793509a3b5 replaced files for sdot, sgemv_n and sgemv_t for bug #348 2014-05-06 15:29:39 +02:00
wernsaar
9423f980f6 modified trsm kernel 2013-12-02 10:08:14 +01:00
wernsaar
c6156b2ef2 added trsm kernels from origin 2013-12-01 22:39:39 +01:00
wernsaar
6216ab8a7e removed obsolete gemm_kernels from haswell branch 2013-11-04 08:33:04 +01:00
Zhang Xianyi
f51a849d91 Merge pull request #278 from wernsaar/haswell
Merge wernsaar's Haswell gemm kernels.
2013-08-17 08:24:37 -07:00
wernsaar
4070d9a123 added dgemm_kernel_16x2_haswell.S 2013-08-15 19:17:20 +02:00
wernsaar
0b90c0ec64 added sgemm_kernel_16x4_haswell.S 2013-08-15 18:46:14 +02:00
Zhang Xianyi
2638370844 Init code base for Intel Haswell. 2013-08-13 00:54:59 +08:00
Zhang Xianyi
886cbaf4e4 Support AMD Piledriver by bulldozer kernels. 2013-07-06 12:06:43 -03:00
Zhang Xianyi
fa916a0fac Fixed #238 bug in lsame on x86. 2013-06-28 22:43:41 +08:00
wangqian
6a72840945 Fixed overflow internal buffer bug of (s/d/c/z)gemv on x86. 2013-05-29 13:23:12 +08:00
Zhang Xianyi
5c8bf6ae0e Merge branch 'bulldozer' into develop 2013-02-10 01:19:42 +08:00
Zhang Xianyi
0b08f7479e Refs #154. Fixed gemv_t bug about overflow 16MB buffer on x86. 2013-01-20 21:22:12 +08:00
Zhang Xianyi
69200884e1 Refs #173. Fixed overflow internal buffer bug of gemv_n on x86 2012-12-25 09:27:49 +08:00
Zhang Xianyi
0d1518add9 Refs #173. Fixed overflow internal buffer bug of sgemv_t on x86 2012-12-25 09:10:17 +08:00
Zhang Xianyi
91ed4e4450 Refs #171. Prevent loading the dirty number from the buffer in sgemv_t x86 kernel. 2012-12-23 23:14:17 +08:00
Zhang Xianyi
fd3046b32a Refs #173. Fixed overflow internal buffer bug of gemv_t on x86. 2012-12-23 21:47:22 +08:00
Zhang Xianyi
bfaaa975e6 Added BULLDOZER target. So far it uses barcelona kernels. 2012-12-07 00:53:31 +08:00
Zhang Xianyi
b7c0fa6bd2 Init AMD Bulldozer codebase. 2012-12-06 07:29:54 -05:00
Zhang Xianyi
2573311308 refs #140. Fixed zdot incompatibility ABI issue with GCC 4.7 on Win 32.
GCC 4.7 uses MSVC ABI on Win 32. This means the caller pops the hidden pointer for returning
aggregate structures larger than 8 bytes.
2012-09-24 20:34:33 +08:00
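
The kind of return value this commit is concerned with can be illustrated in C. The sketch below assumes a double-precision complex result laid out as two doubles, which is larger than 8 bytes; the type `zresult_sketch` and the function `zdotu_sketch` are hypothetical names, not the library's own. The hidden pointer itself is inserted by the compiler and does not appear in the C source, so which side removes it from the stack is visible only in the generated assembly.

```c
#include <stddef.h>

/* Two doubles: an aggregate larger than 8 bytes (hypothetical type). */
typedef struct {
    double re;
    double im;
} zresult_sketch;

/* Unconjugated complex dot product over interleaved (re, im) pairs,
   unit stride only, returned by value (hypothetical name). */
zresult_sketch zdotu_sketch(ptrdiff_t n, const double *x, const double *y)
{
    zresult_sketch acc = {0.0, 0.0};
    for (ptrdiff_t i = 0; i < n; i++) {
        double xr = x[2 * i], xi = x[2 * i + 1];
        double yr = y[2 * i], yi = y[2 * i + 1];
        acc.re += xr * yr - xi * yi;
        acc.im += xr * yi + xi * yr;
    }
    return acc;   /* returned through compiler-managed storage */
}
```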
Zhang Xianyi
d3b67d0bd8 Refs #113. Fixed the typo BOBCATE -> BOBCAT 2012-05-31 22:40:15 +08:00
Zhang Xianyi
d6cab3f37e Refs #113. Support AMD Bobcate using Barcelona kernel codes. Replace 3DNow! with MMX. 2012-05-31 18:17:45 +08:00
Xianyi Zhang
a53c6e2440 Merge branch 'develop' into sandybridge 2012-05-25 23:16:44 +08:00
Xianyi Zhang
5d657c6e67 Fixed #96 a SEGFAULT bug in samax on x86. 2012-04-26 16:50:57 +08:00
Xianyi Zhang
03b0eb19f7 Refs #86. Test alpha=Nan in x86/x86_64 dscale. 2012-04-05 18:16:18 +08:00
Xianyi Zhang
19a48b82cf Init Sandybridge codes based on Nehalem. 2012-03-30 20:01:03 +08:00
unknown
dff146e306 refs #80. Used GEMV SSE2 kernels on x86. 2012-03-19 17:56:22 +08:00
Zhang Xiianyi
7b410b7f0e Fixed #58 zdot SEGFAULT bug with GCC-4.6. Thanks to Mr. John for this patch.
In the i386 calling convention, the caller puts the address of the return value of zdot into the first hidden parameter.
Thus, the callee should remove this address from the stack before returning.
Actually, I have fixed the same bug in x86/zdot_sse2.S (issue #32). However, that was not a good implementation, as it used 3 instructions. Mr. John told me to use "ret $0x4" to skip the first hidden address (4 bytes).
2011-09-14 23:52:51 +08:00
traits
b1fe26c45a refs #55. Changed DTB_ENTRIES to DTB_DEFAULT_ENTRIES in x86 gemv_n kernel codes. 2011-09-06 14:14:07 +08:00
Xianyi
31040e4d80 Fixed #32, a SEGFAULT bug with gcc-4.6. According to the i386 calling convention, the called function should remove the hidden return value address from the stack. 2011-06-03 13:19:54 +08:00
Xianyi
272f62a2b6 Changed movlps macro name in capital in x86/zdot_sse2.S file. 2011-03-03 00:46:39 +08:00
Xianyi
36016fe349 On x86 32bits, gcc 4.4.3 generated wrong code (movsd) from movlps in zdot_sse2.S line 191.
This would cause zdotu & zdotc failures. Instead, use movlpd to work around it. Fixed #8. Fixed #9.
2011-03-02 18:45:43 +08:00
Xianyi
12214e1d0f Fixed #7. Modified axpy kernel codes to avoid unloop with incx==0 or incy==0 in x86 32bits arch. 2011-02-23 20:08:34 +08:00
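
The effect of a zero increment on axpy can be sketched with a plain reference-style loop. This is an illustration only: `axpy_sketch` is a hypothetical name, and the early exit for alpha equal to zero and the start-index handling for negative increments follow the reference BLAS convention rather than anything stated in the commit.

```c
#include <stddef.h>

/* y := alpha * x + y over strided vectors (hypothetical name). */
void axpy_sketch(ptrdiff_t n, double alpha,
                 const double *x, ptrdiff_t incx,
                 double *y, ptrdiff_t incy)
{
    if (n <= 0 || alpha == 0.0)
        return;
    /* Reference-BLAS convention: negative strides start from the far end. */
    ptrdiff_t ix = incx < 0 ? (1 - n) * incx : 0;
    ptrdiff_t iy = incy < 0 ? (1 - n) * incy : 0;
    for (ptrdiff_t i = 0; i < n; i++) {
        y[iy] += alpha * x[ix];
        ix += incx;   /* incx == 0: x[0] is reused for every element */
        iy += incy;   /* incy == 0: y[0] accumulates all n updates */
    }
}
```

With incx equal to zero the loop adds alpha * x[0] to every element of y, and with incy equal to zero all n updates land in y[0]. An unrolled kernel that loads several elements before storing would compute something different in the second case, which is presumably why the commit routes zero increments away from the unrolled path.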
Xianyi
bfaa80c316 fixed #4 csrot & drot returned the wrong result when incx==incy==0 on i686 arch. 2011-02-18 03:00:58 +08:00