OpenBLAS

Author	SHA1	Message	Date
Arjan van de Ven	32bec8afbb	add a skylakex optimized dgemm beta function	2018-10-06 16:36:26 +00:00
Arjan van de Ven	20c5d668fe	dgemm/avx512 simplify and speed up the 4x4 kernel	2018-10-06 14:12:32 +00:00
Arjan van de Ven	6d43c51ccf	undo slow dgemm/skylake microoptimization the compare is more costly than the work	2018-10-06 14:00:37 +00:00
Arjan van de Ven	d74dc39b0f	Add optimized *copy versions for skylakex Add optimized n/t copy versions for skylakex; in the patch the tcopy is also rewritten using intrinsics; the ncopy file will be worked on in a future commit	2018-10-06 13:51:44 +00:00
Arjan van de Ven	66b43affbc	Add a 24x8 kernel to the skylakex dgemm implementation Minor gains for small matrixes, but at 512x512 and above the gain gets more significant.	2018-10-05 13:22:21 +00:00
Arjan van de Ven	1938819c25	skylake dgemm: Add a 16x8 kernel The next step for the avx512 dgemm code is adding a 16x8 kernel. In the 8x8 kernel, each FMA has a matching load (the broadcast); in the 16x8 kernel we can reuse this load for 2 FMAs, which in turn reduces pressure on the load ports of the CPU and gives a nice performance boost (in the 25% range).	2018-10-05 13:11:35 +00:00
Martin Kroeker	b7496c3638	Function name needs to be CNAME, set from outside to allow suffixing for dynamic_arch	2018-10-04 19:14:59 +02:00
Arjan van de Ven	45fe8cb0c5	Create a AVX512 enabled version of DGEMM This patch adds dgemm_kernel_4x8_skylakex.c which is * dgemm_kernel_4x8_haswell.s converted to C + intrinsics * 8x8 support added * 8x8 kernel implemented using AVX512 Performance is a work in progress, but already shows a 10% - 20% increase for a wide range of matrix sizes.	2018-10-03 14:45:25 +00:00
Martin Kroeker	544b069e85	Merge pull request #1780 from martin-frbg/issue1774-2 Convert fldmia/fstmia instructions to UAL syntax for clang7	2018-09-29 09:27:47 +02:00
Martin Kroeker	9b2a7ad40d	Convert fldmia/fstmia instructions to UAL syntax for clang7 second part of fix for #1774, containing files missed in #1775	2018-09-28 23:05:15 +02:00
fengruilin	6fc85a6359	test_axpy work error on LOONGSON3A platform #1777	2018-09-26 15:14:04 +08:00
Martin Kroeker	7e5df34e6a	Convert fldmia/fstmia instructions to UAL syntax for clang7 fixes #1774	2018-09-25 09:41:58 +02:00
Andrew	1e531701b7	fix small typo	2018-09-09 16:52:25 +02:00
Martin Kroeker	ba4f433321	Merge pull request #1749 from martin-frbg/issue1531 Fix ARMV8 cross-compilation for IOS	2018-09-07 11:02:01 +02:00
Martin Kroeker	1cb7b9015e	Conditional compilation of assembly files that IOS does not like	2018-09-04 11:06:51 +02:00
Martin Kroeker	a4bd41e9f2	Fix paths to C kernels for nrm2	2018-09-04 10:51:19 +02:00
Martin Kroeker	e11126b26a	Merge pull request #1745 from martin-frbg/issue1743 Set USE_TRMM for all ZARCH variants to fix TRMM faults with zarch-gen…	2018-08-29 07:43:58 +02:00
Martin Kroeker	f3fd44a731	Set USE_TRMM for all ZARCH variants to fix TRMM faults with zarch-generic fixes #1743	2018-08-28 21:34:07 +02:00
Martin Kroeker	375dff54fc	Merge pull request #1733 from fenrus75/dsymv Add an AVX512 enabled DSYMV (L) function	2018-08-12 18:18:36 +02:00
Martin Kroeker	a5f165275a	Merge pull request #1732 from fenrus75/dgemv Add an AVX512 enabled DGEMV (n) function	2018-08-12 18:17:42 +02:00
Martin Kroeker	8c13aa495a	Merge pull request #1730 from fenrus75/fix-sdot Fix typo in sdot function	2018-08-12 18:17:01 +02:00
Arjan van de Ven	9bec34cb67	Add an AVX512 enabled DSYMV (L) function written in C intrinsics for best readability. (the same C code works for Haswell as well) For logistical reasons the code falls back to the existing haswell AVX2 implementation if the GCC or LLVM compiler is not new enough	2018-08-11 17:46:24 +00:00
Arjan van de Ven	87bebdbd8a	Add an AVX512 enabled DGEMV (n) function written in C intrinsics for best readability. (the same C code works for Haswell as well) For logistical reasons the code falls back to the existing haswell AVX2 implementation if the GCC or LLVM compiler is not new enough	2018-08-11 17:38:12 +00:00
Arjan van de Ven	36add7570a	Fix typo in sdot function it looks like my previous pull request was short the final commit; fix a typo in sdot	2018-08-11 17:16:45 +00:00
Arjan van de Ven	cacacc8007	Add an AVX512 enabled DSCAL function written in C intrinsics for best readability. (the same C code works for Haswell as well) For logistical reasons the code falls back to the existing haswell AVX2 implementation if the GCC or LLVM compiler is not new enough	2018-08-11 17:14:57 +00:00
Martin Kroeker	1a00ef3d27	Merge pull request #1725 from fenrus75/axpy Add a AVX512 enabled SAXPY/DAXPY functions	2018-08-11 11:01:20 +02:00
Arjan van de Ven	2e99873ff7	Add a AVX512 enabled SAXPY/DAXPY functions written in C intrinsics for best readability. (the same C code works for Haswell as well) For logistical reasons the code falls back to the existing haswell AVX2 implementation if the GCC or LLVM compiler is not new enough	2018-08-10 02:58:32 +00:00
Arjan van de Ven	00abaa865b	Add an AVX512 enabled SDOT function written in C intrinsics for best readability. (the same C code works for Haswell as well) For logistical reasons the code falls back to the existing haswell AVX2 implementation if the GCC or LLVM compiler is not new enough	2018-08-10 02:33:43 +00:00
Arjan van de Ven	7932ff3ea9	Add an AVX512 enabled DDOT function written in C intrinsics for best readability. (the same C code works for Haswell as well) For logistical reasons the code falls back to the existing haswell AVX2 implementation if the GCC or LLVM compiler is not new enough	2018-08-09 03:55:52 +00:00
Martin Kroeker	4e103c822c	typo fix	2018-07-16 12:56:39 +02:00
Martin Kroeker	d2142760e0	Fix precision problem in DSDOT	2018-07-15 17:11:40 +02:00
Martin Kroeker	2fbfc64da8	Use C kernels for default c/zAXPY, xROT, c/zSWAP	2018-07-15 17:09:55 +02:00
Martin Kroeker	ba8388cee0	Merge pull request #1651 from martin-frbg/avx512-nodgemm Disable the 16x2 DTRMM kernel on SkylakeX as well	2018-06-30 17:48:03 +02:00
Martin Kroeker	6e54b0a027	Disable the 16x2 DTRMM kernel on SkylakeX as well	2018-06-30 17:31:06 +02:00
Martin Kroeker	40c8cbc3bf	Merge pull request #1650 from martin-frbg/avx512-nodgemm Disable the AVX512 DGEMM kernel for now	2018-06-30 13:05:46 +02:00
Martin Kroeker	f0a8dc2eec	Disable the AVX512 DGEMM kernel for now due to #1643	2018-06-30 11:34:48 +02:00
Martin Kroeker	b83e4c60c7	Remove premature exit for INC_X or INC_Y zero	2018-06-26 20:46:42 +02:00
Martin Kroeker	e344db269b	Remove premature exit for INC_X or INC_Y zero	2018-06-26 20:45:57 +02:00
Martin Kroeker	545b82efd3	Remove premature exit for INC_X or INC_Y zero	2018-06-26 20:45:00 +02:00
Martin Kroeker	e322a951fe	Remove premature exit for INC_X or INC_Y zero	2018-06-26 20:44:13 +02:00
Martin Kroeker	c628c6fa59	Merge pull request #1612 from oon3m0oo/cpus Fixed a few more unnecessary calls to num_cpu_avail.	2018-06-14 16:51:31 +02:00
Martin Kroeker	6f71c0fce4	Return a somewhat sane default value for L2 cache size if cpuid retur… (#1611 ) * Return a somewhat sane default value for L2 cache size if cpuid returned something unexpected Fixes #1610, the KVM hypervisor on Google Chromebooks returning zero for CPUID 0x80000006, causing DYNAMIC_ARCH builds of OpenBLAS to hang	2018-06-11 13:26:19 +02:00
Craig Donner	c2545b0fd6	Fixed a few more unnecessary calls to num_cpu_avail. I don't have as many benchmarks for these as for gemm, but it should still make a difference for small matrices.	2018-06-11 10:17:16 +01:00
Arjan van de Ven	89372e0993	Use AVX512 also for DGEMM this required switching to the generic gemm_beta code (which is faster anyway on SKX) for both DGEMM and SGEMM Performance for the not-retuned version is in the 30% range	2018-06-03 22:17:27 +00:00
Martin Kroeker	0023515733	Typo fix (misplaced parenthesis)	2018-06-03 13:22:59 +02:00
Arjan van de Ven	99c7bba8e4	Initial support for SkylakeX / AVX512 This patch adds the basic infrastructure for adding the SkylakeX (Intel Skylake server) target. The SkylakeX target will use the AVX512 (AVX512VL level) instruction set, which brings 2 basic things: 1) 512 bit wide SIMD (2x width of AVX2) 2) 32 SIMD registers (2x the number on AVX2) This initial patch only contains a trivial transformation of the Haswell SGEMM kernel to AVX512VL; more will follow later but this patch aims to get the infrastructure in place for this "later". Full performance tuning has not been done yet; with more registers and wider SIMD it's in theory possible to retune the kernels but even without that there's an interesting enough performance increase (30-40% range) with just this change.	2018-06-03 07:58:52 +00:00
Martin Kroeker	8562d5787a	Merge pull request #1583 from martin-frbg/issue1575 Handle INCX=0,INCY=0 case	2018-05-31 21:55:26 +02:00
Martin Kroeker	7df8c4f76f	typo fix	2018-05-31 17:23:08 +02:00
Martin Kroeker	2fc748bf72	Restore optimized swap kernel now that we have a proper fix	2018-05-31 13:41:12 +02:00
Martin Kroeker	d1b7be14aa	Handle INCX=0,INCY=0 case Fixes #1575 (sswap/dswap failing the swap utest on x86) as suggested by atsampson.	2018-05-31 12:52:04 +02:00


1026 Commits