OpenBLAS

Author	SHA1	Message	Date
Ashwin Sekhar T K	d5aeff636f	ARM64: Enable DYNAMIC_ARCH. Enable the DYNAMIC_ARCH feature on ARM64. This patch uses the cpuid feature of the Linux kernel to detect the core type at runtime (https://www.kernel.org/doc/Documentation/arm64/cpu-feature-registers.txt). If the kernel lacks this feature, the user should set the OPENBLAS_CORETYPE environment variable to select the desired core type.	2018-10-22 01:49:35 -07:00
Ashwin Sekhar T K	e7b66cd36e	ARM64: Fix DYNAMIC_ARCH compilation for cores that don't use GEMM3M	2018-10-22 01:45:51 -07:00
Ashwin Sekhar T K	d50abc8903	ARM64: Move parameters from parameter.c to param.h. Remove the runtime setting of the P, Q, R parameters for the ARMV8 and THUNDERX2T99 targets; instead, set them as compile-time constants in param.h.	2018-10-22 01:45:51 -07:00
Ashwin Sekhar T K	351a0c777c	ARM64: Remove XGENE1 references. Remove the XGENE1 target because its implementation is incomplete. Moreover, anyone who wishes to run on XGENE1 can use the generic ARMV8 target, as OpenBLAS has no XGENE1-specific optimizations.	2018-10-22 01:45:51 -07:00
Ashwin Sekhar T K	21f46a1cf2	ARM64: Use THUNDERX2T99 Neon kernels for ARMV8. Currently the generic ARMV8 target uses C implementations for many routines. Replace these with the Neon implementations written for the THUNDERX2T99 target, which are up to 6x faster for certain routines.	2018-10-17 10:44:37 -07:00
Ashwin Sekhar T K	caf339412f	ARM64: Remove dependency of THUNDERX2T99 Makefile on CORTEXA57 Makefile	2018-10-17 08:02:40 -07:00
Ashwin Sekhar T K	8001fdcd2a	ARM64: Remove dependency of THUNDERX Makefile on ARMV8 Makefile	2018-10-17 08:02:16 -07:00
Ashwin Sekhar T K	162e312832	ARM64: Remove dependency of CORTEXA57 Makefile on ARMV8 Makefile	2018-10-17 08:01:45 -07:00
Ashwin Sekhar T K	c3d93caa8d	ARM64: Remove dependency of XGENE1 Makefile on ARMV8 Makefile	2018-10-17 08:01:27 -07:00
Arjan van de Ven	55b244ca0d	Enable the SGEMM/SKX C-based kernel. QA found the final bug, so the skylakex SGEMM C-based kernel can now be activated....	2018-10-12 09:30:35 +00:00
Arjan van de Ven	d4bad73834	Add a C+intrinsics version of the SGEMM/skylakex kernel. For most sizes this is 1.2x to 1.4x faster than the current code.	2018-10-10 01:49:22 +00:00
Arjan van de Ven	582c589727	dgemm/skylakex: replace discrete mul/add with FMA. Very minor gains since this is not particularly hot code, but it follows general principles.	2018-10-06 23:13:26 +00:00
Arjan van de Ven	adbf6afa25	Add vector optimizations for ncopy as well for dgemm/skylakex	2018-10-06 21:18:12 +00:00
Arjan van de Ven	32bec8afbb	add a skylakex optimized dgemm beta function	2018-10-06 16:36:26 +00:00
Arjan van de Ven	20c5d668fe	dgemm/avx512: simplify and speed up the 4x4 kernel	2018-10-06 14:12:32 +00:00
Arjan van de Ven	6d43c51ccf	Undo slow dgemm/skylake micro-optimization. The compare is more costly than the work.	2018-10-06 14:00:37 +00:00
Arjan van de Ven	d74dc39b0f	Add optimized *copy versions for skylakex. Add optimized n/t copy versions for skylakex; this patch also rewrites tcopy using intrinsics, while the ncopy file will be worked on in a future commit.	2018-10-06 13:51:44 +00:00
Arjan van de Ven	66b43affbc	Add a 24x8 kernel to the skylakex dgemm implementation. Minor gains for small matrices, but at 512x512 and above the gain becomes more significant.	2018-10-05 13:22:21 +00:00
Arjan van de Ven	1938819c25	skylake dgemm: Add a 16x8 kernel. The next step for the AVX512 dgemm code is adding a 16x8 kernel. In the 8x8 kernel, each FMA has a matching load (the broadcast); in the 16x8 kernel this load can be reused for 2 FMAs, which reduces pressure on the CPU's load ports and gives a nice performance boost (in the 25% range).	2018-10-05 13:11:35 +00:00
Martin Kroeker	b7496c3638	Function name needs to be CNAME, set from outside to allow suffixing for dynamic_arch	2018-10-04 19:14:59 +02:00
Arjan van de Ven	45fe8cb0c5	Create an AVX512-enabled version of DGEMM. This patch adds dgemm_kernel_4x8_skylakex.c, which is dgemm_kernel_4x8_haswell.s converted to C + intrinsics, with 8x8 support added and the 8x8 kernel implemented using AVX512. Performance is a work in progress, but already shows a 10% to 20% increase for a wide range of matrix sizes.	2018-10-03 14:45:25 +00:00
Martin Kroeker	544b069e85	Merge pull request #1780 from martin-frbg/issue1774-2. Convert fldmia/fstmia instructions to UAL syntax for clang7	2018-09-29 09:27:47 +02:00
Martin Kroeker	9b2a7ad40d	Convert fldmia/fstmia instructions to UAL syntax for clang7. Second part of the fix for #1774, containing files missed in #1775	2018-09-28 23:05:15 +02:00
fengruilin	6fc85a6359	test_axpy errors on the LOONGSON3A platform (#1777)	2018-09-26 15:14:04 +08:00
Martin Kroeker	7e5df34e6a	Convert fldmia/fstmia instructions to UAL syntax for clang7. Fixes #1774	2018-09-25 09:41:58 +02:00
Andrew	1e531701b7	fix small typo	2018-09-09 16:52:25 +02:00
Martin Kroeker	ba4f433321	Merge pull request #1749 from martin-frbg/issue1531. Fix ARMV8 cross-compilation for IOS	2018-09-07 11:02:01 +02:00
Martin Kroeker	1cb7b9015e	Conditional compilation of assembly files that IOS does not like	2018-09-04 11:06:51 +02:00
Martin Kroeker	a4bd41e9f2	Fix paths to C kernels for nrm2	2018-09-04 10:51:19 +02:00
Martin Kroeker	e11126b26a	Merge pull request #1745 from martin-frbg/issue1743. Set USE_TRMM for all ZARCH variants to fix TRMM faults with zarch-gen…	2018-08-29 07:43:58 +02:00
Martin Kroeker	f3fd44a731	Set USE_TRMM for all ZARCH variants to fix TRMM faults with zarch-generic. Fixes #1743	2018-08-28 21:34:07 +02:00
Martin Kroeker	375dff54fc	Merge pull request #1733 from fenrus75/dsymv. Add an AVX512-enabled DSYMV (L) function	2018-08-12 18:18:36 +02:00
Martin Kroeker	a5f165275a	Merge pull request #1732 from fenrus75/dgemv. Add an AVX512-enabled DGEMV (n) function	2018-08-12 18:17:42 +02:00
Martin Kroeker	8c13aa495a	Merge pull request #1730 from fenrus75/fix-sdot. Fix typo in sdot function	2018-08-12 18:17:01 +02:00
Arjan van de Ven	9bec34cb67	Add an AVX512-enabled DSYMV (L) function, written in C intrinsics for best readability (the same C code works for Haswell as well). For logistical reasons, the code falls back to the existing Haswell AVX2 implementation if the GCC or LLVM compiler is not new enough.	2018-08-11 17:46:24 +00:00
Arjan van de Ven	87bebdbd8a	Add an AVX512-enabled DGEMV (n) function, written in C intrinsics for best readability (the same C code works for Haswell as well). For logistical reasons, the code falls back to the existing Haswell AVX2 implementation if the GCC or LLVM compiler is not new enough.	2018-08-11 17:38:12 +00:00
Arjan van de Ven	36add7570a	Fix typo in sdot function. It looks like my previous pull request was missing its final commit; fix a typo in sdot.	2018-08-11 17:16:45 +00:00
Arjan van de Ven	cacacc8007	Add an AVX512-enabled DSCAL function, written in C intrinsics for best readability (the same C code works for Haswell as well). For logistical reasons, the code falls back to the existing Haswell AVX2 implementation if the GCC or LLVM compiler is not new enough.	2018-08-11 17:14:57 +00:00
Martin Kroeker	1a00ef3d27	Merge pull request #1725 from fenrus75/axpy. Add AVX512-enabled SAXPY/DAXPY functions	2018-08-11 11:01:20 +02:00
Arjan van de Ven	2e99873ff7	Add AVX512-enabled SAXPY/DAXPY functions, written in C intrinsics for best readability (the same C code works for Haswell as well). For logistical reasons, the code falls back to the existing Haswell AVX2 implementation if the GCC or LLVM compiler is not new enough.	2018-08-10 02:58:32 +00:00
Arjan van de Ven	00abaa865b	Add an AVX512-enabled SDOT function, written in C intrinsics for best readability (the same C code works for Haswell as well). For logistical reasons, the code falls back to the existing Haswell AVX2 implementation if the GCC or LLVM compiler is not new enough.	2018-08-10 02:33:43 +00:00
Arjan van de Ven	7932ff3ea9	Add an AVX512-enabled DDOT function, written in C intrinsics for best readability (the same C code works for Haswell as well). For logistical reasons, the code falls back to the existing Haswell AVX2 implementation if the GCC or LLVM compiler is not new enough.	2018-08-09 03:55:52 +00:00
Martin Kroeker	4e103c822c	typo fix	2018-07-16 12:56:39 +02:00
Martin Kroeker	d2142760e0	Fix precision problem in DSDOT	2018-07-15 17:11:40 +02:00
Martin Kroeker	2fbfc64da8	Use C kernels for default c/zAXPY, xROT, c/zSWAP	2018-07-15 17:09:55 +02:00
Martin Kroeker	ba8388cee0	Merge pull request #1651 from martin-frbg/avx512-nodgemm. Disable the 16x2 DTRMM kernel on SkylakeX as well	2018-06-30 17:48:03 +02:00
Martin Kroeker	6e54b0a027	Disable the 16x2 DTRMM kernel on SkylakeX as well	2018-06-30 17:31:06 +02:00
Martin Kroeker	40c8cbc3bf	Merge pull request #1650 from martin-frbg/avx512-nodgemm. Disable the AVX512 DGEMM kernel for now	2018-06-30 13:05:46 +02:00
Martin Kroeker	f0a8dc2eec	Disable the AVX512 DGEMM kernel for now due to #1643	2018-06-30 11:34:48 +02:00
Martin Kroeker	b83e4c60c7	Remove premature exit for INC_X or INC_Y zero	2018-06-26 20:46:42 +02:00

1039 Commits