OpenBLAS

Commit Graph

Author	SHA1	Message	Date
Martin Kroeker	76b4b8980f	Use -dumpversion with gcc only	2018-12-23 19:08:19 +01:00
Martin Kroeker	49e0f485da	Add -mavx2 for TARGET=HASWELL if compiler supports and requires it	2018-12-23 17:26:09 +01:00
Martin Kroeker	43c2b0eb55	Add -mavx2 to TARGET=HASWELL builds to leverage improvements from PR#1921	2018-12-23 17:16:43 +01:00
Martin Kroeker	942e229ed5	Merge pull request #1930 from martin-frbg/issue1908 Reflect ARMV8 target definition changes from PR1876	2018-12-23 15:06:33 +01:00
Martin Kroeker	26a3402773	Reflect ARMV8 target definition changes from PR1876 and create config target directory for cross-compiles.	2018-12-23 12:26:01 +01:00
Martin Kroeker	20033f992a	Merge pull request #1929 from martin-frbg/issue1924 Avoid taking the root of a negative number in simple threaded syrk	2018-12-23 09:03:58 +01:00
Martin Kroeker	f343ed65b5	Avoid taking the root of a negative number Fixes #1924 where numpy 1.17+ would report the (transient) FE_INVALID exception raised for the domain error.	2018-12-22 22:30:29 +01:00
Martin Kroeker	a5a1118527	Merge pull request #1 from xianyi/develop rebase	2018-12-22 22:13:44 +01:00
Martin Kroeker	e23366e860	Merge pull request #1921 from fenrus75/haswelldgemm Replicate some of the SKYLAKEX dgemm improvements also to HASWELL	2018-12-17 08:39:20 +01:00
Arjan van de Ven	b28f75cd7e	set GEMM_PREFERED_SIZE for HASWELL Haswell likes a GEMM_PREFERED_SIZE of 16 to improve the split that the threading code does to make it a nice multiple of the SIMD kernel size	2018-12-16 23:09:27 +00:00
Arjan van de Ven	d321448a63	dgemm: use dgemm_ncopy_8_skylakex.c also for Haswell The dgemm_ncopy_8_skylakex.c code is not avx512 specific and gives a nice performance boost for medium sized matrices	2018-12-16 23:09:22 +00:00
Arjan van de Ven	c43331ad0a	dgemm: Use the skylakex beta function also for haswell it's more efficient for certain tall/skinny matrices	2018-12-16 23:09:17 +00:00
Martin Kroeker	e8ca5a59a9	Merge pull request #1919 from fenrus75/haswelltuning (sgemm) Apply some of the SKYLAKEX optimizations also to HASWELL	2018-12-16 20:11:05 +01:00
Martin Kroeker	c4e23dd016	Update Makefile	2018-12-16 18:14:40 +01:00
Martin Kroeker	cfc4acc221	typo	2018-12-16 16:19:51 +01:00
Martin Kroeker	545c2b1bbb	Add -mavx2 on Haswell only if the compiler supports it	2018-12-16 13:09:19 +01:00
Arjan van de Ven	69d206440a	Make the skylakex/haswell sgemm code compile and run even with compilers without avx2 support	2018-12-16 00:19:41 +00:00
Martin Kroeker	3843e3e017	use -maxv2 on haswell	2018-12-15 23:30:31 +01:00
Martin Kroeker	fbcb14a74b	should be core-avx2	2018-12-15 20:18:59 +01:00
Martin Kroeker	2a3190dc76	fix elseifeq and use older option core2-avx for compatibility	2018-12-15 20:17:44 +01:00
Martin Kroeker	1ebe5c0f49	Add -march=haswell to HASWELL part of DYNAMIC_ARCH build	2018-12-15 19:35:35 +01:00
Arjan van de Ven	0586899a10	Use sgemm_ncopy_4_skylakex.c also for Haswell sgemm_ncopy_4_skylakex.c uses SSE transpose operations where the real perf win happens; this also works great for Haswell. This gives double digit percentage gains on small and skinny matrices	2018-12-15 13:49:19 +00:00
Arjan van de Ven	00dc09ad19	Use the skylake sgemm beta code also for haswell with a few small changes it's possible to use the skylake sgemm code also for haswell, this gives a modest gain (10% range) for smallish matrixes but does wonders for very skinny matrixes	2018-12-15 13:49:13 +00:00
Martin Kroeker	78d877b54b	Merge pull request #1914 from fenrus75/smallmatrix Add a "sgemm direct" mode for small matrixes	2018-12-13 19:08:14 +01:00
Arjan van de Ven	cdc668d82b	Add a "sgemm direct" mode for small matrixes OpenBLAS has a fancy algorithm for copying the input data while laying it out in a more CPU friendly memory layout. This is great for large matrixes; the cost of the copy is easily amortized by the gains from the better memory layout. But for small matrixes (on CPUs that can do efficient unaligned loads) this copy can be a net loss. This patch adds (for SKYLAKEX initially) a "sgemm direct" mode, that bypasses the whole copy machinery for ALPHA=1/BETA=0/... standard arguments, for small matrixes only. What is small? For the non-threaded case this has been measured to be in the MNK = 28 * 512 * 512 range, while in the threaded case it's less, around MNK = 1 * 512 * 512	2018-12-13 13:47:31 +00:00
Martin Kroeker	87718807f0	Merge pull request #1910 from martin-frbg/issue1909 Fix for DYNAMIC_ARCH builds made on a AVX512-capable host	2018-12-12 14:56:25 +01:00
Martin Kroeker	51aec8e96b	make sure the added march=skylake-avx512 does not cause problems on Windows	2018-12-11 22:47:32 +01:00
Martin Kroeker	06f7d78d70	Add -march=skylake-avx512 to SkylakeX part of DYNAMIC_ARCH builds	2018-12-11 21:10:38 +01:00
Martin Kroeker	38cc638591	Avoid adding blanket march=skylake-avx512 to dynamic_arch builds	2018-12-11 21:09:26 +01:00
Martin Kroeker	0bf6d74e5f	Fix typo in previous commit for arm dynamic arch	2018-12-07 19:37:33 +01:00
Martin Kroeker	133c278ee5	Add DYNAMIC_CORE list for ARM64 cf #1908	2018-12-07 17:42:23 +01:00
Martin Kroeker	2b355592e3	Make sure to use the arm version of dynamic.c in ARM64 DYNAMIC_ARCH cf. #1908	2018-12-07 16:25:55 +01:00
Martin Kroeker	ff3eb1d474	Merge pull request #1904 from martin-frbg/issue1870 Fix cmake parsing of GEMM kernels for ARMV8	2018-12-06 23:01:23 +01:00
Martin Kroeker	0b09516678	Fix missing parameter in popen call	2018-12-06 18:33:05 +01:00
Martin Kroeker	7639f2e1f0	Rewrite the conditional for OSX to fix cmake parsing on others The Makefile variable parser in utils.cmake currently does not handle conditionals. Having the definitions for non-OSX last will at least make cmake builds work again on non-OSX platforms.	2018-12-06 14:04:27 +01:00
Martin Kroeker	2fc712469d	Avoid creating spurious non-suffixed c/zgemm_kernels Plain cgemm_kernel and zgemm_kernel are not used anywhere, only cgemm_kernel_b etc. Needlessly building them (without any define like NN, CN, etc.) just happened to work on most platforms, but not on arm64. See #1870	2018-12-06 13:56:06 +01:00
Martin Kroeker	6ba30e270d	Fix typo that broke CNRM2 on ARMV8 since 0.3.0 must have happened in my #1449	2018-12-06 13:42:25 +01:00
Martin Kroeker	bf23518e36	Merge pull request #1903 from rengolin/armv8 Fix two mistakes on Arm64 builds	2018-12-05 22:10:53 +01:00
Renato Golin	31a490ea88	Fix two mistakes on Arm64 builds * Falkor is an ARMv8.0 with ARMv8.1 features, and choosing armv8.1-a for march generates instructions it cannot cope with. Reverting it back to armv8-a. * ThunderX2's build was left with a #define VULCAN, which made it miss the right compiler flags in Makefile.arm64, although it did create the right library in the end.	2018-12-05 18:51:38 +00:00
Martin Kroeker	701ea88347	Use p2align instead of align for OSX compatibility fixes #1902	2018-12-03 13:06:43 +01:00
Martin Kroeker	721c56c224	Merge pull request #1899 from brada4/fbsd12 Add mutually supported architecture mappings for FreeBSD12 ports	2018-12-03 12:50:27 +01:00
Martin Kroeker	c5f8aeff2d	Merge branch 'develop' into fbsd12	2018-12-03 12:50:14 +01:00
Martin Kroeker	8278cbe7f8	Merge pull request #1894 from pkubaj/patch-2 Use correct ARCH name on BSD powerpc64	2018-12-03 12:48:53 +01:00
Martin Kroeker	ea6d1b96bd	Update Makefile.system	2018-12-03 08:59:10 +01:00
Martin Kroeker	360374be62	Update with the changes from 0.3.4	2018-12-02 23:44:13 +01:00
Martin Kroeker	f5acaad8f0	Increment version to 0.3.5.dev	2018-12-02 23:43:15 +01:00
Martin Kroeker	93fa6b7b76	Increment version to 0.3.5.dev	2018-12-02 23:42:33 +01:00
Martin Kroeker	c0827a7164	Update with changes from 0.3.4	2018-12-02 23:41:17 +01:00
Martin Kroeker	86cff4effc	Merge pull request #1900 from xianyi/develop Update from develop for 0.3.4	2018-12-02 23:40:21 +01:00
Martin Kroeker	b028960aba	Merge branch 'release-0.3.0' into develop	2018-12-02 23:38:49 +01:00

3414 Commits

All Branches