OpenBLAS

Commit Graph

Author	SHA1	Message	Date
Martin Kroeker	93240f489e	Fix wrong case in TARGET setting for Alpine	2018-12-29 18:12:54 +01:00
TiborGY	7cbc2c37d6	Update cpuid_mips64.c	2018-12-28 14:36:39 +01:00
TiborGY	c329de2931	Update Makefile	2018-12-28 14:35:41 +01:00
TiborGY	187233953c	Update cpuid_mips.c	2018-12-28 14:34:38 +01:00
TiborGY	09170268a3	Update cpuid_arm.c	2018-12-28 14:33:18 +01:00
TiborGY	211120c508	Fix typo in UNKNOWN core name. Should be of no consequence, right?	2018-12-27 23:09:21 +01:00
Martin Kroeker	9e4d190f4f	Merge pull request #1932 from martin-frbg/issue1915: Add -fPIC to provided CFLAGS/FFLAGS if required	2018-12-24 23:48:33 +01:00
Martin Kroeker	fe02ba86a4	Remove unnecessary change again	2018-12-24 20:46:04 +01:00
Martin Kroeker	284fb00971	Merge pull request #1934 from fenrus75/betagoof: Fix thinko in skylake beta handling	2018-12-24 19:53:50 +01:00
Arjan van de Ven	795285c587	Fix thinko in skylake beta handling. Casting ints is cheaper, but it has a rounding effect, not a memory-casting effect, resulting in an invalid outcome	2018-12-24 18:49:50 +00:00
Martin Kroeker	d6818777d1	Make sure that -fPIC is present if needed	2018-12-23 23:47:37 +01:00
Martin Kroeker	5bd21ab6e1	Make sure that -fPIC is present when needed. Override user-provided FFLAGS if necessary	2018-12-23 23:46:48 +01:00
Martin Kroeker	e1eab96502	Merge pull request #1931 from martin-frbg/pr1921: Add -mavx2 to TARGET=HASWELL builds	2018-12-23 23:15:54 +01:00
Martin Kroeker	76b4b8980f	Use -dumpversion with gcc only	2018-12-23 19:08:19 +01:00
Martin Kroeker	49e0f485da	Add -mavx2 for TARGET=HASWELL if compiler supports and requires it	2018-12-23 17:26:09 +01:00
Martin Kroeker	43c2b0eb55	Add -mavx2 to TARGET=HASWELL builds to leverage improvements from PR#1921	2018-12-23 17:16:43 +01:00
Martin Kroeker	942e229ed5	Merge pull request #1930 from martin-frbg/issue1908: Reflect ARMV8 target definition changes from PR1876	2018-12-23 15:06:33 +01:00
Martin Kroeker	26a3402773	Reflect ARMV8 target definition changes from PR1876 and create config target directory for cross-compiles.	2018-12-23 12:26:01 +01:00
Martin Kroeker	20033f992a	Merge pull request #1929 from martin-frbg/issue1924: Avoid taking the root of a negative number in simple threaded syrk	2018-12-23 09:03:58 +01:00
Martin Kroeker	f343ed65b5	Avoid taking the root of a negative number. Fixes #1924, where numpy 1.17+ would report the (transient) FE_INVALID exception raised for the domain error.	2018-12-22 22:30:29 +01:00
Martin Kroeker	a5a1118527	Merge pull request #1 from xianyi/develop: rebase	2018-12-22 22:13:44 +01:00
Martin Kroeker	e23366e860	Merge pull request #1921 from fenrus75/haswelldgemm: Replicate some of the SKYLAKEX dgemm improvements also to HASWELL	2018-12-17 08:39:20 +01:00
Arjan van de Ven	b28f75cd7e	Set GEMM_PREFERED_SIZE for HASWELL. Haswell likes a GEMM_PREFERED_SIZE of 16, which improves the split done by the threading code by making it a nice multiple of the SIMD kernel size	2018-12-16 23:09:27 +00:00
Arjan van de Ven	d321448a63	dgemm: use dgemm_ncopy_8_skylakex.c also for Haswell. The dgemm_ncopy_8_skylakex.c code is not AVX512-specific and gives a nice performance boost for medium-sized matrices	2018-12-16 23:09:22 +00:00
Arjan van de Ven	c43331ad0a	dgemm: Use the skylakex beta function also for haswell. It's more efficient for certain tall/skinny matrices	2018-12-16 23:09:17 +00:00
Martin Kroeker	e8ca5a59a9	Merge pull request #1919 from fenrus75/haswelltuning: (sgemm) Apply some of the SKYLAKEX optimizations also to HASWELL	2018-12-16 20:11:05 +01:00
Martin Kroeker	c4e23dd016	Update Makefile	2018-12-16 18:14:40 +01:00
Martin Kroeker	cfc4acc221	typo	2018-12-16 16:19:51 +01:00
Martin Kroeker	545c2b1bbb	Add -mavx2 on Haswell only if the compiler supports it	2018-12-16 13:09:19 +01:00
Arjan van de Ven	69d206440a	Make the skylakex/haswell sgemm code compile and run even with compilers without avx2 support	2018-12-16 00:19:41 +00:00
Martin Kroeker	3843e3e017	use -maxv2 on haswell	2018-12-15 23:30:31 +01:00
Martin Kroeker	fbcb14a74b	should be core-avx2	2018-12-15 20:18:59 +01:00
Martin Kroeker	2a3190dc76	fix elseifeq and use older option core2-avx for compatibility	2018-12-15 20:17:44 +01:00
Martin Kroeker	1ebe5c0f49	Add -march=haswell to HASWELL part of DYNAMIC_ARCH build	2018-12-15 19:35:35 +01:00
Arjan van de Ven	0586899a10	Use sgemm_ncopy_4_skylakex.c also for Haswell. sgemm_ncopy_4_skylakex.c uses SSE transpose operations, which is where the real perf win happens; this also works great for Haswell. This gives double-digit percentage gains on small and skinny matrices	2018-12-15 13:49:19 +00:00
Arjan van de Ven	00dc09ad19	Use the skylake sgemm beta code also for haswell. With a few small changes it's possible to use the skylake sgemm code also for haswell; this gives a modest gain (10% range) for smallish matrices but does wonders for very skinny matrices	2018-12-15 13:49:13 +00:00
Martin Kroeker	78d877b54b	Merge pull request #1914 from fenrus75/smallmatrix: Add a "sgemm direct" mode for small matrices	2018-12-13 19:08:14 +01:00
Arjan van de Ven	cdc668d82b	Add a "sgemm direct" mode for small matrices. OpenBLAS has a fancy algorithm for copying the input data while laying it out in a more CPU-friendly memory layout. This is great for large matrices; the cost of the copy is easily amortized by the gains from the better memory layout. But for small matrices (on CPUs that can do efficient unaligned loads) this copy can be a net loss. This patch adds (initially for SKYLAKEX) a "sgemm direct" mode that bypasses the whole copy machinery for ALPHA=1/BETA=0/... standard arguments, for small matrices only. What is small? For the non-threaded case this has been measured to be in the MNK = 28 * 512 * 512 range, while in the threaded case it's less, around MNK = 1 * 512 * 512	2018-12-13 13:47:31 +00:00
Martin Kroeker	87718807f0	Merge pull request #1910 from martin-frbg/issue1909: Fix for DYNAMIC_ARCH builds made on an AVX512-capable host	2018-12-12 14:56:25 +01:00
Martin Kroeker	51aec8e96b	make sure the added march=skylake-avx512 does not cause problems on Windows	2018-12-11 22:47:32 +01:00
Martin Kroeker	06f7d78d70	Add -march=skylake-avx512 to SkylakeX part of DYNAMIC_ARCH builds	2018-12-11 21:10:38 +01:00
Martin Kroeker	38cc638591	Avoid adding blanket march=skylake-avx512 to dynamic_arch builds	2018-12-11 21:09:26 +01:00
Martin Kroeker	0bf6d74e5f	Fix typo in previous commit for arm dynamic arch	2018-12-07 19:37:33 +01:00
Martin Kroeker	133c278ee5	Add DYNAMIC_CORE list for ARM64 cf #1908	2018-12-07 17:42:23 +01:00
Martin Kroeker	2b355592e3	Make sure to use the arm version of dynamic.c in ARM64 DYNAMIC_ARCH cf. #1908	2018-12-07 16:25:55 +01:00
Martin Kroeker	ff3eb1d474	Merge pull request #1904 from martin-frbg/issue1870: Fix cmake parsing of GEMM kernels for ARMV8	2018-12-06 23:01:23 +01:00
Martin Kroeker	0b09516678	Fix missing parameter in popen call	2018-12-06 18:33:05 +01:00
Martin Kroeker	7639f2e1f0	Rewrite the conditional for OSX to fix cmake parsing on other platforms. The Makefile variable parser in utils.cmake currently does not handle conditionals. Having the definitions for non-OSX last will at least make cmake builds work again on non-OSX platforms.	2018-12-06 14:04:27 +01:00
Martin Kroeker	2fc712469d	Avoid creating spurious non-suffixed c/zgemm_kernels. Plain cgemm_kernel and zgemm_kernel are not used anywhere, only cgemm_kernel_b etc. Needlessly building them (without any define like NN, CN, etc.) just happened to work on most platforms, but not on arm64. See #1870	2018-12-06 13:56:06 +01:00
Martin Kroeker	6ba30e270d	Fix typo that broke CNRM2 on ARMV8 since 0.3.0. Must have happened in my #1449	2018-12-06 13:42:25 +01:00

3830 Commits (All Branches)