OpenBLAS

Author	SHA1	Message	Date
George Hartzell	0d52aefc6b	Typo: Skyalke -> Skylake. Worth fixing; it gets in the way of searching....	2018-12-30 14:55:34 -08:00
Martin Kroeker	a6787b0f81	Merge pull request #1939 from TiborGY/patch-2 Fix typo in UNKNOWN core name	2018-12-30 20:10:05 +01:00
Martin Kroeker	8643521127	Merge pull request #1943 from martin-frbg/issue1748 Re-enable loop unrolling in trmv and remove the scary warning	2018-12-30 20:07:01 +01:00
Martin Kroeker	5a720cf9ca	Re-enable loop unrolling in trmv and remove the scary warning. Fixes #1748, as that half of the fix for #1332 appears to have been an overreaction on my part.	2018-12-30 15:22:37 +01:00
Martin Kroeker	ccd5945d38	Merge pull request #1942 from martin-frbg/issue1720 Delete the pthread key on cleanup in TLS mode	2018-12-30 14:47:05 +01:00
Martin Kroeker	9f80e0f5fc	Remove stray include of complex.h, already provided conditionally by common.h via openblas_utest.h. Unconditional inclusion breaks older Android and similar platforms that use OPENBLAS_COMPLEX_STRUCT.	2018-12-30 14:39:18 +01:00
Martin Kroeker	bba1e67269	Delete the pthread key on cleanup in TLS mode to avoid a crash when OpenBLAS was loaded via dlopen and libc tries to clean up the leaked TLS after dlclose. Fixes #1720.	2018-12-29 21:59:31 +01:00
Martin Kroeker	93240f489e	Fix wrong case in TARGET setting for Alpine	2018-12-29 18:12:54 +01:00
TiborGY	7cbc2c37d6	Update cpuid_mips64.c	2018-12-28 14:36:39 +01:00
TiborGY	c329de2931	Update Makefile	2018-12-28 14:35:41 +01:00
TiborGY	187233953c	Update cpuid_mips.c	2018-12-28 14:34:38 +01:00
TiborGY	09170268a3	Update cpuid_arm.c	2018-12-28 14:33:18 +01:00
TiborGY	211120c508	Fix typo in UNKNOWN core name. Should be of no consequence, right?	2018-12-27 23:09:21 +01:00
Martin Kroeker	9e4d190f4f	Merge pull request #1932 from martin-frbg/issue1915 Add -fPIC to provided CFLAGS/FFLAGS if required	2018-12-24 23:48:33 +01:00
Martin Kroeker	fe02ba86a4	Remove unnecessary change again	2018-12-24 20:46:04 +01:00
Martin Kroeker	284fb00971	Merge pull request #1934 from fenrus75/betagoof Fix thinko in skylake beta handling	2018-12-24 19:53:50 +01:00
Arjan van de Ven	795285c587	Fix thinko in skylake beta handling. Casting ints is cheaper, but it has a rounding effect, not a memory-casting effect, resulting in an invalid outcome.	2018-12-24 18:49:50 +00:00
Martin Kroeker	d6818777d1	Make sure that -fPIC is present if needed	2018-12-23 23:47:37 +01:00
Martin Kroeker	5bd21ab6e1	Make sure that -fPIC is present when needed; override user-provided FFLAGS if necessary	2018-12-23 23:46:48 +01:00
Martin Kroeker	e1eab96502	Merge pull request #1931 from martin-frbg/pr1921 Add -mavx2 to TARGET=HASWELL builds	2018-12-23 23:15:54 +01:00
Martin Kroeker	76b4b8980f	Use -dumpversion with gcc only	2018-12-23 19:08:19 +01:00
Martin Kroeker	49e0f485da	Add -mavx2 for TARGET=HASWELL if compiler supports and requires it	2018-12-23 17:26:09 +01:00
Martin Kroeker	43c2b0eb55	Add -mavx2 to TARGET=HASWELL builds to leverage improvements from PR#1921	2018-12-23 17:16:43 +01:00
Martin Kroeker	942e229ed5	Merge pull request #1930 from martin-frbg/issue1908 Reflect ARMV8 target definition changes from PR1876	2018-12-23 15:06:33 +01:00
Martin Kroeker	26a3402773	Reflect ARMV8 target definition changes from PR1876 and create config target directory for cross-compiles.	2018-12-23 12:26:01 +01:00
Martin Kroeker	20033f992a	Merge pull request #1929 from martin-frbg/issue1924 Avoid taking the root of a negative number in simple threaded syrk	2018-12-23 09:03:58 +01:00
Martin Kroeker	f343ed65b5	Avoid taking the root of a negative number. Fixes #1924, where numpy 1.17+ would report the (transient) FE_INVALID exception raised for the domain error.	2018-12-22 22:30:29 +01:00
Martin Kroeker	a5a1118527	Merge pull request #1 from xianyi/develop rebase	2018-12-22 22:13:44 +01:00
Martin Kroeker	e23366e860	Merge pull request #1921 from fenrus75/haswelldgemm Replicate some of the SKYLAKEX dgemm improvements also to HASWELL	2018-12-17 08:39:20 +01:00
Arjan van de Ven	b28f75cd7e	Set GEMM_PREFERED_SIZE for HASWELL. Haswell likes a GEMM_PREFERED_SIZE of 16 to improve the split that the threading code does, making it a nice multiple of the SIMD kernel size.	2018-12-16 23:09:27 +00:00
Arjan van de Ven	d321448a63	dgemm: use dgemm_ncopy_8_skylakex.c also for Haswell. The dgemm_ncopy_8_skylakex.c code is not avx512 specific and gives a nice performance boost for medium-sized matrices.	2018-12-16 23:09:22 +00:00
Arjan van de Ven	c43331ad0a	dgemm: Use the skylakex beta function also for haswell; it's more efficient for certain tall/skinny matrices.	2018-12-16 23:09:17 +00:00
Martin Kroeker	e8ca5a59a9	Merge pull request #1919 from fenrus75/haswelltuning (sgemm) Apply some of the SKYLAKEX optimizations also to HASWELL	2018-12-16 20:11:05 +01:00
Martin Kroeker	c4e23dd016	Update Makefile	2018-12-16 18:14:40 +01:00
Martin Kroeker	cfc4acc221	typo	2018-12-16 16:19:51 +01:00
Martin Kroeker	545c2b1bbb	Add -mavx2 on Haswell only if the compiler supports it	2018-12-16 13:09:19 +01:00
Arjan van de Ven	69d206440a	Make the skylakex/haswell sgemm code compile and run even with compilers without avx2 support	2018-12-16 00:19:41 +00:00
Martin Kroeker	3843e3e017	use -maxv2 on haswell	2018-12-15 23:30:31 +01:00
Martin Kroeker	fbcb14a74b	should be core-avx2	2018-12-15 20:18:59 +01:00
Martin Kroeker	2a3190dc76	fix elseifeq and use older option core2-avx for compatibility	2018-12-15 20:17:44 +01:00
Martin Kroeker	1ebe5c0f49	Add -march=haswell to HASWELL part of DYNAMIC_ARCH build	2018-12-15 19:35:35 +01:00
Arjan van de Ven	0586899a10	Use sgemm_ncopy_4_skylakex.c also for Haswell. sgemm_ncopy_4_skylakex.c uses SSE transpose operations where the real perf win happens; this also works great for Haswell. This gives double-digit percentage gains on small and skinny matrices.	2018-12-15 13:49:19 +00:00
Arjan van de Ven	00dc09ad19	Use the skylake sgemm beta code also for haswell. With a few small changes it's possible to use the skylake sgemm code also for haswell; this gives a modest gain (10% range) for smallish matrixes but does wonders for very skinny matrixes.	2018-12-15 13:49:13 +00:00
Martin Kroeker	78d877b54b	Merge pull request #1914 from fenrus75/smallmatrix Add a "sgemm direct" mode for small matrixes	2018-12-13 19:08:14 +01:00
Arjan van de Ven	cdc668d82b	Add a "sgemm direct" mode for small matrixes. OpenBLAS has a fancy algorithm for copying the input data while laying it out in a more CPU-friendly memory layout. This is great for large matrixes; the cost of the copy is easily amortized by the gains from the better memory layout. But for small matrixes (on CPUs that can do efficient unaligned loads) this copy can be a net loss. This patch adds (for SKYLAKEX initially) a "sgemm direct" mode that bypasses the whole copy machinery for ALPHA=1/BETA=0/... standard arguments, for small matrixes only. What is small? For the non-threaded case this has been measured to be in the MNK = 28 * 512 * 512 range, while in the threaded case it's less, around MNK = 1 * 512 * 512.	2018-12-13 13:47:31 +00:00
Martin Kroeker	87718807f0	Merge pull request #1910 from martin-frbg/issue1909 Fix for DYNAMIC_ARCH builds made on a AVX512-capable host	2018-12-12 14:56:25 +01:00
Martin Kroeker	51aec8e96b	make sure the added march=skylake-avx512 does not cause problems on Windows	2018-12-11 22:47:32 +01:00
Martin Kroeker	06f7d78d70	Add -march=skylake-avx512 to SkylakeX part of DYNAMIC_ARCH builds	2018-12-11 21:10:38 +01:00
Martin Kroeker	38cc638591	Avoid adding blanket march=skylake-avx512 to dynamic_arch builds	2018-12-11 21:09:26 +01:00
Martin Kroeker	0bf6d74e5f	Fix typo in previous commit for arm dynamic arch	2018-12-07 19:37:33 +01:00
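
Note on commit cdc668d82b ("sgemm direct"): the size heuristic described in that message can be sketched roughly as below. This is an illustration only, not the code from the patch. The function name, the macro names, and the choice to treat the quoted M*N*K figures as inclusive upper bounds are all assumptions; the message also lists further "standard argument" conditions that it elides, and those are not modelled here.

```c
#include <stdbool.h>

/* Size limits quoted in the commit message, expressed as M*N*K products.
 * The macro names are hypothetical. */
#define DIRECT_LIMIT_SINGLE   (28LL * 512 * 512)  /* non-threaded case */
#define DIRECT_LIMIT_THREADED (1LL * 512 * 512)   /* threaded case */

/* Hypothetical helper: decide whether a call is small and simple enough
 * to skip the copy/repack step. Only ALPHA == 1 and BETA == 0 are checked;
 * any other conditions from the real patch are left out. */
bool sgemm_direct_applicable(long long m, long long n, long long k,
                             float alpha, float beta, bool threaded)
{
    if (alpha != 1.0f || beta != 0.0f)
        return false;

    /* Assumption: the quoted figures act as inclusive upper bounds. */
    long long limit = threaded ? DIRECT_LIMIT_THREADED : DIRECT_LIMIT_SINGLE;
    return m * n * k <= limit;
}
```

Under these assumptions, a 28 x 512 x 512 problem qualifies in the non-threaded case but not in the threaded one, where the limit is 28 times smaller.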

3382 Commits