Martin Kroeker
ed704185ab
Increment version to 0.3.6.dev
2018-12-31 23:11:37 +01:00
Martin Kroeker
2940798ea7
Increment version to 0.3.6.dev
2018-12-31 23:10:59 +01:00
Martin Kroeker
eebc189287
Version 0.3.5
2018-12-31 23:09:59 +01:00
Martin Kroeker
9185d419d3
Version 0.3.5
2018-12-31 23:09:20 +01:00
Martin Kroeker
4cf9d32694
Merge pull request #1945 from xianyi/develop
...
Merge changes from develop for 0.3.5 release
2018-12-31 23:08:25 +01:00
Martin Kroeker
1c75b65d53
Merge branch 'release-0.3.0' into develop
2018-12-31 23:07:53 +01:00
Martin Kroeker
13d006339b
Update ChangeLog.txt with changes from 0.3.5
2018-12-31 23:00:46 +01:00
Martin Kroeker
bf76162635
Merge pull request #1944 from hartzell/patch-1
...
Typo: Skyalke -> Skylake
2018-12-31 18:36:18 +01:00
George Hartzell
0d52aefc6b
Typo: Skyalke -> Skylake
...
Worth fixing, it gets in the way of searching....
2018-12-30 14:55:34 -08:00
Martin Kroeker
a6787b0f81
Merge pull request #1939 from TiborGY/patch-2
...
Fix typo in UNKNOWN core name
2018-12-30 20:10:05 +01:00
Martin Kroeker
8643521127
Merge pull request #1943 from martin-frbg/issue1748
...
Re-enable loop unrolling in trmv and remove the scary warning
2018-12-30 20:07:01 +01:00
Martin Kroeker
5a720cf9ca
Re-enable loop unrolling in trmv and remove the scary warning
...
fixes #1748 as that half of the fix for #1332 appears to have been an overreaction on my part.
2018-12-30 15:22:37 +01:00
Martin Kroeker
ccd5945d38
Merge pull request #1942 from martin-frbg/issue1720
...
Delete the pthread key on cleanup in TLS mode
2018-12-30 14:47:05 +01:00
Martin Kroeker
9f80e0f5fc
Remove stray include of complex.h
...
already provided conditionally by common.h via openblas_utest.h
Unconditional inclusion breaks older Android and similar platforms that use OPENBLAS_COMPLEX_STRUCT
2018-12-30 14:39:18 +01:00
Martin Kroeker
bba1e67269
Delete the pthread key on cleanup in TLS mode
...
to avoid a crash when OpenBLAS was loaded via dlopen and libc tries to clean up the leaked TLS after dlclose
Fixes #1720
2018-12-29 21:59:31 +01:00
Martin Kroeker
93240f489e
Fix wrong case in TARGET setting for Alpine
2018-12-29 18:12:54 +01:00
TiborGY
7cbc2c37d6
Update cpuid_mips64.c
2018-12-28 14:36:39 +01:00
TiborGY
c329de2931
Update Makefile
2018-12-28 14:35:41 +01:00
TiborGY
187233953c
Update cpuid_mips.c
2018-12-28 14:34:38 +01:00
TiborGY
09170268a3
Update cpuid_arm.c
2018-12-28 14:33:18 +01:00
TiborGY
211120c508
Fix typo in UNKNOWN core name
...
Should be of no consequence, right?
2018-12-27 23:09:21 +01:00
Martin Kroeker
9e4d190f4f
Merge pull request #1932 from martin-frbg/issue1915
...
Add -fPIC to provided CFLAGS/FFLAGS if required
2018-12-24 23:48:33 +01:00
Martin Kroeker
fe02ba86a4
Remove unnecessary change again
2018-12-24 20:46:04 +01:00
Martin Kroeker
284fb00971
Merge pull request #1934 from fenrus75/betagoof
...
Fix thinko in skylake beta handling
2018-12-24 19:53:50 +01:00
Arjan van de Ven
795285c587
Fix thinko in skylake beta handling
...
casting ints is cheaper, but it has a rounding effect, not a memory casting effect,
resulting in an invalid outcome
2018-12-24 18:49:50 +00:00
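Illustration for the commit above, not the code it changed: a minimal C sketch, assuming the "thinko" is the usual confusion between converting a floating-point value to an integer (which discards the fraction) and reinterpreting its bytes as an integer (which keeps the bit pattern). All function names are hypothetical and do not come from OpenBLAS.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The sketch assumes a 64-bit double, as on x86-64. */
static_assert(sizeof(double) == sizeof(uint64_t), "double must be 64 bits");

/* Value conversion: the double is converted to an integer and the
 * fractional part is discarded, so 0.5 becomes 0. */
int64_t convert_value(double x) {
    return (int64_t)x;
}

/* Bit reinterpretation: the eight bytes are copied unchanged, so the
 * result is the IEEE 754 encoding of x, not its rounded value. */
uint64_t reinterpret_bits(double x) {
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits;
}

/* Wrong zero test: any |x| < 1 truncates to 0 and is reported as zero. */
int is_zero_by_cast(double x) {
    return convert_value(x) == 0;
}

/* Zero test on the bit pattern: shifting out the sign bit leaves 0
 * only for +0.0 and -0.0. */
int is_zero_by_bits(double x) {
    return (reinterpret_bits(x) << 1) == 0;
}
```

With these helpers, is_zero_by_cast(0.5) reports a zero while is_zero_by_bits(0.5) does not, which is the kind of invalid outcome the commit message describes.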
Martin Kroeker
d6818777d1
Make sure that -fPIC is present if needed
2018-12-23 23:47:37 +01:00
Martin Kroeker
5bd21ab6e1
Make sure that -fPIC is present when needed
...
override user-provided FFLAGS if necessary
2018-12-23 23:46:48 +01:00
Martin Kroeker
e1eab96502
Merge pull request #1931 from martin-frbg/pr1921
...
Add -mavx2 to TARGET=HASWELL builds
2018-12-23 23:15:54 +01:00
Martin Kroeker
76b4b8980f
Use -dumpversion with gcc only
2018-12-23 19:08:19 +01:00
Martin Kroeker
49e0f485da
Add -mavx2 for TARGET=HASWELL if compiler supports and requires it
2018-12-23 17:26:09 +01:00
Martin Kroeker
43c2b0eb55
Add -mavx2 to TARGET=HASWELL builds
...
to leverage improvements from PR#1921
2018-12-23 17:16:43 +01:00
Martin Kroeker
942e229ed5
Merge pull request #1930 from martin-frbg/issue1908
...
Reflect ARMV8 target definition changes from PR1876
2018-12-23 15:06:33 +01:00
Martin Kroeker
26a3402773
Reflect ARMV8 target definition changes from PR1876
...
and create config target directory for cross-compiles.
2018-12-23 12:26:01 +01:00
Martin Kroeker
20033f992a
Merge pull request #1929 from martin-frbg/issue1924
...
Avoid taking the root of a negative number in simple threaded syrk
2018-12-23 09:03:58 +01:00
Martin Kroeker
f343ed65b5
Avoid taking the root of a negative number
...
Fixes #1924 where numpy 1.17+ would report the (transient) FE_INVALID exception raised for the domain error.
2018-12-22 22:30:29 +01:00
Martin Kroeker
a5a1118527
Merge pull request #1 from xianyi/develop
...
rebase
2018-12-22 22:13:44 +01:00
Martin Kroeker
e23366e860
Merge pull request #1921 from fenrus75/haswelldgemm
...
Replicate some of the SKYLAKEX dgemm improvements also to HASWELL
2018-12-17 08:39:20 +01:00
Arjan van de Ven
b28f75cd7e
set GEMM_PREFERED_SIZE for HASWELL
...
Haswell likes a GEMM_PREFERED_SIZE of 16 to improve the split that the
threading code does to make it a nice multiple of the SIMD kernel size
2018-12-16 23:09:27 +00:00
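Illustration for the commit above: a minimal C sketch of how a preferred size can shape a threaded split, assuming the per-thread width is simply rounded up to the next multiple of the preferred size and capped at the problem size. The log does not show the OpenBLAS threading code, so the function names and the rounding policy here are hypothetical.

```c
/* Round width up to the next multiple of preferred (preferred > 0). */
long round_up_to_multiple(long width, long preferred) {
    return ((width + preferred - 1) / preferred) * preferred;
}

/* Width of the chunk handed to each thread when n columns are divided
 * among nthreads workers. The even share is rounded up to a multiple
 * of preferred so that most chunks match the SIMD kernel size; the
 * result never exceeds n. */
long split_width(long n, int nthreads, long preferred) {
    long width = (n + nthreads - 1) / nthreads; /* ceiling of n / nthreads */
    width = round_up_to_multiple(width, preferred);
    return width > n ? n : width;
}
```

For example, 1000 columns on 4 threads give an even share of 250; with a preferred size of 16 the chunk becomes 256, so the first three threads work on full multiples of 16 and only the last one handles a remainder.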
Arjan van de Ven
d321448a63
dgemm: use dgemm_ncopy_8_skylakex.c also for Haswell
...
The dgemm_ncopy_8_skylakex.c code is not avx512 specific and gives
a nice performance boost for medium sized matrices
2018-12-16 23:09:22 +00:00
Arjan van de Ven
c43331ad0a
dgemm: Use the skylakex beta function also for haswell
...
it's more efficient for certain tall/skinny matrices
2018-12-16 23:09:17 +00:00
Martin Kroeker
e8ca5a59a9
Merge pull request #1919 from fenrus75/haswelltuning
...
(sgemm) Apply some of the SKYLAKEX optimizations also to HASWELL
2018-12-16 20:11:05 +01:00
Martin Kroeker
c4e23dd016
Update Makefile
2018-12-16 18:14:40 +01:00
Martin Kroeker
cfc4acc221
typo
2018-12-16 16:19:51 +01:00
Martin Kroeker
545c2b1bbb
Add -mavx2 on Haswell only if the compiler supports it
2018-12-16 13:09:19 +01:00
Arjan van de Ven
69d206440a
Make the skylakex/haswell sgemm code compile and run even with compilers without avx2 support
2018-12-16 00:19:41 +00:00
Martin Kroeker
3843e3e017
use -mavx2 on haswell
2018-12-15 23:30:31 +01:00
Martin Kroeker
fbcb14a74b
should be core-avx2
2018-12-15 20:18:59 +01:00
Martin Kroeker
2a3190dc76
fix elseifeq and use older option core2-avx for compatibility
2018-12-15 20:17:44 +01:00
Martin Kroeker
1ebe5c0f49
Add -march=haswell to HASWELL part of DYNAMIC_ARCH build
2018-12-15 19:35:35 +01:00
Arjan van de Ven
0586899a10
Use sgemm_ncopy_4_skylakex.c also for Haswell
...
sgemm_ncopy_4_skylakex.c uses SSE transpose operations where the
real perf win happens; this also works great for Haswell.
This gives double digit percentage gains on small and skinny matrices
2018-12-15 13:49:19 +00:00