Commit Graph

419 Commits

Author SHA1 Message Date
Martin Kroeker a6a8cc2b7f
Fix errors in cpu enumeration with glibc 2.6
for #2114
2019-05-07 13:34:52 +02:00
Jeff Baylor 40e53e52d6 snprintf define consolidated to common.h 2019-04-22 17:01:34 -07:00
Rashmica Gupta bcdf1d4917 Add in runtime CPU detection for POWER. 2019-04-09 14:20:16 +10:00
Erik M. Bray 8ba9e2a61a Also call CloseHandle on each thread, as well as on the event so as to not leak thread handles. 2019-03-19 11:21:44 +01:00
Erik M. Bray 4ad694eda1 Fix for #2063: The DllMain used in Cygwin did not run the thread memory
pool cleanup upon THREAD_DETACH which is needed when compiled with
USE_TLS=1.
2019-03-19 09:26:50 +01:00
Martin Kroeker 3ce28fb81a
Merge pull request #2055 from martin-frbg/atomid
Add CPUID data for Intel Denverton (as Nehalem)
2019-03-12 22:57:07 +01:00
Martin Kroeker 04f2226ea6
Add Intel Denverton 2019-03-12 16:09:55 +01:00
Martin Kroeker 4741ce803b
Merge pull request #2045 from martin-frbg/2033-3
Do not compile in AVX512 check if AVX support is disabled
2019-03-06 22:40:26 +01:00
Martin Kroeker 11cfd0bd75
Do not compile in AVX512 check if AVX support is disabled
xgetbv is function depends on NO_AVX being undefined - we could change that too, but that combo is unlikely to work anyway
2019-03-05 16:04:25 +01:00
Martin Kroeker d7b2c53c0b
Merge pull request #2039 from brada4/meminit
Address warning in memory.c
2019-03-05 12:11:15 +01:00
Martin Kroeker 10d841d8b9
Merge pull request #2026 from martin-frbg/trmv_threads
Correct range limiting in trmv_thread and re-enable TRMV multithreading
2019-03-04 15:08:31 +01:00
Martin Kroeker 6c83b878f6
Merge pull request #2040 from martin-frbg/locks2002
Restore locking optimizations for OpenMP case
2019-03-04 15:07:14 +01:00
Martin Kroeker af480b02a4
Restore locking optimizations for OpenMP case
restore another accidentally dropped part of #1468 that was missed in #2004 to address performance regression reported in #1461
2019-03-03 14:17:07 +01:00
Andrew e4a79be6bb address warning introed with #1814 et al 2019-03-03 09:05:11 +02:00
Martin Kroeker 45333d5793
Fix error introduced during cleanup 2019-02-19 22:16:33 +01:00
Martin Kroeker 78d9910236
Correct range_n limiting
same bug as seen in #1388, somehow missed in corresponding PR #1389
2019-02-19 20:59:48 +01:00
Martin Kroeker 03a2bf2602
Fix potential memory leak in cpu enumeration on Linux (#2008)
* Fix potential memory leak in cpu enumeration with glibc

An early return after a failed call to sched_getaffinity would leak the previously allocated cpu_set_t. Wrong calculation of the size argument in that call increased the likelyhood of that failure. Fixes #2003
2019-02-10 23:24:45 +01:00
Martin Kroeker 69edc5bbe7
Restore dropped patches in the non-TLS branch of memory.c (#2004)
* Restore dropped patches in the non-TLS branch of memory.c

As discovered in #2002, the reintroduction of the "original" non-TLS version of memory.c as an alternate branch had inadvertently used ba1f91f rather than a8002e2 , thereby dropping the commits for #1450, #1468, #1501, #1504 and #1520.
2019-02-07 20:06:13 +01:00
caiyu 29dc72889f Add support for Hygon Dhyana 2019-01-16 14:25:19 +08:00
Martin Kroeker dbc9a060ef
Fix missing braces in support_av() call 2019-01-14 22:41:31 +01:00
Martin Kroeker 21c0f2af7b
Merge pull request #1957 from martin-frbg/issue1954
Move TLS key deletion to openblas_quit
2019-01-10 12:04:08 +01:00
Martin Kroeker ad2c386d6a
Move TLS key deletion to openblas_quit
fixes #1954 (as suggested by thrasibule in that issue)
2019-01-10 00:32:50 +01:00
Martin Kroeker 31ed19e8b9
Add message for SkylakeX and KNL fallbacks to Haswell 2019-01-05 19:41:13 +01:00
Martin Kroeker e1574fa2b4
Add xcr0 (os support) check 2019-01-05 18:08:02 +01:00
Martin Kroeker ae1d1f74f7
Query AVX2 and AVX512 capability for runtime cpu selection 2019-01-05 16:55:33 +01:00
Martin Kroeker 8643521127
Merge pull request #1943 from martin-frbg/issue1748
Re-enable loop unrolling in trmv and remove the scary warning
2018-12-30 20:07:01 +01:00
Martin Kroeker 5a720cf9ca
Re-enable loop unrolling in trmv and remove the scary warning
fixes #1748 as that half of the fix for #1332 appears to have been an overreaction on my part.
2018-12-30 15:22:37 +01:00
Martin Kroeker ccd5945d38
Merge pull request #1942 from martin-frbg/issue1720
Delete the pthread key on cleanup in TLS mode
2018-12-30 14:47:05 +01:00
Martin Kroeker bba1e67269
Delete the pthread key on cleanup in TLS mode
to avoid a crash when OpenBLAS was loaded via dlopen and libc tries to clean up the leaked TLS after dlclose
Fixes #1720
2018-12-29 21:59:31 +01:00
Martin Kroeker f343ed65b5
Avoid taking the root of a negative number
Fixes #1924 where numpy 1.17+ would report the (transient) FE_INVALID exception raised for the domain error.
2018-12-22 22:30:29 +01:00
Martin Kroeker 0bf6d74e5f
Fix typo in previous commit for arm dynamic arch 2018-12-07 19:37:33 +01:00
Martin Kroeker 2b355592e3
Make sure to use the arm version of dynamic.c in ARM64 DYNAMIC_ARCH
cf. #1908
2018-12-07 16:25:55 +01:00
Andrew 2601cd58ab remove surplus locking code , only enabled w x86, disabled or never enabled on all others 2018-11-30 11:38:19 +01:00
Martin Kroeker 97d7298973
call it OpenBLAS not just version 2018-11-29 11:52:08 +01:00
Martin Kroeker de0d0ed52f
Improve formatting of config output 2018-11-29 11:28:19 +01:00
Martin Kroeker 816775e309
Add version information to openblas_get_config output 2018-11-29 00:06:44 +01:00
Martin Kroeker f72fdf525c
Merge pull request #1875 from martin-frbg/issue1851
Serialize accesses to parallelized level3 functions from multiple cal…
2018-11-25 20:53:46 +01:00
Martin Kroeker 113cb00b95
fix missing parenthesis 2018-11-19 21:01:36 +01:00
Martin Kroeker 5192651706
Add CriticalSection handling instead of mutexes for Windows 2018-11-19 17:58:22 +01:00
Martin Kroeker 2e6fae2aad
Serialize accesses to parallelized level3 functions from multiple callers
for #1851
2018-11-19 14:02:50 +01:00
Martin Kroeker 368d14f8c8
Fix harmless typo
fixes #1872
2018-11-16 14:58:28 +01:00
Martin Kroeker 0427277cef
Allow optimization for small m, large n only if it can be made threadsafe
otherwise the introduction of a static array in 8e5a108 to improve #532 breaks concurrent calls from multiple threads as seen in #1844
2018-11-10 15:45:54 +01:00
Arjan van de Ven 5b708e5eb1 sgemm/dgemm: add a way for an arch kernel to specify prefered sizes
The current gemm threading code can make very unfortunate choices, for
example on my 10 core system a 1024x1024x1024 matrix multiply ends up
chunking into blocks of 102... which is not a vector friendly size
and performance ends up horrible.

this patch adds a helper define where an architecture can specify
a preference for size multiples.
This is different from existing defines that are minimum sizes and such.

The performance increase with this patch for the 1024x1024x1024 sgemm
is 2.3x (!!)
2018-11-01 01:43:20 +00:00
Martin Kroeker f5595d0262
Merge pull request #1843 from martin-frbg/aix_numprocs
Add get_num_procs implementation for AIX
2018-10-31 21:25:15 +01:00
Martin Kroeker 326d394a0f
Add get_num_procs implementation for AIX
(and copy HAIKU implementation to the non-TLS version of the code as well)
2018-10-31 18:38:22 +01:00
Erik M. Bray 38cf5d9364 ensure that threading has been initialized in the first place before calling openblas_set_num_threads 2018-10-28 21:16:52 +00:00
Ashwin Sekhar T K d5aeff636f ARM64: Enable DYNAMIC_ARCH
Enable DYNAMIC_ARCH feature on ARM64. This patch uses the cpuid
feature in linux kernel to detect the core type at runtime
(https://www.kernel.org/doc/Documentation/arm64/cpu-feature-registers.txt).

If this feature is missing in kernel, then the user should use the
OPENBLAS_CORETYPE env variable to select the desired core type.
2018-10-22 01:49:35 -07:00
Ashwin Sekhar T K d50abc8903 ARM64: Move parameters from parameter.c to param.h
Remove the runtime setting of P, Q, R parameters for
targets ARMV8, THUNDERX2T99. Instead set them as constants
in param.h at compile time.
2018-10-22 01:45:51 -07:00
Ashwin Sekhar T K 21f46a1cf2 ARM64: Use THUNDERX2T99 Neon Kernels for ARMV8
Currently the generic ARMV8 target uses C implementations
for many routines. Replace these with the neon implementations
written for THUNDERX2T99 target which are upto 6x faster for
certain routines.
2018-10-17 10:44:37 -07:00
Andrew 3439158dea address #1782 2nd loop 2018-10-03 21:20:50 +02:00