Andrew
2601cd58ab
remove surplus locking code, which was only enabled with x86 and disabled or never enabled on all others
2018-11-30 11:38:19 +01:00
Martin Kroeker
97d7298973
call it OpenBLAS not just version
2018-11-29 11:52:08 +01:00
Martin Kroeker
de0d0ed52f
Improve formatting of config output
2018-11-29 11:28:19 +01:00
Martin Kroeker
816775e309
Add version information to openblas_get_config output
2018-11-29 00:06:44 +01:00
Martin Kroeker
f72fdf525c
Merge pull request #1875 from martin-frbg/issue1851
Serialize accesses to parallelized level3 functions from multiple cal…
2018-11-25 20:53:46 +01:00
Martin Kroeker
113cb00b95
fix missing parenthesis
2018-11-19 21:01:36 +01:00
Martin Kroeker
5192651706
Add CriticalSection handling instead of mutexes for Windows
2018-11-19 17:58:22 +01:00
Martin Kroeker
2e6fae2aad
Serialize accesses to parallelized level3 functions from multiple callers
for #1851
2018-11-19 14:02:50 +01:00
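The serialization described in the commits above amounts to holding one process-wide lock around the parallel level-3 path. This sketch shows only the POSIX side with a pthread mutex; per the Windows commit, a CRITICAL_SECTION with EnterCriticalSection/LeaveCriticalSection would take its place there. The function names and the `active_callers` counter are hypothetical stand-ins, not OpenBLAS code.

```c
#include <pthread.h>

/* Hypothetical shared state touched by the parallel level-3 path. */
static int active_callers = 0;

/* Placeholder for a level-3 driver that must not run concurrently. */
static int level3_driver_unlocked(int n)
{
    active_callers++;            /* would race without the lock below */
    int result = 2 * n;          /* stands in for the real computation */
    active_callers--;
    return result;
}

/* One process-wide mutex serializes every caller of the driver. */
static pthread_mutex_t level3_lock = PTHREAD_MUTEX_INITIALIZER;

int level3_call_serialized(int n)
{
    pthread_mutex_lock(&level3_lock);
    int result = level3_driver_unlocked(n);
    pthread_mutex_unlock(&level3_lock);
    return result;
}
```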
Martin Kroeker
368d14f8c8
Fix harmless typo
fixes #1872
2018-11-16 14:58:28 +01:00
Martin Kroeker
0427277cef
Allow optimization for small m, large n only if it can be made threadsafe
otherwise the introduction of a static array in 8e5a108
to improve #532 breaks concurrent calls from multiple threads as seen in #1844
2018-11-10 15:45:54 +01:00
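A minimal illustration of why a static scratch array breaks concurrent calls, assuming a hypothetical reduction that spreads partial sums over a scratch buffer. Moving the buffer to the caller's stack (or otherwise making it per-thread) is one way to make such an optimization thread-safe; this is not the actual small-m, large-n kernel.

```c
#include <stddef.h>

#define SCRATCH_LEN 64

/* Shared by every caller: two threads running reduce_shared() at the
   same time would overwrite each other's partial sums. */
static double shared_scratch[SCRATCH_LEN];

/* Sum x[0..n) by spreading partial sums over a scratch array. */
static double reduce_with(double *scratch, const double *x, size_t n)
{
    for (size_t k = 0; k < SCRATCH_LEN; k++)
        scratch[k] = 0.0;
    for (size_t i = 0; i < n; i++)
        scratch[i % SCRATCH_LEN] += x[i];
    double total = 0.0;
    for (size_t k = 0; k < SCRATCH_LEN; k++)
        total += scratch[k];
    return total;
}

/* Not thread-safe: uses the static array. */
double reduce_shared(const double *x, size_t n)
{
    return reduce_with(shared_scratch, x, n);
}

/* Thread-safe: each call gets its own scratch array on the stack. */
double reduce_local(const double *x, size_t n)
{
    double scratch[SCRATCH_LEN];
    return reduce_with(scratch, x, n);
}
```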
Arjan van de Ven
5b708e5eb1
sgemm/dgemm: add a way for an arch kernel to specify preferred sizes
The current gemm threading code can make very unfortunate choices. For
example, on my 10-core system a 1024x1024x1024 matrix multiply ends up
chunking into blocks of 102..., which is not a vector-friendly size,
and performance ends up horrible.
This patch adds a helper define where an architecture can specify
a preference for size multiples.
This is different from existing defines that specify minimum sizes and such.
The performance increase with this patch for the 1024x1024x1024 sgemm
is 2.3x (!!)
2018-11-01 01:43:20 +00:00
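The size-preference idea can be sketched as rounding each thread's chunk up to a multiple of an architecture-chosen value. Here `pref` stands in for that per-architecture define; its name and the value 16 used below are assumptions. With 1024 columns and 10 threads, a plain ceiling division gives 103, while rounding to a multiple of 16 gives 112.

```c
#include <stddef.h>

/* Split n columns across nthreads, rounding each chunk up to a multiple
   of pref so every chunk (except possibly the last) is vector friendly. */
size_t chunk_width(size_t n, size_t nthreads, size_t pref)
{
    size_t width = (n + nthreads - 1) / nthreads;   /* ceil(n / nthreads) */
    if (pref > 1)
        width = (width + pref - 1) / pref * pref;    /* round up to multiple */
    if (width > n)
        width = n;                                   /* never exceed the total */
    return width;
}
```

With `chunk_width(1024, 10, 16)`, nine full chunks of 112 plus a 16-wide remainder cover the 1024 columns, so every chunk stays a multiple of 16.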
Martin Kroeker
f5595d0262
Merge pull request #1843 from martin-frbg/aix_numprocs
Add get_num_procs implementation for AIX
2018-10-31 21:25:15 +01:00
Martin Kroeker
326d394a0f
Add get_num_procs implementation for AIX
(and copy HAIKU implementation to the non-TLS version of the code as well)
2018-10-31 18:38:22 +01:00
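One plausible shape for a `get_num_procs`-style helper on POSIX systems such as AIX is a `sysconf` query. The sketch assumes `_SC_NPROCESSORS_ONLN` is available (a widely supported extension rather than strict POSIX) and falls back to 1 otherwise; the function name is hypothetical and this is not the code added in the commit.

```c
#include <unistd.h>

/* Return the number of online processors, or 1 if it cannot be determined. */
int get_num_procs_sketch(void)
{
#if defined(_SC_NPROCESSORS_ONLN)
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return (n > 0) ? (int)n : 1;
#else
    return 1;
#endif
}
```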
Erik M. Bray
38cf5d9364
ensure that threading has been initialized in the first place before calling openblas_set_num_threads
2018-10-28 21:16:52 +00:00
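The guard in the commit above can be sketched as lazy initialization: if the thread pool has not been set up yet, set it up before changing its size. All names here are hypothetical, and the plain flag is not safe against concurrent first calls; real code would need a lock or a once-style initializer.

```c
/* Hypothetical stand-ins for the thread-pool state. */
static int pool_initialized = 0;
static int pool_threads = 0;

static void pool_init(void)
{
    pool_threads = 1;            /* placeholder for starting worker threads */
    pool_initialized = 1;
}

void set_num_threads_sketch(int n)
{
    if (!pool_initialized)       /* make sure the pool exists before resizing it */
        pool_init();
    if (n < 1)
        n = 1;                   /* clamp nonsensical requests */
    pool_threads = n;
}
```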
Ashwin Sekhar T K
d5aeff636f
ARM64: Enable DYNAMIC_ARCH
Enable DYNAMIC_ARCH feature on ARM64. This patch uses the cpuid
feature in the Linux kernel to detect the core type at runtime
(https://www.kernel.org/doc/Documentation/arm64/cpu-feature-registers.txt).
If this feature is missing from the kernel, the user should set the
OPENBLAS_CORETYPE environment variable to select the desired core type.
2018-10-22 01:49:35 -07:00
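A sketch of runtime core selection on ARM64, assuming the MIDR_EL1 value has already been read (the linked kernel document describes how user space can access it). The implementer field is bits [31:24] and the part number bits [15:4]; 0x41/0xd07 is ARM Cortex-A57 and 0x43/0x0af is Cavium ThunderX2. An `OPENBLAS_CORETYPE` value that names a known core overrides detection. The core table and function names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative subset of core names. */
static const char *const known_cores[] = { "ARMV8", "CORTEXA57", "THUNDERX2T99" };

static unsigned midr_implementer(uint64_t midr) { return (unsigned)((midr >> 24) & 0xff); }
static unsigned midr_partnum(uint64_t midr)     { return (unsigned)((midr >> 4) & 0xfff); }

const char *select_core(uint64_t midr, const char *coretype_env)
{
    /* An override that names a known core wins over detection. */
    if (coretype_env != NULL)
        for (size_t i = 0; i < sizeof known_cores / sizeof known_cores[0]; i++)
            if (strcmp(coretype_env, known_cores[i]) == 0)
                return known_cores[i];

    unsigned impl = midr_implementer(midr);
    unsigned part = midr_partnum(midr);
    if (impl == 0x41 && part == 0xd07) return known_cores[1];  /* ARM Cortex-A57 */
    if (impl == 0x43 && part == 0x0af) return known_cores[2];  /* Cavium ThunderX2 */
    return known_cores[0];                                       /* generic fallback */
}
```

In practice the second argument would come from `getenv("OPENBLAS_CORETYPE")`.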
Ashwin Sekhar T K
d50abc8903
ARM64: Move parameters from parameter.c to param.h
Remove the runtime setting of P, Q, R parameters for
targets ARMV8, THUNDERX2T99. Instead set them as constants
in param.h at compile time.
2018-10-22 01:45:51 -07:00
Ashwin Sekhar T K
21f46a1cf2
ARM64: Use THUNDERX2T99 Neon Kernels for ARMV8
Currently the generic ARMV8 target uses C implementations
for many routines. Replace these with the NEON implementations
written for the THUNDERX2T99 target, which are up to 6x faster for
certain routines.
2018-10-17 10:44:37 -07:00
Andrew
3439158dea
address #1782 (second loop)
2018-10-03 21:20:50 +02:00
Martin Kroeker
28aa94bf4b
Include thread numbers in failure message from blas_thread_init
to aid in debugging cases like #1767
2018-09-22 14:00:15 +02:00
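Including the thread numbers might look like the following; the function name and message wording are illustrative, not the exact text printed by `blas_thread_init`.

```c
#include <stdio.h>
#include <string.h>

/* Format a failure message that says which thread failed (1-based)
   out of how many were requested, plus the system error text. */
int format_thread_failure(char *buf, size_t len, long failed_index,
                          long requested, int err)
{
    return snprintf(buf, len,
                    "blas_thread_init: pthread_create failed for thread %ld of %ld: %s",
                    failed_index + 1, requested, strerror(err));
}
```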
Martin Kroeker
1ad1e79062
Catch inadvertent USE_TLS=0 declaration
for #1766
2018-09-19 18:03:43 +02:00
Martin Kroeker
b402626509
Do not use the new TLS code for non-threaded builds even if USE_TLS is set
Workaround for #1761, which exposed a problem in the new code (intended only to speed up multithreaded code anyway).
2018-09-16 12:43:36 +02:00
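The two TLS commits above can be illustrated with preprocessor guards. Because `-DUSE_TLS=0` still makes `#ifdef USE_TLS` true, an explicit 0 is normalized to "not defined" first; the TLS path is then selected only when the build is also threaded. Using `SMP` as the threaded-build macro and `USE_TLS_ALLOCATOR` as the switch are assumptions for this sketch.

```c
/* -DUSE_TLS=0 still defines USE_TLS, so treat an explicit 0 as "not set". */
#if defined(USE_TLS) && (USE_TLS + 0) == 0
#undef USE_TLS
#endif

/* Hypothetical switch: use the TLS allocator only in threaded (SMP) builds. */
#if defined(USE_TLS) && defined(SMP)
#define USE_TLS_ALLOCATOR 1
#else
#define USE_TLS_ALLOCATOR 0
#endif

int use_tls_allocator(void)
{
    return USE_TLS_ALLOCATOR;
}
```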
Martin Kroeker
b55690a659
typo fix
2018-08-26 11:31:07 +02:00
Martin Kroeker
b902a40986
Rewrite glibc version check
2018-08-26 11:18:02 +02:00
Martin Kroeker
5991d1a6cd
Update memory.c
2018-08-25 22:12:40 +02:00
Martin Kroeker
b1b743f434
Merge branch 'develop' into interim033
2018-08-25 19:45:19 +02:00
Martin Kroeker
fd42ca462d
Combo of default pre-0.3.1 memory.c and band-aided version of PR1739
2018-08-25 19:35:16 +02:00
Zoltán Mizsei
6463bffd59
Haiku support patches
2018-08-02 20:49:14 +02:00
Martin Kroeker
8ef7d4fb54
Merge pull request #1706 from oon3m0oo/develop
Fix #1705 where we incorrectly calculate page locations.
2018-08-02 18:53:34 +02:00
Craig Donner
6400868e55
Fix #1705 where we incorrectly calculate page locations.
Since we now use an allocation size that isn't a multiple of PAGESIZE, finding
the pages for run_bench wasn't terminating properly. Now we detect if we've
found enough pages for the allocation and terminate the loop.
2018-08-02 16:21:19 +01:00
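The page-walk fix can be sketched as computing the page count with a ceiling division and looping over that count, instead of stepping by PAGESIZE until an exact byte total is reached (which never happens when the size is not a page multiple). The 4096-byte PAGESIZE and the function names are assumptions.

```c
#include <stddef.h>

#ifndef PAGESIZE
#define PAGESIZE 4096   /* assumed page size for the sketch */
#endif

/* Whole pages needed to cover size bytes, even if size is not a page multiple. */
size_t pages_needed(size_t size)
{
    return (size + PAGESIZE - 1) / PAGESIZE;
}

/* Touch the first byte of each page and stop once enough pages were visited. */
size_t touch_pages(volatile char *base, size_t size)
{
    size_t npages = pages_needed(size);
    size_t visited = 0;
    for (size_t p = 0; p < npages; p++) {
        base[p * PAGESIZE] = 0;
        visited++;
    }
    return visited;
}
```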
Martin Kroeker
66fcdd5be8
Merge pull request #1695 from martin-frbg/issue1692
Unset memory table entry on shutdown, not just the local pointer to it
2018-07-22 16:34:09 +02:00
Martin Kroeker
43ac839c16
Unset memory table entry on shutdown, not just the temporary pointer to it
to fix crash with multiple instances of OpenBLAS, #1692
2018-07-22 09:19:19 +02:00
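The difference between clearing a temporary pointer and clearing the table entry itself is a classic C pitfall. In this sketch the table and function names are hypothetical; only the fixed version leaves no stale entry behind for a later instance.

```c
#include <stddef.h>

#define NUM_SLOTS 4

/* Hypothetical memory table: each slot holds the address of a buffer. */
static void *memory_table[NUM_SLOTS];

/* Buggy shutdown: clears only a local copy of the entry. */
void release_slot_buggy(int i)
{
    void *entry = memory_table[i];
    /* free(entry) would go here */
    entry = NULL;                /* no effect on memory_table[i] */
    (void)entry;
}

/* Fixed shutdown: clears the table entry itself. */
void release_slot_fixed(int i)
{
    /* free(memory_table[i]) would go here */
    memory_table[i] = NULL;
}
```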
Martin Kroeker
7ba5936ecd
Merge pull request #1688 from martin-frbg/issue1673
Temporarily disable special handling of OPENMP thread memory allocation
2018-07-19 19:03:45 +02:00
Martin Kroeker
b14f44d2ad
Temporarily disable special handling of OPENMP thread memory allocation
for issue #1673
2018-07-19 08:57:56 +02:00
Martin Kroeker
36aea5ce2d
Merge pull request #1680 from martin-frbg/snprint
Fix wrong redefinitions of snprintf for older MSVC
2018-07-12 14:05:13 +02:00
Martin Kroeker
571e9de2ac
Fix definition of snprintf for MSVC
MS _snprintf_s takes an additional argument for the size of the buffer, so it is not a direct replacement (utest/ctest.h, from which I copied it, was wrong)
2018-07-12 11:42:25 +02:00
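A portable wrapper, assuming "older MSVC" means `_MSC_VER < 1900` (Visual Studio 2015 added C99 `snprintf`), can route to `_vsnprintf_s` with `_TRUNCATE` there and to `vsnprintf` elsewhere. The wrapper name is hypothetical. The return values differ on truncation: `vsnprintf` returns the length that would have been written, while `_vsnprintf_s` returns -1.

```c
#include <stdarg.h>
#include <stdio.h>

int portable_snprintf(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int n;
    va_start(ap, fmt);
#if defined(_MSC_VER) && _MSC_VER < 1900
    /* Extra buffer-size argument; _TRUNCATE writes what fits and NUL-terminates. */
    n = _vsnprintf_s(buf, size, _TRUNCATE, fmt, ap);
#else
    n = vsnprintf(buf, size, fmt, ap);
#endif
    va_end(ap);
    return n;
}
```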
Martin Kroeker
448ed15115
Merge pull request #1678 from martin-frbg/issue1677
Define snprintf for older versions of MSVC
2018-07-12 09:21:34 +02:00
Martin Kroeker
045fb5ea2c
Define snprintf for older versions of MSVC
for #1677
2018-07-12 07:30:58 +02:00
Martin Kroeker
4dd70d98d7
Merge pull request #1667 from xianyi/revert-1642-develop
Revert "Rewrite &= -> = and simplify the initial blocking phase."
2018-07-04 08:27:21 +02:00
Martin Kroeker
504310eeb9
Merge pull request #1665 from martin-frbg/cpuid-ryzen2
Add cpuid for AMD Ryzen 2
2018-07-04 08:19:40 +02:00
Martin Kroeker
ea1f39518f
Merge pull request #1663 from martin-frbg/issue1641
Double MAX_ALLOCATING_THREADS to fix segfaults with Go and Octave
2018-07-04 08:19:11 +02:00
Martin Kroeker
5f2a3c05cd
Revert "Rewrite &= -> = and simplify the initial blocking phase."
2018-07-03 21:42:28 +02:00
Martin Kroeker
d0ec4325cf
Add cpuid for AMD Ryzen 2
2018-07-03 21:03:24 +02:00
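Recognizing a newer AMD part comes down to decoding CPUID leaf 1 EAX. For AMD, the extended family and extended model fields are used only when the base family is 0xF, so Zen-generation parts report family 0x17. The sketch decodes a signature; treating every family-0x17 part as one Zen target is an illustrative assumption, not necessarily what the commit does.

```c
#include <stdint.h>

typedef struct { unsigned family, model, stepping; } cpu_sig;

/* Decode family/model/stepping from CPUID leaf 1 EAX using the AMD rules. */
cpu_sig decode_amd_signature(uint32_t eax)
{
    cpu_sig s;
    unsigned base_family = (eax >> 8) & 0xf;
    unsigned base_model  = (eax >> 4) & 0xf;
    s.stepping = eax & 0xf;
    if (base_family == 0xf) {
        s.family = base_family + ((eax >> 20) & 0xff);       /* add extended family */
        s.model  = (((eax >> 16) & 0xf) << 4) | base_model;  /* prepend extended model */
    } else {
        s.family = base_family;
        s.model  = base_model;
    }
    return s;
}

/* Illustrative mapping: any family-0x17 part counts as Zen. */
int is_zen_family(uint32_t eax)
{
    return decode_amd_signature(eax).family == 0x17;
}
```

For example, the signature 0x00800F82 decodes to family 0x17, model 0x08, stepping 2.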
Martin Kroeker
a49203b48c
Double MAX_ALLOCATING_THREADS to fix segfaults with Go and Octave
for #1641
2018-07-03 17:35:54 +02:00
Martin Kroeker
9d15a3bd16
Fix typo that broke compilation with DYNAMIC_ARCH and NO_AVX2
fixes #1659
2018-07-02 14:40:41 +02:00
Martin Kroeker
3d3c19717c
Merge pull request #1655 from martin-frbg/issue1641
Fix apparent off-by-one error in calculation of MAX_ALLOCATING_THREADS
2018-07-01 08:41:22 +02:00
Martin Kroeker
4e9c34018e
Fix apparent off-by-one error in calculation of MAX_ALLOCATING_THREADS
fixes #1641
2018-06-30 23:57:50 +02:00
Martin Kroeker
750162a05f
Try gradual fallback for cores not in the dynamic core list
2018-06-25 21:02:31 +02:00
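Gradual fallback can be sketched as walking a newest-to-oldest chain of cores, starting at the detected one, until a core that was actually built in is found. The chain, core names, and function names are hypothetical; in a build using "Add support for a user-defined list of dynamic targets", `compiled` would hold that user-supplied list.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical fallback chain, newest first. */
static const char *const fallback_chain[] = {
    "SKYLAKEX", "HASWELL", "SANDYBRIDGE", "NEHALEM", "PRESCOTT"
};
#define CHAIN_LEN (sizeof fallback_chain / sizeof fallback_chain[0])

static int is_compiled(const char *core, const char *const *compiled, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(core, compiled[i]) == 0)
            return 1;
    return 0;
}

/* Step down from the detected core to the first one that was built in;
   returns NULL for an unknown core or when nothing suitable was built. */
const char *pick_core(const char *detected, const char *const *compiled, size_t ncompiled)
{
    size_t start = CHAIN_LEN;
    for (size_t i = 0; i < CHAIN_LEN; i++)
        if (strcmp(fallback_chain[i], detected) == 0) {
            start = i;
            break;
        }
    for (size_t i = start; i < CHAIN_LEN; i++)
        if (is_compiled(fallback_chain[i], compiled, ncompiled))
            return fallback_chain[i];
    return NULL;
}
```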
Martin Kroeker
e6d93f20f1
Merge pull request #2 from martin-frbg/develop
merge develop
2018-06-25 20:48:10 +02:00
Craig Donner
0144068537
Rewrite &= -> = and simplify the initial blocking phase.
2018-06-25 15:08:55 +01:00
Martin Kroeker
1833a67071
Add support for a user-defined list of dynamic targets
2018-06-23 19:42:15 +02:00