luz.paz
daf2fec12d
Misc. typo fixes
...
Found via `codespell -q 3 -w -L ith,als,dum,nd,amin,nto,wis,ba -S ./relapack,./kernel,./lapack-netlib`
2019-04-29 17:03:56 -04:00
Erik M. Bray
4ad694eda1
Fix for #2063 : The DllMain used in Cygwin did not run the thread memory
...
pool cleanup upon THREAD_DETACH which is needed when compiled with
USE_TLS=1.
2019-03-19 09:26:50 +01:00
Martin Kroeker
d7b2c53c0b
Merge pull request #2039 from brada4/meminit
...
Address warning in memory.c
2019-03-05 12:11:15 +01:00
Martin Kroeker
6c83b878f6
Merge pull request #2040 from martin-frbg/locks2002
...
Restore locking optimizations for OpenMP case
2019-03-04 15:07:14 +01:00
Martin Kroeker
af480b02a4
Restore locking optimizations for OpenMP case
...
restore another accidentally dropped part of #1468 that was missed in #2004 to address performance regression reported in #1461
2019-03-03 14:17:07 +01:00
Andrew
e4a79be6bb
address warning introed with #1814 et al
2019-03-03 09:05:11 +02:00
Martin Kroeker
03a2bf2602
Fix potential memory leak in cpu enumeration on Linux ( #2008 )
...
* Fix potential memory leak in cpu enumeration with glibc
An early return after a failed call to sched_getaffinity would leak the previously allocated cpu_set_t. Wrong calculation of the size argument in that call increased the likelyhood of that failure. Fixes #2003
2019-02-10 23:24:45 +01:00
Martin Kroeker
69edc5bbe7
Restore dropped patches in the non-TLS branch of memory.c ( #2004 )
...
* Restore dropped patches in the non-TLS branch of memory.c
As discovered in #2002 , the reintroduction of the "original" non-TLS version of memory.c as an alternate branch had inadvertently used ba1f91f
rather than a8002e2
, thereby dropping the commits for #1450 , #1468 , #1501 , #1504 and #1520 .
2019-02-07 20:06:13 +01:00
Martin Kroeker
ad2c386d6a
Move TLS key deletion to openblas_quit
...
fixes #1954 (as suggested by thrasibule in that issue)
2019-01-10 00:32:50 +01:00
Martin Kroeker
bba1e67269
Delete the pthread key on cleanup in TLS mode
...
to avoid a crash when OpenBLAS was loaded via dlopen and libc tries to clean up the leaked TLS after dlclose
Fixes #1720
2018-12-29 21:59:31 +01:00
Andrew
2601cd58ab
remove surplus locking code , only enabled w x86, disabled or never enabled on all others
2018-11-30 11:38:19 +01:00
Martin Kroeker
326d394a0f
Add get_num_procs implementation for AIX
...
(and copy HAIKU implementation to the non-TLS version of the code as well)
2018-10-31 18:38:22 +01:00
Andrew
3439158dea
address #1782 2nd loop
2018-10-03 21:20:50 +02:00
Martin Kroeker
1ad1e79062
Catch inadvertent USE_TLS=0 declaration
...
for #1766
2018-09-19 18:03:43 +02:00
Martin Kroeker
b402626509
Do not use the new TLS code for non-threaded builds even if USE_TLS is set
...
Workaround for #1761 as that exposed a problem in the new code (which was intended to speed up multithreaded code only anyway).
2018-09-16 12:43:36 +02:00
Martin Kroeker
b55690a659
typo fix
2018-08-26 11:31:07 +02:00
Martin Kroeker
b902a40986
Rewrite glibc version check
2018-08-26 11:18:02 +02:00
Martin Kroeker
5991d1a6cd
Update memory.c
2018-08-25 22:12:40 +02:00
Martin Kroeker
b1b743f434
Merge branch 'develop' into interim033
2018-08-25 19:45:19 +02:00
Martin Kroeker
fd42ca462d
Combo of default pre-0.3.1 memory.c and band-aided version of PR1739
2018-08-25 19:35:16 +02:00
Zoltán Mizsei
6463bffd59
Haiku supporting patches
2018-08-02 20:49:14 +02:00
Martin Kroeker
8ef7d4fb54
Merge pull request #1706 from oon3m0oo/develop
...
Fix #1705 where we incorrectly calculate page locations.
2018-08-02 18:53:34 +02:00
Craig Donner
6400868e55
Fix #1705 where we incorrectly calculate page locations.
...
Since we now use an allocation size that isn't a multiple of PAGESIZE, finding
the pages for run_bench wasn't terminating properly. Now we detect if we've
found enough pages for the allocation and terminate the loop.
2018-08-02 16:21:19 +01:00
Martin Kroeker
66fcdd5be8
Merge pull request #1695 from martin-frbg/issue1692
...
Unset memory table entry, not just the local pointer to it on shutdown
2018-07-22 16:34:09 +02:00
Martin Kroeker
43ac839c16
Unset memory table entry, not just the temporary pointer to it on shutdown
...
to fix crash with multiple instances of OpenBLAS, #1692
2018-07-22 09:19:19 +02:00
Martin Kroeker
7ba5936ecd
Merge pull request #1688 from martin-frbg/issue1673
...
Temporarily disable special handling of OPENMP thread memory allocation
2018-07-19 19:03:45 +02:00
Martin Kroeker
b14f44d2ad
Temporarily disable special handling of OPENMP thread memory allocation
...
for issue #1673
2018-07-19 08:57:56 +02:00
Martin Kroeker
a49203b48c
Double MAX_ALLOCATING_THREADS to fix segfaults with Go and Octave
...
for #1641
2018-07-03 17:35:54 +02:00
Martin Kroeker
4e9c34018e
Fix apparent off-by-one error in calculation of MAX_ALLOCATING_THREADS
...
fixes #1641
2018-06-30 23:57:50 +02:00
Craig Donner
28c28ed275
Fix data races reported by TSAN.
2018-06-21 16:41:02 +01:00
oon3m0oo
a399d00425
Further improvements to memory.c. ( #1625 )
...
- Compiler TLS is now used only used when the compiler supports it
- If compiler TLS is unsupported, we use platform-specific TLS
- Only one variable (an index) is now in TLS
- We only access TLS once per alloc, and never when freeing
- Allocation / release info is now stored within the allocation itself, by
over-allocating; this saves having external structures do the bookkeeping, and
reduces some of the redundant data that was being stored (such as addresses)
- We never hit the alloc lock when not using SMP or when using OpenMP (that was
my fault)
- Now that there are fewer tracking structures I think this is a bit easier to
read than before
2018-06-20 22:04:03 +02:00
Craig Donner
bf40f806ef
Remove the need for most locking in memory.c.
...
Using thread local storage for tracking memory allocations means that threads
no longer have to lock at all when doing memory allocations / frees. This
particularly helps the gemm driver since it does an allocation per invocation.
Even without threading at all, this helps, since even calling a lock with
no contention has a cost:
Before this change, no threading:
```
----------------------------------------------------
Benchmark Time CPU Iterations
----------------------------------------------------
BM_SGEMM/4 102 ns 102 ns 13504412
BM_SGEMM/6 175 ns 175 ns 7997580
BM_SGEMM/8 205 ns 205 ns 6842073
BM_SGEMM/10 266 ns 266 ns 5294919
BM_SGEMM/16 478 ns 478 ns 2963441
BM_SGEMM/20 690 ns 690 ns 2144755
BM_SGEMM/32 1906 ns 1906 ns 716981
BM_SGEMM/40 2983 ns 2983 ns 473218
BM_SGEMM/64 9421 ns 9422 ns 148450
BM_SGEMM/72 12630 ns 12631 ns 112105
BM_SGEMM/80 15845 ns 15846 ns 89118
BM_SGEMM/90 25675 ns 25676 ns 54332
BM_SGEMM/100 29864 ns 29865 ns 47120
BM_SGEMM/112 37841 ns 37842 ns 36717
BM_SGEMM/128 56531 ns 56532 ns 25361
BM_SGEMM/140 75886 ns 75888 ns 18143
BM_SGEMM/150 98493 ns 98496 ns 14299
BM_SGEMM/160 102620 ns 102622 ns 13381
BM_SGEMM/170 135169 ns 135173 ns 10231
BM_SGEMM/180 146170 ns 146172 ns 9535
BM_SGEMM/189 190226 ns 190231 ns 7397
BM_SGEMM/200 194513 ns 194519 ns 7210
BM_SGEMM/256 396561 ns 396573 ns 3531
```
with this change:
```
----------------------------------------------------
Benchmark Time CPU Iterations
----------------------------------------------------
BM_SGEMM/4 95 ns 95 ns 14500387
BM_SGEMM/6 166 ns 166 ns 8381763
BM_SGEMM/8 196 ns 196 ns 7277044
BM_SGEMM/10 256 ns 256 ns 5515721
BM_SGEMM/16 463 ns 463 ns 3025197
BM_SGEMM/20 636 ns 636 ns 2070213
BM_SGEMM/32 1885 ns 1885 ns 739444
BM_SGEMM/40 2969 ns 2969 ns 472152
BM_SGEMM/64 9371 ns 9372 ns 148932
BM_SGEMM/72 12431 ns 12431 ns 112919
BM_SGEMM/80 15615 ns 15616 ns 89978
BM_SGEMM/90 25397 ns 25398 ns 55041
BM_SGEMM/100 29445 ns 29446 ns 47540
BM_SGEMM/112 37530 ns 37531 ns 37286
BM_SGEMM/128 55373 ns 55375 ns 25277
BM_SGEMM/140 76241 ns 76241 ns 18259
BM_SGEMM/150 102196 ns 102200 ns 13736
BM_SGEMM/160 101521 ns 101525 ns 13556
BM_SGEMM/170 136182 ns 136184 ns 10567
BM_SGEMM/180 146861 ns 146864 ns 9035
BM_SGEMM/189 192632 ns 192632 ns 7231
BM_SGEMM/200 198547 ns 198555 ns 6995
BM_SGEMM/256 392316 ns 392330 ns 3539
```
Before, when built with USE_THREAD=1, GEMM_MULTITHREAD_THRESHOLD = 4, the cost
of small matrix operations was overshadowed by thread locking (look smaller than
32) even when not explicitly spawning threads:
```
----------------------------------------------------
Benchmark Time CPU Iterations
----------------------------------------------------
BM_SGEMM/4 328 ns 328 ns 4170562
BM_SGEMM/6 396 ns 396 ns 3536400
BM_SGEMM/8 418 ns 418 ns 3330102
BM_SGEMM/10 491 ns 491 ns 2863047
BM_SGEMM/16 710 ns 710 ns 2028314
BM_SGEMM/20 871 ns 871 ns 1581546
BM_SGEMM/32 2132 ns 2132 ns 657089
BM_SGEMM/40 3197 ns 3196 ns 437969
BM_SGEMM/64 9645 ns 9645 ns 144987
BM_SGEMM/72 35064 ns 32881 ns 50264
BM_SGEMM/80 37661 ns 35787 ns 42080
BM_SGEMM/90 36507 ns 36077 ns 40091
BM_SGEMM/100 32513 ns 31850 ns 48607
BM_SGEMM/112 41742 ns 41207 ns 37273
BM_SGEMM/128 67211 ns 65095 ns 21933
BM_SGEMM/140 68263 ns 67943 ns 19245
BM_SGEMM/150 121854 ns 115439 ns 10660
BM_SGEMM/160 116826 ns 115539 ns 10000
BM_SGEMM/170 126566 ns 122798 ns 11960
BM_SGEMM/180 130088 ns 127292 ns 11503
BM_SGEMM/189 120309 ns 116634 ns 13162
BM_SGEMM/200 114559 ns 110993 ns 10000
BM_SGEMM/256 217063 ns 207806 ns 6417
```
and after, it's gone (note this includes my other change which reduces calls
to num_cpu_avail):
```
----------------------------------------------------
Benchmark Time CPU Iterations
----------------------------------------------------
BM_SGEMM/4 95 ns 95 ns 12347650
BM_SGEMM/6 166 ns 166 ns 8259683
BM_SGEMM/8 193 ns 193 ns 7162210
BM_SGEMM/10 258 ns 258 ns 5415657
BM_SGEMM/16 471 ns 471 ns 2981009
BM_SGEMM/20 666 ns 666 ns 2148002
BM_SGEMM/32 1903 ns 1903 ns 738245
BM_SGEMM/40 2969 ns 2969 ns 473239
BM_SGEMM/64 9440 ns 9440 ns 148442
BM_SGEMM/72 37239 ns 33330 ns 46813
BM_SGEMM/80 57350 ns 55949 ns 32251
BM_SGEMM/90 36275 ns 36249 ns 42259
BM_SGEMM/100 31111 ns 31008 ns 45270
BM_SGEMM/112 43782 ns 40912 ns 34749
BM_SGEMM/128 67375 ns 64406 ns 22443
BM_SGEMM/140 76389 ns 67003 ns 21430
BM_SGEMM/150 72952 ns 71830 ns 19793
BM_SGEMM/160 97039 ns 96858 ns 11498
BM_SGEMM/170 123272 ns 122007 ns 11855
BM_SGEMM/180 126828 ns 126505 ns 11567
BM_SGEMM/189 115179 ns 114665 ns 11044
BM_SGEMM/200 89289 ns 87259 ns 16147
BM_SGEMM/256 226252 ns 222677 ns 7375
```
I've also tested this with ThreadSanitizer and found no data races during
execution. I'm not sure why 200 is always faster than it's neighbors, we must
be hitting some optimal cache size or something.
2018-06-14 16:54:58 +01:00
Matthew Brett
a8002e283a
Revert "take out unused variables"
...
This reverts commit e5752ff9b3
.
The variables i and n are used in the `#if !__GLIBC_PREREQ(2, 7)`
branch.
Closes gh-1586.
2018-06-01 23:20:00 +01:00
Martin Kroeker
f29389c7ac
Merge pull request #1520 from martin-frbg/cpucounts
...
Catch invalid cpu count returned by CPU_COUNT_S
2018-04-14 22:24:34 +02:00
Martin Kroeker
7c861605b2
Catch invalid cpu count returned by CPU_COUNT_S
...
mips32 was seen to return zero here, driving nthreads to zero with subsequent fpe in blas_quickdivide
2018-04-14 18:29:10 +02:00
Martin Kroeker
d636b418af
Merge pull request #1504 from ararslan/aa/openbsd
...
Allow building on OpenBSD
2018-04-04 15:26:46 +02:00
Alex Arslan
a41d241a0e
Add support for DragonFly BSD
2018-04-03 16:39:29 -07:00
Alex Arslan
8da6b6ae52
Allow building on OpenBSD
...
With this change, OpenBLAS builds and all tests pass on OpenBSD 6.2
using Clang. Tested on x86-64 only, with and without DYNAMIC_ARCH=1.
2018-04-02 10:48:22 -07:00
Martin Kroeker
01c4b82f04
Update memory.c
2018-03-31 22:32:06 +02:00
Martin Kroeker
93db123f7e
Update memory.c
2018-03-29 13:13:49 +02:00
Martin Kroeker
752fdb5dd8
Add workaround for old gcc and clang versions
...
Old gcc and clang do not handle constructor arguments, finally fix #875 as discussed there, using the fedora patch
2018-03-29 11:56:56 +02:00
Martin Kroeker
7646974227
Limit the additional locking from PRs 1052,1299 to non-OpenMP multithreading
2018-02-21 11:45:33 +01:00
Martin Kroeker
8866e393a2
Revert "Add locks only for non-OPENMP multithreading"
2018-02-20 17:17:12 +01:00
Martin Kroeker
3119b2ab4c
Add locks only for non-OPENMP multithreading
...
to migitate performance problems caused by #1052 and #1299 as seen in #1461
2018-02-20 12:17:18 +01:00
Erik M. Bray
8f5f614615
On Cygwin use mmap instead of Windows native allocation functions, which are not fork-safe.
2018-02-06 12:23:27 +01:00
Erik M. Bray
f5fc109fbd
Perform blas_thread_shutdown with pthread_atfork() on Cygwin
...
Even if we're directly using the win32 threading driver and not pthreads,
pthread_atfork still works fine to register a pre-fork handler, and is
necessary to restore the threading server to a pre-initialized state.
2018-02-06 12:23:27 +01:00
Andrew
e5752ff9b3
take out unused variables
2018-01-20 11:42:31 +01:00
Andrew
bfc2a88594
remove unused buffer
2017-12-22 00:55:40 +01:00
Martin Kroeker
ba1f91f17b
Convert another caller of "allocation" to LOCK_COMMAND
...
... as the "allocation" code jumped to now does UNLOCK_COMMAND instead of blas_unlock
2017-09-09 20:30:33 +02:00
Martin Kroeker
e882f3d6f3
Fix thread data race in memory.c
2017-09-09 18:58:38 +02:00