af480b02a4 
								
							 
						 
						
							
							
								
								Restore locking optimizations for OpenMP case  
							
							
							
							
							restore another accidentally dropped part of #1468  that was missed in #2004  to address performance regression reported in #1461  
							
						 
						
							2019-03-03 14:17:07 +01:00  
				
					
						
							
							
								 
						
							
								e4a79be6bb 
								
							 
						 
						
							
							
								
								address warning introduced with  #1814  et al  
							
							
							
						 
						
							2019-03-03 09:05:11 +02:00  
				
					
						
							
							
								 
						
							
								03a2bf2602 
								
							 
						 
						
							
							
								
								Fix potential memory leak in cpu enumeration on Linux ( #2008 )  
							
							
							
							
							* Fix potential memory leak in cpu enumeration with glibc
An early return after a failed call to sched_getaffinity would leak the previously allocated cpu_set_t. Wrong calculation of the size argument in that call increased the likelihood of that failure. Fixes  #2003  
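
The leak pattern described above can be avoided with a single exit path that always frees the set. The following is only a sketch under assumptions, not the actual OpenBLAS code: `count_usable_cpus` is a hypothetical name, and since the message does not say what the size miscalculation was, the sketch simply derives the size from glibc's `CPU_ALLOC_SIZE`.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* needed for the CPU_*_S macros in glibc */
#endif
#include <sched.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper (not the OpenBLAS function): count the CPUs this
 * process may run on, falling back to the configured count on failure. */
int count_usable_cpus(void) {
    long configured = sysconf(_SC_NPROCESSORS_CONF);
    if (configured < 1) return 1;

    cpu_set_t *set = CPU_ALLOC((size_t)configured);
    if (set == NULL) return (int)configured;

    /* Byte size of the dynamically allocated set. */
    size_t size = CPU_ALLOC_SIZE((size_t)configured);
    CPU_ZERO_S(size, set);

    int result = (int)configured;
    if (sched_getaffinity(0, size, set) == 0) {
        int n = CPU_COUNT_S(size, set);
        if (n > 0 && n < result) result = n;
    }

    /* Single exit path: the set is freed whether or not the call failed. */
    CPU_FREE(set);
    return result;
}
```

Keeping one `CPU_FREE` at the end, rather than returning from inside the error branch, is what makes a failed `sched_getaffinity` call harmless here.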
							
						 
						
							2019-02-10 23:24:45 +01:00  
				
					
						
							
							
								 
						
							
								69edc5bbe7 
								
							 
						 
						
							
							
								
								Restore dropped patches in the non-TLS branch of memory.c ( #2004 )  
							
							
							
							
							* Restore dropped patches in the non-TLS branch of memory.c
As discovered in #2002 , the reintroduction of the "original" non-TLS version of memory.c as an alternate branch had inadvertently used ba1f91f rather than a8002e2 as its starting point, dropping the changes from #1450 , #1468 , #1501 , #1504  and #1520 . 
							
						 
						
							2019-02-07 20:06:13 +01:00  
				
					
						
							
							
								 
						
							
								ad2c386d6a 
								
							 
						 
						
							
							
								
								Move TLS key deletion to openblas_quit  
							
							
							
							
							fixes  #1954  (as suggested by thrasibule in that issue) 
						
							2019-01-10 00:32:50 +01:00  
				
					
						
							
							
								 
						
							
								bba1e67269 
								
							 
						 
						
							
							
								
								Delete the pthread key on cleanup in TLS mode  
							
							
							
							
							to avoid a crash when OpenBLAS was loaded via dlopen and libc tries to clean up the leaked TLS after dlclose
Fixes  #1720  
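
A minimal sketch of the idea, assuming a pthread key with a destructor: if the key outlives the library, the threading runtime can later call a destructor whose code has already been unmapped by dlclose. All names below (`tls_key`, `tls_init`, `tls_get_buffer`, `tls_shutdown`) are hypothetical and are not the OpenBLAS identifiers.

```c
#include <pthread.h>
#include <stdlib.h>

static pthread_key_t tls_key;       /* hypothetical name */
static int tls_key_valid = 0;

/* Destructor run by the threading runtime when a thread exits. */
static void tls_destructor(void *value) { free(value); }

int tls_init(void) {
    if (pthread_key_create(&tls_key, tls_destructor) != 0) return -1;
    tls_key_valid = 1;
    return 0;
}

/* Return this thread's buffer, creating it on first use. */
void *tls_get_buffer(size_t size) {
    void *p = pthread_getspecific(tls_key);
    if (p == NULL) {
        p = calloc(1, size);
        if (p != NULL && pthread_setspecific(tls_key, p) != 0) {
            free(p);
            p = NULL;
        }
    }
    return p;
}

/* Cleanup path, meant to run before the library is unloaded. */
void tls_shutdown(void) {
    if (!tls_key_valid) return;
    /* pthread_key_delete does not invoke destructors, so release the
     * calling thread's value explicitly before deleting the key. */
    free(pthread_getspecific(tls_key));
    pthread_setspecific(tls_key, NULL);
    pthread_key_delete(tls_key);
    tls_key_valid = 0;
}
```

After `tls_shutdown` no destructor remains registered, so nothing in the runtime refers back to the library's code.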
							
						 
						
							2018-12-29 21:59:31 +01:00  
				
					
						
							
							
								 
						
							
								2601cd58ab 
								
							 
						 
						
							
							
								
								remove surplus locking code, only enabled on x86, disabled or never enabled on all others  
							
							
							
						 
						
							2018-11-30 11:38:19 +01:00  
				
					
						
							
							
								 
						
							
								326d394a0f 
								
							 
						 
						
							
							
								
								Add get_num_procs implementation for AIX  
							
							
							
							
							(and copy HAIKU implementation to the non-TLS version of the code as well) 
							
						 
						
							2018-10-31 18:38:22 +01:00  
				
					
						
							
							
								 
						
							
								3439158dea 
								
							 
						 
						
							
							
								
								address  #1782  2nd loop  
							
							
							
						 
						
							2018-10-03 21:20:50 +02:00  
				
					
						
							
							
								 
						
							
								1ad1e79062 
								
							 
						 
						
							
							
								
								Catch inadvertent USE_TLS=0 declaration  
							
							
							
							
							for #1766  
							
						 
						
							2018-09-19 18:03:43 +02:00  
				
					
						
							
							
								 
						
							
								b402626509 
								
							 
						 
						
							
							
								
								Do not use the new TLS code for non-threaded builds even if USE_TLS is set  
							
							
							
							
							Workaround for #1761  as that exposed a problem in the new code (which was intended to speed up multithreaded code only anyway). 
							
						 
						
							2018-09-16 12:43:36 +02:00  
				
					
						
							
							
								 
						
							
								b55690a659 
								
							 
						 
						
							
							
								
								typo fix  
							
							
							
						 
						
							2018-08-26 11:31:07 +02:00  
				
					
						
							
							
								 
						
							
								b902a40986 
								
							 
						 
						
							
							
								
								Rewrite glibc version check  
							
							
							
						 
						
							2018-08-26 11:18:02 +02:00  
				
					
						
							
							
								 
						
							
								5991d1a6cd 
								
							 
						 
						
							
							
								
								Update memory.c  
							
							
							
						 
						
							2018-08-25 22:12:40 +02:00  
				
					
						
							
							
								 
						
							
								b1b743f434 
								
							 
						 
						
							
							
								
								Merge branch 'develop' into interim033  
							
							
							
						 
						
							2018-08-25 19:45:19 +02:00  
				
					
						
							
							
								 
						
							
								fd42ca462d 
								
							 
						 
						
							
							
								
								Combo of default pre-0.3.1 memory.c and band-aided version of PR1739  
							
							
							
						 
						
							2018-08-25 19:35:16 +02:00  
				
					
						
							
							
								 
						
							
								6463bffd59 
								
							 
						 
						
							
							
								
								Haiku supporting patches  
							
							
							
						 
						
							2018-08-02 20:49:14 +02:00  
				
					
						
							
							
								 
						
							
								8ef7d4fb54 
								
							 
						 
						
							
							
								
								Merge pull request  #1706  from oon3m0oo/develop  
							
							
							
							
							Fix  #1705  where we incorrectly calculate page locations. 
						
							2018-08-02 18:53:34 +02:00  
				
					
						
							
							
								 
						
							
								6400868e55 
								
							 
						 
						
							
							
								
								Fix   #1705  where we incorrectly calculate page locations.  
							
							
							
							
							Since we now use an allocation size that isn't a multiple of PAGESIZE, finding
the pages for run_bench wasn't terminating properly.  Now we detect if we've
found enough pages for the allocation and terminate the loop. 
							
						 
						
							2018-08-02 16:21:19 +01:00  
				
					
						
							
							
								 
						
							
								66fcdd5be8 
								
							 
						 
						
							
							
								
								Merge pull request  #1695  from martin-frbg/issue1692  
							
							
							
							
							Unset memory table entry, not just the local pointer to it on shutdown 
							
						 
						
							2018-07-22 16:34:09 +02:00  
				
					
						
							
							
								 
						
							
								43ac839c16 
								
							 
						 
						
							
							
								
								Unset memory table entry, not just the temporary pointer to it on shutdown  
							
							
							
							
							to fix crash with multiple instances of OpenBLAS, #1692  
							
						 
						
							2018-07-22 09:19:19 +02:00  
				
					
						
							
							
								 
						
							
								7ba5936ecd 
								
							 
						 
						
							
							
								
								Merge pull request  #1688  from martin-frbg/issue1673  
							
							
							
							
							Temporarily disable special handling of OPENMP thread memory allocation 
							
						 
						
							2018-07-19 19:03:45 +02:00  
				
					
						
							
							
								 
						
							
								b14f44d2ad 
								
							 
						 
						
							
							
								
								Temporarily disable special handling of OPENMP thread memory allocation  
							
							
							
							
							for issue #1673  
							
						 
						
							2018-07-19 08:57:56 +02:00  
				
					
						
							
							
								 
						
							
								a49203b48c 
								
							 
						 
						
							
							
								
								Double MAX_ALLOCATING_THREADS to fix segfaults with Go and Octave  
							
							
							
							
							for #1641  
							
						 
						
							2018-07-03 17:35:54 +02:00  
				
					
						
							
							
								 
						
							
								4e9c34018e 
								
							 
						 
						
							
							
								
								Fix apparent off-by-one error in calculation of MAX_ALLOCATING_THREADS  
							
							
							
							
							fixes  #1641  
						
							2018-06-30 23:57:50 +02:00  
				
					
						
							
							
								 
						
							
								28c28ed275 
								
							 
						 
						
							
							
								
								Fix data races reported by TSAN.  
							
							
							
						 
						
							2018-06-21 16:41:02 +01:00  
				
					
						
							
							
								 
						
							
								a399d00425 
								
							 
						 
						
							
							
								
								Further improvements to memory.c. ( #1625 )  
							
							
							
							
							- Compiler TLS is now used only when the compiler supports it
- If compiler TLS is unsupported, we use platform-specific TLS
- Only one variable (an index) is now in TLS
- We only access TLS once per alloc, and never when freeing
- Allocation / release info is now stored within the allocation itself, by
  over-allocating; this saves having external structures do the bookkeeping, and
  reduces some of the redundant data that was being stored (such as addresses)
- We never hit the alloc lock when not using SMP or when using OpenMP (that was
  my fault)
- Now that there are fewer tracking structures I think this is a bit easier to
  read than before 
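
The point about storing allocation info inside the allocation itself can be illustrated with a small header placed in front of the user data. This is a sketch under assumptions, not the layout used in memory.c: `alloc_header`, `tracked_alloc`, `tracked_size` and `tracked_free` are hypothetical names, and plain `malloc` stands in for whatever allocator the library uses.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical bookkeeping record stored in front of the user data. */
typedef struct {
    size_t size;    /* bytes requested by the caller */
    int    in_use;  /* 1 while the block is handed out */
} alloc_header;

/* Header size rounded up so the user pointer keeps maximum alignment. */
#define HEADER_BYTES \
    ((sizeof(alloc_header) + _Alignof(max_align_t) - 1) \
     / _Alignof(max_align_t) * _Alignof(max_align_t))

/* Over-allocate by HEADER_BYTES and return the address past the header. */
void *tracked_alloc(size_t size) {
    if (size > SIZE_MAX - HEADER_BYTES) return NULL;
    unsigned char *raw = malloc(HEADER_BYTES + size);
    if (raw == NULL) return NULL;
    alloc_header *h = (alloc_header *)raw;
    h->size = size;
    h->in_use = 1;
    return raw + HEADER_BYTES;
}

/* Recover the header from a user pointer; no external table is needed. */
static alloc_header *header_of(void *user) {
    return (alloc_header *)((unsigned char *)user - HEADER_BYTES);
}

size_t tracked_size(void *user) { return header_of(user)->size; }

void tracked_free(void *user) {
    if (user == NULL) return;
    free(header_of(user));
}
```

Because the header is found by pointer arithmetic, freeing needs neither a lookup structure nor a lock around one.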
							
						 
						
							2018-06-20 22:04:03 +02:00  
				
					
						
							
							
								 
						
							
								bf40f806ef 
								
							 
						 
						
							
							
								
								Remove the need for most locking in memory.c.  
							
							
							
							
							Using thread local storage for tracking memory allocations means that threads
no longer have to lock at all when doing memory allocations / frees. This
particularly helps the gemm driver since it does an allocation per invocation.
Even without threading at all, this helps, since even calling a lock with
no contention has a cost:
Before this change, no threading:
```
----------------------------------------------------
Benchmark             Time           CPU Iterations
----------------------------------------------------
BM_SGEMM/4          102 ns        102 ns   13504412
BM_SGEMM/6          175 ns        175 ns    7997580
BM_SGEMM/8          205 ns        205 ns    6842073
BM_SGEMM/10         266 ns        266 ns    5294919
BM_SGEMM/16         478 ns        478 ns    2963441
BM_SGEMM/20         690 ns        690 ns    2144755
BM_SGEMM/32        1906 ns       1906 ns     716981
BM_SGEMM/40        2983 ns       2983 ns     473218
BM_SGEMM/64        9421 ns       9422 ns     148450
BM_SGEMM/72       12630 ns      12631 ns     112105
BM_SGEMM/80       15845 ns      15846 ns      89118
BM_SGEMM/90       25675 ns      25676 ns      54332
BM_SGEMM/100      29864 ns      29865 ns      47120
BM_SGEMM/112      37841 ns      37842 ns      36717
BM_SGEMM/128      56531 ns      56532 ns      25361
BM_SGEMM/140      75886 ns      75888 ns      18143
BM_SGEMM/150      98493 ns      98496 ns      14299
BM_SGEMM/160     102620 ns     102622 ns      13381
BM_SGEMM/170     135169 ns     135173 ns      10231
BM_SGEMM/180     146170 ns     146172 ns       9535
BM_SGEMM/189     190226 ns     190231 ns       7397
BM_SGEMM/200     194513 ns     194519 ns       7210
BM_SGEMM/256     396561 ns     396573 ns       3531
```
with this change:
```
----------------------------------------------------
Benchmark             Time           CPU Iterations
----------------------------------------------------
BM_SGEMM/4           95 ns         95 ns   14500387
BM_SGEMM/6          166 ns        166 ns    8381763
BM_SGEMM/8          196 ns        196 ns    7277044
BM_SGEMM/10         256 ns        256 ns    5515721
BM_SGEMM/16         463 ns        463 ns    3025197
BM_SGEMM/20         636 ns        636 ns    2070213
BM_SGEMM/32        1885 ns       1885 ns     739444
BM_SGEMM/40        2969 ns       2969 ns     472152
BM_SGEMM/64        9371 ns       9372 ns     148932
BM_SGEMM/72       12431 ns      12431 ns     112919
BM_SGEMM/80       15615 ns      15616 ns      89978
BM_SGEMM/90       25397 ns      25398 ns      55041
BM_SGEMM/100      29445 ns      29446 ns      47540
BM_SGEMM/112      37530 ns      37531 ns      37286
BM_SGEMM/128      55373 ns      55375 ns      25277
BM_SGEMM/140      76241 ns      76241 ns      18259
BM_SGEMM/150     102196 ns     102200 ns      13736
BM_SGEMM/160     101521 ns     101525 ns      13556
BM_SGEMM/170     136182 ns     136184 ns      10567
BM_SGEMM/180     146861 ns     146864 ns       9035
BM_SGEMM/189     192632 ns     192632 ns       7231
BM_SGEMM/200     198547 ns     198555 ns       6995
BM_SGEMM/256     392316 ns     392330 ns       3539
```
Before, when built with USE_THREAD=1, GEMM_MULTITHREAD_THRESHOLD = 4, the cost
of small matrix operations was overshadowed by thread locking (look at the sizes
smaller than 32) even when not explicitly spawning threads:
```
----------------------------------------------------
Benchmark             Time           CPU Iterations
----------------------------------------------------
BM_SGEMM/4          328 ns        328 ns    4170562
BM_SGEMM/6          396 ns        396 ns    3536400
BM_SGEMM/8          418 ns        418 ns    3330102
BM_SGEMM/10         491 ns        491 ns    2863047
BM_SGEMM/16         710 ns        710 ns    2028314
BM_SGEMM/20         871 ns        871 ns    1581546
BM_SGEMM/32        2132 ns       2132 ns     657089
BM_SGEMM/40        3197 ns       3196 ns     437969
BM_SGEMM/64        9645 ns       9645 ns     144987
BM_SGEMM/72       35064 ns      32881 ns      50264
BM_SGEMM/80       37661 ns      35787 ns      42080
BM_SGEMM/90       36507 ns      36077 ns      40091
BM_SGEMM/100      32513 ns      31850 ns      48607
BM_SGEMM/112      41742 ns      41207 ns      37273
BM_SGEMM/128      67211 ns      65095 ns      21933
BM_SGEMM/140      68263 ns      67943 ns      19245
BM_SGEMM/150     121854 ns     115439 ns      10660
BM_SGEMM/160     116826 ns     115539 ns      10000
BM_SGEMM/170     126566 ns     122798 ns      11960
BM_SGEMM/180     130088 ns     127292 ns      11503
BM_SGEMM/189     120309 ns     116634 ns      13162
BM_SGEMM/200     114559 ns     110993 ns      10000
BM_SGEMM/256     217063 ns     207806 ns       6417
```
and after, it's gone (note this includes my other change which reduces calls
to num_cpu_avail):
```
----------------------------------------------------
Benchmark             Time           CPU Iterations
----------------------------------------------------
BM_SGEMM/4           95 ns         95 ns   12347650
BM_SGEMM/6          166 ns        166 ns    8259683
BM_SGEMM/8          193 ns        193 ns    7162210
BM_SGEMM/10         258 ns        258 ns    5415657
BM_SGEMM/16         471 ns        471 ns    2981009
BM_SGEMM/20         666 ns        666 ns    2148002
BM_SGEMM/32        1903 ns       1903 ns     738245
BM_SGEMM/40        2969 ns       2969 ns     473239
BM_SGEMM/64        9440 ns       9440 ns     148442
BM_SGEMM/72       37239 ns      33330 ns      46813
BM_SGEMM/80       57350 ns      55949 ns      32251
BM_SGEMM/90       36275 ns      36249 ns      42259
BM_SGEMM/100      31111 ns      31008 ns      45270
BM_SGEMM/112      43782 ns      40912 ns      34749
BM_SGEMM/128      67375 ns      64406 ns      22443
BM_SGEMM/140      76389 ns      67003 ns      21430
BM_SGEMM/150      72952 ns      71830 ns      19793
BM_SGEMM/160      97039 ns      96858 ns      11498
BM_SGEMM/170     123272 ns     122007 ns      11855
BM_SGEMM/180     126828 ns     126505 ns      11567
BM_SGEMM/189     115179 ns     114665 ns      11044
BM_SGEMM/200      89289 ns      87259 ns      16147
BM_SGEMM/256     226252 ns     222677 ns       7375
```
I've also tested this with ThreadSanitizer and found no data races during
execution.  I'm not sure why 200 is always faster than its neighbors; we must
be hitting some optimal cache size or something. 
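
The lock-free bookkeeping described at the start of this message can be sketched with a per-thread slot table. This is an illustration under assumptions rather than the code from the commit: the names, the slot count and the fixed buffer size are hypothetical, and C11 `_Thread_local` stands in for whichever TLS mechanism is available.

```c
#include <stddef.h>
#include <stdlib.h>

#define SLOTS_PER_THREAD 8      /* hypothetical limit */
#define BUFFER_BYTES     4096   /* hypothetical fixed buffer size */

typedef struct {
    void *addr;  /* buffer owned by this thread, NULL until first use */
    int   used;  /* 1 while handed out */
} slot;

/* One table per thread: no other thread can reach it, so no lock is taken. */
static _Thread_local slot local_slots[SLOTS_PER_THREAD];

/* Hand out a free buffer of the calling thread, allocating it lazily. */
void *local_buffer_get(void) {
    for (int i = 0; i < SLOTS_PER_THREAD; i++) {
        slot *s = &local_slots[i];
        if (s->used) continue;
        if (s->addr == NULL) {
            s->addr = malloc(BUFFER_BYTES);
            if (s->addr == NULL) return NULL;
        }
        s->used = 1;
        return s->addr;
    }
    return NULL;  /* every slot of this thread is busy */
}

/* Must be called on the thread that obtained the buffer. */
void local_buffer_release(void *p) {
    for (int i = 0; i < SLOTS_PER_THREAD; i++) {
        if (local_slots[i].used && local_slots[i].addr == p) {
            local_slots[i].used = 0;
            return;
        }
    }
}
```

The sketch never frees its buffers when a thread exits; a real implementation would need a cleanup hook for that.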
							
						 
						
							2018-06-14 16:54:58 +01:00  
				
					
						
							
							
								 
						
							
								a8002e283a 
								
							 
						 
						
							
							
								
								Revert "take out unused variables"  
							
							
							
							
							This reverts commit e5752ff9b3 
							
						 
						
							2018-06-01 23:20:00 +01:00  
				
					
						
							
							
								 
						
							
								f29389c7ac 
								
							 
						 
						
							
							
								
								Merge pull request  #1520  from martin-frbg/cpucounts  
							
							
							
							
							Catch invalid cpu count returned by CPU_COUNT_S 
							
						 
						
							2018-04-14 22:24:34 +02:00  
				
					
						
							
							
								 
						
							
								7c861605b2 
								
							 
						 
						
							
							
								
								Catch invalid cpu count returned by CPU_COUNT_S  
							
							
							
							
							mips32 was seen to return zero here, driving nthreads to zero with subsequent fpe in blas_quickdivide 
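
A guard of this kind can be as small as clamping the reported count before it is used as a divisor. The helpers below are hypothetical and only illustrate the failure mode; they are not `blas_quickdivide` or any other OpenBLAS function.

```c
/* Hypothetical helper: treat a zero or negative CPU count as one CPU. */
int sanitize_cpu_count(int reported) {
    return reported < 1 ? 1 : reported;
}

/* Hypothetical work split: rows handled per thread, rounded up.
 * Without the clamp, reported_cpus == 0 would divide by zero. */
int rows_per_thread(int rows, int reported_cpus) {
    int nthreads = sanitize_cpu_count(reported_cpus);
    return (rows + nthreads - 1) / nthreads;
}
```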
							
						 
						
							2018-04-14 18:29:10 +02:00  
				
					
						
							
							
								 
						
							
								d636b418af 
								
							 
						 
						
							
							
								
								Merge pull request  #1504  from ararslan/aa/openbsd  
							
							
							
							
							Allow building on OpenBSD 
							
						 
						
							2018-04-04 15:26:46 +02:00  
				
					
						
							
							
								 
						
							
								a41d241a0e 
								
							 
						 
						
							
							
								
								Add support for DragonFly BSD  
							
							
							
						 
						
							2018-04-03 16:39:29 -07:00  
				
					
						
							
							
								 
						
							
								8da6b6ae52 
								
							 
						 
						
							
							
								
								Allow building on OpenBSD  
							
							
							
							
							With this change, OpenBLAS builds and all tests pass on OpenBSD 6.2
using Clang. Tested on x86-64 only, with and without DYNAMIC_ARCH=1. 
							
						 
						
							2018-04-02 10:48:22 -07:00  
				
					
						
							
							
								 
						
							
								01c4b82f04 
								
							 
						 
						
							
							
								
								Update memory.c  
							
							
							
						 
						
							2018-03-31 22:32:06 +02:00  
				
					
						
							
							
								 
						
							
								93db123f7e 
								
							 
						 
						
							
							
								
								Update memory.c  
							
							
							
						 
						
							2018-03-29 13:13:49 +02:00  
				
					
						
							
							
								 
						
							
								752fdb5dd8 
								
							 
						 
						
							
							
								
								Add workaround for old gcc and clang versions  
							
							
							
							
							Old gcc and clang do not handle constructor arguments, finally fix  #875  as discussed there, using the fedora patch 
							
						 
						
							2018-03-29 11:56:56 +02:00  
				
					
						
							
							
								 
						
							
								7646974227 
								
							 
						 
						
							
							
								
								Limit the additional locking from PRs 1052,1299 to non-OpenMP multithreading  
							
							
							
						 
						
							2018-02-21 11:45:33 +01:00  
				
					
						
							
							
								 
						
							
								8866e393a2 
								
							 
						 
						
							
							
								
								Revert "Add locks only for non-OPENMP multithreading"  
							
							
							
						 
						
							2018-02-20 17:17:12 +01:00  
				
					
						
							
							
								 
						
							
								3119b2ab4c 
								
							 
						 
						
							
							
								
								Add locks only for non-OPENMP multithreading  
							
							
							
							
							to mitigate performance problems caused by #1052  and #1299  as seen in #1461  
							
						 
						
							2018-02-20 12:17:18 +01:00  
				
					
						
							
							
								 
						
							
								8f5f614615 
								
							 
						 
						
							
							
								
								On Cygwin use mmap instead of Windows native allocation functions, which are not fork-safe.  
							
							
							
						 
						
							2018-02-06 12:23:27 +01:00  
				
					
						
							
							
								 
						
							
								f5fc109fbd 
								
							 
						 
						
							
							
								
								Perform blas_thread_shutdown with pthread_atfork() on Cygwin  
							
							
							
							
							Even if we're directly using the win32 threading driver and not pthreads,
pthread_atfork still works fine to register a pre-fork handler, and is
necessary to restore the threading server to a pre-initialized state. 
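
Registering a pre-fork handler looks roughly like the sketch below. The `server_running` flag is a hypothetical stand-in for the real threading server state, and the function names are invented for illustration; only `pthread_atfork`, `fork` and `waitpid` are real APIs.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the state of the threading server. */
static int server_running = 0;

int start_server(void) { server_running = 1; return 0; }

/* Prepare handler: runs in the parent immediately before fork(). */
static void stop_server_before_fork(void) { server_running = 0; }

/* Register only a prepare handler; parent and child handlers are unused. */
int install_fork_handler(void) {
    return pthread_atfork(stop_server_before_fork, NULL, NULL);
}

/* Fork once; returns 1 if the child saw the server as stopped,
 * 0 if it did not, and -1 on error. */
int child_sees_stopped_server(void) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) _exit(server_running ? 1 : 0);
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

Since the handler runs before the process is duplicated, parent and child both continue from the pre-initialized state and can restart the server on demand.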
							
						 
						
							2018-02-06 12:23:27 +01:00  
				
					
						
							
							
								 
						
							
								e5752ff9b3 
								
							 
						 
						
							
							
								
								take out unused variables  
							
							
							
						 
						
							2018-01-20 11:42:31 +01:00  
				
					
						
							
							
								 
						
							
								bfc2a88594 
								
							 
						 
						
							
							
								
								remove unused buffer  
							
							
							
						 
						
							2017-12-22 00:55:40 +01:00  
				
					
						
							
							
								 
						
							
								ba1f91f17b 
								
							 
						 
						
							
							
								
								Convert another caller of "allocation" to LOCK_COMMAND  
							
							
							
							
							... as the "allocation" code jumped to now does UNLOCK_COMMAND instead of blas_unlock 
							
						 
						
							2017-09-09 20:30:33 +02:00  
				
					
						
							
							
								 
						
							
								e882f3d6f3 
								
							 
						 
						
							
							
								
								Fix thread data race in memory.c  
							
							
							
						 
						
							2017-09-09 18:58:38 +02:00  
				
					
						
							
							
								 
						
							
								63cfa32691 
								
							 
						 
						
							
							
								
								Rework __GLIBC_PREREQ checks to avoid breaking non-glibc builds  
							
							
							
						 
						
							2017-07-31 21:02:43 +02:00  
				
					
						
							
							
								 
						
							
								c4af196a2d 
								
							 
						 
						
							
							
								
								Honor cgroup/cpuset limits when enumerating cpus  
							
							
							
						 
						
							2017-07-25 22:47:34 +02:00  
				
					
						
							
							
								 
						
							
								480e697681 
								
							 
						 
						
							
							
								
								Revert "Honor cgroup/cpuset limits when enumerating cpus" ( #1246 )  
							
							
							
						 
						
							2017-07-24 16:17:50 +02:00  
				
					
						
							
							
								 
						
							
								731c518cff 
								
							 
						 
						
							
							
								
								Add files via upload  
							
							
							
						 
						
							2017-07-11 18:42:39 +02:00  
				
					
						
							
							
								 
						
							
								29fc429d9a 
								
							 
						 
						
							
							
								
								Honor cgroup/cpuset constraints when enumerating cpus  
							
							
							
						 
						
							2017-07-11 18:27:33 +02:00  
				
					
						
							
							
								 
						
							
								65e56cb29d 
								
							 
						 
						
							
							
								
								Add 64bit support for Microsoft Visual Studio  
							
							
							
						 
						
							2017-06-21 13:38:22 -07:00  
				
					
						
							
							
								 
						
							
								59c97cfee4 
								
							 
						 
						
							
							
								
								memory: Fix buffer overflow when position == NUM_BUFFERS  
							
							
							
						 
						
							2017-05-05 17:47:03 +01:00  
				
					
						
							
							
								 
						
							
								5fecfe0f42 
								
							 
						 
						
							
							
								
								memory: switch loop condition around in blas_memory_free  
							
							
							
							
Before this commit, the "position < NUM_BUFFERS" loop condition from
blas_memory_free was completely optimized away by GCC. This is
because the condition can only be false after undefined behavior has
already been invoked (reading past the end of an array). As a
consequence of this bug, GCC also removes the subsequent if statement
and all the code after the error label because all of it is dead.
This commit switches the loop condition around so it works as intended. 
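
The ordering issue can be shown with a small table lookup. This is a sketch with hypothetical names and a small hypothetical `NUM_BUFFERS`, not the actual `blas_memory_free` code: the bounds check comes first, so `memory[position]` is only read while `position` is a valid index and the compiler cannot treat the check as redundant.

```c
#include <stddef.h>

#define NUM_BUFFERS 4   /* small hypothetical value for illustration */

struct entry { void *addr; int used; };
static struct entry memory[NUM_BUFFERS];

/* Hypothetical helper to populate the table. */
void register_buffer(int index, void *ptr) {
    memory[index].addr = ptr;
    memory[index].used = 1;
}

/* Returns the index of ptr in the table, or -1 if it is not present. */
int find_buffer(const void *ptr) {
    int position = 0;
    /* Bounds check first: short-circuit evaluation prevents the array
     * read once position reaches NUM_BUFFERS. */
    while (position < NUM_BUFFERS && memory[position].addr != ptr)
        position++;
    return position < NUM_BUFFERS ? position : -1;
}
```

With the operands in the opposite order, the array would be read at index `NUM_BUFFERS` before the check ever failed, which is the undefined behavior the message describes.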
							
						 
						
							2017-05-05 16:01:58 +01:00  
				
					
						
							
							
								 
						
							
								4713e7c47f 
								
							 
						 
						
							
							
								
								ARM64: Add the VULCAN Target  
							
							
							
						 
						
							2017-01-10 15:01:17 +05:30  
				
					
						
							
							
								 
						
							
								51aa157e64 
								
							 
						 
						
							
							
								
								Relocate declaration of alloc_lock outside ifdef block  
							
							
							
						 
						
							2017-01-09 01:10:43 +01:00  
				
					
						
							
							
								 
						
							
								87c7d10b34 
								
							 
						 
						
							
							
								
								Fix thread data races detected by helgrind 3.12  
							
							
							
							
							Ref. #995 , may possibly help solve issues seen in #660 , #883 
							
						 
						
							2017-01-08 23:33:51 +01:00  
				
					
						
							
							
								 
						
							
								66c9a9b33d 
								
							 
						 
						
							
							
								
								Merge pull request  #981  from howard0su/develop  
							
							
							
							
							Use NPROCESSORS_CONF instead of NPROCESSORS_ONLN 
							
						 
						
							2016-10-17 11:32:57 +08:00  
				
					
						
							
							
								 
						
							
								ff1da01476 
								
							 
						 
						
							
							
								
								Use NPROCESSORS_CONF instead of NPROCESSORS_ONLN  
							
							
							
							
							to determine the number of CPUs. On ARM platforms,
the number of online CPUs increases when there is more workload,
while the configured CPU count is the maximum number of CPUs. 
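
The two counts come from `sysconf`. A minimal sketch, with hypothetical wrapper names, assuming a platform that provides `_SC_NPROCESSORS_CONF` and `_SC_NPROCESSORS_ONLN` (glibc and the BSDs do, though neither is required by POSIX):

```c
#include <unistd.h>

/* CPUs configured in the system, including ones currently offline. */
int configured_cpus(void) {
    long n = sysconf(_SC_NPROCESSORS_CONF);
    return n < 1 ? 1 : (int)n;
}

/* CPUs online right now; this may change as cores are
 * powered up or down under load. */
int online_cpus(void) {
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n < 1 ? 1 : (int)n;
}
```

Sizing thread pools from the configured count avoids under-provisioning when the library is initialized while some cores are offline.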
							
						 
						
							2016-10-13 12:37:50 +00:00  
				
					
						
							
							
								 
						
							
								ef52a9266b 
								
							 
						 
						
							
							
								
								Fixed   #979 . Patch for NetBSD.  
							
							
							
						 
						
							2016-10-13 10:17:07 +08:00  
				
					
						
							
							
								 
						
							
								aa744dfa59 
								
							 
						 
						
							
							
								
								Update memory.c  
							
							
							
						 
						
							2016-03-22 20:02:37 +08:00  
				
					
						
							
							
								 
						
							
								61cf8f74d9 
								
							 
						 
						
							
							
								
								Fix access violation on Windows while static linking  
							
							
							
						 
						
							2016-03-22 19:14:54 +08:00  
				
					
						
							
							
								 
						
							
								05196a8497 
								
							 
						 
						
							
							
								
								Refs  #716 . Only call getenv at init function.  
							
							
							
						 
						
							2016-03-09 12:50:07 -05:00  
				
					
						
							
							
								 
						
							
								6b85dbb6dc 
								
							 
						 
						
							
							
								
								Refs  #696 . Turn off stack limit setting on Linux.  
							
							
							
							
							I cannot reproduce SEGFAULT of lapack-test with default stack size
on ARM Linux. 
							
						 
						
							2016-02-24 14:21:42 -05:00  
				
					
						
							
							
								 
						
							
								f5df444ceb 
								
							 
						 
						
							
							
								
								Merge pull request  #762  from jeromerobert/bug760  
							
							
							
							
							Let openblas_get_num_threads return the number of active threads 
							
						 
						
							2016-01-26 08:45:16 -06:00  
				
					
						
							
							
								 
						
							
								aaa8551c57 
								
							 
						 
						
							
							
								
								Merge pull request  #749  from lotheac/illumos_fixes  
							
							
							
							
							illumos fixes 
							
						 
						
							2016-01-26 08:42:20 -06:00  
				
					
						
							
							
								 
						
							
								0d87c1ffb6 
								
							 
						 
						
							
							
								
								Let openblas_get_num_threads return the number of active threads  
							
							
							
							
							... not the number of allocated threads.
Close  #760  
							
						 
						
							2016-01-26 13:04:16 +01:00  
				
					
						
							
							
								 
						
							
								97cd4b8aee 
								
							 
						 
						
							
							
								
								illumos fixes to memory.c  
							
							
							
						 
						
							2016-01-22 18:55:43 +02:00  
				
					
						
							
							
								 
						
							
								0d22551a6b 
								
							 
						 
						
							
							
								
								increase the stack size limit in the constructor  
							
							
							
						 
						
							2015-11-20 09:23:01 +01:00  
				
					
						
							
							
								 
						
							
								fbc21266e6 
								
							 
						 
						
							
							
								
								Minor C code fixes in driver/  
							
							
							
						 
						
							2015-11-09 14:15:49 +05:30  
				
					
						
							
							
								 
						
							
								2feef49fa8 
								
							 
						 
						
							
							
								
								Merge branch 'develop' into cmake  
							
							
							
							
							Conflicts:
	driver/others/memory.c 
							
						 
						
							2015-10-26 14:54:34 -05:00  
				
					
						
							
							
								 
						
							
								1ce054fcb3 
								
							 
						 
						
							
							
								
								Refs  #669 . Fixed the build bug with gcc on Mac OS X.  
							
							
							
						 
						
							2015-10-22 11:07:35 -05:00  
				
					
						
							
							
								 
						
							
								94b125255f 
								
							 
						 
						
							
							
								
								Merge branch 'develop' into cmake  
							
							
							
							
							Conflicts:
	driver/others/memory.c 
							
						 
						
							2015-10-13 04:46:08 +08:00  
				
					
						
							
							
								 
						
							
								11ac4665c8 
								
							 
						 
						
							
							
								
								Fixed   #654 . Make sure the gotoblas_init function is run before all other static initializations.  
							
							
							
						 
						
							2015-10-05 14:14:32 -05:00  
				
					
						
							
							
								 
						
							
								d3e2f0a1af 
								
							 
						 
						
							
							
								
								add missing barriers  
							
							
							
							
							should fix issue #597  
							
						 
						
							2015-08-16 15:37:02 +02:00  
				
					
						
							
							
								 
						
							
								dcd5ba4443 
								
							 
						 
						
							
							
								
								Merge branch 'cmake' of  https://github.com/hpanderson/OpenBLAS  into hpanderson_cmake  
							
							
							
						 
						
							2015-07-22 04:06:39 +08:00  
				
					
						
							
							
								 
						
							
								a11555c715 
								
							 
						 
						
							
							
								
								Support Android NDK armeabi-v7a-hard ABI. (-mfloat-abi=hard)  
							
							
							
							
							Example build command:
make HOSTCC=gcc CC=arm-linux-androideabi-gcc NO_LAPACK=1 TARGET=ARMV7
In the Android NDK project, use the armeabi-v7a-hard ABI with these flags:
TARGET_CFLAGS += -mhard-float -D_NDK_MATH_NO_SOFTFP=1
TARGET_LDFLAGS += -Wl,--no-warn-mismatch -lm_hard
For more information, see the hard-float example at
android_ndk/tests/device/hard-float/jni/.
							
						 
						
							2015-05-20 21:57:27 -05:00  
				
					
						
							
							
								 
						
							
								ebb9eba987 
								
							 
						 
						
							
							
								
								Fix build with ALLOC_SHM=0 (Android NDK)  
							
							
							
							
							Refactor so that the build works with ALLOC_SHM=0. HugeTLB
implicitly depends on ALLOC_SHM=1. This patch allows
building for Android NDK r10d.
							
						 
						
							2015-05-10 00:10:26 -07:00  
				
					
						
							
							
								 
						
							
								9798481979 
								
							 
						 
						
							
							
								
								Refs #478, #482. Fix segfault bug for gemv_t with MAX_ALLOC_STACK flag.
							
							
							
							
							For gemv_t, directly use malloc to create the buffer. 
							
						 
						
							2015-04-13 19:45:27 -05:00  
				
					
						
							
							
								 
						
							
								b6438dedea 
								
							 
						 
						
							
							
								
								Fix issue #508
							
							
							
							
							Fix race condition during shutdown causing a crash in
gotoblas_set_affinity(). 
							
						 
						
							2015-03-18 13:22:43 +01:00  
				
					
						
							
							
								 
						
							
								5ae8993752 
								
							 
						 
						
							
							
								
								Added intrinsics for MSVC.  
							
							
							
						 
						
							2015-02-25 11:52:51 -06:00  
				
					
						
							
							
								 
						
							
								84d90d6ed8 
								
							 
						 
						
							
							
								
								Fixed some compiler errors/warnings for clang.  
							
							
							
						 
						
							2015-02-25 11:52:25 -06:00  
				
					
						
							
							
								 
						
							
								cfa9392ffa 
								
							 
						 
						
							
							
								
								Fix openblas_get_num_threads and openblas_get_num_procs bug with single thread.  
							
							
							
						 
						
							2015-02-08 01:30:23 -06:00  
				
					
						
							
							
								 
						
							
								65a847cd36 
								
							 
						 
						
							
							
								
								Introduce openblas_get_num_threads and openblas_get_num_procs  
							
							
							
						 
						
							2015-02-03 12:23:41 -05:00  
				
					
						
							
							
								 
						
							
								2fb02626da 
								
							 
						 
						
							
							
								
								Update organization info.  
							
							
							
						 
						
							2014-11-25 15:28:58 +08:00  
				
					
						
							
							
								 
						
							
								f7eb81a846 
								
							 
						 
						
							
							
								
								Fix link error on Linux/musl.  
							
							
							
							
							get_nprocs() is a GNU convenience function equivalent to POSIX2008
sysconf(_SC_NPROCESSORS_ONLN); the latter should be available in unistd.h
on any current *nix. (OS X supports this call since 10.5, and FreeBSD
currently supports it. But this commit does not change FreeBSD or OS X
versions.) 
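
The substitution described above can be sketched in a few lines of C. This is only an illustration, not the code from the commit: `online_cpu_count` and `clamp_thread_count` are hypothetical helper names, and the fallback to one CPU on error is an assumption. `_SC_NPROCESSORS_ONLN` is provided by glibc, musl, macOS and the BSDs.

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical helper (not an OpenBLAS function): number of processors
 * currently online, queried through sysconf() instead of the GNU-only
 * get_nprocs(). */
int online_cpu_count(void) {
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    /* sysconf() returns -1 on error; assume a single CPU in that case. */
    return (n < 1) ? 1 : (int)n;
}

/* Hypothetical usage: never run more threads than there are online CPUs. */
int clamp_thread_count(int requested) {
    assert(requested > 0); /* caller must ask for at least one thread */
    int cpus = online_cpu_count();
    return requested < cpus ? requested : cpus;
}
```

Because the call goes through `unistd.h` rather than a glibc extension header, the same source should link against musl as well.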
							
						 
						
							2014-08-03 15:06:30 -07:00  
				
					
						
							
							
								 
						
							
								7a8949e0ce 
								
							 
						 
						
							
							
								
								Merge branch 'develop' of  https://github.com/TimothyGu/OpenBLAS  into TimothyGu-develop  
							
							
							
							
							Conflicts:
	driver/others/memory.c 
							
						 
						
							2014-06-28 20:51:31 +08:00  
				
					
						
							
							
								 
						
							
								6c2ead30f0 
								
							 
						 
						
							
							
								
								Remove all trailing whitespace except lapack-netlib  
							
							
							
							
							Signed-off-by: Timothy Gu <timothygu99@gmail.com> 
							
						 
						
							2014-06-27 12:05:18 -07:00  
				
					
						
							
							
								 
						
							
								f41f03ab83 
								
							 
						 
						
							
							
								
								Fix #394. This cleans up some handles after using them, and doesn't disable ALL process privileges upon success.
							
							
							
						 
						
							2014-06-27 12:16:57 -04:00  
				
					
						
							
							
								 
						
							
								2c556f093a 
								
							 
						 
						
							
							
								
								Add cast to function pointer to remove warning  
							
							
							
						 
						
							2014-02-25 11:08:32 +01:00  
				
					
						
							
							
								 
						
							
								3b027d2528 
								
							 
						 
						
							
							
								
								Do not reference pthread_atfork in non-SMP_SERVER mode  
							
							
							
						 
						
							2014-02-25 11:08:32 +01:00  
				
					
						
							
							
								 
						
							
								49bd98f410 
								
							 
						 
						
							
							
								
								Do not reference pthread_atfork under windows  
							
							
							
						 
						
							2014-02-19 19:25:48 +01:00  
				
					
						
							
							
								 
						
							
								138a841390 
								
							 
						 
						
							
							
								
								FIX #294: make OpenBLAS thread-pool resilient to fork via pthread_atfork
							
							
							
						 
						
							2014-02-19 19:01:15 +01:00  
				
					
						
							
							
								 
						
							
								046e4013cb 
								
							 
						 
						
							
							
								
								Revert "Refs #294. Used pthread_atfork to avoid hang after a Unix fork."
							
							
							
							
							This reverts commit 3617c22a56 
							
						 
						
							2014-02-19 18:32:54 +01:00  
				
					
						
							
							
								 
						
							
								3617c22a56 
								
							 
						 
						
							
							
								
								Refs #294. Used pthread_atfork to avoid hang after a Unix fork.
							
							
							
							
							The problem is the mutex used in blas_server. We must therefore clear
the mutex before the fork and re-initialize it in both the parent and the child process.
If you use OpenMP, GOMP currently has the same problem. Please try another OpenMP
implementation.
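
For readers unfamiliar with `pthread_atfork`, the following sketch shows one common way to keep a mutex usable across `fork()`: take the lock in the prepare handler and release it on both sides afterwards. This is a related pattern swapped in for illustration, not the clear-and-reinitialize code from the commit. The names `server_lock`, `install_fork_handlers` and `fork_and_check` are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the thread server's mutex. */
static pthread_mutex_t server_lock = PTHREAD_MUTEX_INITIALIZER;

/* Prepare handler: runs in the parent before fork(). Holding the lock here
 * guarantees no other thread owns it at the moment of the fork. */
static void before_fork(void) {
    int rc = pthread_mutex_lock(&server_lock);
    assert(rc == 0);
    (void)rc;
}

/* Runs after fork() in the parent and in the child: release the lock that
 * before_fork() took, leaving the mutex in a known state on both sides. */
static void after_fork(void) {
    int rc = pthread_mutex_unlock(&server_lock);
    assert(rc == 0);
    (void)rc;
}

/* Register the handlers once, e.g. during library initialization. */
int install_fork_handlers(void) {
    return pthread_atfork(before_fork, after_fork, after_fork);
}

/* Fork once; the child checks that it can take the lock, the parent reaps
 * it. Returns 0 if the child succeeded. */
int fork_and_check(void) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        int ok = pthread_mutex_lock(&server_lock) == 0 &&
                 pthread_mutex_unlock(&server_lock) == 0;
        _exit(ok ? 0 : 1);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Without the handlers, a child forked while another thread held the lock would inherit a mutex that nobody in the child can ever release.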
							
						 
						
							2014-02-18 15:36:04 +08:00  
				
					
						
							
							
								 
						
							
								5048a80032 
								
							 
						 
						
							
							
								
								Refs #283. Fixed the incorrect usage of long data type for Windows 64.
							
							
							
						 
						
							2013-11-14 13:46:42 +08:00  
				
					
						
							
							
								 
						
							
								f54f5bac9e 
								
							 
						 
						
							
							
								
								Refs #248. Fixed the LSB compatibility issue for BLAS only.
							
							
							
							
							For example, make CC=lsbcc NO_LAPACK=1. 
							
						 
						
							2013-07-09 15:38:03 +08:00  
				
					
						
							
							
								 
						
							
								5d3312142a 
								
							 
						 
						
							
							
								
								Refs #221, #246. Fixed the stack overflow bug in multithreaded BLAS3.
							
							
							
							
							When NUM_THREADS (MAX_CPU_NUMBER) is very large, e.g. 256, and the job array is declared as
typedef struct {
  volatile BLASLONG working[MAX_CPU_NUMBER][CACHE_LINE_SIZE * DIVIDE_RATE];
} job_t;
job_t          job[MAX_CPU_NUMBER];
the job array takes up 8MB.
Thus, we use malloc instead of stack allocation.
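
The size arithmetic and the heap allocation can be sketched as follows. The constant values (`CACHE_LINE_SIZE` of 8, `DIVIDE_RATE` of 2) and the 8-byte `BLASLONG` are assumptions chosen so that the numbers reproduce the 8MB figure. The helper names `job_array_bytes`, `alloc_jobs` and `run_with_heap_jobs` are hypothetical and do not come from the commit.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed values for illustration only; the real ones come from the
 * OpenBLAS build configuration. */
#define MAX_CPU_NUMBER  256
#define CACHE_LINE_SIZE 8
#define DIVIDE_RATE     2
typedef long long BLASLONG; /* stand-in; assumed to be 8 bytes wide */

typedef struct {
  volatile BLASLONG working[MAX_CPU_NUMBER][CACHE_LINE_SIZE * DIVIDE_RATE];
} job_t;

/* 256 entries * (256 * 16 * 8 bytes) = 8 MiB, more than many default
 * thread stacks can hold. */
size_t job_array_bytes(void) {
    return sizeof(job_t) * MAX_CPU_NUMBER;
}

/* Allocate the zeroed job array on the heap instead of declaring it as a
 * local variable. The caller releases it with free(). */
job_t *alloc_jobs(void) {
    return calloc(MAX_CPU_NUMBER, sizeof(job_t));
}

/* Sketch of a caller: allocate, touch the last slot, release.
 * Returns 0 on success, -1 if the allocation failed. */
int run_with_heap_jobs(void) {
    job_t *job = alloc_jobs();
    if (job == NULL) return -1;
    job[MAX_CPU_NUMBER - 1].working[MAX_CPU_NUMBER - 1][0] = 1;
    assert(job[MAX_CPU_NUMBER - 1].working[MAX_CPU_NUMBER - 1][0] == 1);
    free(job);
    return 0;
}
```

A failed heap allocation can be detected and reported through the return value, whereas an oversized local array typically crashes the thread on entry to the function.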
							
						 
						
							2013-07-08 01:07:05 +08:00  
				
					
						
							
							
								 
						
							
								32dbeb636d 
								
							 
						 
						
							
							
								
								Refs #221. Set stack limit to 16MB to prevent a SEGFAULT bug on Mac OS X with DYNAMIC_ARCH=1 & NUM_THREADS=256.
							
							
							
						 
						
							2013-07-02 14:17:55 +08:00  
				
					
						
							
							
								 
						
							
								6751f7b9a7 
								
							 
						 
						
							
							
								
								Fixed #157. Only detect the number of physical CPU cores on Mac OSX.
							
							
							
						 
						
							2012-11-13 15:48:57 +08:00