The current implementation uses locks, but each lock guards a critical
section of only a single variable, so atomic reads/writes with memory
barriers can achieve the same behavior.
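
For illustration, a minimal sketch of this transformation using C11 atomics (the variable and function names are hypothetical, and the real code may use explicit barrier macros rather than stdatomic):

#include <stdatomic.h>

static atomic_long flag;  /* stand-in for the single lock-protected variable */

/* Before: pthread_mutex_lock(&lock); flag = 1; pthread_mutex_unlock(&lock); */
void publish(void) {
    /* release store: all prior writes become visible before the flag flips */
    atomic_store_explicit(&flag, 1, memory_order_release);
}

long observe(void) {
    /* acquire load: pairs with the release store above */
    return atomic_load_explicit(&flag, memory_order_acquire);
}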
As with the previous patch, pthread_mutex_lock isn't fair, so in a
tight loop the thread that last held the lock can keep reacquiring it,
starving another thread, even if that thread is about to write the
data that would stop the current thread from spinning.
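
A self-contained sketch of that starvation pattern (the names lock, status, spinner and writer are illustrative, not the actual kernel code):

#include <pthread.h>

pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
int status = 0;                       /* the single protected variable */

/* Spinning thread: releases and immediately reacquires the mutex, so on
 * an unfair lock it tends to win the next acquisition as well. */
void *spinner(void *arg) {
    int done = 0;
    while (!done) {
        pthread_mutex_lock(&lock);
        done = (status == 1);
        pthread_mutex_unlock(&lock);  /* reacquired on the very next iteration */
    }
    return NULL;
}

/* Writing thread: can be starved indefinitely, even though its single
 * store is exactly what would end the loop above. */
void *writer(void *arg) {
    pthread_mutex_lock(&lock);
    status = 1;
    pthread_mutex_unlock(&lock);
    return NULL;
}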
On a 64-core Arm system this improves performance by 20x on sgesv.goto.
On systems with more than 64 CPUs, blas_quickdivide will sometimes return zero, and using that zero in the stride calculation creates bogus workloads. Threads then spin incessantly waiting for a status change that never happens, as seen in #1497.
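
A sketch of the failure mode and a guard against it (the clamp shown is an assumption about the shape of the fix, not necessarily the exact change; n and nthreads stand in for values from the surrounding driver):

/* blas_quickdivide(n, nthreads) approximates n / nthreads; with more
 * than 64 CPUs it can return 0 here, yielding a zero stride and a
 * workload that never makes progress. */
BLASLONG width = blas_quickdivide(n, nthreads);
if (width == 0) width = 1;   /* hypothetical guard: guarantee forward progress */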
This patch also fixes several data races that were found by helgrind and/or tsan while debugging the issue.
When NUM_THREADS (MAX_CPU_NUMBER) is very large, e.g. 256:
typedef struct {
  volatile BLASLONG working[MAX_CPU_NUMBER][CACHE_LINE_SIZE * DIVIDE_RATE];
} job_t;

job_t job[MAX_CPU_NUMBER];
The job array alone is 8 MB, far too large to place on the stack.
Thus, we use malloc instead of stack allocation.
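
A minimal sketch of the change, assuming the job_t definition above (the function name and error handling are illustrative, not the actual driver code):

#include <stdio.h>
#include <stdlib.h>

/* Hypothetical driver entry showing heap allocation of the job array. */
static int run_level3_threads(void) {
    job_t *job = (job_t *)malloc(MAX_CPU_NUMBER * sizeof(job_t));
    if (job == NULL) {
        fprintf(stderr, "OpenBLAS: job array allocation failed\n");
        return 1;   /* hypothetical error path */
    }
    /* ... job[i].working[j][k] is used exactly as before ... */
    free(job);
    return 0;
}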