OpenBLAS

Commit Graph

Author	SHA1	Message	Date
Martin Kroeker	437c0bf2b4	Merge pull request #3843 from Mousius/switch-ratio: Propagate SWITCH_RATIO to DYNAMIC_ARCH builds	2023-04-19 11:51:54 +02:00
Chris Sidebottom	32f2fafde7	Propagate SWITCH_RATIO to DYNAMIC_ARCH builds. Previously, dynamic builds used either the default SWITCH_RATIO or the one from the higher-level architecture; this patch ensures that dynamic builds can use this parameter as well.	2023-04-17 15:34:12 +01:00
Martin Kroeker	6c431239da	Split test condition in LU computation: non-denormal check for computation, exact zero for reporting singularity	2023-03-29 22:14:21 +02:00
Martin Kroeker	12aabb9f9b	fix conditional	2023-03-29 09:44:33 +02:00
Martin Kroeker	f3d21039ce	Improve fix from PR3924 (#3941): compare denominator against DBL_MIN rather than a somewhat arbitrary small number near it	2023-03-16 15:09:32 +01:00
Martin Kroeker	3d27cbd9a3	avoid overflow in division	2023-02-26 23:44:14 +01:00
Martin Kroeker	a39ced0551	avoid overflow in division	2023-02-26 23:42:20 +01:00
Martin Kroeker	aa2a2d9c01	Conditionally compile files that may get replaced by ReLAPACK	2022-11-08 12:04:46 +01:00
Martin Kroeker	7656aba00e	Merge pull request #3493 from martin-frbg/casts+cleanup: WIP casts and cleanups	2022-02-06 23:55:06 +01:00
Martin Kroeker	40003f8edb	Fix pivot offset calculation for negative incx	2022-01-17 00:11:18 +01:00
Martin Kroeker	57e2a72f40	Fix pivot offset calculation for negative incx	2022-01-17 00:10:21 +01:00
Martin Kroeker	3b6293f5a0	Fix offset calculation for negative incx	2022-01-17 00:09:14 +01:00
Martin Kroeker	afa0cece5c	Fix pivot offset calculation for negative incx	2022-01-17 00:08:20 +01:00
Martin Kroeker	eca2f50b48	Fix pivot offset calculation for negative incx	2022-01-17 00:07:33 +01:00
Martin Kroeker	0e9e951306	Fix pivot offset calculation for negative incx	2022-01-17 00:06:41 +01:00
Martin Kroeker	1b49ef8dcf	Fix pivot index for negative increments	2022-01-17 00:05:33 +01:00
Martin Kroeker	6b407a16cb	fix function typecasts	2021-12-21 18:51:28 +01:00
Martin Kroeker	aecb4a5e8d	fix function typecasts	2021-12-21 18:50:22 +01:00
Martin Kroeker	c49d46f25f	fix function typecast	2021-12-21 18:49:18 +01:00
gxw	af0a69f355	Add support for LOONGARCH64	2021-07-27 15:29:12 +08:00
Zhang Xianyi	d7ba7679b6	Merge branch 'develop' into risc-v	2020-10-16 23:27:38 +08:00
Martin Kroeker	4bb73c0171	Rename "HALF" type to "BFLOAT16"	2020-10-13 20:07:19 +02:00
Martin Kroeker	32733ded04	Rename "HALF" and "sh" to "BFLOAT16" and "sb"	2020-10-11 23:52:45 +02:00
Martin Kroeker	b27ca78a21	Adapt to having only a subset of variable types supported	2020-10-11 14:46:24 +02:00
Martin Kroeker	93454022a9	Adapt to having only a subset of variable types supported	2020-10-11 14:45:40 +02:00
Martin Kroeker	20cf1d773f	Adapt to having only a subset of variable types supported	2020-10-11 14:44:56 +02:00
Martin Kroeker	5c657fffad	Adapt to having only a subset of variable types supported	2020-10-11 14:44:13 +02:00
Martin Kroeker	b262058059	Adapt to having only a subset of variable types supported	2020-10-11 14:43:13 +02:00
Martin Kroeker	bc319cee82	Adapt to having only a subset of variable types supported	2020-10-11 14:42:26 +02:00
Martin Kroeker	e5966f8606	Adapt to having only a subset of variable types supported	2020-10-11 14:41:43 +02:00
Martin Kroeker	9df12eb08f	Adapt to having only a subset of variable types supported	2020-10-11 14:40:51 +02:00
Martin Kroeker	cf53970bcb	Adapt to having only a subset of variable types supported	2020-10-11 14:40:06 +02:00
Martin Kroeker	dcd51d5c72	Adapt to having only a subset of variable types supported	2020-10-11 14:39:19 +02:00
Martin Kroeker	b8f95354c7	Adapt to having only a subset of variable types supported	2020-10-11 14:38:25 +02:00
Martin Kroeker	f194ad59e1	Use _Atomic instead of volatile where available (file moved from ../getrf). Must have misplaced this in ../getrf when I made that change in March 2018 (`40160ff`); the only changes since then were "RFC : Add half precision gemm for bfloat16 in OpenBLAS" by Rajalakshmi Srinivasaraghavan (committed on 14 Apr 2020 as `7ebbb50`) and "Change _STDC_VERSION__ to __STDC_VERSION__" by Zhiyong Dang (committed on 11 May 2018 as `3716267`).	2020-07-25 08:52:24 +02:00
Martin Kroeker	4fda217f99	Delete potrf_parallel.c (moving it to ../potrf)	2020-07-25 06:42:39 +00:00
Martin Kroeker	bbe119ee3b	Update conditional for atomics to use HAVE_C11	2020-07-18 17:19:59 +00:00
Martin Kroeker	f4f74941bd	Update conditional for atomics to use HAVE_C11	2020-07-18 17:14:50 +00:00
Rajalakshmi Srinivasaraghavan	22bb50fb81	cmake fixes	2020-04-17 13:35:17 -05:00
Rajalakshmi Srinivasaraghavan	7eb55504b1	RFC : Add half precision gemm for bfloat16 in OpenBLAS. This patch adds a matrix multiplication kernel for the bfloat16 data type. On architectures that don't support bfloat16, the type is defined as unsigned short (2 bytes). Default unroll sizes can be changed per architecture, as is done for SGEMM; for now, 8 and 4 are used for M and N. The ncopy/tcopy size can likewise be changed per architecture; for now, size 2 is used. Added shgemm in kernel/power/KERNEL.POWER9 and tested on powerpc64le and powerpc64. For reference, added a small test, compare_sgemm_shgemm.c, to compare sgemm and shgemm output. This patch does not cover the OpenBLAS test, benchmark, or LAPACK tests for shgemm. A complex-type implementation can be discussed and added once this is approved.	2020-04-14 14:55:08 -05:00
Ali Saidi	208c7e7ca5	Use acq/rel semantics to pass flags/pointers in getrf_parallel. The current implementation uses locks, but each lock's critical section covers only one variable, so atomic reads/writes with barriers can achieve the same behavior. As with the previous patch, pthread_mutex_lock isn't fair, so in a tight loop the thread that currently holds the lock can keep reacquiring it and starve another thread, even if that thread is about to write the data that would stop the current thread from spinning. On a 64-core Arm system this improves sgesv.goto performance by 20x.	2020-03-06 06:22:31 +00:00
Xianyi Zhang	4aa2d89217	Merge branch 'develop' into risc-v	2020-02-27 13:53:49 +08:00
Martin Kroeker	c222b25b81	Correct generation of GETRF files by the CMake build (fixes #2396)	2020-02-15 19:29:14 +01:00
Martin Kroeker	9f7a9a32e3	Merge pull request #2252 from thrasibule/trtrs: Optimized ?trtrs	2019-09-12 21:45:47 +02:00
Guillaume Horel	6cb47ea3f0	fix Makefile	2019-09-10 17:11:01 -04:00
Guillaume Horel	f2becb777a	fix Makefile	2019-09-09 11:36:50 -04:00
Guillaume Horel	9f6984fe4b	add missing files	2019-09-08 11:14:49 -04:00
Guillaume Horel	42203dafdc	add logic	2019-09-08 11:14:49 -04:00
Guillaume Horel	a4f17a9297	add missing objects	2019-09-08 11:14:49 -04:00
Guillaume Horel	733d97b2df	add files	2019-09-08 11:14:49 -04:00
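
Commit 208c7e7ca5 replaces per-variable locks in getrf_parallel with atomic reads/writes using acquire/release semantics. The C11 sketch below shows that general hand-off pattern, not the actual getrf_parallel code: a producer fills a buffer with plain writes and then publishes a pointer with a release store, while the consumer spins on an acquire load, so no mutex exists for one thread to monopolize. The names `handoff_slot`, `publish`, `wait_for`, and `run_handoff` are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define N_VALUES 4

/* Hypothetical one-slot mailbox: the producer publishes a pointer to its
   finished data; NULL means "not ready yet". */
typedef struct {
    double *_Atomic ptr;
} handoff_slot;

static handoff_slot slot;
static double buffer[N_VALUES];

/* Release store: every write made before this call becomes visible to any
   thread whose acquire load observes the published pointer. */
static void publish(handoff_slot *s, double *data) {
    atomic_store_explicit(&s->ptr, data, memory_order_release);
}

/* Acquire load in a spin loop: no mutex is taken, so a waiting thread
   cannot be starved by another thread repeatedly re-locking it. */
static double *wait_for(handoff_slot *s) {
    double *p;
    while ((p = atomic_load_explicit(&s->ptr, memory_order_acquire)) == NULL) {
        /* busy-wait; production code would typically add a pause/yield hint */
    }
    return p;
}

static void *producer(void *arg) {
    (void)arg;
    for (int i = 0; i < N_VALUES; i++)
        buffer[i] = (double)(i + 1);   /* plain, non-atomic writes */
    publish(&slot, buffer);            /* then publish with release ordering */
    return NULL;
}

/* Starts one producer thread, waits for its data, and returns the sum seen. */
static double run_handoff(void) {
    pthread_t t;
    for (int i = 0; i < N_VALUES; i++)
        buffer[i] = 0.0;
    atomic_store_explicit(&slot.ptr, NULL, memory_order_relaxed);
    if (pthread_create(&t, NULL, producer, NULL) != 0)
        return -1.0;
    double *p = wait_for(&slot);
    assert(p == buffer);
    double sum = 0.0;
    for (int i = 0; i < N_VALUES; i++)
        sum += p[i];                   /* safe: acquire pairs with the release */
    pthread_join(t, NULL);
    return sum;
}
```

Calling `run_handoff()` should return 10.0 (1 + 2 + 3 + 4), because the acquire load that sees the pointer also guarantees visibility of the producer's earlier writes.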

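The January 2022 commits fix the pivot offset for negative incx. The sketch below assumes the convention documented for reference-LAPACK ?LASWP, where the pivot for row k is stored at ipiv(k1 + (k - k1)*|incx|) and a negative incx applies the interchanges in reverse order; it is not OpenBLAS's laswp code. Under that convention the traversal for negative incx must start at position k1 + (k1 - k2)*incx, so each row reads the same ipiv entry regardless of direction. `laswp_start_index` and `laswp_column` are hypothetical helpers that operate on a single column.

```c
#include <assert.h>

/* 1-based position in ipiv of the first pivot to apply, assuming the
   ?LASWP convention that the pivot for row k sits at ipiv(k1 + (k-k1)*|incx|).
   For incx < 0 rows run from k2 down to k1, so traversal starts at the far end. */
static int laswp_start_index(int k1, int k2, int incx) {
    assert(incx != 0);
    return incx > 0 ? k1 : k1 + (k1 - k2) * incx;
}

/* Applies row interchanges k1..k2 (1-based) to one column, where row r is
   stored at a[r - 1]. ipiv holds 1-based row numbers, as in LAPACK. */
static void laswp_column(double *a, int k1, int k2, const int *ipiv, int incx) {
    int ix = laswp_start_index(k1, k2, incx);
    int i = incx > 0 ? k1 : k2;          /* first row to process */
    int step = incx > 0 ? 1 : -1;        /* forward or reverse order */
    for (int count = 0; count <= k2 - k1; count++, i += step, ix += incx) {
        int ip = ipiv[ix - 1];           /* convert 1-based position to C index */
        if (ip != i) {
            double tmp = a[i - 1];
            a[i - 1] = a[ip - 1];
            a[ip - 1] = tmp;
        }
    }
}
```

For k1 = 2, k2 = 4 and incx = -2, traversal starts at position 6 and visits rows 4, 3, 2 through positions 6, 4, 2, the same entries that incx = 2 would use for rows 2, 3, 4.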

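Commits 3d27cbd9a3, a39ced0551, f3d21039ce, and 6c431239da describe avoiding overflow in the LU division step: compare the pivot against DBL_MIN to decide how to scale, and treat only an exact zero as singular. The sketch below shows that idea as reference-LAPACK ?GETF2 does it (a safe-minimum comparison), not OpenBLAS's own getrf code; `scale_below_pivot` is a hypothetical helper. When |pivot| >= DBL_MIN, 1/pivot is at most 2^1022 in magnitude and finite, so multiplying by the reciprocal is safe; for a denormal pivot the reciprocal could overflow to infinity, so each entry is divided directly.

```c
#include <assert.h>
#include <float.h>

/* Scales the entries below the pivot col[j] of one LU column (0-based j,
   column length m). Returns 0 on success, or j + 1 (a 1-based, INFO-style
   index) if the pivot is exactly zero, in which case nothing is scaled. */
static int scale_below_pivot(double *col, int m, int j) {
    assert(j >= 0 && j < m);
    double pivot = col[j];
    if (pivot == 0.0)
        return j + 1;                      /* exact zero: report singularity */
    double mag = pivot < 0.0 ? -pivot : pivot;
    if (mag >= DBL_MIN) {
        double r = 1.0 / pivot;            /* |r| <= 1/DBL_MIN, so r is finite */
        for (int i = j + 1; i < m; i++)
            col[i] *= r;
    } else {
        for (int i = j + 1; i < m; i++)    /* denormal pivot: 1/pivot could */
            col[i] /= pivot;               /* overflow, so divide directly  */
    }
    return 0;
}
```

With pivot DBL_MIN/4 and entry DBL_MIN/8, the reciprocal path would compute (DBL_MIN/8) * inf, while direct division yields exactly 0.5.
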
109 Commits