Commit Graph

102 Commits

Author SHA1 Message Date
Martin Kroeker
aa2a2d9c01 Conditionally compile files that may get replaced by ReLAPACK 2022-11-08 12:04:46 +01:00
Martin Kroeker
7656aba00e Merge pull request #3493 from martin-frbg/casts+cleanup
WIP casts and cleanups
2022-02-06 23:55:06 +01:00
Martin Kroeker
40003f8edb Fix pivot offset calculation for negative incx 2022-01-17 00:11:18 +01:00
Martin Kroeker
57e2a72f40 Fix pivot offset calculation for negative incx 2022-01-17 00:10:21 +01:00
Martin Kroeker
3b6293f5a0 Fix offset calculation for negative incx 2022-01-17 00:09:14 +01:00
Martin Kroeker
afa0cece5c Fix pivot offset calculation for negative incx 2022-01-17 00:08:20 +01:00
Martin Kroeker
eca2f50b48 Fix pivot offset calculation for negative incx 2022-01-17 00:07:33 +01:00
Martin Kroeker
0e9e951306 Fix pivot offset calculation for negative incx 2022-01-17 00:06:41 +01:00
Martin Kroeker
1b49ef8dcf Fix pivot index for negative increments 2022-01-17 00:05:33 +01:00
Martin Kroeker
6b407a16cb fix function typecasts 2021-12-21 18:51:28 +01:00
Martin Kroeker
aecb4a5e8d fix function typecasts 2021-12-21 18:50:22 +01:00
Martin Kroeker
c49d46f25f fix function typecast 2021-12-21 18:49:18 +01:00
gxw
af0a69f355 Add support for LOONGARCH64 2021-07-27 15:29:12 +08:00
Zhang Xianyi
d7ba7679b6 Merge branch 'develop' into risc-v 2020-10-16 23:27:38 +08:00
Martin Kroeker
4bb73c0171 Rename "HALF" type to "BFLOAT16" 2020-10-13 20:07:19 +02:00
Martin Kroeker
32733ded04 Rename "HALF" and "sh" to "BFLOAT16" and "sb" 2020-10-11 23:52:45 +02:00
Martin Kroeker
b27ca78a21 Adapt to having only a subset of variable types supported 2020-10-11 14:46:24 +02:00
Martin Kroeker
93454022a9 Adapt to having only a subset of variable types supported 2020-10-11 14:45:40 +02:00
Martin Kroeker
20cf1d773f Adapt to having only a subset of variable types supported 2020-10-11 14:44:56 +02:00
Martin Kroeker
5c657fffad Adapt to having only a subset of variable types supported 2020-10-11 14:44:13 +02:00
Martin Kroeker
b262058059 Adapt to having only a subset of variable types supported 2020-10-11 14:43:13 +02:00
Martin Kroeker
bc319cee82 Adapt to having only a subset of variable types supported 2020-10-11 14:42:26 +02:00
Martin Kroeker
e5966f8606 Adapt to having only a subset of variable types supported 2020-10-11 14:41:43 +02:00
Martin Kroeker
9df12eb08f Adapt to having only a subset of variable types supported 2020-10-11 14:40:51 +02:00
Martin Kroeker
cf53970bcb Adapt to having only a subset of variable types supported 2020-10-11 14:40:06 +02:00
Martin Kroeker
dcd51d5c72 Adapt to having only a subset of variable types supported 2020-10-11 14:39:19 +02:00
Martin Kroeker
b8f95354c7 Adapt to having only a subset of variable types supported 2020-10-11 14:38:25 +02:00
Martin Kroeker
f194ad59e1 Use _Atomic instead of volatile where available (file moved from ../getrf)
must have misplaced this in ../getrf when I made that change in March 2018 (40160ff)
the only changes since then were 
RFC : Add half precision gemm for bfloat16 in OpenBLAS Rajalakshmi Srinivasaraghavan
Rajalakshmi Srinivasaraghavan committed on 14 Apr 2020 as 7ebbb50

    Change _STDC_VERSION__ to __STDC_VERSION__ 
Zhiyong Dang committed on 11 May 2018 as 3716267
2020-07-25 08:52:24 +02:00
Martin Kroeker
4fda217f99 Delete potrf_parallel.c (moving it to ../potrf) 2020-07-25 06:42:39 +00:00
Martin Kroeker
bbe119ee3b Update conditional for atomics to use HAVE_C11 2020-07-18 17:19:59 +00:00
Martin Kroeker
f4f74941bd Update conditional for atomics to use HAVE_C11 2020-07-18 17:14:50 +00:00
Rajalakshmi Srinivasaraghavan
22bb50fb81 cmake fixes 2020-04-17 13:35:17 -05:00
Rajalakshmi Srinivasaraghavan
7eb55504b1 RFC : Add half precision gemm for bfloat16 in OpenBLAS
This patch adds support for bfloat16 data type matrix multiplication kernel.
For architectures that don't support bfloat16, it is defined as unsigned short
(2 bytes).  Default unroll sizes can be changed as per architecture as done for
SGEMM and for now 8 and 4 are used for M and N.  Size of ncopy/tcopy can be
changed as per architecture requirement and for now, size 2 is used.

Added shgemm in kernel/power/KERNEL.POWER9 and tested in powerpc64le and
powerpc64.  For reference, added a small test compare_sgemm_shgemm.c to compare
sgemm and shgemm output.

This patch does not cover OpenBLAS test, benchmark and lapack tests for shgemm.
Complex type implementation can be discussed and added once this is approved.
2020-04-14 14:55:08 -05:00
Ali Saidi
208c7e7ca5 Use acq/rel semantics to pass flags/pointers in getrf_parallel.
The current implementation has locks, but the locks each only
have a critical section of one variable so atomic reads/writes
with barriers can be used to achieve the same behavior.

Like the previous patch, pthread_mutex_lock isn't fair, so in a
tight loop the previous thread that has the lock can keep it
starving another thread, even if that thread is about to write
the data that will stop the current thread from spinning.

On a 64c Arm system this improves performance by 20x on sgesv.goto.
2020-03-06 06:22:31 +00:00
Xianyi Zhang
4aa2d89217 Merge branch 'develop' into risc-v 2020-02-27 13:53:49 +08:00
Martin Kroeker
c222b25b81 Correct generation of GETRF files by the CMAKE build
fixes #2396
2020-02-15 19:29:14 +01:00
Martin Kroeker
9f7a9a32e3 Merge pull request #2252 from thrasibule/trtrs
Optimized ?trtrs
2019-09-12 21:45:47 +02:00
Guillaume Horel
6cb47ea3f0 fix Makefile 2019-09-10 17:11:01 -04:00
Guillaume Horel
f2becb777a fix Makefile 2019-09-09 11:36:50 -04:00
Guillaume Horel
9f6984fe4b add missing files 2019-09-08 11:14:49 -04:00
Guillaume Horel
42203dafdc add logic 2019-09-08 11:14:49 -04:00
Guillaume Horel
a4f17a9297 add missing objects 2019-09-08 11:14:49 -04:00
Guillaume Horel
733d97b2df add files 2019-09-08 11:14:49 -04:00
Andrew
4de545aa7d address minor warnings from gcc7 2019-09-07 10:21:08 +03:00
Andrew
575a84398a remove redundant code #2113 2019-05-07 23:46:54 +03:00
Martin Kroeker
e882b239aa Correct naming of getrf_parallel object
fixes #1984
2019-01-26 00:45:45 +01:00
Zhiyong Dang
3716267124 Change _STDC_VERSION__ to __STDC_VERSION__
Change-Id: Id3fa4e8d9eedd4ef7230df69b611e7f397301a42
2018-05-11 12:15:08 +08:00
Jerry Zhao
c167a3d6f4 Added RISCV build 2018-04-16 14:08:31 -07:00
Martin Kroeker
20c6c38e51 Merge branch 'develop' into atomic 2018-04-07 12:09:39 +02:00
Martin Kroeker
8ec28ff461 Remove unguarded use of _Atomic and fix tabbing 2018-04-04 22:40:30 +02:00