Martin Kroeker
aa2a2d9c01
Conditionally compile files that may get replaced by ReLAPACK
2022-11-08 12:04:46 +01:00
Martin Kroeker
7656aba00e
Merge pull request #3493 from martin-frbg/casts+cleanup
WIP casts and cleanups
2022-02-06 23:55:06 +01:00
Martin Kroeker
40003f8edb
Fix pivot offset calculation for negative incx
2022-01-17 00:11:18 +01:00
Martin Kroeker
57e2a72f40
Fix pivot offset calculation for negative incx
2022-01-17 00:10:21 +01:00
Martin Kroeker
3b6293f5a0
Fix offset calculation for negative incx
2022-01-17 00:09:14 +01:00
Martin Kroeker
afa0cece5c
Fix pivot offset calculation for negative incx
2022-01-17 00:08:20 +01:00
Martin Kroeker
eca2f50b48
Fix pivot offset calculation for negative incx
2022-01-17 00:07:33 +01:00
Martin Kroeker
0e9e951306
Fix pivot offset calculation for negative incx
2022-01-17 00:06:41 +01:00
Martin Kroeker
1b49ef8dcf
Fix pivot index for negative increments
2022-01-17 00:05:33 +01:00
Martin Kroeker
6b407a16cb
fix function typecasts
2021-12-21 18:51:28 +01:00
Martin Kroeker
aecb4a5e8d
fix function typecasts
2021-12-21 18:50:22 +01:00
Martin Kroeker
c49d46f25f
fix function typecast
2021-12-21 18:49:18 +01:00
gxw
af0a69f355
Add support for LOONGARCH64
2021-07-27 15:29:12 +08:00
Zhang Xianyi
d7ba7679b6
Merge branch 'develop' into risc-v
2020-10-16 23:27:38 +08:00
Martin Kroeker
4bb73c0171
Rename "HALF" type to "BFLOAT16"
2020-10-13 20:07:19 +02:00
Martin Kroeker
32733ded04
Rename "HALF" and "sh" to "BFLOAT16" and "sb"
2020-10-11 23:52:45 +02:00
Martin Kroeker
b27ca78a21
Adapt to having only a subset of variable types supported
2020-10-11 14:46:24 +02:00
Martin Kroeker
93454022a9
Adapt to having only a subset of variable types supported
2020-10-11 14:45:40 +02:00
Martin Kroeker
20cf1d773f
Adapt to having only a subset of variable types supported
2020-10-11 14:44:56 +02:00
Martin Kroeker
5c657fffad
Adapt to having only a subset of variable types supported
2020-10-11 14:44:13 +02:00
Martin Kroeker
b262058059
Adapt to having only a subset of variable types supported
2020-10-11 14:43:13 +02:00
Martin Kroeker
bc319cee82
Adapt to having only a subset of variable types supported
2020-10-11 14:42:26 +02:00
Martin Kroeker
e5966f8606
Adapt to having only a subset of variable types supported
2020-10-11 14:41:43 +02:00
Martin Kroeker
9df12eb08f
Adapt to having only a subset of variable types supported
2020-10-11 14:40:51 +02:00
Martin Kroeker
cf53970bcb
Adapt to having only a subset of variable types supported
2020-10-11 14:40:06 +02:00
Martin Kroeker
dcd51d5c72
Adapt to having only a subset of variable types supported
2020-10-11 14:39:19 +02:00
Martin Kroeker
b8f95354c7
Adapt to having only a subset of variable types supported
2020-10-11 14:38:25 +02:00
Martin Kroeker
f194ad59e1
Use _Atomic instead of volatile where available (file moved from ../getrf)
Must have misplaced this in ../getrf when I made that change in March 2018 (40160ff).
The only changes since then were:
"RFC : Add half precision gemm for bfloat16 in OpenBLAS" (Rajalakshmi Srinivasaraghavan, committed on 14 Apr 2020 as 7ebbb50)
"Change _STDC_VERSION__ to __STDC_VERSION__" (Zhiyong Dang, committed on 11 May 2018 as 3716267)
2020-07-25 08:52:24 +02:00
Martin Kroeker
4fda217f99
Delete potrf_parallel.c (moving it to ../potrf)
2020-07-25 06:42:39 +00:00
Martin Kroeker
bbe119ee3b
Update conditional for atomics to use HAVE_C11
2020-07-18 17:19:59 +00:00
Martin Kroeker
f4f74941bd
Update conditional for atomics to use HAVE_C11
2020-07-18 17:14:50 +00:00
Rajalakshmi Srinivasaraghavan
22bb50fb81
cmake fixes
2020-04-17 13:35:17 -05:00
Rajalakshmi Srinivasaraghavan
7eb55504b1
RFC : Add half precision gemm for bfloat16 in OpenBLAS
This patch adds support for bfloat16 data type matrix multiplication kernel.
For architectures that don't support bfloat16, it is defined as unsigned short
(2 bytes). Default unroll sizes can be changed as per architecture as done for
SGEMM and for now 8 and 4 are used for M and N. Size of ncopy/tcopy can be
changed as per architecture requirement and for now, size 2 is used.
Added shgemm in kernel/power/KERNEL.POWER9 and tested in powerpc64le and
powerpc64. For reference, added a small test compare_sgemm_shgemm.c to compare
sgemm and shgemm output.
This patch does not cover OpenBLAS test, benchmark and lapack tests for shgemm.
Complex type implementation can be discussed and added once this is approved.
2020-04-14 14:55:08 -05:00
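The commit above (7eb55504b1) notes that bfloat16 falls back to a plain `unsigned short` on architectures without native support. As a minimal sketch of what that storage format means, the following assumes the standard bfloat16 layout (the upper 16 bits of an IEEE 754 binary32 value) and uses simple truncation rather than rounding. The names `bf16_t`, `float_to_bf16_trunc` and `bf16_to_float` are hypothetical and are not taken from OpenBLAS.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical fallback type: 2 bytes of storage, as the commit describes. */
typedef unsigned short bf16_t;

/* Convert float to bfloat16 by keeping the top 16 bits (sign, exponent,
   7 mantissa bits). Assumption: truncation, not round-to-nearest. */
bf16_t float_to_bf16_trunc(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* well-defined type punning */
    return (bf16_t)(bits >> 16);
}

/* Widen bfloat16 back to float by zero-filling the low 16 mantissa bits. */
float bf16_to_float(bf16_t h) {
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Values whose low 16 mantissa bits are already zero, such as 1.0f or -2.5f, survive the round trip exactly. Other values lose precision, which is why a comparison against sgemm output, like the one the commit mentions, needs a tolerance.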
Ali Saidi
208c7e7ca5
Use acq/rel semantics to pass flags/pointers in getrf_parallel.
The current implementation has locks, but the locks each only
have a critical section of one variable so atomic reads/writes
with barriers can be used to achieve the same behavior.
Like the previous patch, pthread_mutex_lock isn't fair, so in a
tight loop the previous thread that has the lock can keep it
starving another thread, even if that thread is about to write
the data that will stop the current thread from spinning.
On a 64c Arm system this improves performance by 20x on sgesv.goto.
2020-03-06 06:22:31 +00:00
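The commit above (208c7e7ca5) replaces mutexes that guard a single variable with atomic reads and writes that carry barriers. A minimal sketch of that handoff pattern with C11 `<stdatomic.h>` and POSIX threads might look as follows. The `mailbox_t` type, its fields, and the `producer` and `run_handoff` functions are hypothetical illustrations, not code from getrf_parallel.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical mailbox: one payload published through one flag. */
typedef struct {
    int input;         /* value handed to the producer */
    int payload;       /* plain (non-atomic) data being passed */
    atomic_int ready;  /* 0 = not yet written, 1 = payload is valid */
} mailbox_t;

static void *producer(void *arg) {
    mailbox_t *box = (mailbox_t *)arg;
    box->payload = box->input * 2;  /* ordinary write */
    /* Release store: the payload write above becomes visible to any
       thread that observes ready == 1 with an acquire load. */
    atomic_store_explicit(&box->ready, 1, memory_order_release);
    return NULL;
}

/* Starts a producer thread, waits for the flag, returns the payload.
   Returns -1 if the thread could not be created. */
int run_handoff(int input) {
    mailbox_t box;
    pthread_t tid;
    int result;

    box.input = input;
    box.payload = 0;
    atomic_init(&box.ready, 0);

    if (pthread_create(&tid, NULL, producer, &box) != 0)
        return -1;

    /* Acquire load in a spin loop: no mutex to contend for, so the
       waiting thread cannot keep the writer out of a lock. */
    while (atomic_load_explicit(&box.ready, memory_order_acquire) == 0) {
        /* spin */
    }
    result = box.payload;  /* safe: ordered after the acquire load */

    pthread_join(tid, NULL);
    return result;
}
```

Calling `run_handoff(21)` returns 42 once the producer has published its result. The release/acquire pair is what makes the plain `payload` read safe. With relaxed ordering on the flag, that read would not be ordered after the producer's write.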
Xianyi Zhang
4aa2d89217
Merge branch 'develop' into risc-v
2020-02-27 13:53:49 +08:00
Martin Kroeker
c222b25b81
Correct generation of GETRF files by the CMAKE build
fixes #2396
2020-02-15 19:29:14 +01:00
Martin Kroeker
9f7a9a32e3
Merge pull request #2252 from thrasibule/trtrs
Optimized ?trtrs
2019-09-12 21:45:47 +02:00
Guillaume Horel
6cb47ea3f0
fix Makefile
2019-09-10 17:11:01 -04:00
Guillaume Horel
f2becb777a
fix Makefile
2019-09-09 11:36:50 -04:00
Guillaume Horel
9f6984fe4b
add missing files
2019-09-08 11:14:49 -04:00
Guillaume Horel
42203dafdc
add logic
2019-09-08 11:14:49 -04:00
Guillaume Horel
a4f17a9297
add missing objects
2019-09-08 11:14:49 -04:00
Guillaume Horel
733d97b2df
add files
2019-09-08 11:14:49 -04:00
Andrew
4de545aa7d
address minor warnings from gcc7
2019-09-07 10:21:08 +03:00
Andrew
575a84398a
remove redundant code #2113
2019-05-07 23:46:54 +03:00
Martin Kroeker
e882b239aa
Correct naming of getrf_parallel object
fixes #1984
2019-01-26 00:45:45 +01:00
Zhiyong Dang
3716267124
Change _STDC_VERSION__ to __STDC_VERSION__
Change-Id: Id3fa4e8d9eedd4ef7230df69b611e7f397301a42
2018-05-11 12:15:08 +08:00
Jerry Zhao
c167a3d6f4
Added RISCV build
2018-04-16 14:08:31 -07:00
Martin Kroeker
20c6c38e51
Merge branch 'develop' into atomic
2018-04-07 12:09:39 +02:00
Martin Kroeker
8ec28ff461
Remove unguarded use of _Atomic and fix tabbing
2018-04-04 22:40:30 +02:00