102 Commits

Author SHA1 Message Date
Martin Kroeker aa2a2d9c01
Conditionally compile files that may get replaced by ReLAPACK 2022-11-08 12:04:46 +01:00
Martin Kroeker 7656aba00e
Merge pull request #3493 from martin-frbg/casts+cleanup
WIP casts and cleanups
2022-02-06 23:55:06 +01:00
Martin Kroeker 40003f8edb
Fix pivot offset calculation for negative incx 2022-01-17 00:11:18 +01:00
Martin Kroeker 57e2a72f40
Fix pivot offset calculation for negative incx 2022-01-17 00:10:21 +01:00
Martin Kroeker 3b6293f5a0
Fix offset calculation for negative incx 2022-01-17 00:09:14 +01:00
Martin Kroeker afa0cece5c
Fix pivot offset calculation for negative incx 2022-01-17 00:08:20 +01:00
Martin Kroeker eca2f50b48
Fix pivot offset calculation for negative incx 2022-01-17 00:07:33 +01:00
Martin Kroeker 0e9e951306
Fix pivot offset calculation for negative incx 2022-01-17 00:06:41 +01:00
Martin Kroeker 1b49ef8dcf
Fix pivot index for negative increments 2022-01-17 00:05:33 +01:00
Martin Kroeker 6b407a16cb
fix function typecasts 2021-12-21 18:51:28 +01:00
Martin Kroeker aecb4a5e8d
fix function typecasts 2021-12-21 18:50:22 +01:00
Martin Kroeker c49d46f25f
fix function typecast 2021-12-21 18:49:18 +01:00
gxw af0a69f355 Add support for LOONGARCH64 2021-07-27 15:29:12 +08:00
Zhang Xianyi d7ba7679b6 Merge branch 'develop' into risc-v 2020-10-16 23:27:38 +08:00
Martin Kroeker 4bb73c0171
Rename "HALF" type to "BFLOAT16" 2020-10-13 20:07:19 +02:00
Martin Kroeker 32733ded04
Rename "HALF" and "sh" to "BFLOAT16" and "sb" 2020-10-11 23:52:45 +02:00
Martin Kroeker b27ca78a21
Adapt to having only a subset of variable types supported 2020-10-11 14:46:24 +02:00
Martin Kroeker 93454022a9
Adapt to having only a subset of variable types supported 2020-10-11 14:45:40 +02:00
Martin Kroeker 20cf1d773f
Adapt to having only a subset of variable types supported 2020-10-11 14:44:56 +02:00
Martin Kroeker 5c657fffad
Adapt to having only a subset of variable types supported 2020-10-11 14:44:13 +02:00
Martin Kroeker b262058059
Adapt to having only a subset of variable types supported 2020-10-11 14:43:13 +02:00
Martin Kroeker bc319cee82
Adapt to having only a subset of variable types supported 2020-10-11 14:42:26 +02:00
Martin Kroeker e5966f8606
Adapt to having only a subset of variable types supported 2020-10-11 14:41:43 +02:00
Martin Kroeker 9df12eb08f
Adapt to having only a subset of variable types supported 2020-10-11 14:40:51 +02:00
Martin Kroeker cf53970bcb
Adapt to having only a subset of variable types supported 2020-10-11 14:40:06 +02:00
Martin Kroeker dcd51d5c72
Adapt to having only a subset of variable types supported 2020-10-11 14:39:19 +02:00
Martin Kroeker b8f95354c7
Adapt to having only a subset of variable types supported 2020-10-11 14:38:25 +02:00
Martin Kroeker f194ad59e1
Use _Atomic instead of volatile where available (file moved from ../getrf)
Must have misplaced this in ../getrf when I made that change in March 2018 (40160ff).
The only changes since then were:
    RFC : Add half precision gemm for bfloat16 in OpenBLAS
    (Rajalakshmi Srinivasaraghavan, committed on 14 Apr 2020 as 7ebbb50)
    Change _STDC_VERSION__ to __STDC_VERSION__
    (Zhiyong Dang, committed on 11 May 2018 as 3716267)
2020-07-25 08:52:24 +02:00
Martin Kroeker 4fda217f99
Delete potrf_parallel.c (moving it to ../potrf) 2020-07-25 06:42:39 +00:00
Martin Kroeker bbe119ee3b
Update conditional for atomics to use HAVE_C11 2020-07-18 17:19:59 +00:00
Martin Kroeker f4f74941bd
Update conditional for atomics to use HAVE_C11 2020-07-18 17:14:50 +00:00
Rajalakshmi Srinivasaraghavan 22bb50fb81 cmake fixes 2020-04-17 13:35:17 -05:00
Rajalakshmi Srinivasaraghavan 7eb55504b1 RFC : Add half precision gemm for bfloat16 in OpenBLAS
This patch adds support for bfloat16 data type matrix multiplication kernel.
For architectures that don't support bfloat16, it is defined as unsigned short
(2 bytes).  Default unroll sizes can be changed as per architecture as done for
SGEMM and for now 8 and 4 are used for M and N.  Size of ncopy/tcopy can be
changed as per architecture requirement and for now, size 2 is used.

Added shgemm in kernel/power/KERNEL.POWER9 and tested in powerpc64le and
powerpc64.  For reference, added a small test compare_sgemm_shgemm.c to compare
sgemm and shgemm output.

This patch does not cover OpenBLAS test, benchmark and lapack tests for shgemm.
Complex type implementation can be discussed and added once this is approved.
2020-04-14 14:55:08 -05:00
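
The fallback described in the commit above (bfloat16 stored as a 2-byte unsigned short) can be illustrated with a minimal sketch. The type name `bf16_t` and both helper functions are hypothetical and are not taken from the patch. The conversion shown simply truncates the low 16 bits of an IEEE-754 single; whether the actual kernels truncate or round is not stated in the commit message, so treat that choice as an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a bfloat16 storage type on targets
   without native support: just the upper 16 bits of a float. */
typedef unsigned short bf16_t;

static_assert(sizeof(float) == sizeof(uint32_t),
              "sketch assumes 32-bit IEEE-754 float");

/* Assumption: plain truncation (no rounding) of the low mantissa bits. */
bf16_t float_to_bf16_trunc(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* well-defined type punning */
    return (bf16_t)(bits >> 16);
}

/* Widening is exact: pad the missing mantissa bits with zeros. */
float bf16_to_float(bf16_t h) {
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Because bfloat16 keeps the full 8-bit exponent of a float, widening back is lossless, while narrowing drops 16 mantissa bits; that is why a comparison test against sgemm, as the commit mentions, has to tolerate some error.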
Ali Saidi 208c7e7ca5 Use acq/rel semantics to pass flags/pointers in getrf_parallel.
The current implementation has locks, but the locks each only
have a critical section of one variable so atomic reads/writes
with barriers can be used to achieve the same behavior.

Like the previous patch, pthread_mutex_lock isn't fair, so in a
tight loop the thread that last held the lock can keep reacquiring it,
starving another thread, even if that thread is about to write
the data that will stop the current thread from spinning.

On a 64-core Arm system this improves performance by 20x on sgesv.goto.
2020-03-06 06:22:31 +00:00
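
The idea in the commit above, replacing a lock whose critical section covers a single variable with an atomic store/load pair, might look roughly like the following sketch. It is not the getrf_parallel code: the struct, the function names, and the payload value are all hypothetical, and it assumes C11 atomics plus POSIX threads are available.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical handoff record: one plain payload and one flag. */
typedef struct {
    int payload;        /* ordinary data, written before the flag */
    atomic_int ready;   /* flag passed between threads */
} handoff_t;

static void *producer(void *arg) {
    handoff_t *h = arg;
    h->payload = 42;    /* plain store */
    /* Release store: the payload write above is visible to any thread
       that later sees ready == 1 through an acquire load. */
    atomic_store_explicit(&h->ready, 1, memory_order_release);
    return NULL;
}

/* Spins on the flag without taking a mutex, then reads the payload. */
int run_handoff_demo(void) {
    handoff_t h;
    pthread_t t;
    int rc, seen;

    h.payload = 0;
    atomic_init(&h.ready, 0);

    rc = pthread_create(&t, NULL, producer, &h);
    assert(rc == 0);
    (void)rc;

    /* Acquire load pairs with the release store in producer(). */
    while (atomic_load_explicit(&h.ready, memory_order_acquire) == 0) {
        /* busy-wait; the writer is never blocked by this reader */
    }
    seen = h.payload;

    pthread_join(t, NULL);
    return seen;
}
```

With a mutex, the spinning reader would repeatedly take the lock just to inspect the flag, and an unfair lock lets it shut out the writer. Here the reader only loads, so the writer's single store can always proceed.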
Xianyi Zhang 4aa2d89217 Merge branch 'develop' into risc-v 2020-02-27 13:53:49 +08:00
Martin Kroeker c222b25b81
Correct generation of GETRF files by the CMAKE build
fixes #2396
2020-02-15 19:29:14 +01:00
Martin Kroeker 9f7a9a32e3
Merge pull request #2252 from thrasibule/trtrs
Optimized ?trtrs
2019-09-12 21:45:47 +02:00
Guillaume Horel 6cb47ea3f0 fix Makefile 2019-09-10 17:11:01 -04:00
Guillaume Horel f2becb777a fix Makefile 2019-09-09 11:36:50 -04:00
Guillaume Horel 9f6984fe4b add missing files 2019-09-08 11:14:49 -04:00
Guillaume Horel 42203dafdc add logic 2019-09-08 11:14:49 -04:00
Guillaume Horel a4f17a9297 add missing objects 2019-09-08 11:14:49 -04:00
Guillaume Horel 733d97b2df add files 2019-09-08 11:14:49 -04:00
Andrew 4de545aa7d address minor warnings from gcc7 2019-09-07 10:21:08 +03:00
Andrew 575a84398a remove redundant code #2113 2019-05-07 23:46:54 +03:00
Martin Kroeker e882b239aa
Correct naming of getrf_parallel object
fixes #1984
2019-01-26 00:45:45 +01:00
Zhiyong Dang 3716267124 Change _STDC_VERSION__ to __STDC_VERSION__
Change-Id: Id3fa4e8d9eedd4ef7230df69b611e7f397301a42
2018-05-11 12:15:08 +08:00
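
Why the one-character fix above matters: in an `#if`, an undefined identifier such as `_STDC_VERSION__` evaluates to 0, so a misspelled guard silently selects the fallback branch instead of failing to compile. A minimal sketch of a correctly spelled guard follows; the macro `FLAG_QUALIFIER`, the type `sync_flag_t`, and the function `bump_flag` are hypothetical, and the exact condition used in OpenBLAS (later commits in this log mention HAVE_C11) may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Use _Atomic only when the compiler reports C11 and does not opt out
   of atomics; otherwise fall back to volatile. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L && \
    !defined(__STDC_NO_ATOMICS__)
#define FLAG_QUALIFIER _Atomic
#else
#define FLAG_QUALIFIER volatile
#endif

/* Hypothetical flag type shared between worker threads. */
typedef FLAG_QUALIFIER long sync_flag_t;

/* Increments the flag and returns the new value. With _Atomic the
   increment is an atomic read-modify-write; with volatile it is not. */
long bump_flag(sync_flag_t *flag) {
    assert(flag != NULL);
    return ++(*flag);
}
```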
Jerry Zhao c167a3d6f4 Added RISCV build 2018-04-16 14:08:31 -07:00
Martin Kroeker 20c6c38e51
Merge branch 'develop' into atomic 2018-04-07 12:09:39 +02:00
Martin Kroeker 8ec28ff461
Remove unguarded use of _Atomic and fix tabbing 2018-04-04 22:40:30 +02:00