Commit Graph

90 Commits

Author SHA1 Message Date
gxw
af0a69f355 Add support for LOONGARCH64 2021-07-27 15:29:12 +08:00
Zhang Xianyi
d7ba7679b6 Merge branch 'develop' into risc-v 2020-10-16 23:27:38 +08:00
Martin Kroeker
4bb73c0171 Rename "HALF" type to "BFLOAT16" 2020-10-13 20:07:19 +02:00
Martin Kroeker
32733ded04 Rename "HALF" and "sh" to "BFLOAT16" and "sb" 2020-10-11 23:52:45 +02:00
Martin Kroeker
b27ca78a21 Adapt to having only a subset of variable types supported 2020-10-11 14:46:24 +02:00
Martin Kroeker
93454022a9 Adapt to having only a subset of variable types supported 2020-10-11 14:45:40 +02:00
Martin Kroeker
20cf1d773f Adapt to having only a subset of variable types supported 2020-10-11 14:44:56 +02:00
Martin Kroeker
5c657fffad Adapt to having only a subset of variable types supported 2020-10-11 14:44:13 +02:00
Martin Kroeker
b262058059 Adapt to having only a subset of variable types supported 2020-10-11 14:43:13 +02:00
Martin Kroeker
bc319cee82 Adapt to having only a subset of variable types supported 2020-10-11 14:42:26 +02:00
Martin Kroeker
e5966f8606 Adapt to having only a subset of variable types supported 2020-10-11 14:41:43 +02:00
Martin Kroeker
9df12eb08f Adapt to having only a subset of variable types supported 2020-10-11 14:40:51 +02:00
Martin Kroeker
cf53970bcb Adapt to having only a subset of variable types supported 2020-10-11 14:40:06 +02:00
Martin Kroeker
dcd51d5c72 Adapt to having only a subset of variable types supported 2020-10-11 14:39:19 +02:00
Martin Kroeker
b8f95354c7 Adapt to having only a subset of variable types supported 2020-10-11 14:38:25 +02:00
Martin Kroeker
f194ad59e1 Use _Atomic instead of volatile where available (file moved from ../getrf)
must have misplaced this in ../getrf when I made that change in March 2018 (40160ff)
the only changes since then were 
RFC : Add half precision gemm for bfloat16 in OpenBLAS Rajalakshmi Srinivasaraghavan
Rajalakshmi Srinivasaraghavan committed on 14 Apr 2020 as 7ebbb50

    Change _STDC_VERSION__ to __STDC_VERSION__ 
Zhiyong Dang committed on 11 May 2018 as 3716267
2020-07-25 08:52:24 +02:00
Martin Kroeker
4fda217f99 Delete potrf_parallel.c (moving it to ../potrf) 2020-07-25 06:42:39 +00:00
Martin Kroeker
bbe119ee3b Update conditional for atomics to use HAVE_C11 2020-07-18 17:19:59 +00:00
Martin Kroeker
f4f74941bd Update conditional for atomics to use HAVE_C11 2020-07-18 17:14:50 +00:00
Rajalakshmi Srinivasaraghavan
22bb50fb81 cmake fixes 2020-04-17 13:35:17 -05:00
Rajalakshmi Srinivasaraghavan
7eb55504b1 RFC : Add half precision gemm for bfloat16 in OpenBLAS
This patch adds support for bfloat16 data type matrix multiplication kernel.
For architectures that don't support bfloat16, it is defined as unsigned short
(2 bytes).  Default unroll sizes can be changed as per architecture as done for
SGEMM and for now 8 and 4 are used for M and N.  Size of ncopy/tcopy can be
changed as per architecture requirement and for now, size 2 is used.

Added shgemm in kernel/power/KERNEL.POWER9 and tested in powerpc64le and
powerpc64.  For reference, added a small test compare_sgemm_shgemm.c to compare
sgemm and shgemm output.

This patch does not cover OpenBLAS test, benchmark and lapack tests for shgemm.
Complex type implementation can be discussed and added once this is approved.
2020-04-14 14:55:08 -05:00
Ali Saidi
208c7e7ca5 Use acq/rel semantics to pass flags/pointers in getrf_parallel.
The current implementation has locks, but the locks each only
have a critical section of one variable so atomic reads/writes
with barriers can be used to achieve the same behavior.

Like the previous patch, pthread_mutex_lock isn't fair, so in a
tight loop the previous thread that has the lock can keep it
starving another thread, even if that thread is about to write
the data that will stop the current thread from spinning.

On a 64c Arm system this improves performance by 20x on sgesv.goto.
2020-03-06 06:22:31 +00:00
Xianyi Zhang
4aa2d89217 Merge branch 'develop' into risc-v 2020-02-27 13:53:49 +08:00
Martin Kroeker
c222b25b81 Correct generation of GETRF files by the CMAKE build
fixes #2396
2020-02-15 19:29:14 +01:00
Martin Kroeker
9f7a9a32e3 Merge pull request #2252 from thrasibule/trtrs
Optimized ?trtrs
2019-09-12 21:45:47 +02:00
Guillaume Horel
6cb47ea3f0 fix Makefile 2019-09-10 17:11:01 -04:00
Guillaume Horel
f2becb777a fix Makefile 2019-09-09 11:36:50 -04:00
Guillaume Horel
9f6984fe4b add missing files 2019-09-08 11:14:49 -04:00
Guillaume Horel
42203dafdc add logic 2019-09-08 11:14:49 -04:00
Guillaume Horel
a4f17a9297 add missing objects 2019-09-08 11:14:49 -04:00
Guillaume Horel
733d97b2df add files 2019-09-08 11:14:49 -04:00
Andrew
4de545aa7d address minor warnings from gcc7 2019-09-07 10:21:08 +03:00
Andrew
575a84398a remove redundant code #2113 2019-05-07 23:46:54 +03:00
Martin Kroeker
e882b239aa Correct naming of getrf_parallel object
fixes #1984
2019-01-26 00:45:45 +01:00
Zhiyong Dang
3716267124 Change _STDC_VERSION__ to __STDC_VERSION__
Change-Id: Id3fa4e8d9eedd4ef7230df69b611e7f397301a42
2018-05-11 12:15:08 +08:00
Jerry Zhao
c167a3d6f4 Added RISCV build 2018-04-16 14:08:31 -07:00
Martin Kroeker
20c6c38e51 Merge branch 'develop' into atomic 2018-04-07 12:09:39 +02:00
Martin Kroeker
8ec28ff461 Remove unguarded use of _Atomic and fix tabbing 2018-04-04 22:40:30 +02:00
Martin Kroeker
bb9876db33 Fix thread races and infinite looping on systems with many cpus
On systems with more than 64 cpus, blas_quickdivide will sometimes return zero which creates bogus workloads when used for the stride calculation. This then leads to threads spinning incessantly waiting for a status change that never happens, as seen in #1497.
This patch also fixes several data races that were found by helgrind and/or tsan while debugging the issue.
2018-04-04 18:16:52 +02:00
Martin Kroeker
40160ff3c1 Use _Atomic instead of volatile for thread safety where C11 is supported 2018-03-10 00:15:44 +01:00
Andrew
9fa986337d add missing brackets to silence indentation warnings gcc721 2018-01-19 23:11:12 +01:00
Andrew
d602b99386 LAPACK helpers in C that need care too 2018-01-02 14:38:50 +01:00
Martin Kroeker
c7a8512d12 Cmake fixes for DYNAMIC_ARCH builds and whitespace in path names (#1323)
* prebuild.cmake: Put quotes around path names that may contain whitespace
(Copied from alexkaratakis' PR #1295)
* kernel/CMakeLists.txt: Fix common_lapack header inclusion and DYNAMIC_ARCH generation of ?neg_tcopy and ?laswp_ncopy files
* lapack/CMakeLists.txt: Use correct template for ?laswp_(plus,minus) functions
2017-10-09 23:34:18 +02:00
Sacha Refshauge
37858d1146 Fix threading usage in CMake: s/SMP/USE_THREAD/ 2017-08-19 15:07:42 +10:00
Isuru Fernando
d245caa49a Support out-of-source build 2017-08-01 15:16:14 +05:30
Dan Horák
56762d5e4c add lapack laswp for zarch 2017-04-13 15:38:59 +02:00
Ashwin Sekhar T K
3918d17025 LAPACK: Fix lapack-test errors in ARM64 threaded version 2017-01-31 23:36:23 +05:30
Werner Saar
209b63197e prepared lapack/lauum for UNROLL values, that are not a power of two 2017-01-11 07:29:17 +01:00
Werner Saar
c81dc6322f prepared lapack/potrf functions for UNROLL values, that are not a power of two 2017-01-10 10:50:28 +01:00
Werner Saar
3e1bbd6b5f prepared lapack/getrf functions for UNROLL values, that are not a power of two 2017-01-09 12:57:26 +01:00