Commit Graph

89 Commits

Author SHA1 Message Date
Zhang Xianyi d7ba7679b6 Merge branch 'develop' into risc-v 2020-10-16 23:27:38 +08:00
Martin Kroeker 4bb73c0171 Rename "HALF" type to "BFLOAT16" 2020-10-13 20:07:19 +02:00
Martin Kroeker 32733ded04 Rename "HALF" and "sh" to "BFLOAT16" and "sb" 2020-10-11 23:52:45 +02:00
Martin Kroeker b27ca78a21 Adapt to having only a subset of variable types supported 2020-10-11 14:46:24 +02:00
Martin Kroeker 93454022a9 Adapt to having only a subset of variable types supported 2020-10-11 14:45:40 +02:00
Martin Kroeker 20cf1d773f Adapt to having only a subset of variable types supported 2020-10-11 14:44:56 +02:00
Martin Kroeker 5c657fffad Adapt to having only a subset of variable types supported 2020-10-11 14:44:13 +02:00
Martin Kroeker b262058059 Adapt to having only a subset of variable types supported 2020-10-11 14:43:13 +02:00
Martin Kroeker bc319cee82 Adapt to having only a subset of variable types supported 2020-10-11 14:42:26 +02:00
Martin Kroeker e5966f8606 Adapt to having only a subset of variable types supported 2020-10-11 14:41:43 +02:00
Martin Kroeker 9df12eb08f Adapt to having only a subset of variable types supported 2020-10-11 14:40:51 +02:00
Martin Kroeker cf53970bcb Adapt to having only a subset of variable types supported 2020-10-11 14:40:06 +02:00
Martin Kroeker dcd51d5c72 Adapt to having only a subset of variable types supported 2020-10-11 14:39:19 +02:00
Martin Kroeker b8f95354c7 Adapt to having only a subset of variable types supported 2020-10-11 14:38:25 +02:00
Martin Kroeker f194ad59e1 Use _Atomic instead of volatile where available (file moved from ../getrf)
Must have misplaced this in ../getrf when I made that change in March 2018 (40160ff).
The only changes since then were:
* RFC : Add half precision gemm for bfloat16 in OpenBLAS (Rajalakshmi Srinivasaraghavan, committed on 14 Apr 2020 as 7ebbb50)
* Change _STDC_VERSION__ to __STDC_VERSION__ (Zhiyong Dang, committed on 11 May 2018 as 3716267)
2020-07-25 08:52:24 +02:00
Martin Kroeker 4fda217f99 Delete potrf_parallel.c (moving it to ../potrf) 2020-07-25 06:42:39 +00:00
Martin Kroeker bbe119ee3b Update conditional for atomics to use HAVE_C11 2020-07-18 17:19:59 +00:00
Martin Kroeker f4f74941bd Update conditional for atomics to use HAVE_C11 2020-07-18 17:14:50 +00:00
Rajalakshmi Srinivasaraghavan 22bb50fb81 cmake fixes 2020-04-17 13:35:17 -05:00
Rajalakshmi Srinivasaraghavan 7eb55504b1 RFC : Add half precision gemm for bfloat16 in OpenBLAS
This patch adds support for bfloat16 data type matrix multiplication kernel.
For architectures that don't support bfloat16, it is defined as unsigned short
(2 bytes).  Default unroll sizes can be changed as per architecture as done for
SGEMM and for now 8 and 4 are used for M and N.  Size of ncopy/tcopy can be
changed as per architecture requirement and for now, size 2 is used.

Added shgemm in kernel/power/KERNEL.POWER9 and tested in powerpc64le and
powerpc64.  For reference, added a small test compare_sgemm_shgemm.c to compare
sgemm and shgemm output.

This patch does not cover OpenBLAS test, benchmark and lapack tests for shgemm.
Complex type implementation can be discussed and added once this is approved.
2020-04-14 14:55:08 -05:00
Ali Saidi 208c7e7ca5 Use acq/rel semantics to pass flags/pointers in getrf_parallel.
The current implementation has locks, but the locks each only
have a critical section of one variable so atomic reads/writes
with barriers can be used to achieve the same behavior.

Like the previous patch, pthread_mutex_lock isn't fair, so in a
tight loop the previous thread that has the lock can keep it
starving another thread, even if that thread is about to write
the data that will stop the current thread from spinning.

On a 64-core Arm system this improves performance by 20x on sgesv.goto.
2020-03-06 06:22:31 +00:00
Xianyi Zhang 4aa2d89217 Merge branch 'develop' into risc-v 2020-02-27 13:53:49 +08:00
Martin Kroeker c222b25b81 Correct generation of GETRF files by the CMAKE build
fixes #2396
2020-02-15 19:29:14 +01:00
Martin Kroeker 9f7a9a32e3 Merge pull request #2252 from thrasibule/trtrs
Optimized ?trtrs
2019-09-12 21:45:47 +02:00
Guillaume Horel 6cb47ea3f0 fix Makefile 2019-09-10 17:11:01 -04:00
Guillaume Horel f2becb777a fix Makefile 2019-09-09 11:36:50 -04:00
Guillaume Horel 9f6984fe4b add missing files 2019-09-08 11:14:49 -04:00
Guillaume Horel 42203dafdc add logic 2019-09-08 11:14:49 -04:00
Guillaume Horel a4f17a9297 add missing objects 2019-09-08 11:14:49 -04:00
Guillaume Horel 733d97b2df add files 2019-09-08 11:14:49 -04:00
Andrew 4de545aa7d address minor warnings from gcc7 2019-09-07 10:21:08 +03:00
Andrew 575a84398a remove redundant code #2113 2019-05-07 23:46:54 +03:00
Martin Kroeker e882b239aa Correct naming of getrf_parallel object
fixes #1984
2019-01-26 00:45:45 +01:00
Zhiyong Dang 3716267124 Change _STDC_VERSION__ to __STDC_VERSION__
Change-Id: Id3fa4e8d9eedd4ef7230df69b611e7f397301a42
2018-05-11 12:15:08 +08:00
Jerry Zhao c167a3d6f4 Added RISCV build 2018-04-16 14:08:31 -07:00
Martin Kroeker 20c6c38e51 Merge branch 'develop' into atomic 2018-04-07 12:09:39 +02:00
Martin Kroeker 8ec28ff461 Remove unguarded use of _Atomic and fix tabbing 2018-04-04 22:40:30 +02:00
Martin Kroeker bb9876db33 Fix thread races and infinite looping on systems with many cpus
On systems with more than 64 cpus, blas_quickdivide will sometimes return zero which creates bogus workloads when used for the stride calculation. This then leads to threads spinning incessantly waiting for a status change that never happens, as seen in #1497.
This patch also fixes several data races that were found by helgrind and/or tsan while debugging the issue.
2018-04-04 18:16:52 +02:00
Martin Kroeker 40160ff3c1 Use _Atomic instead of volatile for thread safety where C11 is supported 2018-03-10 00:15:44 +01:00
Andrew 9fa986337d add missing brackets to silence indentation warnings gcc721 2018-01-19 23:11:12 +01:00
Andrew d602b99386 LAPACK helpers in C that need care too 2018-01-02 14:38:50 +01:00
Martin Kroeker c7a8512d12 Cmake fixes for DYNAMIC_ARCH builds and whitespace in path names (#1323)
* prebuild.cmake: Put quotes around path names that may contain whitespace
(Copied from alexkaratakis' PR #1295)
* kernel/CMakeLists.txt: Fix common_lapack header inclusion and DYNAMIC_ARCH generation of ?neg_tcopy and ?laswp_ncopy files
* lapack/CMakeLists.txt: Use correct template for ?laswp_(plus,minus) functions
2017-10-09 23:34:18 +02:00
Sacha Refshauge 37858d1146 Fix threading usage in CMake: s/SMP/USE_THREAD/ 2017-08-19 15:07:42 +10:00
Isuru Fernando d245caa49a Support out-of-source build 2017-08-01 15:16:14 +05:30
Dan Horák 56762d5e4c add lapack laswp for zarch 2017-04-13 15:38:59 +02:00
Ashwin Sekhar T K 3918d17025 LAPACK: Fix lapack-test errors in ARM64 threaded version 2017-01-31 23:36:23 +05:30
Werner Saar 209b63197e prepared lapack/lauum for UNROLL values, that are not a power of two 2017-01-11 07:29:17 +01:00
Werner Saar c81dc6322f prepared lapack/potrf functions for UNROLL values, that are not a power of two 2017-01-10 10:50:28 +01:00
Werner Saar 3e1bbd6b5f prepared lapack/getrf functions for UNROLL values, that are not a power of two 2017-01-09 12:57:26 +01:00
John Biddiscombe 053044ae4d Replace CMAKE_SOURCE_DIR/CMAKE_BINARY_DIR with PROJECT_SOURCE_DIR/PROJECT_BINARY_DIR
If OpenBLAS is built using add_subdirectory(OpenBlas) as part of another project
then the paths set by CMAKE_XXX_DIR are relative to the parent project
and not the OpenBLAS project.
2016-05-25 09:13:28 +02:00