96 Commits

Author SHA1 Message Date
Martin Kroeker
07b1c0bc10 Stop using sched_yield on non-Windows x86_64 2024-03-11 08:01:49 +01:00
Dirreke
ec89466e14 Add CSKY support 2024-01-16 23:45:06 +08:00
Martin Kroeker
f9b2d7f225 Merge pull request #3253 from wi24rd/patch-1
Fix typo in common.h
2024-01-13 19:55:01 +01:00
TGY
b5ba95a6c0 Modernize obsolete inline order 2023-08-16 00:48:40 +02:00
Martin Kroeker
9773a9d6b3 undefine YIELDING for the Emscripten js converter 2022-09-14 17:04:11 +02:00
Pablo Romero
84a5f0e2eb Fixes #3743. 2022-08-26 11:44:11 +02:00
Martin Kroeker
bc93f468ef Add Elbrus E2000 architecture as generic x86_64 compatible 2022-01-22 18:53:38 +01:00
gxw
af0a69f355 Add support for LOONGARCH64 2021-07-27 15:29:12 +08:00
王滋涵 Zephyr Wang
a62cfc3ccf Fix typo in common.h 2021-05-29 18:10:00 +08:00
H.J. Lu
53ee0b76bb x86: Enable Intel CET
When Intel CET is enabled, we need to include <cet.h> in assembly codes
to mark Intel CET support and place _CET_ENDBR at the function entry.
2021-04-30 19:45:39 -07:00
xoviat
b60de4447a add cortex-m platform 2021-01-19 08:57:44 -06:00
Zhang Xianyi
d7ba7679b6 Merge branch 'develop' into risc-v 2020-10-16 23:27:38 +08:00
Martin Kroeker
84949754a0 Fix bfloat16 conditional 2020-10-13 09:11:36 +02:00
Martin Kroeker
ca31c32693 Rename "HALF" and "sh" to "BFLOAT16" and "sb" 2020-10-11 23:49:22 +02:00
Martin Kroeker
dc8e4e1959 Reduce the BLAS3 heap allocation threshold to 32 and mark it as configurable 2020-10-04 22:59:24 +02:00
User User-User
d2333e7842 aarch64 fix std=c18 compilation 2020-10-03 18:00:34 +03:00
Chen, Guobing
deaeb6c5b8 Add bfloat16 based dot and conversion with single/double
1. Added bfloat16 based dot as new API: shdot
2. Implemented generic kernel and cooperlake-specific (AVX512-BF16) kernel for shdot
3. Added 4 conversion APIs for bfloat16 data type <=> single/double: shstobf16 shdtobf16 sbf16tos dbf16tod
     shstobf16 -- convert single float array to bfloat16 array
     shdtobf16 -- convert double float array to bfloat16 array
     sbf16tos  -- convert bfloat16 array to single float array
     dbf16tod  -- convert bfloat16 array to double float array
4. Implemented generic kernels for all 4 conversion APIs, and cooperlake-specific kernel for shstobf16 and shdtobf16
5. Update level1 thread facilitate functions and macros to support multi-threading for these new APIs
6. Fix Cooperlake platform detection/specify issue when under dynamic-arch building
7. Change the typedef of bfloat16 from unsigned short to more strict uint16_t

Signed-off-by: Chen, Guobing <guobing.chen@intel.com>
2020-09-04 02:31:25 +08:00
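
The conversions listed in the commit above (single float array to bfloat16 array and back) can be illustrated with a minimal sketch. It relies only on the fact that a bfloat16 value is the upper 16 bits of an IEEE-754 single, stored here as `uint16_t` as item 7 of the commit describes. All `demo_*` names are hypothetical and are not the library's API, and the round-to-nearest-even step is an assumption, since the commit message does not state which rounding the real kernels use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in type; the commit only says bfloat16 is a uint16_t. */
typedef uint16_t demo_bf16;

/* The bit tricks below assume a 32-bit IEEE-754 float. */
static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* float -> bfloat16, rounding to nearest with ties to even (assumed). */
demo_bf16 demo_float_to_bf16(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);            /* type-pun without UB */
    if ((bits & 0x7fffffffu) > 0x7f800000u)    /* NaN: keep it a NaN */
        return (demo_bf16)((bits >> 16) | 0x0040u);
    bits += 0x7fffu + ((bits >> 16) & 1u);     /* rounding bias */
    return (demo_bf16)(bits >> 16);
}

/* bfloat16 -> float is exact: the low 16 mantissa bits become zero. */
float demo_bf16_to_float(demo_bf16 h) {
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Array forms, mirroring the "array to array" wording of the commit. */
void demo_stobf16(size_t n, const float *in, demo_bf16 *out) {
    for (size_t i = 0; i < n; i++)
        out[i] = demo_float_to_bf16(in[i]);
}

void demo_bf16tos(size_t n, const demo_bf16 *in, float *out) {
    for (size_t i = 0; i < n; i++)
        out[i] = demo_bf16_to_float(in[i]);
}
```

Values such as 1.0f and -2.5f, whose mantissa fits in 7 bits, survive a round trip unchanged; other values lose precision in the float to bfloat16 direction only.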
Martin Kroeker
a36eb19ae0 Update conditional for C11 atomics to use HAVE_C11 2020-07-18 17:13:24 +00:00
Rajalakshmi Srinivasaraghavan
9fe930f205 powerpc: Add support for future processor
This is the initial patch to support build infrastructure
for POWER10 architecture.
2020-06-11 15:47:20 -05:00
Rajalakshmi Srinivasaraghavan
67cc4b9e16 Fix warnings in clang and export symbol 2020-04-15 19:15:23 -05:00
Rajalakshmi Srinivasaraghavan
7eb55504b1 RFC : Add half precision gemm for bfloat16 in OpenBLAS
This patch adds support for bfloat16 data type matrix multiplication kernel.
For architectures that don't support bfloat16, it is defined as unsigned short
(2 bytes).  Default unroll sizes can be changed as per architecture as done for
SGEMM and for now 8 and 4 are used for M and N.  Size of ncopy/tcopy can be
changed as per architecture requirement and for now, size 2 is used.

Added shgemm in kernel/power/KERNEL.POWER9 and tested in powerpc64le and
powerpc64.  For reference, added a small test compare_sgemm_shgemm.c to compare
sgemm and shgemm output.

This patch does not cover OpenBLAS test, benchmark and lapack tests for shgemm.
Complex type implementation can be discussed and added once this is approved.
2020-04-14 14:55:08 -05:00
Martin Kroeker
79fd006c58 Expose the support_avx512 function provided in dynamic.c 2020-03-26 21:25:39 +01:00
Xianyi Zhang
4aa2d89217 Merge branch 'develop' into risc-v 2020-02-27 13:53:49 +08:00
Martin Kroeker
d2cb610272 Add option USE_LOCKING for single-threaded build with locking support
for calling from concurrent threads
2019-05-15 23:18:43 +02:00
Jeff Baylor
40e53e52d6 snprintf define consolidated to common.h 2019-04-22 17:01:34 -07:00
Martin Kroeker
7c51cc8527 Merge branch 'develop' into develop 2019-03-29 19:36:29 +01:00
AbdelRauf
853a18bc17 power9 makefile. dgemm based on power8 kernel with following changes : 32x unrolled 16x4 kernel and 8x4 kernel using (lxv stxv butterfly rank1 update). improvement from 17 to 22-23gflops. dtrmm cases were added into dgemm itself 2019-03-29 15:49:40 +00:00
Erik M. Bray
1006ff8a7b Use POSIX getenv on Cygwin
The Windows-native GetEnvironmentVariable cannot be relied on, as
Cygwin does not always copy environment variables set through Cygwin
to the Windows environment block, particularly after fork().
2019-03-15 15:06:30 +01:00
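
The reasoning in the commit above can be sketched as a compile-time choice between the two environment lookups. This is only an illustration: `demo_readenv` is a hypothetical helper, not the macro or function the project actually uses, and the preprocessor test shown is an assumption about how Cygwin might be told apart from native Windows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#if defined(_WIN32) && !defined(__CYGWIN__)
#include <windows.h>
#endif

/* Hypothetical helper: copy the value of `name` into `buf`.
   Returns 1 if the variable exists and fits, 0 otherwise. */
int demo_readenv(const char *name, char *buf, size_t size) {
    assert(name != NULL && buf != NULL);
#if defined(_WIN32) && !defined(__CYGWIN__)
    /* Native Windows: query the Windows environment block. */
    DWORD n = GetEnvironmentVariableA(name, buf, (DWORD)size);
    return n > 0 && n < size;
#else
    /* POSIX systems and Cygwin: getenv sees variables set through
       Cygwin even when they never reached the Windows block. */
    const char *value = getenv(name);
    if (value == NULL || strlen(value) >= size)
        return 0;
    strcpy(buf, value);
    return 1;
#endif
}
```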
Andrew
9531d0e175 lets fit it in one 4k page 2018-11-06 17:51:24 +00:00
Andrew
3fd41313fc add low bound for number of buffers 2018-11-06 09:40:13 +00:00
Steven G. Johnson
48610a4524 fix blasabs for windows
Bugfix in #1713 for Windows (LLP64), where `blasabs` needs to be `llabs` rather than `labs` for the 64-bit API.
2018-08-05 08:18:51 -04:00
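
The LLP64 point in the commit above comes down to integer widths: on 64-bit Windows `long` stays 32 bits wide, so `labs` cannot take the absolute value of a 64-bit index, while `llabs` operates on `long long`, which is at least 64 bits on every platform. A minimal sketch follows; `DEMO_INTERFACE64`, `demo_blasint` and `demo_blasabs` are hypothetical names chosen for illustration, not the definitions in `common.h`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical switch standing in for a 64-bit integer interface build. */
#define DEMO_INTERFACE64 1

#if DEMO_INTERFACE64
typedef int64_t demo_blasint;
static_assert(sizeof(demo_blasint) == 8, "64-bit interface expected");
/* llabs takes long long, so it is wide enough under both LP64 and LLP64. */
#define demo_blasabs(x) llabs((long long)(x))
#else
typedef int demo_blasint;
#define demo_blasabs(x) abs(x)
#endif
```

With this definition, `demo_blasabs((demo_blasint)-5000000000LL)` yields 5000000000 regardless of the width of `long`.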
Martin Kroeker
4a553e8678 Merge pull request #1713 from martin-frbg/issue1710
Introduce blasabs macro and use it to switch between abs and labs for INTERFACE64
2018-08-04 23:51:31 +02:00
Martin Kroeker
40c068a875 Introduce blasabs() to switch between abs() and labs() for INTERFACE64 2018-08-04 20:07:59 +02:00
Zoltán Mizsei
6463bffd59 Haiku supporting patches 2018-08-02 20:49:14 +02:00
Martin Kroeker
de8fff671d Revert "Use usleep instead of sched_yield by default" 2018-06-11 17:05:27 +02:00
Martin Kroeker
ed7c4a043b Use usleep instead of sched_yield by default
sched_yield only burns cpu cycles, fixes #900, see also #923, #1560
2018-06-07 10:18:26 +02:00
Martin Kroeker
83da278093 Update common.h 2018-06-06 09:27:49 +02:00
Martin Kroeker
358d4df2bd Merge branch 'develop' into issue1593-2 2018-06-06 09:21:41 +02:00
Martin Kroeker
06d43760e4 Restore _Atomic define before stdatomic.h for old gcc
see #1593
2018-06-06 09:18:10 +02:00
Martin Kroeker
354a976a59 Fix inverted condition in _Atomic declaration
fixes #1593
2018-06-05 10:31:34 +02:00
zhiyong.dang
53457f222f move _Atomic define to common.h 2018-05-11 00:13:16 -07:00
Zhiyong Dang
1b83341d19 Fix race condition in blas_server_omp.c
Change-Id: Ic896276cd073d6b41930c7c5a29d66348cd1725d
2018-04-27 17:00:42 +08:00
Jerry Zhao
c167a3d6f4 Added RISCV build 2018-04-16 14:08:31 -07:00
Alex Arslan
a41d241a0e Add support for DragonFly BSD 2018-04-03 16:39:29 -07:00
Alex Arslan
8da6b6ae52 Allow building on OpenBSD
With this change, OpenBLAS builds and all tests pass on OpenBSD 6.2
using Clang. Tested on x86-64 only, with and without DYNAMIC_ARCH=1.
2018-04-02 10:48:22 -07:00
Isuru Fernando
eb98fdddfc typedefs only for c 2017-07-29 20:38:16 +05:30
Isuru Fernando
ca17b4b75c Fix complex support for MSVC headers 2017-07-28 11:50:29 +05:30
Neil Shipp
34513be726 Add Microsoft Windows 10 UWP build support 2017-06-23 13:07:34 -07:00
Martin Kroeker
ea26b00c06 Fix CREAL,CIMAG macros for PGI 2017-03-13 00:36:01 +01:00
Zhang Xianyi
b678471d65 Merge branch 'z13' into develop
Conflicts:
	CONTRIBUTORS.md
2017-01-09 05:52:42 -05:00