
96 Commits

Author SHA1 Message Date
Martin Kroeker 07b1c0bc10 Stop using sched_yield on non-Windows x86_64 2024-03-11 08:01:49 +01:00
Dirreke ec89466e14 Add CSKY support 2024-01-16 23:45:06 +08:00
Martin Kroeker f9b2d7f225 Merge pull request #3253 from wi24rd/patch-1
Fix typo in common.h
2024-01-13 19:55:01 +01:00
TGY b5ba95a6c0 Modernize obsolete inline order 2023-08-16 00:48:40 +02:00
Martin Kroeker 9773a9d6b3 undefine YIELDING for the Emscripten js converter 2022-09-14 17:04:11 +02:00
Pablo Romero 84a5f0e2eb Fixes #3743. 2022-08-26 11:44:11 +02:00
Martin Kroeker bc93f468ef Add Elbrus E2000 architecture as generic x86_64 compatible 2022-01-22 18:53:38 +01:00
gxw af0a69f355 Add support for LOONGARCH64 2021-07-27 15:29:12 +08:00
王滋涵 Zephyr Wang a62cfc3ccf Fix typo in common.h 2021-05-29 18:10:00 +08:00
H.J. Lu 53ee0b76bb x86: Enable Intel CET
When Intel CET is enabled, we need to include <cet.h> in assembly codes
to mark Intel CET support and place _CET_ENDBR at the function entry.
2021-04-30 19:45:39 -07:00
xoviat b60de4447a add cortex-m platform 2021-01-19 08:57:44 -06:00
Zhang Xianyi d7ba7679b6 Merge branch 'develop' into risc-v 2020-10-16 23:27:38 +08:00
Martin Kroeker 84949754a0 Fix bfloat16 conditional 2020-10-13 09:11:36 +02:00
Martin Kroeker ca31c32693 Rename "HALF" and "sh" to "BFLOAT16" and "sb" 2020-10-11 23:49:22 +02:00
Martin Kroeker dc8e4e1959 Reduce the BLAS3 heap allocation threshold to 32 and mark it as configurable 2020-10-04 22:59:24 +02:00
User User-User d2333e7842 aarch64 fix std=c18 compilation 2020-10-03 18:00:34 +03:00
Chen, Guobing deaeb6c5b8 Add bfloat16 based dot and conversion with single/double
1. Added bfloat16 based dot as new API: shdot
2. Implemented generic kernel and cooperlake-specific (AVX512-BF16) kernel for shdot
3. Added 4 conversion APIs for bfloat16 data type <=> single/double: shstobf16 shdtobf16 sbf16tos dbf16tod
     shstobf16 -- convert single float array to bfloat16 array
     shdtobf16 -- convert double float array to bfloat16 array
     sbf16tos  -- convert bfloat16 array to single float array
     dbf16tod  -- convert bfloat16 array to double float array
4. Implemented generic kernels for all 4 conversion APIs, and cooperlake-specific kernel for shstobf16 and shdtobf16
5. Update level1 thread facilitate functions and macros to support multi-threading for these new APIs
6. Fix Cooperlake platform detection/specify issue when under dynamic-arch building
7. Change the typedef of bfloat16 from unsigned short to more strict uint16_t

Signed-off-by: Chen, Guobing <guobing.chen@intel.com>
2020-09-04 02:31:25 +08:00
Martin Kroeker a36eb19ae0 Update conditional for C11 atomics to use HAVE_C11 2020-07-18 17:13:24 +00:00
Rajalakshmi Srinivasaraghavan 9fe930f205 powerpc: Add support for future processor
This is the initial patch to support build infrastructure
for POWER10 architecture.
2020-06-11 15:47:20 -05:00
Rajalakshmi Srinivasaraghavan 67cc4b9e16 Fix warnings in clang and export symbol 2020-04-15 19:15:23 -05:00
Rajalakshmi Srinivasaraghavan 7eb55504b1 RFC : Add half precision gemm for bfloat16 in OpenBLAS
This patch adds support for bfloat16 data type matrix multiplication kernel.
For architectures that don't support bfloat16, it is defined as unsigned short
(2 bytes).  Default unroll sizes can be changed as per architecture as done for
SGEMM and for now 8 and 4 are used for M and N.  Size of ncopy/tcopy can be
changed as per architecture requirement and for now, size 2 is used.

Added shgemm in kernel/power/KERNEL.POWER9 and tested in powerpc64le and
powerpc64.  For reference, added a small test compare_sgemm_shgemm.c to compare
sgemm and shgemm output.

This patch does not cover OpenBLAS test, benchmark and lapack tests for shgemm.
Complex type implementation can be discussed and added once this is approved.
2020-04-14 14:55:08 -05:00
Martin Kroeker 79fd006c58 Expose the support_avx512 function provided in dynamic.c 2020-03-26 21:25:39 +01:00
Xianyi Zhang 4aa2d89217 Merge branch 'develop' into risc-v 2020-02-27 13:53:49 +08:00
Martin Kroeker d2cb610272 Add option USE_LOCKING for single-threaded build with locking support
for calling from concurrent threads
2019-05-15 23:18:43 +02:00
Jeff Baylor 40e53e52d6 snprintf define consolidated to common.h 2019-04-22 17:01:34 -07:00
Martin Kroeker 7c51cc8527 Merge branch 'develop' into develop 2019-03-29 19:36:29 +01:00
AbdelRauf 853a18bc17 power9 makefile. dgemm based on power8 kernel with following changes : 32x unrolled 16x4 kernel and 8x4 kernel using (lxv stxv butterfly rank1 update). improvement from 17 to 22-23gflops. dtrmm cases were added into dgemm itself 2019-03-29 15:49:40 +00:00
Erik M. Bray 1006ff8a7b Use POSIX getenv on Cygwin
The Windows-native GetEnvironmentVariable cannot be relied on, as
Cygwin does not always copy environment variables set through Cygwin
to the Windows environment block, particularly after fork().
2019-03-15 15:06:30 +01:00
Andrew 9531d0e175 lets fit it in one 4k page 2018-11-06 17:51:24 +00:00
Andrew 3fd41313fc add low bound for number of buffers 2018-11-06 09:40:13 +00:00
Steven G. Johnson 48610a4524 fix blasabs for windows
Bugfix in #1713 for Windows (LLP64), where `blasabs` needs to be `llabs` rather than `labs` for the 64-bit API.
2018-08-05 08:18:51 -04:00
Martin Kroeker 4a553e8678 Merge pull request #1713 from martin-frbg/issue1710
Introduce blasabs macro and use it to switch between abs and labs for INTERFACE64
2018-08-04 23:51:31 +02:00
Martin Kroeker 40c068a875 Introduce blasabs() to switch between abs() and labs() for INTERFACE64 2018-08-04 20:07:59 +02:00
Zoltán Mizsei 6463bffd59 Haiku supporting patches 2018-08-02 20:49:14 +02:00
Martin Kroeker de8fff671d Revert "Use usleep instead of sched_yield by default" 2018-06-11 17:05:27 +02:00
Martin Kroeker ed7c4a043b Use usleep instead of sched_yield by default
sched_yield only burns cpu cycles, fixes #900,  see also #923, #1560
2018-06-07 10:18:26 +02:00
Martin Kroeker 83da278093 Update common.h 2018-06-06 09:27:49 +02:00
Martin Kroeker 358d4df2bd Merge branch 'develop' into issue1593-2 2018-06-06 09:21:41 +02:00
Martin Kroeker 06d43760e4 Restore _Atomic define before stdatomic.h for old gcc
see #1593
2018-06-06 09:18:10 +02:00
Martin Kroeker 354a976a59 Fix inverted condition in _Atomic declaration
fixes #1593
2018-06-05 10:31:34 +02:00
zhiyong.dang 53457f222f move _Atomic define to common.h 2018-05-11 00:13:16 -07:00
Zhiyong Dang 1b83341d19 Fix race condition in blas_server_omp.c
Change-Id: Ic896276cd073d6b41930c7c5a29d66348cd1725d
2018-04-27 17:00:42 +08:00
Jerry Zhao c167a3d6f4 Added RISCV build 2018-04-16 14:08:31 -07:00
Alex Arslan a41d241a0e Add support for DragonFly BSD 2018-04-03 16:39:29 -07:00
Alex Arslan 8da6b6ae52 Allow building on OpenBSD
With this change, OpenBLAS builds and all tests pass on OpenBSD 6.2
using Clang. Tested on x86-64 only, with and without DYNAMIC_ARCH=1.
2018-04-02 10:48:22 -07:00
Isuru Fernando eb98fdddfc typedefs only for c 2017-07-29 20:38:16 +05:30
Isuru Fernando ca17b4b75c Fix complex support for MSVC headers 2017-07-28 11:50:29 +05:30
Neil Shipp 34513be726 Add Microsoft Windows 10 UWP build support 2017-06-23 13:07:34 -07:00
Martin Kroeker ea26b00c06 Fix CREAL,CIMAG macros for PGI 2017-03-13 00:36:01 +01:00
Zhang Xianyi b678471d65 Merge branch 'z13' into develop
Conflicts:
	CONTRIBUTORS.md
2017-01-09 05:52:42 -05:00