Martin Kroeker
07b1c0bc10
Stop using sched_yield on non-Windows x86_64
2024-03-11 08:01:49 +01:00
Dirreke
ec89466e14
Add CSKY support
2024-01-16 23:45:06 +08:00
Martin Kroeker
f9b2d7f225
Merge pull request #3253 from wi24rd/patch-1
...
Fix typo in common.h
2024-01-13 19:55:01 +01:00
TGY
b5ba95a6c0
Modernize obsolete inline order
2023-08-16 00:48:40 +02:00
Martin Kroeker
9773a9d6b3
undefine YIELDING for the Emscripten js converter
2022-09-14 17:04:11 +02:00
Pablo Romero
84a5f0e2eb
Fixes #3743.
2022-08-26 11:44:11 +02:00
Martin Kroeker
bc93f468ef
Add Elbrus E2000 architecture as generic x86_64 compatible
2022-01-22 18:53:38 +01:00
gxw
af0a69f355
Add support for LOONGARCH64
2021-07-27 15:29:12 +08:00
王滋涵 Zephyr Wang
a62cfc3ccf
Fix typo in common.h
2021-05-29 18:10:00 +08:00
H.J. Lu
53ee0b76bb
x86: Enable Intel CET
...
When Intel CET is enabled, we need to include <cet.h> in assembly codes
to mark Intel CET support and place _CET_ENDBR at the function entry.
2021-04-30 19:45:39 -07:00
xoviat
b60de4447a
add cortex-m platform
2021-01-19 08:57:44 -06:00
Zhang Xianyi
d7ba7679b6
Merge branch 'develop' into risc-v
2020-10-16 23:27:38 +08:00
Martin Kroeker
84949754a0
Fix bfloat16 conditional
2020-10-13 09:11:36 +02:00
Martin Kroeker
ca31c32693
Rename "HALF" and "sh" to "BFLOAT16" and "sb"
2020-10-11 23:49:22 +02:00
Martin Kroeker
dc8e4e1959
Reduce the BLAS3 heap allocation threshold to 32 and mark it as configurable
2020-10-04 22:59:24 +02:00
User User-User
d2333e7842
aarch64 fix std=c18 compilation
2020-10-03 18:00:34 +03:00
Chen, Guobing
deaeb6c5b8
Add bfloat16 based dot and conversion with single/double
...
1. Added bfloat16 based dot as new API: shdot
2. Implemented generic kernel and cooperlake-specific (AVX512-BF16) kernel for shdot
3. Added 4 conversion APIs for bfloat16 data type <=> single/double: shstobf16 shdtobf16 sbf16tos dbf16tod
    shstobf16 -- convert single float array to bfloat16 array
    shdtobf16 -- convert double float array to bfloat16 array
    sbf16tos -- convert bfloat16 array to single float array
    dbf16tod -- convert bfloat16 array to double float array
4. Implemented generic kernels for all 4 conversion APIs, and cooperlake-specific kernel for shstobf16 and shdtobf16
5. Update level1 threading helper functions and macros to support multi-threading for these new APIs
6. Fix Cooperlake platform detection/specification issue in dynamic-arch builds
7. Change the typedef of bfloat16 from unsigned short to the stricter uint16_t
Signed-off-by: Chen, Guobing <guobing.chen@intel.com>
2020-09-04 02:31:25 +08:00
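
The conversion APIs listed in commit deaeb6c5b8 map between bfloat16 and single/double precision. The following is a minimal sketch of the scalar idea only, not the OpenBLAS kernels: it relies on bfloat16 being the upper 16 bits of an IEEE-754 binary32 value. The helper names `float_to_bf16_trunc` and `bf16_to_float` are hypothetical, and simple truncation is an assumption here; the commit message does not say which rounding mode the real kernels use.

```c
#include <stdint.h>
#include <string.h>

/* Matches item 7 of the commit message: bfloat16 stored as uint16_t. */
typedef uint16_t bfloat16;

/* Hypothetical helper: keep the sign, exponent and top 7 mantissa bits.
 * Truncation is an assumption; a production kernel may round instead. */
static bfloat16 float_to_bf16_trunc(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* well-defined type punning */
    return (bfloat16)(bits >> 16);
}

/* Hypothetical helper: widening is exact, the low 16 bits become zero. */
static float bf16_to_float(bfloat16 h) {
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Values whose low 16 mantissa bits are already zero, such as 1.0f or -2.5f, survive a round trip through these helpers unchanged; other values lose precision in the narrowing step.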
Martin Kroeker
a36eb19ae0
Update conditional for C11 atomics to use HAVE_C11
2020-07-18 17:13:24 +00:00
Rajalakshmi Srinivasaraghavan
9fe930f205
powerpc: Add support for future processor
...
This is the initial patch to support build infrastructure
for POWER10 architecture.
2020-06-11 15:47:20 -05:00
Rajalakshmi Srinivasaraghavan
67cc4b9e16
Fix warnings in clang and export symbol
2020-04-15 19:15:23 -05:00
Rajalakshmi Srinivasaraghavan
7eb55504b1
RFC : Add half precision gemm for bfloat16 in OpenBLAS
...
This patch adds support for bfloat16 data type matrix multiplication kernel.
For architectures that don't support bfloat16, it is defined as unsigned short
(2 bytes). Default unroll sizes can be changed as per architecture as done for
SGEMM and for now 8 and 4 are used for M and N. Size of ncopy/tcopy can be
changed as per architecture requirement and for now, size 2 is used.
Added shgemm in kernel/power/KERNEL.POWER9 and tested in powerpc64le and
powerpc64. For reference, added a small test compare_sgemm_shgemm.c to compare
sgemm and shgemm output.
This patch does not cover OpenBLAS test, benchmark and lapack tests for shgemm.
Complex type implementation can be discussed and added once this is approved.
2020-04-14 14:55:08 -05:00
Martin Kroeker
79fd006c58
Expose the support_avx512 function provided in dynamic.c
2020-03-26 21:25:39 +01:00
Xianyi Zhang
4aa2d89217
Merge branch 'develop' into risc-v
2020-02-27 13:53:49 +08:00
Martin Kroeker
d2cb610272
Add option USE_LOCKING for single-threaded build with locking support
...
for calling from concurrent threads
2019-05-15 23:18:43 +02:00
Jeff Baylor
40e53e52d6
snprintf define consolidated to common.h
2019-04-22 17:01:34 -07:00
Martin Kroeker
7c51cc8527
Merge branch 'develop' into develop
2019-03-29 19:36:29 +01:00
AbdelRauf
853a18bc17
POWER9 makefile. dgemm based on the POWER8 kernel with the following changes: 32x unrolled 16x4 kernel and 8x4 kernel using (lxv stxv butterfly rank1 update). Improvement from 17 to 22-23 GFLOPS. dtrmm cases were added into dgemm itself
2019-03-29 15:49:40 +00:00
Erik M. Bray
1006ff8a7b
Use POSIX getenv on Cygwin
...
The Windows-native GetEnvironmentVariable cannot be relied on, as
Cygwin does not always copy environment variables set through Cygwin
to the Windows environment block, particularly after fork().
2019-03-15 15:06:30 +01:00
Andrew
9531d0e175
let's fit it in one 4k page
2018-11-06 17:51:24 +00:00
Andrew
3fd41313fc
add low bound for number of buffers
2018-11-06 09:40:13 +00:00
Steven G. Johnson
48610a4524
fix blasabs for windows
...
Bugfix in #1713 for Windows (LLP64), where `blasabs` needs to be `llabs` rather than `labs` for the 64-bit API.
2018-08-05 08:18:51 -04:00
Martin Kroeker
4a553e8678
Merge pull request #1713 from martin-frbg/issue1710
...
Introduce blasabs macro and use it to switch between abs and labs for INTERFACE64
2018-08-04 23:51:31 +02:00
Martin Kroeker
40c068a875
Introduce blasabs() to switch between abs() and labs() for INTERFACE64
2018-08-04 20:07:59 +02:00
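
The three `blasabs` entries above (40c068a875, its merge, and the Windows fix 48610a4524) describe picking the absolute-value function that matches the width of the BLAS integer type. A rough sketch of that selection, under stated assumptions, might look as follows. `SKETCH_INTERFACE64` and `stride_magnitude` are hypothetical names introduced for illustration, `_WIN64` is used here as a stand-in for whatever platform test the real header applies, and the non-Windows branch assumes an LP64 platform where `long` is 64 bits wide.

```c
#include <stdlib.h>

#if defined(SKETCH_INTERFACE64)          /* hypothetical 64-bit API switch */
#  if defined(_WIN64)
/* LLP64: long is 32 bits, so the 64-bit API needs long long and llabs. */
typedef long long blasint;
#    define blasabs(x) llabs(x)
#  else
/* Assumes LP64: long is 64 bits, so labs is sufficient. */
typedef long blasint;
#    define blasabs(x) labs(x)
#  endif
#else
/* Default 32-bit integer interface. */
typedef int blasint;
#  define blasabs(x) abs(x)
#endif

/* Hypothetical caller: magnitude of a vector stride of either sign. */
static blasint stride_magnitude(blasint incx) {
    return blasabs(incx);
}
```

Routing every call through one macro means callers do not need to know which integer width the library was built with.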
Zoltán Mizsei
6463bffd59
Haiku supporting patches
2018-08-02 20:49:14 +02:00
Martin Kroeker
de8fff671d
Revert "Use usleep instead of sched_yield by default"
2018-06-11 17:05:27 +02:00
Martin Kroeker
ed7c4a043b
Use usleep instead of sched_yield by default
...
sched_yield only burns cpu cycles, fixes #900, see also #923, #1560
2018-06-07 10:18:26 +02:00
Martin Kroeker
83da278093
Update common.h
2018-06-06 09:27:49 +02:00
Martin Kroeker
358d4df2bd
Merge branch 'develop' into issue1593-2
2018-06-06 09:21:41 +02:00
Martin Kroeker
06d43760e4
Restore _Atomic define before stdatomic.h for old gcc
...
see #1593
2018-06-06 09:18:10 +02:00
Martin Kroeker
354a976a59
Fix inverted condition in _Atomic declaration
...
fixes #1593
2018-06-05 10:31:34 +02:00
zhiyong.dang
53457f222f
move _Atomic define to common.h
2018-05-11 00:13:16 -07:00
Zhiyong Dang
1b83341d19
Fix race condition in blas_server_omp.c
...
Change-Id: Ic896276cd073d6b41930c7c5a29d66348cd1725d
2018-04-27 17:00:42 +08:00
Jerry Zhao
c167a3d6f4
Added RISCV build
2018-04-16 14:08:31 -07:00
Alex Arslan
a41d241a0e
Add support for DragonFly BSD
2018-04-03 16:39:29 -07:00
Alex Arslan
8da6b6ae52
Allow building on OpenBSD
...
With this change, OpenBLAS builds and all tests pass on OpenBSD 6.2
using Clang. Tested on x86-64 only, with and without DYNAMIC_ARCH=1.
2018-04-02 10:48:22 -07:00
Isuru Fernando
eb98fdddfc
typedefs only for c
2017-07-29 20:38:16 +05:30
Isuru Fernando
ca17b4b75c
Fix complex support for MSVC headers
2017-07-28 11:50:29 +05:30
Neil Shipp
34513be726
Add Microsoft Windows 10 UWP build support
2017-06-23 13:07:34 -07:00
Martin Kroeker
ea26b00c06
Fix CREAL,CIMAG macros for PGI
2017-03-13 00:36:01 +01:00
Zhang Xianyi
b678471d65
Merge branch 'z13' into develop
...
Conflicts:
CONTRIBUTORS.md
2017-01-09 05:52:42 -05:00