Rafael Cardoso Fernandes Sousa
0e8b4adf22
Remove unused commented code (#if directive)
2021-09-15 22:18:48 +00:00
Martin Kroeker
fa8bf57768
Merge pull request #3380 from martin-frbg/structwarn
...
Remove extraneous qualifiers from struct definition
2021-09-15 07:19:09 +02:00
Martin Kroeker
dd09f0173e
Remove extraneous qualifiers from struct definition
2021-09-14 21:52:26 +02:00
Martin Kroeker
2f8220d757
Add sbgemm
2021-09-14 16:14:43 +02:00
Martin Kroeker
5f6a609253
Add sbgemv
2021-09-14 16:13:57 +02:00
Wangyang Guo
045ed5c91d
sbgemm: fix build error in BFLOAT16 disabled
2021-09-07 23:37:08 +08:00
Wangyang Guo
8356a604f0
sbgemm: cooperlake: tuning for block params
2021-09-07 21:30:46 +08:00
Martin Kroeker
cd10d1c03b
Fix typo
2021-08-30 14:38:28 +02:00
Martin Kroeker
2db1a99aca
Clean up debug messages
2021-08-30 14:21:25 +02:00
Martin Kroeker
89fc5b8f4f
Fix unmap logic
2021-08-29 19:50:24 +02:00
Martin Kroeker
7fd12a5e69
Add likely() hints for gcc
2021-08-29 13:54:51 +02:00
Martin Kroeker
2ba9a567aa
Fix typo
2021-08-28 17:14:59 +02:00
Martin Kroeker
b4b952eece
Add auxiliary tracking space for thread buffer frees too
2021-08-28 17:03:53 +02:00
Martin Kroeker
7d1becc575
Allocate an auxiliary struct when running out of preconfigured threads
2021-08-28 14:18:36 +02:00
Martin Kroeker
898212efcd
Actually add the message to the TLS section
2021-08-02 14:50:14 +02:00
Martin Kroeker
210a1584c5
Rebase source and edit TLS version of the message as well
2021-08-02 14:19:16 +02:00
Martin Kroeker
f2a7a67f5a
Improve the "tried to allocate too many buffers" error message
2021-07-31 17:23:40 +02:00
Craig Watson
4d7dfe4845
Include Haiku in processor count checks
2021-07-27 09:00:30 +00:00
JonasZhou
0fca36c8c3
Add cpu detection support for Zhaoxin processors
...
Signed-off-by: JonasZhou <JonasZhou@zhaoxin.com>
2021-07-12 13:43:45 +08:00
River Dillon
2f6326a630
Remove <linux/unistd.h>
2021-07-10 00:36:07 -07:00
Martin Kroeker
8f22ac552b
Add vendor string Shanghai as successor to Centaur
2021-07-08 18:28:49 +02:00
Martin Kroeker
eb2fdd3af0
Recognize newer Zhaoxin/Centaur processors as Nehalem
2021-07-08 12:23:15 +02:00
User User-User
750719528a
bugz
2021-06-20 16:40:43 +02:00
User User-User
6423b282a1
dynamic_arch
2021-06-20 14:19:41 +02:00
Martin Kroeker
307c4c0786
Fix typo
2021-06-16 13:41:16 +02:00
Martin Kroeker
e83df93975
Work around another recent macro name collision with winnt.h
2021-06-16 12:32:34 +02:00
Martin Kroeker
cbfd3c87e1
Recognize Intel Ice Lake SP as Cooper Lake
2021-05-14 20:44:06 +02:00
Martin Kroeker
623d580b4c
Restore __volatile__ keyword
2021-04-16 10:27:32 +02:00
Martin Kroeker
186368ddc3
Fix compilation with CLANG
2021-03-16 16:52:57 +01:00
Martin Kroeker
1a3ad4b670
Fix signatures of the TLS-mode dll_callback and p_process_term functions for Win64
2021-02-22 19:40:36 +01:00
Peter Hawkins
dbbf92c1d1
Fix race in blas_thread_shutdown.
...
blas_server_avail was read without holding server_lock. If multiple threads call blas_thread_shutdown simultaneously, for example, by calling fork(), then they can attempt to shut down multiple times. This can lead to a segmentation fault.
2021-02-18 13:46:50 -05:00
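A minimal sketch of the locking pattern this fix describes, not the actual OpenBLAS code. The names `server_avail`, `shutdown_once` and `teardown_count` are hypothetical stand-ins; the point is only that the "is the server running" flag is read and cleared while the lock is held, so concurrent callers cannot both run the teardown.

```c
#include <pthread.h>

/* Hypothetical stand-ins for the state named in the commit message. */
static pthread_mutex_t server_lock = PTHREAD_MUTEX_INITIALIZER;
static int server_avail = 1;    /* 1 while the worker pool is running */
static int teardown_count = 0;  /* illustration only: counts real teardowns */

/* Returns 1 if this call performed the shutdown, 0 if it was already done. */
int shutdown_once(void)
{
    int did_shutdown = 0;

    pthread_mutex_lock(&server_lock);
    if (server_avail) {         /* the flag is read only while holding the lock */
        teardown_count++;       /* real code would stop workers and free buffers here */
        server_avail = 0;
        did_shutdown = 1;
    }
    pthread_mutex_unlock(&server_lock);

    return did_shutdown;
}
```

With this shape, a second call (from another thread or after the first has finished) finds the flag already cleared and returns without touching the freed state.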
Martin Kroeker
cb429d6b12
Merge pull request #3110 from martin-frbg/issue3108
...
Fix get_num_procs() in the USE_TLS branch for non-glibc systems
2021-02-18 15:45:25 +01:00
Martin Kroeker
b0bded3f2f
Fix get_num_procs() in the USE_TLS branch for non-glibc systems
2021-02-18 11:14:05 +01:00
Martin Kroeker
e4e5042e38
Recognize Intel Tiger Lake as SkylakeX
2021-02-11 20:17:11 +01:00
Martin Kroeker
0cc36770f1
Merge pull request #3073 from xoviat/embedded
...
add embedded option
2021-01-31 18:02:41 +01:00
Martin Kroeker
eea0c0f2ed
Merge pull request #3085 from alexhenrie/memory_alloc
...
Fix null pointer check in blas_memory_alloc
2021-01-26 20:11:42 +01:00
Martin Kroeker
0cb9e9fc8d
Remove the VORTEX support bits again for now
2021-01-25 19:02:21 +01:00
Alex Henrie
113840da12
Fix null pointer check in blas_memory_alloc
2021-01-24 22:20:44 -07:00
Martin Kroeker
deb2e66bcc
Add DYNAMIC_LIST support for ARM64
2021-01-24 23:18:52 +01:00
xoviat
2e8d6e8690
add functions for embedded
2021-01-23 22:12:17 -06:00
Martin Kroeker
b94dab5250
patch to support power10 in builtin_cpu_is was backported to gcc 10.2, so allow that as well
2021-01-20 21:34:36 +01:00
Martin Kroeker
63fa3c3f8f
Require gcc 11 for builtin_cpu_is(power10)
...
fixes #3074
2021-01-20 15:41:04 +01:00
xoviat
b60de4447a
add cortex-m platform
2021-01-19 08:57:44 -06:00
Martin Kroeker
2c445be8ba
Merge pull request #3051 from martin-frbg/rocketlake
...
Add CPUID information for Intel Rocket Lake
2021-01-14 15:56:25 +01:00
Martin Kroeker
6fe0f1fab9
Label get_cpu_ftr as volatile to keep gcc from rearranging the code
2021-01-11 19:05:29 +01:00
Martin Kroeker
17c16f2a71
Implement builtin_cpu_is and limit cpu choices to P8 and P9 for NVIDIA compilers
2020-12-19 23:21:22 +01:00
Martin Kroeker
865676682d
Add Intel Rocket Lake
2020-12-14 22:40:23 +01:00
Martin Kroeker
6232237dba
Make fallback from P10 to P9 conditional on suitable compiler
2020-12-11 23:41:17 +01:00
Martin Kroeker
18d8a67485
Merge pull request #2994 from antonblanchard/power10-fixes
...
Power10 fixes
2020-12-11 23:37:30 +01:00
Martin Kroeker
83de62c20d
Merge pull request #3026 from martin-frbg/revert747
...
Revert PR747 - SYRK parameter changes for Haswell and related targets
2020-12-10 16:29:41 +01:00
gxw
4b548857d6
Add msa support for loongson
...
1. Using core loongson3r3 and loongson3r4 for loongson
2. Add DYNAMIC_ARCH for loongson
Change-Id: I1c6b54dbeca3a0cc31d1222af36a7e9bd6ab54c1
2020-12-09 10:28:46 +08:00
Martin Kroeker
a554712439
remove extra/intermediate size step for min_jj introduced in PR747
2020-12-08 21:01:36 +01:00
Martin Kroeker
5d26223f4a
remove extra/intermediate size step of min_jj from PR747
2020-12-08 20:59:56 +01:00
Martin Kroeker
bc5b1ddf0d
Merge pull request #3004 from martin-frbg/bsd_getauxval
...
ARM64 DYNAMIC_ARCH build fix for BSD/OSX
2020-11-23 08:35:12 +01:00
Martin Kroeker
e7bf8ced6c
Build fix for systems that do not support getauxval
2020-11-22 20:20:28 +01:00
Martin Kroeker
5fa305172a
Use ifeq instead of ifdef for user-definable options
2020-11-22 16:29:56 +01:00
Martin Kroeker
d3ff1f889f
Convert ifndefs to ifneq
2020-11-22 16:27:17 +01:00
Alexander Grund
60005eb47b
Don't overwrite blas_thread_buffer if already set
...
After a fork it is possible that blas_thread_buffer has already
allocated memory buffers: goto_set_num_threads does allocate those
already and it may be called by num_cpu_avail in case the OpenBLAS
NUM_THREADS differs from the OMP num threads.
This leads to a memory leak which can cause subsequent execution of BLAS
kernels to fail.
Fixes #2993
2020-11-19 14:51:51 +01:00
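A rough sketch of the "allocate only if not already set" guard described above, under simplifying assumptions. The array `thread_buffer`, the constant `MAX_THREADS`, the counter `alloc_count` and the function `ensure_thread_buffers` are all hypothetical; the real code uses its own allocator and data structures.

```c
#include <stddef.h>
#include <stdlib.h>

#define MAX_THREADS 4                     /* hypothetical upper bound */

static void *thread_buffer[MAX_THREADS];  /* NULL means "not allocated yet" */
static int alloc_count = 0;               /* illustration only */

/* Make sure the first nthreads slots have a buffer, without replacing
 * (and thereby leaking) buffers that an earlier call already set up. */
void ensure_thread_buffers(int nthreads, size_t size)
{
    if (nthreads > MAX_THREADS)
        nthreads = MAX_THREADS;

    for (int i = 0; i < nthreads; i++) {
        if (thread_buffer[i] != NULL)
            continue;                     /* already set: keep it */
        thread_buffer[i] = malloc(size);
        if (thread_buffer[i] != NULL)
            alloc_count++;
    }
}
```

Calling the function a second time with the same or a larger thread count only fills the empty slots, so the pointers handed out earlier stay valid and nothing is leaked.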
Anton Blanchard
043f3d6faa
POWER10: Use POWER9 as a fallback
...
If the toolchain is too old, or the mma feature isn't set on a POWER10,
fall back to the POWER9 loops.
2020-11-19 21:04:10 +11:00
Martin Kroeker
ff16329cb7
Merge pull request #2972 from xiegengxin/rot-intrinsic
...
Improve the performance of rot by using AVX512 and AVX2 intrinsic
2020-11-08 22:43:00 +01:00
Gengxin Xie
d9ba49165a
Improve the performance of rot by using AVX512 and AVX2 intrinsic
2020-11-05 15:12:36 +08:00
Martin Kroeker
aa21cb5217
Merge pull request #2960 from thrasibule/avx2_detection
...
fix avx2 detection
2020-10-31 20:24:21 +01:00
Guillaume Horel
1f564d729b
fix avx2 detection
...
reword commits to make it clearer
2020-10-31 10:00:48 -04:00
Chen, Guobing
a7b1f9b1bb
Implementation of BF16 based gemv
...
1. Add a new API -- sbgemv to support bfloat16 based gemv
2. Implement a generic kernel for sbgemv
3. Implement an avx512-bf16 based kernel for sbgemv
Signed-off-by: Chen, Guobing <guobing.chen@intel.com>
2020-10-29 02:08:23 +08:00
Martin Kroeker
2207a16235
Merge pull request #2952 from martin-frbg/issue2931
...
Try to read cpu ID from /sys/devices/.../cpu0 if HWCAP_CPUID fails
2020-10-28 09:37:32 +01:00
Martin Kroeker
b937d78a6d
Try to read cpu information from /sys/devices/system/cpu/cpu0 if HWCAP_CPUID fails
2020-10-27 17:51:32 +01:00
Martin Kroeker
fd7da56965
Move definitions that are neither needed nor supported on SUNOS
2020-10-25 12:01:50 +01:00
Martin Kroeker
ff65952e46
Move HAVE_P10_SUPPORT to the build system
...
to be able to include a binutils version check
2020-10-20 00:55:41 +02:00
Rajalakshmi Srinivasaraghavan
b5d30b390d
Fix build issues with bfloat16
...
This patch fixes compilation errors due to recent renaming from SH to SB
with BUILD_BFLOAT16.
2020-10-13 11:00:22 -05:00
Martin Kroeker
006c7f6671
Change "HALF" and "sh" to "BFLOAT16" and "sb"
2020-10-12 00:06:06 +02:00
Martin Kroeker
85154c2e18
Change "HALF" and "sh" to "BFLOAT16" and "sb"
2020-10-12 00:05:05 +02:00
Martin Kroeker
887e00fd7f
Adapt for supporting only a subset of variable types
2020-10-11 14:58:57 +02:00
Martin Kroeker
886a8e3190
Adapt for supporting only a subset of variable types
2020-10-11 14:57:32 +02:00
Martin Kroeker
ac653c94f3
Merge branch 'develop' into issue2588-cmake
2020-10-11 13:57:07 +02:00
Martin Kroeker
f032d8966e
Merge pull request #2874 from Flamefire/memory_fixes
...
Avoid out of bounds access on invalid memory free
2020-10-04 15:16:51 +02:00
Martin Kroeker
f6e4cf2f9d
Merge pull request #2876 from Flamefire/omp_fork_fix
...
Lazily reinit threads after a fork in OMP mode
2020-10-03 22:52:17 +02:00
User User-User
d2333e7842
aarch64 fix std=c18 compilation
2020-10-03 18:00:34 +03:00
Alexander Grund
3094fc6c83
Lazily reinit threads after a fork in OMP mode
...
This initializes the per-thread memory buffers which get
cleared/released on a fork via pthread_atfork. Not doing so leads to
each thread calling blas_memory_alloc on almost every execution which
slows down the code significantly as the threads race for the memory
allocation using locks to serialize that.
2020-10-01 15:41:42 +02:00
Alexander Grund
3c05f54df8
Avoid out of bounds access on invalid memory free
2020-10-01 10:48:45 +02:00
Alexander Grund
dee7c49938
Fix TABs and trailing space
2020-10-01 10:43:16 +02:00
Martin Kroeker
896bbd55e1
Add support for building only selected variable types
2020-09-26 23:25:55 +02:00
Martin Kroeker
357bff06b5
Add BUILD_vartype defines
2020-09-22 23:24:22 +02:00
Martin Kroeker
988a6f429e
Add BUILD_vartype defines
2020-09-22 23:23:33 +02:00
Martin Kroeker
e5e2fbd593
Support building only selected types
2020-09-22 23:21:30 +02:00
Martin Kroeker
3287848c8f
Support building only selected types
2020-09-22 23:20:51 +02:00
y00512012
06cf73a239
fix a bug of trmm
2020-09-22 16:47:10 +08:00
Martin Kroeker
ddec244a5a
Merge pull request #2838 from austinpagan/gordon_trmm
...
Adding performance patch for trmm, just like trsm (#2836)
2020-09-15 21:17:48 +02:00
fossum
dfeca46098
Adding performance patch for trmm, just like #2836
2020-09-15 08:59:50 -05:00
fossum
274d6e015b
Fixing a performance bug in trsm_[LR].c.
2020-09-14 13:10:48 -05:00
Martin Kroeker
91c84e1c01
Merge pull request #2796 from Guobing-Chen/BF16_dot_coversion_apis
...
Add bfloat16 based dot and conversion with single/double
2020-09-14 15:00:19 +02:00
Marius Hillenbrand
a55fe06f25
s390x/DYNAMIC_ARCH: define a HW_CAP flag to support slightly older glibc versions
...
Enable building DYNAMIC_ARCH support with older versions of glibc that
do not know about the hwcap flag HWCAP_S390_VXE yet.
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-09-08 19:34:18 +02:00
Marius Hillenbrand
4f34bcfb5e
s390x/DYNAMIC_ARCH: pass supported arch levels from Makefile to run-time code
...
... instead of duplicating the (old) mechanism from the Makefile that
aimed to derive supported architecture generations from the gcc
version.
To enable builds with DYNAMIC_ARCH with older compiler releases, the
Makefile and drivers/other/dynamic_arch.c need a common view of the
architecture support built into the library.
We follow the notation from x86 when used with DYNAMIC_LIST, where
defines DYN_<ARCH NAME> denote support for a given generation to be
built in. Since there are far fewer architecture generations in OpenBLAS
for s390x, that does not bloat command lines too much.
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-09-08 19:34:18 +02:00
Martin Kroeker
330044d821
Fix potential domain error in sqrt
2020-09-05 09:44:33 +02:00
Chen, Guobing
deaeb6c5b8
Add bfloat16 based dot and conversion with single/double
...
1. Added bfloat16 based dot as new API: shdot
2. Implemented generic kernel and cooperlake-specific (AVX512-BF16) kernel for shdot
3. Added 4 conversion APIs for bfloat16 data type <=> single/double: shstobf16 shdtobf16 sbf16tos dbf16tod
   shstobf16 -- convert single float array to bfloat16 array
   shdtobf16 -- convert double float array to bfloat16 array
   sbf16tos -- convert bfloat16 array to single float array
   dbf16tod -- convert bfloat16 array to double float array
4. Implemented generic kernels for all 4 conversion APIs, and cooperlake-specific kernels for shstobf16 and shdtobf16
5. Updated the level1 threading helper functions and macros to support multi-threading for these new APIs
6. Fixed Cooperlake platform detection and selection in dynamic-arch builds
7. Changed the typedef of bfloat16 from unsigned short to the stricter uint16_t
Signed-off-by: Chen, Guobing <guobing.chen@intel.com>
2020-09-04 02:31:25 +08:00
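For readers unfamiliar with the format, here is a sketch of what a scalar single <=> bfloat16 conversion can look like. It is not taken from the kernels added in this commit: the function names `float_to_bf16` and `bf16_to_float` are hypothetical, and the round-to-nearest-even step is an assumption about rounding, not a statement of what the OpenBLAS kernels do. bfloat16 keeps the upper 16 bits of an IEEE-754 single (sign, 8 exponent bits, 7 mantissa bits).

```c
#include <stdint.h>
#include <string.h>

typedef uint16_t bfloat16;   /* same underlying type as in item 7 above */

/* Hypothetical helper: single float -> bfloat16, round to nearest even. */
bfloat16 float_to_bf16(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);            /* reinterpret without aliasing issues */

    if ((bits & 0x7fffffffu) > 0x7f800000u)    /* NaN: keep it a (quiet) NaN */
        return (bfloat16)((bits >> 16) | 0x0040u);

    bits += 0x7fffu + ((bits >> 16) & 1u);     /* round half to even on the dropped bits */
    return (bfloat16)(bits >> 16);
}

/* Hypothetical helper: bfloat16 -> single float (exact, no rounding needed). */
float bf16_to_float(bfloat16 h)
{
    uint32_t bits = (uint32_t)h << 16;         /* low 16 mantissa bits become zero */
    float x;
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

Widening is always exact, while narrowing loses the low 16 mantissa bits, which is why only the float-to-bfloat16 direction needs a rounding decision.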
Chen, Guobing
0c1c903f1e
Fix OMP thread count specification issue
...
In the current code, no matter how many threads are specified, all
available CPUs are used when invoking OMP, which leads to very bad
performance if the workload is small and the number of available CPUs is
large. A lot of time is wasted on inter-thread sync. Fix this issue by
actually using the number specified by the variable 'num' from the calling API.
Signed-off-by: Chen, Guobing <guobing.chen@intel.com>
2020-08-24 02:45:54 +08:00
Chen, Guobing
e740c4873d
Enable COOPERLAKE build target
...
Enable new build target platform -- COOPERLAKE. This target platform
supports all the SKYLAKEX supported ISAs + avx512bf16. So all the
SKYLAKEX specific kernels/drivers and related code are now extended
to be also active on COOPERLAKE. Besides, new BF16 related kernels
are active under this target.
2020-08-13 06:18:00 +08:00
Martin Kroeker
60cd5e55fc
Protect against inadvertent activation of USE_CUDA
2020-08-01 12:31:39 +02:00
Martin Kroeker
7c02f4b1f7
Merge pull request #2744 from martin-frbg/issue2738
...
Add AMD Renoir/Matisse cpu autodetection and preliminary support for Zen3
2020-07-28 19:32:04 +02:00
Martin Kroeker
12918358aa
Add AMD Renoir/Matisse and preliminary support for Zen3 as Zen2
...
also support AMD family 22 Jaguar/Puma as Bobcat
2020-07-28 13:53:17 +00:00
Ashwin Sekhar T K
4e1be0e481
ARM64: Add THUNDERX3T110 Target
2020-07-26 23:32:24 -07:00