
735 Commits

Author SHA1 Message Date
Honglin Zhu
b00d5b9746 New sbgemm implementation for Neoverse N2
1. Use UZP instructions instead of gather-load and scatter-store instructions to get lower latency.
2. Pad k to a multiple of 4.
2022-10-26 15:09:41 +08:00
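The commit message only names the padding step. The sketch below is a scalar C illustration of it. It assumes that "padding k" means rounding K up to the next multiple of 4, which is the number of bf16 elements SVE's BFMMLA instruction consumes along K per step, and it assumes the kernel is built around BFMMLA. Both helper names are hypothetical and are not OpenBLAS functions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Round k up to the next multiple of 4 (assumed BFMMLA granularity along K). */
size_t pad_k_to_multiple_of_4(size_t k) {
    return (k + 3) & ~(size_t)3;
}

/* Hypothetical packing helper: copy k bf16 values (raw uint16_t bit patterns)
 * into dst, then zero-fill up to the padded length. The all-zero bit pattern
 * is bf16 +0.0, so padded lanes contribute nothing to the dot products.
 * dst must have room for pad_k_to_multiple_of_4(k) values. Returns that length. */
size_t pack_row_bf16(const uint16_t *src, size_t k, uint16_t *dst) {
    size_t kp = pad_k_to_multiple_of_4(k);
    memcpy(dst, src, k * sizeof *src);
    memset(dst + k, 0, (kp - k) * sizeof *dst);
    return kp;
}
```

Zero-padding during packing lets the inner kernel always run whole 4-wide K steps without a scalar tail loop.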
Martin Kroeker
ab6009b0b6 Merge pull request #3773 from staticfloat/sf/openblas_default_num_threads
Add `OPENBLAS_DEFAULT_NUM_THREADS`
2022-10-13 14:15:14 +02:00
Martin Kroeker
db50ab4a72 Add BUILD_vartype defines 2022-10-01 15:14:51 +02:00
Elliot Saba
d2ce93179f Add OPENBLAS_DEFAULT_NUM_THREADS
This allows Julia to set a default number of threads (usually `1`) to be
used when no other thread counts are specified [0], to short-circuit the
default OpenBLAS thread initialization routine that spins up a different
number of threads than Julia would otherwise choose.

The reason to add a new environment variable is that we want to be able
to configure OpenBLAS to avoid performing its initial memory
allocation/thread startup, as that can consume significant amounts of
memory, but we still want to be sensitive to legacy codebases that set
things like `OMP_NUM_THREADS` or `GOTOBLAS_NUM_THREADS`.  Creating a new
environment variable that is OpenBLAS-specific and not already used
publicly to control the overall number of threads of programs like
Julia seems to be the best way forward.

[0] https://github.com/JuliaLang/julia/pull/46844
2022-09-30 01:21:44 +00:00
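The message describes a precedence order: the new variable applies only when no other thread count is given. The sketch below is a minimal, non-authoritative version of that order. It assumes OPENBLAS_NUM_THREADS is checked first, then the legacy GotoBLAS variable (its spelling here, GOTO_NUM_THREADS, is an assumption), then OMP_NUM_THREADS, then OPENBLAS_DEFAULT_NUM_THREADS, and finally the detected processor count. All function names are hypothetical.

```c
#include <limits.h>
#include <stdlib.h>

/* Parse a thread count; missing, empty, non-numeric, or < 1 counts as unset (0). */
int parse_thread_count(const char *s) {
    if (s == NULL || *s == '\0')
        return 0;
    char *end;
    long v = strtol(s, &end, 10);
    if (*end != '\0' || v < 1 || v > INT_MAX)
        return 0;
    return (int)v;
}

/* Assumed precedence: explicit thread-count variables win, then
 * OPENBLAS_DEFAULT_NUM_THREADS, then the detected processor count. */
int choose_thread_count(const char *openblas_num, const char *goto_num,
                        const char *omp_num, const char *default_num, int ncpu) {
    const char *candidates[] = { openblas_num, goto_num, omp_num, default_num };
    for (size_t i = 0; i < sizeof candidates / sizeof candidates[0]; i++) {
        int n = parse_thread_count(candidates[i]);
        if (n > 0)
            return n;
    }
    return ncpu > 0 ? ncpu : 1;
}

/* Wrapper over the real environment; the GotoBLAS variable name is an assumption. */
int thread_count_from_env(int ncpu) {
    return choose_thread_count(getenv("OPENBLAS_NUM_THREADS"),
                               getenv("GOTO_NUM_THREADS"),
                               getenv("OMP_NUM_THREADS"),
                               getenv("OPENBLAS_DEFAULT_NUM_THREADS"), ncpu);
}
```

With only OPENBLAS_DEFAULT_NUM_THREADS=1 set, this sketch yields 1 on any machine. A legacy OMP_NUM_THREADS=4 still takes priority, which matches the goal of staying sensitive to existing codebases.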
Kai T. Ohlhus
84453b924f Support CONSISTENT_FPCSR on AARCH64 2022-09-22 00:20:40 +09:00
Martin Kroeker
9402df5604 Fix missing external declaration 2022-09-14 21:44:34 +02:00
Martin Kroeker
bd30120ba7 Merge pull request #3720 from FlyGoat/mips64
Make it work on general MIPS64 processors
2022-08-19 20:24:27 +02:00
Jiaxun Yang
fae9368f14 Implement DYNAMIC_LIST for MIPS64
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
2022-08-12 13:13:31 +01:00
Jiaxun Yang
a50b29c540 Provide a fallback MIPS64_GENERIC target
Falling back to the Loongson core target on other MIPS64 processors
is dangerous, since its kernels may use Loongson-specific instructions.

Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
2022-08-12 13:13:28 +01:00
Jiaxun Yang
b633eb79f2 Use $at as temporary register for mips/loongson CPUCFG read
Some compilers (namely LLVM) object to clobbering registers in
inline assembly.
Use $at as the temporary register and explicitly set the noat
hint.

Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
2022-08-07 13:22:32 +01:00
Martin Kroeker
19fefd100e Merge pull request #3703 from martin-frbg/omp_adaptive
Add env variable OMP_ADAPTIVE to control OMP threadpool behaviour
2022-08-03 15:38:39 +02:00
Jiaxun Yang
19d4f90c44 Use auxv to detect CPUCFG on mips/loongson
This is safer and simpler than probing for SIGILL.

Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
2022-07-31 19:41:59 +01:00
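The Linux kernel advertises optional CPU features as bits in the AT_HWCAP entry of the auxiliary vector, which glibc exposes through getauxval(). Reading that bit avoids executing CPUCFG and catching SIGILL. The sketch below assumes the bit is HWCAP_LOONGSON_CPUCFG with value 1 << 14, as in the kernel's MIPS uapi hwcap header; verify this against your kernel headers. The function names are hypothetical.

```c
#include <stdbool.h>
#if defined(__linux__) && defined(__mips__)
#include <sys/auxv.h>
#endif

/* Assumed bit from Linux's MIPS <asm/hwcap.h>; only a fallback definition. */
#ifndef HWCAP_LOONGSON_CPUCFG
#define HWCAP_LOONGSON_CPUCFG (1UL << 14)
#endif

/* Pure check on a hwcap word, separated out so it can be tested anywhere. */
bool hwcap_has_cpucfg(unsigned long hwcap) {
    return (hwcap & HWCAP_LOONGSON_CPUCFG) != 0;
}

/* Query the running kernel; no CPUCFG instruction is executed. */
bool cpu_has_cpucfg(void) {
#if defined(__linux__) && defined(__mips__)
    return hwcap_has_cpucfg(getauxval(AT_HWCAP));
#else
    return false; /* this hwcap bit is only meaningful on Linux/MIPS */
#endif
}
```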
Martin Kroeker
d0ba257de0 Merge pull request #3704 from XiWeiGu/loongarch64_dynamic_arch
LoongArch64: Add DYNAMIC_ARCH support
2022-07-28 20:31:20 +02:00
gxw
fbfe1daf6e LoongArch64: Add DYNAMIC_ARCH support 2022-07-28 14:28:45 +08:00
Martin Kroeker
80cdfed7b2 Use OMP_ADAPTIVE setting to choose between static and dynamic OMP threadpool size 2022-07-27 23:43:20 +02:00
Martin Kroeker
08e3754b39 Add environment variable OMP_ADAPTIVE 2022-07-27 23:41:47 +02:00
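These two commits state that OMP_ADAPTIVE chooses between a static and a dynamic OpenMP thread pool size, but they do not spell out the policy. The sketch below shows one plausible reading. It assumes a nonzero OMP_ADAPTIVE switches from the pool size fixed at initialization to the OpenMP runtime's current maximum, capped at the fixed size. In real code the caller would obtain that maximum from omp_get_max_threads(). Both helper names are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical parser: a nonzero integer in OMP_ADAPTIVE enables adaptive mode. */
int omp_adaptive_enabled(const char *value) {
    return value != NULL && atoi(value) != 0;
}

/* Static mode: keep the pool size fixed at initialization.
 * Adaptive mode (assumed semantics): follow the runtime's current maximum,
 * never exceeding the fixed pool and never dropping below one thread. */
int threadpool_size(int adaptive, int fixed_size, int omp_max_threads) {
    if (!adaptive)
        return fixed_size;
    if (omp_max_threads < 1)
        return 1;
    return omp_max_threads < fixed_size ? omp_max_threads : fixed_size;
}
```

A call site would look like `threadpool_size(omp_adaptive_enabled(getenv("OMP_ADAPTIVE")), pool_size, omp_get_max_threads())`. Here pool_size is also a hypothetical name.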
Martin Kroeker
30473b6a9d add openblas_getaffinity() 2022-07-27 19:15:18 +02:00
Martin Kroeker
daca01622b fix detection of Neoverse V1 and user-enforced selection of N2 in ARM64 DYNAMIC_ARCH (#3700)
* fix detection of Neoverse V1 and user-enforced selection of N2
2022-07-27 09:17:43 +02:00
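Arm core detection generally keys on the MIDR_EL1 register. The implementer code sits in bits [31:24] and the primary part number in bits [15:4]. The decoder below uses Arm's published part numbers for Neoverse N1 (0xD0C), V1 (0xD40) and N2 (0xD49). The enum and function names are hypothetical, and the sketch does not show how OpenBLAS obtains the MIDR value (for example from /proc/cpuinfo or sysfs).

```c
#include <stdint.h>

/* MIDR_EL1 fields: implementer [31:24], primary part number [15:4]. */
unsigned midr_implementer(uint64_t midr) { return (unsigned)((midr >> 24) & 0xFF); }
unsigned midr_partnum(uint64_t midr)     { return (unsigned)((midr >> 4) & 0xFFF); }

/* Hypothetical core labels for this sketch. */
enum arm_core { CORE_UNKNOWN, CORE_NEOVERSE_N1, CORE_NEOVERSE_V1, CORE_NEOVERSE_N2 };

enum arm_core classify_arm_core(uint64_t midr) {
    if (midr_implementer(midr) != 0x41)   /* 0x41 ('A') = Arm Ltd */
        return CORE_UNKNOWN;
    switch (midr_partnum(midr)) {
    case 0xD0C: return CORE_NEOVERSE_N1;
    case 0xD40: return CORE_NEOVERSE_V1;
    case 0xD49: return CORE_NEOVERSE_N2;
    default:    return CORE_UNKNOWN;
    }
}
```

Checking the implementer first matters because other vendors reuse part-number values for unrelated cores.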
Honglin Zhu
d5ca477f42 Neoverse N2: DYNAMIC_ARCH 2022-07-12 00:50:45 +08:00
Martin Kroeker
69148ae795 Guard against sysconf returning zero processors 2022-07-06 17:22:18 +02:00
Martin Kroeker
e9260f5451 Guard against system call returning zero processors 2022-07-06 17:21:10 +02:00
Martin Kroeker
2c62096fce Expand cpu mapping for future Zen cpus and use feature-based fallback for unknown AMD family codes 2022-05-18 15:35:30 +02:00
Adam Niederer
69f2ac4ea2 Fix broken elif in dynamic.c
This fixes compilation in the following case:

$(MAKE) USE_OPENMP=1 USE_THREAD=1 NO_LAPACK=0 DYNAMIC_ARCH=1 \
DYNAMIC_LIST="HASWELL SKYLAKEX ATOM COOPERLAKE SAPPHIRERAPIDS ZEN"
2022-03-17 20:04:37 -04:00
Martin Kroeker
8d5a9c2f98 Merge pull request #3565 from jonaszhou1/develop
Support Zhaoxin/Centaur kh40000 as ZEN
2022-03-11 14:29:30 +01:00
Martin Kroeker
bf4642eb7e Report USE_TLS if set 2022-03-10 16:19:29 +01:00
JonasZhou
2d0ad89b0d Support Zhaoxin/Centaur kh40000 as ZEN
Signed-off-by: JonasZhou <JonasZhou@zhaoxin.com>
2022-03-10 15:08:38 +08:00
Martin Kroeker
fa3e9f25e6 Support AVX512-enabled Alder Lake 2022-02-07 00:00:56 +01:00
Martin Kroeker
7656aba00e Merge pull request #3493 from martin-frbg/casts+cleanup
WIP casts and cleanups
2022-02-06 23:55:06 +01:00
Martin Kroeker
7f0b11fbc1 Exclude some complex drivers when NO_LAPACK is set 2022-01-27 22:00:39 +01:00
Martin Kroeker
b6b024232d Merge pull request #3508 from snadampal/v1_n2
OpenBLAS: aarch64: Add neoverse-v1/n2 architecture specifics
2022-01-09 14:50:26 +01:00
Sunita Nadampalli
19c8f615dc OpenBLAS: aarch64: Add neoverse-v1/n2 architecture specifics 2022-01-07 00:28:17 +00:00
Martin Kroeker
b329e45288 Guard against omp_get_num_places returning zero 2022-01-01 00:46:23 +01:00
Martin Kroeker
07fe5b19a4 typecast function pointers 2021-12-21 12:31:54 +01:00
Martin Kroeker
6ed52576f8 Add feature-based fallback for unknown x86_64 cpus 2021-12-16 22:02:49 +01:00
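A feature-based fallback picks a kernel family from capability bits when the family/model table does not recognize the CPU. The sketch below assumes the selection order is "widest usable vector ISA first" and uses OpenBLAS target names purely as labels. The struct and enum are hypothetical, and the sketch omits the CPUID/XGETBV queries that would fill the struct.

```c
#include <stdbool.h>

/* Hypothetical labels named after OpenBLAS targets. */
typedef enum { KERNEL_PRESCOTT, KERNEL_SANDYBRIDGE, KERNEL_HASWELL, KERNEL_SKYLAKEX } kernel_t;

/* Each flag should mean "CPU supports it AND the OS saves the register state". */
typedef struct {
    bool avx;
    bool avx2;
    bool fma3;
    bool avx512f;
} x86_features;

/* Assumed ordering: prefer the widest vector ISA the CPU and OS support. */
kernel_t fallback_kernel(x86_features f) {
    if (f.avx512f && f.avx2 && f.fma3)
        return KERNEL_SKYLAKEX;
    if (f.avx2 && f.fma3)
        return KERNEL_HASWELL;
    if (f.avx)
        return KERNEL_SANDYBRIDGE;
    return KERNEL_PRESCOTT;   /* baseline SSE3-era kernels */
}
```

This keeps an unknown future CPU on optimized kernels instead of dropping to the slowest generic target.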
Martin Kroeker
7a7fbb11c3 define "unlikely" on non-Cygwin too 2021-12-16 17:28:28 +01:00
Martin Kroeker
b31349c22a Open up delayed (re)init to non-Cygwin OS as well 2021-12-16 16:58:12 +01:00
Martin Kroeker
c8d05aa7a5 Move the threads overflow flag under the protection of the local blas lock (#3476)
* Move accesses to the overflow flag into the scope of the blas lock
2021-12-13 08:34:52 +01:00
Rafael Cardoso Fernandes Sousa
214fbcee15 Fix cmake for power 2021-12-09 08:28:17 -06:00
Martin Kroeker
4f057bffd6 Fix NULL pointer checks in blas_memory_alloc 2021-11-05 10:43:17 +01:00
Martin Kroeker
08f8bb66c0 Add CPUIDs for Alder Lake and other recent Intel cpus 2021-11-04 20:36:39 +01:00
Martin Kroeker
efb16fafb0 Fix miscounting of threadpool size on Linux with OMP_PROC_BIND=TRUE (#3437)
* Return the number of OMP places (if available, otherwise SC_NPROCESSORS_CONF) as the maximum thread count when built with OpenMP
2021-11-04 12:11:16 +01:00
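The fallback chain in this change uses OMP places when available and otherwise the configured processor count. The sketch below shows that chain together with the guards against zero counts in neighbouring commits. The helper name is hypothetical and this is not the OpenBLAS source. In an OpenMP build, the caller would pass omp_get_num_places(), which returns 0 when no place list is defined.

```c
#include <unistd.h>

/* Hypothetical helper: prefer the OpenMP place count when the runtime reports
 * one, else fall back to the configured processor count. sysconf() returns -1
 * on error, and some environments report 0, so never return less than 1. */
int max_thread_count(int omp_places) {
    if (omp_places > 0)
        return omp_places;
    long n = sysconf(_SC_NPROCESSORS_CONF);
    return n > 0 ? (int)n : 1;
}
```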
Marius Hillenbrand
77747bc536 cpuid_zarch/hwcaps: add documentation and dump hwcaps in init
Add pointers to the definition of the hardware capability flags in glibc
and describe how they relate to the levels CPU_Z13 and CPU_Z14 for
optimized kernels.

To aid identifying available hardware capabilities and in debugging
potential build issues, dump their value in dynamic_arch_init() when
OPENBLAS_VERBOSE is set to 2 or higher.

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2021-10-28 12:08:48 +02:00
Martin Kroeker
22a616bd8f Add model number for Tiger Lake H (mobile variant) 2021-10-27 22:17:58 +02:00
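Model numbers like this one come from CPUID leaf 1. Its EAX value packs stepping, base model, base family, extended model and extended family. The decoder below follows the display-family/display-model rules in Intel's SDM, which also decode AMD Zen signatures correctly. The type and function names are hypothetical.

```c
#include <stdint.h>

typedef struct { unsigned family, model, stepping; } cpu_signature;

/* Decode CPUID.1:EAX into display family, display model and stepping. */
cpu_signature decode_signature(uint32_t eax) {
    cpu_signature s;
    unsigned base_family = (eax >> 8) & 0xF;
    unsigned base_model  = (eax >> 4) & 0xF;
    s.stepping = eax & 0xF;
    s.family = base_family;
    if (base_family == 0xF)
        s.family += (eax >> 20) & 0xFF;            /* add extended family */
    s.model = base_model;
    if (base_family == 0x6 || base_family == 0xF)
        s.model |= ((eax >> 16) & 0xF) << 4;       /* prepend extended model */
    return s;
}
```

For example, 0x806D1 decodes to family 6, model 0x8D (listed as TIGERLAKE in Linux's intel-family.h), and 0x00A20F10 decodes to family 0x19, model 0x21.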
Marius Hillenbrand
44950ca173 s390x: use DYNAMIC_ARCH's cpu detection for compile-time choice
On s390x, the run-time detection for DYNAMIC_ARCH and the compile-time
choice in cpuid_zarch use different methods for identifying the
supported CPU features. To make cpuid_zarch future-proof and both paths
easier to maintain, switch cpuid_zarch to the same mechanism as DYNAMIC_ZARCH
(i.e., derive the supported CPU features from hwcap flags) and share
code between both (in a new header cpuid_zarch.h).

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2021-10-26 16:19:14 +02:00
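The sketch below derives the z13/z14 level from hwcap flags and dumps the raw value when verbose, as the two s390x commits describe. The bit values are assumed from glibc's s390 hwcap header: HWCAP_S390_VX for the vector facility and HWCAP_S390_VXE for vector-enhancements facility 1. They are defined here only as fallbacks. The enum and function are hypothetical stand-ins for the shared cpuid_zarch.h code, with ZARCH_Z13/ZARCH_Z14 mirroring the CPU_Z13/CPU_Z14 levels.

```c
#include <stdio.h>

/* Assumed glibc values; the system header takes precedence when included. */
#ifndef HWCAP_S390_VX
#define HWCAP_S390_VX  2048UL   /* vector facility (z13) */
#endif
#ifndef HWCAP_S390_VXE
#define HWCAP_S390_VXE 8192UL   /* vector-enhancements facility 1 (z14) */
#endif

typedef enum { ZARCH_GENERIC, ZARCH_Z13, ZARCH_Z14 } zarch_level;

/* Map hwcap bits to a kernel level; print the raw word when verbose >= 2. */
zarch_level level_from_hwcap(unsigned long hwcap, int verbose) {
    if (verbose >= 2)
        fprintf(stderr, "hwcap: 0x%lx\n", hwcap);
    if ((hwcap & HWCAP_S390_VX) && (hwcap & HWCAP_S390_VXE))
        return ZARCH_Z14;
    if (hwcap & HWCAP_S390_VX)
        return ZARCH_Z13;
    return ZARCH_GENERIC;
}
```

Because both compile-time and run-time paths would call the same function, a new hwcap bit only needs to be handled in one place.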
Wangyang Guo
3dc6052c7e initial support for Sapphire Rapids platform 2021-10-12 01:30:40 -07:00
Rafael Cardoso Fernandes Sousa
0e8b4adf22 Remove unused commented code (#if directive) 2021-09-15 22:18:48 +00:00
Martin Kroeker
fa8bf57768 Merge pull request #3380 from martin-frbg/structwarn
Remove extraneous qualifiers from struct definition
2021-09-15 07:19:09 +02:00
Martin Kroeker
dd09f0173e Remove extraneous qualifiers from struct definition 2021-09-14 21:52:26 +02:00
Martin Kroeker
2f8220d757 Add sbgemm 2021-09-14 16:14:43 +02:00
Martin Kroeker
5f6a609253 Add sbgemv 2021-09-14 16:13:57 +02:00
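sbgemv takes bfloat16 (bf16) inputs. A bf16 value is the upper 16 bits of an IEEE-754 single-precision float, so widening it is a 16-bit shift. Below is a scalar reference for the non-transposed case. It assumes column-major storage, float accumulation, a float y vector, and BLAS-style semantics where beta == 0 overwrites y. The function names are hypothetical; this is neither the OpenBLAS kernel nor its CBLAS signature.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Widen bf16 to float: place the 16 bits in the high half of a binary32. */
float bf16_to_float(uint16_t h) {
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* y := alpha*A*x + beta*y for a column-major m-by-n bf16 matrix A
 * (leading dimension lda), bf16 vector x, and float vector y. */
void sbgemv_n_ref(int m, int n, float alpha, const uint16_t *a, int lda,
                  const uint16_t *x, float beta, float *y) {
    for (int i = 0; i < m; i++)
        y[i] = (beta == 0.0f) ? 0.0f : beta * y[i];   /* beta == 0 ignores old y */
    for (int j = 0; j < n; j++) {
        float t = alpha * bf16_to_float(x[j]);
        for (int i = 0; i < m; i++)
            y[i] += t * bf16_to_float(a[i + (size_t)j * lda]);
    }
}
```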