
637 Commits

Author SHA1 Message Date
Martin Kroeker da6e426b13 fix Cooperlake not selectable via environment variable 2022-11-03 18:13:35 +01:00
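In a DYNAMIC_ARCH build the kernel set can be forced through the OPENBLAS_CORETYPE environment variable. A core can only be selected this way if its name appears in the table the library consults, which is presumably what was missing for Cooperlake. The sketch below illustrates that kind of lookup; the table contents, the case-insensitive matching, and the function name `coretype_from_name` are assumptions for illustration, not OpenBLAS's actual code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <strings.h>

/* Hypothetical list of core names accepted from OPENBLAS_CORETYPE.
   A core missing from this table can never be forced by the user,
   even if its kernels are compiled into the library. */
static const char *const core_names[] = {
    "HASWELL", "SKYLAKEX", "COOPERLAKE", "SAPPHIRERAPIDS", "ZEN",
};

/* Return the table index of the named core, or -1 if it is unknown.
   Matching is case-insensitive in this sketch (an assumption). */
static int coretype_from_name(const char *name) {
    if (name == NULL) return -1;
    for (size_t i = 0; i < sizeof core_names / sizeof core_names[0]; i++) {
        if (strcasecmp(name, core_names[i]) == 0) return (int)i;
    }
    return -1;
}
```

A caller would pass `getenv("OPENBLAS_CORETYPE")` and fall back to CPU detection when the result is -1.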
Honglin Zhu 4989e039a5 Define SBGEMM_ALIGN_K for DYNAMIC_ARCH build 2022-10-27 14:10:26 +08:00
Honglin Zhu b00d5b9746 New sbgemm implementation for Neoverse N2
1. Use UZP instructions but not gather load and scatter store instructions to get lower latency.
2. Padding k to a power of 4.
2022-10-26 15:09:41 +08:00
Martin Kroeker ab6009b0b6 Merge pull request #3773 from staticfloat/sf/openblas_default_num_threads
Add `OPENBLAS_DEFAULT_NUM_THREADS`
2022-10-13 14:15:14 +02:00
Martin Kroeker db50ab4a72 Add BUILD_vartype defines 2022-10-01 15:14:51 +02:00
Elliot Saba d2ce93179f Add `OPENBLAS_DEFAULT_NUM_THREADS`
This allows Julia to set a default number of threads (usually `1`) to be
used when no other thread counts are specified [0], to short-circuit the
default OpenBLAS thread initialization routine that spins up a different
number of threads than Julia would otherwise choose.

The reason to add a new environment variable is that we want to be able
to configure OpenBLAS to avoid performing its initial memory
allocation/thread startup, as that can consume significant amounts of
memory, but we still want to be sensitive to legacy codebases that set
things like `OMP_NUM_THREADS` or `GOTOBLAS_NUM_THREADS`.  Creating a new
environment variable that is openblas-specific and is not already
publicly used to control the overall number of threads of programs like
Julia seems to be the best way forward.

[0] https://github.com/JuliaLang/julia/pull/46844
2022-09-30 01:21:44 +00:00
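The commit above describes a fallback that applies only when no other thread count is specified. A minimal sketch of that precedence, assuming the explicit variables are checked in the order OpenBLAS documents (OPENBLAS_NUM_THREADS, then GOTO_NUM_THREADS, then OMP_NUM_THREADS) and that the detected CPU count is the last resort. The helper names and the upper bound on parsed values are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>

/* Parse a positive integer from an environment variable.
   Returns 0 if the variable is unset, empty, or not a valid count. */
static int env_threads(const char *name) {
    const char *s = getenv(name);
    if (s == NULL || *s == '\0') return 0;
    char *end;
    long v = strtol(s, &end, 10);
    if (*end != '\0' || v <= 0 || v > 4096) return 0; /* illustrative cap */
    return (int)v;
}

/* Explicit thread-count variables always win; OPENBLAS_DEFAULT_NUM_THREADS
   is consulted only when none of them is set; otherwise use ncpus. */
static int resolve_num_threads(int ncpus) {
    static const char *const explicit_vars[] = {
        "OPENBLAS_NUM_THREADS", "GOTO_NUM_THREADS", "OMP_NUM_THREADS",
    };
    for (size_t i = 0; i < sizeof explicit_vars / sizeof explicit_vars[0]; i++) {
        int n = env_threads(explicit_vars[i]);
        if (n > 0) return n;
    }
    int dflt = env_threads("OPENBLAS_DEFAULT_NUM_THREADS");
    if (dflt > 0) return dflt;
    return ncpus > 0 ? ncpus : 1;
}
```

With this ordering, a host application such as Julia can export OPENBLAS_DEFAULT_NUM_THREADS=1 without overriding a user's existing OMP_NUM_THREADS setting.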
Kai T. Ohlhus 84453b924f Support CONSISTENT_FPCSR on AARCH64 2022-09-22 00:20:40 +09:00
Martin Kroeker 9402df5604 Fix missing external declaration 2022-09-14 21:44:34 +02:00
Martin Kroeker bd30120ba7 Merge pull request #3720 from FlyGoat/mips64
Make it work on general MIPS64 processors
2022-08-19 20:24:27 +02:00
Jiaxun Yang fae9368f14 Implement DYNAMIC_LIST for MIPS64
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
2022-08-12 13:13:31 +01:00
Jiaxun Yang a50b29c540 Provide a fallback MIPS64_GENERIC target
It is really dangerous to fallback to Loongson core on other
MIPS64 processors.

Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
2022-08-12 13:13:28 +01:00
Jiaxun Yang b633eb79f2 Use $at as temporary register for mips/loongson CPUCFG read
Some compilers (namely LLVM) are not happy with clobbering
registers in inline assembly.
Use $at as temporary register and explicitly use noat
hint.

Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
2022-08-07 13:22:32 +01:00
Martin Kroeker 19fefd100e Merge pull request #3703 from martin-frbg/omp_adaptive
Add env variable OMP_ADAPTIVE to control OMP threadpool behaviour
2022-08-03 15:38:39 +02:00
Jiaxun Yang 19d4f90c44 Use auvx to detect CPUCFG on mips/loongson
It's safer and easier than SIGILL.

Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
2022-07-31 19:41:59 +01:00
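Probing for CPUCFG by executing it and catching SIGILL requires installing and later restoring a process-wide signal handler. Reading the hardware-capability word from the kernel's auxiliary vector avoids executing the instruction at all. The sketch below uses glibc's `getauxval(AT_HWCAP)`; the bit value `SKETCH_HWCAP_CPUCFG` is a hypothetical placeholder, and real code must take the CPUCFG hwcap bit from the MIPS/Loongson kernel headers.

```c
#include <assert.h>
#include <sys/auxv.h>

/* HYPOTHETICAL bit value: a placeholder, not the real kernel constant. */
#define SKETCH_HWCAP_CPUCFG (1UL << 14)

/* Pure helper: test the CPUCFG bit in a given hwcap word. */
static int cpucfg_from_hwcap(unsigned long hwcap) {
    return (hwcap & SKETCH_HWCAP_CPUCFG) != 0;
}

/* Ask the kernel through the auxiliary vector. No instruction is
   executed, so no SIGILL handler has to be installed. */
static int has_cpucfg(void) {
    return cpucfg_from_hwcap(getauxval(AT_HWCAP));
}
```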
Martin Kroeker d0ba257de0 Merge pull request #3704 from XiWeiGu/loongarch64_dynamic_arch
LoongArch64: Add DYNAMIC_ARCH support
2022-07-28 20:31:20 +02:00
gxw fbfe1daf6e LoongArch64: Add DYNAMIC_ARCH support 2022-07-28 14:28:45 +08:00
Martin Kroeker 80cdfed7b2 Use OMP_ADAPTIVE setting to choose between static and dynamic OMP threadpool size 2022-07-27 23:43:20 +02:00
Martin Kroeker 08e3754b39 Add environment variable OMP_ADAPTIVE 2022-07-27 23:41:47 +02:00
Martin Kroeker 30473b6a9d add openblas_getaffinity() 2022-07-27 19:15:18 +02:00
Martin Kroeker daca01622b fix detection of Neoverse V1 and user-enforced selection of N2 in ARM64 DYNAMIC_ARCH (#3700)
* fix detection of Neoverse V1 and user-enforced selection of N2
2022-07-27 09:17:43 +02:00
Honglin Zhu d5ca477f42 Neoverse N2: DYNAMIC_ARCH 2022-07-12 00:50:45 +08:00
Martin Kroeker 69148ae795 Guard against sysconf returning zero processors 2022-07-06 17:22:18 +02:00
Martin Kroeker e9260f5451 Guard against system call returning zero processors 2022-07-06 17:21:10 +02:00
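`sysconf(_SC_NPROCESSORS_CONF)` can return -1 on error, and a reported count of zero would later turn into a zero-sized thread pool or a division by zero. A minimal sketch of such a guard, clamping the result to at least one processor; the function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <unistd.h>

/* Clamp a reported processor count so callers never see 0 or -1. */
static int sanitize_cpu_count(long reported) {
    return reported > 0 ? (int)reported : 1;
}

/* Query the configured processor count and apply the guard. */
static int get_num_procs(void) {
    return sanitize_cpu_count(sysconf(_SC_NPROCESSORS_CONF));
}
```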
Martin Kroeker 2c62096fce Expand cpu mapping for future Zen cpus and use feature-based fallback for unknown AMD family codes 2022-05-18 15:35:30 +02:00
Adam Niederer 69f2ac4ea2 Fix broken elif in dynamic.c
This fixes compilation in the following case:

$(MAKE) USE_OPENMP=1 USE_THREAD=1 NO_LAPACK=0 DYNAMIC_ARCH=1 \
DYNAMIC_LIST="HASWELL SKYLAKEX ATOM COOPERLAKE SAPPHIRERAPIDS ZEN"
2022-03-17 20:04:37 -04:00
Martin Kroeker 8d5a9c2f98 Merge pull request #3565 from jonaszhou1/develop
Support Zhaoxin/Centaur kh40000 as ZEN
2022-03-11 14:29:30 +01:00
Martin Kroeker bf4642eb7e Report USE_TLS if set 2022-03-10 16:19:29 +01:00
JonasZhou 2d0ad89b0d Support Zhaoxin/Centaur kh40000 as ZEN
Signed-off-by: JonasZhou <JonasZhou@zhaoxin.com>
2022-03-10 15:08:38 +08:00
Martin Kroeker fa3e9f25e6 Support AVX512-enabled Alder Lake 2022-02-07 00:00:56 +01:00
Martin Kroeker 7656aba00e Merge pull request #3493 from martin-frbg/casts+cleanup
WIP casts and cleanups
2022-02-06 23:55:06 +01:00
Martin Kroeker 7f0b11fbc1 Exclude some complex drivers when NO_LAPACK is set 2022-01-27 22:00:39 +01:00
Martin Kroeker b6b024232d Merge pull request #3508 from snadampal/v1_n2
OpenBLAS: aarch64: Add neoverse-v1/n2 architecture specifics
2022-01-09 14:50:26 +01:00
Sunita Nadampalli 19c8f615dc OpenBLAS: aarch64: Add neoverse-v1/n2 architecture specifics 2022-01-07 00:28:17 +00:00
Martin Kroeker b329e45288 Guard against omp_get_num_places returning zero 2022-01-01 00:46:23 +01:00
Martin Kroeker 07fe5b19a4 typecast function pointers 2021-12-21 12:31:54 +01:00
Martin Kroeker 6ed52576f8 Add feature-based fallback for unknown x86_64 cpus 2021-12-16 22:02:49 +01:00
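A feature-based fallback means that a CPU with an unrecognized family or model code still gets the most capable kernel set its instruction-set extensions allow, instead of a generic build. A minimal sketch of that idea; the `cpu_features` struct and the feature-to-core mapping are assumptions for illustration, not the exact rules in dynamic.c. In practice the flags would come from CPUID plus an OS check (XGETBV) that the wider registers are actually enabled.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical summary of detected instruction-set extensions. */
struct cpu_features {
    int avx;
    int avx2;
    int avx512f;
};

/* Pick the most capable kernel family the features permit,
   checking the widest extension first. */
static const char *fallback_core(struct cpu_features f) {
    if (f.avx512f) return "SKYLAKEX";
    if (f.avx2)    return "HASWELL";
    if (f.avx)     return "SANDYBRIDGE";
    return "NEHALEM";
}
```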
Martin Kroeker 7a7fbb11c3 define "unlikely" on non-cygwin too 2021-12-16 17:28:28 +01:00
Martin Kroeker b31349c22a Open up delayed (re)init to non-Cygwin OS as well 2021-12-16 16:58:12 +01:00
Martin Kroeker c8d05aa7a5 Move the threads overflow flag under the protection of the local blas lock (#3476)
* Move accesses to the overflow flag into the scope of the blas lock
2021-12-13 08:34:52 +01:00
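Moving every access to a shared flag inside the same lock that guards the related state means a read-then-update by one thread cannot interleave with another thread's update. A minimal sketch of the pattern with POSIX threads; the lock, the flag, and both functions are hypothetical stand-ins for the library's internals.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins for the library's lock and shared flag. */
static pthread_mutex_t blas_lock = PTHREAD_MUTEX_INITIALIZER;
static int threads_overflowed = 0;

/* Set the flag only while holding the lock. */
static void mark_overflow(void) {
    pthread_mutex_lock(&blas_lock);
    threads_overflowed = 1;
    pthread_mutex_unlock(&blas_lock);
}

/* Read and reset the flag as one critical section, so no other
   thread can set it between the read and the reset. */
static int test_and_clear_overflow(void) {
    pthread_mutex_lock(&blas_lock);
    int was = threads_overflowed;
    threads_overflowed = 0;
    pthread_mutex_unlock(&blas_lock);
    return was;
}
```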
Rafael Cardoso Fernandes Sousa 214fbcee15 Fix cmake for power 2021-12-09 08:28:17 -06:00
Martin Kroeker 4f057bffd6 Fix NULL pointer checks in blas_memory_alloc 2021-11-05 10:43:17 +01:00
Martin Kroeker 08f8bb66c0 Add CPUIDs for Alder Lake and other recent Intel cpus 2021-11-04 20:36:39 +01:00
Martin Kroeker efb16fafb0 Fix miscounting of threadpool size on Linux with OMP_PROC_BIND=TRUE (#3437)
* return OMP places (if available, or SC_NPROCESSORS_CONF) for maximum thread count when built with OpenMP
2021-11-04 12:11:16 +01:00
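The fix above takes the maximum thread count from the number of OpenMP places when that is available, and otherwise from the configured processor count; a later commit guards against `omp_get_num_places()` returning zero. A minimal sketch of that decision, assuming the places count is passed in as a parameter so the example builds without an OpenMP compiler; `max_thread_count` is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <unistd.h>

/* `places` stands in for omp_get_num_places(). A positive value wins;
   zero (no place list) falls back to sysconf, never returning < 1. */
static int max_thread_count(int places) {
    if (places > 0) return places;
    long n = sysconf(_SC_NPROCESSORS_CONF);
    return n > 0 ? (int)n : 1;
}
```

In an OpenMP build the call site would be `max_thread_count(omp_get_num_places())`.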
Marius Hillenbrand 77747bc536 cpuid_zarch/hwcaps: add documentation and dump hwcaps in init
Add pointers to the definition of the hardware capability flags in glibc
and describe how they relate to the levels CPU_Z13 and CPU_Z14 for
optimized kernels.

To aid identifying available hardware capabilities and in debugging
potential build issues, dump their value in dynamic_arch_init() when
OPENBLAS_VERBOSE is set to 2 or higher.

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2021-10-28 12:08:48 +02:00
Martin Kroeker 22a616bd8f Add model number for Tiger Lake H (mobile variant) 2021-10-27 22:17:58 +02:00
Marius Hillenbrand 44950ca173 s390x: use DYNAMIC_ARCH's cpu detection for compile-time choice
On s390x, the run-time detection for DYNAMIC_ARCH and the compile-time
choice in cpuid_zarch use different methods for identifying the
supported CPU features. To make cpuid_zarch future-proof and both easier
to maintain, switch cpuid_zarch to the same mechanism as DYNAMIC_ZARCH
(i.e., derive the supported CPU features from hwcap flags) and share
code between both (in a new header cpuid_zarch.h).

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2021-10-26 16:19:14 +02:00
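The two s390x commits derive the usable kernel level from hwcap flags: the vector facility is needed for the CPU_Z13 kernels, and the vector-enhancements facility 1 as well for the CPU_Z14 kernels. A minimal sketch of that mapping on a plain bitmask; the two constants mirror glibc's HWCAP_S390_VX and HWCAP_S390_VXE but are assumed values here, and real code on s390x should use the macros from `<sys/auxv.h>` and feed in `getauxval(AT_HWCAP)`.

```c
#include <assert.h>

/* Assumed bit values mirroring glibc's HWCAP_S390_VX / HWCAP_S390_VXE. */
#define SKETCH_HWCAP_VX  2048UL
#define SKETCH_HWCAP_VXE 8192UL

enum zarch_level { LEVEL_GENERIC, LEVEL_Z13, LEVEL_Z14 };

/* VX alone enables z13 kernels; VX plus VXE enables z14 kernels. */
static enum zarch_level level_from_hwcap(unsigned long hwcap) {
    int vx = (hwcap & SKETCH_HWCAP_VX) != 0;
    int vxe = (hwcap & SKETCH_HWCAP_VXE) != 0;
    if (vx && vxe) return LEVEL_Z14;
    if (vx) return LEVEL_Z13;
    return LEVEL_GENERIC;
}
```

Keeping the mapping in one function is the point of sharing code between the run-time and compile-time paths: both callers then agree on which kernels a given set of flags permits.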
Wangyang Guo 3dc6052c7e initial support for Sapphire Rapids platform 2021-10-12 01:30:40 -07:00
Rafael Cardoso Fernandes Sousa 0e8b4adf22 Remove unused commented code (#if directive) 2021-09-15 22:18:48 +00:00
Martin Kroeker fa8bf57768 Merge pull request #3380 from martin-frbg/structwarn
Remove extraneous qualifiers from struct definition
2021-09-15 07:19:09 +02:00
Martin Kroeker dd09f0173e Remove extraneous qualifiers from struct definition 2021-09-14 21:52:26 +02:00