Chip-Kerchner
c60f9d9c08
Add missing CPU_POWER5.
2023-10-06 09:49:17 -05:00
Chip Kerchner
3cc72a3797
Only include cpu_id and cpu_supports in AIX and fix parameter types.
2023-10-04 09:54:37 -05:00
Chip-Kerchner
09212f84bf
Fix default case for cpu_is.
2023-10-03 12:23:21 -05:00
Chip-Kerchner
2d0b233425
Fix missing parens.
2023-10-03 10:26:14 -05:00
Chip-Kerchner
a8c90eb3ed
Added cpu_is
2023-10-03 10:24:04 -05:00
Chip-Kerchner
b677d0d5fd
Adding missing endif
2023-10-02 13:09:12 -05:00
Chip-Kerchner
e5dc376912
Remove duplicate defines.
2023-10-02 12:48:47 -05:00
Chip-Kerchner
10210748de
Revert PGI changes.
2023-10-02 12:44:07 -05:00
Chip-Kerchner
a922a07e61
Cleanup white spaces.
2023-10-02 12:24:30 -05:00
Chip-Kerchner
12130ee961
Remove tab.
2023-10-02 12:19:22 -05:00
Chip-Kerchner
eb738d9929
Minor changes.
2023-10-02 12:14:46 -05:00
Chip-Kerchner
48da98b2a7
Merge remote-tracking branch 'origin/develop' into XLC-AIX
2023-10-02 12:01:33 -05:00
Chip-Kerchner
3b1150fcee
Fix CPU identification to work on AIX.
2023-10-02 12:00:48 -05:00
Martin Kroeker
90f890ee67
fix improper function prototypes (empty parentheses) (USE_TLS branch)
2023-09-30 23:12:36 +02:00
Martin Kroeker
cf2174fb69
fix improper function prototypes (empty parentheses)
2023-09-30 17:04:39 +02:00
Martin Kroeker
c6b1d8e7a3
fix improper function prototypes (empty parentheses)
2023-09-30 12:52:06 +02:00
Martin Kroeker
7e939fb831
Fix handling of additional buffer structures in case of overflow
2023-09-19 23:33:39 +02:00
Tiziano Müller
6a611db560
memory: show correct number of max threads
2023-09-10 08:44:07 +02:00
Martin Kroeker
c2f4bdbbb4
Merge pull request #4163 from martin-frbg/issue4017
Rework OpenMP thread count limit handling
2023-07-31 17:58:51 +02:00
Martin Kroeker
9ff84dc3f2
remove unused status variable
2023-07-26 10:02:44 +02:00
Martin Kroeker
3326b924b3
remove status variable blas_num_threads_set; initialize openmp thread maximum on startup
2023-07-26 00:31:24 +02:00
Chris Sidebottom
f971ef55f2
Add ARMV8SVE to AArch64 Dynamic Dispatch
In order to enable support for future cores which have similar tunings
(in this case I'm doing this for the Arm(R) Neoverse(TM) V2 core), this
generically detects SVE support and enables it. This should better manage
the size and complexity of dynamic dispatch rather than just copy-pasting
the same parameters.
To make `ARMV8SVE` more representative of the common 128-bit SVE case,
I've split it and similar parameters from A64FX which has the wider
512-bit SVE.
2023-07-25 18:35:15 +01:00
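The generic SVE detection described in the commit above could look roughly like the following. This is a minimal sketch and not the code from the commit: `select_arm64_target`, `read_hwcap` and `SKETCH_HWCAP_SVE` are hypothetical names, and the sketch assumes Linux on AArch64, where bit 22 of `AT_HWCAP` reports SVE. The fallback target name is passed in by the caller, since the choice of fallback (for example a Neoverse N1 parameter set when SVE is unavailable) is a policy decision.

```c
#if defined(__linux__)
#include <sys/auxv.h>   /* getauxval, AT_HWCAP */
#endif

/* Assumption: on AArch64 Linux, bit 22 of AT_HWCAP signals SVE
   (HWCAP_SVE in <asm/hwcap.h>). Named locally to keep the sketch portable. */
#define SKETCH_HWCAP_SVE (1UL << 22)

/* Hypothetical selector: pick the generic SVE target when the capability
   bit is set, otherwise use the caller-supplied non-SVE fallback. */
const char *select_arm64_target(unsigned long hwcap, const char *fallback)
{
    return (hwcap & SKETCH_HWCAP_SVE) ? "ARMV8SVE" : fallback;
}

/* Read the capability word from the auxiliary vector. Returns 0 on
   platforms where this sketch does not apply, which selects the fallback. */
unsigned long read_hwcap(void)
{
#if defined(__linux__) && defined(__aarch64__)
    return getauxval(AT_HWCAP);
#else
    return 0UL;
#endif
}
```

A caller would combine the two as `select_arm64_target(read_hwcap(), "NEOVERSEN1")`. Keying on the capability bit rather than on a list of core IDs is what lets a future core reuse the existing parameters without a new table entry.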
Martin Kroeker
3bdcf3259d
Merge branch 'xianyi:develop' into issue4101
2023-07-20 08:23:20 +02:00
Martin Kroeker
b34f19a365
Ensure that a premature call to set_num_threads will not overwrite unrelated memory
2023-07-19 22:19:22 +02:00
Martin Kroeker
66904f8148
Ensure that a premature call will not overwrite unrelated memory
2023-07-19 22:14:34 +02:00
Martin Kroeker
5c58994eb2
Add fallback warning
2023-07-19 18:27:41 +02:00
Martin Kroeker
ca7199f249
Treat newer Neoverse as N1 if SVE unavailable (may be disabled in container/cloud env)
2023-07-19 14:48:42 +02:00
Martin Kroeker
616fdea82a
Revert "Improve Windows threading performance scaling"
2023-06-28 09:45:17 +02:00
Mark Seminatore
d6991dd230
fix missing #endif
2023-06-24 15:43:32 -07:00
Mark Seminatore
7783a9af02
attempt to fix old mingw gcc issue
2023-06-24 14:35:11 -07:00
Mark Seminatore
8caabc5982
fix #4063 remove unused pool_lock
2023-06-23 19:45:16 -07:00
Mark Seminatore
d301649430
fix #4063 threading perf issues on Windows
2023-06-23 19:42:27 -07:00
Honglin Zhu
9e80a194d6
Fix dynamic_list build and gcc version check error
2023-05-21 19:52:58 +08:00
Honglin Zhu
0b83088887
spr dynamic arch support
2023-05-19 10:48:18 +08:00
Martin Kroeker
e5538a62cb
Add suggestions to NUM_THREADS/auxiliary buffer message
2023-05-04 22:56:39 +02:00
Martin Kroeker
579bc86671
remove call to omp_set_num_threads
2023-03-21 20:58:56 +01:00
Martin Kroeker
e298d613fa
initialize status variable for openblas_set_num_threads
2023-03-08 23:43:15 +01:00
Martin Kroeker
05aa88268f
add status variable for openblas_set_num_threads
2023-03-08 23:41:57 +01:00
Martin Kroeker
e38ab079a0
Fix OpenMP thread counting returning places rather than cores
2023-03-08 19:17:33 +01:00
Martin Kroeker
d4868babbc
Fix typos
2022-12-29 23:07:55 +01:00
Martin Kroeker
18c99d3e63
Update dynamic_arm64.c
2022-12-25 13:31:38 +01:00
Martin Kroeker
186a310f92
Update dynamic_arm64.c
2022-12-25 12:22:48 +01:00
Martin Kroeker
da6e426b13
fix Cooperlake not selectable via environment variable
2022-11-03 18:13:35 +01:00
Martin Kroeker
ab6009b0b6
Merge pull request #3773 from staticfloat/sf/openblas_default_num_threads
Add `OPENBLAS_DEFAULT_NUM_THREADS`
2022-10-13 14:15:14 +02:00
Martin Kroeker
db50ab4a72
Add BUILD_vartype defines
2022-10-01 15:14:51 +02:00
Elliot Saba
d2ce93179f
Add `OPENBLAS_DEFAULT_NUM_THREADS`
This allows Julia to set a default number of threads (usually `1`) to be
used when no other thread counts are specified [0], to short-circuit the
default OpenBLAS thread initialization routine that spins up a different
number of threads than Julia would otherwise choose.
The reason to add a new environment variable is that we want to be able
to configure OpenBLAS to avoid performing its initial memory
allocation/thread startup, as that can consume significant amounts of
memory, but we still want to be sensitive to legacy codebases that set
things like `OMP_NUM_THREADS` or `GOTOBLAS_NUM_THREADS`. Creating a new
environment variable that is openblas-specific and is not already
publicly used to control the overall number of threads of programs like
Julia seems to be the best way forward.
[0] https://github.com/JuliaLang/julia/pull/46844
2022-09-30 01:21:44 +00:00
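The idea in the commit above is that `OPENBLAS_DEFAULT_NUM_THREADS` only applies when no other thread count has been specified. A minimal sketch of that lookup follows. The helper names (`parse_positive`, `choose_thread_count`, `thread_count_from_env`) are hypothetical, and the exact order in which the explicit variables are consulted is an assumption for illustration, not taken from the OpenBLAS sources.

```c
#include <stdlib.h>

/* Parse a positive integer from an environment-style string.
   Returns 0 when the string is missing, empty, or not a positive number. */
int parse_positive(const char *s)
{
    char *end = NULL;
    long v;

    if (s == NULL || *s == '\0') return 0;
    v = strtol(s, &end, 10);
    if (end == s || *end != '\0' || v <= 0 || v > 65536) return 0;
    return (int)v;
}

/* Hypothetical helper: explicit settings win, the default setting is
   consulted only when none of them is usable, and the detected CPU
   count is the last resort. */
int choose_thread_count(const char *explicit_setting,
                        const char *omp_setting,
                        const char *default_setting,
                        int detected_cpus)
{
    int n;

    if ((n = parse_positive(explicit_setting)) > 0) return n;
    if ((n = parse_positive(omp_setting)) > 0) return n;
    if ((n = parse_positive(default_setting)) > 0) return n;
    return detected_cpus > 0 ? detected_cpus : 1;
}

/* Convenience wrapper reading the real environment (assumed lookup order). */
int thread_count_from_env(int detected_cpus)
{
    return choose_thread_count(getenv("OPENBLAS_NUM_THREADS"),
                               getenv("OMP_NUM_THREADS"),
                               getenv("OPENBLAS_DEFAULT_NUM_THREADS"),
                               detected_cpus);
}
```

With this ordering, a host program can export `OPENBLAS_DEFAULT_NUM_THREADS=1` to keep startup cheap, while a user who sets `OMP_NUM_THREADS` still gets the count they asked for.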
Kai T. Ohlhus
84453b924f
Support CONSISTENT_FPCSR on AARCH64
2022-09-22 00:20:40 +09:00
Martin Kroeker
9402df5604
Fix missing external declaration
2022-09-14 21:44:34 +02:00
Martin Kroeker
bd30120ba7
Merge pull request #3720 from FlyGoat/mips64
Make it work on general MIPS64 processors
2022-08-19 20:24:27 +02:00
Jiaxun Yang
fae9368f14
Implement DYNAMIC_LIST for MIPS64
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
2022-08-12 13:13:31 +01:00
Jiaxun Yang
a50b29c540
Provide a fallback MIPS64_GENERIC target
It is really dangerous to fall back to a Loongson core on other
MIPS64 processors.
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
2022-08-12 13:13:28 +01:00
Jiaxun Yang
b633eb79f2
Use $at as temporary register for mips/loongson CPUCFG read
Some compilers (namely LLVM) are not happy with clobbering
registers in inline assembly.
Use $at as temporary register and explicitly use noat
hint.
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
2022-08-07 13:22:32 +01:00
Martin Kroeker
19fefd100e
Merge pull request #3703 from martin-frbg/omp_adaptive
Add env variable OMP_ADAPTIVE to control OMP threadpool behaviour
2022-08-03 15:38:39 +02:00
Jiaxun Yang
19d4f90c44
Use auxv to detect CPUCFG on mips/loongson
It's safer and easier than probing via a SIGILL handler.
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
2022-07-31 19:41:59 +01:00
Martin Kroeker
d0ba257de0
Merge pull request #3704 from XiWeiGu/loongarch64_dynamic_arch
LoongArch64: Add DYNAMIC_ARCH support
2022-07-28 20:31:20 +02:00
gxw
fbfe1daf6e
LoongArch64: Add DYNAMIC_ARCH support
2022-07-28 14:28:45 +08:00
Martin Kroeker
80cdfed7b2
Use OMP_ADAPTIVE setting to choose between static and dynamic OMP threadpool size
2022-07-27 23:43:20 +02:00
Martin Kroeker
08e3754b39
Add environment variable OMP_ADAPTIVE
2022-07-27 23:41:47 +02:00
Martin Kroeker
30473b6a9d
add openblas_getaffinity()
2022-07-27 19:15:18 +02:00
Martin Kroeker
daca01622b
fix detection of Neoverse V1 and user-enforced selection of N2 in ARM64 DYNAMIC_ARCH (#3700)
* fix detection of Neoverse V1 and user-enforced selection of N2
2022-07-27 09:17:43 +02:00
Honglin Zhu
d5ca477f42
Neoverse N2: DYNAMIC_ARCH
2022-07-12 00:50:45 +08:00
Martin Kroeker
69148ae795
Guard against sysconf returning zero processors
2022-07-06 17:22:18 +02:00
Martin Kroeker
e9260f5451
Guard against system call returning zero processors
2022-07-06 17:21:10 +02:00
Martin Kroeker
2c62096fce
Expand cpu mapping for future Zen cpus and use feature-based fallback for unknown AMD family codes
2022-05-18 15:35:30 +02:00
Adam Niederer
69f2ac4ea2
Fix broken elif in dynamic.c
This fixes compilation in the following case:
$(MAKE) USE_OPENMP=1 USE_THREAD=1 NO_LAPACK=0 DYNAMIC_ARCH=1 \
DYNAMIC_LIST="HASWELL SKYLAKEX ATOM COOPERLAKE SAPPHIRERAPIDS ZEN"
2022-03-17 20:04:37 -04:00
Martin Kroeker
8d5a9c2f98
Merge pull request #3565 from jonaszhou1/develop
Support Zhaoxin/Centaur kh40000 as ZEN
2022-03-11 14:29:30 +01:00
Martin Kroeker
bf4642eb7e
Report USE_TLS if set
2022-03-10 16:19:29 +01:00
JonasZhou
2d0ad89b0d
Support Zhaoxin/Centaur kh40000 as ZEN
Signed-off-by: JonasZhou <JonasZhou@zhaoxin.com>
2022-03-10 15:08:38 +08:00
Martin Kroeker
fa3e9f25e6
Support AVX512-enabled Alder Lake
2022-02-07 00:00:56 +01:00
Martin Kroeker
7656aba00e
Merge pull request #3493 from martin-frbg/casts+cleanup
WIP casts and cleanups
2022-02-06 23:55:06 +01:00
Martin Kroeker
b6b024232d
Merge pull request #3508 from snadampal/v1_n2
OpenBLAS: aarch64: Add neoverse-v1/n2 architecture specifics
2022-01-09 14:50:26 +01:00
Sunita Nadampalli
19c8f615dc
OpenBLAS: aarch64: Add neoverse-v1/n2 architecture specifics
2022-01-07 00:28:17 +00:00
Martin Kroeker
b329e45288
Guard against omp_get_num_places returning zero
2022-01-01 00:46:23 +01:00
Martin Kroeker
07fe5b19a4
typecast function pointers
2021-12-21 12:31:54 +01:00
Martin Kroeker
6ed52576f8
Add feature-based fallback for unknown x86_64 cpus
2021-12-16 22:02:49 +01:00
Martin Kroeker
7a7fbb11c3
define "unlikely" on non-cygwin too
2021-12-16 17:28:28 +01:00
Martin Kroeker
b31349c22a
Open up delayed (re)init to non-Cygwin OS as well
2021-12-16 16:58:12 +01:00
Martin Kroeker
c8d05aa7a5
Move the threads overflow flag under the protection of the local blas lock (#3476)
* Move accesses to the overflow flag into the scope of the blas lock
2021-12-13 08:34:52 +01:00
Rafael Cardoso Fernandes Sousa
214fbcee15
Fix cmake for power
2021-12-09 08:28:17 -06:00
Martin Kroeker
4f057bffd6
Fix NULL pointer checks in blas_memory_alloc
2021-11-05 10:43:17 +01:00
Martin Kroeker
08f8bb66c0
Add CPUIDs for Alder Lake and other recent Intel cpus
2021-11-04 20:36:39 +01:00
Martin Kroeker
efb16fafb0
Fix miscounting of threadpool size on Linux with OMP_PROC_BIND=TRUE (#3437)
* return OMP places (if available, or _SC_NPROCESSORS_CONF) for maximum thread count when built with OpenMP
2021-11-04 12:11:16 +01:00
Marius Hillenbrand
77747bc536
cpuid_zarch/hwcaps: add documentation and dump hwcaps in init
Add pointers to the definition of the hardware capability flags in glibc
and describe how they relate to the levels CPU_Z13 and CPU_Z14 for
optimized kernels.
To aid identifying available hardware capabilities and in debugging
potential build issues, dump their value in dynamic_arch_init() when
OPENBLAS_VERBOSE is set to 2 or higher.
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2021-10-28 12:08:48 +02:00
Martin Kroeker
22a616bd8f
Add model number for Tiger Lake H (mobile variant)
2021-10-27 22:17:58 +02:00
Marius Hillenbrand
44950ca173
s390x: use DYNAMIC_ARCH's cpu detection for compile-time choice
On s390x, the run-time detection for DYNAMIC_ARCH and the compile-time
choice in cpuid_zarch use different methods for identifying the
supported CPU features. To make cpuid_zarch future-proof and both paths easier
to maintain, switch cpuid_zarch to the same mechanism as DYNAMIC_ZARCH
(i.e., derive the supported CPU features from hwcap flags) and share
code between both (in a new header cpuid_zarch.h).
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2021-10-26 16:19:14 +02:00
Wangyang Guo
3dc6052c7e
initial support for Sapphire Rapids platform
2021-10-12 01:30:40 -07:00
Rafael Cardoso Fernandes Sousa
0e8b4adf22
Remove unused commented code (#if directive)
2021-09-15 22:18:48 +00:00
Martin Kroeker
dd09f0173e
Remove extraneous qualifiers from struct definition
2021-09-14 21:52:26 +02:00
Wangyang Guo
045ed5c91d
sbgemm: fix build error when BFLOAT16 is disabled
2021-09-07 23:37:08 +08:00
Wangyang Guo
8356a604f0
sbgemm: cooperlake: tuning for block params
2021-09-07 21:30:46 +08:00
Martin Kroeker
cd10d1c03b
Fix typo
2021-08-30 14:38:28 +02:00
Martin Kroeker
2db1a99aca
Clean up debug messages
2021-08-30 14:21:25 +02:00
Martin Kroeker
89fc5b8f4f
Fix unmap logic
2021-08-29 19:50:24 +02:00
Martin Kroeker
7fd12a5e69
Add likely() hints for gcc
2021-08-29 13:54:51 +02:00
Martin Kroeker
2ba9a567aa
Fix typo
2021-08-28 17:14:59 +02:00
Martin Kroeker
b4b952eece
Add auxiliary tracking space for thread buffer frees too
2021-08-28 17:03:53 +02:00
Martin Kroeker
7d1becc575
Allocate an auxiliary struct when running out of preconfigured threads
2021-08-28 14:18:36 +02:00
Martin Kroeker
898212efcd
Actually add the message to the TLS section
2021-08-02 14:50:14 +02:00
Martin Kroeker
210a1584c5
Rebase source and edit TLS version of the message as well
2021-08-02 14:19:16 +02:00
Martin Kroeker
f2a7a67f5a
Improve the "tried to allocate too many buffers" error message
2021-07-31 17:23:40 +02:00