Martin Kroeker
616fdea82a
Revert "Improve Windows threading performance scaling"
2023-06-28 09:45:17 +02:00
Mark Seminatore
d6991dd230
fix missing #endif
2023-06-24 15:43:32 -07:00
Mark Seminatore
7783a9af02
attempt to fix old mingw gcc issue
2023-06-24 14:35:11 -07:00
Mark Seminatore
8caabc5982
fix #4063 remove unused pool_lock
2023-06-23 19:45:16 -07:00
Mark Seminatore
d301649430
fix #4063 threading perf issues on Windows
2023-06-23 19:42:27 -07:00
Honglin Zhu
9e80a194d6
Fix dynamic_list build and gcc version check error
2023-05-21 19:52:58 +08:00
Honglin Zhu
0b83088887
spr dynamic arch support
2023-05-19 10:48:18 +08:00
Martin Kroeker
e5538a62cb
Add suggestions to NUM_THREADS/auxiliary buffer message
2023-05-04 22:56:39 +02:00
Martin Kroeker
437c0bf2b4
Merge pull request #3843 from Mousius/switch-ratio
...
Propagate SWITCH_RATIO to DYNAMIC_ARCH builds
2023-04-19 11:51:54 +02:00
Chris Sidebottom
32f2fafde7
Propagate SWITCH_RATIO to DYNAMIC_ARCH builds
...
Previously dynamic builds were either using the default SWITCH_RATIO
or one from the higher level architecture; this patch ensures the
dynamic builds can use this parameter as well.
2023-04-17 15:34:12 +01:00
Martin Kroeker
36fcb52094
Fix logic - we want real OR imaginary part of X to be nonzero here
2023-04-01 00:02:54 +02:00
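
As a rough illustration of the condition named in the commit subject above (not the project's actual code, which is C), the following Python sketch contrasts an OR test with an AND test on the parts of a complex value. The function names are hypothetical, and the idea that the earlier logic used AND is an assumption made only for contrast.

```python
def is_nonzero(x: complex) -> bool:
    """True when the real OR the imaginary part of x is nonzero."""
    return x.real != 0.0 or x.imag != 0.0


def both_parts_nonzero(x: complex) -> bool:
    """Stricter test, shown for contrast: both parts must be nonzero."""
    return x.real != 0.0 and x.imag != 0.0


# A purely imaginary value is nonzero, but the AND form would reject it.
sample = complex(0.0, 2.0)
print(is_nonzero(sample), both_parts_nonzero(sample))
```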
H. Vetinari
f2659516ef
remove unqualified ifdef's for NO_LAPACK(E)
2023-03-28 19:01:31 +11:00
Martin Kroeker
579bc86671
remove call to omp_set_num_threads
2023-03-21 20:58:56 +01:00
Martin Kroeker
e298d613fa
initialize status variable for openblas_set_num_threads
2023-03-08 23:43:15 +01:00
Martin Kroeker
05aa88268f
add status variable for openblas_set_num_threads
2023-03-08 23:41:57 +01:00
Martin Kroeker
e38ab079a0
Fix OpenMP thread counting returning places rather than cores
2023-03-08 19:17:33 +01:00
Martin Kroeker
d4868babbc
Fix typos
2022-12-29 23:07:55 +01:00
Martin Kroeker
18c99d3e63
Update dynamic_arm64.c
2022-12-25 13:31:38 +01:00
Martin Kroeker
186a310f92
Update dynamic_arm64.c
2022-12-25 12:22:48 +01:00
Martin Kroeker
da6e426b13
fix Cooperlake not selectable via environment variable
2022-11-03 18:13:35 +01:00
Honglin Zhu
4989e039a5
Define SBGEMM_ALIGN_K for DYNAMIC_ARCH build
2022-10-27 14:10:26 +08:00
Honglin Zhu
b00d5b9746
New sbgemm implementation for Neoverse N2
...
1. Use UZP instructions instead of gather-load and scatter-store instructions to get lower latency.
2. Pad k to a power of 4.
2022-10-26 15:09:41 +08:00
Martin Kroeker
ab6009b0b6
Merge pull request #3773 from staticfloat/sf/openblas_default_num_threads
...
Add `OPENBLAS_DEFAULT_NUM_THREADS`
2022-10-13 14:15:14 +02:00
Martin Kroeker
db50ab4a72
Add BUILD_vartype defines
2022-10-01 15:14:51 +02:00
Elliot Saba
d2ce93179f
Add `OPENBLAS_DEFAULT_NUM_THREADS`
...
This allows Julia to set a default number of threads (usually `1`) to be
used when no other thread counts are specified [0], to short-circuit the
default OpenBLAS thread initialization routine that spins up a different
number of threads than Julia would otherwise choose.
The reason to add a new environment variable is that we want to be able
to configure OpenBLAS to avoid performing its initial memory
allocation/thread startup, as that can consume significant amounts of
memory, but we still want to be sensitive to legacy codebases that set
things like `OMP_NUM_THREADS` or `GOTOBLAS_NUM_THREADS`. Creating a new
environment variable that is openblas-specific and is not already
publicly used to control the overall number of threads of programs like
Julia seems to be the best way forward.
[0] https://github.com/JuliaLang/julia/pull/46844
2022-09-30 01:21:44 +00:00
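
The lookup behaviour described in the message above can be sketched roughly as follows. This is a Python model for illustration only, not the library's C implementation: the function name `resolve_num_threads`, the order in which the legacy variables are consulted, and the fallback to the detected core count are all assumptions. Only the variable names come from the commit message.

```python
import os

# Legacy variables named in the commit message; the order between them
# is an assumption made for this sketch.
EXPLICIT_VARS = ("GOTOBLAS_NUM_THREADS", "OMP_NUM_THREADS")
# The new variable, consulted only when no explicit count is given.
DEFAULT_VAR = "OPENBLAS_DEFAULT_NUM_THREADS"


def _positive_int(value):
    """Parse value as a positive integer, or return None if it is not one."""
    try:
        n = int(value)
    except (TypeError, ValueError):
        return None
    return n if n > 0 else None


def resolve_num_threads(env=None, detected_cores=None):
    """Hypothetical helper: pick a thread count from the environment."""
    env = os.environ if env is None else env
    for name in EXPLICIT_VARS + (DEFAULT_VAR,):
        n = _positive_int(env.get(name))
        if n is not None:
            return n
    # Nothing usable was set: fall back to the core count (assumed behaviour).
    cores = detected_cores if detected_cores is not None else os.cpu_count()
    return cores if cores and cores > 0 else 1
```

With only `OPENBLAS_DEFAULT_NUM_THREADS=1` set, this sketch returns 1; if `OMP_NUM_THREADS=4` is also set, the explicit value wins, which matches the intent of staying sensitive to legacy settings.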
Kai T. Ohlhus
84453b924f
Support CONSISTENT_FPCSR on AARCH64
2022-09-22 00:20:40 +09:00
Martin Kroeker
9402df5604
Fix missing external declaration
2022-09-14 21:44:34 +02:00
Martin Kroeker
bd30120ba7
Merge pull request #3720 from FlyGoat/mips64
...
Make it work on general MIPS64 processors
2022-08-19 20:24:27 +02:00
Jiaxun Yang
fae9368f14
Implement DYNAMIC_LIST for MIPS64
...
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
2022-08-12 13:13:31 +01:00
Jiaxun Yang
a50b29c540
Provide a fallback MIPS64_GENERIC target
...
It is really dangerous to fall back to a Loongson core on other
MIPS64 processors.
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
2022-08-12 13:13:28 +01:00
Jiaxun Yang
b633eb79f2
Use $at as temporary register for mips/loongson CPUCFG read
...
Some compilers (namely LLVM) are not happy with clobbering
registers in inline assembly.
Use $at as the temporary register and explicitly use the noat
hint.
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
2022-08-07 13:22:32 +01:00
Martin Kroeker
19fefd100e
Merge pull request #3703 from martin-frbg/omp_adaptive
...
Add env variable OMP_ADAPTIVE to control OMP threadpool behaviour
2022-08-03 15:38:39 +02:00
Jiaxun Yang
19d4f90c44
Use auxv to detect CPUCFG on mips/loongson
...
It's safer and easier than SIGILL.
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
2022-07-31 19:41:59 +01:00
Martin Kroeker
d0ba257de0
Merge pull request #3704 from XiWeiGu/loongarch64_dynamic_arch
...
LoongArch64: Add DYNAMIC_ARCH support
2022-07-28 20:31:20 +02:00
gxw
fbfe1daf6e
LoongArch64: Add DYNAMIC_ARCH support
2022-07-28 14:28:45 +08:00
Martin Kroeker
80cdfed7b2
Use OMP_ADAPTIVE setting to choose between static and dynamic OMP threadpool size
2022-07-27 23:43:20 +02:00
Martin Kroeker
08e3754b39
Add environment variable OMP_ADAPTIVE
2022-07-27 23:41:47 +02:00
Martin Kroeker
30473b6a9d
add openblas_getaffinity()
2022-07-27 19:15:18 +02:00
Martin Kroeker
daca01622b
fix detection of Neoverse V1 and user-enforced selection of N2 in ARM64 DYNAMIC_ARCH (#3700)
...
* fix detection of Neoverse V1 and user-enforced selection of N2
2022-07-27 09:17:43 +02:00
Honglin Zhu
d5ca477f42
Neoverse N2: DYNAMIC_ARCH
2022-07-12 00:50:45 +08:00
Martin Kroeker
69148ae795
Guard against sysconf returning zero processors
2022-07-06 17:22:18 +02:00
Martin Kroeker
e9260f5451
Guard against system call returning zero processors
2022-07-06 17:21:10 +02:00
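
The kind of guard the two commits above describe can be sketched like this. It is a Python model under stated assumptions, not the actual C change: the helper name `processor_count`, the injectable `query` parameter, and the choice of 1 as the fallback value are all hypothetical.

```python
import os


def processor_count(query=None):
    """Return a processor count that is never below 1.

    query is a hypothetical hook returning the raw count; by default it
    asks sysconf for the number of online processors.
    """
    if query is None:
        query = lambda: os.sysconf("SC_NPROCESSORS_ONLN")
    try:
        n = query()
    except (ValueError, OSError, AttributeError):
        # sysconf may be unavailable or the name unsupported on this platform.
        n = 0
    # Treat zero or negative results as "unknown" and assume one processor.
    return n if n > 0 else 1
```

Clamping at the point of the query keeps later code, such as buffer sizing or thread-pool setup, from ever seeing a count of zero.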
Martin Kroeker
2c62096fce
Expand cpu mapping for future Zen cpus and use feature-based fallback for unknown AMD family codes
2022-05-18 15:35:30 +02:00
Adam Niederer
69f2ac4ea2
Fix broken elif in dynamic.c
...
This fixes compilation in the following case:
$(MAKE) USE_OPENMP=1 USE_THREAD=1 NO_LAPACK=0 DYNAMIC_ARCH=1 \
DYNAMIC_LIST="HASWELL SKYLAKEX ATOM COOPERLAKE SAPPHIRERAPIDS ZEN"
2022-03-17 20:04:37 -04:00
Martin Kroeker
8d5a9c2f98
Merge pull request #3565 from jonaszhou1/develop
...
Support Zhaoxin/Centaur kh40000 as ZEN
2022-03-11 14:29:30 +01:00
Martin Kroeker
bf4642eb7e
Report USE_TLS if set
2022-03-10 16:19:29 +01:00
JonasZhou
2d0ad89b0d
Support Zhaoxin/Centaur kh40000 as ZEN
...
Signed-off-by: JonasZhou <JonasZhou@zhaoxin.com>
2022-03-10 15:08:38 +08:00
Martin Kroeker
fa3e9f25e6
Support AVX512-enabled Alder Lake
2022-02-07 00:00:56 +01:00
Martin Kroeker
7656aba00e
Merge pull request #3493 from martin-frbg/casts+cleanup
...
WIP casts and cleanups
2022-02-06 23:55:06 +01:00
Martin Kroeker
7f0b11fbc1
Exclude some complex drivers when NO_LAPACK is set
2022-01-27 22:00:39 +01:00