Commit Graph

679 Commits

Author SHA1 Message Date
Chip-Kerchner
b677d0d5fd Adding missing endif 2023-10-02 13:09:12 -05:00
Chip-Kerchner
e5dc376912 Remove duplicate defines. 2023-10-02 12:48:47 -05:00
Chip-Kerchner
10210748de Revert PGI changes. 2023-10-02 12:44:07 -05:00
Chip-Kerchner
a922a07e61 Cleanup white spaces. 2023-10-02 12:24:30 -05:00
Chip-Kerchner
12130ee961 Remove tab. 2023-10-02 12:19:22 -05:00
Chip-Kerchner
eb738d9929 Minor changes. 2023-10-02 12:14:46 -05:00
Chip-Kerchner
48da98b2a7 Merge remote-tracking branch 'origin/develop' into XLC-AIX 2023-10-02 12:01:33 -05:00
Chip-Kerchner
3b1150fcee Fix CPU identification to work on AIX. 2023-10-02 12:00:48 -05:00
Martin Kroeker
90f890ee67 fix improper function prototypes (empty parentheses) (USE_TLS branch) 2023-09-30 23:12:36 +02:00
Martin Kroeker
cf2174fb69 fix improper function prototypes (empty parentheses) 2023-09-30 17:04:39 +02:00
Martin Kroeker
c6b1d8e7a3 fix improper function prototypes (empty parentheses) 2023-09-30 12:52:06 +02:00
Martin Kroeker
c4bd4a2e5d fix improper function prototypes (empty parentheses) 2023-09-30 12:49:24 +02:00
Martin Kroeker
7e939fb831 Fix handling of additional buffer structures in case of overflow 2023-09-19 23:33:39 +02:00
Tiziano Müller
6a611db560 memory: show correct number of max threads 2023-09-10 08:44:07 +02:00
Martin Kroeker
c2f4bdbbb4 Merge pull request #4163 from martin-frbg/issue4017
Rework OpenMP thread count limit handling
2023-07-31 17:58:51 +02:00
Martin Kroeker
9ff84dc3f2 remove unused status variable 2023-07-26 10:02:44 +02:00
Martin Kroeker
3326b924b3 remove status variable blas_num_threads_set; initialize openmp thread maximum on startup 2023-07-26 00:31:24 +02:00
Chris Sidebottom
f971ef55f2 Add ARMV8SVE to AArch64 Dynamic Dispatch
In order to enable support for future cores which have similar tunings
(in this case I'm doing this for the Arm(R) Neoverse(TM) V2 core), this
generically detects SVE support and enables it. This should better
manage the size and complexity of dynamic dispatch rather than just
copy-pasting the same parameters.

To make `ARMV8SVE` more representative of the common 128-bit SVE case,
I've split it and similar parameters from A64FX, which has the wider
512-bit SVE.
2023-07-25 18:35:15 +01:00
Martin Kroeker
3bdcf3259d Merge branch 'xianyi:develop' into issue4101 2023-07-20 08:23:20 +02:00
Martin Kroeker
b34f19a365 Ensure that a premature call to set_num_threads will not overwrite unrelated memory 2023-07-19 22:19:22 +02:00
Martin Kroeker
66904f8148 Ensure that a premature call will not overwrite unrelated memory 2023-07-19 22:14:34 +02:00
Martin Kroeker
5c58994eb2 Add fallback warning 2023-07-19 18:27:41 +02:00
Martin Kroeker
ca7199f249 Treat newer Neoverse as N1 if SVE unavailable (may be disabled in container/cloud env) 2023-07-19 14:48:42 +02:00
Martin Kroeker
616fdea82a Revert "Improve Windows threading performance scaling" 2023-06-28 09:45:17 +02:00
Mark Seminatore
d6991dd230 fix missing #endif 2023-06-24 15:43:32 -07:00
Mark Seminatore
7783a9af02 attempt to fix old mingw gcc issue 2023-06-24 14:35:11 -07:00
Mark Seminatore
8caabc5982 fix #4063 remove unused pool_lock 2023-06-23 19:45:16 -07:00
Mark Seminatore
d301649430 fix #4063 threading perf issues on Windows 2023-06-23 19:42:27 -07:00
Honglin Zhu
9e80a194d6 Fix dynamic_list build and gcc version check error 2023-05-21 19:52:58 +08:00
Honglin Zhu
0b83088887 spr dynamic arch support 2023-05-19 10:48:18 +08:00
Martin Kroeker
e5538a62cb Add suggestions to NUM_THREADS/auxiliary buffer message 2023-05-04 22:56:39 +02:00
Martin Kroeker
437c0bf2b4 Merge pull request #3843 from Mousius/switch-ratio
Propagate SWITCH_RATIO to DYNAMIC_ARCH builds
2023-04-19 11:51:54 +02:00
Chris Sidebottom
32f2fafde7 Propagate SWITCH_RATIO to DYNAMIC_ARCH builds
Previously dynamic builds were either using the default SWITCH_RATIO
or one from the higher level architecture; this patch ensures the
dynamic builds can use this parameter as well.
2023-04-17 15:34:12 +01:00
Martin Kroeker
36fcb52094 Fix logic - we want real OR imaginary part of X to be nonzero here 2023-04-01 00:02:54 +02:00
H. Vetinari
f2659516ef remove unqualified ifdef's for NO_LAPACK(E) 2023-03-28 19:01:31 +11:00
Martin Kroeker
579bc86671 remove call to omp_set_num_threads 2023-03-21 20:58:56 +01:00
Martin Kroeker
e298d613fa initialize status variable for openblas_set_num_threads 2023-03-08 23:43:15 +01:00
Martin Kroeker
05aa88268f add status variable for openblas_set_num_threads 2023-03-08 23:41:57 +01:00
Martin Kroeker
e38ab079a0 Fix OpenMP thread counting returning places rather than cores 2023-03-08 19:17:33 +01:00
Martin Kroeker
d4868babbc Fix typos 2022-12-29 23:07:55 +01:00
Martin Kroeker
18c99d3e63 Update dynamic_arm64.c 2022-12-25 13:31:38 +01:00
Martin Kroeker
186a310f92 Update dynamic_arm64.c 2022-12-25 12:22:48 +01:00
Martin Kroeker
da6e426b13 fix Cooperlake not selectable via environment variable 2022-11-03 18:13:35 +01:00
Honglin Zhu
4989e039a5 Define SBGEMM_ALIGN_K for DYNAMIC_ARCH build 2022-10-27 14:10:26 +08:00
Honglin Zhu
b00d5b9746 New sbgemm implementation for Neoverse N2
1. Use UZP instructions rather than gather load and scatter store instructions to get lower latency.
2. Pad k to a power of 4.
2022-10-26 15:09:41 +08:00
Martin Kroeker
ab6009b0b6 Merge pull request #3773 from staticfloat/sf/openblas_default_num_threads
Add `OPENBLAS_DEFAULT_NUM_THREADS`
2022-10-13 14:15:14 +02:00
Martin Kroeker
db50ab4a72 Add BUILD_vartype defines 2022-10-01 15:14:51 +02:00
Elliot Saba
d2ce93179f Add OPENBLAS_DEFAULT_NUM_THREADS
This allows Julia to set a default number of threads (usually `1`) to be
used when no other thread counts are specified [0], to short-circuit the
default OpenBLAS thread initialization routine that spins up a different
number of threads than Julia would otherwise choose.

The reason to add a new environment variable is that we want to be able
to configure OpenBLAS to avoid performing its initial memory
allocation/thread startup, as that can consume significant amounts of
memory, but we still want to be sensitive to legacy codebases that set
things like `OMP_NUM_THREADS` or `GOTOBLAS_NUM_THREADS`.  Creating a new
environment variable that is openblas-specific and is not already
publicly used to control the overall number of threads of programs like
Julia seems to be the best way forward.

[0] https://github.com/JuliaLang/julia/pull/46844
2022-09-30 01:21:44 +00:00
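The commit message above describes a fallback: `OPENBLAS_DEFAULT_NUM_THREADS` should apply only when no other thread count is specified, so legacy settings such as `OMP_NUM_THREADS` still take effect. A minimal C sketch of that precedence follows. It is not OpenBLAS's actual initialization code. The helper names `env_threads` and `resolve_num_threads`, the exact list of "explicit" variables, and their relative order are all assumptions made for illustration.

```c
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper (not an OpenBLAS function): read a thread count from
   the environment variable `name`. Returns 0 when the variable is unset,
   empty, non-numeric, or not a positive int. */
static int env_threads(const char *name) {
    const char *s = getenv(name);
    if (s == NULL || *s == '\0') return 0;
    char *end;
    long v = strtol(s, &end, 10);
    if (*end != '\0' || v <= 0 || v > INT_MAX) return 0;
    return (int)v;
}

/* Hypothetical resolution order sketched from the commit message: any
   explicitly set thread-count variable wins, OPENBLAS_DEFAULT_NUM_THREADS
   is consulted only when none of them is set, and the detected core count
   is the last resort. */
static int resolve_num_threads(int detected_cores) {
    /* Assumed names and precedence among the explicit variables. */
    static const char *const explicit_vars[] = {
        "OPENBLAS_NUM_THREADS", "GOTO_NUM_THREADS", "OMP_NUM_THREADS",
    };
    size_t count = sizeof explicit_vars / sizeof explicit_vars[0];
    for (size_t i = 0; i < count; i++) {
        int n = env_threads(explicit_vars[i]);
        if (n > 0) return n;
    }
    int fallback = env_threads("OPENBLAS_DEFAULT_NUM_THREADS");
    return fallback > 0 ? fallback : detected_cores;
}
```

Under these assumptions, an embedding runtime such as Julia could export `OPENBLAS_DEFAULT_NUM_THREADS=1` to avoid the larger default thread startup, while a user who has set `OMP_NUM_THREADS=4` would still get 4 threads.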
Kai T. Ohlhus
84453b924f Support CONSISTENT_FPCSR on AARCH64 2022-09-22 00:20:40 +09:00
Martin Kroeker
9402df5604 Fix missing external declaration 2022-09-14 21:44:34 +02:00