Martin Kroeker
de421b7764
Merge pull request #4904 from XiWeiGu/la64_cross_cmake
...
LoongArch64: Enable cmake cross-compilation
2024-10-03 15:53:57 +02:00
gxw
30af9278dc
LoongArch64: Enable cmake cross-compilation
2024-09-29 10:13:30 +08:00
gxw
48698b2b1d
LoongArch64: Rename core
...
Use microarchitecture names instead of meaningless strings to name the cores;
the legacy core names are still retained.
1. Rename LOONGSONGENERIC to LA64_GENERIC
2. Rename LOONGSON3R5 to LA464
3. Rename LOONGSON2K1000 to LA264
2024-09-29 09:35:21 +08:00
Martin Kroeker
3ee9e9d8d0
Merge pull request #4879 from martin-frbg/issue4868-2
...
Ensure a memory buffer has been allocated for each thread before invoking it (take 2)
2024-08-15 22:06:54 +02:00
Martin Kroeker
a8d6b0219a
Merge pull request #4877 from XiWeiGu/fixed_undefined_blas_set_parameter
...
Fixed the undefined reference to blas_set_parameter
2024-08-15 15:35:26 +02:00
Martin Kroeker
d24b3cf393
properly fix buffer allocation and assignment
2024-08-15 15:32:58 +02:00
gxw
fd033467ac
Fixed the undefined reference to blas_set_parameter
...
Fixed the undefined reference to blas_set_parameter when
enabling USE_OPENMP and DYNAMIC_ARCH.
2024-08-15 16:48:48 +08:00
Martin Kroeker
23b5d66a86
Ensure a memory buffer has been allocated for each thread before invoking it
2024-08-14 10:35:44 +02:00
Martin Kroeker
753c7ebe17
Merge pull request #4835 from martin-frbg/revertwin4359
...
Temporarily revert to the coarse-grained locking in the Windows thread server
2024-08-07 14:09:32 +02:00
Martin Kroeker
50397e017a
Merge pull request #4838 from martin-frbg/fix4662-3
...
fix invalid ifdef syntax in HUGETLB handling
2024-08-04 11:32:10 +02:00
Martin Kroeker
5257f807a9
fix invalid ifdef syntax in HUGETLB handling
2024-08-04 00:03:17 +02:00
Martin Kroeker
2aed90171a
Add riscv sources for DYNAMIC_ARCH
2024-08-03 23:58:10 +02:00
Martin Kroeker
6468dc1142
restore the coarse locking of the pre-4359 version
2024-08-02 16:39:47 +02:00
yamazaki-mitsufumi
821ef34635
Add A64FX to the list of CPUs supported by DYNAMIC_ARCH
2024-07-23 20:44:39 +09:00
Martin Kroeker
a815594fd1
Merge pull request #4801 from markdryan/markdryan/riscv-dynamic-arch
...
Add autodetection for riscv64
2024-07-19 17:12:07 +02:00
Martin Kroeker
a373d0f107
Improve the error message for thread creation failure
2024-07-15 18:32:21 +02:00
Mark Ryan
3b715e6162
Add autodetection for riscv64
...
Implement DYNAMIC_ARCH support for riscv64. Three CPU types are
supported: riscv64_generic, riscv64_zvl256b and riscv64_zvl128b.
The two non-generic kernels require CPU support for RVV 1.0 to
function correctly. Detecting that a riscv64 device supports
RVV 1.0 is a little complicated as there are some boards on the
market that advertise support for V via hwcap but only support
RVV 0.7.1, which is not binary compatible with RVV 1.0. The
approach taken is to first try hwprobe. If hwprobe is not
available, we fall back to hwcap + an additional check to distinguish
between RVV 1.0 and RVV 0.7.1.
Tested on a VM with VLEN=256, a CanMV K230 with VLEN=128 (with only
the big core enabled), a Lichee Pi with RVV 0.7.1 and a VF2 with no
vector.
A compiler with RVV 1.0 support must be used to build OpenBLAS for
riscv64 when DYNAMIC_ARCH=1.
Signed-off-by: Mark Ryan <markdryan@rivosinc.com>
2024-07-15 14:24:22 +00:00
Martin Kroeker
d0b9948b23
Guard against invalid thread_status.queue
2024-06-30 19:31:15 +02:00
Martin Kroeker
9b2a0c79cb
Add Zhaoxin KX7000
2024-06-20 09:23:08 +02:00
Deeksha Goplani
0dc80a5c8d
locks improvement
2024-05-13 22:17:23 +05:30
Martin Kroeker
8da6f7e5f2
Merge pull request #4686 from XiWeiGu/loongarch64_dgemm_kernel_16x6
...
Loongarch64: Improving the Performance and Stability of dgemm
2024-05-10 11:29:12 +02:00
gxw
637c650f4f
loongarch64: Add buffer offset for target LOONGSON3R5
2024-05-10 11:42:53 +08:00
Martin Kroeker
5500b4ab26
Merge pull request #4680 from theAeon/develop
...
Expose whether locking is enabled in get_config
2024-05-08 19:03:57 +02:00
Martin Kroeker
f0f1ff7820
fix HUGETLB allocation for TLS mode as well
2024-05-08 00:40:36 +02:00
Andrew Robbins
edfe1aa471
Expose whether locking is enabled in get_config
2024-05-07 11:12:03 -04:00
Martin Kroeker
dc99b61380
sort out unwanted interdependencies of alloc_shm and alloc_hugetlb
2024-05-04 14:49:00 +02:00
Martin Kroeker
ddcd7d6fa8
Merge branch 'develop' into Threading_Callback
2024-04-21 22:27:11 +02:00
gxw
d8c4ea8793
loongarch: Optimizing the performance of the GEMM on servers
2024-04-09 09:03:34 -04:00
shivammonaka
7102367fde
Introduced callback to Pthread, Win32 and OpenMP backend
2024-04-02 08:53:49 +05:30
Mark Seminatore
b0ad8a78ff
code to fix lost work in case of re-entrant calls to exec_blas_async()
2024-03-28 15:24:52 -07:00
Martin Kroeker
88b5330ae7
Restore outer loop of blas_buffer_inuse setup
2024-03-24 18:33:21 +01:00
shivammonaka
d49ebc54e1
Merge branch 'shivam-develop' into shivam-Locks
2024-02-29 11:58:14 +05:30
shivammonaka
bc191015e3
Using OpenMP locks with NUM_PARALLEL
2024-02-29 11:47:05 +05:30
Mark Seminatore
b29fd48998
Merge branch 'develop' into win_tidy
2024-02-12 10:23:17 -08:00
Mark Seminatore
98c56a7314
more cleanup
2024-02-08 13:50:15 -08:00
Chip Kerchner
d408ecedba
Add environment variable to display coretype for dynamic arch.
2024-02-08 12:17:18 -06:00
Chip Kerchner
ac6b4b7aa4
Make sure CPU ID works for all POWER_10 conditions
2024-02-08 08:56:30 -06:00
Chip Kerchner
08ce6b1c1c
Add missing CPU ID definitions for old versions of AIX.
2024-02-07 07:54:06 -06:00
Martin Kroeker
a4fde2c5ac
Merge pull request #4451 from martin-frbg/overflow_reset
...
Reset "buffer management structure overflowed" state and free auxiliary struct on blas_shutdown
2024-02-05 07:27:04 +01:00
Martin Kroeker
e61d96303d
Fix missing NO_AVX2 fallback for SapphireRapids
2024-02-04 10:05:20 +01:00
Mark Seminatore
42cb567f0f
more cleanup
2024-01-31 13:24:28 -08:00
Mark Seminatore
0d7fe5ea61
clean up whitespace
2024-01-29 22:33:47 -08:00
Martin Kroeker
d938aed7fe
reset "mem structure overflowed" state on shutdown
2024-01-23 17:15:53 +01:00
Chris Sidebottom
aaf65210cc
Add dynamic support for Arm(R) Neoverse(TM) V2 processor
...
Whilst I figure out how best to map the L2 parameters without
duplicating all of `ARMV8SVE`, let's just map this to `NEOVERSEV1`.
2024-01-19 19:05:50 +00:00
Martin Kroeker
152a6c43b6
Add blas_omp_threads_local
2024-01-14 19:59:55 +01:00
Martin Kroeker
8a9d492af7
Add default for blas_omp_threads_local
2024-01-14 19:58:49 +01:00
Martin Kroeker
87d31af2ae
Add openblas_set_num_threads_local()
2024-01-13 20:06:24 +01:00
Martin Kroeker
e7a895e714
Add Apple M as NeoverseN1
2023-12-25 12:36:05 +01:00
Chris Sidebottom
dc20a78188
Use functionally equivalent dynamic targets
...
Similar to `drivers/other/dynamic.c`, I've looked for functionally
equivalent targets and mapped them in the default DYNAMIC_ARCH build.
Users can still build specific cores using DYNAMIC_LIST.
2023-12-23 12:45:27 +00:00
Mark Seminatore
6bd7c54af5
introduce MT_TRACE to clean up SMP_DEBUG code
2023-12-11 15:13:04 -08:00