761 Commits

Author SHA1 Message Date
Martin Kroeker 8a1710dd0d
don't apply switch_ratio to tail of loop 2024-10-06 20:03:32 +02:00
Martin Kroeker de421b7764
Merge pull request #4904 from XiWeiGu/la64_cross_cmake
LoongArch64: Enable cmake cross-compilation
2024-10-03 15:53:57 +02:00
gxw 30af9278dc LoongArch64: Enable cmake cross-compilation 2024-09-29 10:13:30 +08:00
gxw 48698b2b1d LoongArch64: Rename core
Use microarchitecture names instead of meaningless strings to name the cores;
the legacy core names are still retained.
1. Rename LOONGSONGENERIC to LA64_GENERIC
2. Rename LOONGSON3R5 to LA464
3. Rename LOONGSON2K1000 to LA264
2024-09-29 09:35:21 +08:00
Martin Kroeker 3ee9e9d8d0
Merge pull request #4879 from martin-frbg/issue4868-2
Ensure a memory buffer has been allocated for each thread before invoking it (take 2)
2024-08-15 22:06:54 +02:00
Martin Kroeker a8d6b0219a
Merge pull request #4877 from XiWeiGu/fixed_undefined_blas_set_parameter
Fixed the undefined reference to blas_set_parameter
2024-08-15 15:35:26 +02:00
Martin Kroeker d24b3cf393
properly fix buffer allocation and assignment 2024-08-15 15:32:58 +02:00
gxw fd033467ac Fixed the undefined reference to blas_set_parameter
Fixed the undefined reference to blas_set_parameter when
enabling USE_OPENMP and DYNAMIC_ARCH.
2024-08-15 16:48:48 +08:00
Martin Kroeker 23b5d66a86
Ensure a memory buffer has been allocated for each thread before invoking it 2024-08-14 10:35:44 +02:00
Martin Kroeker 753c7ebe17
Merge pull request #4835 from martin-frbg/revertwin4359
Temporarily revert to the coarse-grained locking in the Windows thread server
2024-08-07 14:09:32 +02:00
Martin Kroeker 50397e017a
Merge pull request #4838 from martin-frbg/fix4662-3
fix invalid ifdef syntax in HUGETLB handling
2024-08-04 11:32:10 +02:00
Martin Kroeker 5257f807a9
fix invalid ifdef syntax in HUGETLB handling 2024-08-04 00:03:17 +02:00
Martin Kroeker 2aed90171a
Add riscv sources for DYNAMIC_ARCH 2024-08-03 23:58:10 +02:00
Martin Kroeker 6468dc1142
restore the coarse locking of the pre-4359 version 2024-08-02 16:39:47 +02:00
yamazaki-mitsufumi 821ef34635 Add A64FX to the list of CPUs supported by DYNAMIC_ARCH 2024-07-23 20:44:39 +09:00
Martin Kroeker a815594fd1
Merge pull request #4801 from markdryan/markdryan/riscv-dynamic-arch
Add autodetection for riscv64
2024-07-19 17:12:07 +02:00
Martin Kroeker a373d0f107
Improve the error message for thread creation failure 2024-07-15 18:32:21 +02:00
Mark Ryan 3b715e6162 Add autodetection for riscv64
Implement DYNAMIC_ARCH support for riscv64.  Three cpu types are
supported, riscv64_generic, riscv64_zvl256b, riscv64_zvl128b.
The two non-generic kernels require CPU support for RVV 1.0 to
function correctly.  Detecting that a riscv64 device supports
RVV 1.0 is a little complicated as there are some boards on the
market that advertise support for V via hwcap but only support
RVV 0.7.1, which is not binary compatible with RVV 1.0.  The
approach taken is to first try hwprobe.  If hwprobe is not
available, we fall back to hwcap + an additional check to distinguish
between RVV 1.0 and RVV 0.7.1.

Tested on a VM with VLEN=256, a CanMV K230 with VLEN=128 (with only
the big core enabled), a Lichee Pi with RVV 0.7.1 and a VF2 with no
vector.

A compiler with RVV 1.0 support must be used to build OpenBLAS for
riscv64 when DYNAMIC_ARCH=1.

Signed-off-by: Mark Ryan <markdryan@rivosinc.com>
2024-07-15 14:24:22 +00:00
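The detection order described in the commit message above (hwprobe first, then hwcap plus an additional check) can be sketched roughly as follows. This is only an illustration of the decision logic, not the code from the commit: the `riscv_probe_t` struct, its field names, and `select_riscv64_core` are hypothetical, the probe results are passed in as plain values rather than queried from the kernel, and the mapping from VLEN to kernel name (256 or more selects `riscv64_zvl256b`, 128 or more selects `riscv64_zvl128b`) is an assumption based on the kernel names and the tested VLEN values.

```c
#include <stdbool.h>

/* Hypothetical container for probe results; not an OpenBLAS structure. */
typedef struct {
    bool hwprobe_available;  /* did the hwprobe query succeed at all? */
    bool hwprobe_has_rvv10;  /* hwprobe reports RVV 1.0 (meaningful only if available) */
    bool hwcap_has_v;        /* hwcap advertises the V extension */
    bool extra_check_rvv10;  /* additional check says RVV 1.0 rather than 0.7.1 */
    unsigned vlen_bits;      /* vector register length in bits, 0 if unknown */
} riscv_probe_t;

/* Returns one of the three core names listed in the commit message. */
static const char *select_riscv64_core(const riscv_probe_t *p)
{
    bool rvv10;

    if (p->hwprobe_available) {
        /* Preferred path: trust hwprobe. */
        rvv10 = p->hwprobe_has_rvv10;
    } else {
        /* Fallback: hwcap alone is not enough, because some boards
           advertise V while only implementing RVV 0.7.1. */
        rvv10 = p->hwcap_has_v && p->extra_check_rvv10;
    }

    if (!rvv10)
        return "riscv64_generic";

    /* Assumed mapping from VLEN to kernel name. */
    if (p->vlen_bits >= 256)
        return "riscv64_zvl256b";
    if (p->vlen_bits >= 128)
        return "riscv64_zvl128b";
    return "riscv64_generic";
}
```

Under these assumptions, a board that sets the V bit in hwcap but fails the additional check (the RVV 0.7.1 case) falls through to `riscv64_generic`, as does a device with no vector unit.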
Martin Kroeker d0b9948b23
Guard against invalid thread_status.queue 2024-06-30 19:31:15 +02:00
Martin Kroeker 7e9a4ba427
Merge pull request #4741 from shivammonaka/Pthread_Scalability_Improvement
Enhancing Core Utilization in BLAS Calls: A Scalable Architecture
2024-06-20 13:36:23 +02:00
Martin Kroeker 9b2a0c79cb
Add Zhaoxin KX7000 2024-06-20 09:23:08 +02:00
shivammonaka 9e22d70957 Dynamic locking in Pthread Backend to allow multiple BLAS calls to be executed parallelly 2024-06-07 08:40:17 +05:30
Martin Kroeker db070a9223
add gemm_batch drivers 2024-05-31 18:29:27 +02:00
Martin Kroeker d0794f88dc
add gemm_batch driver 2024-05-29 15:49:20 +02:00
Martin Kroeker 0073affe63
Merge pull request #4693 from goplanid/locks-improvement
Lock Management Improvements for Memory Allocation Efficiency
2024-05-26 23:14:52 +02:00
Martin Kroeker 6ca9ffa7f5
Merge pull request #4655 from yamazakimitsufumi/update_2d_thread_distribution
Expanding the scope of 2D thread distribution to improve multi-threaded DGEMM performance
2024-05-14 18:12:43 +02:00
Deeksha Goplani 0dc80a5c8d locks improvement 2024-05-13 22:17:23 +05:30
Martin Kroeker 8da6f7e5f2
Merge pull request #4686 from XiWeiGu/loongarch64_dgemm_kernel_16x6
Loongarch64: Improving the Performance and Stability of dgemm
2024-05-10 11:29:12 +02:00
gxw 637c650f4f loongarch64: Add buffer offset for target LOONGSON3R5 2024-05-10 11:42:53 +08:00
Martin Kroeker 5500b4ab26
Merge pull request #4680 from theAeon/develop
Expose whether locking is enabled in get_config
2024-05-08 19:03:57 +02:00
Martin Kroeker f0f1ff7820
fix HUGETLB allocation for TLS mode as well 2024-05-08 00:40:36 +02:00
Andrew Robbins edfe1aa471
Expose whether locking is enabled in get_config 2024-05-07 11:12:03 -04:00
Martin Kroeker dc99b61380
sort unwanted interdependencies of alloc_shm and alloc_hugetlb 2024-05-04 14:49:00 +02:00
Martin Kroeker ddcd7d6fa8
Merge branch 'develop' into Threading_Callback 2024-04-21 22:27:11 +02:00
yamazaki-mitsufumi 51ab1903e7 Expanding the scope of 2D thread distribution 2024-04-18 18:20:25 +09:00
gxw d8c4ea8793 loongarch: Optimizing the performance of the GEMM on servers 2024-04-09 09:03:34 -04:00
shivammonaka 7102367fde Introduced callback to Pthread, Win32 and OpenMP backend 2024-04-02 08:53:49 +05:30
Mark Seminatore b0ad8a78ff code to fix lost work in case of re-entrant calls to exec_blas_async() 2024-03-28 15:24:52 -07:00
Martin Kroeker 88b5330ae7
Restore outer loop of blas_buffer_inuse setup 2024-03-24 18:33:21 +01:00
shivammonaka d49ebc54e1 Merge branch 'shivam-develop' into shivam-Locks 2024-02-29 11:58:14 +05:30
shivammonaka bc191015e3 Using OpenMP locks with NUM_PARALLEL 2024-02-29 11:47:05 +05:30
Mark Seminatore b29fd48998
Merge branch 'develop' into win_tidy 2024-02-12 10:23:17 -08:00
Mark Seminatore 98c56a7314 more cleanup 2024-02-08 13:50:15 -08:00
Chip Kerchner d408ecedba Add environment variable to display coretype for dynamic arch. 2024-02-08 12:17:18 -06:00
Chip Kerchner ac6b4b7aa4 Make sure CPU ID works for all POWER_10 conditions 2024-02-08 08:56:30 -06:00
Chip Kerchner 08ce6b1c1c Add missing CPU ID definitions for old versions of AIX. 2024-02-07 07:54:06 -06:00
Martin Kroeker a4fde2c5ac
Merge pull request #4451 from martin-frbg/overflow_reset
Reset "buffer management structure overflowed" state and free auxiliary struct on blas_shutdown
2024-02-05 07:27:04 +01:00
Martin Kroeker e61d96303d
Fix missing NO_AVX2 fallback for SapphireRapids 2024-02-04 10:05:20 +01:00
Mark Seminatore 42cb567f0f more cleanup 2024-01-31 13:24:28 -08:00
Mark Seminatore 0d7fe5ea61 clean up whitespace 2024-01-29 22:33:47 -08:00