Martin Kroeker
db070a9223
add gemm_batch drivers
2024-05-31 18:29:27 +02:00
Martin Kroeker
d0794f88dc
add gemm_batch driver
2024-05-29 15:49:20 +02:00
Martin Kroeker
0073affe63
Merge pull request #4693 from goplanid/locks-improvement
...
Lock Management Improvements for Memory Allocation Efficiency
2024-05-26 23:14:52 +02:00
Martin Kroeker
6ca9ffa7f5
Merge pull request #4655 from yamazakimitsufumi/update_2d_thread_distribution
...
Expanding the scope of 2D thread distribution to improve multi-threaded DGEMM performance
2024-05-14 18:12:43 +02:00
Deeksha Goplani
0dc80a5c8d
locks improvement
2024-05-13 22:17:23 +05:30
Martin Kroeker
8da6f7e5f2
Merge pull request #4686 from XiWeiGu/loongarch64_dgemm_kernel_16x6
...
Loongarch64: Improving the Performance and Stability of dgemm
2024-05-10 11:29:12 +02:00
gxw
637c650f4f
loongarch64: Add buffer offset for target LOONGSON3R5
2024-05-10 11:42:53 +08:00
Martin Kroeker
5500b4ab26
Merge pull request #4680 from theAeon/develop
...
Expose whether locking is enabled in get_config
2024-05-08 19:03:57 +02:00
Martin Kroeker
f0f1ff7820
fix HUGETLB allocation for TLS mode as well
2024-05-08 00:40:36 +02:00
Andrew Robbins
edfe1aa471
Expose whether locking is enabled in get_config
2024-05-07 11:12:03 -04:00
Martin Kroeker
dc99b61380
sort out unwanted interdependencies of alloc_shm and alloc_hugetlb
2024-05-04 14:49:00 +02:00
Martin Kroeker
ddcd7d6fa8
Merge branch 'develop' into Threading_Callback
2024-04-21 22:27:11 +02:00
yamazaki-mitsufumi
51ab1903e7
Expanding the scope of 2D thread distribution
2024-04-18 18:20:25 +09:00
gxw
d8c4ea8793
loongarch: Optimizing the performance of the GEMM on servers
2024-04-09 09:03:34 -04:00
shivammonaka
7102367fde
Introduced callback to Pthread, Win32 and OpenMP backend
2024-04-02 08:53:49 +05:30
Mark Seminatore
b0ad8a78ff
code to fix lost work in case of re-entrant calls to exec_blas_async()
2024-03-28 15:24:52 -07:00
Martin Kroeker
88b5330ae7
Restore outer loop of blas_buffer_inuse setup
2024-03-24 18:33:21 +01:00
shivammonaka
d49ebc54e1
Merge branch 'shivam-develop' into shivam-Locks
2024-02-29 11:58:14 +05:30
shivammonaka
bc191015e3
Using OpenMP locks with NUM_PARALLEL
2024-02-29 11:47:05 +05:30
Mark Seminatore
b29fd48998
Merge branch 'develop' into win_tidy
2024-02-12 10:23:17 -08:00
Mark Seminatore
98c56a7314
more cleanup
2024-02-08 13:50:15 -08:00
Chip Kerchner
d408ecedba
Add environment variable to display coretype for dynamic arch.
2024-02-08 12:17:18 -06:00
Chip Kerchner
ac6b4b7aa4
Make sure CPU ID works for all POWER_10 conditions
2024-02-08 08:56:30 -06:00
Chip Kerchner
08ce6b1c1c
Add missing CPU ID definitions for old versions of AIX.
2024-02-07 07:54:06 -06:00
Martin Kroeker
a4fde2c5ac
Merge pull request #4451 from martin-frbg/overflow_reset
...
Reset "buffer management structure overflowed" state and free auxiliary struct on blas_shutdown
2024-02-05 07:27:04 +01:00
Martin Kroeker
e61d96303d
Fix missing NO_AVX2 fallback for SapphireRapids
2024-02-04 10:05:20 +01:00
Mark Seminatore
42cb567f0f
more cleanup
2024-01-31 13:24:28 -08:00
Mark Seminatore
0d7fe5ea61
clean up whitespace
2024-01-29 22:33:47 -08:00
Martin Kroeker
d938aed7fe
reset "mem structure overflowed" state on shutdown
2024-01-23 17:15:53 +01:00
Chris Sidebottom
aaf65210cc
Add dynamic support for Arm(R) Neoverse(TM) V2 processor
...
Whilst I figure out how best to map the L2 parameters without
duplicating all of `ARMV8SVE`, let's just map this to `NEOVERSEV1`.
2024-01-19 19:05:50 +00:00
Martin Kroeker
152a6c43b6
Add blas_omp_threads_local
2024-01-14 19:59:55 +01:00
Martin Kroeker
8a9d492af7
Add default for blas_omp_threads_local
2024-01-14 19:58:49 +01:00
Martin Kroeker
87d31af2ae
Add openblas_set_num_threads_local()
2024-01-13 20:06:24 +01:00
Martin Kroeker
e7a895e714
Add Apple M as NeoverseN1
2023-12-25 12:36:05 +01:00
Chris Sidebottom
dc20a78188
Use functionally equivalent dynamic targets
...
Similar to `drivers/other/dynamic.c`, I've looked for functionally
equivalent targets and mapped them in the default DYNAMIC_ARCH build.
Users can still build specific cores using DYNAMIC_LIST.
2023-12-23 12:45:27 +00:00
Mark Seminatore
6bd7c54af5
introduce MT_TRACE to clean up SMP_DEBUG code
2023-12-11 15:13:04 -08:00
Mark Seminatore
edac80d7e8
some cleanup, dynamically scale threads, add missing WIN_CASE defn
2023-12-07 14:59:27 -08:00
Mark Seminatore
4ebf814b42
fix bug failing to mark task as finished.
2023-12-05 23:28:37 -08:00
Mark Seminatore
5f51811728
try at new threading model
2023-12-05 22:43:36 -08:00
Shiyou Yin
1310a0931b
loongarch: Refine build control for loongarch64.
...
1. Use getauxval instead of cpucfg to test hardware capability.
2. Remove unnecessary code and option for compiler check in c_check.
2023-11-28 20:23:55 +08:00
Chip-Kerchner
d99aad8ee3
Fix older version of gcc - missing __has_builtin, cpuid and no support of P10.
2023-11-14 11:07:08 -06:00
Martin Kroeker
9b5f8eb33a
Fix empty function prototypes
2023-11-12 19:35:53 +01:00
Martin Kroeker
9324520d0e
typo fix
2023-11-11 23:14:58 +01:00
Martin Kroeker
ff6437f2d7
Add workaround for omp_get_max_threads hanging on FreeBSD with libomp from LLVM14
2023-11-11 21:30:32 +01:00
Chip-Kerchner
4eecccd49b
Fix __builtin_cpu_is for AIX.
2023-11-08 07:12:21 -06:00
Chip-Kerchner
5e31c57083
Only define __builtin_cpu_is and __builtin_cpu_supports if not present.
2023-11-07 20:58:34 -06:00
Chip-Kerchner
7dcb2d67f2
Have POWER7 return arch=POWER6.
2023-11-01 15:23:28 -05:00
Chip-Kerchner
c8882bd9d8
Remove POWER7 from cpu list.
2023-11-01 14:53:55 -05:00
Chip Kerchner
badfb2e60f
Merge branch 'develop' into XLC-AIX
2023-10-26 09:19:31 -05:00
Martin Kroeker
e12aaed13d
Fix unwanted fallthrough from Intel Family 6 to 15 in case of identification failure
2023-10-18 16:28:54 +02:00