Commit Graph

739 Commits

Author SHA1 Message Date
Martin Kroeker db070a9223
add gemm_batch drivers 2024-05-31 18:29:27 +02:00
Martin Kroeker d0794f88dc
add gemm_batch driver 2024-05-29 15:49:20 +02:00
Martin Kroeker 0073affe63
Merge pull request #4693 from goplanid/locks-improvement
Lock Management Improvements for Memory Allocation Efficiency
2024-05-26 23:14:52 +02:00
Martin Kroeker 6ca9ffa7f5
Merge pull request #4655 from yamazakimitsufumi/update_2d_thread_distribution
Expanding the scope of 2D thread distribution to improve multi-threaded DGEMM performance
2024-05-14 18:12:43 +02:00
Deeksha Goplani 0dc80a5c8d locks improvement 2024-05-13 22:17:23 +05:30
Martin Kroeker 8da6f7e5f2
Merge pull request #4686 from XiWeiGu/loongarch64_dgemm_kernel_16x6
Loongarch64: Improving the Performance and Stability of dgemm
2024-05-10 11:29:12 +02:00
gxw 637c650f4f loongarch64: Add buffer offset for target LOONGSON3R5 2024-05-10 11:42:53 +08:00
Martin Kroeker 5500b4ab26
Merge pull request #4680 from theAeon/develop
Expose whether locking is enabled in get_config
2024-05-08 19:03:57 +02:00
Martin Kroeker f0f1ff7820
fix HUGETLB allocation for TLS mode as well 2024-05-08 00:40:36 +02:00
Andrew Robbins edfe1aa471
Expose whether locking is enabled in get_config 2024-05-07 11:12:03 -04:00
Martin Kroeker dc99b61380
sort unwanted interdependencies of alloc_shm and alloc_hugetlb 2024-05-04 14:49:00 +02:00
Martin Kroeker ddcd7d6fa8
Merge branch 'develop' into Threading_Callback 2024-04-21 22:27:11 +02:00
yamazaki-mitsufumi 51ab1903e7 Expanding the scope of 2D thread distribution 2024-04-18 18:20:25 +09:00
gxw d8c4ea8793 loongarch: Optimizing the performance of the GEMM on servers 2024-04-09 09:03:34 -04:00
shivammonaka 7102367fde Introduced callback to Pthread, Win32 and OpenMP backend 2024-04-02 08:53:49 +05:30
Mark Seminatore b0ad8a78ff code to fix lost work in case of re-entrant calls to exec_blas_async() 2024-03-28 15:24:52 -07:00
Martin Kroeker 88b5330ae7
Restore outer loop of blas_buffer_inuse setup 2024-03-24 18:33:21 +01:00
shivammonaka d49ebc54e1 Merge branch 'shivam-develop' into shivam-Locks 2024-02-29 11:58:14 +05:30
shivammonaka bc191015e3 Using OpenMP locks with NUM_PARALLEL 2024-02-29 11:47:05 +05:30
Mark Seminatore b29fd48998
Merge branch 'develop' into win_tidy 2024-02-12 10:23:17 -08:00
Mark Seminatore 98c56a7314 more cleanup 2024-02-08 13:50:15 -08:00
Chip Kerchner d408ecedba Add environment variable to display coretype for dynamic arch. 2024-02-08 12:17:18 -06:00
Chip Kerchner ac6b4b7aa4 Make sure CPU ID works for all POWER_10 conditions 2024-02-08 08:56:30 -06:00
Chip Kerchner 08ce6b1c1c Add missing CPU ID definitions for old versions of AIX. 2024-02-07 07:54:06 -06:00
Martin Kroeker a4fde2c5ac
Merge pull request #4451 from martin-frbg/overflow_reset
Reset "buffer management structure overflowed" state and free auxiliary struct on blas_shutdown
2024-02-05 07:27:04 +01:00
Martin Kroeker e61d96303d
Fix missing NO_AVX2 fallback for SapphireRapids 2024-02-04 10:05:20 +01:00
Mark Seminatore 42cb567f0f more cleanup 2024-01-31 13:24:28 -08:00
Mark Seminatore 0d7fe5ea61 clean up whitespace 2024-01-29 22:33:47 -08:00
Martin Kroeker d938aed7fe
reset "mem structure overflowed" state on shutdown 2024-01-23 17:15:53 +01:00
Chris Sidebottom aaf65210cc Add dynamic support for Arm(R) Neoverse(TM) V2 processor
Whilst I figure out how best to map the L2 parameters without
duplicating all of `ARMV8SVE`, let's just map this to `NEOVERSEV1`.
2024-01-19 19:05:50 +00:00
Martin Kroeker 152a6c43b6
Add blas_omp_threads_local 2024-01-14 19:59:55 +01:00
Martin Kroeker 8a9d492af7
Add default for blas_omp_threads_local 2024-01-14 19:58:49 +01:00
Martin Kroeker 87d31af2ae
Add openblas_set_num_threads_local() 2024-01-13 20:06:24 +01:00
Martin Kroeker e7a895e714
Add Apple M as NeoverseN1 2023-12-25 12:36:05 +01:00
Chris Sidebottom dc20a78188 Use functionally equivalent dynamic targets
Similar to `drivers/other/dynamic.c`, I've looked for functionally
equivalent targets and mapped them in the default DYNAMIC_ARCH build.
Users can still build specific cores using DYNAMIC_LIST.
2023-12-23 12:45:27 +00:00
Mark Seminatore 6bd7c54af5 introduce MT_TRACE to clean up SMP_DEBUG code 2023-12-11 15:13:04 -08:00
Mark Seminatore edac80d7e8 some cleanup, dynamically scale threads, add missing WIN_CASE defn 2023-12-07 14:59:27 -08:00
Mark Seminatore 4ebf814b42 fix bug failing to mark task as finished. 2023-12-05 23:28:37 -08:00
Mark Seminatore 5f51811728 try at new threading model 2023-12-05 22:43:36 -08:00
Shiyou Yin 1310a0931b loongarch: Refine build control for loongarch64.
1. Use getauxval instead of cpucfg to test hardware capability.
2. Remove unnecessary code and option for compiler check in c_check.
2023-11-28 20:23:55 +08:00
Chip-Kerchner d99aad8ee3 Fix older version of gcc - missing __has_builtin, cpuid and no support of P10. 2023-11-14 11:07:08 -06:00
Martin Kroeker 9b5f8eb33a
Fix empty function prototypes 2023-11-12 19:35:53 +01:00
Martin Kroeker 9324520d0e
typo fix 2023-11-11 23:14:58 +01:00
Martin Kroeker ff6437f2d7
Add workaround for omp_get_max_threads hanging on FreeBSD with libomp from LLVM14 2023-11-11 21:30:32 +01:00
Chip-Kerchner 4eecccd49b Fix __builtin_cpu_is for AIX. 2023-11-08 07:12:21 -06:00
Chip-Kerchner 5e31c57083 Only define __builtin_cpu_is and __builtin_cpu_supports if not present. 2023-11-07 20:58:34 -06:00
Chip-Kerchner 7dcb2d67f2 Have POWER7 return arch=POWER6. 2023-11-01 15:23:28 -05:00
Chip-Kerchner c8882bd9d8 Remove POWER7 from cpu list. 2023-11-01 14:53:55 -05:00
Chip Kerchner badfb2e60f Merge branch 'develop' into XLC-AIX 2023-10-26 09:19:31 -05:00
Martin Kroeker e12aaed13d
Fix unwanted fallthrough from Intel Family 6 to 15 in case of identification failure 2023-10-18 16:28:54 +02:00