Commit Graph

8757 Commits

Author SHA1 Message Date
Chip Kerchner 2391dc1c0f Merge branch 'vectorizeBF16GEMV' of github.ibm.com:PowerAppLibs/OpenBLAS into vectorizeBF16GEMV 2024-10-13 13:48:33 -05:00
Chip Kerchner 36bd3eeddf Vectorize BF16 GEMV (VSX & MMA). Use GEMM_GEMV_FORWARD_BF16 (for Power). 2024-10-13 13:46:11 -05:00
Chip Kerchner f8e113f27b Replace types with include file. 2024-10-13 10:55:03 -05:00
Chip Kerchner a53a197934 Merge remote-tracking branch 'origin/develop' into vectorizeBF16GEMV 2024-10-12 15:15:17 -05:00
Martin Kroeker 3184b7f209 Merge pull request #4933 from ChipKerchner/thread_sbgemv - Change multi-threading logic for SBGEMV to be the same as SGEMV. 2024-10-12 17:19:41 +02:00
Chip Kerchner 0082240044 Merge branch 'thread_sbgemv' into vectorizeBF16GEMV 2024-10-11 16:13:59 -05:00
Chip Kerchner 1d51ca5798 Change multi-threading logic for SBGEMV to be the same as SGEMV. 2024-10-11 16:08:48 -05:00
Chip Kerchner c8f53b85ce Merge remote-tracking branch 'origin/develop' into vectorizeBF16GEMV 2024-10-11 11:10:20 -05:00
Martin Kroeker 18a23c23f7 Merge pull request #4929 from martin-frbg/issue4905 - Fix CBLAS_?GEMMT filling in the wrong triangle for Row-Major 2024-10-11 08:54:02 +02:00
Martin Kroeker 5a79446bdb Merge pull request #4918 from HaoZeke/testFixes - TST,BUG: Explicitly allow running tests multiple times 2024-10-10 21:53:18 +02:00
Martin Kroeker 7ba6591ff2 Merge branch 'OpenMathLib:develop' into issue4905 2024-10-10 21:50:38 +02:00
Martin Kroeker 550bc77832 Fix expectation values for CblasRowMajor order 2024-10-10 20:39:29 +02:00
Martin Kroeker e0ad20f72b Merge pull request #4932 from martin-frbg/cirrusosxndk - Update Android NDK install path for M1/armv7 crossbuild on CirrusCI 2024-10-10 16:18:07 +02:00
Martin Kroeker e4bc5e4718 remove stray quote 2024-10-10 11:02:56 +02:00
Martin Kroeker b89fb9632f Update Android NDK install path for M1/armv7 crossbuild 2024-10-10 10:19:11 +02:00
Martin Kroeker e52d9b4cf1 Merge pull request #4928 from austinpagan/czgemm_in_c - CGEMM & ZGEMM using C code, Power only, P10 only. 2024-10-09 20:26:21 +02:00
Martin Kroeker dbd83762f9 Merge pull request #4926 from NickelWenzel/fix_arm64_windows_and_uwp - fix: add missing NO_AFFINITY checks 2024-10-09 19:48:16 +02:00
Martin Kroeker 9762464718 Fix CBLAS interface filling in the wrong triangle for Row-Major 2024-10-09 18:06:39 +02:00
Gordon Fossum 0b7fb5c791 CGEMM & ZGEMM using C code. 2024-10-09 09:42:23 -05:00
NickelWenzel bee123e8e3 fix: add missing NO_AFFINITY checks 2024-10-09 16:36:40 +02:00
Martin Kroeker 7ac5b9011f Merge pull request #4923 from martin-frbg/zen5 - Add preliminary cpu autodetection for Zen5/5c 2024-10-09 16:18:47 +02:00
Martin Kroeker 2c3b87a082 Add preliminary cpu autodetection for Zen5/5c 2024-10-08 23:07:42 +02:00
Martin Kroeker 73c1882129 Merge pull request #4922 from martin-frbg/issue4904-2 - Update names of Loongarch64 targets in cmake cross-building 2024-10-07 13:24:14 +02:00
Martin Kroeker bc0691a556 Merge pull request #4920 from martin-frbg/issue4917 - Fix potential inaccuracy in multithreaded level3 related to SWITCH_RATIO 2024-10-07 08:26:03 +02:00
Martin Kroeker b0346e72f4 update names of loongarch64 targets for cross-compilation 2024-10-06 22:48:33 +02:00
Martin Kroeker 9c707dc6b9 Update dynamic arch list to new target scheme 2024-10-06 22:46:03 +02:00
Martin Kroeker 9783dd07ab Rename KERNEL.LOONGSONGENERIC to KERNEL.LA64_GENERIC 2024-10-06 22:43:11 +02:00
Martin Kroeker 0dfe42d62a Merge pull request #4919 from martin-frbg/issue4916-2 - Handle inf/nan in ppc440 s/dscal 2024-10-06 22:29:28 +02:00
Chip Kerchner d6bb8dcfd1 Common code. 2024-10-06 14:13:43 -05:00
Martin Kroeker 8a1710dd0d don't apply switch_ratio to tail of loop 2024-10-06 20:03:32 +02:00
Martin Kroeker c9e92348a6 Handle inf/nan if dummy2 flag is set 2024-10-06 19:57:17 +02:00
Rohit Goswami d9f368dfe6 TST: Signal abort for ctest failures correctly 2024-10-05 16:24:16 +00:00
Rohit Goswami 722e4ae07a MAINT: Explicitly replace instead of unknown 2024-10-05 16:23:02 +00:00
Rohit Goswami a6b7751881 BUG: Allow tests to be run multiple times - Without failures due to existing files 2024-10-05 16:22:56 +00:00
Chip Kerchner 9ac0fb0111 Merge branch 'develop' into vectorizeBF16GEMV 2024-10-04 06:49:53 -05:00
Martin Kroeker 624e9d110e Merge pull request #4916 from martin-frbg/issue4901 - Fix SIGILL/SIGSEGV in PPCG4 SGEMM and fix NAN handling in PPCG4 SSCAL/DSCAL 2024-10-03 23:25:45 +02:00
Martin Kroeker d714013ab9 change sgemm kernel to 4x4 as the 16x4 altivec goes out of bounds 2024-10-03 22:04:20 +02:00
Martin Kroeker 7c4f3638fd switch PPCG4 SGEMM kernel to 4x4 2024-10-03 22:00:15 +02:00
Chip Kerchner 915a6d6e44 Add casting. 2024-10-03 14:08:21 -05:00
Chip Kerchner 7ec3c16d82 Remove beta from optimized functions. 2024-10-03 13:27:33 -05:00
Martin Kroeker 54afc24e4d Merge pull request #4906 from XiWeiGu/arm64_cmake_small_matrix_opt - ARM64: Enable SMALL_MATRIX_OPT when compiling with CMake 2024-10-03 20:05:11 +02:00
Martin Kroeker b4495a8fb8 Merge branch 'develop' into arm64_cmake_small_matrix_opt 2024-10-03 20:04:52 +02:00
Martin Kroeker 68eefe60b9 Merge pull request #4915 from martin-frbg/issue4907 - Support LoongArch64 compilation with LLVM 2024-10-03 18:29:29 +02:00
Martin Kroeker 4f00f02567 Do not add -mabi flags for Loongson when the compiler is flang 2024-10-03 16:06:33 +02:00
Martin Kroeker f817f26062 Add simpler EPILOGUE for clang 2024-10-03 16:01:10 +02:00
Martin Kroeker a492181665 filter out Loongarch -mabi options for flang-new 2024-10-03 15:58:47 +02:00
Martin Kroeker de421b7764 Merge pull request #4904 from XiWeiGu/la64_cross_cmake - LoongArch64: Enable cmake cross-compilation 2024-10-03 15:53:57 +02:00
Martin Kroeker edaf5933c4 Merge pull request #4913 from martin-frbg/issue4912 - Declare the input array in CBLAS_?GEADD as const in cblas.h 2024-10-02 23:37:15 +02:00
Martin Kroeker 71131406ae Declare the input array in CBLAS_?GEADD as const 2024-10-02 18:32:48 +02:00
Chip Kerchner 7cc00f68c9 Remove more duplicate. 2024-10-01 11:23:32 -05:00