Commit Graph

2342 Commits

Author SHA1 Message Date
Chip Kerchner 7cc00f68c9 Remove more duplicate. 2024-10-01 11:23:32 -05:00
Chip Kerchner e238a68c03 Remove duplicate. 2024-10-01 11:06:23 -05:00
Chip Kerchner 32095b0cbb Remove parameter. 2024-10-01 09:32:42 -05:00
Chip Kerchner c8788208c8 Fixing block issue with transpose version. 2024-09-27 13:27:03 -05:00
Chip Kerchner d7c0d87cd1 Small changes. 2024-09-26 15:21:29 -05:00
Chip Kerchner eb6f3a05ef Common MMA code. 2024-09-26 09:28:56 -05:00
Chip Kerchner fb287d17fc Common code. 2024-09-25 16:31:36 -05:00
Chip Kerchner 8ab6245771 Small change. 2024-09-24 16:50:21 -05:00
Chip Kerchner df19375560 Almost final code for MMA. 2024-09-24 16:30:01 -05:00
Chip Kerchner 05aa63e738 More MMA BF16 GEMV code. 2024-09-24 12:54:02 -05:00
Chip Kerchner c9ce37d527 Force vector pairs in clang. 2024-09-23 08:43:58 -05:00
Chip Kerchner 89a12fa083 MMA BF16 GEMV code. 2024-09-23 06:32:14 -05:00
Chip Kerchner 7947970f9d Move common code. 2024-09-13 06:22:13 -05:00
Chip Kerchner 72216d28c2 Fix bug with inc_y adding results twice. 2024-09-11 08:47:32 -05:00
Chip Kerchner 2f142ee857 More common code. 2024-09-09 14:41:55 -05:00
Chip Kerchner 39fd29f1de Minor improvement and turn off BF16 GEMV forwarding by default. 2024-09-08 18:28:31 -05:00
Chip Kerchner 8541b25e1d Special case beta is one. 2024-09-06 14:48:48 -05:00
Chip Kerchner 76227e2948 Initial commit for vectorized BF16 GEMV. Added GEMM_GEMV_FORWARD_BF16 to enable using BF16 GEMV for one dimension matrices. Updated unit test to support inc_x != 1 or inc_y for GEMV. 2024-09-06 14:03:31 -05:00
Martin Kroeker e05d98d00a expressly use fld.d/fst.d for floating point registers instead of LD/ST macros 2024-08-15 22:14:29 +02:00
Chip Kerchner a0aeba631d Merge branch 'develop' into betterPowerGEMVTail 2024-08-15 08:00:00 -05:00
Chip Kerchner 083faf7556 Merge branch 'develop' into betterPowerGEMVTail 2024-08-14 15:56:03 -05:00
Chip Kerchner 75472b830a Merge branch 'develop' into betterPowerGEMVTail 2024-08-14 10:52:46 -05:00
Henry Chen ef94b96530 Use ldc1 and sdc1 for the prologue and epilogue on LOONGSON3A
This fix is similar to 2d8064174c.
2024-08-14 18:05:11 +08:00
Martin Kroeker 7ca835a82c address clang array overflow warning 2024-08-10 13:44:56 +02:00
Martin Kroeker 46e331a917 remove the unworkable GEMM3M restriction from GENERIC again 2024-08-07 19:41:10 +02:00
Martin Kroeker ccc23338d7 have the dummy GEMM3M kernel at least forward to regular GEMM 2024-08-07 19:39:02 +02:00
Martin Kroeker f1c9803f9a add proper return statement 2024-08-04 00:14:31 +02:00
Martin Kroeker 60abcc3991 add proper return statement 2024-08-04 00:13:31 +02:00
Chip Kerchner 1a7b8c650d Merge branch 'develop' into betterPowerGEMVTail 2024-08-01 14:59:12 -05:00
Martin Kroeker 9afd0c8afd Merge pull request #4814 from Mousius/gemv-proxy
Forward GEMM to GEMV when one argument is actually a vector
2024-07-31 23:18:01 +02:00
Martin Kroeker edbf093c98 Update zarch SCAL kernels to handle INF and NAN arguments (#4829)
* handle INF and NAN in input (for S/D only if DUMMY2 argument is set)
2024-07-31 19:45:15 +02:00
Chris Sidebottom ba2e989c67 Add accumulators to AArch64 GEMV Kernels
This helps to reduce values going missing as we accumulate.
2024-07-31 13:09:14 +01:00
Martin Kroeker a875304eb0 fix inverted conditional for NAN handling 2024-07-26 09:50:20 +02:00
Martin Kroeker 24acdd6bbb correct offset 2024-07-26 09:49:24 +02:00
Martin Kroeker fb7c53c5e5 Merge pull request #4807 from martin-frbg/scalfixes
[WIP]Make NAN handling in the SCAL kernels depend on the dummy2 parameter
2024-07-25 23:42:50 +02:00
Martin Kroeker 15c53dd2e0 Merge pull request #4794 from XiWeiGu/Fixed_Numpy_CI_Test
Try to fixed numpy ci test failures
2024-07-25 23:42:13 +02:00
Martin Kroeker a4e56e0452 Merge pull request #4806 from Mousius/small-gemm
Small GEMM for AArch64 with SVE
2024-07-25 21:50:04 +02:00
yamazaki-mitsufumi 88caf02f62 Fix ambiguous error on Mac OS 2024-07-25 22:43:13 +09:00
Martin Kroeker b613754143 Update scal..c 2024-07-24 14:31:29 +02:00
Martin Kroeker f5d04318e3 Merge branch 'OpenMathLib:develop' into scalfixes 2024-07-21 13:43:43 +02:00
Martin Kroeker 73f8866ffb make NAN handling depend on DUMMY2 parameter 2024-07-21 13:42:47 +02:00
Martin Kroeker dfbc2348a8 fix NAN handling 2024-07-20 18:27:15 +02:00
Martin Kroeker c064319ecb fix alpha=NAN case 2024-07-20 17:42:31 +02:00
Martin Kroeker c2ffd90e8c make NAN handling depend on dummy2 parameter 2024-07-20 17:31:00 +02:00
Chris Sidebottom ea4ab3b310 Better header guard around bridge 2024-07-20 14:39:57 +01:00
Chris Sidebottom 7311d93016 Unroll TT further 2024-07-19 17:51:20 +01:00
Martin Kroeker a815594fd1 Merge pull request #4801 from markdryan/markdryan/riscv-dynamic-arch
Add autodetection for riscv64
2024-07-19 17:12:07 +02:00
Martin Kroeker dd6c33d34d make NAN handling depend on dummy2 parameter 2024-07-19 16:14:55 +02:00
Hong Bo Peng db98f8753f Try to fix LAPACK testing failures on P7.
1. Remove the FADD insn from the GEMV Transpose code.
2. Remove the FADD insn from GEMM and ZGEMM code.
3. Reorder the computation of the Imaginary part in ZGEMM code.
2024-07-19 02:08:19 -04:00
Chris Sidebottom a9edddb695 Unroll TN further 2024-07-18 20:04:15 +01:00