Commit Graph

2356 Commits

Author SHA1 Message Date
Chip Kerchner c8f53b85ce Merge remote-tracking branch 'origin/develop' into vectorizeBF16GEMV 2024-10-11 11:10:20 -05:00
Martin Kroeker e52d9b4cf1 Merge pull request #4928 from austinpagan/czgemm_in_c
CGEMM & ZGEMM using C code, Power only, P10 only.
2024-10-09 20:26:21 +02:00
Gordon Fossum 0b7fb5c791 CGEMM & ZGEMM using C code. 2024-10-09 09:42:23 -05:00
Martin Kroeker 9783dd07ab Rename KERNEL.LOONGSONGENERIC to KERNEL.LA64_GENERIC 2024-10-06 22:43:11 +02:00
Chip Kerchner d6bb8dcfd1 Common code. 2024-10-06 14:13:43 -05:00
Martin Kroeker c9e92348a6 Handle inf/nan if dummy2 flag is set 2024-10-06 19:57:17 +02:00
Chip Kerchner 9ac0fb0111 Merge branch 'develop' into vectorizeBF16GEMV 2024-10-04 06:49:53 -05:00
Martin Kroeker d714013ab9 change sgemm kernel to 4x4 as the 16x4 altivec goes out of bounds 2024-10-03 22:04:20 +02:00
Chip Kerchner 915a6d6e44 Add casting. 2024-10-03 14:08:21 -05:00
Chip Kerchner 7ec3c16d82 Remove beta from optimized functions. 2024-10-03 13:27:33 -05:00
Martin Kroeker de421b7764 Merge pull request #4904 from XiWeiGu/la64_cross_cmake
LoongArch64: Enable cmake cross-compilation
2024-10-03 15:53:57 +02:00
Chip Kerchner 7cc00f68c9 Remove more duplicate. 2024-10-01 11:23:32 -05:00
Chip Kerchner e238a68c03 Remove duplicate. 2024-10-01 11:06:23 -05:00
Chip Kerchner 32095b0cbb Remove parameter. 2024-10-01 09:32:42 -05:00
gxw 30af9278dc LoongArch64: Enable cmake cross-compilation 2024-09-29 10:13:30 +08:00
gxw 48698b2b1d LoongArch64: Rename core
Use microarchitecture name instead of meaningless strings to name the core,
the legacy core is still retained.
1. Rename LOONGSONGENERIC to LA64_GENERIC
2. Rename LOONGSON3R5 to LA464
3. Rename LOONGSON2K1000 to LA264
2024-09-29 09:35:21 +08:00
Chip Kerchner c8788208c8 Fixing block issue with transpose version. 2024-09-27 13:27:03 -05:00
Chip Kerchner d7c0d87cd1 Small changes. 2024-09-26 15:21:29 -05:00
Chip Kerchner eb6f3a05ef Common MMA code. 2024-09-26 09:28:56 -05:00
Chip Kerchner fb287d17fc Common code. 2024-09-25 16:31:36 -05:00
Chip Kerchner 8ab6245771 Small change. 2024-09-24 16:50:21 -05:00
Chip Kerchner df19375560 Almost final code for MMA. 2024-09-24 16:30:01 -05:00
Chip Kerchner 05aa63e738 More MMA BF16 GEMV code. 2024-09-24 12:54:02 -05:00
Chip Kerchner c9ce37d527 Force vector pairs in clang. 2024-09-23 08:43:58 -05:00
Chip Kerchner 89a12fa083 MMA BF16 GEMV code. 2024-09-23 06:32:14 -05:00
Chip Kerchner 7947970f9d Move common code. 2024-09-13 06:22:13 -05:00
Chip Kerchner 72216d28c2 Fix bug with inc_y adding results twice. 2024-09-11 08:47:32 -05:00
Chip Kerchner 2f142ee857 More common code. 2024-09-09 14:41:55 -05:00
Chip Kerchner 39fd29f1de Minor improvement and turn off BF16 GEMV forwarding by default. 2024-09-08 18:28:31 -05:00
Chip Kerchner 8541b25e1d Special case beta is one. 2024-09-06 14:48:48 -05:00
Chip Kerchner 76227e2948 Initial commit for vectorized BF16 GEMV. Added GEMM_GEMV_FORWARD_BF16 to enable using BF16 GEMV for one dimension matrices. Updated unit test to support inc_x != 1 or inc_y for GEMV. 2024-09-06 14:03:31 -05:00
Deeksha Goplani 4894c54055 Improve TN case with further unrolling 2024-09-02 22:22:49 +05:30
Martin Kroeker e05d98d00a expressly use fld.d/fst.d for floating point registers instead of LD/ST macros 2024-08-15 22:14:29 +02:00
Chip Kerchner a0aeba631d Merge branch 'develop' into betterPowerGEMVTail 2024-08-15 08:00:00 -05:00
Chip Kerchner 083faf7556 Merge branch 'develop' into betterPowerGEMVTail 2024-08-14 15:56:03 -05:00
Chip Kerchner 75472b830a Merge branch 'develop' into betterPowerGEMVTail 2024-08-14 10:52:46 -05:00
Henry Chen ef94b96530 Use ldc1 and sdc1 for the prologue and epilogue on LOONGSON3A
This fix is similar to 2d8064174c.
2024-08-14 18:05:11 +08:00
Martin Kroeker 7ca835a82c address clang array overflow warning 2024-08-10 13:44:56 +02:00
Martin Kroeker 46e331a917 remove the unworkable GEMM3M restriction from GENERIC again 2024-08-07 19:41:10 +02:00
Martin Kroeker ccc23338d7 have the dummy GEMM3M kernel at least forward to regular GEMM 2024-08-07 19:39:02 +02:00
Martin Kroeker f1c9803f9a add proper return statement 2024-08-04 00:14:31 +02:00
Martin Kroeker 60abcc3991 add proper return statement 2024-08-04 00:13:31 +02:00
Chip Kerchner 1a7b8c650d Merge branch 'develop' into betterPowerGEMVTail 2024-08-01 14:59:12 -05:00
Martin Kroeker 9afd0c8afd Merge pull request #4814 from Mousius/gemv-proxy
Forward GEMM to GEMV when one argument is actually a vector
2024-07-31 23:18:01 +02:00
Martin Kroeker edbf093c98 Update zarch SCAL kernels to handle INF and NAN arguments (#4829)
* handle INF and NAN in input (for S/D only if DUMMY2 argument is set)
2024-07-31 19:45:15 +02:00
Chris Sidebottom ba2e989c67 Add accumulators to AArch64 GEMV Kernels
This helps to reduce values going missing as we accumulate.
2024-07-31 13:09:14 +01:00
Martin Kroeker a875304eb0 fix inverted conditional for NAN handling 2024-07-26 09:50:20 +02:00
Martin Kroeker 24acdd6bbb correct offset 2024-07-26 09:49:24 +02:00
Martin Kroeker fb7c53c5e5 Merge pull request #4807 from martin-frbg/scalfixes
[WIP]Make NAN handling in the SCAL kernels depend on the dummy2 parameter
2024-07-25 23:42:50 +02:00
Martin Kroeker 15c53dd2e0 Merge pull request #4794 from XiWeiGu/Fixed_Numpy_CI_Test
Try to fixed numpy ci test failures
2024-07-25 23:42:13 +02:00