2343 Commits

Author SHA1 Message Date
gxw
73c6a28073 x86_64: opt somatcopy_ct with AVX 2024-10-29 07:06:15 +00:00
Ayappan Perumal
020cce1068 Fix build issues with gcc compiler as well 2024-10-23 04:24:06 -05:00
Ayappan Perumal
b6ec73e77c Fix AIX build 2024-10-21 07:38:03 -05:00
Martin Kroeker
016bdb9b0b Merge pull request #4946 from XiWeiGu/la64_omatcopy_lasx
LoongArch64: Opt somatcopy with LASX
2024-10-18 14:03:06 +02:00
Chip Kerchner
ab71a1edf2 Better VSX. 2024-10-17 08:25:02 -05:00
gxw
bb31bbef52 LoongArch64: Opt somatcopy_ct with LASX 2024-10-17 11:45:13 +00:00
gxw
b37129341b LoongArch64: Opt somatcopy_cn with LASX 2024-10-17 11:27:55 +00:00
gxw
acf6cab304 LoongArch64: Opt somatcopy_rn with LASX 2024-10-17 09:50:02 +00:00
gxw
15edb441bf LoongArch64: Opt somatcopy_rt with LASX 2024-10-17 09:15:42 +00:00
Chip Kerchner
36bd3eeddf Vectorize BF16 GEMV (VSX & MMA). Use GEMM_GEMV_FORWARD_BF16 (for Power). 2024-10-13 13:46:11 -05:00
Martin Kroeker
e52d9b4cf1 Merge pull request #4928 from austinpagan/czgemm_in_c
CGEMM & ZGEMM using C code, Power only, P10 only.
2024-10-09 20:26:21 +02:00
Gordon Fossum
0b7fb5c791 CGEMM & ZGEMM using C code. 2024-10-09 09:42:23 -05:00
Martin Kroeker
9783dd07ab Rename KERNEL.LOONGSONGENERIC to KERNEL.LA64_GENERIC 2024-10-06 22:43:11 +02:00
Martin Kroeker
c9e92348a6 Handle inf/nan if dummy2 flag is set 2024-10-06 19:57:17 +02:00
Martin Kroeker
d714013ab9 change sgemm kernel to 4x4 as the 16x4 altivec goes out of bounds 2024-10-03 22:04:20 +02:00
Martin Kroeker
de421b7764 Merge pull request #4904 from XiWeiGu/la64_cross_cmake
LoongArch64: Enable cmake cross-compilation
2024-10-03 15:53:57 +02:00
gxw
30af9278dc LoongArch64: Enable cmake cross-compilation 2024-09-29 10:13:30 +08:00
gxw
48698b2b1d LoongArch64: Rename core
Use microarchitecture name instead of meaningless strings to name the core,
the legacy core is still retained.
1. Rename LOONGSONGENERIC to LA64_GENERIC
2. Rename LOONGSON3R5 to LA464
3. Rename LOONGSON2K1000 to LA264
2024-09-29 09:35:21 +08:00
Deeksha Goplani
4894c54055 Improve TN case with further unrolling 2024-09-02 22:22:49 +05:30
Martin Kroeker
e05d98d00a expressly use fld.d/fst.d for floating point registers instead of LD/ST macros 2024-08-15 22:14:29 +02:00
Chip Kerchner
a0aeba631d Merge branch 'develop' into betterPowerGEMVTail 2024-08-15 08:00:00 -05:00
Chip Kerchner
083faf7556 Merge branch 'develop' into betterPowerGEMVTail 2024-08-14 15:56:03 -05:00
Chip Kerchner
75472b830a Merge branch 'develop' into betterPowerGEMVTail 2024-08-14 10:52:46 -05:00
Henry Chen
ef94b96530 Use ldc1 and sdc1 for the prologue and epilogue on LOONGSON3A
This fix is similar to 2d8064174c.
2024-08-14 18:05:11 +08:00
Martin Kroeker
7ca835a82c address clang array overflow warning 2024-08-10 13:44:56 +02:00
Martin Kroeker
46e331a917 remove the unworkable GEMM3M restriction from GENERIC again 2024-08-07 19:41:10 +02:00
Martin Kroeker
ccc23338d7 have the dummy GEMM3M kernel at least forward to regular GEMM 2024-08-07 19:39:02 +02:00
Martin Kroeker
f1c9803f9a add proper return statement 2024-08-04 00:14:31 +02:00
Martin Kroeker
60abcc3991 add proper return statement 2024-08-04 00:13:31 +02:00
Chip Kerchner
1a7b8c650d Merge branch 'develop' into betterPowerGEMVTail 2024-08-01 14:59:12 -05:00
Martin Kroeker
9afd0c8afd Merge pull request #4814 from Mousius/gemv-proxy
Forward GEMM to GEMV when one argument is actually a vector
2024-07-31 23:18:01 +02:00
Martin Kroeker
edbf093c98 Update zarch SCAL kernels to handle INF and NAN arguments (#4829)
* handle INF and NAN in input (for S/D only if DUMMY2 argument is set)
2024-07-31 19:45:15 +02:00
Chris Sidebottom
ba2e989c67 Add accumulators to AArch64 GEMV Kernels
This helps to reduce values going missing as we accumulate.
2024-07-31 13:09:14 +01:00
Martin Kroeker
a875304eb0 fix inverted conditional for NAN handling 2024-07-26 09:50:20 +02:00
Martin Kroeker
24acdd6bbb correct offset 2024-07-26 09:49:24 +02:00
Martin Kroeker
fb7c53c5e5 Merge pull request #4807 from martin-frbg/scalfixes
[WIP]Make NAN handling in the SCAL kernels depend on the dummy2 parameter
2024-07-25 23:42:50 +02:00
Martin Kroeker
15c53dd2e0 Merge pull request #4794 from XiWeiGu/Fixed_Numpy_CI_Test
Try to fix numpy ci test failures
2024-07-25 23:42:13 +02:00
Martin Kroeker
a4e56e0452 Merge pull request #4806 from Mousius/small-gemm
Small GEMM for AArch64 with SVE
2024-07-25 21:50:04 +02:00
yamazaki-mitsufumi
88caf02f62 Fix ambiguous error on Mac OS 2024-07-25 22:43:13 +09:00
Martin Kroeker
b613754143 Update scal..c 2024-07-24 14:31:29 +02:00
Martin Kroeker
f5d04318e3 Merge branch 'OpenMathLib:develop' into scalfixes 2024-07-21 13:43:43 +02:00
Martin Kroeker
73f8866ffb make NAN handling depend on DUMMY2 parameter 2024-07-21 13:42:47 +02:00
Martin Kroeker
dfbc2348a8 fix NAN handling 2024-07-20 18:27:15 +02:00
Martin Kroeker
c064319ecb fix alpha=NAN case 2024-07-20 17:42:31 +02:00
Martin Kroeker
c2ffd90e8c make NAN handling depend on dummy2 parameter 2024-07-20 17:31:00 +02:00
Chris Sidebottom
ea4ab3b310 Better header guard around bridge 2024-07-20 14:39:57 +01:00
Chris Sidebottom
7311d93016 Unroll TT further 2024-07-19 17:51:20 +01:00
Martin Kroeker
a815594fd1 Merge pull request #4801 from markdryan/markdryan/riscv-dynamic-arch
Add autodetection for riscv64
2024-07-19 17:12:07 +02:00
Martin Kroeker
dd6c33d34d make NAN handling depend on dummy2 parameter 2024-07-19 16:14:55 +02:00
Hong Bo Peng
db98f8753f Try to fix LAPACK testing failures on P7.
1. Remove the FADD insn from the GEMV Transpose code.
2. Remove the FADD insn from GEMM and ZGEMM code.
3. Reorder the computation of the Imaginary part in ZGEMM code.
2024-07-19 02:08:19 -04:00