Commit Graph

2216 Commits

Author SHA1 Message Date
Martin Kroeker
93d975d8fd Merge pull request #4593 from XiWeiGu/loongarch_add_buffer_offset
loongarch: Optimizing the performance of the GEMM on servers
2024-04-10 14:23:31 +02:00
gxw
d8c4ea8793 loongarch: Optimizing the performance of the GEMM on servers 2024-04-09 09:03:34 -04:00
Chen Yu
8e39c05efd Get the l2 cache size via environment variable on confidential VM
The CPUID(leaf:2 or leaf:0x80000006) is not supported on some confidential
VMs. As a result the get_l2_size() returns the default 512M which brings
performance issues.

Introduce the environment variable OPENBLAS_L2_SIZE provided by the user
to get the l2 cache size.

Suggested-by: "Keshavamurthy, Anil S" <anil.s.keshavamurthy@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
2024-04-05 11:39:01 +08:00
Martin Kroeker
441c81026e Add support for Cortex-A76 2024-04-02 19:41:44 +02:00
Martin Kroeker
9ead81bd39 Revert S/DNRM2 to the base NEON kernel to fix precision loss 2024-04-02 15:59:20 +02:00
gxw
96607cbb98 loongarch: Fixed dzamax
Initialize the registers to prevent sporadic errors.
2024-03-25 23:17:53 -04:00
gxw
50869f6ca8 loongarch: Fixed zrot LSX opt 2024-03-19 10:08:11 +08:00
gxw
b5eb9d6bac loongarch: Fixed {sc/dz}amax LSX opt 2024-03-19 09:56:11 +08:00
gxw
ad13e04669 loongarch: Fixed {s/d/sc/dz}amin LSX opt 2024-03-19 09:18:44 +08:00
gxw
bbf82cb624 loongarch: Fixed {s/d}axpby LSX opt 2024-03-18 17:51:42 +08:00
gxw
ac460eb42a loongarch: Fixed i{c/z}amin LSX opt 2024-03-18 17:15:58 +08:00
gxw
60e251a1f8 loongarch: Fixed {sc/dz}amax LASX opt 2024-03-16 14:52:17 +08:00
gxw
a10dde5554 loongarch: Fixed {s/d/sc/dz}amin LASX opt 2024-03-16 14:52:14 +08:00
gxw
6534d378b7 loongarch: Fixed {s/d/c/z}sum LASX opt 2024-03-16 14:52:10 +08:00
gxw
6159cffc58 loongarch: Fixed i{s/c/z}amin LASX opt 2024-03-16 14:52:06 +08:00
gxw
7d755912b9 loongarch: Fixed {s/d/c/z}axpby LASX opt 2024-03-16 14:51:56 +08:00
Martin Kroeker
cf80bd8500 Update nrm2_rvv.c 2024-03-13 13:07:26 +01:00
Martin Kroeker
9baa757905 Update nrm2_vector.c 2024-03-13 11:40:14 +01:00
Martin Kroeker
18a6db6862 Update nrm2_vector.c 2024-03-13 11:10:26 +01:00
Martin Kroeker
3752e73919 handle incx < 0 2024-03-12 20:44:01 +01:00
Martin Kroeker
db70c7f7fb handle incx < 0 2024-03-12 20:42:11 +01:00
Martin Kroeker
dee8557d58 handle incx < 0 2024-03-12 20:40:29 +01:00
Martin Kroeker
d9dff17aec handle incx < 0 2024-03-12 20:38:23 +01:00
Martin Kroeker
552c521353 remove another early exit for incx < 0 2024-03-12 18:49:27 +01:00
Martin Kroeker
ed532dc75b remove another early exit for incx < 0 2024-03-12 18:47:00 +01:00
Martin Kroeker
6b89e1f1d7 fix loop condition for incx < 0 2024-03-12 15:49:41 +01:00
Martin Kroeker
20016a0096 fix loop condition for incx < 0 2024-03-12 15:48:55 +01:00
Martin Kroeker
09e84bd29a fix loop condition for incx < 0 2024-03-12 15:48:00 +01:00
Martin Kroeker
f747aedb52 fix loop condition for incx < 0 2024-03-12 15:47:17 +01:00
Martin Kroeker
23796f8d31 fix loop condition for incx < 0 2024-03-12 15:46:23 +01:00
Martin Kroeker
bf93459746 fix loop condition for incx < 0 2024-03-12 15:45:23 +01:00
Martin Kroeker
e41d01bad9 remove early exit on negative inc_x 2024-03-11 22:53:54 +01:00
Martin Kroeker
02a025f9c1 remove early exit on negative inc_x 2024-03-11 22:52:18 +01:00
pengxu
680a77fafc Optimized ssymv and dsymv kernel LSX for LoongArch 2024-03-05 20:36:59 +08:00
pengxu
6546600342 Optimized ssymv and dsymv kernel LASX for LoongArch 2024-03-04 16:18:39 +08:00
Chip-Kerchner
99384933ff Revert "Merge pull request #4532 from austinpagan/cgemm_zgemm_c_code"
This reverts commit accea15551, reversing
changes made to b925353006.
2024-03-01 07:57:39 -06:00
Martin Kroeker
577d480c62 Merge pull request #4529 from ErnstPeng/feature-branch
Optimized sgemv and dgemv kernel LSX for LoongArch
2024-02-28 13:49:54 +01:00
pengxu
b2db064285 Optimized sgemv and dgemv kernel LSX for LoongArch 2024-02-28 18:07:27 +08:00
Martin Kroeker
cfbb701497 Merge pull request #4536 from XiWeiGu/loongarch64-cgemv-zgemv-opt
Loongarch64 cgemv zgemv opt
2024-02-28 10:15:34 +01:00
gxw
8e05c053be LoongArch64: Fixed the failed test cases test_{c/z}gemv_n in test_extensions 2024-02-27 22:19:26 -05:00
gxw
3f22fc2233 LoongArch64: Add zgemv LSX opt 2024-02-27 22:19:04 -05:00
gxw
c508a10cf2 LoongArch64: Add cgemv LSX opt 2024-02-27 22:17:30 -05:00
Martin Kroeker
accea15551 Merge pull request #4532 from austinpagan/cgemm_zgemm_c_code
Cgemm zgemm c code
2024-02-27 22:07:07 +01:00
Martin Kroeker
8e872a91a9 Fix erroneous mapping of SUM kernels to ASUM 2024-02-27 11:28:50 +01:00
Martin Kroeker
6699227d45 Merge pull request #4525 from XiWeiGu/loongarch64_fixed_kernel_regress_skx_avx
LoongArch64: Fixed utest kernel_regress:skx_avx
2024-02-26 09:49:34 +01:00
gxw
8dea25ffff LoongArch64: Fixed utest kernel_regress:skx_avx 2024-02-26 02:04:37 -05:00
Martin Kroeker
7d506984fa fix assignment of default CSUM kernel 2024-02-25 17:57:11 +01:00
Martin Kroeker
12787775d9 add csum/zsum kernels (trivially derived from the asum ones) 2024-02-25 17:55:36 +01:00
Martin Kroeker
8f8ef3492a Add CSUM and ZSUM kernels (trivially derived from their existing ASUM counterparts) 2024-02-24 23:57:50 +01:00
Martin Kroeker
be5e18c6f9 Add kernel definitions for CSUM and ZSUM 2024-02-24 23:55:43 +01:00
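
Note on commit 8e39c05efd: the message describes falling back to a user-supplied OPENBLAS_L2_SIZE environment variable when CPUID cannot report the L2 cache size. The C sketch below illustrates that general pattern only. It is not the code from the commit: the helper name l2_size_with_env_override, its parameters, and the rule that the value must be a plain positive decimal integer are all assumptions made for illustration, and the format actually accepted by OpenBLAS is not stated in the commit message.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not an OpenBLAS function).
 * Chooses an L2 cache size in this order:
 *   1. OPENBLAS_L2_SIZE, if set to a plain positive decimal integer
 *      (the accepted format is an assumption of this sketch);
 *   2. the hardware-detected value, if it is positive;
 *   3. the caller-supplied fallback.
 */
static long l2_size_with_env_override(long detected, long fallback)
{
    const char *text = getenv("OPENBLAS_L2_SIZE");

    if (text != NULL && *text != '\0') {
        char *end = NULL;
        errno = 0;
        long value = strtol(text, &end, 10);
        /* Accept only a fully parsed, in-range, positive number. */
        if (errno == 0 && end != text && *end == '\0' && value > 0)
            return value;
    }
    return detected > 0 ? detected : fallback;
}
```

With the variable unset or unparsable, the helper behaves as if the override did not exist, so a malformed setting cannot make the result worse than the detected or fallback value.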