Martin Kroeker
d421dec278
Merge pull request #4656 from zboszor/fix-x86-64-build-v2
...
Add forgotten conditional uses of PREFETCH
2024-04-23 21:05:08 +02:00
Martin Kroeker
ae695d4ca0
Merge pull request #4642 from XiWeiGu/loongarch64_clang
...
CI: Add clang test for loongarch64
2024-04-23 18:25:49 +02:00
gxw
7cd438a5ac
loongarch64: Fixed clang compilation issues
2024-04-23 19:19:11 +08:00
Zoltán Böszörményi
ca64861ce8
Add forgotten conditional uses of PREFETCH
...
This fixes a (cross-)compilation/linker error for PRESCOTT
on Yocto.
Signed-off-by: Zoltán Böszörményi <zoltan.boszormenyi@xenial.com>
2024-04-19 10:52:28 +02:00
gxw
9c39e969f5
mips64: Fixed MSA optimization bugs for zgemv and cgemv
2024-04-15 15:17:29 +08:00
Martin Kroeker
4c03ed437f
Fix SICORTEX ASUM/ZASUM and SUM/ZSUM for INCX <= 0 (#4640)
...
* Exit early if INCX <= 0
2024-04-14 15:39:11 +02:00
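The early exit named in this commit can be illustrated with a plain C reference routine. This is only a sketch, not the SICORTEX kernel: the function name `ref_dasum` and its signature are hypothetical, and it assumes the usual reference-BLAS convention that ASUM returns zero when `n <= 0` or `incx <= 0`.

```c
#include <stddef.h>

/* Hypothetical reference routine (not the actual kernel): sum of the
   absolute values of n elements of x, spaced incx elements apart. */
double ref_dasum(ptrdiff_t n, const double *x, ptrdiff_t incx)
{
    /* Exit early: there is nothing to sum for an empty vector or a
       non-positive stride (assumed reference-BLAS convention). */
    if (n <= 0 || incx <= 0) return 0.0;

    double sum = 0.0;
    for (ptrdiff_t k = 0; k < n; k++) {
        double v = x[k * incx];
        sum += (v < 0.0) ? -v : v;   /* absolute value without libm */
    }
    return sum;
}
```

Placing the check before any pointer arithmetic means a zero or negative stride never reaches the loop at all.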
Martin Kroeker
7cfd433d0c
revert the C/Z NRM2 kernels to the base NEON kernel as well
2024-04-12 15:34:04 +02:00
Martin Kroeker
93d975d8fd
Merge pull request #4593 from XiWeiGu/loongarch_add_buffer_offset
...
loongarch: Optimizing the performance of the GEMM on servers
2024-04-10 14:23:31 +02:00
gxw
d8c4ea8793
loongarch: Optimizing the performance of the GEMM on servers
2024-04-09 09:03:34 -04:00
Chen Yu
8e39c05efd
Get the l2 cache size via environment variable on confidential VM
...
CPUID (leaf 2 or leaf 0x80000006) is not supported on some confidential
VMs. As a result, get_l2_size() returns the default of 512M, which causes
performance issues.
Introduce the user-provided environment variable OPENBLAS_L2_SIZE
to supply the L2 cache size.
Suggested-by: "Keshavamurthy, Anil S" <anil.s.keshavamurthy@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
2024-04-05 11:39:01 +08:00
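A minimal sketch of the override pattern this commit describes: consult an environment variable first and fall back to a default when it is unset or unusable. Only the variable name `OPENBLAS_L2_SIZE` comes from the commit message. The helper names `parse_l2_size` and `l2_size_from_env`, the accepted format (a positive decimal integer) and the units are assumptions, not the actual OpenBLAS code.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse s as a positive decimal integer and
   return it, or return fallback if s is missing or malformed. */
long parse_l2_size(const char *s, long fallback)
{
    if (s == NULL || *s == '\0') return fallback;

    char *end = NULL;
    errno = 0;
    long v = strtol(s, &end, 10);

    /* Reject overflow, trailing garbage and non-positive values. */
    if (errno != 0 || end == s || *end != '\0' || v <= 0) return fallback;
    return v;
}

/* Hypothetical wrapper: prefer the user-supplied OPENBLAS_L2_SIZE,
   otherwise keep whatever the caller detected or defaulted to. */
long l2_size_from_env(long fallback)
{
    return parse_l2_size(getenv("OPENBLAS_L2_SIZE"), fallback);
}
```

Splitting the parsing from the `getenv` call keeps the validation testable without touching the process environment.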
Martin Kroeker
441c81026e
Add support for Cortex-A76
2024-04-02 19:41:44 +02:00
Martin Kroeker
9ead81bd39
Revert S/DNRM2 to the base NEON kernel to fix precision loss
2024-04-02 15:59:20 +02:00
gxw
96607cbb98
loongarch: Fixed dzamax
...
Initialize the registers to prevent sporadic errors.
2024-03-25 23:17:53 -04:00
gxw
50869f6ca8
loongarch: Fixed zrot LSX opt
2024-03-19 10:08:11 +08:00
gxw
b5eb9d6bac
loongarch: Fixed {sc/dz}amax LSX opt
2024-03-19 09:56:11 +08:00
gxw
ad13e04669
loongarch: Fixed {s/d/sc/dz}amin LSX opt
2024-03-19 09:18:44 +08:00
gxw
bbf82cb624
loongarch: Fixed {s/d}axpby LSX opt
2024-03-18 17:51:42 +08:00
gxw
ac460eb42a
loongarch: Fixed i{c/z}amin LSX opt
2024-03-18 17:15:58 +08:00
gxw
60e251a1f8
loongarch: Fixed {sc/dz}amax LASX opt
2024-03-16 14:52:17 +08:00
gxw
a10dde5554
loongarch: Fixed {s/d/sc/dz}amin LASX opt
2024-03-16 14:52:14 +08:00
gxw
6534d378b7
loongarch: Fixed {s/d/c/z}sum LASX opt
2024-03-16 14:52:10 +08:00
gxw
6159cffc58
loongarch: Fixed i{s/c/z}amin LASX opt
2024-03-16 14:52:06 +08:00
gxw
7d755912b9
loongarch: Fixed {s/d/c/z}axpby LASX opt
2024-03-16 14:51:56 +08:00
Martin Kroeker
cf80bd8500
Update nrm2_rvv.c
2024-03-13 13:07:26 +01:00
Martin Kroeker
9baa757905
Update nrm2_vector.c
2024-03-13 11:40:14 +01:00
Martin Kroeker
18a6db6862
Update nrm2_vector.c
2024-03-13 11:10:26 +01:00
Martin Kroeker
3752e73919
handle incx < 0
2024-03-12 20:44:01 +01:00
Martin Kroeker
db70c7f7fb
handle incx < 0
2024-03-12 20:42:11 +01:00
Martin Kroeker
dee8557d58
handle incx < 0
2024-03-12 20:40:29 +01:00
Martin Kroeker
d9dff17aec
handle incx < 0
2024-03-12 20:38:23 +01:00
Martin Kroeker
552c521353
remove another early exit for incx < 0
2024-03-12 18:49:27 +01:00
Martin Kroeker
ed532dc75b
remove another early exit for incx < 0
2024-03-12 18:47:00 +01:00
Martin Kroeker
6b89e1f1d7
fix loop condition for incx < 0
2024-03-12 15:49:41 +01:00
Martin Kroeker
20016a0096
fix loop condition for incx < 0
2024-03-12 15:48:55 +01:00
Martin Kroeker
09e84bd29a
fix loop condition for incx < 0
2024-03-12 15:48:00 +01:00
Martin Kroeker
f747aedb52
fix loop condition for incx < 0
2024-03-12 15:47:17 +01:00
Martin Kroeker
23796f8d31
fix loop condition for incx < 0
2024-03-12 15:46:23 +01:00
Martin Kroeker
bf93459746
fix loop condition for incx < 0
2024-03-12 15:45:23 +01:00
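The commit messages above do not show the loop condition that was changed. As an illustration of the general problem only, a bound computed from `n * incx` becomes negative when `incx < 0`, so a test such as `i < n * incx` fails on the first iteration. One way to avoid this, sketched here with the hypothetical function `ref_sumsq`, is to count elements instead of comparing offsets. The sketch assumes the caller passes a pointer to the first element to visit, which for a negative stride is the highest-indexed one.

```c
#include <stddef.h>

/* Hypothetical strided sum of squares (the core of an NRM2-style
   reduction, without scaling). Assumes x points at the first element
   to visit, for either sign of incx. */
double ref_sumsq(ptrdiff_t n, const double *x, ptrdiff_t incx)
{
    double acc = 0.0;

    /* k counts elements, i is the signed offset. The condition depends
       only on k, so it stays valid when incx is negative. */
    for (ptrdiff_t k = 0, i = 0; k < n; k++, i += incx)
        acc += x[i] * x[i];

    return acc;
}
```

With `double x[] = {1, 2, 3}`, the calls `ref_sumsq(3, x, 1)` and `ref_sumsq(3, x + 2, -1)` visit the same three elements in opposite order.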
Martin Kroeker
e41d01bad9
remove early exit on negative inc_x
2024-03-11 22:53:54 +01:00
Martin Kroeker
02a025f9c1
remove early exit on negative inc_x
2024-03-11 22:52:18 +01:00
pengxu
680a77fafc
Optimized ssymv and dsymv kernel LSX for LoongArch
2024-03-05 20:36:59 +08:00
Chris Sidebottom
7a6fa699f2
Small GEMM for AArch64
...
This is a fairly conservative addition of small matrix kernels using
SVE.
2024-03-04 15:48:47 +00:00
pengxu
6546600342
Optimized ssymv and dsymv kernel LASX for LoongArch
2024-03-04 16:18:39 +08:00
Chip-Kerchner
99384933ff
Revert "Merge pull request #4532 from austinpagan/cgemm_zgemm_c_code"
...
This reverts commit accea15551, reversing
changes made to b925353006.
2024-03-01 07:57:39 -06:00
Martin Kroeker
577d480c62
Merge pull request #4529 from ErnstPeng/feature-branch
...
Optimized sgemv and dgemv kernel LSX for LoongArch
2024-02-28 13:49:54 +01:00
pengxu
b2db064285
Optimized sgemv and dgemv kernel LSX for LoongArch
2024-02-28 18:07:27 +08:00
Martin Kroeker
cfbb701497
Merge pull request #4536 from XiWeiGu/loongarch64-cgemv-zgemv-opt
...
Loongarch64 cgemv zgemv opt
2024-02-28 10:15:34 +01:00
gxw
8e05c053be
LoongArch64: Fixed the failed test cases test_{c/z}gemv_n in test_extensions
2024-02-27 22:19:26 -05:00
gxw
3f22fc2233
LoongArch64: Add zgemv LSX opt
2024-02-27 22:19:04 -05:00
gxw
c508a10cf2
LoongArch64: Add cgemv LSX opt
2024-02-27 22:17:30 -05:00
Martin Kroeker
accea15551
Merge pull request #4532 from austinpagan/cgemm_zgemm_c_code
...
Cgemm zgemm c code
2024-02-27 22:07:07 +01:00
Martin Kroeker
8e872a91a9
Fix erroneous mapping of SUM kernels to ASUM
2024-02-27 11:28:50 +01:00
Martin Kroeker
6699227d45
Merge pull request #4525 from XiWeiGu/loongarch64_fixed_kernel_regress_skx_avx
...
LoongArch64: Fixed utest kernel_regress:skx_avx
2024-02-26 09:49:34 +01:00
gxw
8dea25ffff
LoongArch64: Fixed utest kernel_regress:skx_avx
2024-02-26 02:04:37 -05:00
Martin Kroeker
7d506984fa
fix assignment of default CSUM kernel
2024-02-25 17:57:11 +01:00
Martin Kroeker
12787775d9
add csum/zsum kernels (trivially derived from the asum ones)
2024-02-25 17:55:36 +01:00
Martin Kroeker
8f8ef3492a
Add CSUM and ZSUM kernels (trivially derived from their existing ASUM counterparts)
2024-02-24 23:57:50 +01:00
Martin Kroeker
be5e18c6f9
Add kernel definitions for CSUM and ZSUM
2024-02-24 23:55:43 +01:00
gxw
990507e3b8
LoongArch64: Opt zgemv with LASX
2024-02-22 11:58:02 +08:00
gxw
d51ffec3a2
LoongArch64: Opt cgemv with LASX
2024-02-22 11:56:04 +08:00
pengxu
4787a55c64
Optimized cgemm kernel 16x4 LASX for LoongArch
2024-02-21 15:28:47 +08:00
Sergei Lewis
ba17758c02
fix axpy implementations where y has a stride of 0
2024-02-16 16:00:38 +00:00
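For context on why a zero stride for y needs special care: in a scalar AXPY loop with `incy == 0`, every iteration updates the same element, so `y[0]` accumulates all `n` products. A vectorized kernel that loads, updates and stores several lanes at once can lose that accumulation. The sketch below is a hypothetical scalar reference (`ref_daxpy` is not an OpenBLAS symbol) that shows the sequential behavior, assuming non-negative strides.

```c
#include <stddef.h>

/* Hypothetical scalar reference for y := alpha * x + y.
   Assumes incx >= 0 and incy >= 0. */
void ref_daxpy(ptrdiff_t n, double alpha,
               const double *x, ptrdiff_t incx,
               double *y, ptrdiff_t incy)
{
    if (n <= 0) return;   /* nothing to do, and no loop to get stuck in */

    ptrdiff_t ix = 0, iy = 0;
    for (ptrdiff_t k = 0; k < n; k++) {
        /* With incy == 0, iy stays 0 and y[0] accumulates every term. */
        y[iy] += alpha * x[ix];
        ix += incx;
        iy += incy;
    }
}
```

A kernel could fall back to a loop of this shape whenever `incy == 0` and keep its vector path for the ordinary strided case.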
Dmitry Mikushin
d0f5dc763b
Adding USE_GEMM3M macro to kernel targets, so that the *gemm3m functions and parameters can be included into the gotoblas structure. Fixes #4500
2024-02-12 02:29:58 +01:00
Sergei Lewis
ff1523163f
Fix axpy test hangs when n==0. Reenable zaxpy_vector kernel for C910V.
2024-02-09 12:59:14 +00:00
pengxu
fe3da43b7d
Optimized zgemm kernel 8*4 LASX, 4*4 LSX and cgemm kernel 8*4 LSX for LoongArch
2024-02-06 11:49:01 +08:00
Martin Kroeker
e5d2725e5a
Merge pull request #4185 from XiWeiGu/mips_enable_msa
...
MIPS: Enable MSA
2024-02-05 15:50:16 +01:00
Martin Kroeker
b537528feb
Merge pull request #4480 from XiWeiGu/loongarch64-fixed-{s/d}amin-lsx
...
LoongArch64: Fixed {s/d}amin LSX optimization
2024-02-05 06:24:50 +01:00
Martin Kroeker
6d8a273cca
Handle zero increment(s) in C910V ?AXPBY (#4483)
...
* Handle zero increment(s)
2024-02-04 22:07:51 +01:00
Martin Kroeker
dbcf4f8b7d
Merge pull request #4479 from XiWeiGu/loongarch-opt-axpby
...
Loongarch opt axpby
2024-02-04 19:50:28 +01:00
Martin Kroeker
dc802dd637
Merge pull request #4474 from ChipKerchner/sgemmIncopy_PR
...
Vectorize in-copy packing/copying for SGEMM - up to 4X faster.
2024-02-04 18:51:09 +01:00
gxw
adde725321
LoongArch64: Fixed {s/d}amin LSX optimization
2024-02-04 14:44:47 +08:00
gxw
7bc93d95a1
LoongArch64: Opt {c/z}axpby
2024-02-04 11:23:31 +08:00
gxw
1e1f487dc7
LoongArch64: Fixed {s/d}axpby
2024-02-04 09:41:37 +08:00
Martin Kroeker
4d8dee508c
temporarily disable the CAXPY/ZAXPY kernels
2024-02-04 01:05:03 +01:00
austinpagan
87ba528d8b
Changed C files to straighten out indentation. Removed commented lines from other file.
2024-02-01 18:46:07 -06:00
austinpagan
461cf9083c
Merge remote-tracking branch 'origin/develop' into cgemm_zgemm_c_code
2024-02-01 12:40:04 -06:00
austinpagan
ddac75e0ef
Adding .C versions of CGEMM and ZGEMM
2024-02-01 12:24:25 -06:00
Chip Kerchner
2bb7ea64a1
Only vectorize 64-bit version for Power8.
2024-02-01 08:11:43 -06:00
Sergei Lewis
3ffd6868d7
Merge branch 'develop' into dev/slewis/merge-from-riscv
2024-02-01 11:29:41 +00:00
Sergei Lewis
a3b0ef6596
Restore riscv64 fixes from develop branch: dot product double precision accumulation, zscal NaN handling
2024-02-01 10:32:00 +00:00
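The "double precision accumulation" mentioned here refers to a common technique: a single-precision dot product keeps its running sum in a `double` and rounds to `float` only once at the end. The following is a sketch of that idea with a hypothetical function name, not the riscv64 kernel, and it assumes non-negative strides.

```c
#include <stddef.h>

/* Hypothetical single-precision dot product that accumulates in
   double to limit rounding error. Assumes incx >= 0 and incy >= 0. */
float ref_sdot(ptrdiff_t n, const float *x, ptrdiff_t incx,
               const float *y, ptrdiff_t incy)
{
    double acc = 0.0;   /* wider accumulator */

    for (ptrdiff_t k = 0; k < n; k++)
        acc += (double)x[k * incx] * (double)y[k * incy];

    return (float)acc;  /* single rounding step at the end */
}
```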
Martin Kroeker
d1343302bd
Merge pull request #4465 from XiWeiGu/utest-zscal
...
utest: Add tests for zscal
2024-01-31 14:19:19 +01:00
gxw
969601a1dc
X86_64: Fixed bug in zscal
...
Fixed handling of NAN and INF arguments when
inc is greater than 1.
2024-01-31 11:23:59 +08:00
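To show what handling NaN and Inf in a strided ZSCAL can mean, here is a hedged sketch. It assumes the goal is plain IEEE propagation: every element is actually multiplied by alpha, with no shortcut that writes zeros when alpha is zero, so a NaN in the input remains visible in the output. The name `ref_zscal`, its signature and that assumption are illustrative. The exact semantics adopted by the x86_64 kernel are not stated in the commit message.

```c
#include <stddef.h>

/* Hypothetical reference: x := alpha * x for n complex doubles stored
   as interleaved (re, im) pairs. inc is counted in complex elements.
   alpha = ar + i*ai. */
void ref_zscal(ptrdiff_t n, double ar, double ai, double *x, ptrdiff_t inc)
{
    if (n <= 0 || inc <= 0) return;

    for (ptrdiff_t k = 0; k < n; k++) {
        double *p = x + 2 * k * inc;

        /* Full complex multiply, even for alpha == 0, so that
           0 * NaN and 0 * Inf yield NaN as IEEE 754 specifies. */
        double re = ar * p[0] - ai * p[1];
        double im = ar * p[1] + ai * p[0];
        p[0] = re;
        p[1] = im;
    }
}
```

The same loop body serves `inc == 1` and `inc > 1`, which avoids having the two paths disagree on special values.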
Martin Kroeker
98c9ff3194
Merge pull request #4464 from XiWeiGu/loongarch64-zscal
...
LoongArch64: Handle NAN and INF
2024-01-30 22:53:29 +01:00
Chip Kerchner
09bb48d1b9
Vectorize in-copy packing/copying for SGEMM - 4X faster.
2024-01-30 09:13:16 -06:00
gxw
83ce97a4ca
LoongArch64: Handle NAN and INF
2024-01-30 17:17:30 +08:00
gxw
a79d117405
LoongArch64: Fixed bug for {s/d}amin
2024-01-30 11:32:57 +08:00
Sergei Lewis
1093def0d1
Merge branch 'risc-v' into develop
2024-01-29 11:11:39 +00:00
Martin Kroeker
889c5d026a
Merge pull request #4456 from kseniyazaytseva/riscv-rvv10
...
Fix BLAS and LAPACK tests for RVV 1.0 target, update to 0.12.0 intrinsics
2024-01-26 13:31:09 +01:00
Martin Kroeker
4e2a32ff51
Merge pull request #4454 from kseniyazaytseva/riscv-rvv07
...
Fix BLAS and LAPACK tests for C910V and RISCV64_ZVL256B targets
2024-01-26 11:40:46 +01:00
gxw
276e3ebf9e
LoongArch64: Add dzamax and dzamin opt
2024-01-26 10:03:50 +08:00
Martin Kroeker
a21b2fa5e4
Merge pull request #4452 from kseniyazaytseva/riscv-generic
...
Fix BLAS, BLAS-like functions and Generic RISC-V kernels
2024-01-24 17:52:25 +01:00
Andrey Sokolov
9c49a81d54
Resolve conflicts
2024-01-23 19:08:53 +03:00
kseniyazaytseva
e1afb23811
Fix BLAS and LAPACK tests for C910V and RISCV64_ZVL256B targets
...
* Fixed bugs in dgemm, [a]min/max, asum kernels
* Added zero checks for BLAS kernels
* Added dsdot implementation for RVV 0.7.1
* Fixed bugs in _vector files for C910V and RISCV64_ZVL256B targets
* Added additional definitions for RISCV64_ZVL256B target
2024-01-23 19:01:31 +03:00
Octavian Maghiar
deecfb1a39
Merge branch 'risc-v' into img-riscv64-zvl128b
2024-01-19 12:26:38 +00:00
kseniyazaytseva
5222b5fc18
Added axpby kernels for GENERIC RISC-V target
2024-01-18 23:22:26 +03:00
kseniyazaytseva
ff41cf5c49
Fix BLAS, BLAS-like functions and Generic RISC-V kernels
...
* Fixed gemmt, imatcopy, zimatcopy_cnc functions
* Fixed cblas_cscal testing in ctest
* Removed rotmg unreachable code
* Added zero size checks
2024-01-18 23:19:52 +03:00
kseniyazaytseva
b193ea3d7b
Fix BLAS and LAPACK tests for RVV 1.0 target, update to 0.12.0 intrinsics
...
* Update intrinsics API to 0.12.0 version (Stride Segment Loads/Stores)
* Fixed nrm2, axpby, ncopy, zgemv and scal kernels
* Added zero size checks
2024-01-18 22:14:32 +03:00
Martin Kroeker
88e994116c
Merge pull request #4354 from imaginationtech/img-rvv-kernel-generator
...
[RISC-V] Improve RVV kernel generator LMUL usage
2024-01-17 15:19:37 +01:00
Dirreke
ec89466e14
Add CSKY support
2024-01-16 23:45:06 +08:00
Sergei Lewis
9edb805e64
fix builds with t-head toolchains that use old versions of the intrinsics spec
2024-01-16 14:33:08 +00:00
Martin Kroeker
0d2e486edf
Handle NAN and INF
2024-01-15 11:18:59 +01:00
Martin Kroeker
5f5b7c4f45
Merge pull request #4423 from martin-frbg/issue4422
...
Check compiler support for AVX512BF16 and base COL/SPR kernel choice on that
2024-01-12 16:30:50 +01:00
Martin Kroeker
f31bea07dd
Merge pull request #4419 from martin-frbg/issue4413
...
[WIP] Add fixes and utests for ZSCAL with NaN or Inf arguments
2024-01-12 14:27:08 +01:00
Martin Kroeker
20413ee6ec
Update zscal.c
2024-01-12 13:11:13 +01:00
Martin Kroeker
b57627c27f
Handle NAN and INF
2024-01-12 12:03:08 +01:00
Martin Kroeker
995a990e24
Make AVX512 BFLOAT16 kernels conditional on compiler capability
2024-01-12 00:12:46 +01:00
Martin Kroeker
7df363e1e2
temporarily disable the MSA C/ZSCAL kernels
2024-01-12 00:08:52 +01:00
Chip-Kerchner
058dd2a4cb
Replace two vector loads with one vector pair load and fix endianness of stores - DGEMM versions.
2024-01-08 14:16:09 -06:00
Martin Kroeker
1c31f56e5a
Handle NAN
2024-01-08 16:11:25 +01:00
Martin Kroeker
7ee1ee38e2
Handle NaN in input
2024-01-08 14:20:07 +01:00
Martin Kroeker
f637e12713
Handle INF and NAN
2024-01-08 09:52:38 +01:00
Martin Kroeker
25b0c48082
Update zscal.c
2024-01-08 09:49:18 +01:00
Martin Kroeker
5e7f714e93
Update zscal.c
2024-01-08 08:17:40 +01:00
Martin Kroeker
cf8b03ae8b
Use NAN rather than SNAN for portability
2024-01-07 23:09:57 +01:00
Martin Kroeker
f0808d856b
Handle NAN in input
2024-01-07 20:27:29 +01:00
Martin Kroeker
acf17a825d
Handle NAN in input
2024-01-07 20:26:16 +01:00
Martin Kroeker
c9df62e883
Fix handling of NAN
2024-01-07 17:49:40 +01:00
Martin Kroeker
def4996170
Fix handling of NAN and INF arguments
2024-01-07 15:29:42 +01:00
Martin Kroeker
519b40fad9
Merge pull request #4398 from yinshiyou/la-dev
...
Add Optimizations for LoongArch.
2023-12-30 19:51:08 +01:00
pengxu
a5d0d21378
loongarch64: Add zgemm and cgemm optimization
2023-12-29 18:06:26 +08:00
gxw
546f13558c
loongarch64: Add {c/z}swap and {c/z}sum optimization
2023-12-29 17:30:57 +08:00
Hao Chen
edabb93668
loongarch64: Refine axpby optimization functions.
2023-12-29 17:30:57 +08:00
Hao Chen
1ec5dded43
loongarch64: Add c/zrot optimization functions.
...
Signed-off-by: Hao Chen <chenhao@loongson.cn>
2023-12-29 17:30:57 +08:00
Hao Chen
3c53ded315
loongarch64: Add c/znrm2 optimization functions.
2023-12-29 17:30:57 +08:00
Hao Chen
fbd612f8c4
loongarch64: Add ic/zamin optimization functions.
2023-12-29 17:30:57 +08:00
Hao Chen
d97272cb35
loongarch64: Add c/zdot optimization functions.
2023-12-29 17:30:57 +08:00
Hao Chen
65a0aeb128
loongarch64: Add c/zcopy optimization functions.
...
Signed-off-by: Hao Chen <chenhao@loongson.cn>
2023-12-29 17:30:57 +08:00
Hao Chen
2a34fb4b80
loongarch64: Add and refine scal optimization functions.
...
Signed-off-by: Hao Chen <chenhao@loongson.cn>
2023-12-29 17:30:57 +08:00
Hao Chen
8785e948b5
loongarch64: Add camin optimization function.
2023-12-29 17:30:57 +08:00
Hao Chen
0753848e03
loongarch64: Refine and add axpy optimization functions.
...
Signed-off-by: Hao Chen <chenhao@loongson.cn>
2023-12-29 17:30:57 +08:00
Hao Chen
06fd5b5995
loongarch64: Add and Refine asum optimization functions.
2023-12-29 17:30:57 +08:00
guxiwei
e771be185e
Optimize copy functions with lsx.
...
Signed-off-by: Hao Chen <chenhao@loongson.cn>
2023-12-29 17:30:57 +08:00
Hao Chen
179ed51d3b
Add dgemm_kernel_8x4.S file.
2023-12-29 17:30:57 +08:00
Hao Chen
173a65d4e6
loongarch64: Add and refine iamax optimization functions.
2023-12-29 17:30:57 +08:00
zhoupeng
ea70e165c7
loongarch64: Refine rot optimization.
2023-12-29 17:30:57 +08:00
zhoupeng
116aee7527
loongarch64: Refine imin optimization.
2023-12-29 17:30:57 +08:00
zhoupeng
8be2654193
loongarch64: Refine imax optimization.
2023-12-29 17:30:57 +08:00
zhoupeng
154baad454
loongarch64: Refine iamin optimization.
2023-12-29 17:30:57 +08:00
Shiyou Yin
36c12c4971
loongarch64: Refine copy,swap,nrm2,sum optimization.
2023-12-29 17:30:57 +08:00
Shiyou Yin
c6996a80e9
loongarch64: Refine amax,amin,max,min optimization.
2023-12-29 17:30:57 +08:00
Chris Sidebottom
ecae1389df
Reduce duplication in kernel definitions
...
These files are exactly the same, so I believe we can reduce these files
down. Other files require a slightly more complex unpicking.
2023-12-23 12:39:53 +00:00
Chris Sidebottom
60e66725e4
Use numeric labels to allow repeated inlining
2023-12-19 13:11:06 +00:00
Chris Sidebottom
7a4fef4f60
Tweak SVE dot kernel
...
This changes the SVE dot kernel to only predicate when necessary as well
as streamlining the assembly a bit. The benchmarks seem to indicate this
can improve performance by ~33%.
2023-12-19 12:08:54 +00:00
Martin Kroeker
f06b535566
Use C kernel for dgemv_t due to limitations of the old assembly one
2023-12-15 09:58:44 +01:00
barracuda156
d9653af018
KERNEL.PPC970, KERNEL.PPCG4: unbreak CMake parsing
...
Fixes: https://github.com/OpenMathLib/OpenBLAS/issues/4366
2023-12-14 12:00:11 +08:00
Chip-Kerchner
93747fb377
Merge remote-tracking branch 'origin/develop' into power10Copies
2023-12-12 09:32:49 -06:00
Chip-Kerchner
4e738e561a
Replace two vector loads with one vector pair load and fix endianness of stores.
2023-12-08 12:36:08 -06:00
yancheng
d32f38fb37
loongarch64: Add optimizations for nrm2.
2023-12-07 14:36:26 +08:00
yancheng
f9b468990e
loongarch64: Add optimizations for rot.
2023-12-07 14:36:26 +08:00
yancheng
c80e7e27d1
loongarch64: Add optimizations for sum and asum.
2023-12-07 14:36:26 +08:00