Martin Kroeker
020b3e1682
fix handling of INF arguments
2024-06-01 00:51:18 +02:00
Martin Kroeker
8c05765a5a
fix other corner cases where x=INF
2024-05-31 18:06:36 +02:00
Martin Kroeker
516743f7dc
fix other instances of mishandling INF
2024-05-31 16:02:12 +02:00
Martin Kroeker
9ff4e9714e
additional fixes for handling INF arguments
2024-05-31 15:44:07 +02:00
Martin Kroeker
ce130f11d2
Update zscal.c
2024-05-31 15:09:03 +02:00
Martin Kroeker
ab13cfef93
more fixes for infinite x
2024-05-31 14:34:49 +02:00
Martin Kroeker
ad2b5c67c8
fix another corner case involving infinity
2024-05-31 01:06:58 +02:00
Bart Oldeman
62f7b244ff
Replace use of FLT_MAX in x86_64 zscal.c by isinf()
Commit def4996
fixed issues with inf and nan values in zscal,
but used FLT_MAX, where DBL_MAX or isinf() is more appropriate,
as FLT_MAX is for single precision only.
Using FLT_MAX caused test case failures in the LAPACK tests.
isinf() is consistent with the later fix 969601a1
2024-05-24 17:20:27 +00:00
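The difference between the two tests can be seen in a short C sketch. The function names are hypothetical and this is not the OpenBLAS kernel code. A magnitude check against FLT_MAX flags large but finite doubles as infinite, while isinf() is true only for actual infinities.

```c
#include <assert.h>
#include <float.h>
#include <math.h>

/* Flawed test (hypothetical helper): treats any magnitude above FLT_MAX
   (about 3.4e38, the single-precision limit) as infinite, which also
   catches large but finite doubles. */
int looks_infinite_flt_max(double v)
{
    return fabs(v) > FLT_MAX;
}

/* Portable test (hypothetical helper): true only for +INF and -INF. */
int is_infinite(double v)
{
    return isinf(v) != 0;
}
```

For doubles, `fabs(v) > DBL_MAX` would also single out infinities, since no finite double exceeds DBL_MAX. isinf() states the intent directly and works for float and double alike.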
Rajalakshmi Srinivasaraghavan
e112191b54
POWER: Fix issues in zscal to address lapack failures
This patch fixes the following LAPACK failures with the clang compiler on POWER:
zed.out: ZVX: 18 out of 5190 tests failed to pass the threshold
zgd.out: ZGV drivers: 25 out of 1092 tests failed to pass the threshold
zgd.out: ZGV drivers: 6 out of 1092 tests failed to pass the threshold
2024-05-22 08:00:06 -05:00
Martin Kroeker
aa259b141d
Merge pull request #4704 from amritahs-ibm/saxpy_perf_fix
Fix SAXPY regression when compiled with the OpenXL compiler.
2024-05-18 19:11:25 +02:00
Matthias Langer
0050a9660b
Correctly detect ARM Neoverse V2 CPUs.
2024-05-16 09:59:52 +00:00
Chip Kerchner
3a1417671a
POWER: Fixing endianness issue in cswap/zswap kernel for AIX
2024-05-15 19:36:46 -05:00
Amrita H S
87b3d9054f
Fix SAXPY regression when compiled with the OpenXL compiler.
SAXPY built with OpenXL regresses when compared to SAXPY
built with gcc. The OpenXL compiler doesn't know that the
SAXPY inner kernel assembly is a 64-element loop, so to it
the remainder loop is the main loop. It vectorizes and
interleaves the remainder into a loop of 48 elements per
iteration. With at most 63 remaining elements, the 48-element
loop mostly does not execute, so the 1-element scalar loop
that handles the rest is probably what runs most of the time.
This can be fixed by adding the pragma loop interleave_count(2),
which results in an 8-element loop.
Signed-off-by: Amrita H S <amritahs@linux.vnet.ibm.com>
2024-05-12 23:27:55 -05:00
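The loop structure described above can be sketched in C. This is an illustration under assumptions, not the OpenBLAS source: the 64-element assembly kernel is replaced by a plain C stand-in (saxpy_inner64, hypothetical), and the pragma is written in the clang spelling `#pragma clang loop interleave_count(2)`, which LLVM-based compilers such as OpenXL are assumed to accept. The 8-element figure assumes 4-wide single-precision vectors interleaved twice.

```c
#include <assert.h>

/* Hypothetical stand-in for the hand-written assembly inner kernel, which
   processes n elements where n is a multiple of 64. */
static void saxpy_inner64(long n, float alpha, const float *x, float *y)
{
    for (long i = 0; i < n; i++)
        y[i] += alpha * x[i];
}

/* Unit-stride SAXPY sketch: y := alpha * x + y. */
void saxpy_sketch(long n, float alpha, const float *x, float *y)
{
    long n1 = n & -64L;           /* largest multiple of 64 not above n */
    if (n1 > 0)
        saxpy_inner64(n1, alpha, x, y);

    /* Remainder: at most 63 elements. Capping interleaving at 2 keeps the
       vectorized body small (8 floats per iteration with 4-wide vectors),
       so it actually runs instead of falling through to the scalar tail.
       Assumes an LLVM-based compiler that defines __clang__; other
       compilers skip the pragma. */
#if defined(__clang__)
#pragma clang loop interleave_count(2)
#endif
    for (long i = n1; i < n; i++)
        y[i] += alpha * x[i];
}
```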
Martin Kroeker
8da6f7e5f2
Merge pull request #4686 from XiWeiGu/loongarch64_dgemm_kernel_16x6
Loongarch64: Improving the Performance and Stability of dgemm
2024-05-10 11:29:12 +02:00
gxw
f9a26240a7
loongarch64: Fixed icamax_lsx
2024-05-10 14:16:40 +08:00
gxw
cb0f707409
loongarch64: Fixed utest fork:safety
2024-05-10 14:16:36 +08:00
Martin Kroeker
b45d8e1ab2
remove stray comma
2024-05-09 12:33:19 +02:00
gxw
6017ad7146
loongarch64: Update dgemm_kernel_16x4 to dgemm_kernel_16x6
2024-05-08 10:10:26 +08:00
Martin Kroeker
992b71fea2
remove stray comma
2024-04-23 21:52:26 +02:00
Martin Kroeker
d421dec278
Merge pull request #4656 from zboszor/fix-x86-64-build-v2
Add forgotten conditional uses of PREFETCH
2024-04-23 21:05:08 +02:00
Martin Kroeker
ae695d4ca0
Merge pull request #4642 from XiWeiGu/loongarch64_clang
CI: Add clang test for loongarch64
2024-04-23 18:25:49 +02:00
gxw
7cd438a5ac
loongarch64: Fixed clang compilation issues
2024-04-23 19:19:11 +08:00
Zoltán Böszörményi
ca64861ce8
Add forgotten conditional uses of PREFETCH
This fixes a (cross-)compilation/linker error for PRESCOTT
on Yocto.
Signed-off-by: Zoltán Böszörményi <zoltan.boszormenyi@xenial.com>
2024-04-19 10:52:28 +02:00
gxw
9c39e969f5
mips64: Fixed MSA optimization bugs for zgemv and cgemv
2024-04-15 15:17:29 +08:00
Martin Kroeker
4c03ed437f
Fix SICORTEX ASUM/ZASUM and SUM/ZSUM for INCX <=0 ( #4640 )
* Exit early if INCX <= 0
2024-04-14 15:39:11 +02:00
Martin Kroeker
7cfd433d0c
revert the C/Z NRM2 kernels to the base NEON kernel as well
2024-04-12 15:34:04 +02:00
Martin Kroeker
93d975d8fd
Merge pull request #4593 from XiWeiGu/loongarch_add_buffer_offset
loongarch: Optimizing the performance of the GEMM on servers
2024-04-10 14:23:31 +02:00
gxw
d8c4ea8793
loongarch: Optimizing the performance of the GEMM on servers
2024-04-09 09:03:34 -04:00
Chen Yu
8e39c05efd
Get the l2 cache size via environment variable on confidential VM
CPUID (leaf 2 or leaf 0x80000006) is not supported on some confidential
VMs. As a result, get_l2_size() returns the default 512M, which causes
performance issues.
Introduce the environment variable OPENBLAS_L2_SIZE, which lets the user
provide the L2 cache size.
Suggested-by: "Keshavamurthy, Anil S" <anil.s.keshavamurthy@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
2024-04-05 11:39:01 +08:00
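A minimal sketch of the fallback logic, assuming the variable is consulted only when CPUID reports nothing and holds a plain positive integer in whatever unit get_l2_size() uses. The helper names (l2_size_from_env, l2_size_from_cpuid, get_l2_size_sketch) are hypothetical, not OpenBLAS functions.

```c
#include <assert.h>
#include <stdlib.h>

/* Parse an OPENBLAS_L2_SIZE-style value. Returns fallback when the string
   is missing, not a plain decimal integer, or out of range. */
int l2_size_from_env(const char *env, int fallback)
{
    if (env == NULL)
        return fallback;
    char *end;
    long v = strtol(env, &end, 10);
    if (end == env || *end != '\0' || v <= 0 || v > (1L << 30))
        return fallback;
    return (int)v;
}

/* Hypothetical stand-in for the CPUID query; returning 0 models a
   confidential VM where CPUID leaf 2 / 0x80000006 is unsupported. */
static int l2_size_from_cpuid(void)
{
    return 0;
}

/* Use CPUID when it works, else the user-supplied value, else 512
   (the default quoted in the commit message). */
int get_l2_size_sketch(void)
{
    int size = l2_size_from_cpuid();
    if (size > 0)
        return size;
    return l2_size_from_env(getenv("OPENBLAS_L2_SIZE"), 512);
}
```

Keeping the parsing in a separate function that takes the string as an argument makes it testable without touching the process environment.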
Martin Kroeker
441c81026e
Add support for Cortex-A76
2024-04-02 19:41:44 +02:00
Martin Kroeker
9ead81bd39
Revert S/DNRM2 to the base NEON kernel to fix precision loss
2024-04-02 15:59:20 +02:00
gxw
96607cbb98
loongarch: Fixed dzamax
Initialize the registers to prevent sporadic errors.
2024-03-25 23:17:53 -04:00
gxw
50869f6ca8
loongarch: Fixed zrot LSX opt
2024-03-19 10:08:11 +08:00
gxw
b5eb9d6bac
loongarch: Fixed {sc/dz}amax LSX opt
2024-03-19 09:56:11 +08:00
gxw
ad13e04669
loongarch: Fixed {s/d/sc/dz}amin LSX opt
2024-03-19 09:18:44 +08:00
gxw
bbf82cb624
loongarch: Fixed {s/d}axpby LSX opt
2024-03-18 17:51:42 +08:00
gxw
ac460eb42a
loongarch: Fixed i{c/z}amin LSX opt
2024-03-18 17:15:58 +08:00
gxw
60e251a1f8
loongarch: Fixed {sc/dz}amax LASX opt
2024-03-16 14:52:17 +08:00
gxw
a10dde5554
loongarch: Fixed {s/d/sc/dz}amin LASX opt
2024-03-16 14:52:14 +08:00
gxw
6534d378b7
loongarch: Fixed {s/d/c/z}sum LASX opt
2024-03-16 14:52:10 +08:00
gxw
6159cffc58
loongarch: Fixed i{s/c/z}amin LASX opt
2024-03-16 14:52:06 +08:00
gxw
7d755912b9
loongarch: Fixed {s/d/c/z}axpby LASX opt
2024-03-16 14:51:56 +08:00
Martin Kroeker
cf80bd8500
Update nrm2_rvv.c
2024-03-13 13:07:26 +01:00
Martin Kroeker
9baa757905
Update nrm2_vector.c
2024-03-13 11:40:14 +01:00
Martin Kroeker
18a6db6862
Update nrm2_vector.c
2024-03-13 11:10:26 +01:00
Martin Kroeker
3752e73919
handle incx < 0
2024-03-12 20:44:01 +01:00
Martin Kroeker
db70c7f7fb
handle incx < 0
2024-03-12 20:42:11 +01:00
Martin Kroeker
dee8557d58
handle incx < 0
2024-03-12 20:40:29 +01:00
Martin Kroeker
d9dff17aec
handle incx < 0
2024-03-12 20:38:23 +01:00
Martin Kroeker
552c521353
remove another early exit for incx < 0
2024-03-12 18:49:27 +01:00
Martin Kroeker
ed532dc75b
remove another early exit for incx < 0
2024-03-12 18:47:00 +01:00
Martin Kroeker
6b89e1f1d7
fix loop condition for incx < 0
2024-03-12 15:49:41 +01:00
Martin Kroeker
20016a0096
fix loop condition for incx < 0
2024-03-12 15:48:55 +01:00
Martin Kroeker
09e84bd29a
fix loop condition for incx < 0
2024-03-12 15:48:00 +01:00
Martin Kroeker
f747aedb52
fix loop condition for incx < 0
2024-03-12 15:47:17 +01:00
Martin Kroeker
23796f8d31
fix loop condition for incx < 0
2024-03-12 15:46:23 +01:00
Martin Kroeker
bf93459746
fix loop condition for incx < 0
2024-03-12 15:45:23 +01:00
Martin Kroeker
e41d01bad9
remove early exit on negative inc_x
2024-03-11 22:53:54 +01:00
Martin Kroeker
02a025f9c1
remove early exit on negative inc_x
2024-03-11 22:52:18 +01:00
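The loop-condition problem behind these commits can be sketched in C. The helper names are hypothetical, and a plain sum of squares stands in for nrm2, which in practice also scales to avoid overflow. A loop bounded by an offset such as i < n * inc_x never runs when inc_x is negative; counting elements instead, and starting at offset (n-1)*|inc_x| as reference BLAS does for negative increments, visits the same elements in reverse order.

```c
#include <assert.h>

/* Buggy bound: for inc_x < 0, n * inc_x is negative, so i < n * inc_x is
   false at once and the loop body never runs. */
double sumsq_buggy(long n, const double *x, long inc_x)
{
    double s = 0.0;
    for (long i = 0; i < n * inc_x; i += inc_x)
        s += x[i] * x[i];
    return s;
}

/* Count elements rather than comparing offsets, and for a negative stride
   start at offset (n - 1) * |inc_x|, as reference BLAS does. */
double sumsq_fixed(long n, const double *x, long inc_x)
{
    double s = 0.0;
    if (n <= 0)
        return s;
    long ix = inc_x < 0 ? (n - 1) * -inc_x : 0;
    for (long k = 0; k < n; k++, ix += inc_x)
        s += x[ix] * x[ix];
    return s;
}
```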
pengxu
680a77fafc
Optimized ssymv and dsymv kernel LSX for LoongArch
2024-03-05 20:36:59 +08:00
Chris Sidebottom
7a6fa699f2
Small GEMM for AArch64
This is a fairly conservative addition of small matrix kernels using
SVE.
2024-03-04 15:48:47 +00:00
pengxu
6546600342
Optimized ssymv and dsymv kernel LASX for LoongArch
2024-03-04 16:18:39 +08:00
Chip-Kerchner
99384933ff
Revert "Merge pull request #4532 from austinpagan/cgemm_zgemm_c_code"
This reverts commit accea15551, reversing
changes made to b925353006.
2024-03-01 07:57:39 -06:00
Martin Kroeker
577d480c62
Merge pull request #4529 from ErnstPeng/feature-branch
Optimized sgemv and dgemv kernel LSX for LoongArch
2024-02-28 13:49:54 +01:00
pengxu
b2db064285
Optimized sgemv and dgemv kernel LSX for LoongArch
2024-02-28 18:07:27 +08:00
Martin Kroeker
cfbb701497
Merge pull request #4536 from XiWeiGu/loongarch64-cgemv-zgemv-opt
Loongarch64 cgemv zgemv opt
2024-02-28 10:15:34 +01:00
gxw
8e05c053be
LoongArch64: Fixed the failing test cases test_{c/z}gemv_n in test_extensions
2024-02-27 22:19:26 -05:00
gxw
3f22fc2233
LoongArch64: Add zgemv LSX opt
2024-02-27 22:19:04 -05:00
gxw
c508a10cf2
LoongArch64: Add cgemv LSX opt
2024-02-27 22:17:30 -05:00
Martin Kroeker
accea15551
Merge pull request #4532 from austinpagan/cgemm_zgemm_c_code
Cgemm zgemm c code
2024-02-27 22:07:07 +01:00
Martin Kroeker
8e872a91a9
Fix erroneous mapping of SUM kernels to ASUM
2024-02-27 11:28:50 +01:00
Martin Kroeker
6699227d45
Merge pull request #4525 from XiWeiGu/loongarch64_fixed_kernel_regress_skx_avx
LoongArch64: Fixed utest kernel_regress:skx_avx
2024-02-26 09:49:34 +01:00
gxw
8dea25ffff
LoongArch64: Fixed utest kernel_regress:skx_avx
2024-02-26 02:04:37 -05:00
Martin Kroeker
7d506984fa
fix assignment of default CSUM kernel
2024-02-25 17:57:11 +01:00
Martin Kroeker
12787775d9
add csum/zsum kernels (trivially derived from the asum ones)
2024-02-25 17:55:36 +01:00
Martin Kroeker
8f8ef3492a
Add CSUM and ZSUM kernels (trivially derived from their existing ASUM counterparts)
2024-02-24 23:57:50 +01:00
Martin Kroeker
be5e18c6f9
Add kernel definitions for CSUM and ZSUM
2024-02-24 23:55:43 +01:00
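The relation between the ASUM and SUM kernels can be sketched for double complex data stored as interleaved (re, im) pairs. This assumes, as the "trivially derived" wording suggests, that ZSUM is ZASUM with the absolute values dropped; the function names are hypothetical. Both exit early for n <= 0 or inc_x <= 0, as reference BLAS does for ASUM.

```c
#include <assert.h>
#include <math.h>

/* ZASUM-style: sum of |re| + |im| over n complex elements stored as
   interleaved (re, im) doubles; inc_x counts complex elements. */
double zasum_sketch(long n, const double *x, long inc_x)
{
    double s = 0.0;
    if (n <= 0 || inc_x <= 0)
        return s;
    for (long i = 0; i < n; i++) {
        const double *z = x + 2 * i * inc_x;
        s += fabs(z[0]) + fabs(z[1]);
    }
    return s;
}

/* ZSUM-style: the same loop with the absolute values dropped
   (assumed to be what "trivially derived" amounts to). */
double zsum_sketch(long n, const double *x, long inc_x)
{
    double s = 0.0;
    if (n <= 0 || inc_x <= 0)
        return s;
    for (long i = 0; i < n; i++) {
        const double *z = x + 2 * i * inc_x;
        s += z[0] + z[1];
    }
    return s;
}
```

For x = (1-2i, 3-4i), zasum_sketch returns 10 while zsum_sketch returns -2, so routing SUM to an ASUM kernel gives a different answer whenever entries are negative.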
gxw
990507e3b8
LoongArch64: Opt zgemv with LASX
2024-02-22 11:58:02 +08:00
gxw
d51ffec3a2
LoongArch64: Opt cgemv with LASX
2024-02-22 11:56:04 +08:00
pengxu
4787a55c64
Optimized cgemm kernel 16x4 LASX for LoongArch
2024-02-21 15:28:47 +08:00
Sergei Lewis
ba17758c02
fix axpy implementations where y has a stride of 0
2024-02-16 16:00:38 +00:00
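Why a stride of 0 for y needs special care can be shown with a small C sketch. The names are hypothetical, and the blocked version only imitates what a vector kernel does. In the element-by-element loop every update lands on y[0] and accumulates, whereas a kernel that loads, updates and stores a block of y values keeps only the last lane's result when all lanes alias y[0].

```c
#include <assert.h>

/* Element-by-element AXPY, y := alpha * x + y, with BLAS-style strides.
   With inc_y == 0 each iteration reads and writes y[0], so the updates
   accumulate into y[0] + alpha * sum(x). */
void daxpy_ref(long n, double alpha, const double *x, long inc_x,
               double *y, long inc_y)
{
    if (n <= 0)
        return;
    long ix = inc_x < 0 ? (n - 1) * -inc_x : 0;
    long iy = inc_y < 0 ? (n - 1) * -inc_y : 0;
    for (long i = 0; i < n; i++, ix += inc_x, iy += inc_y)
        y[iy] += alpha * x[ix];
}

/* Imitation of a vector kernel (unit-stride x, inc_y >= 0): load 4 y
   values, update them independently, store them back. If inc_y == 0 all
   4 lanes alias y[0], so each store overwrites the previous one and 3 of
   4 updates per block are lost. */
void daxpy_blocked_naive(long n, double alpha, const double *x,
                         double *y, long inc_y)
{
    long i = 0;
    for (; i + 4 <= n; i += 4) {
        double t[4];
        for (int k = 0; k < 4; k++)      /* "vector load" */
            t[k] = y[(i + k) * inc_y];
        for (int k = 0; k < 4; k++)      /* "vector multiply-add" */
            t[k] += alpha * x[i + k];
        for (int k = 0; k < 4; k++)      /* "vector store" */
            y[(i + k) * inc_y] = t[k];
    }
    for (; i < n; i++)                   /* scalar tail */
        y[i * inc_y] += alpha * x[i];
}
```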
Dmitry Mikushin
d0f5dc763b
Add the USE_GEMM3M macro to kernel targets so that the *gemm3m functions and parameters can be included in the gotoblas structure. Fixes #4500
2024-02-12 02:29:58 +01:00
Sergei Lewis
ff1523163f
Fix axpy test hangs when n==0. Reenable zaxpy_vector kernel for C910V.
2024-02-09 12:59:14 +00:00
pengxu
fe3da43b7d
Optimized zgemm kernel 8*4 LASX, 4*4 LSX and cgemm kernel 8*4 LSX for LoongArch
2024-02-06 11:49:01 +08:00
Martin Kroeker
e5d2725e5a
Merge pull request #4185 from XiWeiGu/mips_enable_msa
MIPS: Enable MSA
2024-02-05 15:50:16 +01:00
Martin Kroeker
b537528feb
Merge pull request #4480 from XiWeiGu/loongarch64-fixed-{s/d}amin-lsx
LoongArch64: Fixed {s/d}amin LSX optimization
2024-02-05 06:24:50 +01:00
Martin Kroeker
6d8a273cca
Handle zero increment(s) in C910V ?AXPBY ( #4483 )
* Handle zero increment(s)
2024-02-04 22:07:51 +01:00
Martin Kroeker
dbcf4f8b7d
Merge pull request #4479 from XiWeiGu/loongarch-opt-axpby
Loongarch opt axpby
2024-02-04 19:50:28 +01:00
Martin Kroeker
dc802dd637
Merge pull request #4474 from ChipKerchner/sgemmIncopy_PR
Vectorize in-copy packing/copying for SGEMM - up to 4X faster.
2024-02-04 18:51:09 +01:00
gxw
adde725321
LoongArch64: Fixed {s/d}amin LSX optimization
2024-02-04 14:44:47 +08:00
gxw
7bc93d95a1
LoongArch64: Opt {c/z}axpby
2024-02-04 11:23:31 +08:00
gxw
1e1f487dc7
LoongArch64: Fixed {s/d}axpby
2024-02-04 09:41:37 +08:00
Martin Kroeker
4d8dee508c
temporarily disable the CAXPY/ZAXPY kernels
2024-02-04 01:05:03 +01:00
austinpagan
87ba528d8b
Changed C files to straighten out indentation. Removed commented-out lines from the other file.
2024-02-01 18:46:07 -06:00
austinpagan
461cf9083c
Merge remote-tracking branch 'origin/develop' into cgemm_zgemm_c_code
2024-02-01 12:40:04 -06:00
austinpagan
ddac75e0ef
Adding .C versions of CGEMM and ZGEMM
2024-02-01 12:24:25 -06:00
Chip Kerchner
2bb7ea64a1
Only vectorize 64-bit version for Power8.
2024-02-01 08:11:43 -06:00
Sergei Lewis
3ffd6868d7
Merge branch 'develop' into dev/slewis/merge-from-riscv
2024-02-01 11:29:41 +00:00
Sergei Lewis
a3b0ef6596
Restore riscv64 fixes from develop branch: dot product double precision accumulation, zscal NaN handling
2024-02-01 10:32:00 +00:00
Martin Kroeker
d1343302bd
Merge pull request #4465 from XiWeiGu/utest-zscal
utest: Add tests for zscal
2024-01-31 14:19:19 +01:00
gxw
969601a1dc
X86_64: Fixed bug in zscal
Fixed handling of NAN and INF arguments when
inc is greater than 1.
2024-01-31 11:23:59 +08:00
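The core of the zscal NaN/INF fixes can be sketched in C: a shortcut that writes zeros whenever alpha == 0 disagrees with the plain complex multiply, because 0 * INF and 0 * NaN are NaN under IEEE 754. This sketch assumes the plain multiply (as in the reference Fortran ZSCAL) is the intended result; the function names are hypothetical, and inc_x counts complex elements, so the strided (inc > 1) path goes through the same code.

```c
#include <assert.h>
#include <math.h>

/* z := (ar + i*ai) * z for one complex element stored as (re, im). */
static void cmul_inplace(double ar, double ai, double *z)
{
    double re = ar * z[0] - ai * z[1];
    z[1] = ar * z[1] + ai * z[0];
    z[0] = re;
}

/* Shortcut: write zeros whenever alpha == 0. Fast, but turns INF or NaN
   in x into 0 instead of the NaN that 0 * INF or 0 * NaN produces. */
void zscal_shortcut(long n, double ar, double ai, double *x, long inc_x)
{
    for (long i = 0; i < n; i++) {
        double *z = x + 2 * i * inc_x;   /* inc_x counts complex elements */
        if (ar == 0.0 && ai == 0.0) {
            z[0] = 0.0;
            z[1] = 0.0;
        } else {
            cmul_inplace(ar, ai, z);
        }
    }
}

/* Full multiply for every element, so INF and NaN propagate the same way
   for unit and non-unit strides. */
void zscal_full(long n, double ar, double ai, double *x, long inc_x)
{
    for (long i = 0; i < n; i++)
        cmul_inplace(ar, ai, x + 2 * i * inc_x);
}
```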
Martin Kroeker
98c9ff3194
Merge pull request #4464 from XiWeiGu/loongarch64-zscal
LoongArch64: Handle NAN and INF
2024-01-30 22:53:29 +01:00
Chip Kerchner
09bb48d1b9
Vectorize in-copy packing/copying for SGEMM - 4X faster.
2024-01-30 09:13:16 -06:00
gxw
83ce97a4ca
LoongArch64: Handle NAN and INF
2024-01-30 17:17:30 +08:00
gxw
a79d117405
LoongArch64: Fixed bug for {s/d}amin
2024-01-30 11:32:57 +08:00
Sergei Lewis
1093def0d1
Merge branch 'risc-v' into develop
2024-01-29 11:11:39 +00:00
Martin Kroeker
889c5d026a
Merge pull request #4456 from kseniyazaytseva/riscv-rvv10
Fix BLAS and LAPACK tests for RVV 1.0 target, update to 0.12.0 intrinsics
2024-01-26 13:31:09 +01:00
Martin Kroeker
4e2a32ff51
Merge pull request #4454 from kseniyazaytseva/riscv-rvv07
Fix BLAS and LAPACK tests for C910V and RISCV64_ZVL256B targets
2024-01-26 11:40:46 +01:00
gxw
276e3ebf9e
LoongArch64: Add dzamax and dzamin opt
2024-01-26 10:03:50 +08:00
Martin Kroeker
a21b2fa5e4
Merge pull request #4452 from kseniyazaytseva/riscv-generic
Fix BLAS, BLAS-like functions and Generic RISC-V kernels
2024-01-24 17:52:25 +01:00
Andrey Sokolov
9c49a81d54
Resolve conflicts
2024-01-23 19:08:53 +03:00
kseniyazaytseva
e1afb23811
Fix BLAS and LAPACK tests for C910V and RISCV64_ZVL256B targets
* Fixed bugs in dgemm, [a]min/max, asum kernels
* Added zero checks for BLAS kernels
* Added dsdot implementation for RVV 0.7.1
* Fixed bugs in _vector files for C910V and RISCV64_ZVL256B targets
* Added additional definitions for RISCV64_ZVL256B target
2024-01-23 19:01:31 +03:00
Octavian Maghiar
deecfb1a39
Merge branch 'risc-v' into img-riscv64-zvl128b
2024-01-19 12:26:38 +00:00
kseniyazaytseva
5222b5fc18
Added axpby kernels for GENERIC RISC-V target
2024-01-18 23:22:26 +03:00
kseniyazaytseva
ff41cf5c49
Fix BLAS, BLAS-like functions and Generic RISC-V kernels
* Fixed gemmt, imatcopy, zimatcopy_cnc functions
* Fixed cblas_cscal testing in ctest
* Removed unreachable code in rotmg
* Added zero size checks
2024-01-18 23:19:52 +03:00
kseniyazaytseva
b193ea3d7b
Fix BLAS and LAPACK tests for RVV 1.0 target, update to 0.12.0 intrinsics
* Update intrinsics API to version 0.12.0 (Stride Segment Loads/Stores)
* Fixed nrm2, axpby, ncopy, zgemv and scal kernels
* Added zero size checks
2024-01-18 22:14:32 +03:00
Martin Kroeker
88e994116c
Merge pull request #4354 from imaginationtech/img-rvv-kernel-generator
[RISC-V] Improve RVV kernel generator LMUL usage
2024-01-17 15:19:37 +01:00
Dirreke
ec89466e14
Add CSKY support
2024-01-16 23:45:06 +08:00
Sergei Lewis
9edb805e64
fix builds with t-head toolchains that use old versions of the intrinsics spec
2024-01-16 14:33:08 +00:00
Martin Kroeker
0d2e486edf
Handle NAN and INF
2024-01-15 11:18:59 +01:00
Martin Kroeker
5f5b7c4f45
Merge pull request #4423 from martin-frbg/issue4422
Check compiler support for AVX512BF16 and base COL/SPR kernel choice on that
2024-01-12 16:30:50 +01:00
Martin Kroeker
f31bea07dd
Merge pull request #4419 from martin-frbg/issue4413
[WIP] Add fixes and utests for ZSCAL with NaN or Inf arguments
2024-01-12 14:27:08 +01:00
Martin Kroeker
20413ee6ec
Update zscal.c
2024-01-12 13:11:13 +01:00
Martin Kroeker
b57627c27f
Handle NAN and INF
2024-01-12 12:03:08 +01:00
Martin Kroeker
995a990e24
Make AVX512 BFLOAT16 kernels conditional on compiler capability
2024-01-12 00:12:46 +01:00
Martin Kroeker
7df363e1e2
temporarily disable the MSA C/ZSCAL kernels
2024-01-12 00:08:52 +01:00
Chip-Kerchner
058dd2a4cb
Replace two vector loads with one vector pair load and fix endianness of stores - DGEMM versions.
2024-01-08 14:16:09 -06:00
Martin Kroeker
1c31f56e5a
Handle NAN
2024-01-08 16:11:25 +01:00
Martin Kroeker
7ee1ee38e2
Handle NaN in input
2024-01-08 14:20:07 +01:00
Martin Kroeker
f637e12713
Handle INF and NAN
2024-01-08 09:52:38 +01:00
Martin Kroeker
25b0c48082
Update zscal.c
2024-01-08 09:49:18 +01:00
Martin Kroeker
5e7f714e93
Update zscal.c
2024-01-08 08:17:40 +01:00
Martin Kroeker
cf8b03ae8b
Use NAN rather than SNAN for portability
2024-01-07 23:09:57 +01:00
Martin Kroeker
f0808d856b
Handle NAN in input
2024-01-07 20:27:29 +01:00
Martin Kroeker
acf17a825d
Handle NAN in input
2024-01-07 20:26:16 +01:00
Martin Kroeker
c9df62e883
Fix handling of NAN
2024-01-07 17:49:40 +01:00
Martin Kroeker
def4996170
Fix handling of NAN and INF arguments
2024-01-07 15:29:42 +01:00
Martin Kroeker
519b40fad9
Merge pull request #4398 from yinshiyou/la-dev
Add Optimizations for LoongArch.
2023-12-30 19:51:08 +01:00
pengxu
a5d0d21378
loongarch64: Add zgemm and cgemm optimization
2023-12-29 18:06:26 +08:00
gxw
546f13558c
loongarch64: Add {c/z}swap and {c/z}sum optimization
2023-12-29 17:30:57 +08:00
Hao Chen
edabb93668
loongarch64: Refine axpby optimization functions.
2023-12-29 17:30:57 +08:00
Hao Chen
1ec5dded43
loongarch64: Add c/zrot optimization functions.
Signed-off-by: Hao Chen <chenhao@loongson.cn>
2023-12-29 17:30:57 +08:00
Hao Chen
3c53ded315
loongarch64: Add c/znrm2 optimization functions.
2023-12-29 17:30:57 +08:00
Hao Chen
fbd612f8c4
loongarch64: Add ic/zamin optimization functions.
2023-12-29 17:30:57 +08:00
Hao Chen
d97272cb35
loongarch64: Add c/zdot optimization functions.
2023-12-29 17:30:57 +08:00
Hao Chen
65a0aeb128
loongarch64: Add c/zcopy optimization functions.
Signed-off-by: Hao Chen <chenhao@loongson.cn>
2023-12-29 17:30:57 +08:00
Hao Chen
2a34fb4b80
loongarch64: Add and refine scal optimization functions.
Signed-off-by: Hao Chen <chenhao@loongson.cn>
2023-12-29 17:30:57 +08:00
Hao Chen
8785e948b5
loongarch64: Add camin optimization function.
2023-12-29 17:30:57 +08:00
Hao Chen
0753848e03
loongarch64: Refine and add axpy optimization functions.
Signed-off-by: Hao Chen <chenhao@loongson.cn>
2023-12-29 17:30:57 +08:00
Hao Chen
06fd5b5995
loongarch64: Add and Refine asum optimization functions.
2023-12-29 17:30:57 +08:00