Commit Graph

2343 Commits

Author SHA1 Message Date
Martin Kroeker 020b3e1682
fix handling of INF arguments 2024-06-01 00:51:18 +02:00
Martin Kroeker 8c05765a5a
fix other corner cases where x=INF 2024-05-31 18:06:36 +02:00
Martin Kroeker 516743f7dc
fix other instances of mishandling INF 2024-05-31 16:02:12 +02:00
Martin Kroeker 9ff4e9714e
additional fixes for handling INF arguments 2024-05-31 15:44:07 +02:00
Martin Kroeker ce130f11d2
Update zscal.c 2024-05-31 15:09:03 +02:00
Martin Kroeker ab13cfef93
more fixes for infinite x 2024-05-31 14:34:49 +02:00
Martin Kroeker ad2b5c67c8
fix another corner case involving infinity 2024-05-31 01:06:58 +02:00
Bart Oldeman 62f7b244ff Replace use of FLT_MAX in x86_64 zscal.c by isinf()
Commit def4996 fixed issues with inf and nan values in zscal,
but used FLT_MAX, where DBL_MAX or isinf() is more appropriate,
as FLT_MAX is for single precision only.
Using FLT_MAX caused test case failures in the LAPACK tests.

isinf() is consistent with the later fix 969601a1
2024-05-24 17:20:27 +00:00
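The FLT_MAX problem described in this commit can be illustrated with a minimal sketch. The two helper functions below are hypothetical and are not taken from zscal.c; they only assume that the earlier code compared double-precision values against FLT_MAX to detect infinity.

```c
#include <float.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical check (assumption, not the actual zscal.c code): treats any
   magnitude above FLT_MAX as infinite. For doubles this is wrong, because
   finite doubles can be far larger than FLT_MAX (roughly 3.4e38). */
static bool looks_infinite_flt_max(double x) {
    return fabs(x) > FLT_MAX;
}

/* The same check using isinf(), which works for any floating-point type. */
static bool looks_infinite_isinf(double x) {
    return isinf(x) != 0;
}
```

Under these assumptions, a finite double such as 1e300 is misclassified as infinite by the first helper and handled correctly by the second, which is the kind of discrepancy that could surface as LAPACK test failures.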
Rajalakshmi Srinivasaraghavan e112191b54 POWER: Fix issues in zscal to address lapack failures
This patch fixes the following LAPACK failures with the clang compiler on POWER.
zed.out: ZVX:   18 out of  5190 tests failed to pass the threshold
zgd.out: ZGV drivers:     25 out of   1092 tests failed to pass the threshold
zgd.out: ZGV drivers:      6 out of   1092 tests failed to pass the threshold
2024-05-22 08:00:06 -05:00
Martin Kroeker aa259b141d
Merge pull request #4704 from amritahs-ibm/saxpy_perf_fix
Fix SAXPY regression when compiled with the OpenXL compiler.
2024-05-18 19:11:25 +02:00
Matthias Langer 0050a9660b Correctly detect ARM Neoverse V2 CPUs. 2024-05-16 09:59:52 +00:00
Chip Kerchner 3a1417671a POWER: Fixing endianness issue in cswap/zswap kernel for AIX 2024-05-15 19:36:46 -05:00
Amrita H S 87b3d9054f Fix SAXPY regression when compiled with the OpenXL compiler.
SAXPY built with OpenXL regresses compared to SAXPY
built with gcc. The OpenXL compiler does not know that the
SAXPY inner kernel assembly is a 64-element loop, so it
treats the remainder loop as the main loop. It vectorizes
and interleaves the remainder into a loop of 48 elements per
iteration. With a maximum of 63 iterations, the 48-element
loop will mostly not be executed, so the 1-element
scalar loop that handles the remainder after it is probably
what gets executed most of the time.

This can be fixed by adding the pragma loop interleave_count(2),
which results in an 8-element loop.

Signed-off-by: Amrita H S <amritahs@linux.vnet.ibm.com>
2024-05-12 23:27:55 -05:00
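A minimal sketch of the pragma mentioned in this commit, applied to a remainder loop. The function name saxpy_tail and its signature are hypothetical and do not come from the OpenBLAS source; the pragma spelling is the clang loop hint form, which is assumed to be what a clang-based compiler such as OpenXL accepts.

```c
#include <stddef.h>

/* Hypothetical remainder loop for y += alpha * x (an illustration, not the
   actual OpenBLAS kernel). The interleave_count(2) hint asks a clang-based
   compiler to interleave only two vector iterations, so the vectorized
   loop body stays small relative to the short trip count. */
static void saxpy_tail(size_t n, float alpha, const float *x, float *y) {
#if defined(__clang__)
#pragma clang loop interleave_count(2)
#endif
    for (size_t i = 0; i < n; i++) {
        y[i] += alpha * x[i];
    }
}
```

The hint changes only how the loop is compiled, not its result. It is guarded so that compilers that do not define __clang__ ignore it.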
Martin Kroeker 8da6f7e5f2
Merge pull request #4686 from XiWeiGu/loongarch64_dgemm_kernel_16x6
Loongarch64: Improving the Performance and Stability of dgemm
2024-05-10 11:29:12 +02:00
gxw f9a26240a7 loongarch64: Fixed icamax_lsx 2024-05-10 14:16:40 +08:00
gxw cb0f707409 loongarch64: Fixed utest fork:safety 2024-05-10 14:16:36 +08:00
Martin Kroeker b45d8e1ab2
remove stray comma 2024-05-09 12:33:19 +02:00
gxw 6017ad7146 loongarch64: Update dgemm_kernel_16x4 to dgemm_kernel_16x6 2024-05-08 10:10:26 +08:00
Martin Kroeker 992b71fea2
remove stray comma 2024-04-23 21:52:26 +02:00
Martin Kroeker d421dec278
Merge pull request #4656 from zboszor/fix-x86-64-build-v2
Add forgotten conditional uses of PREFETCH
2024-04-23 21:05:08 +02:00
Martin Kroeker ae695d4ca0
Merge pull request #4642 from XiWeiGu/loongarch64_clang
CI: Add clang test for loongarch64
2024-04-23 18:25:49 +02:00
gxw 7cd438a5ac loongarch64: Fixed clang compilation issues 2024-04-23 19:19:11 +08:00
Zoltán Böszörményi ca64861ce8 Add forgotten conditional uses of PREFETCH
This fixes a (cross-)compilation/linker error for PRESCOTT
on Yocto.

Signed-off-by: Zoltán Böszörményi <zoltan.boszormenyi@xenial.com>
2024-04-19 10:52:28 +02:00
gxw 9c39e969f5 mips64: Fixed MSA optimization bugs for zgemv and cgemv 2024-04-15 15:17:29 +08:00
Martin Kroeker 4c03ed437f
Fix SICORTEX ASUM/ZASUM and SUM/ZSUM for INCX <=0 (#4640)
* Exit early if INCX <= 0
2024-04-14 15:39:11 +02:00
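The early exit described in this commit can be sketched as a plain C reference routine. The name asum_ref is hypothetical and this is not the SICORTEX kernel; it only mirrors the reference BLAS convention that ASUM returns zero when n or incx is not positive.

```c
#include <math.h>

/* Hypothetical reference ASUM for doubles: sum of absolute values of n
   elements of x taken with stride inc_x. Returns 0 early when n <= 0 or
   inc_x <= 0, so no element is read in those cases. */
static double asum_ref(long n, const double *x, long inc_x) {
    if (n <= 0 || inc_x <= 0) {
        return 0.0;
    }
    double sum = 0.0;
    for (long i = 0; i < n; i++) {
        sum += fabs(x[i * inc_x]);
    }
    return sum;
}
```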
Martin Kroeker 7cfd433d0c
revert the C/Z NRM2 kernels to the base NEON kernel as well 2024-04-12 15:34:04 +02:00
Martin Kroeker 93d975d8fd
Merge pull request #4593 from XiWeiGu/loongarch_add_buffer_offset
loongarch: Optimizing the performance of the GEMM on servers
2024-04-10 14:23:31 +02:00
gxw d8c4ea8793 loongarch: Optimizing the performance of the GEMM on servers 2024-04-09 09:03:34 -04:00
Chen Yu 8e39c05efd Get the l2 cache size via environment variable on confidential VM
CPUID (leaf 2 or leaf 0x80000006) is not supported on some confidential
VMs. As a result, get_l2_size() returns the default 512M, which causes
performance issues.

Introduce the environment variable OPENBLAS_L2_SIZE, set by the user,
to supply the L2 cache size.

Suggested-by: "Keshavamurthy, Anil S" <anil.s.keshavamurthy@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
2024-04-05 11:39:01 +08:00
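A minimal sketch of an environment-variable override with a fallback, in the spirit of this commit. The helper names, the parsing rules, and the treatment of the value as a plain positive integer are assumptions; the commit message does not state the unit or the exact parsing used by OpenBLAS.

```c
#include <stdlib.h>

/* Hypothetical parser: returns the positive decimal integer in s, or
   fallback when s is NULL, empty, malformed, or not positive. */
static long parse_positive_long(const char *s, long fallback) {
    if (s == NULL || *s == '\0') {
        return fallback;
    }
    char *end = NULL;
    long value = strtol(s, &end, 10);
    if (end == s || *end != '\0' || value <= 0) {
        return fallback;
    }
    return value;
}

/* Hypothetical lookup: reads the named environment variable (for example
   "OPENBLAS_L2_SIZE") and falls back to the detected or default size. */
static long l2_size_from_env(const char *name, long fallback) {
    return parse_positive_long(getenv(name), fallback);
}
```

Separating the parsing from the getenv() call keeps the validation easy to check on its own; a caller would pass whatever value the CPUID-based detection produced as the fallback.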
Martin Kroeker 441c81026e
Add support for Cortex-A76 2024-04-02 19:41:44 +02:00
Martin Kroeker 9ead81bd39
Revert S/DNRM2 to the base NEON kernel to fix precision loss 2024-04-02 15:59:20 +02:00
gxw 96607cbb98 loongarch: Fixed dzamax
Initialize the registers to prevent sporadic errors.
2024-03-25 23:17:53 -04:00
gxw 50869f6ca8 loongarch: Fixed zrot LSX opt 2024-03-19 10:08:11 +08:00
gxw b5eb9d6bac loongarch: Fixed {sc/dz}amax LSX opt 2024-03-19 09:56:11 +08:00
gxw ad13e04669 loongarch: Fixed {s/d/sc/dz}amin LSX opt 2024-03-19 09:18:44 +08:00
gxw bbf82cb624 loongarch: Fixed {s/d}axpby LSX opt 2024-03-18 17:51:42 +08:00
gxw ac460eb42a loongarch: Fixed i{c/z}amin LSX opt 2024-03-18 17:15:58 +08:00
gxw 60e251a1f8 loongarch: Fixed {sc/dz}amax LASX opt 2024-03-16 14:52:17 +08:00
gxw a10dde5554 loongarch: Fixed {s/d/sc/dz}amin LASX opt 2024-03-16 14:52:14 +08:00
gxw 6534d378b7 loongarch: Fixed {s/d/c/z}sum LASX opt 2024-03-16 14:52:10 +08:00
gxw 6159cffc58 loongarch: Fixed i{s/c/z}amin LASX opt 2024-03-16 14:52:06 +08:00
gxw 7d755912b9 loongarch: Fixed {s/d/c/z}axpby LASX opt 2024-03-16 14:51:56 +08:00
Martin Kroeker cf80bd8500
Update nrm2_rvv.c 2024-03-13 13:07:26 +01:00
Martin Kroeker 9baa757905
Update nrm2_vector.c 2024-03-13 11:40:14 +01:00
Martin Kroeker 18a6db6862
Update nrm2_vector.c 2024-03-13 11:10:26 +01:00
Martin Kroeker 3752e73919
handle incx < 0 2024-03-12 20:44:01 +01:00
Martin Kroeker db70c7f7fb
handle incx < 0 2024-03-12 20:42:11 +01:00
Martin Kroeker dee8557d58
handle incx < 0 2024-03-12 20:40:29 +01:00
Martin Kroeker d9dff17aec
handle incx < 0 2024-03-12 20:38:23 +01:00
Martin Kroeker 552c521353
remove another early exit for incx < 0 2024-03-12 18:49:27 +01:00
Martin Kroeker ed532dc75b
remove another early exit for incx < 0 2024-03-12 18:47:00 +01:00
Martin Kroeker 6b89e1f1d7
fix loop condition for incx < 0 2024-03-12 15:49:41 +01:00
Martin Kroeker 20016a0096
fix loop condition for incx < 0 2024-03-12 15:48:55 +01:00
Martin Kroeker 09e84bd29a
fix loop condition for incx < 0 2024-03-12 15:48:00 +01:00
Martin Kroeker f747aedb52
fix loop condition for incx < 0 2024-03-12 15:47:17 +01:00
Martin Kroeker 23796f8d31
fix loop condition for incx < 0 2024-03-12 15:46:23 +01:00
Martin Kroeker bf93459746
fix loop condition for incx < 0 2024-03-12 15:45:23 +01:00
Martin Kroeker e41d01bad9
remove early exit on negative inc_x 2024-03-11 22:53:54 +01:00
Martin Kroeker 02a025f9c1
remove early exit on negative inc_x 2024-03-11 22:52:18 +01:00
pengxu 680a77fafc Optimized ssymv and dsymv kernel LSX for LoongArch 2024-03-05 20:36:59 +08:00
Chris Sidebottom 7a6fa699f2 Small GEMM for AArch64
This is a fairly conservative addition of small matrix kernels using
SVE.
2024-03-04 15:48:47 +00:00
pengxu 6546600342 Optimized ssymv and dsymv kernel LASX for LoongArch 2024-03-04 16:18:39 +08:00
Chip-Kerchner 99384933ff Revert "Merge pull request #4532 from austinpagan/cgemm_zgemm_c_code"
This reverts commit accea15551, reversing
changes made to b925353006.
2024-03-01 07:57:39 -06:00
Martin Kroeker 577d480c62
Merge pull request #4529 from ErnstPeng/feature-branch
Optimized sgemv and dgemv kernel LSX for LoongArch
2024-02-28 13:49:54 +01:00
pengxu b2db064285 Optimized sgemv and dgemv kernel LSX for LoongArch 2024-02-28 18:07:27 +08:00
Martin Kroeker cfbb701497
Merge pull request #4536 from XiWeiGu/loongarch64-cgemv-zgemv-opt
Loongarch64 cgemv zgemv opt
2024-02-28 10:15:34 +01:00
gxw 8e05c053be LoongArch64:Fixed the failed test cases test_{c/z}gemv_n in test_extensions 2024-02-27 22:19:26 -05:00
gxw 3f22fc2233 LoongArch64: Add zgemv LSX opt 2024-02-27 22:19:04 -05:00
gxw c508a10cf2 LoongArch64: Add cgemv LSX opt 2024-02-27 22:17:30 -05:00
Martin Kroeker accea15551
Merge pull request #4532 from austinpagan/cgemm_zgemm_c_code
Cgemm zgemm c code
2024-02-27 22:07:07 +01:00
Martin Kroeker 8e872a91a9
Fix erroneous mapping of SUM kernels to ASUM 2024-02-27 11:28:50 +01:00
Martin Kroeker 6699227d45
Merge pull request #4525 from XiWeiGu/loongarch64_fixed_kernel_regress_skx_avx
LoongArch64: Fixed utest kernel_regress:skx_avx
2024-02-26 09:49:34 +01:00
gxw 8dea25ffff LoongArch64: Fixed utest kernel_regress:skx_avx 2024-02-26 02:04:37 -05:00
Martin Kroeker 7d506984fa
fix assignment of default CSUM kernel 2024-02-25 17:57:11 +01:00
Martin Kroeker 12787775d9
add csum/zsum kernels (trivially derived from the asum ones) 2024-02-25 17:55:36 +01:00
Martin Kroeker 8f8ef3492a
Add CSUM and ZSUM kernels (trivially derived from their existing ASUM counterparts) 2024-02-24 23:57:50 +01:00
Martin Kroeker be5e18c6f9
Add kernel definitions for CSUM and ZSUM 2024-02-24 23:55:43 +01:00
gxw 990507e3b8 LoongArch64: Opt zgemv with LASX 2024-02-22 11:58:02 +08:00
gxw d51ffec3a2 LoongArch64: Opt cgemv with LASX 2024-02-22 11:56:04 +08:00
pengxu 4787a55c64 Optimized cgemm kernel 16x4 LASX for LoongArch 2024-02-21 15:28:47 +08:00
Sergei Lewis ba17758c02 fix axpy implementations where y has a stride of 0 2024-02-16 16:00:38 +00:00
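The zero-stride case can be illustrated with a scalar reference loop. The routine daxpy_ref is hypothetical and is not the fixed kernel; it only sketches the reference BLAS behaviour, where a stride of 0 for y means every update accumulates into the same element, something a vectorized kernel that assumes distinct y elements would get wrong.

```c
/* Hypothetical scalar reference AXPY for doubles: y += alpha * x with
   arbitrary strides. With inc_y == 0 all n updates land in y[0], one
   after another. Negative strides start from the far end, as in the
   reference BLAS. */
static void daxpy_ref(long n, double alpha, const double *x, long inc_x,
                      double *y, long inc_y) {
    if (n <= 0) {
        return;
    }
    long ix = inc_x < 0 ? (1 - n) * inc_x : 0;
    long iy = inc_y < 0 ? (1 - n) * inc_y : 0;
    for (long i = 0; i < n; i++) {
        y[iy] += alpha * x[ix];
        ix += inc_x;
        iy += inc_y;
    }
}
```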
Dmitry Mikushin d0f5dc763b Adding USE_GEMM3M macro to kernel targets, so that the *gemm3m functions and parameters can be included into the gotoblas structure. Fixes #4500 2024-02-12 02:29:58 +01:00
Sergei Lewis ff1523163f Fix axpy test hangs when n==0. Reenable zaxpy_vector kernel for C910V. 2024-02-09 12:59:14 +00:00
pengxu fe3da43b7d Optimized zgemm kernel 8*4 LASX, 4*4 LSX and cgemm kernel 8*4 LSX for LoongArch 2024-02-06 11:49:01 +08:00
Martin Kroeker e5d2725e5a
Merge pull request #4185 from XiWeiGu/mips_enable_msa
MIPS: Enable MSA
2024-02-05 15:50:16 +01:00
Martin Kroeker b537528feb
Merge pull request #4480 from XiWeiGu/loongarch64-fixed-{s/d}amin-lsx
LoongArch64: Fixed {s/d}amin LSX optimization
2024-02-05 06:24:50 +01:00
Martin Kroeker 6d8a273cca
Handle zero increment(s) in C910V ?AXPBY (#4483)
* Handle zero increment(s)
2024-02-04 22:07:51 +01:00
Martin Kroeker dbcf4f8b7d
Merge pull request #4479 from XiWeiGu/loongarch-opt-axpby
Loongarch opt axpby
2024-02-04 19:50:28 +01:00
Martin Kroeker dc802dd637
Merge pull request #4474 from ChipKerchner/sgemmIncopy_PR
Vectorize in-copy packing/copying for SGEMM - up to 4X faster.
2024-02-04 18:51:09 +01:00
gxw adde725321 LoongArch64: Fixed {s/d}amin LSX optimization 2024-02-04 14:44:47 +08:00
gxw 7bc93d95a1 LoongArch64: Opt {c/z}axpby 2024-02-04 11:23:31 +08:00
gxw 1e1f487dc7 LoongArch64: Fixed {s/d}axpby 2024-02-04 09:41:37 +08:00
Martin Kroeker 4d8dee508c
temporarily disable the CAXPY/ZAXPY kernels 2024-02-04 01:05:03 +01:00
austinpagan 87ba528d8b Changed C files to straighten out indentation. Removed commented lines from other file. 2024-02-01 18:46:07 -06:00
austinpagan 461cf9083c Merge remote-tracking branch 'origin/develop' into cgemm_zgemm_c_code 2024-02-01 12:40:04 -06:00
austinpagan ddac75e0ef Adding .C versions of CGEMM and ZGEMM 2024-02-01 12:24:25 -06:00
Chip Kerchner 2bb7ea64a1 Only vectorize 64-bit version for Power8. 2024-02-01 08:11:43 -06:00
Sergei Lewis 3ffd6868d7 Merge branch 'develop' into dev/slewis/merge-from-riscv 2024-02-01 11:29:41 +00:00
Sergei Lewis a3b0ef6596 Restore riscv64 fixes from develop branch: dot product double precision accumulation, zscal NaN handling 2024-02-01 10:32:00 +00:00
Martin Kroeker d1343302bd
Merge pull request #4465 from XiWeiGu/utest-zscal
utest: Add tests for zscal
2024-01-31 14:19:19 +01:00
gxw 969601a1dc X86_64: Fixed bug in zscal
Fixed handling of NAN and INF arguments when
inc is greater than 1.
2024-01-31 11:23:59 +08:00
Martin Kroeker 98c9ff3194
Merge pull request #4464 from XiWeiGu/loongarch64-zscal
LoongArch64: Handle NAN and INF
2024-01-30 22:53:29 +01:00
Chip Kerchner 09bb48d1b9 Vectorize in-copy packing/copying for SGEMM - 4X faster. 2024-01-30 09:13:16 -06:00
gxw 83ce97a4ca LoongArch64: Handle NAN and INF 2024-01-30 17:17:30 +08:00
gxw a79d117405 LoongArch64: Fixed bug for {s/d}amin 2024-01-30 11:32:57 +08:00
Sergei Lewis 1093def0d1 Merge branch 'risc-v' into develop 2024-01-29 11:11:39 +00:00
Martin Kroeker 889c5d026a
Merge pull request #4456 from kseniyazaytseva/riscv-rvv10
Fix BLAS and LAPACK tests for RVV 1.0 target, update to 0.12.0 intrinsics
2024-01-26 13:31:09 +01:00
Martin Kroeker 4e2a32ff51
Merge pull request #4454 from kseniyazaytseva/riscv-rvv07
Fix BLAS and LAPACK tests for C910V and RISCV64_ZVL256B targets
2024-01-26 11:40:46 +01:00
gxw 276e3ebf9e LoongArch64: Add dzamax and dzamin opt 2024-01-26 10:03:50 +08:00
Martin Kroeker a21b2fa5e4
Merge pull request #4452 from kseniyazaytseva/riscv-generic
Fix BLAS, BLAS-like functions and Generic RISC-V kernels
2024-01-24 17:52:25 +01:00
Andrey Sokolov 9c49a81d54 Resolve conflicts 2024-01-23 19:08:53 +03:00
kseniyazaytseva e1afb23811 Fix BLAS and LAPACK tests for C910V and RISCV64_ZVL256B targets
* Fixed bugs in dgemm, [a]min\max, asum kernels
* Added zero checks for BLAS kernels
* Added dsdot implementation for RVV 0.7.1
* Fixed bugs in _vector files for C910V and RISCV64_ZVL256B targets
* Added additional definitions for RISCV64_ZVL256B target
2024-01-23 19:01:31 +03:00
Octavian Maghiar deecfb1a39 Merge branch 'risc-v' into img-riscv64-zvl128b 2024-01-19 12:26:38 +00:00
kseniyazaytseva 5222b5fc18 Added axpby kernels for GENERIC RISC-V target 2024-01-18 23:22:26 +03:00
kseniyazaytseva ff41cf5c49 Fix BLAS, BLAS-like functions and Generic RISC-V kernels
* Fixed gemmt, imatcopy, zimatcopy_cnc functions
* Fixed cblas_cscal testing in ctest
* Removed rotmg unreachable code
* Added zero size checks
2024-01-18 23:19:52 +03:00
kseniyazaytseva b193ea3d7b Fix BLAS and LAPACK tests for RVV 1.0 target, update to 0.12.0 intrinsics
* Update intrinsics API to 0.12.0 version (Stride Segment Loads/Stores)
* Fixed nrm2, axpby, ncopy, zgemv and scal kernels
* Added zero size checks
2024-01-18 22:14:32 +03:00
Martin Kroeker 88e994116c
Merge pull request #4354 from imaginationtech/img-rvv-kernel-generator
[RISC-V] Improve RVV kernel generator LMUL usage
2024-01-17 15:19:37 +01:00
Dirreke ec89466e14 Add CSKY support 2024-01-16 23:45:06 +08:00
Sergei Lewis 9edb805e64 fix builds with t-head toolchains that use old versions of the intrinsics spec 2024-01-16 14:33:08 +00:00
Martin Kroeker 0d2e486edf
Handle NAN and INF 2024-01-15 11:18:59 +01:00
Martin Kroeker 5f5b7c4f45
Merge pull request #4423 from martin-frbg/issue4422
Check compiler support for AVX512BF16 and base COL/SPR kernel choice on that
2024-01-12 16:30:50 +01:00
Martin Kroeker f31bea07dd
Merge pull request #4419 from martin-frbg/issue4413
[WIP] Add fixes and utests for ZSCAL with NaN or Inf arguments
2024-01-12 14:27:08 +01:00
Martin Kroeker 20413ee6ec
Update zscal.c 2024-01-12 13:11:13 +01:00
Martin Kroeker b57627c27f
Handle NAN and INF 2024-01-12 12:03:08 +01:00
Martin Kroeker 995a990e24
Make AVX512 BFLOAT16 kernels conditional on compiler capability 2024-01-12 00:12:46 +01:00
Martin Kroeker 7df363e1e2
temporarily disable the MSA C/ZSCAL kernels 2024-01-12 00:08:52 +01:00
Chip-Kerchner 058dd2a4cb Replace two vector loads with one vector pair load and fix endianness of stores - DGEMM versions. 2024-01-08 14:16:09 -06:00
Martin Kroeker 1c31f56e5a
Handle NAN 2024-01-08 16:11:25 +01:00
Martin Kroeker 7ee1ee38e2
Handle NaN in input 2024-01-08 14:20:07 +01:00
Martin Kroeker f637e12713
Handle INF and NAN 2024-01-08 09:52:38 +01:00
Martin Kroeker 25b0c48082
Update zscal.c 2024-01-08 09:49:18 +01:00
Martin Kroeker 5e7f714e93
Update zscal.c 2024-01-08 08:17:40 +01:00
Martin Kroeker cf8b03ae8b
Use NAN rather than SNAN for portability 2024-01-07 23:09:57 +01:00
Martin Kroeker f0808d856b
Handle NAN in input 2024-01-07 20:27:29 +01:00
Martin Kroeker acf17a825d
Handle NAN in input 2024-01-07 20:26:16 +01:00
Martin Kroeker c9df62e883
Fix handling of NAN 2024-01-07 17:49:40 +01:00
Martin Kroeker def4996170
Fix handling of NAN and INF arguments 2024-01-07 15:29:42 +01:00
Martin Kroeker 519b40fad9
Merge pull request #4398 from yinshiyou/la-dev
Add Optimizations for LoongArch.
2023-12-30 19:51:08 +01:00
pengxu a5d0d21378 loongarch64: Add zgemm and cgemm optimization 2023-12-29 18:06:26 +08:00
gxw 546f13558c loongarch64: Add {c/z}swap and {c/z}sum optimization 2023-12-29 17:30:57 +08:00
Hao Chen edabb93668 loongarch64: Refine axpby optimization functions. 2023-12-29 17:30:57 +08:00
Hao Chen 1ec5dded43 loongarch64: Add c/zrot optimization functions.
Signed-off-by: Hao Chen <chenhao@loongson.cn>
2023-12-29 17:30:57 +08:00
Hao Chen 3c53ded315 loongarch64: Add c/znrm2 optimization functions. 2023-12-29 17:30:57 +08:00
Hao Chen fbd612f8c4 loongarch64: Add ic/zamin optimization functions. 2023-12-29 17:30:57 +08:00
Hao Chen d97272cb35 loongarch64: Add c/zdot optimization functions. 2023-12-29 17:30:57 +08:00
Hao Chen 65a0aeb128 loongarch64: Add c/zcopy optimization functions.
Signed-off-by: Hao Chen <chenhao@loongson.cn>
2023-12-29 17:30:57 +08:00
Hao Chen 2a34fb4b80 loongarch64: Add and refine scal optimization functions.
Signed-off-by: Hao Chen <chenhao@loongson.cn>
2023-12-29 17:30:57 +08:00
Hao Chen 8785e948b5 loongarch64: Add camin optimization function. 2023-12-29 17:30:57 +08:00
Hao Chen 0753848e03 loongarch64: Refine and add axpy optimization functions.
Signed-off-by: Hao Chen <chenhao@loongson.cn>
2023-12-29 17:30:57 +08:00
Hao Chen 06fd5b5995 loongarch64: Add and Refine asum optimization functions. 2023-12-29 17:30:57 +08:00