Commit Graph

8533 Commits

Author SHA1 Message Date
Martin Kroeker 15c53dd2e0
Merge pull request #4794 from XiWeiGu/Fixed_Numpy_CI_Test
Try to fix numpy CI test failures
2024-07-25 23:42:13 +02:00
Martin Kroeker a4e56e0452
Merge pull request #4806 from Mousius/small-gemm
Small GEMM for AArch64 with SVE
2024-07-25 21:50:04 +02:00
Martin Kroeker 949a7f9393
Merge pull request #4811 from yamazakimitsufumi/add_a64fx_to_dynamic_arch
Add A64FX to the list of CPUs supported by DYNAMIC_ARCH
2024-07-25 19:13:04 +02:00
yamazaki-mitsufumi 88caf02f62 Fix ambiguous error on Mac OS 2024-07-25 22:43:13 +09:00
Martin Kroeker 4140ac45d7
Merge pull request #4813 from martin-frbg/issue4812
Fix incompatible definitions of MAXLOC in f2c-converted LAPACK sources
2024-07-23 21:35:06 +02:00
Martin Kroeker 0096482f03
fix incompatible definitions of MAXLOC 2024-07-23 15:01:26 +02:00
Martin Kroeker ed82fd24fc
Merge pull request #4810 from martin-frbg/issue4805
Work around a gcc14.1 bug that breaks utest on Loongarch
2024-07-23 14:54:55 +02:00
yamazaki-mitsufumi 821ef34635 Add A64FX to the list of CPUs supported by DYNAMIC_ARCH 2024-07-23 20:44:39 +09:00
Martin Kroeker 29f3e759b9
work around a gcc14.1 bug observed on Loongarch 2024-07-23 11:20:48 +02:00
Chris Sidebottom ea4ab3b310 Better header guard around bridge 2024-07-20 14:39:57 +01:00
Chris Sidebottom 7311d93016 Unroll TT further 2024-07-19 17:51:20 +01:00
Martin Kroeker a815594fd1
Merge pull request #4801 from markdryan/markdryan/riscv-dynamic-arch
Add autodetection for riscv64
2024-07-19 17:12:07 +02:00
Martin Kroeker 5a845ef1f4
Merge pull request #4809 from penghongbo/reorder_gemm_gemvt
Change computational order in GEMV and GEMM Power6 kernel
2024-07-19 13:06:42 +02:00
Hong Bo Peng db98f8753f Try to fix LAPACK testing failures on P7.
1. Remove the FADD insn from the GEMV Transpose code.
2. Remove the FADD insn from GEMM and ZGEMM code.
3. Reorder the computation of the imaginary part in ZGEMM code.
2024-07-19 02:08:19 -04:00
Chris Sidebottom a9edddb695 Unroll TN further 2024-07-18 20:04:15 +01:00
Chris Sidebottom 9984c5ce9d Clean up k2 removal more and unroll SGEMM more 2024-07-18 18:35:43 +01:00
Chris Sidebottom b1c9fafabb Remove k2 loop from DGEMM TN and use a more conservative heuristic for SGEMM 2024-07-18 17:37:18 +01:00
Martin Kroeker ee87cb90d0
Merge pull request #4803 from iha-taisei/SVESupportSDGEMV
A64FX: Add support for SVE to SGEMV/DGEMV kernels.
2024-07-17 23:14:21 +02:00
gxw 34b80ce03f mips64: Fixed numpy CI failure 2024-07-17 10:32:22 +08:00
gxw f6d6c14a96 mips: Fixed numpy CI failure 2024-07-17 10:31:49 +08:00
Martin Kroeker e9f6aa46a4
Merge pull request #4800 from vlad0x00/patch-2
Add missing parentheses
2024-07-16 16:32:04 +02:00
Martin Kroeker b1aa2e1768
Merge pull request #4802 from markdryan/markdryan/rvv_axpby_incy0
Fix axpby_rvv kernels for cases where inc_y = 0
2024-07-16 14:22:38 +02:00
iha fujitsu 0985fdc82b A64FX: Add support for SVE to SGEMV/DGEMV kernels. 2024-07-16 17:31:33 +09:00
Vladimir Nikolić 56e1782ffb
Add another missing parenthesis 2024-07-15 15:15:23 -07:00
Vladimir Nikolić 127ea5d0d9
Add missing parenthesis 2024-07-15 15:12:21 -07:00
Martin Kroeker a3c10c6c25
Merge pull request #4799 from martin-frbg/issue4762
Improve the error message for (p)thread creation failure
2024-07-15 20:57:56 +02:00
Martin Kroeker a373d0f107
Improve the error message for thread creation failure 2024-07-15 18:32:21 +02:00
Mark Ryan 67bf4b6998 Fix axpby_rvv kernels for cases where inc_y = 0
The following openblas_utest tests fail when the RISCV64_ZVL128B target is
enabled.

TEST 89/103 axpby:zaxpby_inc_0 [FAIL]
TEST 92/103 axpby:caxpby_inc_0 [FAIL]
TEST 95/103 axpby:daxpby_inc_0 [FAIL]
TEST 98/103 axpby:saxpby_inc_0 [FAIL]

The issue is that the vectorized kernels do not work when inc_y == 0.
This patch updates the kernels to fall back to the scalar algorithms
when inc_y == 0, fixing the failing tests.

Signed-off-by: Mark Ryan <markdryan@rivosinc.com>
2024-07-15 14:24:47 +00:00
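The fallback described in this commit message can be illustrated with a minimal sketch. It is not the OpenBLAS RVV kernel: `axpby_scalar`, `axpby_blocked` and `LANES` are hypothetical names, plain C loops stand in for vector instructions, and strides are assumed to be non-negative. With inc_y == 0 every element writes to the same location, so a kernel that loads a block of y before storing any results no longer matches the sequential scalar loop.

```c
#include <assert.h>

#define LANES 4 /* hypothetical vector width, for illustration only */

/* Scalar reference: y[i*inc_y] = alpha * x[i*inc_x] + beta * y[i*inc_y]. */
static void axpby_scalar(long n, double alpha, const double *x, long inc_x,
                         double beta, double *y, long inc_y)
{
    for (long i = 0; i < n; i++)
        y[i * inc_y] = alpha * x[i * inc_x] + beta * y[i * inc_y];
}

/* Block-wise variant that mimics a vector kernel: each block loads all of
 * its y values before storing any result. That is only equivalent to the
 * scalar loop when the y elements are distinct, so inc_y == 0 is routed to
 * the scalar code instead. */
void axpby_blocked(long n, double alpha, const double *x, long inc_x,
                   double beta, double *y, long inc_y)
{
    assert(n >= 0 && inc_x >= 0 && inc_y >= 0); /* assumption of this sketch */

    if (inc_y == 0) {
        axpby_scalar(n, alpha, x, inc_x, beta, y, inc_y);
        return;
    }

    long i = 0;
    for (; i + LANES <= n; i += LANES) {
        double tmp[LANES];
        for (int l = 0; l < LANES; l++)   /* "vector" load and compute */
            tmp[l] = alpha * x[(i + l) * inc_x] + beta * y[(i + l) * inc_y];
        for (int l = 0; l < LANES; l++)   /* "vector" store */
            y[(i + l) * inc_y] = tmp[l];
    }

    /* Remaining elements that do not fill a whole block. */
    axpby_scalar(n - i, alpha, x + i * inc_x, inc_x, beta, y + i * inc_y, inc_y);
}
```

Calling `axpby_blocked` with inc_y == 0 then accumulates into `y[0]` one element at a time, exactly as the scalar loop does.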
Mark Ryan 3b715e6162 Add autodetection for riscv64
Implement DYNAMIC_ARCH support for riscv64.  Three cpu types are
supported, riscv64_generic, riscv64_zvl256b, riscv64_zvl128b.
The two non-generic kernels require CPU support for RVV 1.0 to
function correctly.  Detecting that a riscv64 device supports
RVV 1.0 is a little complicated as there are some boards on the
market that advertise support for V via hwcap but only support
RVV 0.7.1, which is not binary compatible with RVV 1.0.  The
approach taken is to first try hwprobe.  If hwprobe is not
available, we fall back to hwcap + an additional check to distinguish
between RVV 1.0 and RVV 0.7.1.

Tested on a VM with VLEN=256, a CanMV K230 with VLEN=128 (with only
the big core enabled), a Lichee Pi with RVV 0.7.1 and a VF2 with no
vector.

A compiler with RVV 1.0 support must be used to build OpenBLAS for
riscv64 when DYNAMIC_ARCH=1.

Signed-off-by: Mark Ryan <markdryan@rivosinc.com>
2024-07-15 14:24:22 +00:00
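The detection order described in this commit message can be sketched as a pure decision function. Everything here is hypothetical: the struct, its fields and the enum names are not OpenBLAS identifiers, the real code queries the kernel rather than receiving precomputed flags, and the mapping from VLEN to a kernel (256 bits or more for zvl256b, 128 bits or more for zvl128b) is an assumption made for illustration.

```c
#include <assert.h>

/* Hypothetical names: none of these identifiers come from OpenBLAS. */
enum riscv64_core {
    RISCV64_CORE_GENERIC,
    RISCV64_CORE_ZVL128B,
    RISCV64_CORE_ZVL256B
};

struct riscv64_probe {
    int hwprobe_available;   /* nonzero if the hwprobe query could be made */
    int hwprobe_reports_rvv; /* hwprobe reports RVV 1.0 */
    int hwcap_reports_v;     /* hwcap advertises the V bit */
    int extra_check_rvv10;   /* additional check says 1.0 rather than 0.7.1 */
    unsigned vlen_bits;      /* vector register length in bits, 0 if none */
};

/* Prefer hwprobe; otherwise trust hwcap only if the extra check agrees. */
static int has_rvv10(const struct riscv64_probe *p)
{
    if (p->hwprobe_available)
        return p->hwprobe_reports_rvv != 0;
    return p->hwcap_reports_v && p->extra_check_rvv10;
}

enum riscv64_core select_riscv64_core(const struct riscv64_probe *p)
{
    assert(p != 0);
    if (!has_rvv10(p))
        return RISCV64_CORE_GENERIC; /* no vector unit, or RVV 0.7.1 only */
    if (p->vlen_bits >= 256)
        return RISCV64_CORE_ZVL256B;
    if (p->vlen_bits >= 128)
        return RISCV64_CORE_ZVL128B;
    return RISCV64_CORE_GENERIC;
}
```

A device that advertises V through hwcap but fails the extra check ends up on the generic kernels, which is the behaviour the message describes for RVV 0.7.1 boards.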
gxw 9b3e80efe2 utest: Add test_gemv 2024-07-15 16:33:09 +08:00
gxw 3f39c8f94f LoongArch: Fixed numpy CI failure 2024-07-15 11:43:08 +08:00
Martin Kroeker 6013b36b16
Merge pull request #4796 from martin-frbg/ppcbuf
Suffix BUFFER_SIZEs on POWER as UL to prevent int overflow in computations
2024-07-12 11:06:45 +02:00
Martin Kroeker 9789034281
Merge branch 'OpenMathLib:develop' into ppcbuf 2024-07-12 11:05:46 +02:00
gxw f3cebb3ca3 x86: Fixed numpy CI failure when the target is ZEN. 2024-07-12 16:09:30 +08:00
Martin Kroeker 5d08ec7ff3
Merge pull request #4782 from martin-frbg/azurewincl
Fix NAN handling in ARM/generic SCAL; have AzureCI Windows show errors on failure
2024-07-11 23:55:15 +02:00
Martin Kroeker dfc11ef248
Merge pull request #4791 from ChipKerchner/vectorizeSBGEMMincopy
Vectorize SBGEMM incopy for Power10 - 4x faster.
2024-07-11 21:38:57 +02:00
Martin Kroeker 2fefdfa2b8
Merge branch 'OpenMathLib:develop' into azurewincl 2024-07-11 21:38:21 +02:00
Martin Kroeker 475bd2452b
Suffix BUFFERSIZEs as UL to prevent int overflow in computations 2024-07-11 20:13:57 +02:00
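A minimal sketch of why the UL suffix matters, assuming an LP64 platform where unsigned long is 64 bits wide. `DEMO_BUFFER_SIZE` and `total_buffer_bytes` are hypothetical, and the value is not the one OpenBLAS uses on POWER. Written as a plain int constant, 16 * (256 << 20) equals 2^32, which does not fit in a 32-bit int; with the UL suffix the other operand is converted to unsigned long before the multiplication.

```c
#include <assert.h>

/* Hypothetical buffer size for illustration; the UL suffix makes the
 * constant an unsigned long instead of an int. */
#define DEMO_BUFFER_SIZE (256UL << 20)

/* Total bytes needed for num_buffers buffers. Because DEMO_BUFFER_SIZE is
 * unsigned long, the product is computed in unsigned long arithmetic. */
unsigned long total_buffer_bytes(int num_buffers)
{
    assert(num_buffers >= 0);
    return DEMO_BUFFER_SIZE * (unsigned long)num_buffers;
}
```

On platforms where unsigned long is only 32 bits wide, the suffix alone does not widen the computation.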
Martin Kroeker b70227ad62
Merge pull request #4795 from pkubaj/patch-1
Fix build on FreeBSD/powerpc64*
2024-07-11 19:00:07 +02:00
Martin Kroeker 8277828fdc
Merge pull request #4785 from rgommers/docs-install
Rewrite "Install OpenBLAS" docs page
2024-07-11 18:49:07 +02:00
Martin Kroeker f0fc7249f1
Merge pull request #4792 from martin-frbg/issue4790
Fix core assignment in cpu detection for Intel family 15
2024-07-11 17:38:43 +02:00
Martin Kroeker 362856fece
Merge pull request #4778 from JAicewizard/develop
Add support for RISCV64_GENERIC in cmake
2024-07-11 15:12:46 +02:00
Martin Kroeker 1d77647d1b
Merge pull request #4769 from drupol/fix-buffersize-value
openblas: fix `BUFFERSIZE` value
2024-07-11 14:45:50 +02:00
Piotr Kubaj 4c12090776
Fix build on FreeBSD/powerpc64* 2024-07-10 22:21:48 +00:00
Chip Kerchner f708944fea Add all 4 variations of the SBGEMM to compare_sgemm_sbgemm 2024-07-10 13:07:48 -05:00
Martin Kroeker e706bc1ec0
Fix core assignment for Intel family 15 2024-07-09 20:22:56 +02:00
Chip Kerchner cb154832f8 Vectorize SBGEMM incopy - 4x faster. 2024-07-09 13:10:03 -05:00
Martin Kroeker a5c04e326a
Update scal.c 2024-07-04 22:28:01 +02:00
Ralf Gommers 268dcd8f45 docs: convert remaining install sections (Android, iOS, FreeBSD, Cortex-M) 2024-07-04 19:16:30 +02:00
Ralf Gommers 452014341e docs: rework building from source on Windows section 2024-07-04 19:16:30 +02:00