Martin Kroeker
5a845ef1f4
Merge pull request #4809 from penghongbo/reorder_gemm_gemvt
...
Change computational order in GEMV and GEMM Power6 kernel
2024-07-19 13:06:42 +02:00
Hong Bo Peng
db98f8753f
Try to fix LAPACK testing failures on P7.
...
1. Remove the FADD insn from the GEMV Transpose code.
2. Remove the FADD insn from GEMM and ZGEMM code.
3. Reorder the computation of the imaginary part in ZGEMM code.
2024-07-19 02:08:19 -04:00
Chris Sidebottom
a9edddb695
Unroll TN further
2024-07-18 20:04:15 +01:00
Chris Sidebottom
9984c5ce9d
Clean up k2 removal more and unroll SGEMM more
2024-07-18 18:35:43 +01:00
Chris Sidebottom
b1c9fafabb
Remove k2 loop from DGEMM TN and use a more conservative heuristic for SGEMM
2024-07-18 17:37:18 +01:00
Martin Kroeker
2020569705
fix NAN handling and make it depend on dummy2 parameter
2024-07-17 23:55:54 +02:00
Martin Kroeker
3870995f01
make NAN handling depend on dummy2 parameter
2024-07-17 23:54:24 +02:00
Martin Kroeker
7284c533b5
make NAN handling depend on dummy2 parameter
2024-07-17 23:50:40 +02:00
Martin Kroeker
73751218a4
make NAN handling depend on dummy2 parameter
2024-07-17 23:41:26 +02:00
Martin Kroeker
b9bfc8ce09
make NAN handling depend on dummy2 parameter
2024-07-17 23:29:50 +02:00
Martin Kroeker
eb4879e04c
make NAN handling depend on the dummy2 parameter
2024-07-17 23:24:19 +02:00
Martin Kroeker
ee87cb90d0
Merge pull request #4803 from iha-taisei/SVESupportSDGEMV
...
A64FX: Add support for SVE to SGEMV/DGEMV kernels.
2024-07-17 23:14:21 +02:00
gxw
34b80ce03f
mips64: Fixed numpy CI failure
2024-07-17 10:32:22 +08:00
gxw
f6d6c14a96
mips: Fixed numpy CI failure
2024-07-17 10:31:49 +08:00
Martin Kroeker
e9f6aa46a4
Merge pull request #4800 from vlad0x00/patch-2
...
Add missing parentheses
2024-07-16 16:32:04 +02:00
Martin Kroeker
b1aa2e1768
Merge pull request #4802 from markdryan/markdryan/rvv_axpby_incy0
...
Fix axpby_rvv kernels for cases where inc_y = 0
2024-07-16 14:22:38 +02:00
iha fujitsu
0985fdc82b
A64FX: Add support for SVE to SGEMV/DGEMV kernels.
2024-07-16 17:31:33 +09:00
Vladimir Nikolić
56e1782ffb
Add another missing parenthesis
2024-07-15 15:15:23 -07:00
Vladimir Nikolić
127ea5d0d9
Add missing parenthesis
2024-07-15 15:12:21 -07:00
Martin Kroeker
a3c10c6c25
Merge pull request #4799 from martin-frbg/issue4762
...
Improve the error message for (p)thread creation failure
2024-07-15 20:57:56 +02:00
Martin Kroeker
a373d0f107
Improve the error message for thread creation failure
2024-07-15 18:32:21 +02:00
Mark Ryan
67bf4b6998
Fix axpby_rvv kernels for cases where inc_y = 0
...
The following openblas_utest tests fail when the RISCV64_ZVL128B target
is enabled.
TEST 89/103 axpby:zaxpby_inc_0 [FAIL]
TEST 92/103 axpby:caxpby_inc_0 [FAIL]
TEST 95/103 axpby:daxpby_inc_0 [FAIL]
TEST 98/103 axpby:saxpby_inc_0 [FAIL]
The issue is that the vectorized kernels do not work when inc_y == 0.
This patch updates the kernels to fall back to the scalar algorithms
when inc_y == 0, fixing the failing tests.
Signed-off-by: Mark Ryan <markdryan@rivosinc.com>
2024-07-15 14:24:47 +00:00
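The fallback described in commit 67bf4b6998 can be illustrated with a minimal sketch. It is not the RVV kernel code: `axpby_sketch`, `axpby_scalar`, `axpby_blocked` and `LANES` are hypothetical names, and the blocked routine is only a plain-C stand-in for a vector kernel that loads a chunk of y, computes, and stores it back. Non-negative increments are assumed for brevity.

```c
#include <assert.h>
#include <stddef.h>

enum { LANES = 4 }; /* hypothetical vector width, for illustration only */

/* Scalar reference: y[i*inc_y] = alpha * x[i*inc_x] + beta * y[i*inc_y].
 * With inc_y == 0 each iteration reads the value the previous one wrote. */
static void axpby_scalar(size_t n, double alpha, const double *x, ptrdiff_t inc_x,
                         double beta, double *y, ptrdiff_t inc_y)
{
    for (size_t i = 0; i < n; i++) {
        double *yi = y + (ptrdiff_t)i * inc_y;
        *yi = alpha * x[(ptrdiff_t)i * inc_x] + beta * (*yi);
    }
}

/* Stand-in for a vectorized kernel: load a chunk of y, compute, store.
 * All lanes see the old y values, which is wrong when inc_y == 0. */
static void axpby_blocked(size_t n, double alpha, const double *x, ptrdiff_t inc_x,
                          double beta, double *y, ptrdiff_t inc_y)
{
    size_t i = 0;
    while (i < n) {
        size_t len = n - i;
        if (len > (size_t)LANES)
            len = (size_t)LANES;
        double tmp[LANES];
        for (size_t l = 0; l < len; l++)   /* "vector load" of y */
            tmp[l] = y[(ptrdiff_t)(i + l) * inc_y];
        for (size_t l = 0; l < len; l++)   /* "vector multiply-add" */
            tmp[l] = alpha * x[(ptrdiff_t)(i + l) * inc_x] + beta * tmp[l];
        for (size_t l = 0; l < len; l++)   /* "vector store" to y */
            y[(ptrdiff_t)(i + l) * inc_y] = tmp[l];
        i += len;
    }
}

/* Dispatch: fall back to the scalar loop when inc_y == 0. */
void axpby_sketch(size_t n, double alpha, const double *x, ptrdiff_t inc_x,
                  double beta, double *y, ptrdiff_t inc_y)
{
    assert(inc_x >= 0 && inc_y >= 0); /* simplification made by this sketch */
    if (inc_y == 0)
        axpby_scalar(n, alpha, x, inc_x, beta, y, inc_y);
    else
        axpby_blocked(n, alpha, x, inc_x, beta, y, inc_y);
}
```

With x = {1, 2, 3}, alpha = 1, beta = 2, y[0] = 1 and inc_y = 0, the scalar loop yields y[0] = 19 (3, then 8, then 19), while the blocked stand-in called directly would leave y[0] = 5, because every lane starts from the original y[0].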
Mark Ryan
3b715e6162
Add autodetection for riscv64
...
Implement DYNAMIC_ARCH support for riscv64. Three CPU types are
supported: riscv64_generic, riscv64_zvl256b and riscv64_zvl128b.
The two non-generic kernels require CPU support for RVV 1.0 to
function correctly. Detecting that a riscv64 device supports
RVV 1.0 is a little complicated as there are some boards on the
market that advertise support for V via hwcap but only support
RVV 0.7.1, which is not binary compatible with RVV 1.0. The
approach taken is to first try hwprobe. If hwprobe is not
available, we fall back to hwcap + an additional check to distinguish
between RVV 1.0 and RVV 0.7.1.
Tested on a VM with VLEN=256, a CanMV K230 with VLEN=128 (with only
the big core enabled), a Lichee Pi with RVV 0.7.1 and a VF2 with no
vector.
A compiler with RVV 1.0 support must be used to build OpenBLAS for
riscv64 when DYNAMIC_ARCH=1.
Signed-off-by: Mark Ryan <markdryan@rivosinc.com>
2024-07-15 14:24:22 +00:00
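The detection order described in commit 3b715e6162 (hwprobe first, then hwcap plus an additional check) can be sketched as a pure decision function. This is not the OpenBLAS detection code: `struct rvv_probe` and `rvv10_usable` are hypothetical names, and the platform-specific probes that would fill in the fields are deliberately left out. The sketch also assumes that a positive answer from hwprobe for the V extension means RVV 1.0.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical results gathered by platform-specific probes (not shown). */
struct rvv_probe {
    bool hwprobe_available;   /* the hwprobe interface could be queried */
    bool hwprobe_reports_v;   /* hwprobe reports the V extension */
    bool hwcap_reports_v;     /* hwcap advertises V (may be RVV 0.7.1) */
    bool extra_check_is_v10;  /* additional check indicates RVV 1.0, not 0.7.1 */
};

/* Returns true when the RVV 1.0 kernels may be selected. */
bool rvv10_usable(const struct rvv_probe *p)
{
    assert(p);
    if (p->hwprobe_available)
        return p->hwprobe_reports_v;   /* preferred source of truth */
    /* Fallback: hwcap alone is not enough, since some boards advertise V
     * while only implementing RVV 0.7.1. */
    return p->hwcap_reports_v && p->extra_check_is_v10;
}
```

A board that advertises V through hwcap but fails the additional check is treated like a device without vector support, so it would end up with the generic kernels.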
gxw
9b3e80efe2
utest: Add test_gemv
2024-07-15 16:33:09 +08:00
gxw
3f39c8f94f
LoongArch: Fixed numpy CI failure
2024-07-15 11:43:08 +08:00
Martin Kroeker
6013b36b16
Merge pull request #4796 from martin-frbg/ppcbuf
...
Suffix BUFFER_SIZEs on POWER as UL to prevent int overflow in computations
2024-07-12 11:06:45 +02:00
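Why the UL suffix matters can be shown with a small sketch. The macro names and values below are illustrative assumptions, not the definitions OpenBLAS uses on POWER, and `total_buffer_bytes` is a hypothetical helper.

```c
#include <assert.h>

/* Illustrative values only. */
#define BUFFER_SIZE_INT (32 << 20)    /* has type int */
#define BUFFER_SIZE_UL  (32UL << 20)  /* has type unsigned long */

/* Total bytes needed for nbuffers buffers.
 * Because BUFFER_SIZE_UL is unsigned long, the multiplication is carried
 * out in unsigned long. Writing BUFFER_SIZE_INT * nbuffers instead would
 * multiply two ints and could overflow before any widening happens. */
unsigned long total_buffer_bytes(int nbuffers)
{
    assert(nbuffers >= 0);
    return BUFFER_SIZE_UL * (unsigned long)nbuffers;
}
```

On LP64 platforms such as 64-bit Linux on POWER, unsigned long is 64 bits wide, so 128 buffers of 32 MiB give 2^32 bytes, a value that does not fit in a 32-bit int.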
Martin Kroeker
9789034281
Merge branch 'OpenMathLib:develop' into ppcbuf
2024-07-12 11:05:46 +02:00
gxw
f3cebb3ca3
x86: Fixed numpy CI failure when the target is ZEN.
2024-07-12 16:09:30 +08:00
Martin Kroeker
5d08ec7ff3
Merge pull request #4782 from martin-frbg/azurewincl
...
Fix NAN handling in ARM/generic SCAL; have AzureCI Windows show errors on failure
2024-07-11 23:55:15 +02:00
Martin Kroeker
dfc11ef248
Merge pull request #4791 from ChipKerchner/vectorizeSBGEMMincopy
...
Vectorize SBGEMM incopy for Power10 - 4x faster.
2024-07-11 21:38:57 +02:00
Martin Kroeker
2fefdfa2b8
Merge branch 'OpenMathLib:develop' into azurewincl
2024-07-11 21:38:21 +02:00
Martin Kroeker
475bd2452b
Suffix BUFFERSIZEs as UL to prevent int overflow in computations
2024-07-11 20:13:57 +02:00
Martin Kroeker
b70227ad62
Merge pull request #4795 from pkubaj/patch-1
...
Fix build on FreeBSD/powerpc64*
2024-07-11 19:00:07 +02:00
Martin Kroeker
8277828fdc
Merge pull request #4785 from rgommers/docs-install
...
Rewrite "Install OpenBLAS" docs page
2024-07-11 18:49:07 +02:00
Martin Kroeker
f0fc7249f1
Merge pull request #4792 from martin-frbg/issue4790
...
Fix core assignment in cpu detection for Intel family 15
2024-07-11 17:38:43 +02:00
Martin Kroeker
362856fece
Merge pull request #4778 from JAicewizard/develop
...
Add support for RISCV64_GENERIC in cmake
2024-07-11 15:12:46 +02:00
Martin Kroeker
1d77647d1b
Merge pull request #4769 from drupol/fix-buffersize-value
...
openblas: fix `BUFFERSIZE` value
2024-07-11 14:45:50 +02:00
Piotr Kubaj
4c12090776
Fix build on FreeBSD/powerpc64*
2024-07-10 22:21:48 +00:00
Chip Kerchner
f708944fea
Add all 4 variations of the SBGEMM to compare_sgemm_sbgemm
2024-07-10 13:07:48 -05:00
Martin Kroeker
e706bc1ec0
Fix core assignment for Intel family 15
2024-07-09 20:22:56 +02:00
Chip Kerchner
cb154832f8
Vectorize SBGEMM incopy - 4x faster.
2024-07-09 13:10:03 -05:00
Martin Kroeker
a5c04e326a
Update scal.c
2024-07-04 22:28:01 +02:00
Ralf Gommers
268dcd8f45
docs: convert remaining install sections (Android, iOS, FreeBSD, Cortex-M)
2024-07-04 19:16:30 +02:00
Ralf Gommers
452014341e
docs: rework building from source on Windows section
2024-07-04 19:16:30 +02:00
Ralf Gommers
4547908901
docs: rewrite "Install OpenBLAS" page (part 1: binaries, basic from source)
2024-07-04 19:15:51 +02:00
Martin Kroeker
e1eef56e05
Merge pull request #4783 from martin-frbg/cpuid_meteor
...
Add another CPUID for Intel Meteor Lake
2024-07-04 18:09:27 +02:00
Martin Kroeker
536200bc9e
fix handling of INF or NAN
2024-07-04 17:47:19 +02:00
Martin Kroeker
3063d03021
Add another CPUID for Meteor Lake
2024-07-04 16:05:05 +02:00
Martin Kroeker
b422742899
collect error output from ctest, if any
2024-07-04 15:42:34 +02:00
Jaap Aarts
cea4abcac0
Fix compiling on mingw
2024-07-04 14:56:16 +02:00