8807 Commits

Author SHA1 Message Date
Mark Ryan
67bf4b6998 Fix axpby_rvv kernels for cases where inc_y = 0
The following openblas_utest tests fail when the RISCV64_ZVL128B target is
enabled:

TEST 89/103 axpby:zaxpby_inc_0 [FAIL]
TEST 92/103 axpby:caxpby_inc_0 [FAIL]
TEST 95/103 axpby:daxpby_inc_0 [FAIL]
TEST 98/103 axpby:saxpby_inc_0 [FAIL]

The issue is that the vectorized kernels do not work when inc_y == 0.
This patch updates the kernels to fall back to the scalar algorithms
when inc_y == 0, fixing the failing tests.

Signed-off-by: Mark Ryan <markdryan@rivosinc.com>
2024-07-15 14:24:47 +00:00
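The inc_y == 0 case in the commit above is easier to see in code. With a zero stride, every iteration reads and writes the same element of y, so each step depends on the one before it, and a kernel that loads several y elements at once cannot reproduce that. The following is a minimal sketch of the fallback idea only, not the actual axpby_rvv kernel: the names axpby_scalar and axpby_dispatch and their signatures are hypothetical, and non-negative increments are assumed.

```c
#include <stddef.h>

/* Hypothetical scalar reference for y := alpha*x + beta*y.
 * Not the OpenBLAS kernel; name and signature are illustrative only.
 * Assumes non-negative increments. */
static void axpby_scalar(size_t n, double alpha, const double *x, ptrdiff_t inc_x,
                         double beta, double *y, ptrdiff_t inc_y)
{
    ptrdiff_t ix = 0, iy = 0;
    for (size_t i = 0; i < n; i++) {
        /* With inc_y == 0, iy stays 0, so each step reads the value
         * written by the previous step. */
        y[iy] = alpha * x[ix] + beta * y[iy];
        ix += inc_x;
        iy += inc_y;
    }
}

/* Hypothetical dispatcher showing the fallback described in the commit. */
void axpby_dispatch(size_t n, double alpha, const double *x, ptrdiff_t inc_x,
                    double beta, double *y, ptrdiff_t inc_y)
{
    if (inc_y == 0) {
        /* Zero stride on y: take the sequential scalar path. */
        axpby_scalar(n, alpha, x, inc_x, beta, y, inc_y);
        return;
    }
    /* A vectorized path would go here; this sketch reuses the scalar loop. */
    axpby_scalar(n, alpha, x, inc_x, beta, y, inc_y);
}
```

For example, with x = {1, 2, 3}, y[0] = 10, alpha = 1, beta = 2 and inc_y = 0, the loop produces 21, then 44, then 91 in y[0].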
Mark Ryan
3b715e6162 Add autodetection for riscv64
Implement DYNAMIC_ARCH support for riscv64.  Three CPU types are
supported: riscv64_generic, riscv64_zvl256b, and riscv64_zvl128b.
The two non-generic kernels require CPU support for RVV 1.0 to
function correctly.  Detecting that a riscv64 device supports
RVV 1.0 is a little complicated as there are some boards on the
market that advertise support for V via hwcap but only support
RVV 0.7.1, which is not binary compatible with RVV 1.0.  The
approach taken is to first try hwprobe.  If hwprobe is not
available, we fall back to hwcap + an additional check to distinguish
between RVV 1.0 and RVV 0.7.1.

Tested on a VM with VLEN=256, a CanMV K230 with VLEN=128 (with only
the big core enabled), a Lichee Pi with RVV 0.7.1 and a VF2 with no
vector.

A compiler with RVV 1.0 support must be used to build OpenBLAS for
riscv64 when DYNAMIC_ARCH=1.

Signed-off-by: Mark Ryan <markdryan@rivosinc.com>
2024-07-15 14:24:22 +00:00
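The detection order described in the commit above (hwprobe first, then hwcap plus an extra RVV 1.0 check) can be sketched as a pure decision function. This is only an illustration under stated assumptions: the struct riscv64_probe, its fields, the enum names and select_core are all hypothetical, the probe results are passed in rather than queried from the kernel, and the mapping from VLEN to kernel type is a guess rather than something the commit message spells out.

```c
#include <stdbool.h>

/* Hypothetical identifiers for the three CPU types named in the commit. */
enum riscv64_core {
    RISCV64_CORE_GENERIC,
    RISCV64_CORE_ZVL128B,
    RISCV64_CORE_ZVL256B
};

/* Hypothetical container for already-collected probe results. */
struct riscv64_probe {
    bool hwprobe_available;   /* is the hwprobe interface usable? */
    bool hwprobe_reports_v;   /* hwprobe says V is supported */
    bool hwcap_reports_v;     /* hwcap advertises V (may be RVV 0.7.1) */
    bool passes_rvv10_check;  /* extra check separating RVV 1.0 from 0.7.1 */
    unsigned vlen_bits;       /* vector register length, 0 if unknown */
};

enum riscv64_core select_core(const struct riscv64_probe *p)
{
    bool rvv10;

    if (p->hwprobe_available) {
        /* Preferred path: trust hwprobe. */
        rvv10 = p->hwprobe_reports_v;
    } else {
        /* Fallback: hwcap alone is not enough, because some boards
         * advertise V while only implementing RVV 0.7.1. */
        rvv10 = p->hwcap_reports_v && p->passes_rvv10_check;
    }

    if (!rvv10)
        return RISCV64_CORE_GENERIC;

    /* Assumed mapping from VLEN to kernel type. */
    if (p->vlen_bits >= 256)
        return RISCV64_CORE_ZVL256B;
    if (p->vlen_bits >= 128)
        return RISCV64_CORE_ZVL128B;
    return RISCV64_CORE_GENERIC;
}
```

Keeping the decision separate from the actual system queries makes the RVV 0.7.1 corner case easy to exercise without the hardware.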
gxw
9b3e80efe2 utest: Add test_gemv 2024-07-15 16:33:09 +08:00
gxw
3f39c8f94f LoongArch: Fixed numpy CI failure 2024-07-15 11:43:08 +08:00
Martin Kroeker
6013b36b16 Merge pull request #4796 from martin-frbg/ppcbuf
Suffix BUFFER_SIZEs on POWER as UL to prevent int overflow in computations
2024-07-12 11:06:45 +02:00
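The reason a UL suffix helps, as in the commit above, is that an unsuffixed constant such as (32 << 22) has type int, so any arithmetic built on it is also done in int and can overflow before the result is widened. The sketch below uses made-up values (DEMO_BUFFER_SIZE and DEMO_NUM_BUFFERS are hypothetical, not the OpenBLAS definitions) and assumes an LP64 platform where unsigned long is 64 bits wide.

```c
#include <limits.h>

/* Hypothetical values for illustration; not the OpenBLAS definitions. */
#define DEMO_BUFFER_SIZE (32UL << 22)   /* 128 MiB, typed unsigned long */
#define DEMO_NUM_BUFFERS 64

/* The multiplication is carried out in unsigned long because one
 * operand already has that type. */
unsigned long demo_total_bytes(void)
{
    return DEMO_BUFFER_SIZE * DEMO_NUM_BUFFERS;
}

/* Returns 1 if the same product would not fit in an int, which is
 * what an unsuffixed (32 << 22) * 64 would have to be computed in. */
int demo_exceeds_int(void)
{
    return demo_total_bytes() > (unsigned long)INT_MAX;
}
```

On LP64, demo_total_bytes() is 2^33 bytes, well beyond INT_MAX, so the int version of the same expression would overflow.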
Martin Kroeker
9789034281 Merge branch 'OpenMathLib:develop' into ppcbuf 2024-07-12 11:05:46 +02:00
gxw
f3cebb3ca3 x86: Fixed numpy CI failure when the target is ZEN. 2024-07-12 16:09:30 +08:00
Martin Kroeker
5d08ec7ff3 Merge pull request #4782 from martin-frbg/azurewincl
Fix NAN handling in ARM/generic SCAL; have AzureCI Windows show errors on failure
2024-07-11 23:55:15 +02:00
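One way a SCAL kernel can mishandle NaN and Inf is by special-casing alpha == 0 and storing zeros, which hides non-finite inputs, whereas under IEEE 754 arithmetic 0 * NaN and 0 * Inf are both NaN. The sketch below only illustrates that general pitfall: it is not the patch merged above, the name scal_multiply is hypothetical, and the commit message does not say which behavior was chosen for each case.

```c
#include <stddef.h>

/* Hypothetical SCAL sketch: x := alpha * x, always performing the
 * multiplication so that NaN and Inf in x propagate even when
 * alpha == 0. Assumes a non-negative increment. */
void scal_multiply(size_t n, double alpha, double *x, ptrdiff_t inc_x)
{
    ptrdiff_t ix = 0;
    for (size_t i = 0; i < n; i++) {
        /* No "if (alpha == 0) x[ix] = 0" shortcut here. */
        x[ix] = alpha * x[ix];
        ix += inc_x;
    }
}
```

With alpha = 0, a finite element becomes 0 while NaN and Inf elements become NaN.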
Martin Kroeker
dfc11ef248 Merge pull request #4791 from ChipKerchner/vectorizeSBGEMMincopy
Vectorize SBGEMM incopy for Power10 - 4x faster.
2024-07-11 21:38:57 +02:00
Martin Kroeker
2fefdfa2b8 Merge branch 'OpenMathLib:develop' into azurewincl 2024-07-11 21:38:21 +02:00
Martin Kroeker
475bd2452b Suffix BUFFERSIZEs as UL to prevent int overflow in computations 2024-07-11 20:13:57 +02:00
Martin Kroeker
b70227ad62 Merge pull request #4795 from pkubaj/patch-1
Fix build on FreeBSD/powerpc64*
2024-07-11 19:00:07 +02:00
Martin Kroeker
8277828fdc Merge pull request #4785 from rgommers/docs-install
Rewrite "Install OpenBLAS" docs page
2024-07-11 18:49:07 +02:00
Martin Kroeker
f0fc7249f1 Merge pull request #4792 from martin-frbg/issue4790
Fix core assignment in cpu detection for Intel family 15
2024-07-11 17:38:43 +02:00
Martin Kroeker
362856fece Merge pull request #4778 from JAicewizard/develop
Add support for RISCV64_GENERIC in cmake
2024-07-11 15:12:46 +02:00
Martin Kroeker
1d77647d1b Merge pull request #4769 from drupol/fix-buffersize-value
openblas: fix `BUFFERSIZE` value
2024-07-11 14:45:50 +02:00
Piotr Kubaj
4c12090776 Fix build on FreeBSD/powerpc64* 2024-07-10 22:21:48 +00:00
Chip Kerchner
f708944fea Add all 4 variations of the SBGEMM to compare_sgemm_sbgemm 2024-07-10 13:07:48 -05:00
Martin Kroeker
e706bc1ec0 Fix core assignment for Intel family 15 2024-07-09 20:22:56 +02:00
Chip Kerchner
cb154832f8 Vectorize SBGEMM incopy - 4x faster. 2024-07-09 13:10:03 -05:00
Vladimir Nikolić
c92104a605 Update cross compile info 2024-07-05 12:32:58 -07:00
Martin Kroeker
a5c04e326a Update scal.c 2024-07-04 22:28:01 +02:00
Ralf Gommers
268dcd8f45 docs: convert remaining install sections (Android, iOS, FreeBSD, Cortex-M) 2024-07-04 19:16:30 +02:00
Ralf Gommers
452014341e docs: rework building from source on Windows section 2024-07-04 19:16:30 +02:00
Ralf Gommers
4547908901 docs: rewrite "Install OpenBLAS" page (part 1: binaries, basic from source) 2024-07-04 19:15:51 +02:00
Martin Kroeker
e1eef56e05 Merge pull request #4783 from martin-frbg/cpuid_meteor
Add another CPUID for Intel Meteor Lake
2024-07-04 18:09:27 +02:00
Martin Kroeker
536200bc9e fix handling of INF or NAN 2024-07-04 17:47:19 +02:00
Martin Kroeker
3063d03021 Add another CPUID for Meteor Lake 2024-07-04 16:05:05 +02:00
Martin Kroeker
b422742899 collect error output from ctest, if any 2024-07-04 15:42:34 +02:00
Jaap Aarts
cea4abcac0 Fix compiling on mingw 2024-07-04 14:56:16 +02:00
Martin Kroeker
f729013d2e Merge pull request #4781 from rgommers/fix-docs-deployment
fix CI job to deploy docs, and make it run on pull requests too
2024-07-03 21:00:18 +02:00
Ralf Gommers
6ede8b14c6 ci: fix CI job to deploy docs, and make it run on pull requests too 2024-07-03 20:14:02 +02:00
Martin Kroeker
9836883ee9 Merge pull request #4780 from martin-frbg/azureosx12
AzureCI: Update OSX jobs to use the macos-12 image
2024-07-03 19:53:05 +02:00
Martin Kroeker
df81b159e8 Merge pull request #4774 from rgommers/improve-docs
Improve documentation content, formatting, and html theme
2024-07-03 17:10:44 +02:00
Martin Kroeker
2df4007425 Update compiler and sdk versions for osx 2024-07-03 16:48:43 +02:00
Martin Kroeker
acf0c3ccaf Merge pull request #4777 from ev-br/sgesdd_ci_err
ignore the gesdd failure on codspeed
2024-07-03 15:21:33 +02:00
Martin Kroeker
74f059a3ce Update OSX jobs to use the macos-12 image 2024-07-03 13:24:02 +02:00
Evgeni Burovski
cd3c167c28 ignore sgesdd failure on codspeed
In https://github.com/OpenMathLib/OpenBLAS/issues/4776
we're hitting
** On entry to SLASCL parameter number  4 had an illegal value

on codspeed, but not outside (either locally or on github runners)
2024-07-03 12:35:26 +03:00
Jaap Aarts
9d0abe2d26 Add support for RISCV64_GENERIC in cmake 2024-07-03 01:49:37 +02:00
Evgeni Burovski
5b385fd453 WIP: fish out the gesdd failure? 2024-07-02 20:10:26 +03:00
Ralf Gommers
c1c0dbfd60 docs: address review comments on PR 4774 2024-07-02 14:05:47 +02:00
Martin Kroeker
bdb6069051 Merge pull request #4775 from martin-frbg/issue4770
Guard against invalid thread_status.queue
2024-07-01 00:35:30 +02:00
Martin Kroeker
4052b312b2 Merge pull request #4763 from ev-br/sync-codspeed
BENCH: sync codspeed-benchmarks with BLAS-benchmarks
2024-07-01 00:18:08 +02:00
Martin Kroeker
3677b3886c Merge pull request #4702 from bashimao/detect-nv-grace
Correctly detect ARM Neoverse V2 CPUs.
2024-06-30 22:48:48 +02:00
Martin Kroeker
d0b9948b23 Guard against invalid thread_status.queue 2024-06-30 19:31:15 +02:00
Ralf Gommers
ca9a0c28e8 docs: improve extensions page 2024-06-30 18:04:12 +02:00
Ralf Gommers
3eba16c583 docs: improve the Developer manual 2024-06-30 18:04:08 +02:00
Ralf Gommers
a8e1ff84ce docs: improve the "Build system" page 2024-06-30 16:27:54 +02:00
Ralf Gommers
c1b9bb8519 docs: improvements to the User Manual 2024-06-30 16:18:24 +02:00
Ralf Gommers
237c2c4130 docs: fix footnote rendering on "Redistributing OpenBLAS" page 2024-06-30 16:09:00 +02:00