Commit Graph

8509 Commits

Author SHA1 Message Date
Martin Kroeker
eb4879e04c make NAN handling depend on the dummy2 parameter 2024-07-17 23:24:19 +02:00
Martin Kroeker
ee87cb90d0 Merge pull request #4803 from iha-taisei/SVESupportSDGEMV
A64FX: Add support for SVE to SGEMV/DGEMV kernels.
2024-07-17 23:14:21 +02:00
Martin Kroeker
e9f6aa46a4 Merge pull request #4800 from vlad0x00/patch-2
Add missing parentheses
2024-07-16 16:32:04 +02:00
Martin Kroeker
b1aa2e1768 Merge pull request #4802 from markdryan/markdryan/rvv_axpby_incy0
Fix axpby_rvv kernels for cases where inc_y = 0
2024-07-16 14:22:38 +02:00
iha fujitsu
0985fdc82b A64FX: Add support for SVE to SGEMV/DGEMV kernels. 2024-07-16 17:31:33 +09:00
Vladimir Nikolić
56e1782ffb Add another missing parenthesis 2024-07-15 15:15:23 -07:00
Vladimir Nikolić
127ea5d0d9 Add missing parenthesis 2024-07-15 15:12:21 -07:00
Martin Kroeker
a3c10c6c25 Merge pull request #4799 from martin-frbg/issue4762
Improve the error message for (p)thread creation failure
2024-07-15 20:57:56 +02:00
Martin Kroeker
a373d0f107 Improve the error message for thread creation failure 2024-07-15 18:32:21 +02:00
Mark Ryan
67bf4b6998 Fix axpby_rvv kernels for cases where inc_y = 0
The following openblas_utest tests fail when the RISCV64_ZVL128B is
enabled.

TEST 89/103 axpby:zaxpby_inc_0 [FAIL]
TEST 92/103 axpby:caxpby_inc_0 [FAIL]
TEST 95/103 axpby:daxpby_inc_0 [FAIL]
TEST 98/103 axpby:saxpby_inc_0 [FAIL]

The issue is that the vectorized kernels do not work when inc_y == 0.
This patch updates the kernels to fall back to the scalar algorithms
when inc_y == 0, fixing the failing tests.

Signed-off-by: Mark Ryan <markdryan@rivosinc.com>
2024-07-15 14:24:47 +00:00
Martin Kroeker
6013b36b16 Merge pull request #4796 from martin-frbg/ppcbuf
Suffix BUFFER_SIZEs on POWER as UL to prevent int overflow in computations
2024-07-12 11:06:45 +02:00
Martin Kroeker
9789034281 Merge branch 'OpenMathLib:develop' into ppcbuf 2024-07-12 11:05:46 +02:00
Martin Kroeker
5d08ec7ff3 Merge pull request #4782 from martin-frbg/azurewincl
Fix NAN handling in ARM/generic SCAL; have AzureCI Windows show errors on failure
2024-07-11 23:55:15 +02:00
Martin Kroeker
dfc11ef248 Merge pull request #4791 from ChipKerchner/vectorizeSBGEMMincopy
Vectorize SBGEMM incopy for Power10 - 4x faster.
2024-07-11 21:38:57 +02:00
Martin Kroeker
2fefdfa2b8 Merge branch 'OpenMathLib:develop' into azurewincl 2024-07-11 21:38:21 +02:00
Martin Kroeker
475bd2452b Suffix BUFFERSIZEs as UL to prevent int overflow in computations 2024-07-11 20:13:57 +02:00
Martin Kroeker
b70227ad62 Merge pull request #4795 from pkubaj/patch-1
Fix build on FreeBSD/powerpc64*
2024-07-11 19:00:07 +02:00
Martin Kroeker
8277828fdc Merge pull request #4785 from rgommers/docs-install
Rewrite "Install OpenBLAS" docs page
2024-07-11 18:49:07 +02:00
Martin Kroeker
f0fc7249f1 Merge pull request #4792 from martin-frbg/issue4790
Fix core assignment in cpu detection for Intel family 15
2024-07-11 17:38:43 +02:00
Martin Kroeker
362856fece Merge pull request #4778 from JAicewizard/develop
Add support for RISCV64_GENERIC in cmake
2024-07-11 15:12:46 +02:00
Martin Kroeker
1d77647d1b Merge pull request #4769 from drupol/fix-buffersize-value
openblas: fix `BUFFERSIZE` value
2024-07-11 14:45:50 +02:00
Piotr Kubaj
4c12090776 Fix build on FreeBSD/powerpc64* 2024-07-10 22:21:48 +00:00
Chip Kerchner
f708944fea Add all 4 variations of the SBGEMM to compare_sgemm_sbgemm 2024-07-10 13:07:48 -05:00
Martin Kroeker
e706bc1ec0 Fix core assignment for Intel family 15 2024-07-09 20:22:56 +02:00
Chip Kerchner
cb154832f8 Vectorize SBGEMM incopy - 4x faster. 2024-07-09 13:10:03 -05:00
Martin Kroeker
a5c04e326a Update scal.c 2024-07-04 22:28:01 +02:00
Ralf Gommers
268dcd8f45 docs: convert remaining install sections (Android, iOS, FreeBSD, Cortex-M) 2024-07-04 19:16:30 +02:00
Ralf Gommers
452014341e docs: rework building from source on Windows section 2024-07-04 19:16:30 +02:00
Ralf Gommers
4547908901 docs: rewrite "Install OpenBLAS" page (part 1: binaries, basic from source) 2024-07-04 19:15:51 +02:00
Martin Kroeker
e1eef56e05 Merge pull request #4783 from martin-frbg/cpuid_meteor
Add another CPUID for Intel Meteor Lake
2024-07-04 18:09:27 +02:00
Martin Kroeker
536200bc9e fix handling of INF or NAN 2024-07-04 17:47:19 +02:00
Martin Kroeker
3063d03021 Add another CPUID for Meteor Lake 2024-07-04 16:05:05 +02:00
Martin Kroeker
b422742899 collect error output from ctest, if any 2024-07-04 15:42:34 +02:00
Jaap Aarts
cea4abcac0 Fix compiling on mingw 2024-07-04 14:56:16 +02:00
Martin Kroeker
f729013d2e Merge pull request #4781 from rgommers/fix-docs-deployment
fix CI job to deploy docs, and make it run on pull requests too
2024-07-03 21:00:18 +02:00
Ralf Gommers
6ede8b14c6 ci: fix CI job to deploy docs, and make it run on pull requests too 2024-07-03 20:14:02 +02:00
Martin Kroeker
9836883ee9 Merge pull request #4780 from martin-frbg/azureosx12
AzureCI: Update OSX jobs to use the macos-12 image
2024-07-03 19:53:05 +02:00
Martin Kroeker
df81b159e8 Merge pull request #4774 from rgommers/improve-docs
Improve documentation content, formatting, and html theme
2024-07-03 17:10:44 +02:00
Martin Kroeker
2df4007425 Update compiler and sdk versions for osx 2024-07-03 16:48:43 +02:00
Martin Kroeker
acf0c3ccaf Merge pull request #4777 from ev-br/sgesdd_ci_err
ignore the gesdd failure on codspeed
2024-07-03 15:21:33 +02:00
Martin Kroeker
74f059a3ce Update OSX jobs to use the macos-12 image 2024-07-03 13:24:02 +02:00
Evgeni Burovski
cd3c167c28 ignore sgesdd failure on codspeed
In https://github.com/OpenMathLib/OpenBLAS/issues/4776
we're hitting
** On entry to SLASCL parameter number  4 had an illegal value

on codspeed, but not outside (either locally or on github runners)
2024-07-03 12:35:26 +03:00
Jaap Aarts
9d0abe2d26 Add support for RISCV64_GENERIC in cmake 2024-07-03 01:49:37 +02:00
Evgeni Burovski
5b385fd453 WIP: fish out the gesdd failure? 2024-07-02 20:10:26 +03:00
Ralf Gommers
c1c0dbfd60 docs: address review comments on PR 4774 2024-07-02 14:05:47 +02:00
Martin Kroeker
bdb6069051 Merge pull request #4775 from martin-frbg/issue4770
Guard against invalid thread_status.queue
2024-07-01 00:35:30 +02:00
Martin Kroeker
4052b312b2 Merge pull request #4763 from ev-br/sync-codspeed
BENCH: sync codspeed-benchmarks with BLAS-benchmarks
2024-07-01 00:18:08 +02:00
Martin Kroeker
3677b3886c Merge pull request #4702 from bashimao/detect-nv-grace
Correctly detect ARM Neoverse V2 CPUs.
2024-06-30 22:48:48 +02:00
Martin Kroeker
d0b9948b23 Guard against invalid thread_status.queue 2024-06-30 19:31:15 +02:00
Ralf Gommers
ca9a0c28e8 docs: improve extensions page 2024-06-30 18:04:12 +02:00
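
Note on commit 67bf4b6998 (Fix axpby_rvv kernels for cases where inc_y = 0): the commit message says the vectorized kernels fail when inc_y == 0 and that the fix falls back to the scalar algorithms. The following is a minimal sketch of why such a fallback can be needed, not the actual patch. It is plain C rather than RISC-V vector intrinsics, all function names are hypothetical, increments are assumed non-negative, and it assumes the expected inc_y == 0 result is the one an in-order scalar loop produces.

```c
#include <stddef.h>

#define LANES 4 /* hypothetical vector width, for illustration only */

/* In-order scalar axpby: y[i*inc_y] = alpha*x[i*inc_x] + beta*y[i*inc_y].
   With inc_y == 0 every iteration reads the value the previous
   iteration stored in y[0], so the updates chain together. */
static void axpby_scalar(int n, double alpha, const double *x, int inc_x,
                         double beta, double *y, int inc_y)
{
    for (int i = 0; i < n; i++)
        y[i * inc_y] = alpha * x[i * inc_x] + beta * y[i * inc_y];
}

/* Plain-C stand-in for a vector kernel (NOT the real RVV code):
   each block loads all of its y lanes before storing any of them.
   With inc_y == 0 all lanes load the same stale y[0], so the
   chained dependency of the scalar loop is lost. */
static void axpby_blocked(int n, double alpha, const double *x, int inc_x,
                          double beta, double *y, int inc_y)
{
    for (int i = 0; i < n; i += LANES) {
        int len = (n - i < LANES) ? (n - i) : LANES;
        double lane[LANES];

        /* "vector load" and compute */
        for (int k = 0; k < len; k++)
            lane[k] = alpha * x[(i + k) * inc_x] + beta * y[(i + k) * inc_y];

        /* "vector store" */
        for (int k = 0; k < len; k++)
            y[(i + k) * inc_y] = lane[k];
    }
}

/* Dispatcher in the spirit of the commit message: use the scalar
   loop when inc_y == 0, the blocked path otherwise. */
static void axpby(int n, double alpha, const double *x, int inc_x,
                  double beta, double *y, int inc_y)
{
    if (inc_y == 0) {
        axpby_scalar(n, alpha, x, inc_x, beta, y, inc_y);
        return;
    }
    axpby_blocked(n, alpha, x, inc_x, beta, y, inc_y);
}
```

For x = {1, 2, 3, 4}, alpha = beta = 1, and a single y element starting at 0 with inc_y = 0, the scalar loop accumulates 0, 1, 3, 6, 10, while the blocked stand-in computes every lane from the initial 0 and leaves y at 4. For inc_y = 1 the two paths agree, which is why the fallback only has to cover the zero-increment case in this sketch.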