Commit Graph

8509 Commits

Author SHA1 Message Date
Hong Bo Peng db98f8753f Try to fix LAPACK testing failures on P7.
1. Remove the FADD insn from the GEMV Transpose code.
  2. Remove the FADD insn from GEMM and ZGEMM code.
  3. Reorder the compution of the Imaginary part in ZGEMM code.
2024-07-19 02:08:19 -04:00
Martin Kroeker ee87cb90d0
Merge pull request #4803 from iha-taisei/SVESupportSDGEMV
A64FX: Add support for SVE to SGEMV/DGEMV kernels.
2024-07-17 23:14:21 +02:00
Martin Kroeker e9f6aa46a4
Merge pull request #4800 from vlad0x00/patch-2
Add missing parentheses
2024-07-16 16:32:04 +02:00
Martin Kroeker b1aa2e1768
Merge pull request #4802 from markdryan/markdryan/rvv_axpby_incy0
Fix axpby_rvv kernels for cases where inc_y = 0
2024-07-16 14:22:38 +02:00
iha fujitsu 0985fdc82b A64FX: Add support for SVE to SGEMV/DGEMV kernels. 2024-07-16 17:31:33 +09:00
Vladimir Nikolić 56e1782ffb
Add another missing parenthesis 2024-07-15 15:15:23 -07:00
Vladimir Nikolić 127ea5d0d9
Add missing parenthesis 2024-07-15 15:12:21 -07:00
Martin Kroeker a3c10c6c25
Merge pull request #4799 from martin-frbg/issue4762
Improve the error message for (p)thread creation failure
2024-07-15 20:57:56 +02:00
Martin Kroeker a373d0f107
Improve the error message for thread creation failure 2024-07-15 18:32:21 +02:00
Mark Ryan 67bf4b6998 Fix axpby_rvv kernels for cases where inc_y = 0
The following openblas_utest tests fail when the RISCV64_ZVL128B is
enabled.

TEST 89/103 axpby:zaxpby_inc_0 [FAIL]
TEST 92/103 axpby:caxpby_inc_0 [FAIL]
TEST 95/103 axpby:daxpby_inc_0 [FAIL]
TEST 98/103 axpby:saxpby_inc_0 [FAIL]

The issue is that the vectorized kernels do not work when inc_y == 0.
This patch updates the kernels to fall back to the scalar algorithms
when inc_y == 0, fixing the failing tests.

Signed-off-by: Mark Ryan <markdryan@rivosinc.com>
2024-07-15 14:24:47 +00:00
Martin Kroeker 6013b36b16
Merge pull request #4796 from martin-frbg/ppcbuf
Suffix BUFFER_SIZEs on POWER as UL to prevent int overflow in computations
2024-07-12 11:06:45 +02:00
Martin Kroeker 9789034281
Merge branch 'OpenMathLib:develop' into ppcbuf 2024-07-12 11:05:46 +02:00
Martin Kroeker 5d08ec7ff3
Merge pull request #4782 from martin-frbg/azurewincl
Fix NAN handling in ARM/generic SCAL; have AzureCI Windows show errors on failure
2024-07-11 23:55:15 +02:00
Martin Kroeker dfc11ef248
Merge pull request #4791 from ChipKerchner/vectorizeSBGEMMincopy
Vectorize SBGEMM incopy for Power10 - 4x faster.
2024-07-11 21:38:57 +02:00
Martin Kroeker 2fefdfa2b8
Merge branch 'OpenMathLib:develop' into azurewincl 2024-07-11 21:38:21 +02:00
Martin Kroeker 475bd2452b
Suffix BUFFERSIZEs as UL to prevent int overflow in computations 2024-07-11 20:13:57 +02:00
Martin Kroeker b70227ad62
Merge pull request #4795 from pkubaj/patch-1
Fix build on FreeBSD/powerpc64*
2024-07-11 19:00:07 +02:00
Martin Kroeker 8277828fdc
Merge pull request #4785 from rgommers/docs-install
Rewrite "Install OpenBLAS" docs page
2024-07-11 18:49:07 +02:00
Martin Kroeker f0fc7249f1
Merge pull request #4792 from martin-frbg/issue4790
Fix core assignment in cpu detection for Intel family 15
2024-07-11 17:38:43 +02:00
Martin Kroeker 362856fece
Merge pull request #4778 from JAicewizard/develop
Add support for RISCV64_GENERIC in cmake
2024-07-11 15:12:46 +02:00
Martin Kroeker 1d77647d1b
Merge pull request #4769 from drupol/fix-buffersize-value
openblas: fix `BUFFERSIZE` value
2024-07-11 14:45:50 +02:00
Piotr Kubaj 4c12090776
Fix build on FreeBSD/powerpc64* 2024-07-10 22:21:48 +00:00
Chip Kerchner f708944fea Add all 4 variations of the SBGEMM to compare_sgemm_sbgemm 2024-07-10 13:07:48 -05:00
Martin Kroeker e706bc1ec0
Fix core assignment for Intel family 15 2024-07-09 20:22:56 +02:00
Chip Kerchner cb154832f8 Vectorize SBGEMM incopy - 4x faster. 2024-07-09 13:10:03 -05:00
Martin Kroeker a5c04e326a
Update scal.c 2024-07-04 22:28:01 +02:00
Ralf Gommers 268dcd8f45 docs: convert remaining install sections (Android, iOS, FreeBSD, Cortex-M) 2024-07-04 19:16:30 +02:00
Ralf Gommers 452014341e docs: rework building from source on Windows section 2024-07-04 19:16:30 +02:00
Ralf Gommers 4547908901 docs: rewrite "Install OpenBLAS" page (part 1: binaries, basic from source) 2024-07-04 19:15:51 +02:00
Martin Kroeker e1eef56e05
Merge pull request #4783 from martin-frbg/cpuid_meteor
Add another CPUID for Intel Meteor Lake
2024-07-04 18:09:27 +02:00
Martin Kroeker 536200bc9e
fix handling of INF or NAN 2024-07-04 17:47:19 +02:00
Martin Kroeker 3063d03021
Add another CPUID for Meteor Lake 2024-07-04 16:05:05 +02:00
Martin Kroeker b422742899
collect error output from ctest, if any 2024-07-04 15:42:34 +02:00
Jaap Aarts cea4abcac0
Fix compiling on mingw 2024-07-04 14:56:16 +02:00
Martin Kroeker f729013d2e
Merge pull request #4781 from rgommers/fix-docs-deployment
fix CI job to deploy docs, and make it run on pull requests too
2024-07-03 21:00:18 +02:00
Ralf Gommers 6ede8b14c6 ci: fix CI job to deploy docs, and make it run on pull requests too 2024-07-03 20:14:02 +02:00
Martin Kroeker 9836883ee9
Merge pull request #4780 from martin-frbg/azureosx12
AzureCI: Update OSX jobs to use the macos-12 image
2024-07-03 19:53:05 +02:00
Martin Kroeker df81b159e8
Merge pull request #4774 from rgommers/improve-docs
Improve documention content, formatting, and html theme
2024-07-03 17:10:44 +02:00
Martin Kroeker 2df4007425
Update compiler and sdk versions for osx 2024-07-03 16:48:43 +02:00
Martin Kroeker acf0c3ccaf
Merge pull request #4777 from ev-br/sgesdd_ci_err
ignore the gesdd failure on codspeed
2024-07-03 15:21:33 +02:00
Martin Kroeker 74f059a3ce
Update OSX jobs to use the macos-12 image 2024-07-03 13:24:02 +02:00
Evgeni Burovski cd3c167c28 ignore sgesdd failure on codspeed
In https://github.com/OpenMathLib/OpenBLAS/issues/4776
we're hitting
** On entry to SLASCL parameter number  4 had an illegal value

on codspeed, but not outside (either locally or on github runners)
2024-07-03 12:35:26 +03:00
Jaap Aarts 9d0abe2d26 Add support for RISCV64_GENERIC in cmake 2024-07-03 01:49:37 +02:00
Evgeni Burovski 5b385fd453 WIP: fish out the gesdd failure? 2024-07-02 20:10:26 +03:00
Ralf Gommers c1c0dbfd60 docs: address review comments on PR 4774 2024-07-02 14:05:47 +02:00
Martin Kroeker bdb6069051
Merge pull request #4775 from martin-frbg/issue4770
Guard against invalid thread_status.queue
2024-07-01 00:35:30 +02:00
Martin Kroeker 4052b312b2
Merge pull request #4763 from ev-br/sync-codspeed
BENCH: sync codspeed-benchmarks with BLAS-benchmarks
2024-07-01 00:18:08 +02:00
Martin Kroeker 3677b3886c
Merge pull request #4702 from bashimao/detect-nv-grace
Correctly detect ARM Neoverse V2 CPUs.
2024-06-30 22:48:48 +02:00
Martin Kroeker d0b9948b23
Guard against invalid thread_status.queue 2024-06-30 19:31:15 +02:00
Ralf Gommers ca9a0c28e8 docs: improve extensions page 2024-06-30 18:04:12 +02:00