7433 Commits

Author SHA1 Message Date
Martin Kroeker 51c218d17a
Update Jenkinsfile 2023-08-05 18:33:15 +02:00
Martin Kroeker df978c90cd
Update Jenkinsfile.pwr 2023-08-05 18:32:41 +02:00
Martin Kroeker ef4a7e3fca
Merge pull request #4127 from XiWeiGu/LoongArch64-CI
LoongArch64 CI
2023-08-05 18:19:47 +02:00
Martin Kroeker b63e4581a3
Merge pull request #4016 from mmuetzel/ci-msys2
Add support for LLVM Flang
2023-08-05 15:59:34 +02:00
Markus Mützel 53378296c8 CI: Build with NO_AVX512 for the runners that use Flang 16. 2023-08-05 13:47:38 +02:00
Markus Mützel 1c3fcaaf42 CI (MSYS2): Re-run failed tests verbosely. 2023-08-05 13:16:06 +02:00
Markus Mützel f334bd9041 CI (MSYS2): Use LLVM Flang on CLANG64 runners. Add CLANG32 runner. 2023-08-05 13:16:06 +02:00
Markus Mützel 57256623f4 fc.cmake: Add support for LLVM Flang. 2023-08-05 13:16:06 +02:00
gxw ec1e96aac8 LoongArch64: Add dgemv_t_8_lasx.S and dgemv_n_8_lasx.S 2023-08-05 10:24:17 +08:00
gxw 96bf226bca gh-actions: Add loongarch64 CI 2023-08-05 10:21:43 +08:00
gxw db9a42f8c3 LoongArch64: using getauxval to do runtime check
Using the getauxval function prevents errors that occur
when the hardware supports vector instructions
but the kernel does not
2023-08-05 10:21:43 +08:00
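
The runtime check described in db9a42f8c3 can be illustrated with a minimal sketch. This is not the code from that commit. It assumes a Linux host with `getauxval` from `<sys/auxv.h>`, and the two bit positions are assumed from the LoongArch section of the Linux UAPI `<asm/hwcap.h>` header (they should be checked against the kernel headers in use). The `DEMO_` macros, the `simd_level` enum, and both function names are hypothetical.

```c
#include <sys/auxv.h>   /* getauxval, AT_HWCAP (Linux) */

/* ASSUMPTION: bit positions as listed for LoongArch in the Linux UAPI
   header <asm/hwcap.h>. Redefined with a DEMO_ prefix so the sketch
   compiles on any Linux host; on other architectures these bits
   carry a different meaning. */
#define DEMO_HWCAP_LOONGARCH_LSX  (1UL << 4)  /* 128-bit vectors */
#define DEMO_HWCAP_LOONGARCH_LASX (1UL << 5)  /* 256-bit vectors */

/* Hypothetical result type for this illustration only. */
typedef enum { SIMD_NONE = 0, SIMD_LSX = 1, SIMD_LASX = 2 } simd_level;

/* Pure helper: map a hwcap word to the widest vector level it reports. */
static simd_level simd_level_from_hwcap(unsigned long hwcap)
{
    if (hwcap & DEMO_HWCAP_LOONGARCH_LASX) return SIMD_LASX;
    if (hwcap & DEMO_HWCAP_LOONGARCH_LSX)  return SIMD_LSX;
    return SIMD_NONE;
}

/* Ask the kernel what it exposes to user space. A CPU that implements
   the vector unit but runs a kernel without support for it reports the
   bit as clear, so the vector kernels are not selected. */
static simd_level detect_simd_level(void)
{
    return simd_level_from_hwcap(getauxval(AT_HWCAP));
}
```

Splitting the bit test from the `getauxval` call keeps the decision logic testable on any machine. A dispatcher would call `detect_simd_level()` once at startup and pick its kernels from the result.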
gxw d46772e037 LoongArch64: Add compiler feature checks 2023-08-05 10:21:43 +08:00
Martin Kroeker 8a171350db
Merge pull request #4178 from martin-frbg/llvm17
Add (gmake) support for LLVM17's new flang
2023-08-04 20:56:00 +02:00
Martin Kroeker ef23240ab8
Merge pull request #4177 from martin-frbg/issue4176
Fix ZAXPY calls with INCX=0 on pre-AVX x86_64 and add utest
2023-08-04 20:55:22 +02:00
Martin Kroeker e8bc8a0ee7
Add support for the new generation flang that comes with LLVM17 2023-08-04 15:32:19 +02:00
Martin Kroeker f2c9ae9c33
Identify the new generation of flang that comes with LLVM17 2023-08-04 15:31:03 +02:00
Martin Kroeker 862d06ab8a
Add INCX=0,INCY=1 test case for CAXPY 2023-08-04 15:28:02 +02:00
Martin Kroeker d64fa286f7
add test case for zaxpy with incx=0 incy=1 2023-08-04 12:26:36 +02:00
Martin Kroeker 4664b57e6e
use shortcut only when both incx and incy are zero 2023-08-04 12:25:34 +02:00
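
The behaviour that 4664b57e6e and the new test cases address can be sketched with a plain reference loop. This is an illustration of the usual BLAS AXPY stride semantics, not OpenBLAS's interface code. The name `ref_zaxpy` and the exact form of the shortcut are assumptions made for this example.

```c
#include <complex.h>

/* Hypothetical reference routine: y := alpha * x + y for n complex
   elements, with BLAS-style strides incx and incy. */
static void ref_zaxpy(long n, double complex alpha,
                      const double complex *x, long incx,
                      double complex *y, long incy)
{
    if (n <= 0) return;

    /* Shortcut valid only when BOTH strides are zero: the same element
       of y receives the same update n times. */
    if (incx == 0 && incy == 0) {
        y[0] += (double)n * alpha * x[0];
        return;
    }

    /* With a negative stride, BLAS starts from the far end. */
    long ix = (incx < 0) ? (1 - n) * incx : 0;
    long iy = (incy < 0) ? (1 - n) * incy : 0;

    /* With incx == 0 and incy != 0, ix never advances, so x[0] is
       added (scaled by alpha) to every visited element of y. */
    for (long i = 0; i < n; i++) {
        y[iy] += alpha * x[ix];
        ix += incx;
        iy += incy;
    }
}
```

With `incx = 0` and `incy = 1`, each of the n elements of y must be updated once. Taking the single-element shortcut whenever only `incx` is zero would leave all but the first element untouched, which is the kind of case the INCX=0, INCY=1 tests are meant to exercise.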
Martin Kroeker c2f4bdbbb4
Merge pull request #4163 from martin-frbg/issue4017
Rework OpenMP thread count limit handling
2023-07-31 17:58:51 +02:00
Martin Kroeker 09131f79a6
Merge pull request #4164 from martin-frbg/issue4162
Enable use of AVX512 microkernels with NVIDIA HPC from version 22.3
2023-07-29 15:07:20 +02:00
Martin Kroeker 6a428b5629
Update casum_microk_skylakex-2.c 2023-07-29 12:24:30 +02:00
Martin Kroeker ebb447e32e
Update zasum_microk_skylakex-2.c 2023-07-29 12:23:57 +02:00
Martin Kroeker 9f6847583a
nvc currently miscompiles this, hopefully fixed in release 23.09 2023-07-29 11:50:16 +02:00
Martin Kroeker fe54ee3d15
nvc currently miscompiles this, hopefully fixed in release 23.09 2023-07-29 11:48:38 +02:00
Aiden Grossman b209915121 Fix build with clang
There are two instances when building the tests where OpenBLAS fails to
build with OpenMP and clang due to library paths getting reset as flags
are set rather than appended. This seems to only affect certain
clang/libomp installations, but if it's already grabbing the correct
library paths we might as well use them.
2023-07-28 12:59:44 -07:00
Felix Yan f5506b002c
Add 64-bit flag on INTERFACE64 only 2023-07-28 16:19:14 +03:00
Felix Yan 4ed6414c17
Fix 64-bit fortran options for riscv64
64-bit builds are currently broken without this flag.

Makefiles have done this already: 5720fa02c5/Makefile.system (L831)
2023-07-28 04:53:27 +03:00
Felix Yan 007cd834c1
Use defined variable for riscv64 in arch.cmake
It's defined in #4137
2023-07-28 04:50:16 +03:00
Martin Kroeker 5720fa02c5
Merge pull request #4168 from Mousius/sve-zgemm-cgemm
Use SVE zgemm/cgemm on Arm(R) Neoverse(TM) V1 core
2023-07-27 17:41:45 +02:00
Martin Kroeker b3a5144a74
Merge pull request #4167 from Mousius/sve-zhemm-fix
Fix ZHEMM copy for SVE
2023-07-27 16:20:55 +02:00
Chris Sidebottom 84a268b6ca Use SVE zgemm/cgemm on Arm(R) Neoverse(TM) V1 core
This patch removes the prefetches from cgemm/zgemm, which improves performance much as the same change did for sgemm/dgemm in #3868, so I'm happy to enable this on any applicable cores.

I also replicated the unrolling of the copies from sgemm and dgemm.
2023-07-27 14:12:20 +01:00
Chris Sidebottom 730ca04b48 Fix ZHEMM copy for SVE
Whilst disambiguating whilelt, I inadvertently used the wrong datatype
for offsets, which can be negative. This rectifies that.
2023-07-27 13:27:28 +01:00
Martin Kroeker 9ba9c8bdc0
Merge pull request #4165 from rgommers/docs-packaging-and-ilp64
Add documentation on redistributing OpenBLAS
2023-07-27 10:36:24 +02:00
Ralf Gommers ee72575475 Add documentation on redistributing OpenBLAS
This touches on the following:

- build configurations
- naming of symbols, shared/static libraries and other build outputs
  like pkg-config and CMake files
- (in more detail) guidance on ILP64 builds

It tries to explain that, while this is only guidance and there may be
reasons to deviate from that, for some build options there are best
practices, and for some others there are choices to make.

It also links to a number of well-maintained build recipes in order
to help packagers of other distros make choices.

Closes gh-3798

[skip ci]
2023-07-26 23:37:28 +02:00
Martin Kroeker 2a62d2df96
Enable use of AVX512 microkernels with NVIDIA HPC from version 22.3 2023-07-26 19:39:11 +02:00
steppi 76aa6bac4d Fix cirun url [skip actions] 2023-07-26 12:01:12 -04:00
Martin Kroeker 849c8806b8
Merge pull request #4161 from Mousius/non-sve-kernels
Use latest non-SVE kernels in ARMV8SVE
2023-07-26 15:49:40 +02:00
Martin Kroeker b1f6c4a1e4
Merge pull request #4160 from Mousius/sve-sniff
Add ARMV8SVE to AArch64 Dynamic Dispatch
2023-07-26 13:46:16 +02:00
Martin Kroeker 9ff84dc3f2
remove unused status variable 2023-07-26 10:02:44 +02:00
Martin Kroeker 94adf98bb8
remove unused status variable 2023-07-26 08:31:37 +02:00
Martin Kroeker 3326b924b3
remove status variable blas_num_threads_set; initialize openmp thread maximum on startup 2023-07-26 00:31:24 +02:00
Martin Kroeker ea669c8ae9
simplify openmp thread limit handling 2023-07-26 00:27:14 +02:00
Chris Sidebottom 24586bc4ff Disambiguate whilelt 2023-07-25 20:15:44 +01:00
Chris Sidebottom f971ef55f2 Add ARMV8SVE to AArch64 Dynamic Dispatch
In order to enable support for future cores which have similar tunings
(in this case I'm doing this for the Arm(R) Neoverse(TM) V2 core), this generically detects SVE support and enables it. This should better manage the size and complexity of dynamic dispatch rather than just copy-pasting the same parameters.

To make `ARMV8SVE` more representative of the common 128-bit SVE case,
I've split it and similar parameters from A64FX which has the wider
512-bit SVE.
2023-07-25 18:35:15 +01:00
Chris Sidebottom aea2a4622b Use latest non-SVE kernels in ARMV8SVE
These are generally better and, in some cases, include threading which helps in the cores we're targeting here.
2023-07-25 14:12:26 +01:00
steppi 42cbcf58bf EMPTY: [skip ci] [skip cirrus] 2023-07-24 16:38:52 -04:00
steppi b92033e3be EMPTY: [skip ci] 2023-07-24 16:20:56 -04:00
steppi 7c8ea130a3 Set up cirun workflow for arm64 graviton 2023-07-24 16:18:57 -04:00
martin-frbg 7976deff80 Fix file permissions (issue 4095) 2023-07-23 20:37:07 +02:00