Commit Graph

7404 Commits

Author SHA1 Message Date
Aiden Grossman
b209915121 Fix build with clang
There are two instances when building the tests where OpenBLAS fails to
build with OpenMP and clang due to library paths getting reset as flags
are set rather than appended. This seems to only affect certain
clang/libomp installations, but if it's already grabbing the correct
library paths we might as well use them.
2023-07-28 12:59:44 -07:00
Felix Yan
f5506b002c Add 64-bit flag on INTERFACE64 only 2023-07-28 16:19:14 +03:00
Felix Yan
4ed6414c17 Fix 64-bit fortran options for riscv64
64-bit builds are currently broken without this flag.

Makefiles have done this already: 5720fa02c5/Makefile.system (L831)
2023-07-28 04:53:27 +03:00
Felix Yan
007cd834c1 Use defined variable for riscv64 in arch.cmake
It's defined in #4137
2023-07-28 04:50:16 +03:00
Martin Kroeker
5720fa02c5 Merge pull request #4168 from Mousius/sve-zgemm-cgemm
Use SVE zgemm/cgemm on Arm(R) Neoverse(TM) V1 core
2023-07-27 17:41:45 +02:00
Martin Kroeker
b3a5144a74 Merge pull request #4167 from Mousius/sve-zhemm-fix
Fix ZHEMM copy for SVE
2023-07-27 16:20:55 +02:00
Chris Sidebottom
84a268b6ca Use SVE zgemm/cgemm on Arm(R) Neoverse(TM) V1 core
This patch removes the prefetches from cgemm/zgemm, which improves performance much as was seen for sgemm/dgemm in #3868. This means I'm happy to enable this on any applicable cores.

I also replicated the unrolling of the copies from sgemm and dgemm.
2023-07-27 14:12:20 +01:00
Chris Sidebottom
730ca04b48 Fix ZHEMM copy for SVE
Whilst disambiguating whilelt, I inadvertently used the wrong datatype
for offsets, which can be negative. This rectifies that.
2023-07-27 13:27:28 +01:00
Martin Kroeker
9ba9c8bdc0 Merge pull request #4165 from rgommers/docs-packaging-and-ilp64
Add documentation on redistributing OpenBLAS
2023-07-27 10:36:24 +02:00
Ralf Gommers
ee72575475 Add documentation on redistributing OpenBLAS
This touches on the following:

- build configurations
- naming of symbols, shared/static libraries and other build outputs
  like pkg-config and CMake files
- (in more detail) guidance on ILP64 builds

It tries to explain that, while this is only guidance and there may be
reasons to deviate from that, for some build options there are best
practices, and for some others there are choices to make.

It also links to a number of well-maintained build recipes in order
to help packagers of other distros make choices.

Closes gh-3798

[skip ci]
2023-07-26 23:37:28 +02:00
Martin Kroeker
2a62d2df96 Enable use of AVX512 microkernels with NVIDIA HPC from version 22.3 2023-07-26 19:39:11 +02:00
Martin Kroeker
849c8806b8 Merge pull request #4161 from Mousius/non-sve-kernels
Use latest non-SVE kernels in ARMV8SVE
2023-07-26 15:49:40 +02:00
Martin Kroeker
b1f6c4a1e4 Merge pull request #4160 from Mousius/sve-sniff
Add ARMV8SVE to AArch64 Dynamic Dispatch
2023-07-26 13:46:16 +02:00
Martin Kroeker
9ff84dc3f2 remove unused status variable 2023-07-26 10:02:44 +02:00
Martin Kroeker
94adf98bb8 remove unused status variable 2023-07-26 08:31:37 +02:00
Martin Kroeker
3326b924b3 remove status variable blas_num_threads_set; initialize openmp thread maximum on startup 2023-07-26 00:31:24 +02:00
Martin Kroeker
ea669c8ae9 simplify openmp thread limit handling 2023-07-26 00:27:14 +02:00
Chris Sidebottom
24586bc4ff Disambiguate whilelt 2023-07-25 20:15:44 +01:00
Chris Sidebottom
f971ef55f2 Add ARMV8SVE to AArch64 Dynamic Dispatch
In order to enable support for future cores which have similar tunings
(in this case I'm doing this for the Arm(R) Neoverse(TM) V2 core), this generically detects SVE support and enables it. This should better manage the size and complexity of dynamic dispatch rather than just copy-pasting the same parameters.

To make `ARMV8SVE` more representative of the common 128-bit SVE case,
I've split it and similar parameters from A64FX which has the wider
512-bit SVE.
2023-07-25 18:35:15 +01:00
Chris Sidebottom
aea2a4622b Use latest non-SVE kernels in ARMV8SVE
These are generally better and, in some cases, include threading, which helps on the cores we're targeting here.
2023-07-25 14:12:26 +01:00
martin-frbg
7976deff80 Fix file permissions (issue 4095) 2023-07-23 20:37:07 +02:00
martin-frbg
fec4867748 Fix file permissions (issue 4095) 2023-07-23 20:31:55 +02:00
Martin Kroeker
25037ae875 Fix actual arguments in some LAPACK procedure calls (Reference-LAPACK PR 885) (#4155)
* Fix actual arguments (Reference-LAPACK PR 885)
2023-07-22 23:14:25 +02:00
Martin Kroeker
bd01dc354b Merge pull request #4151 from martin-frbg/issue4101
Ensure that early calls to blas_set_num_threads will not overwrite unrelated memory
2023-07-20 13:21:07 +02:00
Martin Kroeker
3bdcf3259d Merge branch 'xianyi:develop' into issue4101 2023-07-20 08:23:20 +02:00
Martin Kroeker
5cb4f5940d Merge pull request #4152 from martin-frbg/shutup-4098
Override the C910V DSDOT with generic code to get rid of the qemu precision error in CI
2023-07-20 08:22:57 +02:00
Martin Kroeker
76ef1672f8 Override DSDOT with generic code to get rid of qemu precision error 2023-07-19 22:31:07 +02:00
Martin Kroeker
8a27a274a1 Merge pull request #4150 from martin-frbg/armsve
Fix runtime detection in ARMV8 DYNAMIC_ARCH to check SVE capability
2023-07-19 22:25:55 +02:00
Martin Kroeker
b34f19a365 Ensure that a premature call to set_num_threads will not overwrite unrelated memory 2023-07-19 22:19:22 +02:00
Martin Kroeker
66904f8148 Ensure that a premature call will not overwrite unrelated memory 2023-07-19 22:14:34 +02:00
Martin Kroeker
5c58994eb2 Add fallback warning 2023-07-19 18:27:41 +02:00
Martin Kroeker
ca7199f249 Treat newer Neoverse as N1 if SVE unavailable (may be disabled in container/cloud env) 2023-07-19 14:48:42 +02:00
Martin Kroeker
9e81a3a0a2 Merge pull request #4100 from martin-frbg/cirrusm1gccmake
Cirrus CI: Add Apple M1 build using gcc,gmake and OpenMP
2023-07-18 08:04:29 +02:00
Martin Kroeker
ada9e442eb Add Apple M1 build using gcc,gmake and OpenMP 2023-07-17 23:13:56 +02:00
Martin Kroeker
81228fc586 Merge pull request #4147 from martin-frbg/aldern
Support Alder Lake N (family 6 exmodel 11 model 14) as Haswell
2023-07-17 09:11:23 +02:00
Martin Kroeker
8da6aca2ec Support Alder Lake N (fam 6 exmodel 11 model 14) as Haswell 2023-07-16 22:15:15 +02:00
Martin Kroeker
b61e64da6f Merge pull request #4142 from exyntech/armv8-as-arm64
Fix armv8 detection in system_check.cmake
2023-07-15 23:15:49 +02:00
Martin Kroeker
f82a197143 Merge pull request #4137 from felixonmars/patch-1
Fix riscv64 detection in system_check.cmake
2023-07-15 19:41:06 +02:00
Martin Kroeker
0a637cc403 Fix workspace query corner cases to always return at least 1 (Reference-LAPACK PR 883) (#4146)
* Fix workspace query corner cases to always return at least 1
2023-07-15 16:37:42 +02:00
Martin Kroeker
4c43d1eeba Fix C prototypes and LAPACKE headers for ?GEDMD/?GEDMDQ (#4134)
* Fix prototypes for ?GEDMD/?GEDMDQ and their LAPACKE interfaces
2023-07-15 07:47:19 +02:00
Martin Kroeker
49077e7bde Merge pull request #4145 from martin-frbg/issue4144
Restore zero-initialization of variables in generic ztrsm_utcopy
2023-07-14 12:44:05 +02:00
Martin Kroeker
3d31191b0f Work around Clang failing to disambiguate SVE intrinsics and add AppleClang crossbuild to MacOS/arm64 DYNAMIC_ARCH in AzureCI (#4140)
* Add AppleClang crossbuild to MacOS/arm64 DYNAMIC_ARCH

* add casts to disambiguate svwhilelt for clang
2023-07-14 11:06:48 +02:00
Martin Kroeker
04cdf5efb4 fix typo and missing declaration 2023-07-14 00:05:00 +02:00
Martin Kroeker
5e1103b8d7 Update rotg.c 2023-07-13 23:35:38 +02:00
Martin Kroeker
cfa0a80664 Restore initialization of data variables 2023-07-13 23:23:12 +02:00
Martin Kroeker
9567305e4c Restore initialization of data01,data02 2023-07-13 23:21:18 +02:00
Martin Kroeker
4cc232bb07 Merge branch 'xianyi:develop' into issue4130 2023-07-13 21:40:22 +02:00
Martin Kroeker
7c75c8b2fe fix truncated edit 2023-07-13 21:40:12 +02:00
Martin Kroeker
0f2ce93904 typo fix 2023-07-13 10:56:59 +02:00
Martin Kroeker
affeef0b9c Fix gmake build not always picking the right ARM64 arch options for clang (#4136)
* Fix gcc version checks erroneously excluding clang

* Avoid some mtune names not supported by (Apple)Clang
2023-07-13 08:38:03 +02:00