gxw
db9a42f8c3
LoongArch64: using getauxval to do runtime check
...
Using the getauxval function can prevent errors
caused by hardware supporting vector instructions
while the kernel does not support them
2023-08-05 10:21:43 +08:00
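The idea behind this runtime check can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the names `simd_level_from_hwcap`, `detect_simd_level` and the `SKETCH_HWCAP_*` macros are hypothetical, and the bit positions are an assumption based on the Linux LoongArch hwcap definitions. The point is that `AT_HWCAP` reflects what the kernel actually exposes to user space, not only what the CPU could do.

```c
#include <sys/auxv.h> /* getauxval, AT_HWCAP (Linux with glibc) */

/* Assumed bit positions for the LoongArch vector extensions in AT_HWCAP.
 * These mirror the Linux uapi header as far as I know; treat them as
 * an assumption and prefer the kernel's own macros in real code. */
#define SKETCH_HWCAP_LSX  (1UL << 4) /* 128-bit vectors */
#define SKETCH_HWCAP_LASX (1UL << 5) /* 256-bit vectors */

enum simd_level { SIMD_NONE, SIMD_LSX, SIMD_LASX };

/* Pure helper: map a hwcap bitmask to the widest usable vector level. */
static enum simd_level simd_level_from_hwcap(unsigned long hwcap)
{
    if (hwcap & SKETCH_HWCAP_LASX)
        return SIMD_LASX;
    if (hwcap & SKETCH_HWCAP_LSX)
        return SIMD_LSX;
    return SIMD_NONE;
}

/* Ask the kernel which capabilities it advertises to this process.
 * If the kernel lacks vector support, the bits stay clear even on
 * hardware that implements the instructions. */
static enum simd_level detect_simd_level(void)
{
    return simd_level_from_hwcap(getauxval(AT_HWCAP));
}
```

Keeping the bitmask decoding separate from the `getauxval` call makes the selection logic testable on any machine, since the bits only carry this meaning on LoongArch64.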
gxw
d46772e037
LoongArch64: Add compiler feature checks
2023-08-05 10:21:43 +08:00
Martin Kroeker
8a171350db
Merge pull request #4178 from martin-frbg/llvm17
...
Add (gmake) support for LLVM17's new flang
2023-08-04 20:56:00 +02:00
Martin Kroeker
ef23240ab8
Merge pull request #4177 from martin-frbg/issue4176
...
Fix ZAXPY calls with INCX=0 on pre-AVX x86_64 and add utest
2023-08-04 20:55:22 +02:00
Martin Kroeker
e8bc8a0ee7
Add support for the new generation flang that comes with LLVM17
2023-08-04 15:32:19 +02:00
Martin Kroeker
f2c9ae9c33
Identify the new generation of flang that comes with LLVM17
2023-08-04 15:31:03 +02:00
Martin Kroeker
862d06ab8a
Add INCX=0,INCY=1 test case for CAXPY
2023-08-04 15:28:02 +02:00
Martin Kroeker
d64fa286f7
add test case for zaxpy with incx=0 incy=1
2023-08-04 12:26:36 +02:00
Martin Kroeker
4664b57e6e
use shortcut only when both incx and incy are zero
2023-08-04 12:25:34 +02:00
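Why the shortcut has to be limited to the case where both increments are zero can be seen in a small sketch. It uses real doubles for brevity (the commits concern the complex CAXPY/ZAXPY), and `axpy_ref`, `axpy_zero_stride_shortcut` and `axpy` are hypothetical names, not OpenBLAS functions.

```c
#include <assert.h>
#include <stddef.h>

/* Reference-style AXPY on real doubles: y := alpha*x + y, with strides. */
static void axpy_ref(int n, double alpha, const double *x, int incx,
                     double *y, int incy)
{
    if (n <= 0)
        return;
    /* BLAS convention: a negative stride starts from the far end. */
    ptrdiff_t ix = incx < 0 ? (ptrdiff_t)(1 - n) * incx : 0;
    ptrdiff_t iy = incy < 0 ? (ptrdiff_t)(1 - n) * incy : 0;
    for (int i = 0; i < n; i++) {
        y[iy] += alpha * x[ix];
        ix += incx;
        iy += incy;
    }
}

/* Shortcut: with both strides zero, the loop adds alpha*x[0] to y[0]
 * n times, so one scaled update gives the same result. */
static void axpy_zero_stride_shortcut(int n, double alpha, const double *x,
                                      int incx, double *y, int incy)
{
    assert(incx == 0 && incy == 0); /* not valid for any other strides */
    if (n <= 0)
        return;
    y[0] += (double)n * alpha * x[0];
}

/* Dispatcher: take the shortcut only when BOTH increments are zero. */
static void axpy(int n, double alpha, const double *x, int incx,
                 double *y, int incy)
{
    if (incx == 0 && incy == 0)
        axpy_zero_stride_shortcut(n, alpha, x, incx, y, incy);
    else
        axpy_ref(n, alpha, x, incx, y, incy);
}
```

With `incx == 0` and `incy == 1`, every element of `y` must receive `alpha * x[0]`; a shortcut triggered by `incx == 0` alone would update only `y[0]`.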
Martin Kroeker
c2f4bdbbb4
Merge pull request #4163 from martin-frbg/issue4017
...
Rework OpenMP thread count limit handling
2023-07-31 17:58:51 +02:00
Martin Kroeker
09131f79a6
Merge pull request #4164 from martin-frbg/issue4162
...
Enable use of AVX512 microkernels with NVIDIA HPC from version 22.3
2023-07-29 15:07:20 +02:00
Martin Kroeker
6a428b5629
Update casum_microk_skylakex-2.c
2023-07-29 12:24:30 +02:00
Martin Kroeker
ebb447e32e
Update zasum_microk_skylakex-2.c
2023-07-29 12:23:57 +02:00
Martin Kroeker
9f6847583a
nvc currently miscompiles this, hopefully fixed in release 23.09
2023-07-29 11:50:16 +02:00
Martin Kroeker
fe54ee3d15
nvc currently miscompiles this, hopefully fixed in release 23.09
2023-07-29 11:48:38 +02:00
Martin Kroeker
5720fa02c5
Merge pull request #4168 from Mousius/sve-zgemm-cgemm
...
Use SVE zgemm/cgemm on Arm(R) Neoverse(TM) V1 core
2023-07-27 17:41:45 +02:00
Martin Kroeker
b3a5144a74
Merge pull request #4167 from Mousius/sve-zhemm-fix
...
Fix ZHEMM copy for SVE
2023-07-27 16:20:55 +02:00
Chris Sidebottom
84a268b6ca
Use SVE zgemm/cgemm on Arm(R) Neoverse(TM) V1 core
...
This patch removes the prefetches from cgemm/zgemm, which improves performance much as the same change did for sgemm/dgemm in #3868, so I'm happy to enable this on any applicable cores.
I also replicated the unrolling of the copies from sgemm and dgemm.
2023-07-27 14:12:20 +01:00
Chris Sidebottom
730ca04b48
Fix ZHEMM copy for SVE
...
Whilst disambiguating whilelt, I inadvertently used the wrong datatype
for offsets, which can be negative. This rectifies that.
2023-07-27 13:27:28 +01:00
Martin Kroeker
9ba9c8bdc0
Merge pull request #4165 from rgommers/docs-packaging-and-ilp64
...
Add documentation on redistributing OpenBLAS
2023-07-27 10:36:24 +02:00
Ralf Gommers
ee72575475
Add documentation on redistributing OpenBLAS
...
This touches on the following:
- build configurations
- naming of symbols, shared/static libraries and other build outputs
like pkg-config and CMake files
- (in more detail) guidance on ILP64 builds
It tries to explain that, while this is only guidance and there may be
reasons to deviate from that, for some build options there are best
practices, and for some others there are choices to make.
It also links to a number of well-maintained build recipes in order
to help packagers of other distros make choices.
Closes gh-3798
[skip ci]
2023-07-26 23:37:28 +02:00
Martin Kroeker
2a62d2df96
Enable use of AVX512 microkernels with NVIDIA HPC from version 22.3
2023-07-26 19:39:11 +02:00
Martin Kroeker
849c8806b8
Merge pull request #4161 from Mousius/non-sve-kernels
...
Use latest non-SVE kernels in ARMV8SVE
2023-07-26 15:49:40 +02:00
Martin Kroeker
b1f6c4a1e4
Merge pull request #4160 from Mousius/sve-sniff
...
Add ARMV8SVE to AArch64 Dynamic Dispatch
2023-07-26 13:46:16 +02:00
Martin Kroeker
9ff84dc3f2
remove unused status variable
2023-07-26 10:02:44 +02:00
Martin Kroeker
94adf98bb8
remove unused status variable
2023-07-26 08:31:37 +02:00
Martin Kroeker
3326b924b3
remove status variable blas_num_threads_set; initialize openmp thread maximum on startup
2023-07-26 00:31:24 +02:00
Martin Kroeker
ea669c8ae9
simplify openmp thread limit handling
2023-07-26 00:27:14 +02:00
Chris Sidebottom
24586bc4ff
Disambiguate whilelt
2023-07-25 20:15:44 +01:00
Chris Sidebottom
f971ef55f2
Add ARMV8SVE to AArch64 Dynamic Dispatch
...
In order to enable support for future cores which have similar tunings
(in this case I'm doing this for the Arm(R) Neoverse(TM) V2 core), this generically detects SVE support and enables it. This should manage the size and complexity of dynamic dispatch better than just copy-pasting the same parameters.
To make `ARMV8SVE` more representative of the common 128-bit SVE case,
I've split it and similar parameters from A64FX, which has the wider
512-bit SVE.
2023-07-25 18:35:15 +01:00
Chris Sidebottom
aea2a4622b
Use latest non-SVE kernels in ARMV8SVE
...
These are generally better and, in some cases, include threading, which helps on the cores we're targeting here.
2023-07-25 14:12:26 +01:00
martin-frbg
7976deff80
Fix file permissions (issue 4095)
2023-07-23 20:37:07 +02:00
martin-frbg
fec4867748
Fix file permissions (issue 4095)
2023-07-23 20:31:55 +02:00
Martin Kroeker
25037ae875
Fix actual arguments in some LAPACK procedure calls (Reference-LAPACK PR 885) (#4155)
...
* Fix actual arguments (Reference-LAPACK PR 885)
2023-07-22 23:14:25 +02:00
Martin Kroeker
bd01dc354b
Merge pull request #4151 from martin-frbg/issue4101
...
Ensure that early calls to blas_set_num_threads will not overwrite unrelated memory
2023-07-20 13:21:07 +02:00
Martin Kroeker
3bdcf3259d
Merge branch 'xianyi:develop' into issue4101
2023-07-20 08:23:20 +02:00
Martin Kroeker
5cb4f5940d
Merge pull request #4152 from martin-frbg/shutup-4098
...
Override the C910V DSDOT with generic code to get rid of the qemu precision error in CI
2023-07-20 08:22:57 +02:00
Martin Kroeker
76ef1672f8
Override DSDOT with generic code to get rid of qemu precision error
2023-07-19 22:31:07 +02:00
Martin Kroeker
8a27a274a1
Merge pull request #4150 from martin-frbg/armsve
...
Fix runtime detection in ARMV8 DYNAMIC_ARCH to check SVE capability
2023-07-19 22:25:55 +02:00
Martin Kroeker
b34f19a365
Ensure that a premature call to set_num_threads will not overwrite unrelated memory
2023-07-19 22:19:22 +02:00
Martin Kroeker
66904f8148
Ensure that a premature call will not overwrite unrelated memory
2023-07-19 22:14:34 +02:00
Martin Kroeker
5c58994eb2
Add fallback warning
2023-07-19 18:27:41 +02:00
Martin Kroeker
ca7199f249
Treat newer Neoverse as N1 if SVE unavailable (may be disabled in container/cloud env)
2023-07-19 14:48:42 +02:00
Martin Kroeker
9e81a3a0a2
Merge pull request #4100 from martin-frbg/cirrusm1gccmake
...
Cirrus CI: Add Apple M1 build using gcc,gmake and OpenMP
2023-07-18 08:04:29 +02:00
Martin Kroeker
ada9e442eb
Add Apple M1 build using gcc,gmake and OpenMP
2023-07-17 23:13:56 +02:00
Martin Kroeker
81228fc586
Merge pull request #4147 from martin-frbg/aldern
...
Support Alder Lake N (family 6 exmodel 11 model 14) as Haswell
2023-07-17 09:11:23 +02:00
Martin Kroeker
8da6aca2ec
Support Alder Lake N (fam 6 exmodel 11 model 14) as Haswell
2023-07-16 22:15:15 +02:00
Martin Kroeker
b61e64da6f
Merge pull request #4142 from exyntech/armv8-as-arm64
...
Fix armv8 detection in system_check.cmake
2023-07-15 23:15:49 +02:00
Martin Kroeker
f82a197143
Merge pull request #4137 from felixonmars/patch-1
...
Fix riscv64 detection in system_check.cmake
2023-07-15 19:41:06 +02:00
Martin Kroeker
0a637cc403
Fix workspace query corner cases to always return at least 1 (Reference-LAPACK PR 883) (#4146)
...
* Fix workspace query corner cases to always return at least 1
2023-07-15 16:37:42 +02:00