7341 Commits

Author SHA1 Message Date
Martin Kroeker
07e32c4cb8 Allow negative INCX (API change from version 3.10 of the reference implementation) 2023-08-10 17:00:18 +02:00
Martin Kroeker
c211da0688 Allow negative INCX (API change from version 3.10 of the reference implementation) 2023-08-10 16:58:57 +02:00
Martin Kroeker
a34a0a7abc Allow negative INCX (API change from version 3.10 of the reference implementation) 2023-08-10 16:56:52 +02:00
Martin Kroeker
54d3246fc6 Allow negative INCX (API change from version 3.10 of the reference implementation) 2023-08-10 16:55:17 +02:00
Martin Kroeker
7dd441d5db Allow negative INCX (API change from version 3.10 of the reference implementation) 2023-08-10 16:53:33 +02:00
Martin Kroeker
f692178792 Allow negative INCX (API change from version 3.10 of the reference implementation) 2023-08-10 16:52:09 +02:00
Martin Kroeker
d15ffb7fdf Allow negative INCX (API change from version 3.10 of the reference implementation) 2023-08-10 16:50:44 +02:00
Martin Kroeker
a2d867f4d1 Allow negative iNCX (API change from version 3.10 of the reference implementation) 2023-08-10 16:49:05 +02:00
Martin Kroeker
9a0e9c8b69 Merge pull request #4171 from boomanaiden154/clang-libomp-fixes
Fix build with some clang installations when openmp is enabled
2023-08-10 16:32:33 +02:00
Martin Kroeker
7af0f41762 Merge pull request #4189 from martin-frbg/issue4186
Prepare the interface for INCX < 0 in the new NRM2 implementation from BLAS 3.10
2023-08-10 14:11:12 +02:00
Martin Kroeker
4cc804c754 Prepare for INCX < 0 in new NRM2 implementation from BLAS 3.10 2023-08-09 16:13:23 +02:00
Martin Kroeker
afdc56a421 Merge pull request #4158 from XiWeiGu/loongarch64_update_dgemm_kernel
LoongArch64: Update dgemm kernel
2023-08-07 12:44:09 +02:00
Martin Kroeker
91e5513f3b Merge pull request #4184 from XiWeiGu/dgemv
LoongArch64: Add dgemv_t_8_lasx.S and dgemv_n_8_lasx.S V2
2023-08-07 08:47:19 +02:00
gxw
e8b571d245 LoongArch64: Add dgemv_t_8_lasx.S and dgemv_n_8_lasx.S V2 2023-08-07 11:20:42 +08:00
gxw
71fcee6eef LoongArch64: Update dgemm kernel 2023-08-07 11:06:52 +08:00
Martin Kroeker
0f521ece25 Merge pull request #4183 from martin-frbg/issue4181
Apply USE_TRMM to MIPS64_GENERIC as to GENERIC in gmake builds
2023-08-06 18:59:50 +02:00
Martin Kroeker
232420bdf5 Merge pull request #4182 from xianyi/revert-4153-dgemv
Revert "LoongArch64: Add dgemv_t_8_lasx.S and dgemv_n_8_lasx.S"
2023-08-06 16:00:32 +02:00
Martin Kroeker
41c31bc1d4 Revert "LoongArch64: Add dgemv_t_8_lasx.S and dgemv_n_8_lasx.S" 2023-08-06 16:00:03 +02:00
Martin Kroeker
61d803547a Apply USE_TRMM to MIPS64_GENERIC as to GENERIC 2023-08-06 15:17:38 +02:00
Martin Kroeker
f8ee309402 Merge pull request #4153 from XiWeiGu/dgemv
LoongArch64: Add dgemv_t_8_lasx.S and dgemv_n_8_lasx.S
2023-08-06 08:49:16 +02:00
Martin Kroeker
12e98482e9 Merge pull request #4179 from martin-frbg/jenkinsfix
Run "make clean" on Jenkins first to remove stale objects
2023-08-05 22:47:26 +02:00
Martin Kroeker
51c218d17a Update Jenkinsfile 2023-08-05 18:33:15 +02:00
Martin Kroeker
df978c90cd Update Jenkinsfile.pwr 2023-08-05 18:32:41 +02:00
Martin Kroeker
ef4a7e3fca Merge pull request #4127 from XiWeiGu/LoongArch64-CI
LoongArch64 CI
2023-08-05 18:19:47 +02:00
Martin Kroeker
b63e4581a3 Merge pull request #4016 from mmuetzel/ci-msys2
Add support for LLVM Flang
2023-08-05 15:59:34 +02:00
Markus Mützel
53378296c8 CI: Build with NO_AVX512 for the runners that use Flang 16. 2023-08-05 13:47:38 +02:00
Markus Mützel
1c3fcaaf42 CI (MSYS2): Re-run failed tests verbosely. 2023-08-05 13:16:06 +02:00
Markus Mützel
f334bd9041 CI (MSYS2): Use LLVM Flang on CLANG64 runners. Add CLANG32 runner. 2023-08-05 13:16:06 +02:00
Markus Mützel
57256623f4 fc.cmake: Add support for LLVM Flang. 2023-08-05 13:16:06 +02:00
gxw
ec1e96aac8 LoongArch64: Add dgemv_t_8_lasx.S and dgemv_n_8_lasx.S 2023-08-05 10:24:17 +08:00
gxw
96bf226bca gh-actions: Add loongarch64 CI 2023-08-05 10:21:43 +08:00
gxw
db9a42f8c3 LoongArch64: using getauxval to do runtime check
Using the getauxval instruction can prevent errors
caused by hardware supporting vector instructions
while the kernel does not support them
2023-08-05 10:21:43 +08:00
gxw
d46772e037 LoongArch64: Add compiler feature checks 2023-08-05 10:21:43 +08:00
Martin Kroeker
8a171350db Merge pull request #4178 from martin-frbg/llvm17
Add (gmake) support for LLVM17's new flang
2023-08-04 20:56:00 +02:00
Martin Kroeker
ef23240ab8 Merge pull request #4177 from martin-frbg/issue4176
Fix ZAXPY calls with INCX=0 on pre-AVX x86_64 and add utest
2023-08-04 20:55:22 +02:00
Martin Kroeker
e8bc8a0ee7 Add support for the new generation flang that comes with LLVM17 2023-08-04 15:32:19 +02:00
Martin Kroeker
f2c9ae9c33 Identify the new generation of flang that comes with LLVM17 2023-08-04 15:31:03 +02:00
Martin Kroeker
862d06ab8a Add INCX=0,INCY=1 test case for CAXPY 2023-08-04 15:28:02 +02:00
Martin Kroeker
d64fa286f7 add test case for zaxpy with incx=0 incy=1 2023-08-04 12:26:36 +02:00
Martin Kroeker
4664b57e6e use shortcut only when both incx and incy are zero 2023-08-04 12:25:34 +02:00
Martin Kroeker
c2f4bdbbb4 Merge pull request #4163 from martin-frbg/issue4017
Rework OpenMP thread count limit handling
2023-07-31 17:58:51 +02:00
Martin Kroeker
09131f79a6 Merge pull request #4164 from martin-frbg/issue4162
Enable use of AVX512 microkernels with NVIDIA HPC from version 22.3
2023-07-29 15:07:20 +02:00
Martin Kroeker
6a428b5629 Update casum_microk_skylakex-2.c 2023-07-29 12:24:30 +02:00
Martin Kroeker
ebb447e32e Update zasum_microk_skylakex-2.c 2023-07-29 12:23:57 +02:00
Martin Kroeker
9f6847583a nvc currently miscompiles this, hopefully fixed in release 23.09 2023-07-29 11:50:16 +02:00
Martin Kroeker
fe54ee3d15 nvc currently miscompiles this, hopefully fixed in release 23.09 2023-07-29 11:48:38 +02:00
Aiden Grossman
b209915121 Fix build with clang
There are two instances when building the tests where OpenBLAS fails to
build with OpenMP and clang due to library paths getting reset as flags
are set rather than appended. This seems to only affect certain
clang/libomp installations, but if it's already grabbing the correct
library paths we might as well use them.
2023-07-28 12:59:44 -07:00
Martin Kroeker
5720fa02c5 Merge pull request #4168 from Mousius/sve-zgemm-cgemm
Use SVE zgemm/cgemm on Arm(R) Neoverse(TM) V1 core
2023-07-27 17:41:45 +02:00
Martin Kroeker
b3a5144a74 Merge pull request #4167 from Mousius/sve-zhemm-fix
Fix ZHEMM copy for SVE
2023-07-27 16:20:55 +02:00
Chris Sidebottom
84a268b6ca Use SVE zgemm/cgemm on Arm(R) Neoverse(TM) V1 core
This patch removes the prefetches from cgemm/zgemm which improves the performance similar to sgemm/dgemm did in #3868, this means I'm happy to enable this on any applicable cores.

I also replicated the unrolling the copies from sgemm and dgemm.
2023-07-27 14:12:20 +01:00