TiborGY
0d30daa772
Add junk from BF16 test to .gitignore
2023-08-16 00:07:17 +02:00
Ian McInerney
8a8a8479be
Fix cooperlake and sapphire rapids march flags on clang
...
The march=cooperlake and march=sapphirerapids flags were never
added when building with Clang targeting those architectures. Instead,
the build fell back to the skylake AVX512 implementation.
Clang added support for these two architectures in Clang 9 and Clang 12
respectively, so introduce new checks for those versions to enable the
appropriate march flag, and fall back to skylake otherwise.
2023-08-14 16:12:35 +01:00
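The version gating described in the commit above can be sketched in C as follows. This is only an illustration: the name pick_march_flag and its parameters are hypothetical, the real checks live in the build system rather than in C code, and the spelling of the skylake fallback flag (-march=skylake-avx512) is an assumption.

```c
#include <string.h>

/* Hypothetical helper: choose a -march flag for a Clang build.
 *   target      - requested architecture name, e.g. "cooperlake"
 *   clang_major - major version of the Clang compiler in use
 * The version thresholds follow the commit message (Clang 9 and Clang 12). */
static const char *pick_march_flag(const char *target, int clang_major)
{
    if (strcmp(target, "cooperlake") == 0 && clang_major >= 9)
        return "-march=cooperlake";
    if (strcmp(target, "sapphirerapids") == 0 && clang_major >= 12)
        return "-march=sapphirerapids";
    /* Older Clang: fall back to the skylake AVX512 target (assumed spelling). */
    return "-march=skylake-avx512";
}
```

With this sketch, a sapphirerapids build on Clang 11 would receive the skylake fallback, while the same build on Clang 12 or later would receive the dedicated flag.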
Martin Kroeker
562ef5fdca
Merge pull request #4169 from felixonmars/patch-1
...
Use defined variable for riscv64 in arch.cmake
2023-08-12 17:20:56 +02:00
Martin Kroeker
0e5d56ae4a
Merge pull request #4170 from felixonmars/patch-2
...
Fix 64-bit fortran options for riscv64
2023-08-12 09:21:05 +02:00
Martin Kroeker
ebc157fcc9
Merge pull request #4190 from martin-frbg/issue4186-2
...
Allow negative INCX in the ?NRM2 kernels
2023-08-10 23:12:59 +02:00
Martin Kroeker
34da1a067d
Allow negative INCX (API change from version 3.10 of the reference implementation)
2023-08-10 17:01:50 +02:00
Martin Kroeker
07e32c4cb8
Allow negative INCX (API change from version 3.10 of the reference implementation)
2023-08-10 17:00:18 +02:00
Martin Kroeker
c211da0688
Allow negative INCX (API change from version 3.10 of the reference implementation)
2023-08-10 16:58:57 +02:00
Martin Kroeker
a34a0a7abc
Allow negative INCX (API change from version 3.10 of the reference implementation)
2023-08-10 16:56:52 +02:00
Martin Kroeker
54d3246fc6
Allow negative INCX (API change from version 3.10 of the reference implementation)
2023-08-10 16:55:17 +02:00
Martin Kroeker
7dd441d5db
Allow negative INCX (API change from version 3.10 of the reference implementation)
2023-08-10 16:53:33 +02:00
Martin Kroeker
f692178792
Allow negative INCX (API change from version 3.10 of the reference implementation)
2023-08-10 16:52:09 +02:00
Martin Kroeker
d15ffb7fdf
Allow negative INCX (API change from version 3.10 of the reference implementation)
2023-08-10 16:50:44 +02:00
Martin Kroeker
a2d867f4d1
Allow negative INCX (API change from version 3.10 of the reference implementation)
2023-08-10 16:49:05 +02:00
Martin Kroeker
9a0e9c8b69
Merge pull request #4171 from boomanaiden154/clang-libomp-fixes
...
Fix build with some clang installations when openmp is enabled
2023-08-10 16:32:33 +02:00
Martin Kroeker
7af0f41762
Merge pull request #4189 from martin-frbg/issue4186
...
Prepare the interface for INCX < 0 in the new NRM2 implementation from BLAS 3.10
2023-08-10 14:11:12 +02:00
Martin Kroeker
4cc804c754
Prepare for INCX < 0 in new NRM2 implementation from BLAS 3.10
2023-08-09 16:13:23 +02:00
Martin Kroeker
afdc56a421
Merge pull request #4158 from XiWeiGu/loongarch64_update_dgemm_kernel
...
LoongArch64: Update dgemm kernel
2023-08-07 12:44:09 +02:00
Martin Kroeker
91e5513f3b
Merge pull request #4184 from XiWeiGu/dgemv
...
LoongArch64: Add dgemv_t_8_lasx.S and dgemv_n_8_lasx.S V2
2023-08-07 08:47:19 +02:00
gxw
e8b571d245
LoongArch64: Add dgemv_t_8_lasx.S and dgemv_n_8_lasx.S V2
2023-08-07 11:20:42 +08:00
gxw
71fcee6eef
LoongArch64: Update dgemm kernel
2023-08-07 11:06:52 +08:00
Martin Kroeker
0f521ece25
Merge pull request #4183 from martin-frbg/issue4181
...
Apply USE_TRMM to MIPS64_GENERIC as to GENERIC in gmake builds
2023-08-06 18:59:50 +02:00
Martin Kroeker
232420bdf5
Merge pull request #4182 from xianyi/revert-4153-dgemv
...
Revert "LoongArch64: Add dgemv_t_8_lasx.S and dgemv_n_8_lasx.S"
2023-08-06 16:00:32 +02:00
Martin Kroeker
41c31bc1d4
Revert "LoongArch64: Add dgemv_t_8_lasx.S and dgemv_n_8_lasx.S"
2023-08-06 16:00:03 +02:00
Martin Kroeker
61d803547a
Apply USE_TRMM to MIPS64_GENERIC as to GENERIC
2023-08-06 15:17:38 +02:00
Martin Kroeker
f8ee309402
Merge pull request #4153 from XiWeiGu/dgemv
...
LoongArch64: Add dgemv_t_8_lasx.S and dgemv_n_8_lasx.S
2023-08-06 08:49:16 +02:00
Martin Kroeker
12e98482e9
Merge pull request #4179 from martin-frbg/jenkinsfix
...
Run "make clean" on Jenkins first to remove stale objects
2023-08-05 22:47:26 +02:00
Martin Kroeker
51c218d17a
Update Jenkinsfile
2023-08-05 18:33:15 +02:00
Martin Kroeker
df978c90cd
Update Jenkinsfile.pwr
2023-08-05 18:32:41 +02:00
Martin Kroeker
ef4a7e3fca
Merge pull request #4127 from XiWeiGu/LoongArch64-CI
...
LoongArch64 CI
2023-08-05 18:19:47 +02:00
Martin Kroeker
b63e4581a3
Merge pull request #4016 from mmuetzel/ci-msys2
...
Add support for LLVM Flang
2023-08-05 15:59:34 +02:00
Markus Mützel
53378296c8
CI: Build with NO_AVX512 for the runners that use Flang 16.
2023-08-05 13:47:38 +02:00
Markus Mützel
1c3fcaaf42
CI (MSYS2): Re-run failed tests verbosely.
2023-08-05 13:16:06 +02:00
Markus Mützel
f334bd9041
CI (MSYS2): Use LLVM Flang on CLANG64 runners. Add CLANG32 runner.
2023-08-05 13:16:06 +02:00
Markus Mützel
57256623f4
fc.cmake: Add support for LLVM Flang.
2023-08-05 13:16:06 +02:00
gxw
ec1e96aac8
LoongArch64: Add dgemv_t_8_lasx.S and dgemv_n_8_lasx.S
2023-08-05 10:24:17 +08:00
gxw
96bf226bca
gh-actions: Add loongarch64 CI
2023-08-05 10:21:43 +08:00
gxw
db9a42f8c3
LoongArch64: using getauxval to do runtime check
...
Using the getauxval function can prevent errors
caused by hardware that supports vector instructions
while the kernel does not support them
2023-08-05 10:21:43 +08:00
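A minimal sketch of a getauxval-based runtime check, assuming Linux with glibc. The names simd_level_from_hwcap and detect_simd_level are hypothetical, and the two bit positions are assumed to mirror the LSX and LASX capability bits in the LoongArch kernel header <asm/hwcap.h>; on other architectures the same bits mean something else. Because AT_HWCAP is filled in by the kernel, a vector unit the kernel does not support is never reported.

```c
#include <sys/auxv.h>   /* getauxval, AT_HWCAP (Linux, glibc) */

/* Assumed bit positions, mirroring the LoongArch Linux <asm/hwcap.h>. */
#define SKETCH_HWCAP_LSX  (1UL << 4)
#define SKETCH_HWCAP_LASX (1UL << 5)

enum simd_level { SIMD_NONE = 0, SIMD_LSX = 1, SIMD_LASX = 2 };

/* Pure helper: map a hardware-capability word to the widest usable SIMD level. */
static enum simd_level simd_level_from_hwcap(unsigned long hwcap)
{
    if (hwcap & SKETCH_HWCAP_LASX)
        return SIMD_LASX;
    if (hwcap & SKETCH_HWCAP_LSX)
        return SIMD_LSX;
    return SIMD_NONE;
}

/* Ask the kernel which capabilities it exposes to user space. */
static enum simd_level detect_simd_level(void)
{
    return simd_level_from_hwcap(getauxval(AT_HWCAP));
}
```

Keeping the bit test separate from the getauxval call makes the selection logic easy to exercise on any machine.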
gxw
d46772e037
LoongArch64: Add compiler feature checks
2023-08-05 10:21:43 +08:00
Martin Kroeker
8a171350db
Merge pull request #4178 from martin-frbg/llvm17
...
Add (gmake) support for LLVM17's new flang
2023-08-04 20:56:00 +02:00
Martin Kroeker
ef23240ab8
Merge pull request #4177 from martin-frbg/issue4176
...
Fix ZAXPY calls with INCX=0 on pre-AVX x86_64 and add utest
2023-08-04 20:55:22 +02:00
Martin Kroeker
e8bc8a0ee7
Add support for the new generation flang that comes with LLVM17
2023-08-04 15:32:19 +02:00
Martin Kroeker
f2c9ae9c33
Identify the new generation of flang that comes with LLVM17
2023-08-04 15:31:03 +02:00
Martin Kroeker
862d06ab8a
Add INCX=0,INCY=1 test case for CAXPY
2023-08-04 15:28:02 +02:00
Martin Kroeker
d64fa286f7
add test case for zaxpy with incx=0 incy=1
2023-08-04 12:26:36 +02:00
Martin Kroeker
4664b57e6e
use shortcut only when both incx and incy are zero
2023-08-04 12:25:34 +02:00
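One plausible reading of the shortcut condition in the commit above, sketched as a real-valued reference AXPY in C. The function axpy_sketch is hypothetical and is not the assembly kernel touched by the commit; it only shows why a single-element shortcut is valid when both strides are zero and wrong when only INCX is zero. Negative strides are left out of this sketch.

```c
/* Hypothetical reference AXPY: y := alpha * x + y over n elements.
 * Negative strides are not handled in this sketch. */
static void axpy_sketch(int n, double alpha, const double *x, int incx,
                        double *y, int incy)
{
    if (n <= 0)
        return;
    if (incx == 0 && incy == 0) {
        /* Shortcut: every iteration reads x[0] and updates y[0]. */
        y[0] += (double)n * alpha * x[0];
        return;
    }
    /* General path, also correct when only one stride is zero:
     * with incx == 0 and incy == 1, each y[i] receives alpha * x[0]. */
    for (int i = 0; i < n; i++)
        y[i * incy] += alpha * x[i * incx];
}
```

Taking the shortcut whenever INCX is zero would update only y[0] and leave the remaining elements of y untouched, which is the kind of case the INCX=0, INCY=1 tests cover.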
Martin Kroeker
c2f4bdbbb4
Merge pull request #4163 from martin-frbg/issue4017
...
Rework OpenMP thread count limit handling
2023-07-31 17:58:51 +02:00
Martin Kroeker
09131f79a6
Merge pull request #4164 from martin-frbg/issue4162
...
Enable use of AVX512 microkernels with NVIDIA HPC from version 22.3
2023-07-29 15:07:20 +02:00
Martin Kroeker
6a428b5629
Update casum_microk_skylakex-2.c
2023-07-29 12:24:30 +02:00
Martin Kroeker
ebb447e32e
Update zasum_microk_skylakex-2.c
2023-07-29 12:23:57 +02:00