Martin Kroeker
9f815cf1bf
Update version to 0.3.24
2023-09-03 22:58:32 +02:00
Martin Kroeker
3c49711f1e
Update version to 0.3.24
2023-09-03 22:57:22 +02:00
Martin Kroeker
2c68822cde
Merge pull request #4210 from xianyi/develop
...
merge develop into 0.3.0 for 0.3.24
2023-09-03 22:55:22 +02:00
Martin Kroeker
3c51bd0fbf
Merge pull request #4209 from martin-frbg/changelog0324
...
Update Changelog for 0.3.24
2023-09-03 22:51:03 +02:00
Martin Kroeker
5d73041068
Update Changelog for 0.3.24
2023-09-03 19:05:53 +02:00
Martin Kroeker
8e6d93359d
Merge pull request #4196 from TiborGY/obsolete_inlines
...
Modernize obsolete inline order
2023-09-03 14:12:42 +02:00
Martin Kroeker
33797c44fc
Merge pull request #4143 from martin-frbg/issue4130
...
Update to use safe scaling algorithm from Reference-LAPACK PR 527
2023-09-01 14:20:25 +02:00
Martin Kroeker
ee310e3533
Merge pull request #4208 from XiWeiGu/loongarch64_toolchain
...
LoongArch64: Compatible with early internal toolchain
2023-09-01 10:50:01 +02:00
Martin Kroeker
42909ce57d
Merge branch 'xianyi:develop' into issue4130
2023-09-01 09:05:58 +02:00
Martin Kroeker
a2a184572c
update zrotg
2023-08-31 23:42:12 +02:00
gxw
394a1fd1bf
LoongArch64: Compatible with early internal toolchain
...
__loongarch_grlen and __loongarch_frlen were introduced in gcc version 8.3.0
(Loongnix 8.3.0-6.lnd.vec.31) internally within Loongson to standardize the
general and floating-point register widths. However, previous versions did
not have them, requiring additional checks to be added.
2023-08-31 16:55:29 +08:00
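Note on the commit above: a minimal sketch of the kind of additional preprocessor check it describes. The names `SKETCH_GRLEN` and `sketch_general_register_bits` are hypothetical, and falling back to `__loongarch64` when `__loongarch_grlen` is missing is an assumption, not necessarily what the actual patch does.

```c
/* Hypothetical helper macro, not taken from the OpenBLAS sources. */
#if defined(__loongarch_grlen)
#  define SKETCH_GRLEN __loongarch_grlen  /* toolchain reports the register width */
#elif defined(__loongarch64)
#  define SKETCH_GRLEN 64                 /* assumed fallback for early toolchains */
#else
#  define SKETCH_GRLEN 0                  /* not LoongArch, or width unknown */
#endif

/* Returns the general register width in bits, or 0 if it cannot be determined. */
static int sketch_general_register_bits(void) {
    return SKETCH_GRLEN;
}
```

On a non-LoongArch host the function simply returns 0, so the check degrades gracefully instead of failing on an undefined macro.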
Martin Kroeker
12d8f219d6
Merge pull request #4207 from martin-frbg/issue4174-2
...
Clarify the comment on the out-of-bounds check in ?GETF2
2023-08-26 12:05:37 +02:00
Martin Kroeker
9c4ae4d4fb
Merge pull request #4206 from martin-frbg/issue4201-2
...
Work around miscompilation of zdot_thunderx2t99 by the current NVIDIA HPC compiler
2023-08-26 10:17:27 +02:00
Martin Kroeker
3bb70b8ca4
Merge pull request #4205 from martin-frbg/fixintmain
...
Fix missing type declaration for main() in converted LAPACK files
2023-08-26 08:38:38 +02:00
Martin Kroeker
3b6050ac04
clarify the comment on the out-of-bounds check from #723
2023-08-26 02:00:00 +02:00
Martin Kroeker
22a402bc2c
clarify the comment on the out-of-bounds check from #723
2023-08-26 01:58:08 +02:00
Martin Kroeker
88435104c8
Merge pull request #4204 from martin-frbg/llvm17-2
...
Work around LLVM17 miscompiling the AVX512 microkernels for CASUM/ZASUM
2023-08-26 00:32:18 +02:00
Martin Kroeker
fc8894dd98
Work around miscompilation by NVIDIA nvc
2023-08-26 00:30:17 +02:00
Martin Kroeker
be57c595aa
Merge pull request #4203 from martin-frbg/issue4201
...
Add support for building arm64 SVE kernels with the NVIDIA HPC compiler
2023-08-25 22:55:38 +02:00
Martin Kroeker
7a6203ffa1
restore default Neoverse SVE build instructions for non-NVIDIA compilers
2023-08-25 18:25:51 +02:00
Martin Kroeker
7f7d3896dd
Fix missing type declaration for main
2023-08-25 18:07:47 +02:00
Martin Kroeker
2c3034ff7f
Disable the C/ZASUM AVX512 microkernels when compiling with LLVM17 as well
2023-08-25 17:22:51 +02:00
Martin Kroeker
49689fbef7
Add support for compiling SVE kernels with the NVIDIA HPC compiler
2023-08-25 17:11:04 +02:00
Martin Kroeker
8794544b43
Add support for compiling the Neoverse SVE kernels with the NVIDIA HPC compiler
2023-08-25 16:47:32 +02:00
Martin Kroeker
e9f1b2d26f
Expand the SVE compatibility check for the NVIDIA HPC compiler
2023-08-25 16:45:56 +02:00
Martin Kroeker
d69f57c8c2
Merge pull request #4200 from XiWeiGu/loongarch64_sgemm
...
LoongArch64: Add sgemm_kernel
2023-08-23 13:05:34 +02:00
gxw
553cc1372f
LoongArch64: Add sgemm_kernel
2023-08-23 16:08:43 +08:00
Martin Kroeker
12ede72ab7
Merge pull request #4192 from imciner2/im/clangfix
...
Fix cooperlake and sapphire rapids march flags on clang
2023-08-21 15:46:35 +02:00
Martin Kroeker
8d9f701fbf
Merge pull request #4195 from TiborGY/BF16_ignore
...
Add junk from BF16 test to .gitignore
2023-08-19 12:16:44 +02:00
Martin Kroeker
7f67ba9147
Merge pull request #4198 from martin-frbg/issue4197
...
Correct INFO returned for too small lda in non-CBLAS s/dgeadd
2023-08-19 07:51:51 +02:00
Martin Kroeker
214be14c1d
Correct INFO returned for lda in non-CBLAS s/dgeadd
2023-08-18 22:48:30 +02:00
Martin Kroeker
1b09f4b2bb
Merge pull request #4193 from imciner2/im/ppcgnu
...
Fix power10 gcc intrinsic check
2023-08-17 22:56:08 +02:00
Ian McInerney
79c15db348
Fix power10 gcc intrinsic check
...
__builtin_vsx_assemble_pair was only available in GCC 10 through 11.2 and
was replaced by __builtin_vsx_build_pair thereafter.
2023-08-17 15:05:29 +01:00
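Note on the commit above: the version rule it states can be sketched as a small selection function. The enum and function names are hypothetical, and treating GCC releases before 10 as having neither builtin is an assumption. The real check lives in the build system and may be written differently.

```c
/* Hypothetical result type for illustration only. */
enum sketch_pair_builtin {
    SKETCH_PAIR_NONE,      /* assumed: no pair builtin before GCC 10 */
    SKETCH_PAIR_ASSEMBLE,  /* __builtin_vsx_assemble_pair */
    SKETCH_PAIR_BUILD      /* __builtin_vsx_build_pair */
};

/* Picks the pair builtin for a given GCC version, following the commit text:
 * assemble_pair for GCC 10 through 11.2, build_pair for later releases. */
static enum sketch_pair_builtin sketch_vsx_pair_builtin(int gcc_major, int gcc_minor) {
    if (gcc_major < 10)
        return SKETCH_PAIR_NONE;
    if (gcc_major == 10 || (gcc_major == 11 && gcc_minor <= 2))
        return SKETCH_PAIR_ASSEMBLE;
    return SKETCH_PAIR_BUILD;
}
```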
TGY
b5ba95a6c0
Modernize obsolete inline order
2023-08-16 00:48:40 +02:00
TiborGY
0d30daa772
Add junk from BF16 test to .gitignore
2023-08-16 00:07:17 +02:00
Ian McInerney
8a8a8479be
Fix cooperlake and sapphire rapids march flags on clang
...
The march=cooperlake and march=sapphirerapids flags were never
added when building with Clang targeting those architectures. Instead,
the build fell back to the skylake AVX512 implementation.
Clang added support for these two architectures in Clang 9 and Clang 12,
respectively, so introduce new checks for those versions to enable the
appropriate march flag, and fall back to skylake otherwise.
2023-08-14 16:12:35 +01:00
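Note on the commit above: the selection logic it describes can be sketched as follows. The function name and the upper-case target strings are hypothetical, and `-march=skylake-avx512` as the fallback flag is an assumption based on the commit's mention of skylake. The actual checks are in the build files, not in C.

```c
#include <string.h>

/* Returns the -march flag for a target when building with Clang of the given
 * major version: the dedicated flag if that Clang supports it, otherwise the
 * skylake AVX512 fallback. */
static const char *sketch_march_flag(const char *target, int clang_major) {
    if (strcmp(target, "COOPERLAKE") == 0 && clang_major >= 9)
        return "-march=cooperlake";
    if (strcmp(target, "SAPPHIRERAPIDS") == 0 && clang_major >= 12)
        return "-march=sapphirerapids";
    return "-march=skylake-avx512";  /* assumed fallback flag */
}
```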
Martin Kroeker
562ef5fdca
Merge pull request #4169 from felixonmars/patch-1
...
Use defined variable for riscv64 in arch.cmake
2023-08-12 17:20:56 +02:00
Martin Kroeker
0e5d56ae4a
Merge pull request #4170 from felixonmars/patch-2
...
Fix 64-bit fortran options for riscv64
2023-08-12 09:21:05 +02:00
Martin Kroeker
ebc157fcc9
Merge pull request #4190 from martin-frbg/issue4186-2
...
Allow negative INCX in the ?NRM2 kernels
2023-08-10 23:12:59 +02:00
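Note on the change above: a minimal sketch of how a strided kernel might walk a vector when INCX is negative, assuming the usual BLAS convention that traversal then starts at the far end of the array. The function name is hypothetical. It returns the plain sum of squares (whose square root is the 2-norm) and deliberately omits the scaling that real ?NRM2 kernels use to avoid overflow and underflow.

```c
#include <stddef.h>

/* Sum of squares of n elements of x taken with stride incx.
 * For incx < 0 the walk starts at offset (n-1)*|incx| and moves toward x[0]. */
static double sketch_sum_of_squares(int n, const double *x, int incx) {
    if (n <= 0)
        return 0.0;
    ptrdiff_t ix = (incx < 0) ? (ptrdiff_t)(n - 1) * -(ptrdiff_t)incx : 0;
    double sum = 0.0;
    for (int i = 0; i < n; i++) {
        sum += x[ix] * x[ix];
        ix += incx;
    }
    return sum;
}
```

Because the norm does not depend on the order of traversal, a negative stride visits the same elements as its absolute value and yields the same result.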
Martin Kroeker
34da1a067d
Allow negative INCX (API change from version 3.10 of the reference implementation)
2023-08-10 17:01:50 +02:00
Martin Kroeker
07e32c4cb8
Allow negative INCX (API change from version 3.10 of the reference implementation)
2023-08-10 17:00:18 +02:00
Martin Kroeker
c211da0688
Allow negative INCX (API change from version 3.10 of the reference implementation)
2023-08-10 16:58:57 +02:00
Martin Kroeker
a34a0a7abc
Allow negative INCX (API change from version 3.10 of the reference implementation)
2023-08-10 16:56:52 +02:00
Martin Kroeker
54d3246fc6
Allow negative INCX (API change from version 3.10 of the reference implementation)
2023-08-10 16:55:17 +02:00
Martin Kroeker
7dd441d5db
Allow negative INCX (API change from version 3.10 of the reference implementation)
2023-08-10 16:53:33 +02:00
Martin Kroeker
f692178792
Allow negative INCX (API change from version 3.10 of the reference implementation)
2023-08-10 16:52:09 +02:00
Martin Kroeker
d15ffb7fdf
Allow negative INCX (API change from version 3.10 of the reference implementation)
2023-08-10 16:50:44 +02:00
Martin Kroeker
a2d867f4d1
Allow negative INCX (API change from version 3.10 of the reference implementation)
2023-08-10 16:49:05 +02:00
Martin Kroeker
9a0e9c8b69
Merge pull request #4171 from boomanaiden154/clang-libomp-fixes
...
Fix build with some clang installations when openmp is enabled
2023-08-10 16:32:33 +02:00
Martin Kroeker
7af0f41762
Merge pull request #4189 from martin-frbg/issue4186
...
Prepare the interface for INCX < 0 in the new NRM2 implementation from BLAS 3.10
2023-08-10 14:11:12 +02:00