Commit Graph

7404 Commits

Author SHA1 Message Date
Martin Kroeker
bb2f1ec3b0 Merge pull request #4222 from dev-zero/bugfix/correct-thread-warning
memory: show correct number of max threads
2023-09-17 00:02:46 +02:00
Martin Kroeker
466e6115d3 Merge pull request #4230 from martin-frbg/lapack907
Increase work array size in S/DTGEX2 to avoid overflow (Reference-LAPACK PR 907)
2023-09-16 20:13:13 +02:00
Martin Kroeker
1285b53e39 Make IWORK array larger to avoid overflow 2023-09-14 20:22:11 +02:00
Martin Kroeker
7779bb6fb1 Make IWORK array larger to avoid overflow 2023-09-14 20:21:06 +02:00
Martin Kroeker
0606102460 Merge pull request #4229 from martin-frbg/issue4228
Add la_constants.o to SCLAUX/DZLAUX in LAPACK Makefile
2023-09-14 16:15:54 +02:00
Martin Kroeker
fb97cc4d5e Add la_constants.o to SCLAUX/DZLAUX 2023-09-14 10:46:23 +02:00
Tiziano Müller
6a611db560 memory: show correct number of max threads 2023-09-10 08:44:07 +02:00
Martin Kroeker
6bc079687f Merge pull request #4218 from XiWeiGu/loongarch64_sgemv
LoongArch64: Add sgemv kernel
2023-09-08 13:35:35 +02:00
Martin Kroeker
cd36b8fff7 Merge pull request #4214 from martin-frbg/issue4212
Disable SVE targets in DYNAMIC_ARCH when compiler is gcc on macOS
2023-09-05 20:43:44 +02:00
Martin Kroeker
09911f077e Disable SVE targets for DYNAMIC_ARCH when compiling with (homebrew)gcc on macOS/arm64 2023-09-05 16:33:40 +02:00
Martin Kroeker
c3f2a3c0ca Update version to 0.3.24.dev 2023-09-04 08:40:25 +02:00
Martin Kroeker
4867cf5dd7 Update version to 0.3.24.dev 2023-09-04 08:39:40 +02:00
gxw
f2cf929374 LoongArch64: Add sgemv kernel 2023-09-04 14:28:37 +08:00
Martin Kroeker
f29a0d1a7d Merge pull request #4211 from xianyi/release-0.3.0
merge release-0.3.24 back into develop to copy tag
2023-09-03 23:25:58 +02:00
Martin Kroeker
9f815cf1bf Update version to 0.3.24 (tag: v0.3.24) 2023-09-03 22:58:32 +02:00
Martin Kroeker
3c49711f1e Update version to 0.3.24 2023-09-03 22:57:22 +02:00
Martin Kroeker
2c68822cde Merge pull request #4210 from xianyi/develop
merge develop into 0.3.0 for 0.3.24
2023-09-03 22:55:22 +02:00
Martin Kroeker
3c51bd0fbf Merge pull request #4209 from martin-frbg/changelog0324
Update Changelog for 0.3.24
2023-09-03 22:51:03 +02:00
Martin Kroeker
5d73041068 Update Changelog for 0.3.24 2023-09-03 19:05:53 +02:00
Martin Kroeker
8e6d93359d Merge pull request #4196 from TiborGY/obsolete_inlines
Modernize obsolete inline order
2023-09-03 14:12:42 +02:00
Martin Kroeker
33797c44fc Merge pull request #4143 from martin-frbg/issue4130
Update to use safe scaling algorithm from Reference-LAPACK PR 527
2023-09-01 14:20:25 +02:00
Martin Kroeker
ee310e3533 Merge pull request #4208 from XiWeiGu/loongarch64_toolchain
LoongArch64: Compatible with early internal toolchain
2023-09-01 10:50:01 +02:00
Martin Kroeker
42909ce57d Merge branch 'xianyi:develop' into issue4130 2023-09-01 09:05:58 +02:00
Martin Kroeker
a2a184572c update zrotg 2023-08-31 23:42:12 +02:00
gxw
394a1fd1bf LoongArch64: Compatible with early internal toolchain
__loongarch_grlen and __loongarch_frlen were introduced in gcc version 8.3.0
(Loongnix 8.3.0-6.lnd.vec.31) internally within Loongson to standardize the
general and floating-point register widths. However, previous versions did
not have them, requiring additional checks to be added.
2023-08-31 16:55:29 +08:00
Martin Kroeker
12d8f219d6 Merge pull request #4207 from martin-frbg/issue4174-2
Clarify the comment on the out-of-bounds check in ?GETF2
2023-08-26 12:05:37 +02:00
Martin Kroeker
9c4ae4d4fb Merge pull request #4206 from martin-frbg/issue4201-2
Work around miscompilation of zdot_thunderx2t99 by the current NVIDIA HPC compiler
2023-08-26 10:17:27 +02:00
Martin Kroeker
3bb70b8ca4 Merge pull request #4205 from martin-frbg/fixintmain
Fix missing type declaration for main() in converted LAPACK files
2023-08-26 08:38:38 +02:00
Martin Kroeker
3b6050ac04 clarify the comment on the out-of-bounds check from #723 2023-08-26 02:00:00 +02:00
Martin Kroeker
22a402bc2c clarify the comment on the out-of-bounds check from #723 2023-08-26 01:58:08 +02:00
Martin Kroeker
88435104c8 Merge pull request #4204 from martin-frbg/llvm17-2
Work around LLVM17 miscompiling the AVX512 microkernels for CASUM/ZASUM
2023-08-26 00:32:18 +02:00
Martin Kroeker
fc8894dd98 Workaround miscompilation by NVIDIA nvc 2023-08-26 00:30:17 +02:00
Martin Kroeker
be57c595aa Merge pull request #4203 from martin-frbg/issue4201
Add support for building arm64 SVE kernels with the NVIDIA HPC compiler
2023-08-25 22:55:38 +02:00
Martin Kroeker
7a6203ffa1 restore default Neoverse SVE build instructions for non-NVIDIA compilers 2023-08-25 18:25:51 +02:00
Martin Kroeker
7f7d3896dd Fix missing type declaration for main 2023-08-25 18:07:47 +02:00
Martin Kroeker
2c3034ff7f Disable the C/ZASUM AVX512 microkernels when compiling with LLVM17 as well 2023-08-25 17:22:51 +02:00
Martin Kroeker
49689fbef7 Add support for compiling SVE kernels with the NVIDIA HPC compiler 2023-08-25 17:11:04 +02:00
Martin Kroeker
8794544b43 Add support for compiling the Neoverse SVE kernels with the NVIDIA HPC compiler 2023-08-25 16:47:32 +02:00
Martin Kroeker
e9f1b2d26f Expand the SVE compatibility check for the NVIDIA HPC compiler 2023-08-25 16:45:56 +02:00
Martin Kroeker
d69f57c8c2 Merge pull request #4200 from XiWeiGu/loongarch64_sgemm
LoongArch64: Add sgemm_kernel
2023-08-23 13:05:34 +02:00
gxw
553cc1372f LoongArch64: Add sgemm_kernel 2023-08-23 16:08:43 +08:00
Martin Kroeker
12ede72ab7 Merge pull request #4192 from imciner2/im/clangfix
Fix cooperlake and sapphire rapids march flags on clang
2023-08-21 15:46:35 +02:00
Martin Kroeker
8d9f701fbf Merge pull request #4195 from TiborGY/BF16_ignore
Add junk from BF16 test to .gitignore
2023-08-19 12:16:44 +02:00
Martin Kroeker
7f67ba9147 Merge pull request #4198 from martin-frbg/issue4197
Correct INFO returned for too small lda in non-CBLAS s/dgeadd
2023-08-19 07:51:51 +02:00
Martin Kroeker
214be14c1d Correct INFO returned for lda in non-CBLAS s/dgeadd 2023-08-18 22:48:30 +02:00
Martin Kroeker
1b09f4b2bb Merge pull request #4193 from imciner2/im/ppcgnu
Fix power10 gcc intrinsic check
2023-08-17 22:56:08 +02:00
Ian McInerney
79c15db348 Fix power10 gcc intrinsic check
__builtin_vsx_assemble_pair was only in GCC 10-11.2 and was replaced by
__builtin_vsx_build_pair thereafter.
2023-08-17 15:05:29 +01:00
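
The version rule described in the commit message above can be sketched as a small lookup. This is only an illustration, not the check used in the repository: the function name `vsx_pair_builtin` is hypothetical, "GCC 10-11.2" is read here as "10.0 through 11.2 inclusive", and returning `None` for GCC older than 10 is an assumption, since the message says nothing about those versions.

```python
from typing import Optional


def vsx_pair_builtin(gcc_major: int, gcc_minor: int) -> Optional[str]:
    """Return the pair-building builtin name for a given GCC version.

    Hypothetical helper: it only mirrors the rule stated in the commit
    message and is not taken from the OpenBLAS sources.
    """
    version = (gcc_major, gcc_minor)
    if version < (10, 0):
        # Assumption: neither builtin is available before GCC 10.
        return None
    if version <= (11, 2):
        # Per the commit message, the old name existed only in GCC 10-11.2.
        return "__builtin_vsx_assemble_pair"
    # Later releases use the replacement name.
    return "__builtin_vsx_build_pair"
```

Comparing `(major, minor)` tuples keeps the boundary at 11.2 explicit, which is the detail the fix turns on.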
TGY
b5ba95a6c0 Modernize obsolete inline order 2023-08-16 00:48:40 +02:00
TiborGY
0d30daa772 Add junk from BF16 test to .gitignore 2023-08-16 00:07:17 +02:00
Ian McInerney
8a8a8479be Fix cooperlake and sapphire rapids march flags on clang
The march=cooperlake and march=sapphirerapids flags were never getting
added when building with Clang targeting those architectures. Instead,
the build was falling back to the skylake AVX512 implementation.

Clang added support for these two architectures in Clang 9 and Clang 12,
so introduce new checks for those versions to enable the appropriate
march flag, and fall back to skylake otherwise.
2023-08-14 16:12:35 +01:00
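
The selection logic described in the last commit message can be sketched as follows. This is a minimal illustration under stated assumptions, not the build-system code from the repository: the function name `clang_march_flag` is hypothetical, the target names `COOPERLAKE` and `SAPPHIRERAPIDS` are assumed spellings, `-march=skylake-avx512` is assumed to be the fallback flag the message refers to as "skylake", and the message is read as meaning Clang 9 for Cooper Lake and Clang 12 for Sapphire Rapids.

```python
def clang_march_flag(target: str, clang_major: int) -> str:
    """Pick a -march flag for a target given the Clang major version.

    Hypothetical helper that mirrors the rule in the commit message;
    it is not taken from the OpenBLAS build files.
    """
    # Minimum Clang major version that accepts each dedicated flag
    # (assumed mapping: 9 for Cooper Lake, 12 for Sapphire Rapids).
    dedicated = {
        "COOPERLAKE": (9, "-march=cooperlake"),
        "SAPPHIRERAPIDS": (12, "-march=sapphirerapids"),
    }
    fallback = "-march=skylake-avx512"  # assumed fallback flag

    if target not in dedicated:
        raise ValueError(f"unsupported target: {target}")

    min_major, flag = dedicated[target]
    # Use the dedicated flag only when the compiler is new enough.
    return flag if clang_major >= min_major else fallback
```

For example, under these assumptions `clang_march_flag("SAPPHIRERAPIDS", 11)` yields the fallback flag, while the same target with Clang 12 yields the dedicated one.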