Angelika Schwarz
6876ae0c3b
Fix division by zero in zrotg
...
The cases
    [ c  s ] * [    0   ] = [ |db_i| ]
    [-s  c ]   [ i*db_i ]   [   0    ]
and
    [ c  s ] * [   0  ] = [ |db_r| ]
    [-s  c ]   [ db_r ]   [   0    ]
computed s incorrectly. To flip the entries of vector,
s should be conjg(db)/|db| and not conjg(db) / da,
where da == 0.0.
2023-09-20 19:11:59 +02:00
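The special case described in the zrotg commit above can be illustrated with a small sketch. This is not the OpenBLAS code: `zrotg_sketch` and `apply_rotation` are hypothetical names, the safe scaling used by the real routine is omitted, and the second row of the rotation is taken as `[-conjg(s), c]`, which is the convention assumed here. The point is only that when da == 0 the sine must be divided by |db|, since dividing by da would be a division by zero.

```python
import math


def zrotg_sketch(a: complex, b: complex):
    """Return (c, s, r) such that [c, s; -conj(s), c] applied to (a, b) gives (r, 0).

    Hypothetical, unscaled illustration; not the OpenBLAS implementation.
    """
    if b == 0:
        # Nothing to eliminate: identity rotation.
        return 1.0, 0j, a
    if a == 0:
        # The case fixed by the commit: normalise by |b|, not by a (which is zero).
        d = abs(b)
        return 0.0, b.conjugate() / d, d
    norm = math.hypot(abs(a), abs(b))
    phase = a / abs(a)
    c = abs(a) / norm
    s = phase * b.conjugate() / norm
    r = phase * norm
    return c, s, r


def apply_rotation(c: float, s: complex, a: complex, b: complex):
    """Apply the rotation to the vector (a, b) and return both components."""
    return c * a + s * b, -s.conjugate() * a + c * b


# Purely imaginary db with da == 0 (first case in the commit message).
c, s, r = zrotg_sketch(0j, 2j)
top, bottom = apply_rotation(c, s, 0j, 2j)
print(c, s, r, top, bottom)
```

With da == 0 the rotation simply swaps the two entries, leaving |db| in the first position and zero in the second.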
Martin Kroeker
7e939fb831
Fix handling of additional buffer structures in case of overflow
2023-09-19 23:33:39 +02:00
Martin Kroeker
bb2f1ec3b0
Merge pull request #4222 from dev-zero/bugfix/correct-thread-warning
...
memory: show correct number of max threads
2023-09-17 00:02:46 +02:00
Martin Kroeker
466e6115d3
Merge pull request #4230 from martin-frbg/lapack907
...
Increase work array size in S/DTGEX2 to avoid overflow (Reference-LAPACK PR 907)
2023-09-16 20:13:13 +02:00
Martin Kroeker
1285b53e39
Make IWORK array larger to avoid overflow
2023-09-14 20:22:11 +02:00
Martin Kroeker
7779bb6fb1
Make IWORK array larger to avoid overflow
2023-09-14 20:21:06 +02:00
Martin Kroeker
0606102460
Merge pull request #4229 from martin-frbg/issue4228
...
Add la_constants.o to SCLAUX/DZLAUX in LAPACK Makefile
2023-09-14 16:15:54 +02:00
Martin Kroeker
fb97cc4d5e
Add la_constants.o to SCLAUX/DZLAUX
2023-09-14 10:46:23 +02:00
Tiziano Müller
6a611db560
memory: show correct number of max threads
2023-09-10 08:44:07 +02:00
Martin Kroeker
6bc079687f
Merge pull request #4218 from XiWeiGu/loongarch64_sgemv
...
LoongArch64: Add sgemv kernel
2023-09-08 13:35:35 +02:00
Martin Kroeker
cd36b8fff7
Merge pull request #4214 from martin-frbg/issue4212
...
Disable SVE targets in DYNAMIC_ARCH when compiler is gcc on macOS
2023-09-05 20:43:44 +02:00
Martin Kroeker
09911f077e
Disable SVE targets for DYNAMIC_ARCH when compiling with (homebrew)gcc on macOS/arm64
2023-09-05 16:33:40 +02:00
Martin Kroeker
c3f2a3c0ca
Update version to 0.3.24.dev
2023-09-04 08:40:25 +02:00
Martin Kroeker
4867cf5dd7
Update version to 0.3.24.dev
2023-09-04 08:39:40 +02:00
gxw
f2cf929374
LoongArch64: Add sgemv kernel
2023-09-04 14:28:37 +08:00
Martin Kroeker
f29a0d1a7d
Merge pull request #4211 from xianyi/release-0.3.0
...
merge release-0.3.24 back into develop to copy tag
2023-09-03 23:25:58 +02:00
Martin Kroeker
9f815cf1bf
Update version to 0.3.24
2023-09-03 22:58:32 +02:00
Martin Kroeker
3c49711f1e
Update version to 0.3.24
2023-09-03 22:57:22 +02:00
Martin Kroeker
2c68822cde
Merge pull request #4210 from xianyi/develop
...
merge develop into 0.3.0 for 0.3.24
2023-09-03 22:55:22 +02:00
Martin Kroeker
3c51bd0fbf
Merge pull request #4209 from martin-frbg/changelog0324
...
Update Changelog for 0.3.24
2023-09-03 22:51:03 +02:00
Martin Kroeker
5d73041068
Update Changelog for 0.3.24
2023-09-03 19:05:53 +02:00
Martin Kroeker
8e6d93359d
Merge pull request #4196 from TiborGY/obsolete_inlines
...
Modernize obsolete inline order
2023-09-03 14:12:42 +02:00
Martin Kroeker
33797c44fc
Merge pull request #4143 from martin-frbg/issue4130
...
Update to use safe scaling algorithm from Reference-LAPACK PR 527
2023-09-01 14:20:25 +02:00
Martin Kroeker
ee310e3533
Merge pull request #4208 from XiWeiGu/loongarch64_toolchain
...
LoongArch64: Compatible with early internal toolchain
2023-09-01 10:50:01 +02:00
Martin Kroeker
42909ce57d
Merge branch 'xianyi:develop' into issue4130
2023-09-01 09:05:58 +02:00
Martin Kroeker
a2a184572c
update zrotg
2023-08-31 23:42:12 +02:00
gxw
394a1fd1bf
LoongArch64: Compatible with early internal toolchain
...
__loongarch_grlen and __loongarch_frlen were introduced in gcc version 8.3.0
(Loongnix 8.3.0-6.lnd.vec.31) internally within Loongson to standardize the
general and floating-point register widths. However, previous versions did
not have them, requiring additional checks to be added.
2023-08-31 16:55:29 +08:00
Martin Kroeker
12d8f219d6
Merge pull request #4207 from martin-frbg/issue4174-2
...
Clarify the comment on the out-of-bounds check in ?GETF2
2023-08-26 12:05:37 +02:00
Martin Kroeker
9c4ae4d4fb
Merge pull request #4206 from martin-frbg/issue4201-2
...
Work around miscompilation of zdot_thunderx2t99 by the current NVIDIA HPC compiler
2023-08-26 10:17:27 +02:00
Martin Kroeker
3bb70b8ca4
Merge pull request #4205 from martin-frbg/fixintmain
...
Fix missing type declaration for main() in converted LAPACK files
2023-08-26 08:38:38 +02:00
Martin Kroeker
3b6050ac04
clarify the comment on the out-of-bounds check from #723
2023-08-26 02:00:00 +02:00
Martin Kroeker
22a402bc2c
clarify the comment on the out-of-bounds check from #723
2023-08-26 01:58:08 +02:00
Martin Kroeker
88435104c8
Merge pull request #4204 from martin-frbg/llvm17-2
...
Work around LLVM17 miscompiling the AVX512 microkernels for CASUM/ZASUM
2023-08-26 00:32:18 +02:00
Martin Kroeker
fc8894dd98
Workaround miscompilation by NVIDIA nvc
2023-08-26 00:30:17 +02:00
Martin Kroeker
be57c595aa
Merge pull request #4203 from martin-frbg/issue4201
...
Add support for building arm64 SVE kernels with the NVIDIA HPC compiler
2023-08-25 22:55:38 +02:00
Martin Kroeker
7a6203ffa1
restore default Neoverse SVE build instructions for non-NVIDIA compilers
2023-08-25 18:25:51 +02:00
Martin Kroeker
7f7d3896dd
Fix missing type declaration for main
2023-08-25 18:07:47 +02:00
Martin Kroeker
2c3034ff7f
Disable the C/ZASUM AVX512 microkernels when compiling with LLVM17 as well
2023-08-25 17:22:51 +02:00
Martin Kroeker
49689fbef7
Add support for compiling SVE kernels with the NVIDIA HPC compiler
2023-08-25 17:11:04 +02:00
Martin Kroeker
8794544b43
Add support for compiling the Neoverse SVE kernels with the NVIDIA HPC compiler
2023-08-25 16:47:32 +02:00
Martin Kroeker
e9f1b2d26f
Expand the SVE compatibility check for the NVIDIA HPC compiler
2023-08-25 16:45:56 +02:00
Martin Kroeker
d69f57c8c2
Merge pull request #4200 from XiWeiGu/loongarch64_sgemm
...
LoongArch64: Add sgemm_kernel
2023-08-23 13:05:34 +02:00
gxw
553cc1372f
LoongArch64: Add sgemm_kernel
2023-08-23 16:08:43 +08:00
Martin Kroeker
12ede72ab7
Merge pull request #4192 from imciner2/im/clangfix
...
Fix cooperlake and sapphire rapids march flags on clang
2023-08-21 15:46:35 +02:00
Martin Kroeker
8d9f701fbf
Merge pull request #4195 from TiborGY/BF16_ignore
...
Add junk from BF16 test to .gitignore
2023-08-19 12:16:44 +02:00
Martin Kroeker
7f67ba9147
Merge pull request #4198 from martin-frbg/issue4197
...
Correct INFO returned for too small lda in non-CBLAS s/dgeadd
2023-08-19 07:51:51 +02:00
Martin Kroeker
214be14c1d
Correct INFO returned for lda in non-CBLAS s/dgeadd
2023-08-18 22:48:30 +02:00
Martin Kroeker
1b09f4b2bb
Merge pull request #4193 from imciner2/im/ppcgnu
...
Fix power10 gcc intrinsic check
2023-08-17 22:56:08 +02:00
Ian McInerney
79c15db348
Fix power10 gcc intrinsic check
...
__builtin_vsx_assemble_pair was only in GCC 10-11.2 and was replaced by
__builtin_vsx_build_pair thereafter.
2023-08-17 15:05:29 +01:00
TGY
b5ba95a6c0
Modernize obsolete inline order
2023-08-16 00:48:40 +02:00