Commit Graph

8592 Commits

Author SHA1 Message Date
Chris Sidebottom 8c472ef7e3 Further tweak small GEMM for AArch64 2024-06-24 10:47:47 +01:00
Martin Kroeker 9e24121e7e temporarily(?) disable da=0 shortcut to handle x=Inf or NAN 2024-06-23 17:48:18 +02:00
Martin Kroeker a11f086c17 Update sscal_msa.c 2024-06-23 12:55:19 +02:00
Martin Kroeker 541e1b6959 disable the fast path for inc=1, alpha=0 as it does not handle x=NaN or Inf 2024-06-23 10:37:55 +02:00
Martin Kroeker c08113c279 fix special cases of x= NAN or INF 2024-06-23 01:12:33 +02:00
Martin Kroeker bd47630bcf exclude the alpha=0 branch as it does not handle NaN or Inf in x 2024-06-23 00:54:39 +02:00
Martin Kroeker 68f2501958 temporarily(?) disable the alpha=0 branch to handle Inf/NaN in x 2024-06-22 21:08:57 +02:00
Martin Kroeker 0a744a939a temporarily(?) disable the alpha=0 branch to handle NaN/Inf in x 2024-06-22 21:07:43 +02:00
Martin Kroeker 7f8f037a36 handle INF and NAN in input 2024-06-22 16:03:30 +02:00
Martin Kroeker f1248b849d handle INF and NAN in input 2024-06-22 15:55:29 +02:00
Martin Kroeker a2ee4b1966 Merge branch 'OpenMathLib:develop' into issue4728 2024-06-21 09:35:56 +02:00
Martin Kroeker 1ba1b9c357 Merge pull request #4761 from martin-frbg/m1zdot
Add a clobber list to the non-SVE ARM64 ZDOT kernel
2024-06-20 23:31:25 +02:00
Martin Kroeker 3ec59922b6 Add a clobber list to fix utest errors seen with gcc13 on Apple M 2024-06-20 16:19:32 +02:00
Martin Kroeker 7e9a4ba427 Merge pull request #4741 from shivammonaka/Pthread_Scalability_Improvement
Enhancing Core Utilization in BLAS Calls: A Scalable Architecture
2024-06-20 13:36:23 +02:00
Martin Kroeker 0773695a5c Merge pull request #4760 from martin-frbg/zhaoxin7k
Add cpuid for Zhaoxin KX-7000
2024-06-20 11:08:30 +02:00
Martin Kroeker 9b2a0c79cb Add Zhaoxin KX7000 2024-06-20 09:23:08 +02:00
Martin Kroeker 758279605f Add support for Zhaoxin KX7000 2024-06-20 09:21:06 +02:00
Martin Kroeker 18063b1ccd Merge pull request #4757 from martin-frbg/lapack1024
Fix possible infinite loop on error in the LAPACK testsuite (Reference-LAPACK PR 1024)
2024-06-19 10:07:38 +02:00
Martin Kroeker 215279662e fix possible infinite loop on error (Reference-LAPACK PR 1024) 2024-06-18 11:21:33 +02:00
Martin Kroeker a9817b4212 fix reference in format (Reference-LAPACK PR 1024) 2024-06-18 11:20:22 +02:00
Martin Kroeker bf521a2ced fix possible infinite loop on error (Reference-LAPACK PR 1024) 2024-06-18 11:18:04 +02:00
Martin Kroeker cf2962bdb5 fix possible infinite loop on error (Reference-LAPACK PR 1024) 2024-06-18 11:15:44 +02:00
Martin Kroeker 33bb4b98a4 Improve error message output from the fork() utest (#4753)
* Add perror to report the reason for a fork failure
* reword the malloc failure message
2024-06-15 14:16:48 +02:00
Martin Kroeker f13403b6b6 Merge pull request #4755 from martin-frbg/issue4739
Fix Intel oneAPI compiler support in the CMAKE build
2024-06-15 12:26:18 +02:00
Martin Kroeker 8bc37f9384 Merge pull request #4754 from martin-frbg/issue4750-2
Add a clobber list to the arm64 SVE DOT kernel
2024-06-15 10:29:03 +02:00
Martin Kroeker d25ee4d0f5 Fix detection of Intel ifx and apply -fp-model option to it 2024-06-14 23:58:45 +02:00
Martin Kroeker 21c0f769ef ensure that cpu-specific -march options are always applied to icx 2024-06-14 23:54:27 +02:00
Martin Kroeker 3d8054fb16 add clobber list 2024-06-14 22:07:44 +02:00
Martin Kroeker fdb88e010f Merge pull request #4749 from XiWeiGu/loongarch64-qemu-update
LoongArch64: Update QEMU
2024-06-14 17:19:14 +02:00
Martin Kroeker dd7efcf9ef Avoid exceeding the configured thread count in x86_64 TOBF16 (#4748)
* avoid setting nthreads higher than available
2024-06-14 14:21:13 +02:00
guxiwei ed5db5b122 LoongArch64: Update the address for obtaining the Clang cross-toolchain
Improve the stability and speed of testing
2024-06-13 11:25:01 +08:00
guxiwei 1ca1bb829d LoongArch64: Update QEMU
Compile the community version of QEMU to support LSX/LASX extension instructions
2024-06-13 11:24:32 +08:00
Martin Kroeker 62c33db37d Merge pull request #4746 from martin-frbg/issue4743
Correct CMAKE build definitions for CAXPYC/ZAXPYC
2024-06-09 22:44:50 +02:00
Martin Kroeker 2f12a47405 fix build options for CAXPYC/ZAXPYC 2024-06-09 20:32:10 +02:00
Martin Kroeker 6ffaf99817 disable da=0 shortcut to handle NAN and INF correctly 2024-06-07 14:46:58 +02:00
Martin Kroeker c7cacd9b38 disable the shortcut for da=0 to ensure proper handling of INF and NAN 2024-06-07 13:48:56 +02:00
Martin Kroeker 5ed4f24d6e Handle corner cases with INF and NAN arguments 2024-06-07 09:39:08 +02:00
shivammonaka 9e22d70957 Dynamic locking in Pthread Backend to allow multiple BLAS calls to be executed parallelly 2024-06-07 08:40:17 +05:30
Martin Kroeker 2bd43ad0eb Merge branch 'OpenMathLib:develop' into issue4728 2024-06-07 00:37:25 +02:00
Martin Kroeker 1abafcd9b2 handle corner cases involving NAN and/or INF 2024-06-06 23:59:43 +02:00
Martin Kroeker ffc1ab3f6e Test corner cases of all SCAL variants 2024-06-06 23:58:16 +02:00
Martin Kroeker f955616f98 Merge pull request #4740 from martin-frbg/fixlapackmod
remove LAPACK .mod files during make clean
2024-06-06 23:22:31 +02:00
Martin Kroeker f96ee86711 remove .mod files during make clean 2024-06-06 21:17:36 +02:00
Martin Kroeker 442dec28df Merge pull request #4738 from martin-frbg/issue4737
Disable GEMM3M for generic targets (not implemented)
2024-06-06 17:22:38 +02:00
Martin Kroeker 0cf8b98f61 Merge pull request #4736 from XiWeiGu/loongarch_issue4728
LoongArch: Fixed issue 4728
2024-06-06 15:28:44 +02:00
Martin Kroeker 2787c9f8e4 Disable GEMM3M for generic targets (not implemented) 2024-06-06 14:39:50 +02:00
gxw af73ae6208 LoongArch: Fixed issue 4728 2024-06-06 16:43:09 +08:00
Martin Kroeker 4e9144b39f Update .cirrus.yml (#4735)
* Update versions (and paths) of XCode, and update FreeBSD version
2024-06-05 23:43:52 +02:00
Martin Kroeker 0c2ac76a63 Merge pull request #4734 from XiWeiGu/loongarch64_small_matrix
LoongArch: DGEMM small matrix opt
2024-06-05 19:13:17 +02:00
Martin Kroeker 913be34bf0 Merge pull request #4733 from martin-frbg/issue4719
Drop the -static Fortran flag from generic RISCV builds as it breaks OpenMP
2024-06-05 00:11:09 +02:00