6943 Commits

Author SHA1 Message Date
Martin Kroeker
e8db1fe89b Merge pull request #3943 from martin-frbg/llvm15
Add clang option to avoid running out of registers in AVX512 assembly
2023-03-18 11:24:52 +01:00
Martin Kroeker
de937b3194 Add clang option to avoid running out of registers in AVX512 assembly 2023-03-17 21:22:37 +01:00
Martin Kroeker
f3d21039ce Improve fix from PR3924 (#3941)
* compare denominator against DBL_MIN rather than a somewhat arbitrary small number near it
2023-03-16 15:09:32 +01:00
Martin Kroeker
8d6813ec41 Merge pull request #3938 from martin-frbg/issue3937
Fix CMAKE-based cross-compilation to CortexA53 (wrong DGEMM setting)
2023-03-10 15:33:07 +01:00
Martin Kroeker
19d6795122 Merge pull request #3936 from martin-frbg/issue3933
Observe any stricter (OpenMP) thread count limit imposed by openblas_set_num_threads()
2023-03-10 15:32:50 +01:00
Martin Kroeker
50c263716e Correct DGEMM_UNROLL_M value for A53 in cross-compile 2023-03-10 11:53:24 +01:00
Martin Kroeker
e298d613fa initialize status variable for openblas_set_num_threads 2023-03-08 23:43:15 +01:00
Martin Kroeker
05aa88268f add status variable for openblas_set_num_threads 2023-03-08 23:41:57 +01:00
Martin Kroeker
9f9d0012a3 observe thread limit imposed by openblas_set_num_threads() 2023-03-08 23:38:34 +01:00
Martin Kroeker
fe5d3ca8e0 Merge pull request #3935 from martin-frbg/omp_place_num
Fix OpenMP thread counting returning places rather than cores
2023-03-08 22:41:18 +01:00
Martin Kroeker
e38ab079a0 Fix OpenMP thread counting returning places rather than cores 2023-03-08 19:17:33 +01:00
Martin Kroeker
9feaaa3f39 Merge pull request #3932 from martin-frbg/issue3931
Handle unrecognized ASM compiler (from Arm Compiler 22.1) in CMAKE builds
2023-03-03 12:01:25 +01:00
Martin Kroeker
8272dfc552 Handle unrecognized ASM compiler (from Arm Compiler 22.1) 2023-03-03 00:21:59 +01:00
Martin Kroeker
f616c86404 Merge pull request #3930 from sergei-lewis/dot-kernel-early-bail
dot.c early bail fix
2023-03-02 16:46:25 +01:00
Sergei Lewis
cb0a70e0e2 dot.c early bail fix 2023-03-02 09:51:10 +00:00
Martin Kroeker
5925178d03 Merge pull request #3924 from martin-frbg/numpy22025
Avoid overflow from division in GETF2 potentially causing NaN
2023-02-27 15:59:44 +01:00
Zhang Xianyi
f58080278f Merge pull request #3923 from xctan/fix-cmake-riscv64
Add missing RISC-V architecture in arch.cmake
2023-02-27 09:39:30 +08:00
Martin Kroeker
3d27cbd9a3 avoid overflow in division 2023-02-26 23:44:14 +01:00
Martin Kroeker
a39ced0551 avoid overflow in division 2023-02-26 23:42:20 +01:00
xctan
6a0de3aa39 Add missing RISC-V architecture in arch.cmake
RISC-V support exists in Makefile.system but is missing from arch.cmake. This patch adds riscv64 platform support to the CMake build system, just as 039e27545f/Makefile.system (L830-L832) did.
2023-02-26 20:21:57 +08:00
Martin Kroeker
039e27545f Merge pull request #3915 from martin-frbg/issue3910
Fix DYNAMIC_ARCH builds that select only a subset of precisions
2023-02-24 07:41:33 +01:00
Martin Kroeker
38d6fb4225 Fix dependencies in builds with specified subsets of precision types 2023-02-23 23:12:06 +01:00
Martin Kroeker
75d5e3eaf5 Replace ifdefs and fix conditional definitions for including only selected precisions in DYNAMIC_ARCH 2023-02-23 23:08:33 +01:00
Martin Kroeker
c0f3417725 make SLARMM/DLARMM available to complex-only builds 2023-02-22 00:38:30 +01:00
Martin Kroeker
e412bee313 fix GEMM kernel dependencies in builds that use only a subset of precisions 2023-02-22 00:37:14 +01:00
Martin Kroeker
69256c2b6c fix GEMM kernel dependencies in builds for a subset of precisions 2023-02-22 00:34:01 +01:00
Martin Kroeker
d80adf253e make SSYMV available to BUILD_DOUBLE-only builds 2023-02-22 00:30:20 +01:00
Martin Kroeker
5481c328e8 fix DYNAMIC_ARCH builds that use only a subset of precisions 2023-02-22 00:28:25 +01:00
Martin Kroeker
ee44082827 fix DYNAMIC_ARCH builds that use only a subset of precisions 2023-02-22 00:27:18 +01:00
Martin Kroeker
fa5ff7d199 slarmm/dlarmm are needed by COMPLEX/COMPLEX16-only builds too 2023-02-22 00:25:12 +01:00
Martin Kroeker
cb76be5bd0 Merge pull request #3914 from martin-frbg/lapack798
Fix bug in complex precision tests (c|z)het21 (Reference-LAPACK PR798)
2023-02-19 19:18:18 +01:00
Martin Kroeker
1946eb4f44 Fix bug in complex precision tests (c|z)het21 2023-02-19 10:30:16 +01:00
Martin Kroeker
10be02c896 Merge pull request #3909 from martin-frbg/lapack796
Fix ill-conditioned matrix in LIN testsuite test_rfp (LAPACK PR 796)
2023-02-15 12:56:47 +01:00
Martin Kroeker
85a03675f6 Fix ill-conditioned test matrix for DIAG=U in LIN testsuite test_rfp (LAPACK 678/796) 2023-02-15 08:24:47 +01:00
Martin Kroeker
fa3bc574d1 Merge pull request #3907 from martin-frbg/lapack794
Fix double subtraction of N_DEFLATE from istop in ?LAQZ0 (LAPACK 794)
2023-02-14 19:34:37 +01:00
Martin Kroeker
15c2571c93 Merge pull request #3906 from martin-frbg/lapack782
Fix warnings and delete unneeded tests in LAPACKE ?LARFB (LAPACK PR782)
2023-02-14 19:34:15 +01:00
Martin Kroeker
24ceb0fc40 Fix double subtraction of N_DEFLATE from istop in ?LAQZ0 (LAPACK 794) 2023-02-14 12:43:41 +01:00
Martin Kroeker
f0f40a599c Suppress warnings and delete unnecessary tests (LAPACK PR782) 2023-02-14 12:06:21 +01:00
Martin Kroeker
2158dc64a3 Merge pull request #3904 from martin-frbg/issue3901
Don't add -tp for the nvc compiler if there is one already in CFLAGS
2023-02-09 18:06:50 +01:00
Martin Kroeker
ebe50458f3 Do not add a -tp to the flags of the nvc compiler if there is one already in CFLAGS 2023-02-09 09:29:27 +01:00
Martin Kroeker
3dec11c669 Merge pull request #3902 from haampie/fix/parallel-build
fix shared and tests prereqs
2023-02-08 15:52:29 +01:00
Harmen Stoppels
bb7ae98dfd fix shared and tests prereqs 2023-02-08 12:52:22 +01:00
Martin Kroeker
fdc1cdb102 Merge pull request #3898 from martin-frbg/zen4fix
Fix compiler option setting for AVX512-capable ZEN targets
2023-02-03 04:48:27 +01:00
Martin Kroeker
60dfba0d92 Merge pull request #3897 from martin-frbg/cortexx3-id
Add cpuid support for Cortex A715 and X3 by aliasing to A710/X2
2023-02-02 22:08:05 +01:00
Martin Kroeker
19a696f8fe fix nested conditionals 2023-02-02 19:59:49 +01:00
Martin Kroeker
e964ebd0d0 Add compiler option for AVX512-capable Ryzen(4) 2023-02-02 19:04:05 +01:00
Martin Kroeker
8e8651f0a5 Supply necessary gcc option for AVX512-capable Ryzens 2023-02-02 18:13:29 +01:00
Martin Kroeker
9ecfa94744 Add part numbers for A715 and X3 aliased to A710/X2 2023-02-02 17:30:30 +01:00
Martin Kroeker
6876360a7a Merge pull request #3896 from antonio-rojas/patch-1
Fix USE_PERL option usage
2023-02-02 17:24:36 +01:00
Martin Kroeker
ab3399d0c3 Merge pull request #3895 from martin-frbg/issue3892
Fix linking to libm with CMAKE
2023-02-02 15:45:45 +01:00