Commit Graph

7611 Commits

Author SHA1 Message Date
Martin Kroeker f7351e493c
Update Reference-LAPACK docs to 3.12.0 2023-11-25 18:49:34 +01:00
Martin Kroeker be8661ba40
Merge pull request #4338 from martin-frbg/lapack941
Docu fix for Truncated QR With Pivoting (Reference-LAPACK PR 941)
2023-11-25 18:41:25 +01:00
Martin Kroeker ca5a87ff1d
Small documentation fix for Truncated QR With Pivoting (Reference-LAPACK PR 941) 2023-11-25 15:31:18 +01:00
Martin Kroeker 97d3c9b827
Merge pull request #4336 from martin-frbg/fix4322
Revert unintentional change to gmake linking rule in LAPACK TESTING/LIN
2023-11-22 22:44:21 +01:00
Martin Kroeker c883abf838
Revert unintentional change to linking rule from PR 4322 2023-11-22 22:41:53 +01:00
Martin Kroeker 8138999cd0
Merge pull request #4333 from codeworm96/update_dynamic_core_readme
Update the list of default dynamic targets for x86_64 in the README to be consistent with the Makefile
2023-11-21 13:50:37 +01:00
Martin Kroeker a938e48fa2
Merge pull request #4334 from RajalakshmiSR/Makefile_power
POWER: Fixing Makefile error
2023-11-21 10:24:25 +01:00
Rajalakshmi Srinivasaraghavan 47da601a2d POWER: Fixing Makefile error
Recent commit d99aad8ee3 added an
extra `)`. This patch fixes the resulting Makefile warning.
2023-11-20 17:24:22 -06:00
Yuning Zhang 54be8f4d67 Update the list of default dynamic targets for x86_64 in the README to be consistent with the Makefile
Signed-off-by: Yuning Zhang <codeworm96@outlook.com>
2023-11-20 13:28:25 -08:00
Martin Kroeker d526c4306f
Merge pull request #4329 from isuruf/sbgemm
Fix building test_sbgemm
2023-11-20 15:30:02 +01:00
Martin Kroeker 2ea65bacd0
Merge pull request #4330 from bartoldeman/asum-init-mask
Use _mm_set1_epi{32,64x} to init mask in x86-64 [cz]asum
2023-11-20 05:38:39 +01:00
Bart Oldeman c34e2cf380 Use _mm_set1_epi{32,64x} to init mask in x86-64 [cz]asum
for skylake kernels. This is the same method as used in [sd]asum.
_mm_set1_epi64x was commented out for zasum, but it has the advantage
of avoiding possible undefined behaviour (use of an uninitialized
variable), which NVHPC and icx optimize out. The new code works
fine with those compilers.

For GCC 12.3 the generated code is identical: whichever method
you use, the compiler folds the code into a compile-time
constant, so there is no performance benefit to using mm_cmpeq_epi8,
since the corresponding instruction (VPCMPEQB) isn't actually
generated.
2023-11-19 21:28:35 +00:00
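As a rough illustration of the mask initialization described in the commit message above, the following sketch builds the "clear the sign bit" mask with `_mm_set1_epi32` and `_mm_set1_epi64x` and applies it to compute absolute values. It is not the OpenBLAS kernel: the function names `abs4_sketch` and `abs2d_sketch` are hypothetical, 128-bit SSE2 vectors are assumed for brevity where the real kernels target Skylake, and a scalar fallback is included only so the sketch compiles on non-x86 targets.

```c
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper (not from OpenBLAS): |in[i]| for four floats. */
void abs4_sketch(const float *in, float *out)
{
#if defined(__SSE2__)
    /* Every 32-bit lane gets 0x7fffffff: all bits set except the sign bit.
     * The mask is an explicit constant, so no uninitialized register is read. */
    const __m128i mask = _mm_set1_epi32(0x7fffffff);
    __m128 v = _mm_loadu_ps(in);
    _mm_storeu_ps(out, _mm_and_ps(v, _mm_castsi128_ps(mask)));
#else
    /* Scalar fallback with the same bit-level effect. */
    for (int i = 0; i < 4; i++) {
        uint32_t bits;
        memcpy(&bits, &in[i], sizeof bits);
        bits &= UINT32_C(0x7fffffff);
        memcpy(&out[i], &bits, sizeof bits);
    }
#endif
}

/* Hypothetical helper (not from OpenBLAS): |in[i]| for two doubles. */
void abs2d_sketch(const double *in, double *out)
{
#if defined(__SSE2__)
    /* Same idea with 64-bit lanes, using the 64-bit variant of set1. */
    const __m128i mask = _mm_set1_epi64x(0x7fffffffffffffffLL);
    __m128d v = _mm_loadu_pd(in);
    _mm_storeu_pd(out, _mm_and_pd(v, _mm_castsi128_pd(mask)));
#else
    for (int i = 0; i < 2; i++) {
        uint64_t bits;
        memcpy(&bits, &in[i], sizeof bits);
        bits &= UINT64_C(0x7fffffffffffffff);
        memcpy(&out[i], &bits, sizeof bits);
    }
#endif
}
```

Because the mask is a plain constant, a compiler is free to fold it at compile time, which matches the observation in the commit message that the choice of method makes no difference to the code GCC generates.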
Martin Kroeker 864c65b526
Merge pull request #4328 from martin-frbg/4239-3
Copy XCode15-specific workaround for Apple M to Fortran flags to fix build of tests
2023-11-19 22:06:17 +01:00
Isuru Fernando 6b2651ece3 Fix building test_sbgemm 2023-11-19 02:57:13 -06:00
Martin Kroeker 22aa401656
Temporarily disable the AVX512 CASUM/ZASUM microkernels for any version of NVIDIA HPC (#4327)
* Temporarily disable the C/ZASUM microkernels for any version of NVHPC
2023-11-19 00:04:31 +01:00
Martin Kroeker 47b03fd4b4
Copy XCode15-specific workaround to Fortran flags to fix build of tests 2023-11-18 23:45:02 +01:00
Martin Kroeker df4cd7e82c
Merge pull request #4326 from bartoldeman/fix-casum-backup-kernel
Fix casum fallback kernel for x86_64
2023-11-18 19:06:06 +01:00
Bart Oldeman f8ad5344c2 Fix casum fallback kernel.
This kernel is only used on Skylake+ if the kernel with AVX512
intrinsics can't be used, but it used the variable x1 incorrectly
in the tail end of the loop: x1 is still at its initial
value rather than at the position x points to.

This caused 55 "other error"s in the LAPACK tests
(https://github.com/OpenMathLib/OpenBLAS/issues/4282)

This change makes casum.c as similar as possible to zasum.c,
because zasum.c does this correctly.
2023-11-17 23:53:56 +00:00
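The class of bug described in the commit message above can be sketched as follows. This is a simplified illustration, not the actual casum.c: the function names, the unit-stride layout, and the unroll factor of four are all assumptions. The main loop advances `x`, so the tail loop has to continue from `x`; a tail that reads from a pointer still holding the start of the array (here `x1`) sums the first elements a second time.

```c
#include <stddef.h>

/* Local absolute value, so the sketch needs no math library. */
static float absf_sketch(float v) { return v < 0.0f ? -v : v; }

/* Hypothetical casum-like sum of |re| + |im| over n complex values
 * stored as interleaved (re, im) float pairs with unit stride. */
float casum_sketch(ptrdiff_t n, const float *x)
{
    float sum = 0.0f;
    ptrdiff_t i = 0;

    /* Unrolled main loop: four complex elements (eight floats) per pass. */
    for (; i + 4 <= n; i += 4) {
        for (int k = 0; k < 8; k++)
            sum += absf_sketch(x[k]);
        x += 8;
    }
    /* Tail: continue from x, which now points past the unrolled part. */
    for (; i < n; i++) {
        sum += absf_sketch(x[0]) + absf_sketch(x[1]);
        x += 2;
    }
    return sum;
}

/* Deliberately wrong variant, for contrast only: the tail reads from x1,
 * which was never advanced and still points at the start of the array. */
float casum_sketch_buggy(ptrdiff_t n, const float *x)
{
    const float *x1 = x;   /* stays at the initial position */
    float sum = 0.0f;
    ptrdiff_t i = 0;

    for (; i + 4 <= n; i += 4) {
        for (int k = 0; k < 8; k++)
            sum += absf_sketch(x[k]);
        x += 8;
    }
    for (; i < n; i++) {
        sum += absf_sketch(x1[0]) + absf_sketch(x1[1]);
        x1 += 2;
    }
    return sum;
}
```

The two variants agree whenever n is a multiple of four, so a bug of this kind only shows up for lengths that leave a remainder after the unrolled loop.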
Martin Kroeker cb2950709f
Merge pull request #4322 from martin-frbg/lapack891
Add truncated QR with pivoting (Reference-LAPACK PR 891)
2023-11-16 08:40:17 +01:00
Martin Kroeker cc622f2406
restore OpenBLAS-specific target_link_libraries 2023-11-15 22:51:09 +01:00
Martin Kroeker ee47e4e494
run m1/llvm/cmake build on all 4 cores 2023-11-15 15:21:32 +01:00
Martin Kroeker 8b2a956890
Implement truncated QR with pivot (Reference-LAPACK PR 891) 2023-11-15 14:20:12 +01:00
Martin Kroeker 20a2a83f49
Implement truncated QR with pivoting (Reference-LAPACK PR 891) 2023-11-15 12:18:15 +01:00
Martin Kroeker f437339130
Implement truncated QR with pivoting (Reference-LAPACK PR 891) 2023-11-15 12:12:26 +01:00
Martin Kroeker 5bf87c86f5
Implement truncated QR with pivoting (Reference-LAPACK PR 891) 2023-11-15 12:10:20 +01:00
Martin Kroeker 0eb8a87977
Implement truncated QR with pivoting (Reference-LAPACK PR 891) 2023-11-15 09:56:37 +01:00
Martin Kroeker 387830b9d5
Implement truncated QR with pivoting (Reference-LAPACK PR 891) 2023-11-15 09:53:06 +01:00
Martin Kroeker 40109c0392
Implement truncated QR with pivoting (Reference-LAPACK PR 891) 2023-11-15 09:50:30 +01:00
Martin Kroeker 23cda457fb
Implement truncated QR with pivoting (Reference-LAPACK PR 891) 2023-11-15 09:48:23 +01:00
Martin Kroeker d36b86a794
Merge pull request #4320 from ChipKerchner/fixOldGCCPower
Fix older versions of gcc - missing __has_builtin, cpuid and no support of P10.
2023-11-15 08:48:17 +01:00
Chip-Kerchner d99aad8ee3 Fix older version of gcc - missing __has_builtin, cpuid and no support of P10. 2023-11-14 11:07:08 -06:00
Martin Kroeker 46440a0486
Merge pull request #4317 from OpenMathLib/release-0.3.0
Merge release 0.3.25 back into develop to copy tag
2023-11-12 23:09:47 +01:00
Martin Kroeker f4cc1b7a6f
Update version to 0.3.25.dev 2023-11-12 23:07:19 +01:00
Martin Kroeker dff686a86c
Update version to 0.3.25.dev 2023-11-12 23:06:46 +01:00
Martin Kroeker 5e1a429eab
Merge pull request #4316 from OpenMathLib/develop
Merge develop into release-0.3.0 for 0.3.25
2023-11-12 22:55:00 +01:00
Martin Kroeker 64c96716f7
Merge branch 'release-0.3.0' into develop 2023-11-12 22:54:42 +01:00
Martin Kroeker 0e54cbd18c
Update version to 0.3.25 2023-11-12 22:52:05 +01:00
Martin Kroeker f1940010e4
Update version to 0.3.25 2023-11-12 22:51:26 +01:00
Martin Kroeker a47ceda465
Merge pull request #4315 from martin-frbg/m3_cpufamily
Add OSX hw.cpufamily autodetection for Apple M3 as VORTEX
2023-11-12 22:49:58 +01:00
Martin Kroeker e1f529d024
Add OSX hw.cpufamily value for Apple M3 2023-11-12 22:37:11 +01:00
Martin Kroeker c245c12dc2
Update Changelog for 0.3.25 (#4314)
* Update Changelog.txt for 0.3.25
2023-11-12 22:17:39 +01:00
Martin Kroeker fa615967cd
Merge pull request #4312 from martin-frbg/fixotherproto
Fix empty function prototypes
2023-11-12 21:10:27 +01:00
Martin Kroeker 9b5f8eb33a
Fix empty function prototypes 2023-11-12 19:35:53 +01:00
Martin Kroeker ecaaece695
Merge pull request #4311 from martin-frbg/lapack930
Make vector orthogonalization more reliable (Reference-LAPACK PR 930)
2023-11-12 18:42:32 +01:00
Martin Kroeker 6f094c35ee
Merge pull request #4305 from rgommers/ci-limit-runs
Limit CI runs to pushes and pull requests on main repo
2023-11-12 18:39:27 +01:00
Martin Kroeker 3d38da2bc4
Make vector orthogonalization more reliable (Reference-LAPACK PR 930) 2023-11-12 16:50:52 +01:00
Martin Kroeker d58c88cf42
Merge pull request #4310 from martin-frbg/lapack904
Apply rounding up to workspace calculations done with reals (Reference-LAPACK PR 904)
2023-11-12 16:45:10 +01:00
Martin Kroeker feeb10435b
Merge pull request #4309 from martin-frbg/lapack926
Change ?GECON to return INFO=1 if RCOND is NaN (Reference-LAPACK PR 926)
2023-11-12 15:28:16 +01:00
Martin Kroeker 2ce67e2ada
Apply ROUNDUP_LWORK (Reference-LAPACK PR 904) 2023-11-12 14:42:52 +01:00
Martin Kroeker f5664740cd
Apply ROUNDUP_LWORK (Reference-LAPACK PR 904) 2023-11-12 14:29:04 +01:00