Commit Graph

8681 Commits

Author SHA1 Message Date
gxw
48698b2b1d LoongArch64: Rename core
Use microarchitecture name instead of meaningless strings to name the core,
the legacy core is still retained.
1. Rename LOONGSONGENERIC to LA64_GENERIC
2. Rename LOONGSON3R5 to LA464
3. Rename LOONGSON2K1000 to LA264
2024-09-29 09:35:21 +08:00
Martin Kroeker
fca86e359c Merge pull request #4887 from goplanid/develop
Small GEMM improvements for AArch64 with SVE
2024-09-16 11:17:19 +02:00
Martin Kroeker
60c1519e01 Merge pull request #4896 from martin-frbg/update_azure_mac_hpc
AzureCI: Update Intel oneAPI download for Mac to final version
2024-09-12 21:09:28 +02:00
Martin Kroeker
c8313d9d80 Merge pull request #4895 from martin-frbg/update_homebrewjob
CI: Update nightly-homebrew workflow
2024-09-12 21:09:10 +02:00
Martin Kroeker
b588e922a1 Update oneAPI download location for Mac to final 2024-09-12 18:13:46 +02:00
Martin Kroeker
4178905fa7 Update version of upload-artifacts following deprecation 2024-09-12 16:39:20 +02:00
Martin Kroeker
5f70e245a2 Merge pull request #4894 from martin-frbg/issue4893
Fix function definition in the f2c-converted ctest and remove suppression of gcc14 error
2024-09-12 15:09:54 +02:00
Martin Kroeker
383e0b133e remove suppression of gcc14's incompatible pointer error 2024-09-11 22:21:09 +02:00
Martin Kroeker
869a169c57 Fix ZAXPYTEST prototype 2024-09-11 22:18:14 +02:00
Deeksha Goplani
4894c54055 Improve TN case with further unrolling 2024-09-02 22:22:49 +05:30
Martin Kroeker
485027563e Merge pull request #4883 from ChipKerchner/fixSGEMMUnitTestZeroSize
Fix SBGEMM unit test to handle zero elements.
2024-08-17 11:47:26 +02:00
Chip Kerchner
89702e1f4a Fix zero element GEMV test. 2024-08-16 11:37:39 -05:00
Chip Kerchner
77f85c7c00 GEMV tests don't like zero elements. 2024-08-16 11:15:32 -05:00
Chip Kerchner
868aa857bc Change malloc zero to return one byte and update the SBGEMM test to again use sizes of zero. 2024-08-16 10:28:10 -05:00
Chip Kerchner
b1802f4dc8 Fix unit test to start at 1 instead of 0 - since malloc zero bytes fails on some systems. 2024-08-16 09:51:37 -05:00
Martin Kroeker
f61930eb11 Merge pull request #4882 from martin-frbg/issue4805-3
Restore the workaround in the POTRS utest as it is reportedly still needed on 3C6000/gcc14.2
2024-08-16 11:24:51 +02:00
Martin Kroeker
dfba3f8841 restore the pragma as it is reportedly still needed on 3C6000/gcc14.2 2024-08-16 11:23:19 +02:00
Martin Kroeker
7129a64d87 Merge pull request #4881 from martin-frbg/issue4805-2
Use fld.d/fst.d in PROLOGUE/EPILOGUE in LOONGSON3R5 GEMM
2024-08-16 08:47:12 +02:00
Martin Kroeker
49080b631e remove optimizer pragma again 2024-08-15 22:15:27 +02:00
Martin Kroeker
e05d98d00a expressly use fld.d/fst.d for floating point registers instead of LD/ST macros 2024-08-15 22:14:29 +02:00
Martin Kroeker
3ee9e9d8d0 Merge pull request #4879 from martin-frbg/issue4868-2
Ensure a memory buffer has been allocated for each thread before invoking it (take 2)
2024-08-15 22:06:54 +02:00
Martin Kroeker
dd71df8fab Merge pull request #4880 from ChipKerchner/betterPowerGEMVTail
[POWER] Vectorize SGEMV transpose reduce stage
2024-08-15 20:36:22 +02:00
Martin Kroeker
a8d6b0219a Merge pull request #4877 from XiWeiGu/fixed_undefined_blas_set_parameter
Fixed the undefined reference to blas_set_parameter
2024-08-15 15:35:26 +02:00
Martin Kroeker
d24b3cf393 properly fix buffer allocation and assignment 2024-08-15 15:32:58 +02:00
Chip Kerchner
a0aeba631d Merge branch 'develop' into betterPowerGEMVTail 2024-08-15 08:00:00 -05:00
Martin Kroeker
eba8615c11 Merge pull request #4876 from martin-frbg/granite
Add autodetection support for Intel Granite Rapids as Sapphire Rapids
2024-08-15 13:50:54 +02:00
Martin Kroeker
bc80e7f02d Merge pull request #4878 from martin-frbg/cirrus-androidndk
Cirrus CI: fix installation of NDK in armv7 crossbuild
2024-08-15 13:50:09 +02:00
Martin Kroeker
94c9e0b7ad Update ndk version number 2024-08-15 11:30:23 +02:00
Martin Kroeker
ed0321563a fix installation of NDK in armv7 crossbuild 2024-08-15 11:11:07 +02:00
gxw
fd033467ac Fixed the undefined reference to blas_set_parameter
Fixed the undefined reference to blas_set_parameter when
enabling USE_OPENMP and DYNAMIC_ARCH.
2024-08-15 16:48:48 +08:00
Martin Kroeker
1b8e40874e Add autodetection support for Intel Granite Rapids as Sapphire Rapids 2024-08-15 09:33:42 +02:00
Martin Kroeker
4944148e66 Merge pull request #4875 from ChipKerchner/addGEMVtoBF16Test
Add GEMV to SBGEMx vs SGEMx testing
2024-08-15 09:32:11 +02:00
Martin Kroeker
a388c4b834 Merge pull request #4872 from chenx97/ls3a-fix-stack-fpr-len
Use ldc1 and sdc1 for the prologue and epilogue on LOONGSON3A
2024-08-15 00:10:16 +02:00
Martin Kroeker
f24b521709 Merge pull request #4787 from vlad0x00/patch-1
Update cross compile info
2024-08-15 00:09:53 +02:00
Vladimir Nikolić
2d84ed7e76 Update README.md 2024-08-14 14:31:35 -07:00
Chip Kerchner
083faf7556 Merge branch 'develop' into betterPowerGEMVTail 2024-08-14 15:56:03 -05:00
Chip Kerchner
c23897f585 Add GEMV testing to SBGEMx vs SGEMx testing. 2024-08-14 15:55:23 -05:00
Martin Kroeker
0d8ee96f1e Merge pull request #4874 from martin-frbg/issue4869
Fix handling of deprecated ?GELQS/?GEQRS in building the shared library
2024-08-14 22:49:12 +02:00
Martin Kroeker
b80671d896 Merge pull request #4871 from martin-frbg/issue4868
Ensure a buffer has been allocated for each thread before invoking it
2024-08-14 20:53:39 +02:00
Martin Kroeker
6452f7b46d Merge pull request #4873 from ChipKerchner/fixSBGEMMDefaults
[POWER] Problem with multi-threaded SBGEMM
2024-08-14 19:22:03 +02:00
Chip Kerchner
75472b830a Merge branch 'develop' into betterPowerGEMVTail 2024-08-14 10:52:46 -05:00
Martin Kroeker
ca7777de18 Merge pull request #4870 from chenx97/fix-recursive-make-var
Fix recursive variable expansion in Makefiles for LOONGSON3A
2024-08-14 16:03:50 +02:00
Martin Kroeker
f6469e21bc move gelqs and geqrs to lapack-deprecated 2024-08-14 16:00:43 +02:00
Chip Kerchner
31226740d6 Cleanup of SBGEMM unit test. 2024-08-14 08:10:25 -05:00
Henry Chen
ef94b96530 Use ldc1 and sdc1 for the prologue and epilogue on LOONGSON3A
This fix is similar to 2d8064174c.
2024-08-14 18:05:11 +08:00
Martin Kroeker
23b5d66a86 Ensure a memory buffer has been allocated for each thread before invoking it 2024-08-14 10:35:44 +02:00
Henry Chen
20bdb65882 Fix recursive variable expansion in Makefiles for LOONGSON3A 2024-08-14 15:08:32 +08:00
Chip Kerchner
b1737698db Fix DEFAULTS in SBGEMM for POWER10. Also, comparisons for the SBGEMM unit test cannot be exact due to epsilon differences. 2024-08-13 07:01:21 -05:00
Martin Kroeker
e5525036e7 Merge pull request #4865 from martin-frbg/issue4856
Tweak LAPACK STFSM test threshold a little more to cover POWER10 fma
2024-08-13 07:20:06 +02:00
Martin Kroeker
fd52d09490 Merge pull request #4864 from martin-frbg/issue4862
Spell out function prototypes in the SYRK calls of potrf_parallel
2024-08-13 00:16:45 +02:00