Martin Kroeker
|
969bb949b1
|
Strip any mtune option from FFLAGS if the compiler is flang-new
|
2024-09-19 11:10:28 +02:00 |
Martin Kroeker
|
fca86e359c
|
Merge pull request #4887 from goplanid/develop
Small GEMM improvements for AArch64 with SVE
|
2024-09-16 11:17:19 +02:00 |
Chip Kerchner
|
7947970f9d
|
Move common code.
|
2024-09-13 06:22:13 -05:00 |
Martin Kroeker
|
60c1519e01
|
Merge pull request #4896 from martin-frbg/update_azure_mac_hpc
AzureCI: Update Intel oneAPI download for Mac to final version
|
2024-09-12 21:09:28 +02:00 |
Martin Kroeker
|
c8313d9d80
|
Merge pull request #4895 from martin-frbg/update_homebrewjob
CI: Update nightly-homebrew workflow
|
2024-09-12 21:09:10 +02:00 |
Martin Kroeker
|
b588e922a1
|
Update oneAPI download location for Mac to final
|
2024-09-12 18:13:46 +02:00 |
Martin Kroeker
|
4178905fa7
|
Update version of upload-artifacts following deprecation
|
2024-09-12 16:39:20 +02:00 |
Martin Kroeker
|
5f70e245a2
|
Merge pull request #4894 from martin-frbg/issue4893
Fix function definition in the f2c-converted ctest and remove suppression of gcc14 error
|
2024-09-12 15:09:54 +02:00 |
Martin Kroeker
|
383e0b133e
|
remove suppression of gcc14's incompatible pointer error
|
2024-09-11 22:21:09 +02:00 |
Martin Kroeker
|
869a169c57
|
Fix ZAXPYTEST prototype
|
2024-09-11 22:18:14 +02:00 |
Chip Kerchner
|
72216d28c2
|
Fix bug with inc_y adding results twice.
|
2024-09-11 08:47:32 -05:00 |
Chip Kerchner
|
2f142ee857
|
More common code.
|
2024-09-09 14:41:55 -05:00 |
Chip Kerchner
|
39fd29f1de
|
Minor improvement and turn off BF16 GEMV forwarding by default.
|
2024-09-08 18:28:31 -05:00 |
Chip Kerchner
|
8541b25e1d
|
Special case beta is one.
|
2024-09-06 14:48:48 -05:00 |
Chip Kerchner
|
76227e2948
|
Initial commit for vectorized BF16 GEMV. Added GEMM_GEMV_FORWARD_BF16 to enable using BF16 GEMV for one-dimensional matrices. Updated unit test to support inc_x != 1 or inc_y != 1 for GEMV.
|
2024-09-06 14:03:31 -05:00 |
Deeksha Goplani
|
4894c54055
|
Improve TN case with further unrolling
|
2024-09-02 22:22:49 +05:30 |
Martin Kroeker
|
485027563e
|
Merge pull request #4883 from ChipKerchner/fixSGEMMUnitTestZeroSize
Fix SBGEMM unit test to handle zero elements.
|
2024-08-17 11:47:26 +02:00 |
Chip Kerchner
|
89702e1f4a
|
Fix zero element GEMV test.
|
2024-08-16 11:37:39 -05:00 |
Chip Kerchner
|
77f85c7c00
|
GEMV tests don't like zero elements.
|
2024-08-16 11:15:32 -05:00 |
Chip Kerchner
|
868aa857bc
|
Change malloc zero to return one byte and update the SBGEMM test to again use sizes of zero.
|
2024-08-16 10:28:10 -05:00 |
Chip Kerchner
|
b1802f4dc8
|
Fix unit test to start at 1 instead of 0 - since malloc zero bytes fails on some systems.
|
2024-08-16 09:51:37 -05:00 |
Martin Kroeker
|
f61930eb11
|
Merge pull request #4882 from martin-frbg/issue4805-3
Restore the workaround in the POTRS utest as it is reportedly still needed on 3C6000/gcc14.2
|
2024-08-16 11:24:51 +02:00 |
Martin Kroeker
|
dfba3f8841
|
restore the pragma as it is reportedly still needed on 3C6000/gcc14.2
|
2024-08-16 11:23:19 +02:00 |
Martin Kroeker
|
7129a64d87
|
Merge pull request #4881 from martin-frbg/issue4805-2
Use fld.d/fst.d in PROLOGUE/EPILOGUE in LOONGSON3R5 GEMM
|
2024-08-16 08:47:12 +02:00 |
Martin Kroeker
|
49080b631e
|
remove optimizer pragma again
|
2024-08-15 22:15:27 +02:00 |
Martin Kroeker
|
e05d98d00a
|
expressly use fld.d/fst.d for floating point registers instead of LD/ST macros
|
2024-08-15 22:14:29 +02:00 |
Martin Kroeker
|
3ee9e9d8d0
|
Merge pull request #4879 from martin-frbg/issue4868-2
Ensure a memory buffer has been allocated for each thread before invoking it (take 2)
|
2024-08-15 22:06:54 +02:00 |
Martin Kroeker
|
dd71df8fab
|
Merge pull request #4880 from ChipKerchner/betterPowerGEMVTail
[POWER] Vectorize SGEMV transpose reduce stage
|
2024-08-15 20:36:22 +02:00 |
Martin Kroeker
|
a8d6b0219a
|
Merge pull request #4877 from XiWeiGu/fixed_undefined_blas_set_parameter
Fixed the undefined reference to blas_set_parameter
|
2024-08-15 15:35:26 +02:00 |
Martin Kroeker
|
d24b3cf393
|
properly fix buffer allocation and assignment
|
2024-08-15 15:32:58 +02:00 |
Chip Kerchner
|
a0aeba631d
|
Merge branch 'develop' into betterPowerGEMVTail
|
2024-08-15 08:00:00 -05:00 |
Martin Kroeker
|
eba8615c11
|
Merge pull request #4876 from martin-frbg/granite
Add autodetection support for Intel Granite Rapids as Sapphire Rapids
|
2024-08-15 13:50:54 +02:00 |
Martin Kroeker
|
bc80e7f02d
|
Merge pull request #4878 from martin-frbg/cirrus-androidndk
Cirrus CI: fix installation of NDK in armv7 crossbuild
|
2024-08-15 13:50:09 +02:00 |
Martin Kroeker
|
94c9e0b7ad
|
Update ndk version number
|
2024-08-15 11:30:23 +02:00 |
Martin Kroeker
|
ed0321563a
|
fix installation of NDK in armv7 crossbuild
|
2024-08-15 11:11:07 +02:00 |
gxw
|
fd033467ac
|
Fixed the undefined reference to blas_set_parameter
Fixed the undefined reference to blas_set_parameter when
enabling USE_OPENMP and DYNAMIC_ARCH.
|
2024-08-15 16:48:48 +08:00 |
Martin Kroeker
|
1b8e40874e
|
Add autodetection support for Intel Granite Rapids as Sapphire Rapids
|
2024-08-15 09:33:42 +02:00 |
Martin Kroeker
|
4944148e66
|
Merge pull request #4875 from ChipKerchner/addGEMVtoBF16Test
Add GEMV to SBGEMx vs SGEMx testing
|
2024-08-15 09:32:11 +02:00 |
Martin Kroeker
|
a388c4b834
|
Merge pull request #4872 from chenx97/ls3a-fix-stack-fpr-len
Use ldc1 and sdc1 for the prologue and epilogue on LOONGSON3A
|
2024-08-15 00:10:16 +02:00 |
Martin Kroeker
|
f24b521709
|
Merge pull request #4787 from vlad0x00/patch-1
Update cross compile info
|
2024-08-15 00:09:53 +02:00 |
Vladimir Nikolić
|
2d84ed7e76
|
Update README.md
|
2024-08-14 14:31:35 -07:00 |
Chip Kerchner
|
083faf7556
|
Merge branch 'develop' into betterPowerGEMVTail
|
2024-08-14 15:56:03 -05:00 |
Chip Kerchner
|
c23897f585
|
Add GEMV testing to SBGEMx vs SGEMx testing.
|
2024-08-14 15:55:23 -05:00 |
Martin Kroeker
|
0d8ee96f1e
|
Merge pull request #4874 from martin-frbg/issue4869
Fix handling of deprecated ?GELQS/?GEQRS in building the shared library
|
2024-08-14 22:49:12 +02:00 |
Martin Kroeker
|
b80671d896
|
Merge pull request #4871 from martin-frbg/issue4868
Ensure a buffer has been allocated for each thread before invoking it
|
2024-08-14 20:53:39 +02:00 |
Martin Kroeker
|
6452f7b46d
|
Merge pull request #4873 from ChipKerchner/fixSBGEMMDefaults
[POWER] Problem with multi-threaded SBGEMM
|
2024-08-14 19:22:03 +02:00 |
Chip Kerchner
|
75472b830a
|
Merge branch 'develop' into betterPowerGEMVTail
|
2024-08-14 10:52:46 -05:00 |
Martin Kroeker
|
ca7777de18
|
Merge pull request #4870 from chenx97/fix-recursive-make-var
Fix recursive variable expansion in Makefiles for LOONGSON3A
|
2024-08-14 16:03:50 +02:00 |
Martin Kroeker
|
f6469e21bc
|
move gelqs and geqrs to lapack-deprecated
|
2024-08-14 16:00:43 +02:00 |
Chip Kerchner
|
31226740d6
|
Cleanup of SBGEMM unit test.
|
2024-08-14 08:10:25 -05:00 |