Martin Kroeker
fa77561396
add openmp option to pkgconfig template
2024-10-01 13:32:45 +02:00
Martin Kroeker
176107d23a
Add -fopenmp to cflags in pkgconfig file if set
2024-10-01 13:31:14 +02:00
Martin Kroeker
0228d36211
move -fopenmp to CFLAGS
2024-09-30 21:38:05 +02:00
gxw
7087b0a7d0
ARM64: Enable SMALL_MATRIX_OPT when compiling with CMake
2024-09-29 10:31:26 +08:00
gxw
30af9278dc
LoongArch64: Enable cmake cross-compilation
2024-09-29 10:13:30 +08:00
gxw
48698b2b1d
LoongArch64: Rename core
Use microarchitecture names instead of meaningless strings to name the cores;
the legacy core names are still retained.
1. Rename LOONGSONGENERIC to LA64_GENERIC
2. Rename LOONGSON3R5 to LA464
3. Rename LOONGSON2K1000 to LA264
2024-09-29 09:35:21 +08:00
Chip Kerchner
c8788208c8
Fixing block issue with transpose version.
2024-09-27 13:27:03 -05:00
Chip Kerchner
d7c0d87cd1
Small changes.
2024-09-26 15:21:29 -05:00
Chip Kerchner
eb6f3a05ef
Common MMA code.
2024-09-26 09:28:56 -05:00
Chip Kerchner
fb287d17fc
Common code.
2024-09-25 16:31:36 -05:00
Chip Kerchner
8ab6245771
Small change.
2024-09-24 16:50:21 -05:00
Chip Kerchner
df19375560
Almost final code for MMA.
2024-09-24 16:30:01 -05:00
Chip Kerchner
05aa63e738
More MMA BF16 GEMV code.
2024-09-24 12:54:02 -05:00
Chip Kerchner
c9ce37d527
Force vector pairs in clang.
2024-09-23 08:43:58 -05:00
Chip Kerchner
89a12fa083
MMA BF16 GEMV code.
2024-09-23 06:32:14 -05:00
Martin Kroeker
92f7a2dc3e
Merge pull request #4899 from martin-frbg/flangmtune
Strip any mtune option from FFLAGS if the compiler is flang-new
2024-09-19 14:15:06 +02:00
Martin Kroeker
969bb949b1
Strip any mtune option from FFLAGS if the compiler is flang-new
2024-09-19 11:10:28 +02:00
Martin Kroeker
fca86e359c
Merge pull request #4887 from goplanid/develop
Small GEMM improvements for AArch64 with SVE
2024-09-16 11:17:19 +02:00
Chip Kerchner
7947970f9d
Move common code.
2024-09-13 06:22:13 -05:00
Martin Kroeker
60c1519e01
Merge pull request #4896 from martin-frbg/update_azure_mac_hpc
AzureCI: Update Intel oneAPI download for Mac to final version
2024-09-12 21:09:28 +02:00
Martin Kroeker
c8313d9d80
Merge pull request #4895 from martin-frbg/update_homebrewjob
CI: Update nightly-homebrew workflow
2024-09-12 21:09:10 +02:00
Martin Kroeker
b588e922a1
Update oneAPI download location for Mac to final
2024-09-12 18:13:46 +02:00
Martin Kroeker
4178905fa7
Update version of upload-artifacts following deprecation
2024-09-12 16:39:20 +02:00
Martin Kroeker
5f70e245a2
Merge pull request #4894 from martin-frbg/issue4893
Fix function definition in the f2c-converted ctest and remove suppression of gcc14 error
2024-09-12 15:09:54 +02:00
Martin Kroeker
383e0b133e
remove suppression of gcc14's incompatible pointer error
2024-09-11 22:21:09 +02:00
Martin Kroeker
869a169c57
Fix ZAXPYTEST prototype
2024-09-11 22:18:14 +02:00
Chip Kerchner
72216d28c2
Fix bug with inc_y adding results twice.
2024-09-11 08:47:32 -05:00
Chip Kerchner
2f142ee857
More common code.
2024-09-09 14:41:55 -05:00
Chip Kerchner
39fd29f1de
Minor improvement and turn off BF16 GEMV forwarding by default.
2024-09-08 18:28:31 -05:00
Chip Kerchner
8541b25e1d
Special case beta is one.
2024-09-06 14:48:48 -05:00
Chip Kerchner
76227e2948
Initial commit for vectorized BF16 GEMV. Added GEMM_GEMV_FORWARD_BF16 to enable using BF16 GEMV for one-dimensional matrices. Updated unit test to support inc_x != 1 or inc_y != 1 for GEMV.
2024-09-06 14:03:31 -05:00
Deeksha Goplani
4894c54055
Improve TN case with further unrolling
2024-09-02 22:22:49 +05:30
Martin Kroeker
485027563e
Merge pull request #4883 from ChipKerchner/fixSGEMMUnitTestZeroSize
Fix SBGEMM unit test to handle zero elements.
2024-08-17 11:47:26 +02:00
Chip Kerchner
89702e1f4a
Fix zero element GEMV test.
2024-08-16 11:37:39 -05:00
Chip Kerchner
77f85c7c00
GEMV tests don't like zero elements.
2024-08-16 11:15:32 -05:00
Chip Kerchner
868aa857bc
Change malloc zero to return one byte and update the SBGEMM test to again use sizes of zero.
2024-08-16 10:28:10 -05:00
Chip Kerchner
b1802f4dc8
Fix unit test to start at 1 instead of 0, since malloc of zero bytes fails on some systems.
2024-08-16 09:51:37 -05:00
Martin Kroeker
f61930eb11
Merge pull request #4882 from martin-frbg/issue4805-3
Restore the workaround in the POTRS utest as it is reportedly still needed on 3C6000/gcc14.2
2024-08-16 11:24:51 +02:00
Martin Kroeker
dfba3f8841
restore the pragma as it is reportedly still needed on 3C6000/gcc14.2
2024-08-16 11:23:19 +02:00
Martin Kroeker
7129a64d87
Merge pull request #4881 from martin-frbg/issue4805-2
Use fld.d/fst.d in PROLOGUE/EPILOGUE in LOONGSON3R5 GEMM
2024-08-16 08:47:12 +02:00
Martin Kroeker
49080b631e
remove optimizer pragma again
2024-08-15 22:15:27 +02:00
Martin Kroeker
e05d98d00a
expressly use fld.d/fst.d for floating point registers instead of LD/ST macros
2024-08-15 22:14:29 +02:00
Martin Kroeker
3ee9e9d8d0
Merge pull request #4879 from martin-frbg/issue4868-2
Ensure a memory buffer has been allocated for each thread before invoking it (take 2)
2024-08-15 22:06:54 +02:00
Martin Kroeker
dd71df8fab
Merge pull request #4880 from ChipKerchner/betterPowerGEMVTail
[POWER] Vectorize SGEMV transpose reduce stage
2024-08-15 20:36:22 +02:00
Martin Kroeker
a8d6b0219a
Merge pull request #4877 from XiWeiGu/fixed_undefined_blas_set_parameter
Fixed the undefined reference to blas_set_parameter
2024-08-15 15:35:26 +02:00
Martin Kroeker
d24b3cf393
properly fix buffer allocation and assignment
2024-08-15 15:32:58 +02:00
Chip Kerchner
a0aeba631d
Merge branch 'develop' into betterPowerGEMVTail
2024-08-15 08:00:00 -05:00
Martin Kroeker
eba8615c11
Merge pull request #4876 from martin-frbg/granite
Add autodetection support for Intel Granite Rapids as Sapphire Rapids
2024-08-15 13:50:54 +02:00
Martin Kroeker
bc80e7f02d
Merge pull request #4878 from martin-frbg/cirrus-androidndk
Cirrus CI: fix installation of NDK in armv7 crossbuild
2024-08-15 13:50:09 +02:00
Martin Kroeker
94c9e0b7ad
Update ndk version number
2024-08-15 11:30:23 +02:00