Commit Graph

8703 Commits

Author SHA1 Message Date
Martin Kroeker 8a1710dd0d
don't apply switch_ratio to tail of loop 2024-10-06 20:03:32 +02:00
Martin Kroeker 624e9d110e
Merge pull request #4916 from martin-frbg/issue4901
Fix SIGILL/SIGSEGV in PPCG4 SGEMM and fix NAN handling in PPCG4 SSCAL/DSCAL
2024-10-03 23:25:45 +02:00
Martin Kroeker d714013ab9
change sgemm kernel to 4x4 as the 16x4 altivec goes out of bounds 2024-10-03 22:04:20 +02:00
Martin Kroeker 7c4f3638fd
switch PPCG4 SGEMM kernel to 4x4 2024-10-03 22:00:15 +02:00
Martin Kroeker 54afc24e4d
Merge pull request #4906 from XiWeiGu/arm64_cmake_small_matrix_opt
ARM64: Enable SMALL_MATRIX_OPT when compiling with CMake
2024-10-03 20:05:11 +02:00
Martin Kroeker b4495a8fb8
Merge branch 'develop' into arm64_cmake_small_matrix_opt 2024-10-03 20:04:52 +02:00
Martin Kroeker 68eefe60b9
Merge pull request #4915 from martin-frbg/issue4907
Support LoongArch64 compilation with LLVM
2024-10-03 18:29:29 +02:00
Martin Kroeker 4f00f02567
Do not add -mabi flags for Loongson when the compiler is flang 2024-10-03 16:06:33 +02:00
Martin Kroeker f817f26062
Add simpler EPILOGUE for clang 2024-10-03 16:01:10 +02:00
Martin Kroeker a492181665
filter out Loongarch -mabi options for flang-new 2024-10-03 15:58:47 +02:00
Martin Kroeker de421b7764
Merge pull request #4904 from XiWeiGu/la64_cross_cmake
LoongArch64: Enable cmake cross-compilation
2024-10-03 15:53:57 +02:00
Martin Kroeker edaf5933c4
Merge pull request #4913 from martin-frbg/issue4912
Declare the input array in CBLAS_?GEADD as const in cblas.h
2024-10-02 23:37:15 +02:00
Martin Kroeker 71131406ae
Declare the input array in CBLAS_?GEADD as const 2024-10-02 18:32:48 +02:00
Martin Kroeker f10d47c4bb
Merge pull request #4910 from martin-frbg/issue4908
fix placement of -fopenmp in the pkgconfig file
2024-10-01 17:49:12 +02:00
Martin Kroeker a1073f5eed
Merge pull request #4900 from XiWeiGu/la64_core_rename
LoongArch64: Rename core
2024-10-01 15:29:16 +02:00
Martin Kroeker fa77561396
add openmp option to pkgconfig template 2024-10-01 13:32:45 +02:00
Martin Kroeker 176107d23a
Add -fopenmp to cflags in pkgconfig file if set 2024-10-01 13:31:14 +02:00
Martin Kroeker 0228d36211
move -fopenmp to CFLAGS 2024-09-30 21:38:05 +02:00
gxw 7087b0a7d0 ARM64: Enable SMALL_MATRIX_OPT when compiling with CMake 2024-09-29 10:31:26 +08:00
gxw 30af9278dc LoongArch64: Enable cmake cross-compilation 2024-09-29 10:13:30 +08:00
gxw 48698b2b1d LoongArch64: Rename core
Use microarchitecture names instead of meaningless strings to name the cores;
the legacy core names are still retained.
1. Rename LOONGSONGENERIC to LA64_GENERIC
2. Rename LOONGSON3R5 to LA464
3. Rename LOONGSON2K1000 to LA264
2024-09-29 09:35:21 +08:00
Martin Kroeker 92f7a2dc3e
Merge pull request #4899 from martin-frbg/flangmtune
Strip any mtune option from FFLAGS if the compiler is flang-new
2024-09-19 14:15:06 +02:00
Martin Kroeker 969bb949b1
Strip any mtune option from FFLAGS if the compiler is flang-new 2024-09-19 11:10:28 +02:00
Martin Kroeker fca86e359c
Merge pull request #4887 from goplanid/develop
Small GEMM improvements for AArch64 with SVE
2024-09-16 11:17:19 +02:00
Martin Kroeker 60c1519e01
Merge pull request #4896 from martin-frbg/update_azure_mac_hpc
AzureCI: Update Intel oneAPI download for Mac to final version
2024-09-12 21:09:28 +02:00
Martin Kroeker c8313d9d80
Merge pull request #4895 from martin-frbg/update_homebrewjob
CI: Update nightly-homebrew workflow
2024-09-12 21:09:10 +02:00
Martin Kroeker b588e922a1
Update oneAPI download location for Mac to final 2024-09-12 18:13:46 +02:00
Martin Kroeker 4178905fa7
Update version of upload-artifacts following deprecation 2024-09-12 16:39:20 +02:00
Martin Kroeker 5f70e245a2
Merge pull request #4894 from martin-frbg/issue4893
Fix function definition in the f2c-converted ctest and remove suppression of gcc14 error
2024-09-12 15:09:54 +02:00
Martin Kroeker 383e0b133e
remove suppression of gcc14's incompatible pointer error 2024-09-11 22:21:09 +02:00
Martin Kroeker 869a169c57
Fix ZAXPYTEST prototype 2024-09-11 22:18:14 +02:00
Deeksha Goplani 4894c54055 Improve TN case with further unrolling 2024-09-02 22:22:49 +05:30
Martin Kroeker 485027563e
Merge pull request #4883 from ChipKerchner/fixSGEMMUnitTestZeroSize
Fix SBGEMM unit test to handle zero elements.
2024-08-17 11:47:26 +02:00
Chip Kerchner 89702e1f4a Fix zero element GEMV test. 2024-08-16 11:37:39 -05:00
Chip Kerchner 77f85c7c00 GEMV tests don't like zero elements. 2024-08-16 11:15:32 -05:00
Chip Kerchner 868aa857bc Change malloc zero to return one byte and update the SBGEMM test to again use sizes of zero. 2024-08-16 10:28:10 -05:00
Chip Kerchner b1802f4dc8 Fix unit test to start at 1 instead of 0 - since malloc zero bytes fails on some systems. 2024-08-16 09:51:37 -05:00
Martin Kroeker f61930eb11
Merge pull request #4882 from martin-frbg/issue4805-3
Restore the workaround in the POTRS utest as it is reportedly still needed on 3C6000/gcc14.2
2024-08-16 11:24:51 +02:00
Martin Kroeker dfba3f8841
restore the pragma as it is reportedly still needed on 3C6000/gcc14.2 2024-08-16 11:23:19 +02:00
Martin Kroeker 7129a64d87
Merge pull request #4881 from martin-frbg/issue4805-2
Use fld.d/fst.d in PROLOGUE/EPILOGUE in LOONGSON3R5 GEMM
2024-08-16 08:47:12 +02:00
Martin Kroeker 49080b631e
remove optimizer pragma again 2024-08-15 22:15:27 +02:00
Martin Kroeker e05d98d00a
expressly use fld.d/fst.d for floating point registers instead of LD/ST macros 2024-08-15 22:14:29 +02:00
Martin Kroeker 3ee9e9d8d0
Merge pull request #4879 from martin-frbg/issue4868-2
Ensure a memory buffer has been allocated for each thread before invoking it (take 2)
2024-08-15 22:06:54 +02:00
Martin Kroeker dd71df8fab
Merge pull request #4880 from ChipKerchner/betterPowerGEMVTail
[POWER] Vectorize SGEMV transpose reduce stage
2024-08-15 20:36:22 +02:00
Martin Kroeker a8d6b0219a
Merge pull request #4877 from XiWeiGu/fixed_undefined_blas_set_parameter
Fixed the undefined reference to blas_set_parameter
2024-08-15 15:35:26 +02:00
Martin Kroeker d24b3cf393
properly fix buffer allocation and assignment 2024-08-15 15:32:58 +02:00
Chip Kerchner a0aeba631d Merge branch 'develop' into betterPowerGEMVTail 2024-08-15 08:00:00 -05:00
Martin Kroeker eba8615c11
Merge pull request #4876 from martin-frbg/granite
Add autodetection support for Intel Granite Rapids as Sapphire Rapids
2024-08-15 13:50:54 +02:00
Martin Kroeker bc80e7f02d
Merge pull request #4878 from martin-frbg/cirrus-androidndk
Cirrus CI: fix installation of NDK in armv7 crossbuild
2024-08-15 13:50:09 +02:00
Martin Kroeker 94c9e0b7ad
Update ndk version number 2024-08-15 11:30:23 +02:00