Martin Kroeker
b89fb9632f
Update Android NDK install path for M1/armv7 crossbuild
2024-10-10 10:19:11 +02:00
Martin Kroeker
e52d9b4cf1
Merge pull request #4928 from austinpagan/czgemm_in_c
CGEMM & ZGEMM using C code, Power only, P10 only.
2024-10-09 20:26:21 +02:00
Martin Kroeker
dbd83762f9
Merge pull request #4926 from NickelWenzel/fix_arm64_windows_and_uwp
fix: add missing NO_AFFINITY checks
2024-10-09 19:48:16 +02:00
Gordon Fossum
0b7fb5c791
CGEMM & ZGEMM using C code.
2024-10-09 09:42:23 -05:00
NickelWenzel
bee123e8e3
fix: add missing NO_AFFINITY checks
2024-10-09 16:36:40 +02:00
Martin Kroeker
7ac5b9011f
Merge pull request #4923 from martin-frbg/zen5
Add preliminary cpu autodetection for Zen5/5c
2024-10-09 16:18:47 +02:00
Martin Kroeker
2c3b87a082
Add preliminary cpu autodetection for Zen5/5c
2024-10-08 23:07:42 +02:00
Martin Kroeker
73c1882129
Merge pull request #4922 from martin-frbg/issue4904-2
Update names of Loongarch64 targets in cmake cross-building
2024-10-07 13:24:14 +02:00
Martin Kroeker
bc0691a556
Merge pull request #4920 from martin-frbg/issue4917
Fix potential inaccuracy in multithreaded level3 related to SWITCH_RATIO
2024-10-07 08:26:03 +02:00
Martin Kroeker
b0346e72f4
update names of loongarch64 targets for cross-compilation
2024-10-06 22:48:33 +02:00
Martin Kroeker
9c707dc6b9
Update dynamic arch list to new target scheme
2024-10-06 22:46:03 +02:00
Martin Kroeker
9783dd07ab
Rename KERNEL.LOONGSONGENERIC to KERNEL.LA64_GENERIC
2024-10-06 22:43:11 +02:00
Martin Kroeker
0dfe42d62a
Merge pull request #4919 from martin-frbg/issue4916-2
Handle inf/nan in ppc440 s/dscal
2024-10-06 22:29:28 +02:00
Martin Kroeker
8a1710dd0d
don't apply switch_ratio to tail of loop
2024-10-06 20:03:32 +02:00
Martin Kroeker
c9e92348a6
Handle inf/nan if dummy2 flag is set
2024-10-06 19:57:17 +02:00
Martin Kroeker
624e9d110e
Merge pull request #4916 from martin-frbg/issue4901
Fix SIGILL/SIGSEGV in PPCG4 SGEMM and fix NAN handling in PPCG4 SSCAL/DSCAL
2024-10-03 23:25:45 +02:00
Martin Kroeker
d714013ab9
change sgemm kernel to 4x4 as the 16x4 altivec goes out of bounds
2024-10-03 22:04:20 +02:00
Martin Kroeker
7c4f3638fd
switch PPCG4 SGEMM kernel to 4x4
2024-10-03 22:00:15 +02:00
Martin Kroeker
54afc24e4d
Merge pull request #4906 from XiWeiGu/arm64_cmake_small_matrix_opt
ARM64: Enable SMALL_MATRIX_OPT when compiling with CMake
2024-10-03 20:05:11 +02:00
Martin Kroeker
b4495a8fb8
Merge branch 'develop' into arm64_cmake_small_matrix_opt
2024-10-03 20:04:52 +02:00
Martin Kroeker
68eefe60b9
Merge pull request #4915 from martin-frbg/issue4907
Support LoongArch64 compilation with LLVM
2024-10-03 18:29:29 +02:00
Martin Kroeker
4f00f02567
Do not add -mabi flags for Loongson when the compiler is flang
2024-10-03 16:06:33 +02:00
Martin Kroeker
f817f26062
Add simpler EPILOGUE for clang
2024-10-03 16:01:10 +02:00
Martin Kroeker
a492181665
filter out Loongarch -mabi options for flang-new
2024-10-03 15:58:47 +02:00
Martin Kroeker
de421b7764
Merge pull request #4904 from XiWeiGu/la64_cross_cmake
LoongArch64: Enable cmake cross-compilation
2024-10-03 15:53:57 +02:00
Martin Kroeker
edaf5933c4
Merge pull request #4913 from martin-frbg/issue4912
Declare the input array in CBLAS_?GEADD as const in cblas.h
2024-10-02 23:37:15 +02:00
Martin Kroeker
71131406ae
Declare the input array in CBLAS_?GEADD as const
2024-10-02 18:32:48 +02:00
Martin Kroeker
f10d47c4bb
Merge pull request #4910 from martin-frbg/issue4908
fix placement of -fopenmp in the pkgconfig file
2024-10-01 17:49:12 +02:00
Martin Kroeker
a1073f5eed
Merge pull request #4900 from XiWeiGu/la64_core_rename
LoongArch64: Rename core
2024-10-01 15:29:16 +02:00
Martin Kroeker
fa77561396
add openmp option to pkgconfig template
2024-10-01 13:32:45 +02:00
Martin Kroeker
176107d23a
Add -fopenmp to cflags in pkgconfig file if set
2024-10-01 13:31:14 +02:00
Martin Kroeker
0228d36211
move -fopenmp to CFLAGS
2024-09-30 21:38:05 +02:00
gxw
7087b0a7d0
ARM64: Enable SMALL_MATRIX_OPT when compiling with CMake
2024-09-29 10:31:26 +08:00
gxw
30af9278dc
LoongArch64: Enable cmake cross-compilation
2024-09-29 10:13:30 +08:00
gxw
48698b2b1d
LoongArch64: Rename core
Use microarchitecture names instead of meaningless strings to name the cores;
the legacy core names are still retained.
1. Rename LOONGSONGENERIC to LA64_GENERIC
2. Rename LOONGSON3R5 to LA464
3. Rename LOONGSON2K1000 to LA264
2024-09-29 09:35:21 +08:00
Martin Kroeker
92f7a2dc3e
Merge pull request #4899 from martin-frbg/flangmtune
Strip any mtune option from FFLAGS if the compiler is flang-new
2024-09-19 14:15:06 +02:00
Martin Kroeker
969bb949b1
Strip any mtune option from FFLAGS if the compiler is flang-new
2024-09-19 11:10:28 +02:00
Martin Kroeker
fca86e359c
Merge pull request #4887 from goplanid/develop
Small GEMM improvements for AArch64 with SVE
2024-09-16 11:17:19 +02:00
Martin Kroeker
60c1519e01
Merge pull request #4896 from martin-frbg/update_azure_mac_hpc
AzureCI: Update Intel oneAPI download for Mac to final version
2024-09-12 21:09:28 +02:00
Martin Kroeker
c8313d9d80
Merge pull request #4895 from martin-frbg/update_homebrewjob
CI: Update nightly-homebrew workflow
2024-09-12 21:09:10 +02:00
Martin Kroeker
b588e922a1
Update oneAPI download location for Mac to final
2024-09-12 18:13:46 +02:00
Martin Kroeker
4178905fa7
Update version of upload-artifacts following deprecation
2024-09-12 16:39:20 +02:00
Martin Kroeker
5f70e245a2
Merge pull request #4894 from martin-frbg/issue4893
Fix function definition in the f2c-converted ctest and remove suppression of gcc14 error
2024-09-12 15:09:54 +02:00
Martin Kroeker
383e0b133e
remove suppression of gcc14's incompatible pointer error
2024-09-11 22:21:09 +02:00
Martin Kroeker
869a169c57
Fix ZAXPYTEST prototype
2024-09-11 22:18:14 +02:00
Deeksha Goplani
4894c54055
Improve TN case with further unrolling
2024-09-02 22:22:49 +05:30
Martin Kroeker
485027563e
Merge pull request #4883 from ChipKerchner/fixSGEMMUnitTestZeroSize
Fix SBGEMM unit test to handle zero elements.
2024-08-17 11:47:26 +02:00
Chip Kerchner
89702e1f4a
Fix zero element GEMV test.
2024-08-16 11:37:39 -05:00
Chip Kerchner
77f85c7c00
GEMV tests don't like zero elements.
2024-08-16 11:15:32 -05:00
Chip Kerchner
868aa857bc
Change malloc zero to return one byte and update the SBGEMM test to again use sizes of zero.
2024-08-16 10:28:10 -05:00