Martin Kroeker
|
f40819476c
|
mention RISCV64 as a permitted architecture for DYNAMIC_ARCH
|
2024-08-03 23:54:35 +02:00 |
Martin Kroeker
|
7af3c552d3
|
use TARGET rather than CORE from Makefile.conf_last to fill in pkgconfig
|
2024-08-03 23:51:57 +02:00 |
Martin Kroeker
|
2c2b6bcf96
|
Merge pull request #4831 from martin-frbg/gemmforward
Enable forwarding from GEMM to GEMV for RISCV and PPC in addition to ARM64
|
2024-08-03 18:52:11 +02:00 |
Martin Kroeker
|
6468dc1142
|
restore the coarse locking of the pre-4359 version
|
2024-08-02 16:39:47 +02:00 |
Martin Kroeker
|
abff4baa4d
|
re-enable queue struct members related to locking
|
2024-08-02 16:37:01 +02:00 |
Chip Kerchner
|
1a7b8c650d
|
Merge branch 'develop' into betterPowerGEMVTail
|
2024-08-01 14:59:12 -05:00 |
Chip Kerchner
|
e2334d0218
|
Remove GEMV hack.
|
2024-08-01 14:44:40 -05:00 |
Martin Kroeker
|
42d8865234
|
fix typo
|
2024-08-01 12:24:45 +02:00 |
Martin Kroeker
|
9eecd0d33b
|
enable GEMM/GEMV forwarding for riscv and ppc
|
2024-07-31 23:29:12 +02:00 |
Martin Kroeker
|
fcb88b9d52
|
enable GEMM/GEMV forwarding for riscv and ppc
|
2024-07-31 23:21:35 +02:00 |
Martin Kroeker
|
9afd0c8afd
|
Merge pull request #4814 from Mousius/gemv-proxy
Forward GEMM to GEMV when one argument is actually a vector
|
2024-07-31 23:18:01 +02:00 |
Martin Kroeker
|
edbf093c98
|
Update zarch SCAL kernels to handle INF and NAN arguments (#4829)
* handle INF and NAN in input (for S/D only if DUMMY2 argument is set)
|
2024-07-31 19:45:15 +02:00 |
Chris Sidebottom
|
ba2e989c67
|
Add accumulators to AArch64 GEMV Kernels
This helps to reduce values going missing as we accumulate.
|
2024-07-31 13:09:14 +01:00 |
Chris Sidebottom
|
b26424c6a2
|
Allow opt into GEMM -> GEMV forwarding
|
2024-07-31 13:09:14 +01:00 |
Chris Sidebottom
|
90eb863d4b
|
Re-add accidental removal
|
2024-07-31 13:09:14 +01:00 |
Chris Sidebottom
|
28b5334f22
|
Complete implementation of GEMV forwarding
|
2024-07-31 13:09:14 +01:00 |
Martin Kroeker
|
3db5dbc88e
|
forward to GEMV when one argument is actually a vector
|
2024-07-31 13:09:14 +01:00 |
Martin Kroeker
|
136a4edc5f
|
Merge pull request #4830 from martin-frbg/jenk
Continue requesting ubuntu18 instead of latest on OSUOSL powerCI
|
2024-07-30 22:19:14 +02:00 |
Martin Kroeker
|
86c15f028b
|
Update Jenkinsfile.pwr
|
2024-07-30 21:21:34 +02:00 |
Martin Kroeker
|
a13015b656
|
try requesting ubuntu22 instead of latest
|
2024-07-30 19:10:18 +02:00 |
Martin Kroeker
|
d11e734002
|
Merge pull request #4827 from Mousius/a64fx-gcc11
Fix GCC11 check for A64FX target
|
2024-07-29 16:36:13 +02:00 |
Chris Sidebottom
|
54ce33e851
|
Fix GCC11 check for A64FX target
|
2024-07-29 15:28:59 +01:00 |
Martin Kroeker
|
6d071f1a1c
|
Merge pull request #4826 from Mousius/a64fx-fallback
Add fallback compile options for A64FX target
|
2024-07-29 15:33:43 +02:00 |
Chris Sidebottom
|
3ed226d3f8
|
Re-add ISCLANG filter
|
2024-07-29 11:32:59 +01:00 |
Chris Sidebottom
|
85ca003ae7
|
Add fallback compile options for A64FX target
|
2024-07-29 11:25:03 +01:00 |
Martin Kroeker
|
05bf35f296
|
Merge pull request #4822 from martin-frbg/issue4821
Fix c_check to tolerate a dashed suffix to the gcc version number
|
2024-07-27 20:28:06 +02:00 |
Martin Kroeker
|
175008caf8
|
harden against a dashed suffix to the gcc version number
|
2024-07-27 19:08:02 +02:00 |
Martin Kroeker
|
886acfc444
|
Merge pull request #4819 from martin-frbg/issue4776
Re-enable the SGESDD benchmark after the SCAL fixes
|
2024-07-26 16:57:35 +02:00 |
Martin Kroeker
|
4460d3ee7f
|
re-enable the sgesdd benchmark
|
2024-07-26 15:07:52 +02:00 |
Martin Kroeker
|
092986582f
|
Merge pull request #4818 from martin-frbg/docs_winbuild
[Docs] replace "Preview" in the MSVC vcvarsall path example
|
2024-07-26 14:57:53 +02:00 |
Martin Kroeker
|
25e148ec58
|
Merge pull request #4817 from martin-frbg/fix4807
Fix SCAL on x86 and RISCV_GENERIC
|
2024-07-26 14:56:44 +02:00 |
Martin Kroeker
|
a090011fbf
|
just use numeric constants in dimensions
|
2024-07-26 12:56:12 +02:00 |
Martin Kroeker
|
7006492863
|
replace "Preview" in the MSVC vcvarsall path with "Community"
|
2024-07-26 12:49:57 +02:00 |
Martin Kroeker
|
db5328e85b
|
make array dimensions constant
|
2024-07-26 12:45:39 +02:00 |
Martin Kroeker
|
d9ae4609fb
|
remove C99 requirement
|
2024-07-26 11:15:33 +02:00 |
Martin Kroeker
|
a875304eb0
|
fix inverted conditional for NAN handling
|
2024-07-26 09:50:20 +02:00 |
Martin Kroeker
|
24acdd6bbb
|
correct offset
|
2024-07-26 09:49:24 +02:00 |
Martin Kroeker
|
fb7c53c5e5
|
Merge pull request #4807 from martin-frbg/scalfixes
[WIP] Make NAN handling in the SCAL kernels depend on the dummy2 parameter
|
2024-07-25 23:42:50 +02:00 |
Martin Kroeker
|
15c53dd2e0
|
Merge pull request #4794 from XiWeiGu/Fixed_Numpy_CI_Test
Try to fix numpy CI test failures
|
2024-07-25 23:42:13 +02:00 |
Martin Kroeker
|
a4e56e0452
|
Merge pull request #4806 from Mousius/small-gemm
Small GEMM for AArch64 with SVE
|
2024-07-25 21:50:04 +02:00 |
Martin Kroeker
|
949a7f9393
|
Merge pull request #4811 from yamazakimitsufumi/add_a64fx_to_dynamic_arch
Add A64FX to the list of CPUs supported by DYNAMIC_ARCH
|
2024-07-25 19:13:04 +02:00 |
yamazaki-mitsufumi
|
88caf02f62
|
Fix ambiguous error on Mac OS
|
2024-07-25 22:43:13 +09:00 |
Martin Kroeker
|
b613754143
|
Update scal..c
|
2024-07-24 14:31:29 +02:00 |
Martin Kroeker
|
4140ac45d7
|
Merge pull request #4813 from martin-frbg/issue4812
Fix incompatible definitions of MAXLOC in f2c-converted LAPACK sources
|
2024-07-23 21:35:06 +02:00 |
Martin Kroeker
|
0096482f03
|
fix incompatible definitions of MAXLOC
|
2024-07-23 15:01:26 +02:00 |
Martin Kroeker
|
ed82fd24fc
|
Merge pull request #4810 from martin-frbg/issue4805
Work around a gcc14.1 bug that breaks utest on Loongarch
|
2024-07-23 14:54:55 +02:00 |
yamazaki-mitsufumi
|
821ef34635
|
Add A64FX to the list of CPUs supported by DYNAMIC_ARCH
|
2024-07-23 20:44:39 +09:00 |
Martin Kroeker
|
29f3e759b9
|
work around a gcc14.1 bug observed on Loongarch
|
2024-07-23 11:20:48 +02:00 |
Martin Kroeker
|
f5d04318e3
|
Merge branch 'OpenMathLib:develop' into scalfixes
|
2024-07-21 13:43:43 +02:00 |
Martin Kroeker
|
73f8866ffb
|
make NAN handling depend on DUMMY2 parameter
|
2024-07-21 13:42:47 +02:00 |