Commit Graph

8574 Commits

Author SHA1 Message Date
Martin Kroeker 9afd0c8afd
Merge pull request #4814 from Mousius/gemv-proxy
Forward GEMM to GEMV when one argument is actually a vector
2024-07-31 23:18:01 +02:00
Martin Kroeker edbf093c98
Update zarch SCAL kernels to handle INF and NAN arguments (#4829)
* handle INF and NAN in input (for S/D only if DUMMY2 argument is set)
2024-07-31 19:45:15 +02:00
Chris Sidebottom ba2e989c67 Add accumulators to AArch64 GEMV Kernels
This helps to reduce values going missing as we accumulate.
2024-07-31 13:09:14 +01:00
Chris Sidebottom b26424c6a2 Allow opt into GEMM -> GEMV forwarding 2024-07-31 13:09:14 +01:00
Chris Sidebottom 90eb863d4b Re-add accidental removal 2024-07-31 13:09:14 +01:00
Chris Sidebottom 28b5334f22 Complete implementation of GEMV forwarding 2024-07-31 13:09:14 +01:00
Martin Kroeker 3db5dbc88e forward to GEMV when one argument is actually a vector 2024-07-31 13:09:14 +01:00
Martin Kroeker 136a4edc5f
Merge pull request #4830 from martin-frbg/jenk
Continue requesting ubuntu18 instead of latest on OSUOSL powerCI
2024-07-30 22:19:14 +02:00
Martin Kroeker 86c15f028b
Update Jenkinsfile.pwr 2024-07-30 21:21:34 +02:00
Martin Kroeker a13015b656
try requesting ubuntu22 instead of latest 2024-07-30 19:10:18 +02:00
Martin Kroeker d11e734002
Merge pull request #4827 from Mousius/a64fx-gcc11
Fix GCC11 check for A64FX target
2024-07-29 16:36:13 +02:00
Chris Sidebottom 54ce33e851 Fix GCC11 check for A64FX target 2024-07-29 15:28:59 +01:00
Martin Kroeker 6d071f1a1c
Merge pull request #4826 from Mousius/a64fx-fallback
Add fallback compile options for A64FX target
2024-07-29 15:33:43 +02:00
Chris Sidebottom 3ed226d3f8 Re-add ISCLANG filter 2024-07-29 11:32:59 +01:00
Chris Sidebottom 85ca003ae7 Add fallback compile options for A64FX target 2024-07-29 11:25:03 +01:00
Martin Kroeker 05bf35f296
Merge pull request #4822 from martin-frbg/issue4821
Fix c_check to tolerate a dashed suffix to the gcc version number
2024-07-27 20:28:06 +02:00
Martin Kroeker 175008caf8
harden against a dashed suffix to the gcc version number 2024-07-27 19:08:02 +02:00
Martin Kroeker 886acfc444
Merge pull request #4819 from martin-frbg/issue4776
Re-enable the SGESDD benchmark after the SCAL fixes
2024-07-26 16:57:35 +02:00
Martin Kroeker 4460d3ee7f
re-enable the sgesdd benchmark 2024-07-26 15:07:52 +02:00
Martin Kroeker 092986582f
Merge pull request #4818 from martin-frbg/docs_winbuild
[Docs] replace "Preview" in the MSVC vcvarsall path example
2024-07-26 14:57:53 +02:00
Martin Kroeker 25e148ec58
Merge pull request #4817 from martin-frbg/fix4807
Fix SCAL on x86 and RISCV_GENERIC
2024-07-26 14:56:44 +02:00
Martin Kroeker a090011fbf
just use numeric constants in dimensions 2024-07-26 12:56:12 +02:00
Martin Kroeker 7006492863
replace "Preview" in the MSVC vcvarsall path with "Community" 2024-07-26 12:49:57 +02:00
Martin Kroeker db5328e85b
make array dimensions constant 2024-07-26 12:45:39 +02:00
Martin Kroeker d9ae4609fb
remove C99 requirement 2024-07-26 11:15:33 +02:00
Martin Kroeker a875304eb0
fix inverted conditional for NAN handling 2024-07-26 09:50:20 +02:00
Martin Kroeker 24acdd6bbb
correct offset 2024-07-26 09:49:24 +02:00
Martin Kroeker fb7c53c5e5
Merge pull request #4807 from martin-frbg/scalfixes
[WIP]Make NAN handling in the SCAL kernels depend on the dummy2 parameter
2024-07-25 23:42:50 +02:00
Martin Kroeker 15c53dd2e0
Merge pull request #4794 from XiWeiGu/Fixed_Numpy_CI_Test
Try to fixed numpy ci test failures
2024-07-25 23:42:13 +02:00
Martin Kroeker a4e56e0452
Merge pull request #4806 from Mousius/small-gemm
Small GEMM for AArch64 with SVE
2024-07-25 21:50:04 +02:00
Martin Kroeker 949a7f9393
Merge pull request #4811 from yamazakimitsufumi/add_a64fx_to_dynamic_arch
Add A64FX to the list of CPUs supported by DYNAMIC_ARCH
2024-07-25 19:13:04 +02:00
yamazaki-mitsufumi 88caf02f62 Fix ambiguous error on Mac OS 2024-07-25 22:43:13 +09:00
Martin Kroeker b613754143
Update scal..c 2024-07-24 14:31:29 +02:00
Martin Kroeker 4140ac45d7
Merge pull request #4813 from martin-frbg/issue4812
Fix incompatible definitions of MAXLOC in f2c-converted LAPACK sources
2024-07-23 21:35:06 +02:00
Martin Kroeker 0096482f03
fix incompatible definitions of MAXLOC 2024-07-23 15:01:26 +02:00
Martin Kroeker ed82fd24fc
Merge pull request #4810 from martin-frbg/issue4805
Work around a gcc14.1 bug that breaks utest on Loongarch
2024-07-23 14:54:55 +02:00
yamazaki-mitsufumi 821ef34635 Add A64FX to the list of CPUs supported by DYNAMIC_ARCH 2024-07-23 20:44:39 +09:00
Martin Kroeker 29f3e759b9
work around a gcc14.1 bug observed on Loongarch 2024-07-23 11:20:48 +02:00
Martin Kroeker f5d04318e3
Merge branch 'OpenMathLib:develop' into scalfixes 2024-07-21 13:43:43 +02:00
Martin Kroeker 73f8866ffb
make NAN handling depend on DUMMY2 parameter 2024-07-21 13:42:47 +02:00
Martin Kroeker dfbc2348a8
fix NAN handling 2024-07-20 18:27:15 +02:00
Martin Kroeker c064319ecb
fix alpha=NAN case 2024-07-20 17:42:31 +02:00
Martin Kroeker c2ffd90e8c
make NAN handling depend on dummy2 parameter 2024-07-20 17:31:00 +02:00
Chris Sidebottom ea4ab3b310 Better header guard around bridge 2024-07-20 14:39:57 +01:00
Chris Sidebottom 7311d93016 Unroll TT further 2024-07-19 17:51:20 +01:00
Martin Kroeker a815594fd1
Merge pull request #4801 from markdryan/markdryan/riscv-dynamic-arch
Add autodetection for riscv64
2024-07-19 17:12:07 +02:00
Martin Kroeker dd6c33d34d
make NAN handling depend on dummy2 parameter 2024-07-19 16:14:55 +02:00
Martin Kroeker 5a845ef1f4
Merge pull request #4809 from penghongbo/reorder_gemm_gemvt
Change computational order in GEMV and GEMM Power6 kernel
2024-07-19 13:06:42 +02:00
Hong Bo Peng db98f8753f Try to fix LAPACK testing failures on P7.
1. Remove the FADD insn from the GEMV Transpose code.
2. Remove the FADD insn from GEMM and ZGEMM code.
3. Reorder the compution of the Imaginary part in ZGEMM code.
2024-07-19 02:08:19 -04:00
Chris Sidebottom a9edddb695 Unroll TN further 2024-07-18 20:04:15 +01:00