Martin Kroeker
2787c9f8e4
Disable GEMM3M for generic targets (not implemented)
2024-06-06 14:39:50 +02:00
Martin Kroeker
4e9144b39f
Update .cirrus.yml (#4735)
...
* Update versions (and paths) of XCode, and update FreeBSD version
2024-06-05 23:43:52 +02:00
Martin Kroeker
0c2ac76a63
Merge pull request #4734 from XiWeiGu/loongarch64_small_matrix
...
LoongArch: DGEMM small matrix opt
2024-06-05 19:13:17 +02:00
Martin Kroeker
913be34bf0
Merge pull request #4733 from martin-frbg/issue4719
...
Drop the -static Fortran flag from generic RISCV builds as it breaks OpenMP
2024-06-05 00:11:09 +02:00
gxw
8ab2e9ec65
LoongArch: DGEMM small matrix opt
2024-06-04 16:52:45 +08:00
Martin Kroeker
df87aeb5a2
Drop the -static Fortran flag from generic builds as it breaks OpenMP
2024-06-04 09:49:18 +02:00
Martin Kroeker
3a3ff1ba5e
Merge pull request #4732 from martin-frbg/issue4731
...
fix conflicting types for cblas_sbgemm_batch
2024-06-03 21:18:28 +02:00
Martin Kroeker
db9f7bc552
fix float array types to include bfloat16
2024-06-03 00:22:16 +02:00
Martin Kroeker
a9fae32a33
Merge pull request #4730 from jake-arkinstall/develop
...
Updated CONTRIBUTORS.md
2024-06-01 13:38:04 +02:00
Jake Arkinstall
44004178aa
Updated CONTRIBUTORS.md
...
As requested on X (https://x.com/KroekerMartin/status/1755218919290278185)
2024-06-01 11:22:26 +01:00
Martin Kroeker
83bc8d5dd8
Merge pull request #4712 from RajalakshmiSR/zscalp10
...
POWER: Fix issues in zscal to address lapack failures
2024-06-01 11:22:08 +02:00
Martin Kroeker
56bd57ca99
Merge pull request #4720 from martin-frbg/issue3039
...
Resurrect and complete cblas_?gemm_batch
2024-06-01 00:34:32 +02:00
Martin Kroeker
6b564d53fd
Merge pull request #4727 from martin-frbg/issue4726
...
Fix another corner case of infinity handling in x86_64 ZSCAL
2024-05-31 19:44:33 +02:00
Martin Kroeker
db070a9223
add gemm_batch drivers
2024-05-31 18:29:27 +02:00
Martin Kroeker
076766df4e
Update CMakeLists.txt
2024-05-31 18:23:18 +02:00
Martin Kroeker
8c05765a5a
fix other corner cases where x=INF
2024-05-31 18:06:36 +02:00
Martin Kroeker
516743f7dc
fix other instances of mishandling INF
2024-05-31 16:02:12 +02:00
Martin Kroeker
9ff4e9714e
additional fixes for handling INF arguments
2024-05-31 15:44:07 +02:00
Martin Kroeker
ce130f11d2
Update zscal.c
2024-05-31 15:09:03 +02:00
Martin Kroeker
ab13cfef93
more fixes for infinite x
2024-05-31 14:34:49 +02:00
Martin Kroeker
a16f8249ba
add tests with the imaginary part of the array infinite
2024-05-31 01:08:17 +02:00
Martin Kroeker
ad2b5c67c8
fix another corner case involving infinity
2024-05-31 01:06:58 +02:00
Martin Kroeker
0d007adb18
fix clang_cl-flang job to use flang-new after the llvm update
2024-05-30 23:30:16 +02:00
Martin Kroeker
b9a1c9a06c
Merge pull request #4725 from Neumann-A/patch-1
...
Fix CMake warning
2024-05-30 21:32:32 +02:00
Martin Kroeker
ff6670cb83
don't generate non-cblas files for gemm_batch
2024-05-30 18:26:02 +02:00
Alexander Neumann
dd4505c5dd
Fix CMake warning
2024-05-30 09:04:23 +02:00
Martin Kroeker
362a063396
remove return value
2024-05-29 23:16:58 +02:00
Martin Kroeker
d0794f88dc
add gemm_batch driver
2024-05-29 15:49:20 +02:00
Martin Kroeker
833a8880c6
add cblas_?gemm_batch
2024-05-29 15:47:50 +02:00
Martin Kroeker
89c7bbcba6
add cblas_?gemm_batch
2024-05-29 15:47:02 +02:00
Martin Kroeker
103637887e
add cblas_?gemm_batch
2024-05-29 15:46:10 +02:00
Martin Kroeker
0073affe63
Merge pull request #4693 from goplanid/locks-improvement
...
Lock Management Improvements for Memory Allocation Efficiency
2024-05-26 23:14:52 +02:00
Martin Kroeker
834e633d79
Merge pull request #4718 from martin-frbg/issue4713
...
Override Intel icx's default fp-model to ensure correct handling of NaNs
2024-05-26 16:38:18 +02:00
Martin Kroeker
3833190454
Merge pull request #4716 from martin-frbg/lapack1018
...
Fix a potential bounds error in ?UNHR_COL/?ORHR_COL (Reference-LAPACK PR 1018)
2024-05-26 14:01:31 +02:00
Martin Kroeker
cf7e668fe8
Merge pull request #4709 from martin-frbg/docsbuildbranch
...
Don't try to deploy docs when PR-building in a fork
2024-05-26 14:01:05 +02:00
Martin Kroeker
8b4996a2d5
Override icx's default fast math mode to ensure correct NaN handling
2024-05-26 13:16:03 +02:00
Martin Kroeker
616cc28d82
Override icx's default fast math mode to ensure correct NaN handling
2024-05-26 12:59:11 +02:00
Martin Kroeker
772116879d
Merge pull request #4717 from bartoldeman/zscal-float-inf-fix
...
Replace use of FLT_MAX in x86_64 zscal.c by isinf()
2024-05-26 12:26:41 +02:00
Bart Oldeman
62f7b244ff
Replace use of FLT_MAX in x86_64 zscal.c by isinf()
...
Commit def4996 fixed issues with inf and nan values in zscal,
but used FLT_MAX, where DBL_MAX or isinf() is more appropriate,
as FLT_MAX is for single precision only.
Using FLT_MAX caused test case failures in the LAPACK tests.
isinf() is consistent with the later fix 969601a1.
2024-05-24 17:20:27 +00:00
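The FLT_MAX versus isinf() point made in commit 62f7b244ff can be illustrated with a minimal C sketch. This is not the code from zscal.c: the helper names `exceeds_flt_max` and `is_infinite` are hypothetical, and the exact form of the original comparison is an assumption. The sketch only shows why a single-precision limit misclassifies large but finite double values.

```c
#include <float.h>
#include <math.h>

/* Hypothetical helper, assumed shape of the flawed test:
 * treats any double whose magnitude exceeds FLT_MAX (about 3.4e38)
 * as "infinite", even though doubles stay finite up to DBL_MAX. */
static int exceeds_flt_max(double x) {
    return fabs(x) > FLT_MAX;
}

/* Hypothetical helper: asks the C library directly whether x is +/-inf. */
static int is_infinite(double x) {
    return isinf(x) != 0;
}
```

Under these assumptions, a finite double such as 1.0e100 is flagged by `exceeds_flt_max` but not by `is_infinite`, while INFINITY is flagged by both. That mismatch is the kind of misclassification the commit message describes for double-precision data.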
Martin Kroeker
7ebbe3cc72
Fix potential bounds error (Reference-LAPACK PR 1018)
2024-05-23 23:12:19 +02:00
Martin Kroeker
791e015024
Fix potential bounds error (Reference-LAPACK PR 1018)
2024-05-23 23:11:14 +02:00
Martin Kroeker
4dd715d220
Fix potential bounds error (Reference-LAPACK PR 1018)
2024-05-23 23:09:55 +02:00
Martin Kroeker
e2c1a1e269
Fix potential bounds error (Reference-LAPACK PR 1018)
2024-05-23 23:08:27 +02:00
Rajalakshmi Srinivasaraghavan
e112191b54
POWER: Fix issues in zscal to address lapack failures
...
This patch fixes the following LAPACK failures with the clang compiler on POWER:
zed.out: ZVX: 18 out of 5190 tests failed to pass the threshold
zgd.out: ZGV drivers: 25 out of 1092 tests failed to pass the threshold
zgd.out: ZGV drivers: 6 out of 1092 tests failed to pass the threshold
2024-05-22 08:00:06 -05:00
Martin Kroeker
172d91846f
Don't try to deploy docs in a fork
2024-05-20 22:53:43 +02:00
Martin Kroeker
700ea74a37
Merge pull request #4705 from martin-frbg/issue4703
...
Fix INTERFACE64 builds on Loongarch64
2024-05-18 21:38:22 +02:00
Martin Kroeker
aa259b141d
Merge pull request #4704 from amritahs-ibm/saxpy_perf_fix
...
Fix SAXPY regression when compiled with the OpenXL compiler.
2024-05-18 19:11:25 +02:00
Martin Kroeker
25b34e67f9
Merge pull request #4678 from ev-br/codspeed
...
WIP: add codspeed benchmarks [skip cirrus]
2024-05-18 16:51:02 +02:00
Martin Kroeker
6494f432df
Fix INTERFACE64 builds on Loongarch64
2024-05-18 16:49:03 +02:00
Evgeni Burovski
81cf0db047
DOC: add a readme for benchmarks/pybench
2024-05-18 15:30:00 +03:00