Commit Graph

7948 Commits

Author SHA1 Message Date
Dmitry Mikushin 8698f9e37f Adding basic support of benchmarks into CMake for single, double, single complex and double complex cases. Each benchmarking target has a suffix to identify the data type, for example ./benchmark_gemm3m_COMPLEX_DOUBLE is a gemm3m.c source compiled with COMPLEX and DOUBLE macros defined 2024-02-10 19:12:16 +01:00
Martin Kroeker b1ae777afb Merge pull request #4497 from sergei-lewis/dev/slewis/zaxpy: Fix axpy test hangs when n==0. Reenable zaxpy_vector kernel for C910V. 2024-02-09 16:22:00 +01:00
Sergei Lewis ff1523163f Fix axpy test hangs when n==0. Reenable zaxpy_vector kernel for C910V. 2024-02-09 12:59:14 +00:00
Martin Kroeker ba3bfe85ee Merge pull request #4495 from martin-frbg/update-gensymbol: Update gensymbol with recently added CBLAS interfaces and LAPACK/LAPACKE functions 2024-02-09 08:55:22 +01:00
Martin Kroeker 93872f4681 drop the ?laqz? symbols for now (not translatable by f2c) 2024-02-08 23:02:09 +01:00
Martin Kroeker 83bec51355 Update with recently added CBLAS interfaces and LAPACK/LAPACKE functions 2024-02-08 21:23:48 +01:00
Martin Kroeker 974f29c4e9 Merge pull request #4494 from ChipKerchner/fixPower10CPUID: Make sure CPU ID works for all POWER_10 conditions 2024-02-08 21:21:32 +01:00
Chip Kerchner d408ecedba Add environment variable to display coretype for dynamic arch. 2024-02-08 12:17:18 -06:00
Martin Kroeker a96a04ee61 Merge pull request #4493 from martin-frbg/issue4475-3: Fix incompatible pointer types in the declarations of C/ZAXPBY 2024-02-08 16:50:06 +01:00
Chip Kerchner ac6b4b7aa4 Make sure CPU ID works for all POWER_10 conditions 2024-02-08 08:56:30 -06:00
Martin Kroeker 500ac4de5e fix incompatible pointer types 2024-02-08 13:18:34 +01:00
Martin Kroeker b3fa16345d fix prototype for c/zaxpby 2024-02-08 13:15:34 +01:00
Martin Kroeker e9cfb7fd30 Merge pull request #4491 from martin-frbg/fixup-4488: fix sbgemm bfloat16 conversion errors introduced in PR 4488 2024-02-07 21:34:40 +01:00
Martin Kroeker e9f480111e fix sbgemm bfloat16 conversion errors introduced in PR 4488 2024-02-07 19:57:18 +01:00
Martin Kroeker 22b487b622 Merge pull request #4488 from martin-frbg/issue4475-2: Separate the interface for SBGEMMT from GEMMT 2024-02-07 18:40:35 +01:00
Martin Kroeker 818bf30628 Merge pull request #4490 from ChipKerchner/missingCPUIDsForAIX: Add missing CPU ID definitions for old versions of AIX. 2024-02-07 17:31:26 +01:00
Martin Kroeker 344763331a Merge pull request #4484 from martin-frbg/lapack981: Rescale input vector more often in C/ZLARFGP (Reference-LAPACK PR 981) 2024-02-07 15:22:48 +01:00
Chip Kerchner 08ce6b1c1c Add missing CPU ID definitions for old versions of AIX. 2024-02-07 07:54:06 -06:00
Martin Kroeker fb99fc2e6e fix type conversion warnings 2024-02-07 13:42:08 +01:00
Martin Kroeker 08e479f956 Merge pull request #4487 from ErnstPeng/feature-branch: Optimized zgemm kernel 8x4 LASX, 4x4 LSX and cgemm kernel 8x4 LSX for LoongArch 2024-02-07 13:19:04 +01:00
Martin Kroeker d4db6a9f16 Separate the interface for SBGEMMT from GEMMT due to differences in GEMV arguments 2024-02-06 22:23:47 +01:00
pengxu fe3da43b7d Optimized zgemm kernel 8*4 LASX, 4*4 LSX and cgemm kernel 8*4 LSX for LoongArch 2024-02-06 11:49:01 +08:00
Martin Kroeker e5d2725e5a Merge pull request #4185 from XiWeiGu/mips_enable_msa: MIPS: Enable MSA 2024-02-05 15:50:16 +01:00
Martin Kroeker 479e4af089 Rescale input vector more often to minimize relative error (Reference-LAPACK PR 981) 2024-02-05 15:35:24 +01:00
Martin Kroeker a4fde2c5ac Merge pull request #4451 from martin-frbg/overflow_reset: Reset "buffer management structure overflowed" state and free auxiliary struct on blas_shutdown 2024-02-05 07:27:04 +01:00
Martin Kroeker b537528feb Merge pull request #4480 from XiWeiGu/loongarch64-fixed-{s/d}amin-lsx: LoongArch64: Fixed {s/d}amin LSX optimization 2024-02-05 06:24:50 +01:00
Martin Kroeker bc7154a80d Merge pull request #4482 from martin-frbg/issue4476: Fix missing NO_AVX2 fallback for SapphireRapids in DYNAMIC_ARCH 2024-02-04 23:13:10 +01:00
Martin Kroeker 6d8a273cca Handle zero increment(s) in C910V ?AXPBY (#4483): Handle zero increment(s) 2024-02-04 22:07:51 +01:00
Martin Kroeker dbcf4f8b7d Merge pull request #4479 from XiWeiGu/loongarch-opt-axpby: Loongarch opt axpby 2024-02-04 19:50:28 +01:00
Martin Kroeker dc802dd637 Merge pull request #4474 from ChipKerchner/sgemmIncopy_PR: Vectorize in-copy packing/copying for SGEMM - up to 4X faster. 2024-02-04 18:51:09 +01:00
Martin Kroeker e307675222 Merge pull request #4478 from martin-frbg/issue4475: Fix incompatible pointer type in BFLOAT16 GEMMT 2024-02-04 16:36:40 +01:00
Martin Kroeker 033168cdf0 Merge pull request #4481 from martin-frbg/cpuid_riscv: Update lowercase cpunames for RISC-V 2024-02-04 14:09:44 +01:00
Martin Kroeker a29f91ae9a Merge pull request #4471 from ChipKerchner/fixMakefileAIXOpenMP: Fix Makefiles to support OpenMP on AIX for xlc (clang) with xlf. 2024-02-04 12:13:26 +01:00
Martin Kroeker e61d96303d Fix missing NO_AVX2 fallback for SapphireRapids 2024-02-04 10:05:20 +01:00
Martin Kroeker d02c61e82e Update lowercase cpunames for RISC-V 2024-02-04 10:01:27 +01:00
Martin Kroeker 7228c708d7 Merge pull request #4461 from markdryan/cpuid_riscv64_crash: Fix two issues with cpuid_riscv64.c 2024-02-04 09:57:00 +01:00
gxw adde725321 LoongArch64: Fixed {s/d}amin LSX optimization 2024-02-04 14:44:47 +08:00
gxw 7bc93d95a1 LoongArch64: Opt {c/z}axpby 2024-02-04 11:23:31 +08:00
gxw 1e1f487dc7 LoongArch64: Fixed {s/d}axpby 2024-02-04 09:41:37 +08:00
gxw 3597827c93 utest: add axpby 2024-02-04 09:41:30 +08:00
Martin Kroeker 68d354814f Fix incompatible pointer type in BFLOAT16 mode 2024-02-04 01:14:22 +01:00
Martin Kroeker 3848d4e9f4 Merge pull request #4477 from martin-frbg/c910caxpy: Temporarily disable the CAXPY/ZAXPY kernels for C910V to workaround a CI hang 2024-02-04 01:10:57 +01:00
Martin Kroeker 4d8dee508c temporarily disable the CAXPY/ZAXPY kernels 2024-02-04 01:05:03 +01:00
Martin Kroeker 27816fa929 Merge pull request #4472 from sergei-lewis/dev/slewis/merge-from-riscv: Merge risc-v branch to develop 2024-02-03 20:56:11 +01:00
Chip Kerchner 2bb7ea64a1 Only vectorize 64-bit version for Power8. 2024-02-01 08:11:43 -06:00
Sergei Lewis 3ffd6868d7 Merge branch 'develop' into dev/slewis/merge-from-riscv 2024-02-01 11:29:41 +00:00
Sergei Lewis a3b0ef6596 Restore riscv64 fixes from develop branch: dot product double precision accumulation, zscal NaN handling 2024-02-01 10:32:00 +00:00
Martin Kroeker ec74dcd213 Merge pull request #4470 from martin-frbg/issue4455: Add CBLAS interfaces for BLAS extensions ?AMIN/?AMAX and C/ZAXPYC 2024-01-31 23:51:01 +01:00
Chip Kerchner 61c8e19f95 Fix Makefile to support OpenMP on AIX for xlc (clang) with xlf. 2024-01-31 15:27:50 -06:00
Martin Kroeker 47bd064763 Fix names in build rules 2024-01-31 20:49:43 +01:00