1953 Commits

Author SHA1 Message Date
Martin Kroeker
889c5d026a Merge pull request #4456 from kseniyazaytseva/riscv-rvv10
Fix BLAS and LAPACK tests for RVV 1.0 target, update to 0.12.0 intrincics
2024-01-26 13:31:09 +01:00
Martin Kroeker
4e2a32ff51 Merge pull request #4454 from kseniyazaytseva/riscv-rvv07
Fix BLAS and LAPACK tests for C910V and RISCV64_ZVL256B targets
2024-01-26 11:40:46 +01:00
Martin Kroeker
a21b2fa5e4 Merge pull request #4452 from kseniyazaytseva/riscv-generic
Fix BLAS, BLAS-like functions and Generic RISC-V kernels
2024-01-24 17:52:25 +01:00
Andrey Sokolov
9c49a81d54 Resolve conflicts 2024-01-23 19:08:53 +03:00
kseniyazaytseva
e1afb23811 Fix BLAS and LAPACK tests for C910V and RISCV64_ZVL256B targets
* Fixed bugs in dgemm, [a]min\max, asum kernels
* Added zero checks for BLAS kernels
* Added dsdot implementation for RVV 0.7.1
* Fixed bugs in _vector files for C910V and RISCV64_ZVL256B targets
* Added additional definitions for RISCV64_ZVL256B target
2024-01-23 19:01:31 +03:00
Octavian Maghiar
deecfb1a39 Merge branch 'risc-v' into img-riscv64-zvl128b 2024-01-19 12:26:38 +00:00
kseniyazaytseva
5222b5fc18 Added axpby kernels for GENERIC RISC-V target 2024-01-18 23:22:26 +03:00
kseniyazaytseva
ff41cf5c49 Fix BLAS, BLAS-like functions and Generic RISC-V kernels
* Fixed gemmt, imatcopy, zimatcopy_cnc functions
* Fixed cblas_cscal testing in ctest
* Removed rotmg unreacheble code
* Added zero size checks
2024-01-18 23:19:52 +03:00
kseniyazaytseva
b193ea3d7b Fix BLAS and LAPACK tests for RVV 1.0 target, update to 0.12.0 intrincics
* Update intrincics API to 0.12.0 version (Stride Segment Loads/Stores)
* Fixed nrm2, axpby, ncopy, zgemv and scal kernels
* Added zero size checks
2024-01-18 22:14:32 +03:00
Martin Kroeker
88e994116c Merge pull request #4354 from imaginationtech/img-rvv-kernel-generator
[RISC-V] Improve RVV kernel generator LMUL usage
2024-01-17 15:19:37 +01:00
Sergei Lewis
9edb805e64 fix builds with t-head toolchains that use old versions of the intrinsics spec 2024-01-16 14:33:08 +00:00
Octavian Maghiar
4a12cf53ec [RISC-V] Improve RVV kernel generator LMUL usage
The RVV kernel generation script uses the provided LMUL to increase the number of accumulator registers.
Since the effect of the LMUL is to group together the vector registers into larger ones, it actually should be used as a multiplier in the calculation of vlenmax.
At the moment, no matter what LMUL is provided, the generated kernels would only set the maximum number of vector elements equal to VLEN/SEW.
Commit changes the use of LMUL to properly adjust vlenmax. Note that an increase in LMUL results in a decrease in the number of effective vector registers.
2023-12-04 11:13:35 +00:00
Octavian Maghiar
e4586e81b8 [RISC-V] Add RISC-V Vector 128-bit target
Current RVV x280 target depends on vlen=512-bits for Level 3 operations.
Commit adds generic target that supports vlen=128-bits.
New target uses the same scalable kernels as x280 for Level 1&2 operations, and autogenerated kernels for Level 3 operations.
Functional correctness of Level 3 operations tested on vlen=128-bits using QEMU v8.1.1 for ctests and BLAS-Tester.
2023-12-04 11:02:18 +00:00
Octavian Maghiar
826a9d5fa4 Adds tail undisturbed for RVV Level 2 operations
During the last iteration of some RVV operations, accumulators can get overwritten when VL < VLMAX and tail policy is agnostic.
Commit changes intrinsics tail policy to undistrubed.
2023-07-25 11:36:23 +01:00
Octavian Maghiar
8df0289db6 Adds tail undisturbed for RVV Level 1 operations
During the last iteration of some RVV operations, accumulators can get overwritten when VL < VLMAX and tail policy is agnostic.
Commit changes intrinsics tail policy to undistrubed.
2023-07-20 15:28:35 +01:00
Octavian Maghiar
1e4a3a2b5e Fixes RVV masked intrinsics for izamax/izamin kernels 2023-07-12 12:55:50 +01:00
Octavian Maghiar
e1958eb705 Fixes RVV masked intrinsics for iamax/iamin/imax/imin kernels
Changes masked intrinsics from _m to _mu and reintroduces maskedoff argument.
2023-07-05 11:34:00 +01:00
ZhengSh
2a8bc38cdc Merge branch 'xianyi:risc-v' into risc-v 2023-06-09 20:01:03 +08:00
Heller Zheng
0954746380 remove argument unused during compilation.
fix wrong vr = VFMVVF_FLOAT(0, vl);
2023-06-04 20:06:58 -07:00
sh-zheng
d3bf5a5401 Combine two reduction operations of zhe/symv into one, with tail undisturbed setted. 2023-05-22 22:39:45 +08:00
sh-zheng
18d7afe69d Add rvv support for zsymv and active rvv support for zhemv 2023-05-20 01:19:44 +08:00
Heller Zheng
1374a2d08b This PR adapts latest spec changes
Add prefix (_riscv) for all riscv intrinsics
Update some intrinsics' parameter, like vfredxxxx, vmerge
2023-03-19 23:59:03 -07:00
Zhang Xianyi
19f17c8bc6 Merge pull request #3893 from HellerZheng/develop
add riscv level3 C,Z kernel functions.
2023-03-15 10:17:13 +08:00
Sergei Lewis
9b61be4545 factoring riscv64/dot.c fix into separate PR as requested 2023-03-01 17:40:42 +00:00
Sergei Lewis
2406958629 * update intrinsics to match latest spec at https://github.com/riscv-non-isa/rvv-intrinsic-doc (in particular, __riscv_ prefixes for rvv intrinsics)
* fix multiple numerical stability and corner case issues
* add a script to generate arbitrary gemm kernel shapes
* add a generic zvl256b target to demonstrate large gemm kernel unrolls
2023-02-24 10:45:03 +00:00
Heller Zheng
63cf4d0166 add riscv level3 C,Z kernel functions. 2023-02-01 19:13:44 -08:00
Xianyi Zhang
c19dff0a31 Fix T-Head RVV intrinsic API changes. 2023-01-25 19:33:32 +08:00
Xianyi Zhang
e5313f53d5 Merge branch 'develop' of https://github.com/HellerZheng/OpenBLAS_riscv_x280 into HellerZheng-develop 2022-12-03 12:00:52 +08:00
Chris Sidebottom
eea006a688 Wrap SVE header with __has_include check 2022-12-01 12:07:55 +00:00
Chris Sidebottom
fd4f52c797 Add SVE implementation for sdot/ddot
This adds an SVE implementation to sdot/ddot when available, falling back to the previous Advanced SIMD kernel where there's no SVE implementation for the kernel.

All the targets were essentially treating `dot_thunderx2t99.c` as the Advanced SIMD implementation so I've renamed it to better fit with the feature detection.
2022-12-01 12:07:50 +00:00
Chris Sidebottom
4f7b77e08a Remove unnecessary instructions from Advanced SIMD dot
The existing kernel was issuing extra instructions to organise the arguments into the same registers they would usually be in and similarly to put the result into the appropriate register.

This has an impact on smaller sized dots and seemed like a quick fix
2022-11-25 16:19:03 +00:00
Heller Zheng
3918d8504e nrm2 simple optimization 2022-11-21 19:06:07 -08:00
HellerZheng
943372bdf5 Merge branch 'develop' into develop 2022-11-18 10:12:46 +08:00
Martin Kroeker
f73cfb7e2c change line endings from CRLF to LF 2022-11-17 09:39:56 +01:00
Martin Kroeker
1688c7da43 change line endings from CRLF to LF 2022-11-16 22:24:01 +01:00
Heller Zheng
5d0d1c5551 Remove redundant files 2022-11-15 18:22:21 -08:00
Heller Zheng
bef47917bd Initial version for riscv sifive x280 2022-11-15 00:06:25 -08:00
Bart Oldeman
6c1043eb41 Add [cz]scal microkernels for SKYLAKEX
These are as similar to dscal_microk_skylakex-2.c as possible
for consistency.

Note that before this change SKYLAKEX+ uses generic C functions for
cscal/zscal via commit 2271c350 from #2610 (which is masked by
commit 086d87a30). However now #3799 disables FMAs (in turn enabled
by `-march=skylake-avx512`) in the plain C code which fixes excessive
LAPACK test failures more nicely.
2022-11-09 08:57:03 -05:00
Martin Kroeker
c9d78dc3b2 Remove excess initializer (leftover from rework of PR 3793) 2022-10-31 16:57:03 +01:00
Martin Kroeker
65338a9493 Merge pull request #3799 from bartoldeman/cscal-zscal-no-fma
x86_64: prevent GCC and Clang from generating FMAs in cscal/zscal.
2022-10-30 18:56:10 +01:00
Honglin Zhu
79066b6bf3 Change file name to match the norm and delete useless code. 2022-10-28 17:09:39 +08:00
Bart Oldeman
e7e3aa2948 x86_64: prevent GCC and Clang from generating FMAs in cscal/zscal.
If e.g. -march=haswell is set in CFLAGS, GCC generates FMAs by default, which
is inconsistent with the microkernels, none of which use FMAs. These
inconsistencies cause a few failures in the LAPACK testcases, where
eigenvalue results with/without eigenvectors are compared.

Moreover using FMAs for multiplication of complex numbers can give surprising
results, see 22aa81f for more information.

This uses the same syntax as used in 22aa81f for zarch (s390x).
2022-10-27 18:16:43 -04:00
Honglin Zhu
4989e039a5 Define SBGEMM_ALIGN_K for DYNAMIC_ARCH build 2022-10-27 14:10:26 +08:00
Honglin Zhu
843e9fd0b9 Fix typo error 2022-10-26 17:06:33 +08:00
Honglin Zhu
b00d5b9746 New sbgemm implementation for Neoverse N2
1. Use UZP instructions but not gather load and scatter store instructions to get lower latency.
    2. Padding k to a power of 4.
2022-10-26 15:09:41 +08:00
Martin Kroeker
f6f35a4288 fix copyobj declarations to work with DYNAMIC_ARCH 2022-09-29 08:47:14 +02:00
Martin Kroeker
b1d69fb3ac Add MIPS64_GENERIC as a copy of GENERIC 2022-09-17 23:52:32 +02:00
gxw
edea1bcfaf MIPS64: Fixed failed utest dsdot:dsdot_n_1 when TARGET=I6500 2022-09-17 16:43:22 +08:00
Martin Kroeker
101a2c77c3 Fix warnings 2022-09-15 09:19:19 +02:00
Martin Kroeker
23d59baaf1 Add -mfma to -mavx2 for Apple clang, and set AVX2 options for Zen as well 2022-09-13 22:39:27 +02:00