Commit Graph

6869 Commits

Author SHA1 Message Date
kseniyazaytseva
ec5cfe3bc8 Fix invalid tests 2024-02-08 00:21:38 +03:00
kseniyazaytseva
ff10e6b6dc Fix zero step tests 2024-02-08 00:19:54 +03:00
kseniyazaytseva
b6949ce74c add axpyc to cmake build 2024-02-02 14:42:27 +03:00
kseniyazaytseva
441339104f fix test ext cmake build 2024-02-02 13:49:39 +03:00
kseniyazaytseva
f68e9989c4 Remove zero rows/columns matcopy tests 2024-02-02 12:26:23 +03:00
Andrey Sokolov
c99e231fc5 Fix rand_generate 2024-01-18 23:56:22 +03:00
kseniyazaytseva
bf39c0d8b5 Added new tests for BLAS-like and BLAS API in utest 2024-01-18 23:56:22 +03:00
Martin Kroeker
88e994116c Merge pull request #4354 from imaginationtech/img-rvv-kernel-generator
[RISC-V] Improve RVV kernel generator LMUL usage
2024-01-17 15:19:37 +01:00
Martin Kroeker
e3508d3713 Merge pull request #4439 from sergei-lewis/risc-v
Fix builds with t-head toolchains that use old intrinsics spec
2024-01-16 20:35:12 +01:00
Sergei Lewis
9edb805e64 fix builds with t-head toolchains that use old versions of the intrinsics spec 2024-01-16 14:33:08 +00:00
Martin Kroeker
1332f8a822 Merge pull request #4159 from OMaghiarIMG/risc-v-tail-policy
Set tail policy to undisturbed for RVV intrinsics accumulators
2023-12-08 10:25:41 +01:00
Martin Kroeker
2d316c2920 Merge pull request #4125 from OMaghiarIMG/risc-v
Fixes RVV masked intrinsics for iamax/iamin/imax/imin kernels
2023-12-07 14:50:58 +01:00
Octavian Maghiar
4a12cf53ec [RISC-V] Improve RVV kernel generator LMUL usage
The RVV kernel generation script uses the provided LMUL to increase the number of accumulator registers.
Since the effect of the LMUL is to group together the vector registers into larger ones, it actually should be used as a multiplier in the calculation of vlenmax.
At the moment, no matter what LMUL is provided, the generated kernels would only set the maximum number of vector elements equal to VLEN/SEW.
This commit changes the use of LMUL so that vlenmax is adjusted properly. Note that increasing LMUL decreases the number of effective vector registers.
2023-12-04 11:13:35 +00:00
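The relationship described in commit 4a12cf53ec can be checked with a little arithmetic. The following is a minimal sketch in plain C, not code from the generator script: the function names are hypothetical, and it assumes integer LMUL values (1, 2, 4, 8) and the 32 architectural vector registers defined by the RVV specification.

```c
#include <assert.h>

/* Hypothetical helper names; not taken from the OpenBLAS generator script. */

/* Maximum element count per register group: (VLEN / SEW) * LMUL. */
unsigned sketch_vlmax(unsigned vlen_bits, unsigned sew_bits, unsigned lmul)
{
    assert(sew_bits != 0 && vlen_bits % sew_bits == 0);
    assert(lmul == 1 || lmul == 2 || lmul == 4 || lmul == 8);
    return (vlen_bits / sew_bits) * lmul;
}

/* RVV has 32 vector registers; grouping them by LMUL leaves 32 / LMUL groups. */
unsigned sketch_register_groups(unsigned lmul)
{
    assert(lmul == 1 || lmul == 2 || lmul == 4 || lmul == 8);
    return 32u / lmul;
}
```

For example, with VLEN = 256 and SEW = 64, LMUL = 1 gives 4 elements across 32 registers, while LMUL = 4 gives 16 elements across only 8 register groups, which is the trade-off the commit message notes.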
Octavian Maghiar
826a9d5fa4 Adds tail undisturbed for RVV Level 2 operations
During the last iteration of some RVV operations, accumulators can get overwritten when VL < VLMAX and tail policy is agnostic.
This commit changes the intrinsics' tail policy to undisturbed.
2023-07-25 11:36:23 +01:00
Octavian Maghiar
8df0289db6 Adds tail undisturbed for RVV Level 1 operations
During the last iteration of some RVV operations, accumulators can get overwritten when VL < VLMAX and tail policy is agnostic.
This commit changes the intrinsics' tail policy to undisturbed.
2023-07-20 15:28:35 +01:00
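The failure mode behind commits 826a9d5fa4 and 8df0289db6 can be reproduced without RVV hardware. The sketch below is a scalar simulation in plain C, not the OpenBLAS kernels: `SIM_VLMAX`, `sim_vfmacc` and `sim_dot` are hypothetical names, and the agnostic policy is modelled by its worst permitted outcome, tail lanes overwritten with all-1 bits (a NaN for `float`).

```c
#include <math.h>
#include <stddef.h>
#include <string.h>

#define SIM_VLMAX 4  /* hypothetical lane count for this simulation */

enum tail_policy { TAIL_UNDISTURBED, TAIL_AGNOSTIC };

/* acc[i] += x[i] * y[i] for i < vl; lanes >= vl follow the tail policy. */
void sim_vfmacc(float acc[SIM_VLMAX], const float *x, const float *y,
                size_t vl, enum tail_policy policy)
{
    for (size_t i = 0; i < vl; i++)
        acc[i] += x[i] * y[i];
    if (policy == TAIL_AGNOSTIC && vl < SIM_VLMAX) {
        /* One outcome the agnostic policy allows: tail lanes become all 1s. */
        memset(&acc[vl], 0xFF, (SIM_VLMAX - vl) * sizeof(float));
    }
}

/* Strip-mined dot product with a vector accumulator reduced at the end. */
float sim_dot(const float *x, const float *y, size_t n, enum tail_policy policy)
{
    float acc[SIM_VLMAX] = {0};
    for (size_t i = 0; i < n; ) {
        size_t vl = (n - i < SIM_VLMAX) ? n - i : SIM_VLMAX;
        sim_vfmacc(acc, x + i, y + i, vl, policy);
        i += vl;
    }
    float sum = 0.0f;
    for (size_t i = 0; i < SIM_VLMAX; i++)  /* reduction covers all lanes */
        sum += acc[i];
    return sum;
}
```

With n = 6 the last iteration runs with vl = 2 < VLMAX. Under the undisturbed policy lanes 2 and 3 keep their partial sums from the first iteration; under the simulated agnostic policy they are clobbered and the final reduction is poisoned.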
Octavian Maghiar
1e4a3a2b5e Fixes RVV masked intrinsics for izamax/izamin kernels 2023-07-12 12:55:50 +01:00
Octavian Maghiar
e1958eb705 Fixes RVV masked intrinsics for iamax/iamin/imax/imin kernels
Changes masked intrinsics from _m to _mu and reintroduces maskedoff argument.
2023-07-05 11:34:00 +01:00
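Why the `maskedoff` argument matters for commit e1958eb705 can be shown with a scalar model. This is a sketch in plain C under stated assumptions, not the kernel code: `sim_masked_move` is a hypothetical name, and the mask-agnostic `_m` behaviour is modelled by writing all-1 bits to inactive lanes, which is one outcome that policy permits.

```c
#include <stddef.h>
#include <stdint.h>

enum mask_policy { MASK_UNDISTURBED, MASK_AGNOSTIC };

/*
 * Lane-wise masked move: active lanes take src[i].
 * Inactive lanes take maskedoff[i] under the undisturbed policy (like _mu),
 * or all-1 bits under the simulated agnostic policy (like _m).
 */
void sim_masked_move(uint32_t *dst, const uint32_t *maskedoff,
                     const uint32_t *src, const unsigned char *mask,
                     size_t vl, enum mask_policy policy)
{
    for (size_t i = 0; i < vl; i++) {
        if (mask[i])
            dst[i] = src[i];
        else if (policy == MASK_UNDISTURBED)
            dst[i] = maskedoff[i];
        else
            dst[i] = UINT32_MAX;
    }
}
```

In an iamax-style kernel the destination holds the per-lane index of the best element so far, and the mask marks lanes where a new maximum was found. Passing the old index vector as `maskedoff` keeps the earlier indices in lanes that did not improve.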
Martin Kroeker
62f0f506ec Merge pull request #4049 from sh-zheng/risc-v
Add rvv support for zsymv and active rvv support for zhemv
2023-06-09 19:08:00 +02:00
ZhengSh
2a8bc38cdc Merge branch 'xianyi:risc-v' into risc-v 2023-06-09 20:01:03 +08:00
Martin Kroeker
5147831f25 Merge pull request #4074 from HellerZheng/risc-v
fix wrong vr = VFMVVF_FLOAT(0, vl); in symv_L_rvv.c and symv_U_rvv.c
2023-06-06 14:55:32 +02:00
Heller Zheng
0954746380 remove argument unused during compilation.
fix wrong vr = VFMVVF_FLOAT(0, vl);
2023-06-04 20:06:58 -07:00
sh-zheng
d3bf5a5401 Combine two reduction operations of zhe/symv into one, with tail undisturbed set. 2023-05-22 22:39:45 +08:00
sh-zheng
18d7afe69d Add rvv support for zsymv and active rvv support for zhemv 2023-05-20 01:19:44 +08:00
Zhang Xianyi
30222d0832 Merge pull request #3971 from HellerZheng/risc-v
RISC-V for new intrinsic API changes
2023-04-01 12:43:43 +08:00
Heller Zheng
6b74bee2f9 Update TARGET=x280 description. 2023-03-27 18:59:24 -07:00
Heller Zheng
1374a2d08b This PR adapts to the latest spec changes
Add prefix (_riscv) for all riscv intrinsics
Update some intrinsics' parameters, like vfredxxxx, vmerge
2023-03-19 23:59:03 -07:00
Zhang Xianyi
19f17c8bc6 Merge pull request #3893 from HellerZheng/develop
add riscv level3 C,Z kernel functions.
2023-03-15 10:17:13 +08:00
Zhang Xianyi
20511dfa65 Merge pull request #3919 from sergei-lewis/risc-v-latest-rvv-intrinsics
update riscv intrinsics for latest spec
2023-03-15 10:16:19 +08:00
Sergei Lewis
9b61be4545 factoring riscv64/dot.c fix into separate PR as requested 2023-03-01 17:40:42 +00:00
Sergei Lewis
2406958629 * update intrinsics to match latest spec at https://github.com/riscv-non-isa/rvv-intrinsic-doc (in particular, __riscv_ prefixes for rvv intrinsics)
* fix multiple numerical stability and corner case issues
* add a script to generate arbitrary gemm kernel shapes
* add a generic zvl256b target to demonstrate large gemm kernel unrolls
2023-02-24 10:45:03 +00:00
Heller Zheng
63cf4d0166 add riscv level3 C,Z kernel functions. 2023-02-01 19:13:44 -08:00
Xianyi Zhang
c19dff0a31 Fix T-Head RVV intrinsic API changes. 2023-01-25 19:33:32 +08:00
Xianyi Zhang
d9993e21a2 Refs #3825 Merge branch 'HellerZheng-develop' into risc-v 2022-12-03 12:01:29 +08:00
Xianyi Zhang
e5313f53d5 Merge branch 'develop' of https://github.com/HellerZheng/OpenBLAS_riscv_x280 into HellerZheng-develop 2022-12-03 12:00:52 +08:00
Xianyi Zhang
e284c048df Merge branch 'develop' into risc-v 2022-12-03 11:56:55 +08:00
Martin Kroeker
0a24f631e9 Merge pull request #3844 from Mousius/switch-ratio-16
Set SWITCH_RATIO for Arm(R) Neoverse(TM) V1 CPUs
2022-12-02 12:48:43 +01:00
Martin Kroeker
65984fbe68 Merge pull request #3847 from bartoldeman/scal-benchmark
scal benchmark: eliminate y, move init/timing out of loop
2022-12-02 11:51:50 +01:00
Martin Kroeker
f6f0d13b9f Merge pull request #3842 from Mousius/sve-dot
Add SVE implementation for sdot/ddot
2022-12-02 08:30:51 +01:00
Chris Sidebottom
eea006a688 Wrap SVE header with __has_include check 2022-12-01 12:07:55 +00:00
Chris Sidebottom
fd4f52c797 Add SVE implementation for sdot/ddot
This adds an SVE implementation to sdot/ddot when available, falling back to the previous Advanced SIMD kernel where there's no SVE implementation for the kernel.

All the targets were essentially treating `dot_thunderx2t99.c` as the Advanced SIMD implementation so I've renamed it to better fit with the feature detection.
2022-12-01 12:07:50 +00:00
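The header guard and fallback described in commits eea006a688 and fd4f52c797 can be sketched as follows. This is an illustrative outline, not the OpenBLAS kernel: `sketch_sdot` and `SKETCH_HAVE_SVE` are hypothetical names, the extra `__ARM_FEATURE_SVE` test is an assumption, and the fallback here is a plain scalar loop rather than the Advanced SIMD kernel the commit refers to.

```c
#include <stddef.h>
#include <stdint.h>

/* Use SVE only if the header exists and the compiler is targeting SVE. */
#if defined(__has_include)
#  if __has_include(<arm_sve.h>) && defined(__ARM_FEATURE_SVE)
#    include <arm_sve.h>
#    define SKETCH_HAVE_SVE 1
#  endif
#endif

#ifdef SKETCH_HAVE_SVE
/* SVE path: predicated loads and multiply-accumulate, one final reduction. */
float sketch_sdot(const float *x, const float *y, size_t n)
{
    svfloat32_t acc = svdup_f32(0.0f);
    for (uint64_t i = 0; i < (uint64_t)n; i += svcntw()) {
        svbool_t pg = svwhilelt_b32(i, (uint64_t)n);
        svfloat32_t vx = svld1_f32(pg, x + i);
        svfloat32_t vy = svld1_f32(pg, y + i);
        acc = svmla_f32_m(pg, acc, vx, vy);  /* inactive lanes keep acc */
    }
    return svaddv_f32(svptrue_b32(), acc);
}
#else
/* Fallback path: scalar stand-in for the non-SVE kernel. */
float sketch_sdot(const float *x, const float *y, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}
#endif
```

The nested `#if` keeps compilers that lack `__has_include` from trying to parse it, so the same file builds on toolchains with and without the SVE header.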
Martin Kroeker
b6a4ef98b9 Merge pull request #3845 from Mousius/asimd-dot-opt
Remove unnecessary instructions from Advanced SIMD dot
2022-11-30 21:07:30 +01:00
Chris Sidebottom
2fb096315e Set SWITCH_RATIO for Arm(R) Neoverse(TM) V1 CPUs
From testing this yields better results than the default of `2`.
2022-11-30 09:35:38 +00:00
Bart Oldeman
bae45d94d1 scal benchmark: eliminate y, move init/timing out of loop
Removing y avoids cache effects (if y is the size of the L1 cache, the
main array x is removed from it).
Moving init and timing out of the loop makes the scal benchmark behave like
the gemm benchmark, and allows higher accuracy for smaller test cases since
the loop overhead is much smaller than the timing overhead.

Example:
OPENBLAS_LOOPS=10000 ./dscal.goto 1024 8192 1024
on AMD Zen2 (7532) with 32k (4k doubles) L1 cache per core.

Before
From : 1024  To : 8192 Step = 1024 Inc_x = 1 Inc_y = 1 Loops = 10000
   SIZE       Flops
   1024 :     5627.08 MFlops   0.000000 sec
   2048 :     5907.34 MFlops   0.000000 sec
   3072 :     5553.30 MFlops   0.000001 sec
   4096 :     5446.38 MFlops   0.000001 sec
   5120 :     5504.61 MFlops   0.000001 sec
   6144 :     5501.80 MFlops   0.000001 sec
   7168 :     5547.43 MFlops   0.000001 sec
   8192 :     5548.46 MFlops   0.000001 sec

After
From : 1024  To : 8192 Step = 1024 Inc_x = 1 Inc_y = 1 Loops = 10000
   SIZE       Flops
   1024 :     6310.28 MFlops   0.000000 sec
   2048 :     6396.29 MFlops   0.000000 sec
   3072 :     6439.14 MFlops   0.000000 sec
   4096 :     6327.14 MFlops   0.000001 sec
   5120 :     5628.24 MFlops   0.000001 sec
   6144 :     5616.41 MFlops   0.000001 sec
   7168 :     5553.13 MFlops   0.000001 sec
   8192 :     5600.88 MFlops   0.000001 sec

We can see the L1->L2 switchover point is now where it should be, and the
number of flops for L1 is more accurate.
2022-11-29 08:02:45 -05:00
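The timing change in commit bae45d94d1 amounts to reading the clock once around all loops instead of once per call. The sketch below illustrates that structure in plain C. It is not the benchmark source: all names are hypothetical, `sketch_dscal` stands in for the real kernel, and the standard `clock()` function is used only because the benchmark's actual timer is not shown here.

```c
#include <stddef.h>
#include <time.h>

/* Stand-in for the scal kernel: x := alpha * x. */
void sketch_dscal(size_t n, double alpha, double *x)
{
    for (size_t i = 0; i < n; i++)
        x[i] *= alpha;
}

/*
 * Time `loops` calls with a single pair of clock reads and return the
 * average seconds per call. Assumes loops > 0.
 */
double sketch_time_scal(size_t n, double alpha, double *x, int loops)
{
    clock_t start = clock();
    for (int l = 0; l < loops; l++)
        sketch_dscal(n, alpha, x);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC / loops;
}

/* Number of doubles that fit in an L1 data cache of the given size. */
size_t sketch_l1_doubles(size_t l1_bytes)
{
    return l1_bytes / sizeof(double);
}
```

A 32 KiB L1 cache holds 32768 / 8 = 4096 doubles, which matches the point in the "After" table where throughput drops between sizes 4096 and 5120.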
Heller Zheng
387e8970cd Fix merge problem; Update compiling COMMON_OPT per review comments. 2022-11-28 21:42:29 -08:00
Chris Sidebottom
4f7b77e08a Remove unnecessary instructions from Advanced SIMD dot
The existing kernel was issuing extra instructions to organise the arguments into the same registers they would usually be in and similarly to put the result into the appropriate register.

This has an impact on smaller-sized dots and seemed like a quick fix.
2022-11-25 16:19:03 +00:00
Martin Kroeker
e9a911fb9f Merge pull request #3841 from martin-frbg/lapack755+764
Fix SLATRS3 and CLATRS3 tests in TESTING/LIN (Reference-LAPACK PRs 755+764)
2022-11-23 22:38:06 +01:00
Martin Kroeker
bf0e8d67b5 Merge pull request #3840 from martin-frbg/lapack760
Fix typo in EIG tests and spurious return in lapacke_?tz_trans utility (Reference-LAPACK PR760)
2022-11-23 19:16:25 +01:00
Martin Kroeker
a5470521ee Fix array indexation in copy, and fix test (Reference-LAPACK PR764) 2022-11-23 15:31:25 +01:00
Martin Kroeker
b0393ea4e1 Fix test (Reference-LAPACK PR764) 2022-11-23 15:27:46 +01:00
Martin Kroeker
0d26f1a4c7 Fix wrong indexation in test (Reference-LAPACK PR755) 2022-11-23 15:22:27 +01:00