OpenBLAS

Commit Graph

Author	SHA1	Message	Date
kseniyazaytseva	ff10e6b6dc	Fix zero step tests	2024-02-08 00:19:54 +03:00
kseniyazaytseva	b6949ce74c	add axpyc to cmake build	2024-02-02 14:42:27 +03:00
kseniyazaytseva	441339104f	fix test ext cmake build	2024-02-02 13:49:39 +03:00
kseniyazaytseva	f68e9989c4	Remove zero rows/columns matcopy tests	2024-02-02 12:26:23 +03:00
Andrey Sokolov	c99e231fc5	Fix rand_generate	2024-01-18 23:56:22 +03:00
kseniyazaytseva	bf39c0d8b5	Added new tests for BLAS-like and BLAS API in utest	2024-01-18 23:56:22 +03:00
Martin Kroeker	88e994116c	Merge pull request #4354 from imaginationtech/img-rvv-kernel-generator [RISC-V] Improve RVV kernel generator LMUL usage	2024-01-17 15:19:37 +01:00
Martin Kroeker	e3508d3713	Merge pull request #4439 from sergei-lewis/risc-v Fix builds with t-head toolchains that use old intrinsics spec	2024-01-16 20:35:12 +01:00
Sergei Lewis	9edb805e64	fix builds with t-head toolchains that use old versions of the intrinsics spec	2024-01-16 14:33:08 +00:00
Martin Kroeker	1332f8a822	Merge pull request #4159 from OMaghiarIMG/risc-v-tail-policy Set tail policy to undisturbed for RVV intrinsics accumulators	2023-12-08 10:25:41 +01:00
Martin Kroeker	2d316c2920	Merge pull request #4125 from OMaghiarIMG/risc-v Fixes RVV masked intrinsics for iamax/iamin/imax/imin kernels	2023-12-07 14:50:58 +01:00
Octavian Maghiar	4a12cf53ec	[RISC-V] Improve RVV kernel generator LMUL usage The RVV kernel generation script uses the provided LMUL to increase the number of accumulator registers. Since the effect of the LMUL is to group together the vector registers into larger ones, it actually should be used as a multiplier in the calculation of vlenmax. At the moment, no matter what LMUL is provided, the generated kernels would only set the maximum number of vector elements equal to VLEN/SEW. Commit changes the use of LMUL to properly adjust vlenmax. Note that an increase in LMUL results in a decrease in the number of effective vector registers.	2023-12-04 11:13:35 +00:00
Octavian Maghiar	826a9d5fa4	Adds tail undisturbed for RVV Level 2 operations During the last iteration of some RVV operations, accumulators can get overwritten when VL < VLMAX and tail policy is agnostic. Commit changes intrinsics tail policy to undisturbed.	2023-07-25 11:36:23 +01:00
Octavian Maghiar	8df0289db6	Adds tail undisturbed for RVV Level 1 operations During the last iteration of some RVV operations, accumulators can get overwritten when VL < VLMAX and tail policy is agnostic. Commit changes intrinsics tail policy to undisturbed.	2023-07-20 15:28:35 +01:00
Octavian Maghiar	1e4a3a2b5e	Fixes RVV masked intrinsics for izamax/izamin kernels	2023-07-12 12:55:50 +01:00
Octavian Maghiar	e1958eb705	Fixes RVV masked intrinsics for iamax/iamin/imax/imin kernels Changes masked intrinsics from _m to _mu and reintroduces maskedoff argument.	2023-07-05 11:34:00 +01:00
Martin Kroeker	62f0f506ec	Merge pull request #4049 from sh-zheng/risc-v Add rvv support for zsymv and active rvv support for zhemv	2023-06-09 19:08:00 +02:00
ZhengSh	2a8bc38cdc	Merge branch 'xianyi:risc-v' into risc-v	2023-06-09 20:01:03 +08:00
Martin Kroeker	5147831f25	Merge pull request #4074 from HellerZheng/risc-v fix wrong vr = VFMVVF_FLOAT(0, vl); in symv_L_rvv.c and symv_U_rvv.c	2023-06-06 14:55:32 +02:00
Heller Zheng	0954746380	remove argument unused during compilation. fix wrong vr = VFMVVF_FLOAT(0, vl);	2023-06-04 20:06:58 -07:00
sh-zheng	d3bf5a5401	Combine two reduction operations of zhe/symv into one, with tail undisturbed set.	2023-05-22 22:39:45 +08:00
sh-zheng	18d7afe69d	Add rvv support for zsymv and active rvv support for zhemv	2023-05-20 01:19:44 +08:00
Zhang Xianyi	30222d0832	Merge pull request #3971 from HellerZheng/risc-v RISC-V for new intrinsic API changes	2023-04-01 12:43:43 +08:00
Heller Zheng	6b74bee2f9	Update TARGET=x280 description.	2023-03-27 18:59:24 -07:00
Heller Zheng	1374a2d08b	This PR adapts latest spec changes Add prefix (_riscv) for all riscv intrinsics Update some intrinsics' parameter, like vfredxxxx, vmerge	2023-03-19 23:59:03 -07:00
Zhang Xianyi	19f17c8bc6	Merge pull request #3893 from HellerZheng/develop add riscv level3 C,Z kernel functions.	2023-03-15 10:17:13 +08:00
Zhang Xianyi	20511dfa65	Merge pull request #3919 from sergei-lewis/risc-v-latest-rvv-intrinsics update riscv intrinsics for latest spec	2023-03-15 10:16:19 +08:00
Sergei Lewis	9b61be4545	factoring riscv64/dot.c fix into separate PR as requested	2023-03-01 17:40:42 +00:00
Sergei Lewis	2406958629	* update intrinsics to match latest spec at https://github.com/riscv-non-isa/rvv-intrinsic-doc (in particular, __riscv_ prefixes for rvv intrinsics) * fix multiple numerical stability and corner case issues * add a script to generate arbitrary gemm kernel shapes * add a generic zvl256b target to demonstrate large gemm kernel unrolls	2023-02-24 10:45:03 +00:00
Heller Zheng	63cf4d0166	add riscv level3 C,Z kernel functions.	2023-02-01 19:13:44 -08:00
Xianyi Zhang	c19dff0a31	Fix T-Head RVV intrinsic API changes.	2023-01-25 19:33:32 +08:00
Xianyi Zhang	d9993e21a2	Refs #3825 Merge branch 'HellerZheng-develop' into risc-v	2022-12-03 12:01:29 +08:00
Xianyi Zhang	e5313f53d5	Merge branch 'develop' of https://github.com/HellerZheng/OpenBLAS_riscv_x280 into HellerZheng-develop	2022-12-03 12:00:52 +08:00
Xianyi Zhang	e284c048df	Merge branch 'develop' into risc-v	2022-12-03 11:56:55 +08:00
Martin Kroeker	0a24f631e9	Merge pull request #3844 from Mousius/switch-ratio-16 Set SWITCH_RATIO for Arm(R) Neoverse(TM) V1 CPUs	2022-12-02 12:48:43 +01:00
Martin Kroeker	65984fbe68	Merge pull request #3847 from bartoldeman/scal-benchmark scal benchmark: eliminate y, move init/timing out of loop	2022-12-02 11:51:50 +01:00
Martin Kroeker	f6f0d13b9f	Merge pull request #3842 from Mousius/sve-dot Add SVE implementation for sdot/ddot	2022-12-02 08:30:51 +01:00
Chris Sidebottom	eea006a688	Wrap SVE header with __has_include check	2022-12-01 12:07:55 +00:00
Chris Sidebottom	fd4f52c797	Add SVE implementation for sdot/ddot This adds an SVE implementation to sdot/ddot when available, falling back to the previous Advanced SIMD kernel where there's no SVE implementation for the kernel. All the targets were essentially treating `dot_thunderx2t99.c` as the Advanced SIMD implementation so I've renamed it to better fit with the feature detection.	2022-12-01 12:07:50 +00:00
Martin Kroeker	b6a4ef98b9	Merge pull request #3845 from Mousius/asimd-dot-opt Remove unnecessary instructions from Advanced SIMD dot	2022-11-30 21:07:30 +01:00
Chris Sidebottom	2fb096315e	Set SWITCH_RATIO for Arm(R) Neoverse(TM) V1 CPUs From testing this yields better results than the default of `2`.	2022-11-30 09:35:38 +00:00
Bart Oldeman	bae45d94d1	scal benchmark: eliminate y, move init/timing out of loop Removing y avoids cache effects (if y is the size of the L1 cache, the main array x is removed from it). Moving init and timing out of the loop makes the scal benchmark behave like the gemm benchmark, and allows higher accuracy for smaller test cases since the loop overhead is much smaller than the timing overhead. Example: OPENBLAS_LOOPS=10000 ./dscal.goto 1024 8192 1024 on AMD Zen2 (7532) with 32k (4k doubles) L1 cache per core. Before From : 1024 To : 8192 Step = 1024 Inc_x = 1 Inc_y = 1 Loops = 10000 SIZE Flops 1024 : 5627.08 MFlops 0.000000 sec 2048 : 5907.34 MFlops 0.000000 sec 3072 : 5553.30 MFlops 0.000001 sec 4096 : 5446.38 MFlops 0.000001 sec 5120 : 5504.61 MFlops 0.000001 sec 6144 : 5501.80 MFlops 0.000001 sec 7168 : 5547.43 MFlops 0.000001 sec 8192 : 5548.46 MFlops 0.000001 sec After From : 1024 To : 8192 Step = 1024 Inc_x = 1 Inc_y = 1 Loops = 10000 SIZE Flops 1024 : 6310.28 MFlops 0.000000 sec 2048 : 6396.29 MFlops 0.000000 sec 3072 : 6439.14 MFlops 0.000000 sec 4096 : 6327.14 MFlops 0.000001 sec 5120 : 5628.24 MFlops 0.000001 sec 6144 : 5616.41 MFlops 0.000001 sec 7168 : 5553.13 MFlops 0.000001 sec 8192 : 5600.88 MFlops 0.000001 sec We can see the L1->L2 switchover point is now where it should be, and the number of flops for L1 is more accurate.	2022-11-29 08:02:45 -05:00
Heller Zheng	387e8970cd	Fix merge problem; Update compiling COMMON_OPT per review comments.	2022-11-28 21:42:29 -08:00
Chris Sidebottom	4f7b77e08a	Remove unnecessary instructions from Advanced SIMD dot The existing kernel was issuing extra instructions to organise the arguments into the same registers they would usually be in and similarly to put the result into the appropriate register. This has an impact on smaller sized dots and seemed like a quick fix	2022-11-25 16:19:03 +00:00
Martin Kroeker	e9a911fb9f	Merge pull request #3841 from martin-frbg/lapack755+764 Fix SLATRS3 and CLATRS3 tests in TESTING/LIN (Reference-LAPACK PRs 755+764)	2022-11-23 22:38:06 +01:00
Martin Kroeker	bf0e8d67b5	Merge pull request #3840 from martin-frbg/lapack760 Fix typo in EIG tests and spurious return in lapacke_?tz_trans utility (Reference-LAPACK PR760)	2022-11-23 19:16:25 +01:00
Martin Kroeker	a5470521ee	Fix array indexation in copy, and fix test (Reference-LAPACK PR764)	2022-11-23 15:31:25 +01:00
Martin Kroeker	b0393ea4e1	Fix test (Reference-LAPACK PR764)	2022-11-23 15:27:46 +01:00
Martin Kroeker	0d26f1a4c7	Fix wrong indexation in test (Reference-LAPACK PR755)	2022-11-23 15:22:27 +01:00
Martin Kroeker	19fd2d7f00	Use LSAME for character comparison (Reference-LAPACK PR755)	2022-11-23 15:19:07 +01:00

6868 Commits

All Branches