OpenBLAS

Author	SHA1	Message	Date
Octavian Maghiar	e4586e81b8	[RISC-V] Add RISC-V Vector 128-bit target Current RVV x280 target depends on vlen=512-bits for Level 3 operations. Commit adds generic target that supports vlen=128-bits. New target uses the same scalable kernels as x280 for Level 1&2 operations, and autogenerated kernels for Level 3 operations. Functional correctness of Level 3 operations tested on vlen=128-bits using QEMU v8.1.1 for ctests and BLAS-Tester.	2023-12-04 11:02:18 +00:00
Martin Kroeker	62f0f506ec	Merge pull request #4049 from sh-zheng/risc-v Add rvv support for zsymv and active rvv support for zhemv	2023-06-09 19:08:00 +02:00
ZhengSh	2a8bc38cdc	Merge branch 'xianyi:risc-v' into risc-v	2023-06-09 20:01:03 +08:00
Martin Kroeker	5147831f25	Merge pull request #4074 from HellerZheng/risc-v fix wrong vr = VFMVVF_FLOAT(0, vl); in symv_L_rvv.c and symv_U_rvv.c	2023-06-06 14:55:32 +02:00
Heller Zheng	0954746380	remove argument unused during compilation. fix wrong vr = VFMVVF_FLOAT(0, vl);	2023-06-04 20:06:58 -07:00
sh-zheng	d3bf5a5401	Combine two reduction operations of zhe/symv into one, with tail undisturbed set.	2023-05-22 22:39:45 +08:00
sh-zheng	18d7afe69d	Add rvv support for zsymv and active rvv support for zhemv	2023-05-20 01:19:44 +08:00
Zhang Xianyi	30222d0832	Merge pull request #3971 from HellerZheng/risc-v RISC-V for new intrinsic API changes	2023-04-01 12:43:43 +08:00
Heller Zheng	6b74bee2f9	Update TARGET=x280 description.	2023-03-27 18:59:24 -07:00
Heller Zheng	1374a2d08b	This PR adapts latest spec changes Add prefix (_riscv) for all riscv intrinsics Update some intrinsics' parameter, like vfredxxxx, vmerge	2023-03-19 23:59:03 -07:00
Zhang Xianyi	19f17c8bc6	Merge pull request #3893 from HellerZheng/develop add riscv level3 C,Z kernel functions.	2023-03-15 10:17:13 +08:00
Zhang Xianyi	20511dfa65	Merge pull request #3919 from sergei-lewis/risc-v-latest-rvv-intrinsics update riscv intrinsics for latest spec	2023-03-15 10:16:19 +08:00
Sergei Lewis	9b61be4545	factoring riscv64/dot.c fix into separate PR as requested	2023-03-01 17:40:42 +00:00
Sergei Lewis	2406958629	* update intrinsics to match latest spec at https://github.com/riscv-non-isa/rvv-intrinsic-doc (in particular, __riscv_ prefixes for rvv intrinsics) * fix multiple numerical stability and corner case issues * add a script to generate arbitrary gemm kernel shapes * add a generic zvl256b target to demonstrate large gemm kernel unrolls	2023-02-24 10:45:03 +00:00
Heller Zheng	63cf4d0166	add riscv level3 C,Z kernel functions.	2023-02-01 19:13:44 -08:00
Xianyi Zhang	c19dff0a31	Fix T-Head RVV intrinsic API changes.	2023-01-25 19:33:32 +08:00
Xianyi Zhang	d9993e21a2	Refs #3825 Merge branch 'HellerZheng-develop' into risc-v	2022-12-03 12:01:29 +08:00
Xianyi Zhang	e5313f53d5	Merge branch 'develop' of https://github.com/HellerZheng/OpenBLAS_riscv_x280 into HellerZheng-develop	2022-12-03 12:00:52 +08:00
Xianyi Zhang	e284c048df	Merge branch 'develop' into risc-v	2022-12-03 11:56:55 +08:00
Martin Kroeker	0a24f631e9	Merge pull request #3844 from Mousius/switch-ratio-16 Set SWITCH_RATIO for Arm(R) Neoverse(TM) V1 CPUs	2022-12-02 12:48:43 +01:00
Martin Kroeker	65984fbe68	Merge pull request #3847 from bartoldeman/scal-benchmark scal benchmark: eliminate y, move init/timing out of loop	2022-12-02 11:51:50 +01:00
Martin Kroeker	f6f0d13b9f	Merge pull request #3842 from Mousius/sve-dot Add SVE implementation for sdot/ddot	2022-12-02 08:30:51 +01:00
Chris Sidebottom	eea006a688	Wrap SVE header with __has_include check	2022-12-01 12:07:55 +00:00
Chris Sidebottom	fd4f52c797	Add SVE implementation for sdot/ddot This adds an SVE implementation to sdot/ddot when available, falling back to the previous Advanced SIMD kernel where there's no SVE implementation for the kernel. All the targets were essentially treating `dot_thunderx2t99.c` as the Advanced SIMD implementation so I've renamed it to better fit with the feature detection.	2022-12-01 12:07:50 +00:00
Martin Kroeker	b6a4ef98b9	Merge pull request #3845 from Mousius/asimd-dot-opt Remove unnecessary instructions from Advanced SIMD dot	2022-11-30 21:07:30 +01:00
Chris Sidebottom	2fb096315e	Set SWITCH_RATIO for Arm(R) Neoverse(TM) V1 CPUs From testing this yields better results than the default of `2`.	2022-11-30 09:35:38 +00:00
Bart Oldeman	bae45d94d1	scal benchmark: eliminate y, move init/timing out of loop	2022-11-29 08:02:45 -05:00
    Removing y avoids cache effects (if y is the size of the L1 cache, the main array x is removed from it). Moving init and timing out of the loop makes the scal benchmark behave like the gemm benchmark, and allows higher accuracy for smaller test cases since the loop overhead is much smaller than the timing overhead.
    Example: OPENBLAS_LOOPS=10000 ./dscal.goto 1024 8192 1024 on AMD Zen2 (7532) with 32k (4k doubles) L1 cache per core.
    Before
    From : 1024 To : 8192 Step = 1024 Inc_x = 1 Inc_y = 1 Loops = 10000
    SIZE   Flops
    1024 : 5627.08 MFlops 0.000000 sec
    2048 : 5907.34 MFlops 0.000000 sec
    3072 : 5553.30 MFlops 0.000001 sec
    4096 : 5446.38 MFlops 0.000001 sec
    5120 : 5504.61 MFlops 0.000001 sec
    6144 : 5501.80 MFlops 0.000001 sec
    7168 : 5547.43 MFlops 0.000001 sec
    8192 : 5548.46 MFlops 0.000001 sec
    After
    From : 1024 To : 8192 Step = 1024 Inc_x = 1 Inc_y = 1 Loops = 10000
    SIZE   Flops
    1024 : 6310.28 MFlops 0.000000 sec
    2048 : 6396.29 MFlops 0.000000 sec
    3072 : 6439.14 MFlops 0.000000 sec
    4096 : 6327.14 MFlops 0.000001 sec
    5120 : 5628.24 MFlops 0.000001 sec
    6144 : 5616.41 MFlops 0.000001 sec
    7168 : 5553.13 MFlops 0.000001 sec
    8192 : 5600.88 MFlops 0.000001 sec
    We can see the L1->L2 switchover point is now where it should be, and the number of flops for L1 is more accurate.
Heller Zheng	387e8970cd	Fix merge problem; Update compiling COMMON_OPT per review comments.	2022-11-28 21:42:29 -08:00
Chris Sidebottom	4f7b77e08a	Remove unnecessary instructions from Advanced SIMD dot The existing kernel was issuing extra instructions to organise the arguments into the same registers they would usually be in and similarly to put the result into the appropriate register. This has an impact on smaller sized dots and seemed like a quick fix	2022-11-25 16:19:03 +00:00
Martin Kroeker	e9a911fb9f	Merge pull request #3841 from martin-frbg/lapack755+764 Fix SLATRS3 and CLATRS3 tests in TESTING/LIN (Reference-LAPACK PRs 755+764)	2022-11-23 22:38:06 +01:00
Martin Kroeker	bf0e8d67b5	Merge pull request #3840 from martin-frbg/lapack760 Fix typo in EIG tests and spurious return in lapacke_?tz_trans utility (Reference-LAPACK PR760)	2022-11-23 19:16:25 +01:00
Martin Kroeker	a5470521ee	Fix array indexation in copy, and fix test (Reference-LAPACK PR764)	2022-11-23 15:31:25 +01:00
Martin Kroeker	b0393ea4e1	Fix test (Reference-LAPACK PR764)	2022-11-23 15:27:46 +01:00
Martin Kroeker	0d26f1a4c7	Fix wrong indexation in test (Reference-LAPACK PR755)	2022-11-23 15:22:27 +01:00
Martin Kroeker	19fd2d7f00	Use LSAME for character comparison (Reference-LAPACK PR755)	2022-11-23 15:19:07 +01:00
Martin Kroeker	663bf68dbd	Merge pull request #3839 from martin-frbg/lapack758 Fix array dimension in complex SYL01 test (Reference-LAPACK PR758)	2022-11-23 14:57:56 +01:00
Martin Kroeker	c2ba4e6249	Remove unnecessary return in void function call (Reference-LAPACK PR760)	2022-11-23 10:43:34 +01:00
Martin Kroeker	74962c7f53	Remove unnecessary return in void function call (Reference-LAPACK PR760)	2022-11-23 10:42:29 +01:00
Martin Kroeker	d952cbf7bc	Remove unnecessary return in void function call (Reference-LAPACK PR760)	2022-11-23 10:41:50 +01:00
Martin Kroeker	7694ff495f	Remove unnecessary return in void function call (Reference-LAPACK PR760)	2022-11-23 10:40:59 +01:00
Martin Kroeker	825ae316e2	Fix typo in EXTERNAL (Reference-LAPACK PR760)	2022-11-23 10:36:10 +01:00
Martin Kroeker	730ed549e6	Fix typo in EXTERNAL (Reference-LAPACK PR760)	2022-11-23 10:35:23 +01:00
Martin Kroeker	bc3393f703	Fix array dimension (Reference-LAPACK 758)	2022-11-23 10:31:18 +01:00
Martin Kroeker	0b2f8dabbf	Fix array dimension (Reference-LAPACK 758)	2022-11-23 10:30:35 +01:00
Martin Kroeker	b4c9228441	Merge pull request #3838 from martin-frbg/lapa311 Update the version number of the included LAPACK to 3.11.0	2022-11-22 17:39:51 +01:00
Martin Kroeker	e6e2a63650	Update LAPACK version number to 3.11.0	2022-11-22 14:02:21 +01:00
Martin Kroeker	8408357bab	Update LAPACK version number to 3.11.0	2022-11-22 14:01:48 +01:00
Martin Kroeker	ba8fb8b4b2	Merge pull request #3837 from martin-frbg/lapack655+697 Improve convergence of LAPACK ?LAED4 and fix a bug in DORCSD2BY1 (Reference-LAPACK PRs 655+697)	2022-11-22 13:51:57 +01:00
Martin Kroeker	cabf9453e2	Merge pull request #3836 from martin-frbg/lapack665+735 Fix documentation of LAPACK functions ?TPRFB and IEEECK (Reference-LAPACK PRs 665+735)	2022-11-22 09:25:24 +01:00
Heller Zheng	3918d8504e	nrm2 simple optimization	2022-11-21 19:06:07 -08:00


6853 Commits