OpenBLAS

Commit Graph

Author	SHA1	Message	Date
Martin Kroeker	a875304eb0	fix inverted conditional for NAN handling	2024-07-26 09:50:20 +02:00
Martin Kroeker	f5d04318e3	Merge branch 'OpenMathLib:develop' into scalfixes	2024-07-21 13:43:43 +02:00
Martin Kroeker	a815594fd1	Merge pull request #4801 from markdryan/markdryan/riscv-dynamic-arch Add autodetection for riscv64	2024-07-19 17:12:07 +02:00
Martin Kroeker	2020569705	fix NAN handling and make it depend on dummy2 parameter	2024-07-17 23:55:54 +02:00
Martin Kroeker	3870995f01	make NAN handling depend on dummy2 parameter	2024-07-17 23:54:24 +02:00
Martin Kroeker	7284c533b5	make NAN handling depend on dummy2 parameter	2024-07-17 23:50:40 +02:00
Mark Ryan	67bf4b6998	Fix axpby_rvv kernels for cases where inc_y = 0. The following openblas_utest tests fail when the RISCV64_ZVL128B target is enabled: TEST 89/103 axpby:zaxpby_inc_0 [FAIL]; TEST 92/103 axpby:caxpby_inc_0 [FAIL]; TEST 95/103 axpby:daxpby_inc_0 [FAIL]; TEST 98/103 axpby:saxpby_inc_0 [FAIL]. The issue is that the vectorized kernels do not work when inc_y == 0. This patch updates the kernels to fall back to the scalar algorithms when inc_y == 0, fixing the failing tests. Signed-off-by: Mark Ryan <markdryan@rivosinc.com>	2024-07-15 14:24:47 +00:00
Mark Ryan	3b715e6162	Add autodetection for riscv64. Implement DYNAMIC_ARCH support for riscv64. Three CPU types are supported: riscv64_generic, riscv64_zvl256b, and riscv64_zvl128b. The two non-generic kernels require CPU support for RVV 1.0 to function correctly. Detecting that a riscv64 device supports RVV 1.0 is a little complicated, as there are some boards on the market that advertise support for V via hwcap but only support RVV 0.7.1, which is not binary compatible with RVV 1.0. The approach taken is to first try hwprobe. If hwprobe is not available, we fall back to hwcap plus an additional check to distinguish between RVV 1.0 and RVV 0.7.1. Tested on a VM with VLEN=256, a CanMV K230 with VLEN=128 (with only the big core enabled), a Lichee Pi with RVV 0.7.1, and a VF2 with no vector support. A compiler with RVV 1.0 support must be used to build OpenBLAS for riscv64 when DYNAMIC_ARCH=1. Signed-off-by: Mark Ryan <markdryan@rivosinc.com>	2024-07-15 14:24:22 +00:00
Martin Kroeker	c1019d5832	Handle INF and NAN in inputs	2024-06-27 10:58:59 +02:00
Martin Kroeker	516743f7dc	fix other instances of mishandling INF	2024-05-31 16:02:12 +02:00
Martin Kroeker	cf80bd8500	Update nrm2_rvv.c	2024-03-13 13:07:26 +01:00
Martin Kroeker	9baa757905	Update nrm2_vector.c	2024-03-13 11:40:14 +01:00
Martin Kroeker	18a6db6862	Update nrm2_vector.c	2024-03-13 11:10:26 +01:00
Martin Kroeker	3752e73919	handle incx < 0	2024-03-12 20:44:01 +01:00
Martin Kroeker	db70c7f7fb	handle incx < 0	2024-03-12 20:42:11 +01:00
Martin Kroeker	dee8557d58	handle incx < 0	2024-03-12 20:40:29 +01:00
Martin Kroeker	d9dff17aec	handle incx < 0	2024-03-12 20:38:23 +01:00
Martin Kroeker	6b89e1f1d7	fix loop condition for incx < 0	2024-03-12 15:49:41 +01:00
Martin Kroeker	20016a0096	fix loop condition for incx < 0	2024-03-12 15:48:55 +01:00
Sergei Lewis	ba17758c02	fix axpy implementations where y has a stride of 0	2024-02-16 16:00:38 +00:00
Sergei Lewis	ff1523163f	Fix axpy test hangs when n==0. Reenable zaxpy_vector kernel for C910V.	2024-02-09 12:59:14 +00:00
Martin Kroeker	6d8a273cca	Handle zero increment(s) in C910V ?AXPBY (#4483) * Handle zero increment(s)	2024-02-04 22:07:51 +01:00
Martin Kroeker	4d8dee508c	temporarily disable the CAXPY/ZAXPY kernels	2024-02-04 01:05:03 +01:00
Sergei Lewis	a3b0ef6596	Restore riscv64 fixes from develop branch: dot product double precision accumulation, zscal NaN handling	2024-02-01 10:32:00 +00:00
Sergei Lewis	1093def0d1	Merge branch 'risc-v' into develop	2024-01-29 11:11:39 +00:00
Martin Kroeker	889c5d026a	Merge pull request #4456 from kseniyazaytseva/riscv-rvv10 Fix BLAS and LAPACK tests for RVV 1.0 target, update to 0.12.0 intrinsics	2024-01-26 13:31:09 +01:00
Martin Kroeker	4e2a32ff51	Merge pull request #4454 from kseniyazaytseva/riscv-rvv07 Fix BLAS and LAPACK tests for C910V and RISCV64_ZVL256B targets	2024-01-26 11:40:46 +01:00
Martin Kroeker	a21b2fa5e4	Merge pull request #4452 from kseniyazaytseva/riscv-generic Fix BLAS, BLAS-like functions and Generic RISC-V kernels	2024-01-24 17:52:25 +01:00
Andrey Sokolov	9c49a81d54	Resolve conflicts	2024-01-23 19:08:53 +03:00
kseniyazaytseva	e1afb23811	Fix BLAS and LAPACK tests for C910V and RISCV64_ZVL256B targets * Fixed bugs in dgemm, [a]min\max, asum kernels * Added zero checks for BLAS kernels * Added dsdot implementation for RVV 0.7.1 * Fixed bugs in _vector files for C910V and RISCV64_ZVL256B targets * Added additional definitions for RISCV64_ZVL256B target	2024-01-23 19:01:31 +03:00
Octavian Maghiar	deecfb1a39	Merge branch 'risc-v' into img-riscv64-zvl128b	2024-01-19 12:26:38 +00:00
kseniyazaytseva	5222b5fc18	Added axpby kernels for GENERIC RISC-V target	2024-01-18 23:22:26 +03:00
kseniyazaytseva	ff41cf5c49	Fix BLAS, BLAS-like functions and Generic RISC-V kernels * Fixed gemmt, imatcopy, zimatcopy_cnc functions * Fixed cblas_cscal testing in ctest * Removed rotmg unreachable code * Added zero size checks	2024-01-18 23:19:52 +03:00
kseniyazaytseva	b193ea3d7b	Fix BLAS and LAPACK tests for RVV 1.0 target, update to 0.12.0 intrinsics * Update intrinsics API to 0.12.0 version (Stride Segment Loads/Stores) * Fixed nrm2, axpby, ncopy, zgemv and scal kernels * Added zero size checks	2024-01-18 22:14:32 +03:00
Martin Kroeker	88e994116c	Merge pull request #4354 from imaginationtech/img-rvv-kernel-generator [RISC-V] Improve RVV kernel generator LMUL usage	2024-01-17 15:19:37 +01:00
Sergei Lewis	9edb805e64	fix builds with t-head toolchains that use old versions of the intrinsics spec	2024-01-16 14:33:08 +00:00
Martin Kroeker	f637e12713	Handle INF and NAN	2024-01-08 09:52:38 +01:00
Martin Kroeker	f0808d856b	Handle NAN in input	2024-01-07 20:27:29 +01:00
Octavian Maghiar	4a12cf53ec	[RISC-V] Improve RVV kernel generator LMUL usage. The RVV kernel generation script uses the provided LMUL to increase the number of accumulator registers. Since the effect of LMUL is to group the vector registers together into larger ones, it should actually be used as a multiplier in the calculation of vlenmax. At the moment, no matter what LMUL is provided, the generated kernels only set the maximum number of vector elements equal to VLEN/SEW. This commit changes the use of LMUL to properly adjust vlenmax. Note that an increase in LMUL results in a decrease in the number of effective vector registers.	2023-12-04 11:13:35 +00:00
Octavian Maghiar	e4586e81b8	[RISC-V] Add RISC-V Vector 128-bit target. The current RVV x280 target depends on vlen=512-bits for Level 3 operations. This commit adds a generic target that supports vlen=128-bits. The new target uses the same scalable kernels as x280 for Level 1&2 operations, and autogenerated kernels for Level 3 operations. Functional correctness of Level 3 operations was tested on vlen=128-bits using QEMU v8.1.1 for ctests and BLAS-Tester.	2023-12-04 11:02:18 +00:00
Martin Kroeker	a34a0a7abc	Allow negative INCX (API change from version 3.10 of the reference implementation)	2023-08-10 16:56:52 +02:00
Octavian Maghiar	826a9d5fa4	Adds tail undisturbed for RVV Level 2 operations. During the last iteration of some RVV operations, accumulators can get overwritten when VL < VLMAX and the tail policy is agnostic. This commit changes the intrinsics tail policy to undisturbed.	2023-07-25 11:36:23 +01:00
Octavian Maghiar	8df0289db6	Adds tail undisturbed for RVV Level 1 operations. During the last iteration of some RVV operations, accumulators can get overwritten when VL < VLMAX and the tail policy is agnostic. This commit changes the intrinsics tail policy to undisturbed.	2023-07-20 15:28:35 +01:00
Martin Kroeker	76ef1672f8	Override DSDOT with generic code to get rid of qemu precision error	2023-07-19 22:31:07 +02:00
Octavian Maghiar	1e4a3a2b5e	Fixes RVV masked intrinsics for izamax/izamin kernels	2023-07-12 12:55:50 +01:00
Octavian Maghiar	e1958eb705	Fixes RVV masked intrinsics for iamax/iamin/imax/imin kernels. Changes masked intrinsics from _m to _mu and reintroduces the maskedoff argument.	2023-07-05 11:34:00 +01:00
Xianyi Zhang	e14a025bb1	Temporarily work around zaxpy vector kernel bug.	2023-06-28 11:17:38 +00:00
Martin Kroeker	772b0cc715	Fix early bailout	2023-06-27 16:12:27 +02:00
Martin Kroeker	d6be5036d7	Fix IDAMAX	2023-06-26 21:19:33 +02:00
Martin Kroeker	1fe96f8da7	Fix failures to handle increments of zero	2023-06-25 22:36:57 +02:00

79 Commits