OpenBLAS

Commit Graph

Author	SHA1	Message	Date
gxw	ad13e04669	loongarch: Fixed {s/d/sc/dz}amin LSX opt	2024-03-19 09:18:44 +08:00
gxw	bbf82cb624	loongarch: Fixed {s/d}axpby LSX opt	2024-03-18 17:51:42 +08:00
gxw	ac460eb42a	loongarch: Fixed i{c/z}amin LSX opt	2024-03-18 17:15:58 +08:00
gxw	60e251a1f8	loongarch: Fixed {sc/dz}amax LASX opt	2024-03-16 14:52:17 +08:00
gxw	a10dde5554	loongarch: Fixed {s/d/sc/dz}amin LASX opt	2024-03-16 14:52:14 +08:00
gxw	6534d378b7	loongarch: Fixed {s/d/c/z}sum LASX opt	2024-03-16 14:52:10 +08:00
gxw	6159cffc58	loongarch: Fixed i{s/c/z}amin LASX opt	2024-03-16 14:52:06 +08:00
gxw	7d755912b9	loongarch: Fixed {s/d/c/z}axpby LASX opt	2024-03-16 14:51:56 +08:00
Martin Kroeker	cf80bd8500	Update nrm2_rvv.c	2024-03-13 13:07:26 +01:00
Martin Kroeker	9baa757905	Update nrm2_vector.c	2024-03-13 11:40:14 +01:00
Martin Kroeker	18a6db6862	Update nrm2_vector.c	2024-03-13 11:10:26 +01:00
Martin Kroeker	3752e73919	handle incx < 0	2024-03-12 20:44:01 +01:00
Martin Kroeker	db70c7f7fb	handle incx < 0	2024-03-12 20:42:11 +01:00
Martin Kroeker	dee8557d58	handle incx < 0	2024-03-12 20:40:29 +01:00
Martin Kroeker	d9dff17aec	handle incx < 0	2024-03-12 20:38:23 +01:00
Martin Kroeker	552c521353	remove another early exit for incx < 0	2024-03-12 18:49:27 +01:00
Martin Kroeker	ed532dc75b	remove another early exit for incx < 0	2024-03-12 18:47:00 +01:00
Martin Kroeker	6b89e1f1d7	fix loop condition for incx < 0	2024-03-12 15:49:41 +01:00
Martin Kroeker	20016a0096	fix loop condition for incx < 0	2024-03-12 15:48:55 +01:00
Martin Kroeker	09e84bd29a	fix loop condition for incx < 0	2024-03-12 15:48:00 +01:00
Martin Kroeker	f747aedb52	fix loop condition for incx < 0	2024-03-12 15:47:17 +01:00
Martin Kroeker	23796f8d31	fix loop condition for incx < 0	2024-03-12 15:46:23 +01:00
Martin Kroeker	bf93459746	fix loop condition for incx < 0	2024-03-12 15:45:23 +01:00
Martin Kroeker	e41d01bad9	remove early exit on negative inc_x	2024-03-11 22:53:54 +01:00
Martin Kroeker	02a025f9c1	remove early exit on negative inc_x	2024-03-11 22:52:18 +01:00
pengxu	680a77fafc	Optimized ssymv and dsymv kernel LSX for LoongArch	2024-03-05 20:36:59 +08:00
Chris Sidebottom	7a6fa699f2	Small GEMM for AArch64 This is a fairly conservative addition of small matrix kernels using SVE.	2024-03-04 15:48:47 +00:00
pengxu	6546600342	Optimized ssymv and dsymv kernel LASX for LoongArch	2024-03-04 16:18:39 +08:00
Chip-Kerchner	99384933ff	Revert "Merge pull request #4532 from austinpagan/cgemm_zgemm_c_code" This reverts commit `accea15551`, reversing changes made to `b925353006`.	2024-03-01 07:57:39 -06:00
Martin Kroeker	577d480c62	Merge pull request #4529 from ErnstPeng/feature-branch Optimized sgemv and dgemv kernel LSX for LoongArch	2024-02-28 13:49:54 +01:00
pengxu	b2db064285	Optimized sgemv and dgemv kernel LSX for LoongArch	2024-02-28 18:07:27 +08:00
Martin Kroeker	cfbb701497	Merge pull request #4536 from XiWeiGu/loongarch64-cgemv-zgemv-opt Loongarch64 cgemv zgemv opt	2024-02-28 10:15:34 +01:00
gxw	8e05c053be	LoongArch64:Fixed the failed test cases test_{c/z}gemv_n in test_extensions	2024-02-27 22:19:26 -05:00
gxw	3f22fc2233	LoongArch64: Add zgemv LSX opt	2024-02-27 22:19:04 -05:00
gxw	c508a10cf2	LoongArch64: Add cgemv LSX opt	2024-02-27 22:17:30 -05:00
Martin Kroeker	accea15551	Merge pull request #4532 from austinpagan/cgemm_zgemm_c_code Cgemm zgemm c code	2024-02-27 22:07:07 +01:00
Martin Kroeker	8e872a91a9	Fix erroneous mapping of SUM kernels to ASUM	2024-02-27 11:28:50 +01:00
Martin Kroeker	6699227d45	Merge pull request #4525 from XiWeiGu/loongarch64_fixed_kernel_regress_skx_avx LoongArch64: Fixed utest kernel_regress:skx_avx	2024-02-26 09:49:34 +01:00
gxw	8dea25ffff	LoongArch64: Fixed utest kernel_regress:skx_avx	2024-02-26 02:04:37 -05:00
Martin Kroeker	7d506984fa	fix assignment of default CSUM kernel	2024-02-25 17:57:11 +01:00
Martin Kroeker	12787775d9	add csum/zsum kernels (trivially derived from the asum ones)	2024-02-25 17:55:36 +01:00
Martin Kroeker	8f8ef3492a	Add CSUM and ZSUM kernels (trivially derived from their existing ASUM counterparts)	2024-02-24 23:57:50 +01:00
Martin Kroeker	be5e18c6f9	Add kernel definitions for CSUM and ZSUM	2024-02-24 23:55:43 +01:00
gxw	990507e3b8	LoongArch64: Opt zgemv with LASX	2024-02-22 11:58:02 +08:00
gxw	d51ffec3a2	LoongArch64: Opt cgemv with LASX	2024-02-22 11:56:04 +08:00
pengxu	4787a55c64	Optimized cgemm kernel 16x4 LASX for LoongArch	2024-02-21 15:28:47 +08:00
Sergei Lewis	ba17758c02	fix axpy implementations where y has a stride of 0	2024-02-16 16:00:38 +00:00
Dmitry Mikushin	d0f5dc763b	Adding USE_GEMM3M macro to kernel targets, so that the *gemm3m functions and parameters can be included into the gotoblas structure. Fixes #4500	2024-02-12 02:29:58 +01:00
Sergei Lewis	ff1523163f	Fix axpy test hangs when n==0. Reenable zaxpy_vector kernel for C910V.	2024-02-09 12:59:14 +00:00
pengxu	fe3da43b7d	Optimized zgemm kernel 8*4 LASX, 4*4 LSX and cgemm kernel 8*4 LSX for LoongArch	2024-02-06 11:49:01 +08:00
Martin Kroeker	e5d2725e5a	Merge pull request #4185 from XiWeiGu/mips_enable_msa MIPS: Enable MSA	2024-02-05 15:50:16 +01:00
Martin Kroeker	b537528feb	Merge pull request #4480 from XiWeiGu/loongarch64-fixed-{s/d}amin-lsx LoongArch64: Fixed {s/d}amin LSX optimization	2024-02-05 06:24:50 +01:00
Martin Kroeker	6d8a273cca	Handle zero increment(s) in C910V ?AXPBY (#4483) * Handle zero increment(s)	2024-02-04 22:07:51 +01:00
Martin Kroeker	dbcf4f8b7d	Merge pull request #4479 from XiWeiGu/loongarch-opt-axpby Loongarch opt axpby	2024-02-04 19:50:28 +01:00
Martin Kroeker	dc802dd637	Merge pull request #4474 from ChipKerchner/sgemmIncopy_PR Vectorize in-copy packing/copying for SGEMM - up to 4X faster.	2024-02-04 18:51:09 +01:00
gxw	adde725321	LoongArch64: Fixed {s/d}amin LSX optimization	2024-02-04 14:44:47 +08:00
gxw	7bc93d95a1	LoongArch64: Opt {c/z}axpby	2024-02-04 11:23:31 +08:00
gxw	1e1f487dc7	LoongArch64: Fixed {s/d}axpby	2024-02-04 09:41:37 +08:00
Martin Kroeker	4d8dee508c	temporarily disable the CAXPY/ZAXPY kernels	2024-02-04 01:05:03 +01:00
austinpagan	87ba528d8b	Changed C files to straighten out indentation. Removed commented lines from other file.	2024-02-01 18:46:07 -06:00
austinpagan	461cf9083c	Merge remote-tracking branch 'origin/develop' into cgemm_zgemm_c_code	2024-02-01 12:40:04 -06:00
austinpagan	ddac75e0ef	Adding .C versions of CGEMM and ZGEMM	2024-02-01 12:24:25 -06:00
Chip Kerchner	2bb7ea64a1	Only vectorize 64-bit version for Power8.	2024-02-01 08:11:43 -06:00
Sergei Lewis	3ffd6868d7	Merge branch 'develop' into dev/slewis/merge-from-riscv	2024-02-01 11:29:41 +00:00
Sergei Lewis	a3b0ef6596	Restore riscv64 fixes from develop branch: dot product double precision accumulation, zscal NaN handling	2024-02-01 10:32:00 +00:00
Martin Kroeker	d1343302bd	Merge pull request #4465 from XiWeiGu/utest-zscal utest: Add tests for zscal	2024-01-31 14:19:19 +01:00
gxw	969601a1dc	X86_64: Fixed bug in zscal Fixed handling of NAN and INF arguments when inc is greater than 1.	2024-01-31 11:23:59 +08:00
Martin Kroeker	98c9ff3194	Merge pull request #4464 from XiWeiGu/loongarch64-zscal LoongArch64: Handle NAN and INF	2024-01-30 22:53:29 +01:00
Chip Kerchner	09bb48d1b9	Vectorize in-copy packing/copying for SGEMM - 4X faster.	2024-01-30 09:13:16 -06:00
gxw	83ce97a4ca	LoongArch64: Handle NAN and INF	2024-01-30 17:17:30 +08:00
gxw	a79d117405	LoongArch64: Fixed bug for {s/d}amin	2024-01-30 11:32:57 +08:00
Sergei Lewis	1093def0d1	Merge branch 'risc-v' into develop	2024-01-29 11:11:39 +00:00
Martin Kroeker	889c5d026a	Merge pull request #4456 from kseniyazaytseva/riscv-rvv10 Fix BLAS and LAPACK tests for RVV 1.0 target, update to 0.12.0 intrinsics	2024-01-26 13:31:09 +01:00
Martin Kroeker	4e2a32ff51	Merge pull request #4454 from kseniyazaytseva/riscv-rvv07 Fix BLAS and LAPACK tests for C910V and RISCV64_ZVL256B targets	2024-01-26 11:40:46 +01:00
gxw	276e3ebf9e	LoongArch64: Add dzamax and dzamin opt	2024-01-26 10:03:50 +08:00
Martin Kroeker	a21b2fa5e4	Merge pull request #4452 from kseniyazaytseva/riscv-generic Fix BLAS, BLAS-like functions and Generic RISC-V kernels	2024-01-24 17:52:25 +01:00
Andrey Sokolov	9c49a81d54	Resolve conflicts	2024-01-23 19:08:53 +03:00
kseniyazaytseva	e1afb23811	Fix BLAS and LAPACK tests for C910V and RISCV64_ZVL256B targets * Fixed bugs in dgemm, [a]min\max, asum kernels * Added zero checks for BLAS kernels * Added dsdot implementation for RVV 0.7.1 * Fixed bugs in _vector files for C910V and RISCV64_ZVL256B targets * Added additional definitions for RISCV64_ZVL256B target	2024-01-23 19:01:31 +03:00
Octavian Maghiar	deecfb1a39	Merge branch 'risc-v' into img-riscv64-zvl128b	2024-01-19 12:26:38 +00:00
kseniyazaytseva	5222b5fc18	Added axpby kernels for GENERIC RISC-V target	2024-01-18 23:22:26 +03:00
kseniyazaytseva	ff41cf5c49	Fix BLAS, BLAS-like functions and Generic RISC-V kernels * Fixed gemmt, imatcopy, zimatcopy_cnc functions * Fixed cblas_cscal testing in ctest * Removed rotmg unreachable code * Added zero size checks	2024-01-18 23:19:52 +03:00
kseniyazaytseva	b193ea3d7b	Fix BLAS and LAPACK tests for RVV 1.0 target, update to 0.12.0 intrinsics * Update intrinsics API to 0.12.0 version (Stride Segment Loads/Stores) * Fixed nrm2, axpby, ncopy, zgemv and scal kernels * Added zero size checks	2024-01-18 22:14:32 +03:00
Martin Kroeker	88e994116c	Merge pull request #4354 from imaginationtech/img-rvv-kernel-generator [RISC-V] Improve RVV kernel generator LMUL usage	2024-01-17 15:19:37 +01:00
Dirreke	ec89466e14	Add CSKY support	2024-01-16 23:45:06 +08:00
Sergei Lewis	9edb805e64	fix builds with t-head toolchains that use old versions of the intrinsics spec	2024-01-16 14:33:08 +00:00
Martin Kroeker	0d2e486edf	Handle NAN and INF	2024-01-15 11:18:59 +01:00
Martin Kroeker	5f5b7c4f45	Merge pull request #4423 from martin-frbg/issue4422 Check compiler support for AVX512BF16 and base COL/SPR kernel choice on that	2024-01-12 16:30:50 +01:00
Martin Kroeker	f31bea07dd	Merge pull request #4419 from martin-frbg/issue4413 [WIP] Add fixes and utests for ZSCAL with NaN or Inf arguments	2024-01-12 14:27:08 +01:00
Martin Kroeker	20413ee6ec	Update zscal.c	2024-01-12 13:11:13 +01:00
Martin Kroeker	b57627c27f	Handle NAN and INF	2024-01-12 12:03:08 +01:00
Martin Kroeker	995a990e24	Make AVX512 BFLOAT16 kernels conditional on compiler capability	2024-01-12 00:12:46 +01:00
Martin Kroeker	7df363e1e2	temporarily disable the MSA C/ZSCAL kernels	2024-01-12 00:08:52 +01:00
Chip-Kerchner	058dd2a4cb	Replace two vector loads with one vector pair load and fix endianness of stores - DGEMM versions.	2024-01-08 14:16:09 -06:00
Martin Kroeker	1c31f56e5a	Handle NAN	2024-01-08 16:11:25 +01:00
Martin Kroeker	7ee1ee38e2	Handle NaN in input	2024-01-08 14:20:07 +01:00
Martin Kroeker	f637e12713	Handle INF and NAN	2024-01-08 09:52:38 +01:00
Martin Kroeker	25b0c48082	Update zscal.c	2024-01-08 09:49:18 +01:00
Martin Kroeker	5e7f714e93	Update zscal.c	2024-01-08 08:17:40 +01:00
Martin Kroeker	cf8b03ae8b	Use NAN rather than SNAN for portability	2024-01-07 23:09:57 +01:00
Martin Kroeker	f0808d856b	Handle NAN in input	2024-01-07 20:27:29 +01:00
Martin Kroeker	acf17a825d	Handle NAN in input	2024-01-07 20:26:16 +01:00
Martin Kroeker	c9df62e883	Fix handling of NAN	2024-01-07 17:49:40 +01:00
Martin Kroeker	def4996170	Fix handling of NAN and INF arguments	2024-01-07 15:29:42 +01:00
Martin Kroeker	519b40fad9	Merge pull request #4398 from yinshiyou/la-dev Add Optimizations for LoongArch.	2023-12-30 19:51:08 +01:00
pengxu	a5d0d21378	loongarch64: Add zgemm and cgemm optimization	2023-12-29 18:06:26 +08:00
gxw	546f13558c	loongarch64: Add {c/z}swap and {c/z}sum optimization	2023-12-29 17:30:57 +08:00
Hao Chen	edabb93668	loongarch64: Refine axpby optimization functions.	2023-12-29 17:30:57 +08:00
Hao Chen	1ec5dded43	loongarch64: Add c/zrot optimization functions. Signed-off-by: Hao Chen <chenhao@loongson.cn>	2023-12-29 17:30:57 +08:00
Hao Chen	3c53ded315	loongarch64: Add c/znrm2 optimization functions.	2023-12-29 17:30:57 +08:00
Hao Chen	fbd612f8c4	loongarch64: Add ic/zamin optimization functions.	2023-12-29 17:30:57 +08:00
Hao Chen	d97272cb35	loongarch64: Add c/zdot optimization functions.	2023-12-29 17:30:57 +08:00
Hao Chen	65a0aeb128	loongarch64: Add c/zcopy optimization functions. Signed-off-by: Hao Chen <chenhao@loongson.cn>	2023-12-29 17:30:57 +08:00
Hao Chen	2a34fb4b80	loongarch64: Add and refine scal optimization functions. Signed-off-by: Hao Chen <chenhao@loongson.cn>	2023-12-29 17:30:57 +08:00
Hao Chen	8785e948b5	loongarch64: Add camin optimization function.	2023-12-29 17:30:57 +08:00
Hao Chen	0753848e03	loongarch64: Refine and add axpy optimization functions. Signed-off-by: Hao Chen <chenhao@loongson.cn>	2023-12-29 17:30:57 +08:00
Hao Chen	06fd5b5995	loongarch64: Add and Refine asum optimization functions.	2023-12-29 17:30:57 +08:00
guxiwei	e771be185e	Optimize copy functions with lsx. Signed-off-by: Hao Chen <chenhao@loongson.cn>	2023-12-29 17:30:57 +08:00
Hao Chen	179ed51d3b	Add dgemm_kernel_8x4.S file.	2023-12-29 17:30:57 +08:00
Hao Chen	173a65d4e6	loongarch64: Add and refine iamax optimization functions.	2023-12-29 17:30:57 +08:00
zhoupeng	ea70e165c7	loongarch64: Refine rot optimization.	2023-12-29 17:30:57 +08:00
zhoupeng	116aee7527	loongarch64: Refine imin optimization.	2023-12-29 17:30:57 +08:00
zhoupeng	8be2654193	loongarch64: Refine imax optimization.	2023-12-29 17:30:57 +08:00
zhoupeng	154baad454	loongarch64: Refine iamin optimization.	2023-12-29 17:30:57 +08:00
Shiyou Yin	36c12c4971	loongarch64: Refine copy,swap,nrm2,sum optimization.	2023-12-29 17:30:57 +08:00
Shiyou Yin	c6996a80e9	loongarch64: Refine amax,amin,max,min optimization.	2023-12-29 17:30:57 +08:00
Chris Sidebottom	ecae1389df	Reduce duplication in kernel definitions These files are exactly the same, so I believe we can reduce these files down. Other files require a slightly more complex unpicking.	2023-12-23 12:39:53 +00:00
Chris Sidebottom	60e66725e4	Use numeric labels to allow repeated inlining	2023-12-19 13:11:06 +00:00
Chris Sidebottom	7a4fef4f60	Tweak SVE dot kernel This changes the SVE dot kernel to only predicate when necessary as well as streamlining the assembly a bit. The benchmarks seem to indicate this can improve performance by ~33%.	2023-12-19 12:08:54 +00:00
Martin Kroeker	f06b535566	Use C kernel for dgemv_t due to limitations of the old assembly one	2023-12-15 09:58:44 +01:00
barracuda156	d9653af018	KERNEL.PPC970, KERNEL.PPCG4: unbreak CMake parsing Fixes: https://github.com/OpenMathLib/OpenBLAS/issues/4366	2023-12-14 12:00:11 +08:00
Chip-Kerchner	93747fb377	Merge remote-tracking branch 'origin/develop' into power10Copies	2023-12-12 09:32:49 -06:00
Chip-Kerchner	4e738e561a	Replace two vector loads with one vector pair load and fix endianness of stores.	2023-12-08 12:36:08 -06:00
yancheng	d32f38fb37	loongarch64: Add optimizations for nrm2.	2023-12-07 14:36:26 +08:00
yancheng	f9b468990e	loongarch64: Add optimizations for rot.	2023-12-07 14:36:26 +08:00
yancheng	c80e7e27d1	loongarch64: Add optimizations for sum and asum.	2023-12-07 14:36:26 +08:00
yancheng	d4c96a35a8	loongarch64: Add optimizations for axpy and axpby.	2023-12-07 14:36:26 +08:00
yancheng	360acc0a41	loongarch64: Add optimizations for swap.	2023-12-07 14:36:26 +08:00
yancheng	174c25766b	loongarch64: Add optimizations for copy.	2023-12-07 14:36:26 +08:00
yancheng	49829b2b7d	loongarch64: Add optimizations for iamin.	2023-12-07 14:36:07 +08:00
yancheng	be83f5e4e0	loongarch64: Add optimizations for iamax.	2023-12-07 14:36:07 +08:00
yancheng	e3fb2b5afa	loongarch64: Add optimizations for imin.	2023-12-07 14:36:07 +08:00
yancheng	e46b48e372	loongarch64: Add optimizations for imax.	2023-12-07 14:36:07 +08:00
yancheng	702fc1d56d	loongarch64: Add optimization for min.	2023-12-07 14:36:07 +08:00
yancheng	346b384d1c	loongarch64: Add optimization for max.	2023-12-07 14:36:07 +08:00
yancheng	ff2ecc6cda	loongarch64: Add optimization for amin.	2023-12-07 14:36:07 +08:00
yancheng	265b5f2e80	loongarch64: Add optimizations for amax.	2023-12-07 14:36:07 +08:00
yancheng	993ede7c70	loongarch64: Add optimizations for scal.	2023-12-07 14:36:07 +08:00
Octavian Maghiar	4a12cf53ec	[RISC-V] Improve RVV kernel generator LMUL usage The RVV kernel generation script uses the provided LMUL to increase the number of accumulator registers. Since the effect of the LMUL is to group together the vector registers into larger ones, it actually should be used as a multiplier in the calculation of vlenmax. At the moment, no matter what LMUL is provided, the generated kernels would only set the maximum number of vector elements equal to VLEN/SEW. Commit changes the use of LMUL to properly adjust vlenmax. Note that an increase in LMUL results in a decrease in the number of effective vector registers.	2023-12-04 11:13:35 +00:00
Octavian Maghiar	e4586e81b8	[RISC-V] Add RISC-V Vector 128-bit target Current RVV x280 target depends on vlen=512-bits for Level 3 operations. Commit adds generic target that supports vlen=128-bits. New target uses the same scalable kernels as x280 for Level 1&2 operations, and autogenerated kernels for Level 3 operations. Functional correctness of Level 3 operations tested on vlen=128-bits using QEMU v8.1.1 for ctests and BLAS-Tester.	2023-12-04 11:02:18 +00:00
Martin Kroeker	39bf8ece20	Merge pull request #4340 from yinshiyou/la-dev Add some refines and optimizations for LoongArch.	2023-11-29 08:22:25 +01:00

2309 Commits