OpenBLAS

Commit Graph

Author	SHA1	Message	Date
Sergei Lewis	ba17758c02	fix axpy implementations where y has a stride of 0	2024-02-16 16:00:38 +00:00
Dmitry Mikushin	d0f5dc763b	Adding USE_GEMM3M macro to kernel targets, so that the *gemm3m functions and parameters can be included into the gotoblas structure. Fixes #4500	2024-02-12 02:29:58 +01:00
Sergei Lewis	ff1523163f	Fix axpy test hangs when n==0. Reenable zaxpy_vector kernel for C910V.	2024-02-09 12:59:14 +00:00
pengxu	fe3da43b7d	Optimized zgemm kernel 8*4 LASX, 4*4 LSX and cgemm kernel 8*4 LSX for LoongArch	2024-02-06 11:49:01 +08:00
Martin Kroeker	e5d2725e5a	Merge pull request #4185 from XiWeiGu/mips_enable_msa MIPS: Enable MSA	2024-02-05 15:50:16 +01:00
Martin Kroeker	b537528feb	Merge pull request #4480 from XiWeiGu/loongarch64-fixed-{s/d}amin-lsx LoongArch64: Fixed {s/d}amin LSX optimization	2024-02-05 06:24:50 +01:00
Martin Kroeker	6d8a273cca	Handle zero increment(s) in C910V ?AXPBY (#4483) * Handle zero increment(s)	2024-02-04 22:07:51 +01:00
Martin Kroeker	dbcf4f8b7d	Merge pull request #4479 from XiWeiGu/loongarch-opt-axpby Loongarch opt axpby	2024-02-04 19:50:28 +01:00
Martin Kroeker	dc802dd637	Merge pull request #4474 from ChipKerchner/sgemmIncopy_PR Vectorize in-copy packing/copying for SGEMM - up to 4X faster.	2024-02-04 18:51:09 +01:00
gxw	adde725321	LoongArch64: Fixed {s/d}amin LSX optimization	2024-02-04 14:44:47 +08:00
gxw	7bc93d95a1	LoongArch64: Opt {c/z}axpby	2024-02-04 11:23:31 +08:00
gxw	1e1f487dc7	LoongArch64: Fixed {s/d}axpby	2024-02-04 09:41:37 +08:00
Martin Kroeker	4d8dee508c	temporarily disable the CAXPY/ZAXPY kernels	2024-02-04 01:05:03 +01:00
austinpagan	87ba528d8b	Changed C files to straighten out indentation. Removed commented lines from other file.	2024-02-01 18:46:07 -06:00
austinpagan	461cf9083c	Merge remote-tracking branch 'origin/develop' into cgemm_zgemm_c_code	2024-02-01 12:40:04 -06:00
austinpagan	ddac75e0ef	Adding .C versions of CGEMM and ZGEMM	2024-02-01 12:24:25 -06:00
Chip Kerchner	2bb7ea64a1	Only vectorize 64-bit version for Power8.	2024-02-01 08:11:43 -06:00
Sergei Lewis	3ffd6868d7	Merge branch 'develop' into dev/slewis/merge-from-riscv	2024-02-01 11:29:41 +00:00
Sergei Lewis	a3b0ef6596	Restore riscv64 fixes from develop branch: dot product double precision accumulation, zscal NaN handling	2024-02-01 10:32:00 +00:00
Martin Kroeker	d1343302bd	Merge pull request #4465 from XiWeiGu/utest-zscal utest: Add tests for zscal	2024-01-31 14:19:19 +01:00
gxw	969601a1dc	X86_64: Fixed bug in zscal Fixed handling of NAN and INF arguments when inc is greater than 1.	2024-01-31 11:23:59 +08:00
Martin Kroeker	98c9ff3194	Merge pull request #4464 from XiWeiGu/loongarch64-zscal LoongArch64: Handle NAN and INF	2024-01-30 22:53:29 +01:00
Chip Kerchner	09bb48d1b9	Vectorize in-copy packing/copying for SGEMM - 4X faster.	2024-01-30 09:13:16 -06:00
gxw	83ce97a4ca	LoongArch64: Handle NAN and INF	2024-01-30 17:17:30 +08:00
gxw	a79d117405	LoogArch64: Fixed bug for {s/d}amin	2024-01-30 11:32:57 +08:00
Sergei Lewis	1093def0d1	Merge branch 'risc-v' into develop	2024-01-29 11:11:39 +00:00
Martin Kroeker	889c5d026a	Merge pull request #4456 from kseniyazaytseva/riscv-rvv10 Fix BLAS and LAPACK tests for RVV 1.0 target, update to 0.12.0 intrincics	2024-01-26 13:31:09 +01:00
Martin Kroeker	4e2a32ff51	Merge pull request #4454 from kseniyazaytseva/riscv-rvv07 Fix BLAS and LAPACK tests for C910V and RISCV64_ZVL256B targets	2024-01-26 11:40:46 +01:00
gxw	276e3ebf9e	LoongArch64: Add dzamax and dzamin opt	2024-01-26 10:03:50 +08:00
Martin Kroeker	a21b2fa5e4	Merge pull request #4452 from kseniyazaytseva/riscv-generic Fix BLAS, BLAS-like functions and Generic RISC-V kernels	2024-01-24 17:52:25 +01:00
Andrey Sokolov	9c49a81d54	Resolve conflicts	2024-01-23 19:08:53 +03:00
kseniyazaytseva	e1afb23811	Fix BLAS and LAPACK tests for C910V and RISCV64_ZVL256B targets * Fixed bugs in dgemm, [a]min\max, asum kernels * Added zero checks for BLAS kernels * Added dsdot implementation for RVV 0.7.1 * Fixed bugs in _vector files for C910V and RISCV64_ZVL256B targets * Added additional definitions for RISCV64_ZVL256B target	2024-01-23 19:01:31 +03:00
Octavian Maghiar	deecfb1a39	Merge branch 'risc-v' into img-riscv64-zvl128b	2024-01-19 12:26:38 +00:00
kseniyazaytseva	5222b5fc18	Added axpby kernels for GENERIC RISC-V target	2024-01-18 23:22:26 +03:00
kseniyazaytseva	ff41cf5c49	Fix BLAS, BLAS-like functions and Generic RISC-V kernels * Fixed gemmt, imatcopy, zimatcopy_cnc functions * Fixed cblas_cscal testing in ctest * Removed rotmg unreacheble code * Added zero size checks	2024-01-18 23:19:52 +03:00
kseniyazaytseva	b193ea3d7b	Fix BLAS and LAPACK tests for RVV 1.0 target, update to 0.12.0 intrincics * Update intrincics API to 0.12.0 version (Stride Segment Loads/Stores) * Fixed nrm2, axpby, ncopy, zgemv and scal kernels * Added zero size checks	2024-01-18 22:14:32 +03:00
Martin Kroeker	88e994116c	Merge pull request #4354 from imaginationtech/img-rvv-kernel-generator [RISC-V] Improve RVV kernel generator LMUL usage	2024-01-17 15:19:37 +01:00
Dirreke	ec89466e14	Add CSKY support	2024-01-16 23:45:06 +08:00
Sergei Lewis	9edb805e64	fix builds with t-head toolchains that use old versions of the intrinsics spec	2024-01-16 14:33:08 +00:00
Martin Kroeker	0d2e486edf	Handle NAN and INF	2024-01-15 11:18:59 +01:00
Martin Kroeker	5f5b7c4f45	Merge pull request #4423 from martin-frbg/issue4422 Check compiler support for AVX512BF16 and base COL/SPR kernel choice on that	2024-01-12 16:30:50 +01:00
Martin Kroeker	f31bea07dd	Merge pull request #4419 from martin-frbg/issue4413 [WIP] Add fixes and utests for ZSCAL with NaN or Inf arguments	2024-01-12 14:27:08 +01:00
Martin Kroeker	20413ee6ec	Update zscal.c	2024-01-12 13:11:13 +01:00
Martin Kroeker	b57627c27f	Handle NAN and INF	2024-01-12 12:03:08 +01:00
Martin Kroeker	995a990e24	Make AVX512 BFLOAT16 kernels conditional on compiler capability	2024-01-12 00:12:46 +01:00
Martin Kroeker	7df363e1e2	temporarily disable the MSA C/ZSCAL kernels	2024-01-12 00:08:52 +01:00
Chip-Kerchner	058dd2a4cb	Replace two vector loads with one vector pair load and fix endianess of stores - DGEMM versions.	2024-01-08 14:16:09 -06:00
Martin Kroeker	1c31f56e5a	Handle NAN	2024-01-08 16:11:25 +01:00
Martin Kroeker	7ee1ee38e2	Handle NaN in input	2024-01-08 14:20:07 +01:00
Martin Kroeker	f637e12713	Handle INF and NAN	2024-01-08 09:52:38 +01:00
Martin Kroeker	25b0c48082	Update zscal.c	2024-01-08 09:49:18 +01:00
Martin Kroeker	5e7f714e93	Update zscal.c	2024-01-08 08:17:40 +01:00
Martin Kroeker	cf8b03ae8b	Use NAN rather than SNAN for portability	2024-01-07 23:09:57 +01:00
Martin Kroeker	f0808d856b	Handle NAN in input	2024-01-07 20:27:29 +01:00
Martin Kroeker	acf17a825d	Handle NAN in input	2024-01-07 20:26:16 +01:00
Martin Kroeker	c9df62e883	Fix handling of NAN	2024-01-07 17:49:40 +01:00
Martin Kroeker	def4996170	Fix handling of NAN and INF arguments	2024-01-07 15:29:42 +01:00
Martin Kroeker	519b40fad9	Merge pull request #4398 from yinshiyou/la-dev Add Optimizations for LoongArch.	2023-12-30 19:51:08 +01:00
pengxu	a5d0d21378	loongarch64: Add zgemm and cgemm optimization	2023-12-29 18:06:26 +08:00
gxw	546f13558c	loongarch64: Add {c/z}swap and {c/z}sum optimization	2023-12-29 17:30:57 +08:00
Hao Chen	edabb93668	loongarch64: Refine axpby optimization functions.	2023-12-29 17:30:57 +08:00
Hao Chen	1ec5dded43	loongarch64: Add c/zrot optimization functions. Signed-off-by: Hao Chen <chenhao@loongson.cn>	2023-12-29 17:30:57 +08:00
Hao Chen	3c53ded315	loongarch64: Add c/znrm2 optimization functions.	2023-12-29 17:30:57 +08:00
Hao Chen	fbd612f8c4	loongarch64: Add ic/zamin optimization functions.	2023-12-29 17:30:57 +08:00
Hao Chen	d97272cb35	loongarch64: Add c/zdot optimization functions.	2023-12-29 17:30:57 +08:00
Hao Chen	65a0aeb128	loongarch64: Add c/zcopy optimization functions. Signed-off-by: Hao Chen <chenhao@loongson.cn>	2023-12-29 17:30:57 +08:00
Hao Chen	2a34fb4b80	loongarch64: Add and refine scal optimization functions. Signed-off-by: Hao Chen <chenhao@loongson.cn>	2023-12-29 17:30:57 +08:00
Hao Chen	8785e948b5	loongarch64: Add camin optimization function.	2023-12-29 17:30:57 +08:00
Hao Chen	0753848e03	loongarch64: Refine and add axpy optimization functions. Signed-off-by: Hao Chen <chenhao@loongson.cn>	2023-12-29 17:30:57 +08:00
Hao Chen	06fd5b5995	loongarch64: Add and Refine asum optimization functions.	2023-12-29 17:30:57 +08:00
guxiwei	e771be185e	Optimize copy functions with lsx. Signed-off-by: Hao Chen <chenhao@loongson.cn>	2023-12-29 17:30:57 +08:00
Hao Chen	179ed51d3b	Add dgemm_kernel_8x4.S file.	2023-12-29 17:30:57 +08:00
Hao Chen	173a65d4e6	loongarch64: Add and refine iamax optimization functions.	2023-12-29 17:30:57 +08:00
zhoupeng	ea70e165c7	loongarch64: Refine rot optimization.	2023-12-29 17:30:57 +08:00
zhoupeng	116aee7527	loongarch64: Refine imin optimization.	2023-12-29 17:30:57 +08:00
zhoupeng	8be2654193	loongarch64: Refine imax optimization.	2023-12-29 17:30:57 +08:00
zhoupeng	154baad454	loongarch64: Refine iamin optimization.	2023-12-29 17:30:57 +08:00
Shiyou Yin	36c12c4971	loongarch64: Refine copy,swap,nrm2,sum optimization.	2023-12-29 17:30:57 +08:00
Shiyou Yin	c6996a80e9	loongarch64: Refine amax,amin,max,min optimization.	2023-12-29 17:30:57 +08:00
Chris Sidebottom	ecae1389df	Reduce duplication in kernel definitions These files are exactly the same, so I believe we can reduce these files down. Other files require a slightly more complex unpicking.	2023-12-23 12:39:53 +00:00
Chris Sidebottom	60e66725e4	Use numeric labels to allow repeated inlining	2023-12-19 13:11:06 +00:00
Chris Sidebottom	7a4fef4f60	Tweak SVE dot kernel This changes the SVE dot kernel to only predicate when necessary as well as streamlining the assembly a bit. The benchmarks seem to indicate this can improve performance by ~33%.	2023-12-19 12:08:54 +00:00
Martin Kroeker	f06b535566	Use C kernel for dgemv_t due to limitations of the old assembly one	2023-12-15 09:58:44 +01:00
barracuda156	d9653af018	KERNEL.PPC970, KERNEL.PPCG4: unbreak CMake parsing Fixes: https://github.com/OpenMathLib/OpenBLAS/issues/4366	2023-12-14 12:00:11 +08:00
Chip-Kerchner	93747fb377	Merge remote-tracking branch 'origin/develop' into power10Copies	2023-12-12 09:32:49 -06:00
Chip-Kerchner	4e738e561a	Replace two vector loads with one vector pair load and fix endianess of stores.	2023-12-08 12:36:08 -06:00
yancheng	d32f38fb37	loongarch64: Add optimizations for nrm2.	2023-12-07 14:36:26 +08:00
yancheng	f9b468990e	loongarch64: Add optimizations for rot.	2023-12-07 14:36:26 +08:00
yancheng	c80e7e27d1	loongarch64: Add optimizations for sum and asum.	2023-12-07 14:36:26 +08:00
yancheng	d4c96a35a8	loongarch64: Add optimizations for axpy and axpby.	2023-12-07 14:36:26 +08:00
yancheng	360acc0a41	loongarch64: Add optimizations for swap.	2023-12-07 14:36:26 +08:00
yancheng	174c25766b	loongarch64: Add optimizations for copy.	2023-12-07 14:36:26 +08:00
yancheng	49829b2b7d	loongarch64: Add optimizations for iamin.	2023-12-07 14:36:07 +08:00
yancheng	be83f5e4e0	loongarch64: Add optimizations for iamax.	2023-12-07 14:36:07 +08:00
yancheng	e3fb2b5afa	loongarch64: Add optimizations for imin.	2023-12-07 14:36:07 +08:00
yancheng	e46b48e372	loongarch64: Add optimizations for imax.	2023-12-07 14:36:07 +08:00
yancheng	702fc1d56d	loongarch64: Add optimization for min.	2023-12-07 14:36:07 +08:00
yancheng	346b384d1c	loongarch64: Add optimization for max.	2023-12-07 14:36:07 +08:00
yancheng	ff2ecc6cda	loongarch64: Add optimization for amin.	2023-12-07 14:36:07 +08:00
yancheng	265b5f2e80	loongarch64: Add optimizations for amax.	2023-12-07 14:36:07 +08:00

2213 Commits