OpenBLAS

Author	SHA1	Message	Date
Andrey Sokolov	9c49a81d54	Resolve conflicts	2024-01-23 19:08:53 +03:00
kseniyazaytseva	e1afb23811	Fix BLAS and LAPACK tests for C910V and RISCV64_ZVL256B targets * Fixed bugs in dgemm, [a]min\max, asum kernels * Added zero checks for BLAS kernels * Added dsdot implementation for RVV 0.7.1 * Fixed bugs in _vector files for C910V and RISCV64_ZVL256B targets * Added additional definitions for RISCV64_ZVL256B target	2024-01-23 19:01:31 +03:00
Octavian Maghiar	deecfb1a39	Merge branch 'risc-v' into img-riscv64-zvl128b	2024-01-19 12:26:38 +00:00
kseniyazaytseva	5222b5fc18	Added axpby kernels for GENERIC RISC-V target	2024-01-18 23:22:26 +03:00
kseniyazaytseva	ff41cf5c49	Fix BLAS, BLAS-like functions and Generic RISC-V kernels * Fixed gemmt, imatcopy, zimatcopy_cnc functions * Fixed cblas_cscal testing in ctest * Removed rotmg unreachable code * Added zero size checks	2024-01-18 23:19:52 +03:00
kseniyazaytseva	b193ea3d7b	Fix BLAS and LAPACK tests for RVV 1.0 target, update to 0.12.0 intrinsics * Update intrinsics API to 0.12.0 version (Stride Segment Loads/Stores) * Fixed nrm2, axpby, ncopy, zgemv and scal kernels * Added zero size checks	2024-01-18 22:14:32 +03:00
Martin Kroeker	88e994116c	Merge pull request #4354 from imaginationtech/img-rvv-kernel-generator [RISC-V] Improve RVV kernel generator LMUL usage	2024-01-17 15:19:37 +01:00
Dirreke	ec89466e14	Add CSKY support	2024-01-16 23:45:06 +08:00
Sergei Lewis	9edb805e64	fix builds with t-head toolchains that use old versions of the intrinsics spec	2024-01-16 14:33:08 +00:00
Martin Kroeker	0d2e486edf	Handle NAN and INF	2024-01-15 11:18:59 +01:00
Martin Kroeker	5f5b7c4f45	Merge pull request #4423 from martin-frbg/issue4422 Check compiler support for AVX512BF16 and base COL/SPR kernel choice on that	2024-01-12 16:30:50 +01:00
Martin Kroeker	f31bea07dd	Merge pull request #4419 from martin-frbg/issue4413 [WIP] Add fixes and utests for ZSCAL with NaN or Inf arguments	2024-01-12 14:27:08 +01:00
Martin Kroeker	20413ee6ec	Update zscal.c	2024-01-12 13:11:13 +01:00
Martin Kroeker	b57627c27f	Handle NAN and INF	2024-01-12 12:03:08 +01:00
Martin Kroeker	995a990e24	Make AVX512 BFLOAT16 kernels conditional on compiler capability	2024-01-12 00:12:46 +01:00
Martin Kroeker	7df363e1e2	temporarily disable the MSA C/ZSCAL kernels	2024-01-12 00:08:52 +01:00
Chip-Kerchner	058dd2a4cb	Replace two vector loads with one vector pair load and fix endianness of stores - DGEMM versions.	2024-01-08 14:16:09 -06:00
Martin Kroeker	1c31f56e5a	Handle NAN	2024-01-08 16:11:25 +01:00
Martin Kroeker	7ee1ee38e2	Handle NaN in input	2024-01-08 14:20:07 +01:00
Martin Kroeker	f637e12713	Handle INF and NAN	2024-01-08 09:52:38 +01:00
Martin Kroeker	25b0c48082	Update zscal.c	2024-01-08 09:49:18 +01:00
Martin Kroeker	5e7f714e93	Update zscal.c	2024-01-08 08:17:40 +01:00
Martin Kroeker	cf8b03ae8b	Use NAN rather than SNAN for portability	2024-01-07 23:09:57 +01:00
Martin Kroeker	f0808d856b	Handle NAN in input	2024-01-07 20:27:29 +01:00
Martin Kroeker	acf17a825d	Handle NAN in input	2024-01-07 20:26:16 +01:00
Martin Kroeker	c9df62e883	Fix handling of NAN	2024-01-07 17:49:40 +01:00
Martin Kroeker	def4996170	Fix handling of NAN and INF arguments	2024-01-07 15:29:42 +01:00
Martin Kroeker	519b40fad9	Merge pull request #4398 from yinshiyou/la-dev Add Optimizations for LoongArch.	2023-12-30 19:51:08 +01:00
pengxu	a5d0d21378	loongarch64: Add zgemm and cgemm optimization	2023-12-29 18:06:26 +08:00
gxw	546f13558c	loongarch64: Add {c/z}swap and {c/z}sum optimization	2023-12-29 17:30:57 +08:00
Hao Chen	edabb93668	loongarch64: Refine axpby optimization functions.	2023-12-29 17:30:57 +08:00
Hao Chen	1ec5dded43	loongarch64: Add c/zrot optimization functions. Signed-off-by: Hao Chen <chenhao@loongson.cn>	2023-12-29 17:30:57 +08:00
Hao Chen	3c53ded315	loongarch64: Add c/znrm2 optimization functions.	2023-12-29 17:30:57 +08:00
Hao Chen	fbd612f8c4	loongarch64: Add ic/zamin optimization functions.	2023-12-29 17:30:57 +08:00
Hao Chen	d97272cb35	loongarch64: Add c/zdot optimization functions.	2023-12-29 17:30:57 +08:00
Hao Chen	65a0aeb128	loongarch64: Add c/zcopy optimization functions. Signed-off-by: Hao Chen <chenhao@loongson.cn>	2023-12-29 17:30:57 +08:00
Hao Chen	2a34fb4b80	loongarch64: Add and refine scal optimization functions. Signed-off-by: Hao Chen <chenhao@loongson.cn>	2023-12-29 17:30:57 +08:00
Hao Chen	8785e948b5	loongarch64: Add camin optimization function.	2023-12-29 17:30:57 +08:00
Hao Chen	0753848e03	loongarch64: Refine and add axpy optimization functions. Signed-off-by: Hao Chen <chenhao@loongson.cn>	2023-12-29 17:30:57 +08:00
Hao Chen	06fd5b5995	loongarch64: Add and Refine asum optimization functions.	2023-12-29 17:30:57 +08:00
guxiwei	e771be185e	Optimize copy functions with lsx. Signed-off-by: Hao Chen <chenhao@loongson.cn>	2023-12-29 17:30:57 +08:00
Hao Chen	179ed51d3b	Add dgemm_kernel_8x4.S file.	2023-12-29 17:30:57 +08:00
Hao Chen	173a65d4e6	loongarch64: Add and refine iamax optimization functions.	2023-12-29 17:30:57 +08:00
zhoupeng	ea70e165c7	loongarch64: Refine rot optimization.	2023-12-29 17:30:57 +08:00
zhoupeng	116aee7527	loongarch64: Refine imin optimization.	2023-12-29 17:30:57 +08:00
zhoupeng	8be2654193	loongarch64: Refine imax optimization.	2023-12-29 17:30:57 +08:00
zhoupeng	154baad454	loongarch64: Refine iamin optimization.	2023-12-29 17:30:57 +08:00
Shiyou Yin	36c12c4971	loongarch64: Refine copy,swap,nrm2,sum optimization.	2023-12-29 17:30:57 +08:00
Shiyou Yin	c6996a80e9	loongarch64: Refine amax,amin,max,min optimization.	2023-12-29 17:30:57 +08:00
Chris Sidebottom	ecae1389df	Reduce duplication in kernel definitions These files are exactly the same, so I believe we can reduce these files down. Other files require a slightly more complex unpicking.	2023-12-23 12:39:53 +00:00

2183 Commits