OpenBLAS

Commit Graph

Author	SHA1	Message	Date
Martin Kroeker	fc8894dd98	Workaround miscompilation by NVIDIA nvc	2023-08-26 00:30:17 +02:00
Martin Kroeker	5720fa02c5	Merge pull request #4168 from Mousius/sve-zgemm-cgemm Use SVE zgemm/cgemm on Arm(R) Neoverse(TM) V1 core	2023-07-27 17:41:45 +02:00
Chris Sidebottom	84a268b6ca	Use SVE zgemm/cgemm on Arm(R) Neoverse(TM) V1 core This patch removes the prefetches from cgemm/zgemm, which improves performance much as it did for sgemm/dgemm in #3868; this means I'm happy to enable this on any applicable cores. I also replicated the unrolling of the copies from sgemm and dgemm.	2023-07-27 14:12:20 +01:00
Chris Sidebottom	730ca04b48	Fix ZHEMM copy for SVE Whilst disambiguating whilelt, I inadvertently used the wrong datatype for offsets, which can be negative. This rectifies that.	2023-07-27 13:27:28 +01:00
Martin Kroeker	849c8806b8	Merge pull request #4161 from Mousius/non-sve-kernels Use latest non-SVE kernels in ARMV8SVE	2023-07-26 15:49:40 +02:00
Chris Sidebottom	24586bc4ff	Disambiguate whilelt	2023-07-25 20:15:44 +01:00
Chris Sidebottom	aea2a4622b	Use latest non-SVE kernels in ARMV8SVE These are generally better and, in some cases, include threading which helps in the cores we're targeting here.	2023-07-25 14:12:26 +01:00
martin-frbg	7976deff80	Fix file permissions (issue 4095)	2023-07-23 20:37:07 +02:00
Martin Kroeker	3d31191b0f	Work around Clang failing to disambiguate SVE intrinsics and add AppleClang crossbuild to MacOS/arm64 DYNAMIC_ARCH in AzureCI (#4140) * Add AppleClang crossbuild to MacOS/arm64 DYNAMIC_ARCH * add casts to disambiguate svwhilelt for clang	2023-07-14 11:06:48 +02:00
Martin Kroeker	72caceb324	Merge pull request #4009 from Mousius/sve-gemm Use SVE kernel for SGEMM/DGEMM on Arm(R) Neoverse(TM) V1	2023-04-22 13:56:45 +02:00
Chris Sidebottom	ec334e69dc	Use SVE kernel for SGEMM/DGEMM on Arm(R) Neoverse(TM) V1 This re-spins #3869 with some additional copy unrolling which helps maintain SYRK performance. After #3868, the SVE kernels represent a pretty good boost. This re-uses ARMV8SVE as a base and I'm going to incrementally move everything to use ARMV8SVE in additional patches (as well as fix up anything that's not already in ARMV8SVE).	2023-04-17 17:38:42 +01:00
Martin Kroeker	44164e3a3d	revert "move alpha out of register 18" (out of PR scope, no SVE on Apple hw)	2023-04-17 14:23:13 +02:00
Martin Kroeker	8be68fa7f4	move declaration of sca to really keep the compiler from throwing it out (for now)	2023-04-15 12:02:39 +02:00
Martin Kroeker	3727672a74	Improve workaround and keep compilers from optimizing it out	2023-04-13 18:07:52 +02:00
Martin Kroeker	108a21e47a	Move ALPHA out of register 18 (reserved on OSX)	2023-04-13 18:05:14 +02:00
Martin Kroeker	0b1acb0ba3	Move ALPHA_I out of register 18 (reserved on OSX)	2023-04-13 18:03:35 +02:00
Martin Kroeker	c7bbad09ad	Move ALPHA_I out of register 18 (reserved on OSX)	2023-04-13 18:00:47 +02:00
Martin Kroeker	cda29633a3	move ALPHA_I out of register 18 (reserved on OSX)	2023-04-13 17:59:48 +02:00
Martin Kroeker	09ace3cf23	Merge pull request #3846 from lilh9598/sbgemm_opt Improve the performance of sbgemm_tcopy on neoversen2	2023-03-26 19:04:57 +02:00
Chris Sidebottom	1361229291	Remove prefetches from SVE kernels This is a precursor to enabling the SVE kernels for Arm(R) Neoverse(TM) V1, which has 256-bit SVE. Testing revealed that the SVE kernel was actually worse in some cases than the existing kernel, which seemed odd; with these prefetches removed, the underlying architecture seems to do a better job 😸	2022-12-16 14:43:09 +00:00
lilianhuang	729af6406f	bugfix for sbgemm_ncopy_8_neoversen2	2022-12-05 05:10:18 -05:00
Chris Sidebottom	eea006a688	Wrap SVE header with __has_include check	2022-12-01 12:07:55 +00:00
Chris Sidebottom	fd4f52c797	Add SVE implementation for sdot/ddot This adds an SVE implementation to sdot/ddot when available, falling back to the previous Advanced SIMD kernel where there's no SVE implementation for the kernel. All the targets were essentially treating `dot_thunderx2t99.c` as the Advanced SIMD implementation so I've renamed it to better fit with the feature detection.	2022-12-01 12:07:50 +00:00
lilianhuang	fdac8a97c1	Add sbgemm_ncopy_8 and sbgemm_tcopy_4	2022-11-29 04:46:14 -05:00
lilianhuang	135718eafc	Improve the performance of sbgemm_tcopy on neoversen2	2022-11-28 04:17:54 -05:00
Chris Sidebottom	4f7b77e08a	Remove unnecessary instructions from Advanced SIMD dot The existing kernel was issuing extra instructions to organise the arguments into the same registers they would usually be in and similarly to put the result into the appropriate register. This has an impact on smaller sized dots and seemed like a quick fix	2022-11-25 16:19:03 +00:00
Martin Kroeker	1688c7da43	change line endings from CRLF to LF	2022-11-16 22:24:01 +01:00
Honglin Zhu	79066b6bf3	Change file name to match the norm and delete useless code.	2022-10-28 17:09:39 +08:00
Honglin Zhu	b00d5b9746	New sbgemm implementation for Neoverse N2 1. Use UZP instructions instead of gather-load and scatter-store instructions to get lower latency. 2. Pad k to a power of 4.	2022-10-26 15:09:41 +08:00
Martin Kroeker	e12d474780	Eliminate uses of CREAL on left-hand side of assignments	2022-07-05 00:01:09 +02:00
Martin Kroeker	9e29598575	workaround fault with ssq=inf,scale=0	2022-07-02 23:47:17 +02:00
Honglin Zhu	123e0dfb62	Neoverse N2 sbgemm: 1. Modify the algorithm to resolve multithreading failures 2. No memory allocation in sbgemm kernel 3. Optimize when alpha == 1.0f	2022-06-29 10:14:21 +08:00
Honglin Zhu	bc3728475f	format code	2022-06-29 10:14:21 +08:00
Honglin Zhu	55d686d41e	neoverse n2 sbgemm: implement ncopy tcopy kernel_8x4	2022-06-29 10:14:21 +08:00
Honglin Zhu	04593bb27c	neoverse n2 sbgemm: init file	2022-06-29 10:14:21 +08:00
Nursultan Zarlyk	1bb7993a97	Fix MSVC ARM64 build. Add generic kernel for ARM64	2022-06-02 16:53:54 +02:00
Martin Kroeker	115bc9b98f	CortexX1 is ARMV8 like A7x	2022-03-28 17:28:29 +02:00
Martin Kroeker	b3b4672c30	Add initial support for Phytium FT2000 series and ARMV9 Cortex 510/710/X1/X2	2022-03-27 15:29:20 +02:00
Martin Kroeker	c1c0d5ce1d	Merge pull request #3492 from binebrank/arm_sve_zgemm SVE zgemm&cgemm (and other BLAS 3 complex)	2022-01-18 21:36:33 +01:00
Bine Brank	19d435b1b3	update armv8sve + contributors	2022-01-18 08:28:31 +01:00
Bine Brank	0fb6cc07bf	fix ztrsm lt/ut copy	2022-01-16 21:39:57 +01:00
Bine Brank	f1315288a8	add sve ztrsm	2022-01-15 22:27:25 +01:00
Bine Brank	aaa2b1a861	fix sve dtrsm kernels	2022-01-15 21:02:14 +01:00
Bine Brank	8071e179f1	add remaining sve trsm copy kernels	2022-01-11 21:16:38 +01:00
Bine Brank	f87468ac91	trsm_lncopy_sve	2022-01-10 21:45:37 +01:00
Bine Brank	e8939b3d30	sve trsmRN and trsmRT	2022-01-10 20:42:20 +01:00
Bine Brank	098672b51b	add trsm_kernel_LT_sve	2022-01-09 20:11:47 +01:00
Bine Brank	be7e55880c	sve trsm_kernel_LN	2022-01-09 19:40:04 +01:00
Sunita Nadampalli	19c8f615dc	OpenBLAS: aarch64: Add neoverse-v1/n2 architecture specifics	2022-01-07 00:28:17 +00:00
Bine Brank	f33543d029	combine zchemm into single file	2022-01-05 14:42:37 +01:00


230 Commits