Commit Graph

222 Commits

Author SHA1 Message Date
Martin Kroeker 3d31191b0f
Work around Clang failing to disambiguate SVE intrinsics and add AppleClang crossbuild to MacOS/arm64 DYNAMIC_ARCH in AzureCI (#4140)
* Add AppleClang crossbuild to MacOS/arm64 DYNAMIC_ARCH

* add casts to disambiguate svwhilelt for clang
2023-07-14 11:06:48 +02:00
Martin Kroeker 72caceb324
Merge pull request #4009 from Mousius/sve-gemm
Use SVE kernel for SGEMM/DGEMM on Arm(R) Neoverse(TM) V1
2023-04-22 13:56:45 +02:00
Chris Sidebottom ec334e69dc Use SVE kernel for SGEMM/DGEMM on Arm(R) Neoverse(TM) V1
This re-spins #3869 with some additional copy unrolling which helps maintain SYRK performance.

After #3868, the SVE kernels represent a pretty good boost.

This re-uses ARMV8SVE as a base and I'm going to incrementally move everything to use ARMV8SVE in additional patches (as well as fix up anything that's not already in ARMV8SVE).
2023-04-17 17:38:42 +01:00
Martin Kroeker 44164e3a3d
revert "move alpha out of register 18" (out of PR scope, no SVE on Apple hw) 2023-04-17 14:23:13 +02:00
Martin Kroeker 8be68fa7f4
move declaration of sca to really keep the compiler from throwing it out (for now) 2023-04-15 12:02:39 +02:00
Martin Kroeker 3727672a74
Improve workaround and keep compilers from optimizing it out 2023-04-13 18:07:52 +02:00
Martin Kroeker 108a21e47a
Move ALPHA out of register 18 (reserved on OSX) 2023-04-13 18:05:14 +02:00
Martin Kroeker 0b1acb0ba3
Move ALPHA_I out of register 18 (reserved on OSX) 2023-04-13 18:03:35 +02:00
Martin Kroeker c7bbad09ad
Move ALPHA_I out of register 18 (reserved on OSX) 2023-04-13 18:00:47 +02:00
Martin Kroeker cda29633a3
move ALPHA_I out of register 18 (reserved on OSX) 2023-04-13 17:59:48 +02:00
Martin Kroeker 09ace3cf23
Merge pull request #3846 from lilh9598/sbgemm_opt
Improve the performance of sbgemm_tcopy on neoversen2
2023-03-26 19:04:57 +02:00
Chris Sidebottom 1361229291 Remove prefetches from SVE kernels
This is a precursor to enabling the SVE kernels for Arm(R) Neoverse(TM)
V1 which has 256-bit SVE. Testing revealed that the SVE kernel was
actually worse in some cases than the existing kernel which seemed odd -
removing these prefetches the underlying architecture seems to do a better job
😸
2022-12-16 14:43:09 +00:00
lilianhuang 729af6406f bugfix for sbgemm_ncopy_8_neoversen2 2022-12-05 05:10:18 -05:00
Chris Sidebottom eea006a688 Wrap SVE header with __has_include check 2022-12-01 12:07:55 +00:00
Chris Sidebottom fd4f52c797 Add SVE implementation for sdot/ddot
This adds an SVE implementation to sdot/ddot when available, falling back to the previous Advanced SIMD kernel where there's no SVE implementation for the kernel.

All the targets were essentially treating `dot_thunderx2t99.c` as the Advanced SIMD implementation so I've renamed it to better fit with the feature detection.
2022-12-01 12:07:50 +00:00
lilianhuang fdac8a97c1 Add sbgemm_ncopy_8 and sbgemm_tcopy_4 2022-11-29 04:46:14 -05:00
lilianhuang 135718eafc Improve the performance of sbgemm_tcopy on neoversen2 2022-11-28 04:17:54 -05:00
Chris Sidebottom 4f7b77e08a Remove unnecessary instructions from Advanced SIMD dot
The existing kernel was issuing extra instructions to organise the arguments into the same registers they would usually be in and similarly to put the result into the appropriate register.

This has an impact on smaller sized dots and seemed like a quick fix
2022-11-25 16:19:03 +00:00
Martin Kroeker 1688c7da43
change line endings from CRLF to LF 2022-11-16 22:24:01 +01:00
Honglin Zhu 79066b6bf3 Change file name to match the norm and delete useless code. 2022-10-28 17:09:39 +08:00
Honglin Zhu b00d5b9746 New sbgemm implementation for Neoverse N2
1. Use UZP instructions but not gather load and scatter store instructions to get lower latency.
    2. Padding k to a power of 4.
2022-10-26 15:09:41 +08:00
Martin Kroeker e12d474780
Eliminate uses of CREAL on left-hand side of assignments 2022-07-05 00:01:09 +02:00
Martin Kroeker 9e29598575
workaround fault with ssq=inf,scale=0 2022-07-02 23:47:17 +02:00
Honglin Zhu 123e0dfb62 Neoverse N2 sbgemm:
1. Modify the algorithm to resolve multithreading failures
    2. No memory allocation in sbgemm kernel
    3. Optimize when alpha == 1.0f
2022-06-29 10:14:21 +08:00
Honglin Zhu bc3728475f format code 2022-06-29 10:14:21 +08:00
Honglin Zhu 55d686d41e neoverse n2 sbgemm:
implement ncopy tcopy kernel_8x4
2022-06-29 10:14:21 +08:00
Honglin Zhu 04593bb27c neoverse n2 sbgemm: init file 2022-06-29 10:14:21 +08:00
Nursultan Zarlyk 1bb7993a97 Fix MSVC ARM64 build. Add generic kernel for ARM64 2022-06-02 16:53:54 +02:00
Martin Kroeker 115bc9b98f
CortexX1 is ARMV8 like A7x 2022-03-28 17:28:29 +02:00
Martin Kroeker b3b4672c30
Add initial support for Phytium FT2000 series and ARMV9 Cortex 510/710/X1/X2 2022-03-27 15:29:20 +02:00
Martin Kroeker c1c0d5ce1d
Merge pull request #3492 from binebrank/arm_sve_zgemm
SVE zgemm&cgemm (and other BLAS 3 complex)
2022-01-18 21:36:33 +01:00
Bine Brank 19d435b1b3 update armv8sve + contributors 2022-01-18 08:28:31 +01:00
Bine Brank 0fb6cc07bf fix ztrsm lt/ut copy 2022-01-16 21:39:57 +01:00
Bine Brank f1315288a8 add sve ztrsm 2022-01-15 22:27:25 +01:00
Bine Brank aaa2b1a861 fix sve dtrsm kernels 2022-01-15 21:02:14 +01:00
Bine Brank 8071e179f1 add remaining sve trsm copy kernels 2022-01-11 21:16:38 +01:00
Bine Brank f87468ac91 trsm_lncopy_sve 2022-01-10 21:45:37 +01:00
Bine Brank e8939b3d30 sve trsmRN and trsmRT 2022-01-10 20:42:20 +01:00
Bine Brank 098672b51b add trsm_kernel_LT_sve 2022-01-09 20:11:47 +01:00
Bine Brank be7e55880c sve trsm_kernel_LN 2022-01-09 19:40:04 +01:00
Sunita Nadampalli 19c8f615dc OpenBLAS: aarch64: Add neoverse-v1/n2 architecture specifics 2022-01-07 00:28:17 +00:00
Bine Brank f33543d029 combine zchemm into single file 2022-01-05 14:42:37 +01:00
Bine Brank 39ab219704 sve copy functions for cgemm chemm zsymm 2022-01-05 09:12:22 +01:00
Bine Brank 18102ae8c3 add cgemm ctrmm sve kernels 2022-01-05 09:09:18 +01:00
Bine Brank 87537b8c55 modify sve zgemmcopy kernels 2022-01-05 09:07:28 +01:00
Bine Brank d30157d891 update configuration of kernels for A64FX and ARMV8SVE 2022-01-05 09:00:54 +01:00
Bine Brank 2e2c02b762 fix sve ztrmm kernel 2022-01-04 14:42:07 +01:00
Bine Brank 68c414d3a6 ztrmm sve copy functions 2022-01-04 14:40:59 +01:00
Bine Brank ce329ab686 add sve zhemm copy routines 2022-01-03 15:56:05 +01:00
Bine Brank 0140373802 add sve ztrmm 2022-01-02 19:15:33 +01:00