265 Commits

Author SHA1 Message Date
Martin Kroeker
0b1acb0ba3 Move ALPHA_I out of register 18 (reserved on OSX) 2023-04-13 18:03:35 +02:00
Martin Kroeker
c7bbad09ad Move ALPHA_I out of register 18 (reserved on OSX) 2023-04-13 18:00:47 +02:00
Martin Kroeker
cda29633a3 move ALPHA_I out of register 18 (reserved on OSX) 2023-04-13 17:59:48 +02:00
Martin Kroeker
09ace3cf23 Merge pull request #3846 from lilh9598/sbgemm_opt
Improve the performance of sbgemm_tcopy on neoversen2
2023-03-26 19:04:57 +02:00
Chris Sidebottom
1361229291 Remove prefetches from SVE kernels
This is a precursor to enabling the SVE kernels for Arm(R) Neoverse(TM)
V1 which has 256-bit SVE. Testing revealed that the SVE kernel was
actually worse in some cases than the existing kernel which seemed odd -
removing these prefetches the underlying architecture seems to do a better job
😸
2022-12-16 14:43:09 +00:00
lilianhuang
729af6406f bugfix for sbgemm_ncopy_8_neoversen2 2022-12-05 05:10:18 -05:00
Chris Sidebottom
eea006a688 Wrap SVE header with __has_include check 2022-12-01 12:07:55 +00:00
Chris Sidebottom
fd4f52c797 Add SVE implementation for sdot/ddot
This adds an SVE implementation to sdot/ddot when available, falling back to the previous Advanced SIMD kernel where there's no SVE implementation for the kernel.

All the targets were essentially treating `dot_thunderx2t99.c` as the Advanced SIMD implementation so I've renamed it to better fit with the feature detection.
2022-12-01 12:07:50 +00:00
lilianhuang
fdac8a97c1 Add sbgemm_ncopy_8 and sbgemm_tcopy_4 2022-11-29 04:46:14 -05:00
lilianhuang
135718eafc Improve the performance of sbgemm_tcopy on neoversen2 2022-11-28 04:17:54 -05:00
Chris Sidebottom
4f7b77e08a Remove unnecessary instructions from Advanced SIMD dot
The existing kernel was issuing extra instructions to organise the arguments into the same registers they would usually be in and similarly to put the result into the appropriate register.

This has an impact on smaller sized dots and seemed like a quick fix
2022-11-25 16:19:03 +00:00
Martin Kroeker
1688c7da43 change line endings from CRLF to LF 2022-11-16 22:24:01 +01:00
Honglin Zhu
79066b6bf3 Change file name to match the norm and delete useless code. 2022-10-28 17:09:39 +08:00
Honglin Zhu
b00d5b9746 New sbgemm implementation for Neoverse N2
1. Use UZP instructions but not gather load and scatter store instructions to get lower latency.
    2. Padding k to a power of 4.
2022-10-26 15:09:41 +08:00
Martin Kroeker
e12d474780 Eliminate uses of CREAL on left-hand side of assignments 2022-07-05 00:01:09 +02:00
Martin Kroeker
9e29598575 workaround fault with ssq=inf,scale=0 2022-07-02 23:47:17 +02:00
Honglin Zhu
123e0dfb62 Neoverse N2 sbgemm:
1. Modify the algorithm to resolve multithreading failures
    2. No memory allocation in sbgemm kernel
    3. Optimize when alpha == 1.0f
2022-06-29 10:14:21 +08:00
Honglin Zhu
bc3728475f format code 2022-06-29 10:14:21 +08:00
Honglin Zhu
55d686d41e neoverse n2 sbgemm:
implement ncopy tcopy kernel_8x4
2022-06-29 10:14:21 +08:00
Honglin Zhu
04593bb27c neoverse n2 sbgemm: init file 2022-06-29 10:14:21 +08:00
Nursultan Zarlyk
1bb7993a97 Fix MSVC ARM64 build. Add generic kernel for ARM64 2022-06-02 16:53:54 +02:00
Martin Kroeker
115bc9b98f CortexX1 is ARMV8 like A7x 2022-03-28 17:28:29 +02:00
Martin Kroeker
b3b4672c30 Add initial support for Phytium FT2000 series and ARMV9 Cortex 510/710/X1/X2 2022-03-27 15:29:20 +02:00
Martin Kroeker
c1c0d5ce1d Merge pull request #3492 from binebrank/arm_sve_zgemm
SVE zgemm&cgemm (and other BLAS 3 complex)
2022-01-18 21:36:33 +01:00
Bine Brank
19d435b1b3 update armv8sve + contributors 2022-01-18 08:28:31 +01:00
Bine Brank
0fb6cc07bf fix ztrsm lt/ut copy 2022-01-16 21:39:57 +01:00
Bine Brank
f1315288a8 add sve ztrsm 2022-01-15 22:27:25 +01:00
Bine Brank
aaa2b1a861 fix sve dtrsm kernels 2022-01-15 21:02:14 +01:00
Bine Brank
8071e179f1 add remaining sve trsm copy kernels 2022-01-11 21:16:38 +01:00
Bine Brank
f87468ac91 trsm_lncopy_sve 2022-01-10 21:45:37 +01:00
Bine Brank
e8939b3d30 sve trsmRN and trsmRT 2022-01-10 20:42:20 +01:00
Bine Brank
098672b51b add trsm_kernel_LT_sve 2022-01-09 20:11:47 +01:00
Bine Brank
be7e55880c sve trsm_kernel_LN 2022-01-09 19:40:04 +01:00
Sunita Nadampalli
19c8f615dc OpenBLAS: aarch64: Add neoverse-v1/n2 architecture specifics 2022-01-07 00:28:17 +00:00
Bine Brank
f33543d029 combine zchemm into single file 2022-01-05 14:42:37 +01:00
Bine Brank
39ab219704 sve copy functions for cgemm chemm zsymm 2022-01-05 09:12:22 +01:00
Bine Brank
18102ae8c3 add cgemm ctrmm sve kernels 2022-01-05 09:09:18 +01:00
Bine Brank
87537b8c55 modify sve zgemmcopy kernels 2022-01-05 09:07:28 +01:00
Bine Brank
d30157d891 update configuration of kernels for A64FX and ARMV8SVE 2022-01-05 09:00:54 +01:00
Bine Brank
2e2c02b762 fix sve ztrmm kernel 2022-01-04 14:42:07 +01:00
Bine Brank
68c414d3a6 ztrmm sve copy functions 2022-01-04 14:40:59 +01:00
Bine Brank
ce329ab686 add sve zhemm copy routines 2022-01-03 15:56:05 +01:00
Bine Brank
0140373802 add sve ztrmm 2022-01-02 19:15:33 +01:00
Bine Brank
f7b6912868 ztrmm sve copy kernels 2021-12-30 21:00:16 +01:00
Bine Brank
40b14e4957 fix zgemm kernel 2021-12-29 11:42:04 +01:00
Bine Brank
6ec4aab875 zgemm sve copy routines 2021-12-26 17:05:46 +01:00
Bine Brank
878064f394 sve zgemm kernel 2021-12-26 08:44:05 +01:00
Bine Brank
683a7548bf added macros for sve zgemm kernels 2021-12-25 11:46:41 +01:00
Bine Brank
e3c9947c0f prepare kernel for sve zgemm 2021-12-21 11:19:27 +01:00
Jia-Chen
b610d2de37 optimize cgemm on ARM cortex A53 & cortex A55 2021-12-12 17:22:52 +08:00