Martin Kroeker
09ace3cf23
Merge pull request #3846 from lilh9598/sbgemm_opt
Improve the performance of sbgemm_tcopy on neoversen2
2023-03-26 19:04:57 +02:00
Chris Sidebottom
fd4f52c797
Add SVE implementation for sdot/ddot
This adds an SVE implementation of sdot/ddot that is used when SVE is available, falling back to the previous Advanced SIMD kernel on targets without SVE.
All the targets were essentially treating `dot_thunderx2t99.c` as the Advanced SIMD implementation, so I've renamed it to better fit the feature detection.
2022-12-01 12:07:50 +00:00
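The following is a rough illustration of what an SVE dot kernel with a non-SVE fallback involves. It is a contiguous single-precision dot product written with the ACLE SVE intrinsics from `<arm_sve.h>`, and it is not the OpenBLAS kernel. Several details are assumptions:

- The function name `sdot_sketch` is hypothetical.
- Both strides (`incx`/`incy`) are assumed to be 1.
- The SVE path is selected at compile time through `__ARM_FEATURE_SVE`.
- The fallback branch is plain scalar C standing in for the Advanced SIMD kernel.

```c
#include <stdint.h>
#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

/* Hypothetical sketch: dot product of two contiguous float vectors
   (unit stride assumed; the real BLAS sdot also takes incx/incy). */
float sdot_sketch(int64_t n, const float *x, const float *y) {
#if defined(__ARM_FEATURE_SVE)
    /* Vector-length-agnostic loop: svwhilelt builds a predicate that
       disables lanes at index >= n, so no scalar tail loop is needed. */
    svfloat32_t acc = svdup_n_f32(0.0f);
    for (int64_t i = 0; i < n; i += (int64_t)svcntw()) {
        svbool_t pg = svwhilelt_b32_s64(i, n);
        svfloat32_t vx = svld1_f32(pg, x + i);
        svfloat32_t vy = svld1_f32(pg, y + i);
        /* acc += vx * vy on active lanes; inactive lanes keep acc. */
        acc = svmla_f32_m(pg, acc, vx, vy);
    }
    /* Horizontal sum of all accumulator lanes. */
    return svaddv_f32(svptrue_b32(), acc);
#else
    /* Stand-in for the non-SVE (Advanced SIMD) path: plain scalar C. */
    float sum = 0.0f;
    for (int64_t i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
#endif
}
```

The predicated loop is the main difference from a fixed-width Advanced SIMD kernel. Because `svwhilelt_b32_s64` switches off lanes past `n`, the same code handles any hardware vector length and any tail. A double-precision variant would use the `_f64`/`_b64` intrinsics and `svcntd()`.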
lilianhuang
fdac8a97c1
Add sbgemm_ncopy_8 and sbgemm_tcopy_4
2022-11-29 04:46:14 -05:00
Honglin Zhu
79066b6bf3
Rename the file to match the naming convention and remove unnecessary code.
2022-10-28 17:09:39 +08:00
Honglin Zhu
b00d5b9746
New sbgemm implementation for Neoverse N2
1. Use UZP instructions instead of gather-load and scatter-store instructions to reduce latency.
2. Pad k to a multiple of 4.
2022-10-26 15:09:41 +08:00
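Padding k to a multiple of 4 can be illustrated with a small packing sketch. It assumes a consumer that, like the BFMMLA instruction, treats each 128-bit segment as a 2x4 tile of bf16 values: two rows, each with four consecutive k values. It also stores bf16 values as raw `uint16_t` bit patterns. The names `pad_k4`, `pack_2x4_size` and `pack_2x4`, and the exact layout, are hypothetical; they are not taken from the Neoverse N2 copy routines. When both operands are padded with zeros, the extra k positions contribute 0 * 0 = 0 to every accumulated sum.

```c
#include <stdint.h>

/* Round k up to the next multiple of 4 (k >= 0). */
static int64_t pad_k4(int64_t k) {
    return (k + 3) & ~(int64_t)3;
}

/* Number of uint16_t elements pack_2x4 writes: rows rounded up to a
   multiple of 2, k rounded up to a multiple of 4. */
static int64_t pack_2x4_size(int64_t m, int64_t k) {
    return ((m + 1) / 2) * 2 * pad_k4(k);
}

/* Hypothetical packing routine: copy an m x k row-major block of bf16 values
   (raw bit patterns, leading dimension lda) into dst as a sequence of 2x4
   tiles. For each pair of rows, k is walked in chunks of 4; each tile holds
   row r's 4 values followed by row r+1's 4 values. Positions beyond k, and
   the missing partner row when m is odd, are zero-filled (0x0000 is +0.0). */
static void pack_2x4(const uint16_t *src, int64_t lda, int64_t m, int64_t k,
                     uint16_t *dst) {
    const int64_t kp = pad_k4(k);
    for (int64_t r = 0; r < m; r += 2) {          /* pairs of rows */
        for (int64_t kk = 0; kk < kp; kk += 4) {  /* k in chunks of 4 */
            for (int64_t rr = 0; rr < 2; rr++) {
                for (int64_t j = 0; j < 4; j++) {
                    const int64_t row = r + rr, col = kk + j;
                    *dst++ = (row < m && col < k) ? src[row * lda + col]
                                                  : (uint16_t)0;
                }
            }
        }
    }
}
```

With m = 3 and k = 5, for example, k is padded to 8 and the missing fourth row is zero-filled, so the packed buffer holds 32 elements. Every 2x4 tile is then complete, so the inner kernel never needs a k-tail case.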
Honglin Zhu
55d686d41e
neoverse n2 sbgemm:
implement ncopy, tcopy and kernel_8x4
2022-06-29 10:14:21 +08:00
Honglin Zhu
04593bb27c
neoverse n2 sbgemm: init file
2022-06-29 10:14:21 +08:00
Sunita Nadampalli
19c8f615dc
OpenBLAS: aarch64: Add neoverse-v1/n2 architecture specifics
2022-01-07 00:28:17 +00:00