Craig Donner
c2545b0fd6
Fixed a few more unnecessary calls to num_cpu_avail.
...
I don't have as many benchmarks for these as for gemm, but it should still
make a difference for small matrices.
2018-06-11 10:17:16 +01:00
Ashwin Sekhar T K
fa9ca65c0e
ARM64: Fix utest dsdot errors
2018-02-27 10:47:55 +00:00
Martin Kroeker
c9d408064a
Use dot.S also for DSDOT on CORTEXA57
2018-02-25 19:48:09 +01:00
Martin Kroeker
288d1a3f6e
Use dot.S also for DSDOT on ARMV8
2018-02-25 19:45:16 +01:00
Martin Kroeker
b47e6822aa
Enable most assembly kernels in the generic ARMV8 target
...
ref #1439
2018-02-06 11:42:58 +01:00
Ashwin Sekhar T K
a0128aa489
ARM64: Convert all labels to local labels
...
While debugging/profiling applications using perf or other tools, the
kernels appear scattered in the profile reports. This is because the labels
within the kernels are not local and each label is shown as a separate
function.
To avoid this, all the labels within the kernels are changed to local
labels.
2017-10-24 11:40:05 +00:00
Ashwin Sekhar T K
4899d67f7d
THUDNERX2T99: Fix clang compilation
2017-08-02 11:28:45 -07:00
Ashwin Sekhar T K
67473d09dd
THUNDERX2T99: Bug Fixes in D/Z NRM2 and ZGEMM
2017-02-28 01:11:38 -08:00
Ashwin Sekhar T K
19ba133383
THUNDERX2T99: Add Optimized ZGEMM Implementation
2017-02-28 05:31:41 +00:00
Ashwin Sekhar T K
a3935f0dfb
THUNDERX2T99: Add Optimized D/Z NRM2 Implementation
2017-02-23 10:02:15 -08:00
Ashwin Sekhar T K
738628e9a8
ARM64: Remove unused code
2017-02-21 21:42:32 -08:00
Ashwin Sekhar T K
ab3ffab96a
THUNDERX2T99: Add Optimized C/Z DOT Implementation
2017-02-21 03:40:59 -08:00
Ashwin Sekhar T K
f036be9ce2
THUNDERX2T99: Add Optimized SDOT Implementation
2017-02-21 03:24:32 -08:00
Ashwin Sekhar T K
faba876fda
THUNDERX2T99: Bug fix in C/Z IAMAX
2017-02-19 23:11:50 -08:00
Ashwin Sekhar T K
172a62d73e
THUNDERX2T99: Add Optimized C/Z IAMAX Implementation
2017-02-17 03:06:32 -08:00
Ashwin Sekhar T K
228c75a69c
THUNDERX2T99: Add parallel SCNRM2 Implementation
2017-02-14 04:10:06 -08:00
Ashwin Sekhar T K
8e89668f62
THUNDERX2T99: Fix bug in SNRM2
2017-02-07 02:14:33 -08:00
Ashwin Sekhar T K
f63deae9de
THUNDERX2T99: Add Optimized S/D IAMAX Implementation
2017-02-07 01:35:55 -08:00
Ashwin Sekhar T K
071a830e8b
THUNDERX2T99: Add optimized S/D/C/Z SWAP Implementations
2017-02-03 03:55:06 -08:00
Ashwin Sekhar T K
d09f88192c
THUNDERX2T99: Add optimized S/D/C/Z COPY Implementations
2017-02-02 15:26:38 +05:30
Ashwin Sekhar T K
e58233460a
THUDNERX2T99: Add optimized D/C/Z ASUM Implementations
2017-02-02 15:26:22 +05:30
Ashwin Sekhar T K
99bd2892bf
THUNDERX2T99: Add optimized CASUM Implementation
2017-01-30 17:44:32 +05:30
Ashwin Sekhar T K
ff6f572f2e
THUNDERX2T99: Rename labels in for DDOT and SNRM2
2017-01-30 17:44:32 +05:30
Ashwin Sekhar T K
e0dc5f58c5
THUNDERX2T99: Remove Duplicate Code
2017-01-30 17:44:32 +05:30
Ashwin Sekhar T K
2757b49767
THUNDERX2T99: Add Optimized CGEMM Implementation
2017-01-30 17:44:26 +05:30
Ashwin Sekhar T K
907e286eb6
THUNDERX2T99: Add threaded SNRM2 Implementation
2017-01-24 21:39:29 +05:30
Ashwin Sekhar T K
cde3aee08b
ARM64: Rename kernel files to have consistent naming
2017-01-24 14:53:34 +05:30
Ashwin Sekhar T K
ee6ea7e988
THUNDERX2T99: Add Optimized CNRM2 Implementation
2017-01-24 10:23:32 +05:30
Ashwin Sekhar T K
ca0b36b012
THUNDERX2T99: Add Optimized SNRM2 Implementation
2017-01-24 10:23:21 +05:30
Ashwin Sekhar T K
d0a79ca6e0
THUNDERX2T99: Add threaded DDOT Implementation
2017-01-19 11:11:42 +05:30
Ashwin Sekhar T K
0c07003ccf
THUNDERX2T99: Add Optimized DDOT Implementation
2017-01-19 11:11:07 +05:30
Ashwin Sekhar T K
f33fcedb30
THUNDERX2T99: Improve SGEMM
2017-01-19 11:11:07 +05:30
Ashwin Sekhar T K
0f1d6e8b39
THUNDERX2T99: Improve DGEMM
2017-01-19 11:11:07 +05:30
Ashwin Sekhar T K
981064acc6
THUNDERX2T99: Add Optimized DAXPY Implementation
2017-01-19 11:10:57 +05:30
Ashwin Sekhar T K
f279ff4789
THUNDERX2T99: Add Optimized SGEMM Implementation
2017-01-16 21:44:33 +05:30
Ashwin Sekhar T K
759f37feba
ARM64: Let target VULCAN inherit THUNDERX2T99 properties
2017-01-16 21:44:19 +05:30
Ashwin Sekhar T K
4b55fae337
ARM64: Add Cavium THUNDERX2T99 Target
2017-01-11 11:18:40 +05:30
Andrew Pinski
95649dee28
THUNDERX: Add optimized version of daxpy
...
This is better for single core but does not change anything for multiple cores
2017-01-11 11:18:36 +05:30
Andrew Pinski
8fdb0655e9
THUNDERX: Add an optimized version of ddot
2017-01-10 15:01:37 +05:30
Andrew Pinski
fb200c7245
ARM64: Add Cavium THUNDERX Target
2017-01-10 15:01:37 +05:30
Ashwin Sekhar T K
0b8e876d89
VULCAN: Add optimized DGEMM implementation
2017-01-10 15:01:37 +05:30
Ashwin Sekhar T K
4713e7c47f
ARM64: Add the VULCAN Target
2017-01-10 15:01:17 +05:30
Ashwin Sekhar T K
6085386b10
CORTEXA57: Add assembly kernels for copy routines
2017-01-10 15:01:05 +05:30
Ashwin Sekhar T K
c54a29bb48
Cortex A57: Improvements to DGEMM 8x4 kernel
2016-07-26 10:58:21 +05:30
Ashwin Sekhar T K
0a5ff9f9f9
Improvements to TRMM and GEMM kernels
2016-07-14 13:56:04 +05:30
Ashwin Sekhar T K
8a40f1355e
Improvements to GEMV kernels
2016-07-14 13:50:38 +05:30
Ashwin Sekhar T K
78782485b6
Improvements to COPY and IAMAX kernels
2016-07-14 13:49:34 +05:30
Ashwin Sekhar T K
278511ad2d
Cortex-A57: Fix clang compilation errors
2016-03-24 10:42:04 +05:30
Ashwin Sekhar T K
3b5ffb49d3
Cortex-A57: Improve DGEMM 8x4 Implementation
2016-03-24 10:25:18 +05:30
Ashwin Sekhar T K
5ac02f6dc7
Optimize Dgemm 4x4 for Cortex A57
2016-03-14 19:35:23 +05:30