Ashwin Sekhar T K
|
67473d09dd
|
THUNDERX2T99: Bug Fixes in D/Z NRM2 and ZGEMM
|
2017-02-28 01:11:38 -08:00 |
Ashwin Sekhar T K
|
19ba133383
|
THUNDERX2T99: Add Optimized ZGEMM Implementation
|
2017-02-28 05:31:41 +00:00 |
Ashwin Sekhar T K
|
a3935f0dfb
|
THUNDERX2T99: Add Optimized D/Z NRM2 Implementation
|
2017-02-23 10:02:15 -08:00 |
Ashwin Sekhar T K
|
738628e9a8
|
ARM64: Remove unused code
|
2017-02-21 21:42:32 -08:00 |
Ashwin Sekhar T K
|
ab3ffab96a
|
THUNDERX2T99: Add Optimized C/Z DOT Implementation
|
2017-02-21 03:40:59 -08:00 |
Ashwin Sekhar T K
|
f036be9ce2
|
THUNDERX2T99: Add Optimized SDOT Implementation
|
2017-02-21 03:24:32 -08:00 |
Ashwin Sekhar T K
|
faba876fda
|
THUNDERX2T99: Bug fix in C/Z IAMAX
|
2017-02-19 23:11:50 -08:00 |
Ashwin Sekhar T K
|
172a62d73e
|
THUNDERX2T99: Add Optimized C/Z IAMAX Implementation
|
2017-02-17 03:06:32 -08:00 |
Ashwin Sekhar T K
|
228c75a69c
|
THUNDERX2T99: Add parallel SCNRM2 Implementation
|
2017-02-14 04:10:06 -08:00 |
Ashwin Sekhar T K
|
8e89668f62
|
THUNDERX2T99: Fix bug in SNRM2
|
2017-02-07 02:14:33 -08:00 |
Ashwin Sekhar T K
|
f63deae9de
|
THUNDERX2T99: Add Optimized S/D IAMAX Implementation
|
2017-02-07 01:35:55 -08:00 |
Ashwin Sekhar T K
|
071a830e8b
|
THUNDERX2T99: Add optimized S/D/C/Z SWAP Implementations
|
2017-02-03 03:55:06 -08:00 |
Ashwin Sekhar T K
|
d09f88192c
|
THUNDERX2T99: Add optimized S/D/C/Z COPY Implementations
|
2017-02-02 15:26:38 +05:30 |
Ashwin Sekhar T K
|
e58233460a
|
THUNDERX2T99: Add optimized D/C/Z ASUM Implementations
|
2017-02-02 15:26:22 +05:30 |
Ashwin Sekhar T K
|
99bd2892bf
|
THUNDERX2T99: Add optimized CASUM Implementation
|
2017-01-30 17:44:32 +05:30 |
Ashwin Sekhar T K
|
ff6f572f2e
|
THUNDERX2T99: Rename labels for DDOT and SNRM2
|
2017-01-30 17:44:32 +05:30 |
Ashwin Sekhar T K
|
e0dc5f58c5
|
THUNDERX2T99: Remove Duplicate Code
|
2017-01-30 17:44:32 +05:30 |
Ashwin Sekhar T K
|
2757b49767
|
THUNDERX2T99: Add Optimized CGEMM Implementation
|
2017-01-30 17:44:26 +05:30 |
Ashwin Sekhar T K
|
907e286eb6
|
THUNDERX2T99: Add threaded SNRM2 Implementation
|
2017-01-24 21:39:29 +05:30 |
Ashwin Sekhar T K
|
cde3aee08b
|
ARM64: Rename kernel files to have consistent naming
|
2017-01-24 14:53:34 +05:30 |
Ashwin Sekhar T K
|
ee6ea7e988
|
THUNDERX2T99: Add Optimized CNRM2 Implementation
|
2017-01-24 10:23:32 +05:30 |
Ashwin Sekhar T K
|
ca0b36b012
|
THUNDERX2T99: Add Optimized SNRM2 Implementation
|
2017-01-24 10:23:21 +05:30 |
Ashwin Sekhar T K
|
d0a79ca6e0
|
THUNDERX2T99: Add threaded DDOT Implementation
|
2017-01-19 11:11:42 +05:30 |
Ashwin Sekhar T K
|
0c07003ccf
|
THUNDERX2T99: Add Optimized DDOT Implementation
|
2017-01-19 11:11:07 +05:30 |
Ashwin Sekhar T K
|
f33fcedb30
|
THUNDERX2T99: Improve SGEMM
|
2017-01-19 11:11:07 +05:30 |
Ashwin Sekhar T K
|
0f1d6e8b39
|
THUNDERX2T99: Improve DGEMM
|
2017-01-19 11:11:07 +05:30 |
Ashwin Sekhar T K
|
981064acc6
|
THUNDERX2T99: Add Optimized DAXPY Implementation
|
2017-01-19 11:10:57 +05:30 |
Ashwin Sekhar T K
|
f279ff4789
|
THUNDERX2T99: Add Optimized SGEMM Implementation
|
2017-01-16 21:44:33 +05:30 |
Ashwin Sekhar T K
|
759f37feba
|
ARM64: Let target VULCAN inherit THUNDERX2T99 properties
|
2017-01-16 21:44:19 +05:30 |
Ashwin Sekhar T K
|
4b55fae337
|
ARM64: Add Cavium THUNDERX2T99 Target
|
2017-01-11 11:18:40 +05:30 |
Andrew Pinski
|
95649dee28
|
THUNDERX: Add optimized version of daxpy
This is better for single core but does not change anything for multiple cores
|
2017-01-11 11:18:36 +05:30 |
Andrew Pinski
|
8fdb0655e9
|
THUNDERX: Add an optimized version of ddot
|
2017-01-10 15:01:37 +05:30 |
Andrew Pinski
|
fb200c7245
|
ARM64: Add Cavium THUNDERX Target
|
2017-01-10 15:01:37 +05:30 |
Ashwin Sekhar T K
|
0b8e876d89
|
VULCAN: Add optimized DGEMM implementation
|
2017-01-10 15:01:37 +05:30 |
Ashwin Sekhar T K
|
4713e7c47f
|
ARM64: Add the VULCAN Target
|
2017-01-10 15:01:17 +05:30 |
Ashwin Sekhar T K
|
6085386b10
|
CORTEXA57: Add assembly kernels for copy routines
|
2017-01-10 15:01:05 +05:30 |
Ashwin Sekhar T K
|
c54a29bb48
|
Cortex A57: Improvements to DGEMM 8x4 kernel
|
2016-07-26 10:58:21 +05:30 |
Ashwin Sekhar T K
|
0a5ff9f9f9
|
Improvements to TRMM and GEMM kernels
|
2016-07-14 13:56:04 +05:30 |
Ashwin Sekhar T K
|
8a40f1355e
|
Improvements to GEMV kernels
|
2016-07-14 13:50:38 +05:30 |
Ashwin Sekhar T K
|
78782485b6
|
Improvements to COPY and IAMAX kernels
|
2016-07-14 13:49:34 +05:30 |
Ashwin Sekhar T K
|
278511ad2d
|
Cortex-A57: Fix clang compilation errors
|
2016-03-24 10:42:04 +05:30 |
Ashwin Sekhar T K
|
3b5ffb49d3
|
Cortex-A57: Improve DGEMM 8x4 Implementation
|
2016-03-24 10:25:18 +05:30 |
Ashwin Sekhar T K
|
5ac02f6dc7
|
Optimize Dgemm 4x4 for Cortex A57
|
2016-03-14 19:35:23 +05:30 |
Ashwin Sekhar T K
|
7aa1ad4923
|
Functional Assembly Kernels for CortexA57
Adding functional (non-optimized) kernels for Cortex-A57
with the following layouts.
SGEMM - 16x4, 8x8
CGEMM - 8x4
DGEMM - 8x4, 4x8
|
2016-03-14 19:33:21 +05:30 |
Zhang Xianyi
|
74b0672223
|
Fix c/zaxpyc kernel bug on Cortex-A57.
|
2016-02-23 22:47:53 +00:00 |
Ashwin Sekhar T K
|
318f0949c3
|
lapack-test fixes in nrm2 kernels for Cortex A57
|
2015-11-23 13:43:36 +05:30 |
Ashwin Sekhar T K
|
98965da2e8
|
lapack-test fixes for Cortex A57
|
2015-11-20 01:15:04 +05:30 |
Ashwin Sekhar T K
|
c99c43d51e
|
Optimized trmm kernels for CORTEXA57
|
2015-11-09 14:15:54 +05:30 |
Ashwin Sekhar T K
|
1397b47197
|
Optimized zgemm kernel for CORTEXA57
|
2015-11-09 14:15:53 +05:30 |
Ashwin Sekhar T K
|
45f78963ac
|
Optimized cgemm kernel for CORTEXA57
Also, add a generic ztrmm 4x4 kernel
|
2015-11-09 14:15:53 +05:30 |