Commit Graph

68 Commits

Author SHA1 Message Date
Ashwin Sekhar T K 67473d09dd THUNDERX2T99: Bug Fixes in D/Z NRM2 and ZGEMM 2017-02-28 01:11:38 -08:00
Ashwin Sekhar T K 19ba133383 THUNDERX2T99: Add Optimized ZGEMM Implementation 2017-02-28 05:31:41 +00:00
Ashwin Sekhar T K a3935f0dfb THUNDERX2T99: Add Optimized D/Z NRM2 Implementation 2017-02-23 10:02:15 -08:00
Ashwin Sekhar T K 738628e9a8 ARM64: Remove unused code 2017-02-21 21:42:32 -08:00
Ashwin Sekhar T K ab3ffab96a THUNDERX2T99: Add Optimized C/Z DOT Implementation 2017-02-21 03:40:59 -08:00
Ashwin Sekhar T K f036be9ce2 THUNDERX2T99: Add Optimized SDOT Implementation 2017-02-21 03:24:32 -08:00
Ashwin Sekhar T K faba876fda THUNDERX2T99: Bug fix in C/Z IAMAX 2017-02-19 23:11:50 -08:00
Ashwin Sekhar T K 172a62d73e THUNDERX2T99: Add Optimized C/Z IAMAX Implementation 2017-02-17 03:06:32 -08:00
Ashwin Sekhar T K 228c75a69c THUNDERX2T99: Add parallel SCNRM2 Implementation 2017-02-14 04:10:06 -08:00
Ashwin Sekhar T K 8e89668f62 THUNDERX2T99: Fix bug in SNRM2 2017-02-07 02:14:33 -08:00
Ashwin Sekhar T K f63deae9de THUNDERX2T99: Add Optimized S/D IAMAX Implementation 2017-02-07 01:35:55 -08:00
Ashwin Sekhar T K 071a830e8b THUNDERX2T99: Add optimized S/D/C/Z SWAP Implementations 2017-02-03 03:55:06 -08:00
Ashwin Sekhar T K d09f88192c THUNDERX2T99: Add optimized S/D/C/Z COPY Implementations 2017-02-02 15:26:38 +05:30
Ashwin Sekhar T K e58233460a THUDNERX2T99: Add optimized D/C/Z ASUM Implementations 2017-02-02 15:26:22 +05:30
Ashwin Sekhar T K 99bd2892bf THUNDERX2T99: Add optimized CASUM Implementation 2017-01-30 17:44:32 +05:30
Ashwin Sekhar T K ff6f572f2e THUNDERX2T99: Rename labels in for DDOT and SNRM2 2017-01-30 17:44:32 +05:30
Ashwin Sekhar T K e0dc5f58c5 THUNDERX2T99: Remove Duplicate Code 2017-01-30 17:44:32 +05:30
Ashwin Sekhar T K 2757b49767 THUNDERX2T99: Add Optimized CGEMM Implementation 2017-01-30 17:44:26 +05:30
Ashwin Sekhar T K 907e286eb6 THUNDERX2T99: Add threaded SNRM2 Implementation 2017-01-24 21:39:29 +05:30
Ashwin Sekhar T K cde3aee08b ARM64: Rename kernel files to have consistent naming 2017-01-24 14:53:34 +05:30
Ashwin Sekhar T K ee6ea7e988 THUNDERX2T99: Add Optimized CNRM2 Implementation 2017-01-24 10:23:32 +05:30
Ashwin Sekhar T K ca0b36b012 THUNDERX2T99: Add Optimized SNRM2 Implementation 2017-01-24 10:23:21 +05:30
Ashwin Sekhar T K d0a79ca6e0 THUNDERX2T99: Add threaded DDOT Implementation 2017-01-19 11:11:42 +05:30
Ashwin Sekhar T K 0c07003ccf THUNDERX2T99: Add Optimized DDOT Implementation 2017-01-19 11:11:07 +05:30
Ashwin Sekhar T K f33fcedb30 THUNDERX2T99: Improve SGEMM 2017-01-19 11:11:07 +05:30
Ashwin Sekhar T K 0f1d6e8b39 THUNDERX2T99: Improve DGEMM 2017-01-19 11:11:07 +05:30
Ashwin Sekhar T K 981064acc6 THUNDERX2T99: Add Optimized DAXPY Implementation 2017-01-19 11:10:57 +05:30
Ashwin Sekhar T K f279ff4789 THUNDERX2T99: Add Optimized SGEMM Implementation 2017-01-16 21:44:33 +05:30
Ashwin Sekhar T K 759f37feba ARM64: Let target VULCAN inherit THUNDERX2T99 properties 2017-01-16 21:44:19 +05:30
Ashwin Sekhar T K 4b55fae337 ARM64: Add Cavium THUNDERX2T99 Target 2017-01-11 11:18:40 +05:30
Andrew Pinski 95649dee28 THUNDERX: Add optimized version of daxpy
This is better for single core but does not change anything for multiple cores
2017-01-11 11:18:36 +05:30
Andrew Pinski 8fdb0655e9 THUNDERX: Add an optimized version of ddot 2017-01-10 15:01:37 +05:30
Andrew Pinski fb200c7245 ARM64: Add Cavium THUNDERX Target 2017-01-10 15:01:37 +05:30
Ashwin Sekhar T K 0b8e876d89 VULCAN: Add optimized DGEMM implementation 2017-01-10 15:01:37 +05:30
Ashwin Sekhar T K 4713e7c47f ARM64: Add the VULCAN Target 2017-01-10 15:01:17 +05:30
Ashwin Sekhar T K 6085386b10 CORTEXA57: Add assembly kernels for copy routines 2017-01-10 15:01:05 +05:30
Ashwin Sekhar T K c54a29bb48 Cortex A57: Improvements to DGEMM 8x4 kernel 2016-07-26 10:58:21 +05:30
Ashwin Sekhar T K 0a5ff9f9f9 Improvements to TRMM and GEMM kernels 2016-07-14 13:56:04 +05:30
Ashwin Sekhar T K 8a40f1355e Improvements to GEMV kernels 2016-07-14 13:50:38 +05:30
Ashwin Sekhar T K 78782485b6 Improvements to COPY and IAMAX kernels 2016-07-14 13:49:34 +05:30
Ashwin Sekhar T K 278511ad2d Cortex-A57: Fix clang compilation errors 2016-03-24 10:42:04 +05:30
Ashwin Sekhar T K 3b5ffb49d3 Cortex-A57: Improve DGEMM 8x4 Implementation 2016-03-24 10:25:18 +05:30
Ashwin Sekhar T K 5ac02f6dc7 Optimize Dgemm 4x4 for Cortex A57 2016-03-14 19:35:23 +05:30
Ashwin Sekhar T K 7aa1ad4923 Functional Assembly Kernels for CortexA57
Adding functional (non-optimized) kernels for Cortex-A57
with the following layouts.
SGEMM - 16x4, 8x8
CGEMM - 8x4
DGEMM - 8x4, 4x8
2016-03-14 19:33:21 +05:30
Zhang Xianyi 74b0672223 Fix c/zaxpyc kernel bug on Cortex-A57. 2016-02-23 22:47:53 +00:00
Ashwin Sekhar T K 318f0949c3 lapack-test fixes in nrm2 kernels for Cortex A57 2015-11-23 13:43:36 +05:30
Ashwin Sekhar T K 98965da2e8 lapack-test fixes for Cortex A57 2015-11-20 01:15:04 +05:30
Ashwin Sekhar T K c99c43d51e Optimized trmm kernels for CORTEXA57 2015-11-09 14:15:54 +05:30
Ashwin Sekhar T K 1397b47197 Optimized zgemm kernel for CORTEXA57 2015-11-09 14:15:53 +05:30
Ashwin Sekhar T K 45f78963ac Optimized cgemm kernel for CORTEXA57
Also, add a generic ztrmm 4x4 kernel
2015-11-09 14:15:53 +05:30