Commit Graph

75 Commits

Author SHA1 Message Date
Craig Donner c2545b0fd6 Fixed a few more unnecessary calls to num_cpu_avail.
I don't have as many benchmarks for these as for gemm, but it should still
make a difference for small matrices.
2018-06-11 10:17:16 +01:00
Ashwin Sekhar T K fa9ca65c0e ARM64: Fix utest dsdot errors 2018-02-27 10:47:55 +00:00
Martin Kroeker c9d408064a
Use dot.S also for DSDOT on CORTEXA57 2018-02-25 19:48:09 +01:00
Martin Kroeker 288d1a3f6e
Use dot.S also for DSDOT on ARMV8 2018-02-25 19:45:16 +01:00
Martin Kroeker b47e6822aa
Enable most assembly kernels in the generic ARMV8 target
ref #1439
2018-02-06 11:42:58 +01:00
Ashwin Sekhar T K a0128aa489 ARM64: Convert all labels to local labels
While debugging/profiling applications using perf or other tools, the
kernels appear scattered in the profile reports. This is because the labels
within the kernels are not local and each label is shown as a separate
function.

To avoid this, all the labels within the kernels are changed to local
labels.
2017-10-24 11:40:05 +00:00
Ashwin Sekhar T K 4899d67f7d THUDNERX2T99: Fix clang compilation 2017-08-02 11:28:45 -07:00
Ashwin Sekhar T K 67473d09dd THUNDERX2T99: Bug Fixes in D/Z NRM2 and ZGEMM 2017-02-28 01:11:38 -08:00
Ashwin Sekhar T K 19ba133383 THUNDERX2T99: Add Optimized ZGEMM Implementation 2017-02-28 05:31:41 +00:00
Ashwin Sekhar T K a3935f0dfb THUNDERX2T99: Add Optimized D/Z NRM2 Implementation 2017-02-23 10:02:15 -08:00
Ashwin Sekhar T K 738628e9a8 ARM64: Remove unused code 2017-02-21 21:42:32 -08:00
Ashwin Sekhar T K ab3ffab96a THUNDERX2T99: Add Optimized C/Z DOT Implementation 2017-02-21 03:40:59 -08:00
Ashwin Sekhar T K f036be9ce2 THUNDERX2T99: Add Optimized SDOT Implementation 2017-02-21 03:24:32 -08:00
Ashwin Sekhar T K faba876fda THUNDERX2T99: Bug fix in C/Z IAMAX 2017-02-19 23:11:50 -08:00
Ashwin Sekhar T K 172a62d73e THUNDERX2T99: Add Optimized C/Z IAMAX Implementation 2017-02-17 03:06:32 -08:00
Ashwin Sekhar T K 228c75a69c THUNDERX2T99: Add parallel SCNRM2 Implementation 2017-02-14 04:10:06 -08:00
Ashwin Sekhar T K 8e89668f62 THUNDERX2T99: Fix bug in SNRM2 2017-02-07 02:14:33 -08:00
Ashwin Sekhar T K f63deae9de THUNDERX2T99: Add Optimized S/D IAMAX Implementation 2017-02-07 01:35:55 -08:00
Ashwin Sekhar T K 071a830e8b THUNDERX2T99: Add optimized S/D/C/Z SWAP Implementations 2017-02-03 03:55:06 -08:00
Ashwin Sekhar T K d09f88192c THUNDERX2T99: Add optimized S/D/C/Z COPY Implementations 2017-02-02 15:26:38 +05:30
Ashwin Sekhar T K e58233460a THUDNERX2T99: Add optimized D/C/Z ASUM Implementations 2017-02-02 15:26:22 +05:30
Ashwin Sekhar T K 99bd2892bf THUNDERX2T99: Add optimized CASUM Implementation 2017-01-30 17:44:32 +05:30
Ashwin Sekhar T K ff6f572f2e THUNDERX2T99: Rename labels in for DDOT and SNRM2 2017-01-30 17:44:32 +05:30
Ashwin Sekhar T K e0dc5f58c5 THUNDERX2T99: Remove Duplicate Code 2017-01-30 17:44:32 +05:30
Ashwin Sekhar T K 2757b49767 THUNDERX2T99: Add Optimized CGEMM Implementation 2017-01-30 17:44:26 +05:30
Ashwin Sekhar T K 907e286eb6 THUNDERX2T99: Add threaded SNRM2 Implementation 2017-01-24 21:39:29 +05:30
Ashwin Sekhar T K cde3aee08b ARM64: Rename kernel files to have consistent naming 2017-01-24 14:53:34 +05:30
Ashwin Sekhar T K ee6ea7e988 THUNDERX2T99: Add Optimized CNRM2 Implementation 2017-01-24 10:23:32 +05:30
Ashwin Sekhar T K ca0b36b012 THUNDERX2T99: Add Optimized SNRM2 Implementation 2017-01-24 10:23:21 +05:30
Ashwin Sekhar T K d0a79ca6e0 THUNDERX2T99: Add threaded DDOT Implementation 2017-01-19 11:11:42 +05:30
Ashwin Sekhar T K 0c07003ccf THUNDERX2T99: Add Optimized DDOT Implementation 2017-01-19 11:11:07 +05:30
Ashwin Sekhar T K f33fcedb30 THUNDERX2T99: Improve SGEMM 2017-01-19 11:11:07 +05:30
Ashwin Sekhar T K 0f1d6e8b39 THUNDERX2T99: Improve DGEMM 2017-01-19 11:11:07 +05:30
Ashwin Sekhar T K 981064acc6 THUNDERX2T99: Add Optimized DAXPY Implementation 2017-01-19 11:10:57 +05:30
Ashwin Sekhar T K f279ff4789 THUNDERX2T99: Add Optimized SGEMM Implementation 2017-01-16 21:44:33 +05:30
Ashwin Sekhar T K 759f37feba ARM64: Let target VULCAN inherit THUNDERX2T99 properties 2017-01-16 21:44:19 +05:30
Ashwin Sekhar T K 4b55fae337 ARM64: Add Cavium THUNDERX2T99 Target 2017-01-11 11:18:40 +05:30
Andrew Pinski 95649dee28 THUNDERX: Add optimized version of daxpy
This is better for single core but does not change anything for multiple cores
2017-01-11 11:18:36 +05:30
Andrew Pinski 8fdb0655e9 THUNDERX: Add an optimized version of ddot 2017-01-10 15:01:37 +05:30
Andrew Pinski fb200c7245 ARM64: Add Cavium THUNDERX Target 2017-01-10 15:01:37 +05:30
Ashwin Sekhar T K 0b8e876d89 VULCAN: Add optimized DGEMM implementation 2017-01-10 15:01:37 +05:30
Ashwin Sekhar T K 4713e7c47f ARM64: Add the VULCAN Target 2017-01-10 15:01:17 +05:30
Ashwin Sekhar T K 6085386b10 CORTEXA57: Add assembly kernels for copy routines 2017-01-10 15:01:05 +05:30
Ashwin Sekhar T K c54a29bb48 Cortex A57: Improvements to DGEMM 8x4 kernel 2016-07-26 10:58:21 +05:30
Ashwin Sekhar T K 0a5ff9f9f9 Improvements to TRMM and GEMM kernels 2016-07-14 13:56:04 +05:30
Ashwin Sekhar T K 8a40f1355e Improvements to GEMV kernels 2016-07-14 13:50:38 +05:30
Ashwin Sekhar T K 78782485b6 Improvements to COPY and IAMAX kernels 2016-07-14 13:49:34 +05:30
Ashwin Sekhar T K 278511ad2d Cortex-A57: Fix clang compilation errors 2016-03-24 10:42:04 +05:30
Ashwin Sekhar T K 3b5ffb49d3 Cortex-A57: Improve DGEMM 8x4 Implementation 2016-03-24 10:25:18 +05:30
Ashwin Sekhar T K 5ac02f6dc7 Optimize Dgemm 4x4 for Cortex A57 2016-03-14 19:35:23 +05:30