Commit Graph

790 Commits

Author SHA1 Message Date
Martin Kroeker 60eea75409 Merge pull request #1076 from ashwinyes/develop_20170130_thunderx2t99
More optimized implementations for ThunderX2T99
2017-02-04 17:25:43 +01:00
Ashwin Sekhar T K d09f88192c THUNDERX2T99: Add optimized S/D/C/Z COPY Implementations 2017-02-02 15:26:38 +05:30
Ashwin Sekhar T K e58233460a THUNDERX2T99: Add optimized D/C/Z ASUM Implementations 2017-02-02 15:26:22 +05:30
Ashwin Sekhar T K 99bd2892bf THUNDERX2T99: Add optimized CASUM Implementation 2017-01-30 17:44:32 +05:30
Ashwin Sekhar T K ff6f572f2e THUNDERX2T99: Rename labels in for DDOT and SNRM2 2017-01-30 17:44:32 +05:30
Ashwin Sekhar T K e0dc5f58c5 THUNDERX2T99: Remove Duplicate Code 2017-01-30 17:44:32 +05:30
Ashwin Sekhar T K 2757b49767 THUNDERX2T99: Add Optimized CGEMM Implementation 2017-01-30 17:44:26 +05:30
Zhang Xianyi ff41e13385 Merge pull request #1074 from ashwinyes/develop_20170116_thunderx2t99_sgemm
Add more THUNDERX2T99 Optimized APIs
2017-01-25 22:17:05 +08:00
Ashwin Sekhar T K 907e286eb6 THUNDERX2T99: Add threaded SNRM2 Implementation 2017-01-24 21:39:29 +05:30
Ashwin Sekhar T K cde3aee08b ARM64: Rename kernel files to have consistent naming 2017-01-24 14:53:34 +05:30
Ashwin Sekhar T K ee6ea7e988 THUNDERX2T99: Add Optimized CNRM2 Implementation 2017-01-24 10:23:32 +05:30
Ashwin Sekhar T K ca0b36b012 THUNDERX2T99: Add Optimized SNRM2 Implementation 2017-01-24 10:23:21 +05:30
Ashwin Sekhar T K d0a79ca6e0 THUNDERX2T99: Add threaded DDOT Implementation 2017-01-19 11:11:42 +05:30
Ashwin Sekhar T K 0c07003ccf THUNDERX2T99: Add Optimized DDOT Implementation 2017-01-19 11:11:07 +05:30
Ashwin Sekhar T K f33fcedb30 THUNDERX2T99: Improve SGEMM 2017-01-19 11:11:07 +05:30
Ashwin Sekhar T K 0f1d6e8b39 THUNDERX2T99: Improve DGEMM 2017-01-19 11:11:07 +05:30
Ashwin Sekhar T K 981064acc6 THUNDERX2T99: Add Optimized DAXPY Implementation 2017-01-19 11:10:57 +05:30
Shivraj Patil a4d97d980f Added rot functions.
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
2017-01-17 12:15:07 +05:30
Ashwin Sekhar T K f279ff4789 THUNDERX2T99: Add Optimized SGEMM Implementation 2017-01-16 21:44:33 +05:30
Ashwin Sekhar T K 759f37feba ARM64: Let target VULCAN inherit THUNDERX2T99 properties 2017-01-16 21:44:19 +05:30
Zhang Xianyi 0863a0d4b4 Merge pull request #1061 from ashwinyes/develop_aarch64_vulcan_thunderx_patch
Add new targets for ARM64
2017-01-16 13:20:10 +08:00
Werner Saar 28e2fab33e prepared kernel/setparam-ref.c for UNROLL values, that are not a power of two 2017-01-11 11:56:50 +01:00
Ashwin Sekhar T K 4b55fae337 ARM64: Add Cavium THUNDERX2T99 Target 2017-01-11 11:18:40 +05:30
Andrew Pinski 95649dee28 THUNDERX: Add optimized version of daxpy
This is better for single core but does not change anything for multiple cores
2017-01-11 11:18:36 +05:30
Andrew Pinski 8fdb0655e9 THUNDERX: Add an optimized version of ddot 2017-01-10 15:01:37 +05:30
Andrew Pinski fb200c7245 ARM64: Add Cavium THUNDERX Target 2017-01-10 15:01:37 +05:30
Ashwin Sekhar T K 0b8e876d89 VULCAN: Add optimized DGEMM implementation 2017-01-10 15:01:37 +05:30
Ashwin Sekhar T K 4713e7c47f ARM64: Add the VULCAN Target 2017-01-10 15:01:17 +05:30
Ashwin Sekhar T K 6085386b10 CORTEXA57: Add assembly kernels for copy routines 2017-01-10 15:01:05 +05:30
kaustubh 1480f3df71 Add msa optimization for AXPY, COPY, SCALE, SWAP
Signed-off-by: kaustubh <kaustubh.raste@imgtec.com>
2017-01-09 18:27:23 +05:30
kaustubh 88afb3bc94 Add msa optimization for AXPY, COPY, SCALE, SWAP
Signed-off-by: kaustubh <kaustubh.raste@imgtec.com>
2017-01-09 18:22:09 +05:30
Zhang Xianyi b678471d65 Merge branch 'z13' into develop
Conflicts:
	CONTRIBUTORS.md
2017-01-09 05:52:42 -05:00
Zhang Xianyi 864e202afd Add USE_TRMM=1 for IBM z13 in kernel/Makefile.L3 2017-01-09 05:48:09 -05:00
Abdurrauf 6418667818 dtrmm and dgemm for z13 2017-01-04 19:32:33 +04:00
Shivraj Patil a9bf8a781a Added prefetch to CGEMV and ZGEMV.
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
2016-12-27 11:33:51 +05:30
kaustubh 5f93aa5f87 Updated data prefetch in TRSM, ASUM, DOT functions
Signed-off-by: kaustubh <kaustubh.raste@imgtec.com>
2016-12-14 14:05:11 +05:30
kaustubh 9db451acd0 Updated data prefetch in TRSM, ASUM, DOT functions
Signed-off-by: kaustubh <kaustubh.raste@imgtec.com>
2016-12-13 14:02:14 +05:30
kaustubh 3eaff85191 Updated data prefetch in TRSM, ASUM, DOT functions
Signed-off-by: kaustubh <kaustubh.raste@imgtec.com>
2016-12-13 11:41:17 +05:30
kaustubh 00abce3b93 Add data prefetch in DOT and ASUM functions
Signed-off-by: kaustubh <kaustubh.raste@imgtec.com>
2016-11-22 11:21:03 +05:30
Andrew becf8bc7a0 remove dead code 2016-10-31 12:46:56 +01:00
kaustubh f3419e634c SGEMM, DGEMM, CGEMM, ZGEMM functions data prefetch
Signed-off-by: kaustubh <kaustubh.raste@imgtec.com>
2016-10-17 18:29:38 +05:30
Zhang Xianyi 7472c79ea6 Merge pull request #984 from ksraste/develop
STRSM, DTRSM functions data prefetch
2016-10-17 11:33:16 +08:00
kaustubh 90e2321ac3 STRSM, DTRSM functions data prefetch
Signed-off-by: kaustubh <kaustubh.raste@imgtec.com>
2016-10-14 16:41:28 +05:30
Martin Kroeker 4998e19869 Change file comments to work around clang 3.9 assembler bug 2016-10-13 16:51:08 +02:00
Martin Kroeker 91610f3835 Update zdot_msa.c 2016-10-05 18:59:09 +02:00
Martin Kroeker 6e22ecf102 Update zdot.c 2016-10-05 18:58:03 +02:00
Martin Kroeker 6221d6df5f Update zdot.c 2016-10-05 18:57:14 +02:00
Martin Kroeker 16446d1d23 Remove explicit include of complex.h 2016-09-29 23:45:56 +02:00
Martin Kroeker a6e9e0b94b Remove explicit include of complex.h 2016-09-29 23:43:28 +02:00
Martin Kroeker 3178e4fea0 Remove explicit include of complex.h 2016-09-29 23:41:43 +02:00