Martin Kroeker
|
60eea75409
|
Merge pull request #1076 from ashwinyes/develop_20170130_thunderx2t99
More optimized implementations for ThunderX2T99
|
2017-02-04 17:25:43 +01:00 |
Ashwin Sekhar T K
|
d09f88192c
|
THUNDERX2T99: Add optimized S/D/C/Z COPY Implementations
|
2017-02-02 15:26:38 +05:30 |
Ashwin Sekhar T K
|
e58233460a
|
THUNDERX2T99: Add optimized D/C/Z ASUM Implementations
|
2017-02-02 15:26:22 +05:30 |
Ashwin Sekhar T K
|
99bd2892bf
|
THUNDERX2T99: Add optimized CASUM Implementation
|
2017-01-30 17:44:32 +05:30 |
Ashwin Sekhar T K
|
ff6f572f2e
|
THUNDERX2T99: Rename labels for DDOT and SNRM2
|
2017-01-30 17:44:32 +05:30 |
Ashwin Sekhar T K
|
e0dc5f58c5
|
THUNDERX2T99: Remove Duplicate Code
|
2017-01-30 17:44:32 +05:30 |
Ashwin Sekhar T K
|
2757b49767
|
THUNDERX2T99: Add Optimized CGEMM Implementation
|
2017-01-30 17:44:26 +05:30 |
Zhang Xianyi
|
ff41e13385
|
Merge pull request #1074 from ashwinyes/develop_20170116_thunderx2t99_sgemm
Add more THUNDERX2T99 Optimized APIs
|
2017-01-25 22:17:05 +08:00 |
Ashwin Sekhar T K
|
907e286eb6
|
THUNDERX2T99: Add threaded SNRM2 Implementation
|
2017-01-24 21:39:29 +05:30 |
Ashwin Sekhar T K
|
cde3aee08b
|
ARM64: Rename kernel files to have consistent naming
|
2017-01-24 14:53:34 +05:30 |
Ashwin Sekhar T K
|
ee6ea7e988
|
THUNDERX2T99: Add Optimized CNRM2 Implementation
|
2017-01-24 10:23:32 +05:30 |
Ashwin Sekhar T K
|
ca0b36b012
|
THUNDERX2T99: Add Optimized SNRM2 Implementation
|
2017-01-24 10:23:21 +05:30 |
Ashwin Sekhar T K
|
d0a79ca6e0
|
THUNDERX2T99: Add threaded DDOT Implementation
|
2017-01-19 11:11:42 +05:30 |
Ashwin Sekhar T K
|
0c07003ccf
|
THUNDERX2T99: Add Optimized DDOT Implementation
|
2017-01-19 11:11:07 +05:30 |
Ashwin Sekhar T K
|
f33fcedb30
|
THUNDERX2T99: Improve SGEMM
|
2017-01-19 11:11:07 +05:30 |
Ashwin Sekhar T K
|
0f1d6e8b39
|
THUNDERX2T99: Improve DGEMM
|
2017-01-19 11:11:07 +05:30 |
Ashwin Sekhar T K
|
981064acc6
|
THUNDERX2T99: Add Optimized DAXPY Implementation
|
2017-01-19 11:10:57 +05:30 |
Shivraj Patil
|
a4d97d980f
|
Added rot functions.
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
|
2017-01-17 12:15:07 +05:30 |
Ashwin Sekhar T K
|
f279ff4789
|
THUNDERX2T99: Add Optimized SGEMM Implementation
|
2017-01-16 21:44:33 +05:30 |
Ashwin Sekhar T K
|
759f37feba
|
ARM64: Let target VULCAN inherit THUNDERX2T99 properties
|
2017-01-16 21:44:19 +05:30 |
Zhang Xianyi
|
0863a0d4b4
|
Merge pull request #1061 from ashwinyes/develop_aarch64_vulcan_thunderx_patch
Add new targets for ARM64
|
2017-01-16 13:20:10 +08:00 |
Werner Saar
|
28e2fab33e
|
Prepared kernel/setparam-ref.c for UNROLL values that are not a power of two
|
2017-01-11 11:56:50 +01:00 |
Ashwin Sekhar T K
|
4b55fae337
|
ARM64: Add Cavium THUNDERX2T99 Target
|
2017-01-11 11:18:40 +05:30 |
Andrew Pinski
|
95649dee28
|
THUNDERX: Add optimized version of daxpy
This is better for single core but does not change anything for multiple cores
|
2017-01-11 11:18:36 +05:30 |
Andrew Pinski
|
8fdb0655e9
|
THUNDERX: Add an optimized version of ddot
|
2017-01-10 15:01:37 +05:30 |
Andrew Pinski
|
fb200c7245
|
ARM64: Add Cavium THUNDERX Target
|
2017-01-10 15:01:37 +05:30 |
Ashwin Sekhar T K
|
0b8e876d89
|
VULCAN: Add optimized DGEMM implementation
|
2017-01-10 15:01:37 +05:30 |
Ashwin Sekhar T K
|
4713e7c47f
|
ARM64: Add the VULCAN Target
|
2017-01-10 15:01:17 +05:30 |
Ashwin Sekhar T K
|
6085386b10
|
CORTEXA57: Add assembly kernels for copy routines
|
2017-01-10 15:01:05 +05:30 |
kaustubh
|
1480f3df71
|
Add msa optimization for AXPY, COPY, SCALE, SWAP
Signed-off-by: kaustubh <kaustubh.raste@imgtec.com>
|
2017-01-09 18:27:23 +05:30 |
kaustubh
|
88afb3bc94
|
Add msa optimization for AXPY, COPY, SCALE, SWAP
Signed-off-by: kaustubh <kaustubh.raste@imgtec.com>
|
2017-01-09 18:22:09 +05:30 |
Zhang Xianyi
|
b678471d65
|
Merge branch 'z13' into develop
Conflicts:
CONTRIBUTORS.md
|
2017-01-09 05:52:42 -05:00 |
Zhang Xianyi
|
864e202afd
|
Add USE_TRMM=1 for IBM z13 in kernel/Makefile.L3
|
2017-01-09 05:48:09 -05:00 |
Abdurrauf
|
6418667818
|
dtrmm and dgemm for z13
|
2017-01-04 19:32:33 +04:00 |
Shivraj Patil
|
a9bf8a781a
|
Added prefetch to CGEMV and ZGEMV.
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
|
2016-12-27 11:33:51 +05:30 |
kaustubh
|
5f93aa5f87
|
Updated data prefetch in TRSM, ASUM, DOT functions
Signed-off-by: kaustubh <kaustubh.raste@imgtec.com>
|
2016-12-14 14:05:11 +05:30 |
kaustubh
|
9db451acd0
|
Updated data prefetch in TRSM, ASUM, DOT functions
Signed-off-by: kaustubh <kaustubh.raste@imgtec.com>
|
2016-12-13 14:02:14 +05:30 |
kaustubh
|
3eaff85191
|
Updated data prefetch in TRSM, ASUM, DOT functions
Signed-off-by: kaustubh <kaustubh.raste@imgtec.com>
|
2016-12-13 11:41:17 +05:30 |
kaustubh
|
00abce3b93
|
Add data prefetch in DOT and ASUM functions
Signed-off-by: kaustubh <kaustubh.raste@imgtec.com>
|
2016-11-22 11:21:03 +05:30 |
Andrew
|
becf8bc7a0
|
remove dead code
|
2016-10-31 12:46:56 +01:00 |
kaustubh
|
f3419e634c
|
SGEMM, DGEMM, CGEMM, ZGEMM functions data prefetch
Signed-off-by: kaustubh <kaustubh.raste@imgtec.com>
|
2016-10-17 18:29:38 +05:30 |
Zhang Xianyi
|
7472c79ea6
|
Merge pull request #984 from ksraste/develop
STRSM, DTRSM functions data prefetch
|
2016-10-17 11:33:16 +08:00 |
kaustubh
|
90e2321ac3
|
STRSM, DTRSM functions data prefetch
Signed-off-by: kaustubh <kaustubh.raste@imgtec.com>
|
2016-10-14 16:41:28 +05:30 |
Martin Kroeker
|
4998e19869
|
Change file comments to work around clang 3.9 assembler bug
|
2016-10-13 16:51:08 +02:00 |
Martin Kroeker
|
91610f3835
|
Update zdot_msa.c
|
2016-10-05 18:59:09 +02:00 |
Martin Kroeker
|
6e22ecf102
|
Update zdot.c
|
2016-10-05 18:58:03 +02:00 |
Martin Kroeker
|
6221d6df5f
|
Update zdot.c
|
2016-10-05 18:57:14 +02:00 |
Martin Kroeker
|
16446d1d23
|
Remove explicit include of complex.h
|
2016-09-29 23:45:56 +02:00 |
Martin Kroeker
|
a6e9e0b94b
|
Remove explicit include of complex.h
|
2016-09-29 23:43:28 +02:00 |
Martin Kroeker
|
3178e4fea0
|
Remove explicit include of complex.h
|
2016-09-29 23:41:43 +02:00 |