Martin Kroeker
dc34a0da96
Merge pull request #915 from mdong/small_fix_for_icc
...
remove input from clobbered list
2017-02-23 20:00:22 +01:00
Martin Kroeker
9e2f316ede
Power8 inline assembly fixes
...
Quoting patch author amodra from #1078
Lots of issues here.
- The vsx regs weren't listed as clobbered.
- Poor choice of vsx regs, which along with the lack of clobbers led to
trashing v0..v21 and fr14..fr23. Ideally you'd let gcc choose all
temp vsx regs, but asms currently have a limit of 30 i/o parms.
- Other regs were clobbered unnecessarily, seemingly in an attempt to
clobber inputs, with gcc-7 complaining about the clobber of r2.
(Changed inputs should be also listed as outputs or as an i/o.)
- "r" constraint used instead of "b" for gprs used in insns where the
r0 encoding means zero rather than r0.
- There were unused asm inputs too.
- All memory was clobbered rather than hooking up memory outputs with
proper memory constraints, and that and the lack of proper memory
input constraints meant the asms needed to be volatile and their
containing function noinline.
- Some parameters were being passed unnecessarily via memory.
- When a copy of a
2017-02-13 23:38:50 +01:00
Martin Kroeker
60eea75409
Merge pull request #1076 from ashwinyes/develop_20170130_thunderx2t99
...
More optimized implementations for ThunderX2T99
2017-02-04 17:25:43 +01:00
Ashwin Sekhar T K
d09f88192c
THUNDERX2T99: Add optimized S/D/C/Z COPY Implementations
2017-02-02 15:26:38 +05:30
Ashwin Sekhar T K
e58233460a
THUDNERX2T99: Add optimized D/C/Z ASUM Implementations
2017-02-02 15:26:22 +05:30
Ashwin Sekhar T K
99bd2892bf
THUNDERX2T99: Add optimized CASUM Implementation
2017-01-30 17:44:32 +05:30
Ashwin Sekhar T K
ff6f572f2e
THUNDERX2T99: Rename labels in for DDOT and SNRM2
2017-01-30 17:44:32 +05:30
Ashwin Sekhar T K
e0dc5f58c5
THUNDERX2T99: Remove Duplicate Code
2017-01-30 17:44:32 +05:30
Ashwin Sekhar T K
2757b49767
THUNDERX2T99: Add Optimized CGEMM Implementation
2017-01-30 17:44:26 +05:30
Zhang Xianyi
ff41e13385
Merge pull request #1074 from ashwinyes/develop_20170116_thunderx2t99_sgemm
...
Add more THUNDERX2T99 Optimized APIs
2017-01-25 22:17:05 +08:00
Ashwin Sekhar T K
907e286eb6
THUNDERX2T99: Add threaded SNRM2 Implementation
2017-01-24 21:39:29 +05:30
Ashwin Sekhar T K
cde3aee08b
ARM64: Rename kernel files to have consistent naming
2017-01-24 14:53:34 +05:30
Ashwin Sekhar T K
ee6ea7e988
THUNDERX2T99: Add Optimized CNRM2 Implementation
2017-01-24 10:23:32 +05:30
Ashwin Sekhar T K
ca0b36b012
THUNDERX2T99: Add Optimized SNRM2 Implementation
2017-01-24 10:23:21 +05:30
Ashwin Sekhar T K
d0a79ca6e0
THUNDERX2T99: Add threaded DDOT Implementation
2017-01-19 11:11:42 +05:30
Ashwin Sekhar T K
0c07003ccf
THUNDERX2T99: Add Optimized DDOT Implementation
2017-01-19 11:11:07 +05:30
Ashwin Sekhar T K
f33fcedb30
THUNDERX2T99: Improve SGEMM
2017-01-19 11:11:07 +05:30
Ashwin Sekhar T K
0f1d6e8b39
THUNDERX2T99: Improve DGEMM
2017-01-19 11:11:07 +05:30
Ashwin Sekhar T K
981064acc6
THUNDERX2T99: Add Optimized DAXPY Implementation
2017-01-19 11:10:57 +05:30
Shivraj Patil
a4d97d980f
Added rot functions.
...
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
2017-01-17 12:15:07 +05:30
Ashwin Sekhar T K
f279ff4789
THUNDERX2T99: Add Optimized SGEMM Implementation
2017-01-16 21:44:33 +05:30
Ashwin Sekhar T K
759f37feba
ARM64: Let target VULCAN inherit THUNDERX2T99 properties
2017-01-16 21:44:19 +05:30
Zhang Xianyi
0863a0d4b4
Merge pull request #1061 from ashwinyes/develop_aarch64_vulcan_thunderx_patch
...
Add new targets for ARM64
2017-01-16 13:20:10 +08:00
Werner Saar
28e2fab33e
prepared kernel/setparam-ref.c for UNROLL values, that are not a power of two
2017-01-11 11:56:50 +01:00
Ashwin Sekhar T K
4b55fae337
ARM64: Add Cavium THUNDERX2T99 Target
2017-01-11 11:18:40 +05:30
Andrew Pinski
95649dee28
THUNDERX: Add optimized version of daxpy
...
This is better for single core but does not change anything for multiple cores
2017-01-11 11:18:36 +05:30
Andrew Pinski
8fdb0655e9
THUNDERX: Add an optimized version of ddot
2017-01-10 15:01:37 +05:30
Andrew Pinski
fb200c7245
ARM64: Add Cavium THUNDERX Target
2017-01-10 15:01:37 +05:30
Ashwin Sekhar T K
0b8e876d89
VULCAN: Add optimized DGEMM implementation
2017-01-10 15:01:37 +05:30
Ashwin Sekhar T K
4713e7c47f
ARM64: Add the VULCAN Target
2017-01-10 15:01:17 +05:30
Ashwin Sekhar T K
6085386b10
CORTEXA57: Add assembly kernels for copy routines
2017-01-10 15:01:05 +05:30
kaustubh
1480f3df71
Add msa optimization for AXPY, COPY, SCALE, SWAP
...
Signed-off-by: kaustubh <kaustubh.raste@imgtec.com>
2017-01-09 18:27:23 +05:30
kaustubh
88afb3bc94
Add msa optimization for AXPY, COPY, SCALE, SWAP
...
Signed-off-by: kaustubh <kaustubh.raste@imgtec.com>
2017-01-09 18:22:09 +05:30
Zhang Xianyi
b678471d65
Merge branch 'z13' into develop
...
Conflicts:
CONTRIBUTORS.md
2017-01-09 05:52:42 -05:00
Zhang Xianyi
864e202afd
Add USE_TRMM=1 for IBM z13 in kernel/Makefile.L3
2017-01-09 05:48:09 -05:00
Abdurrauf
6418667818
dtrmm and dgemm for z13
2017-01-04 19:32:33 +04:00
Shivraj Patil
a9bf8a781a
Added prefetch to CGEMV and ZGEMV.
...
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
2016-12-27 11:33:51 +05:30
kaustubh
5f93aa5f87
Updated data prefetch in TRSM, ASUM, DOT functions
...
Signed-off-by: kaustubh <kaustubh.raste@imgtec.com>
2016-12-14 14:05:11 +05:30
kaustubh
9db451acd0
Updated data prefetch in TRSM, ASUM, DOT functions
...
Signed-off-by: kaustubh <kaustubh.raste@imgtec.com>
2016-12-13 14:02:14 +05:30
kaustubh
3eaff85191
Updated data prefetch in TRSM, ASUM, DOT functions
...
Signed-off-by: kaustubh <kaustubh.raste@imgtec.com>
2016-12-13 11:41:17 +05:30
kaustubh
00abce3b93
Add data prefetch in DOT and ASUM functions
...
Signed-off-by: kaustubh <kaustubh.raste@imgtec.com>
2016-11-22 11:21:03 +05:30
Andrew
becf8bc7a0
remove dead code
2016-10-31 12:46:56 +01:00
kaustubh
f3419e634c
SGEMM, DGEMM, CGEMM, ZGEMM functions data prefetch
...
Signed-off-by: kaustubh <kaustubh.raste@imgtec.com>
2016-10-17 18:29:38 +05:30
Zhang Xianyi
7472c79ea6
Merge pull request #984 from ksraste/develop
...
STRSM, DTRSM functions data prefetch
2016-10-17 11:33:16 +08:00
kaustubh
90e2321ac3
STRSM, DTRSM functions data prefetch
...
Signed-off-by: kaustubh <kaustubh.raste@imgtec.com>
2016-10-14 16:41:28 +05:30
Martin Kroeker
4998e19869
Change file comments to work around clang 3.9 assembler bug
2016-10-13 16:51:08 +02:00
Martin Kroeker
91610f3835
Update zdot_msa.c
2016-10-05 18:59:09 +02:00
Martin Kroeker
6e22ecf102
Update zdot.c
2016-10-05 18:58:03 +02:00
Martin Kroeker
6221d6df5f
Update zdot.c
2016-10-05 18:57:14 +02:00
Martin Kroeker
16446d1d23
Remove explicit include of complex.h
2016-09-29 23:45:56 +02:00