Ashwin Sekhar T K
|
cde3aee08b
|
ARM64: Rename kernel files to have consistent naming
|
2017-01-24 14:53:34 +05:30 |
Ashwin Sekhar T K
|
ee6ea7e988
|
THUNDERX2T99: Add Optimized CNRM2 Implementation
|
2017-01-24 10:23:32 +05:30 |
Ashwin Sekhar T K
|
ca0b36b012
|
THUNDERX2T99: Add Optimized SNRM2 Implementation
|
2017-01-24 10:23:21 +05:30 |
Ashwin Sekhar T K
|
d0a79ca6e0
|
THUNDERX2T99: Add threaded DDOT Implementation
|
2017-01-19 11:11:42 +05:30 |
Ashwin Sekhar T K
|
0c07003ccf
|
THUNDERX2T99: Add Optimized DDOT Implementation
|
2017-01-19 11:11:07 +05:30 |
Ashwin Sekhar T K
|
f33fcedb30
|
THUNDERX2T99: Improve SGEMM
|
2017-01-19 11:11:07 +05:30 |
Ashwin Sekhar T K
|
0f1d6e8b39
|
THUNDERX2T99: Improve DGEMM
|
2017-01-19 11:11:07 +05:30 |
Ashwin Sekhar T K
|
981064acc6
|
THUNDERX2T99: Add Optimized DAXPY Implementation
|
2017-01-19 11:10:57 +05:30 |
Ashwin Sekhar T K
|
f279ff4789
|
THUNDERX2T99: Add Optimized SGEMM Implementation
|
2017-01-16 21:44:33 +05:30 |
Ashwin Sekhar T K
|
759f37feba
|
ARM64: Let target VULCAN inherit THUNDERX2T99 properties
|
2017-01-16 21:44:19 +05:30 |
Zhang Xianyi
|
0863a0d4b4
|
Merge pull request #1061 from ashwinyes/develop_aarch64_vulcan_thunderx_patch
Add new targets for ARM64
|
2017-01-16 13:20:10 +08:00 |
Werner Saar
|
28e2fab33e
|
prepared kernel/setparam-ref.c for UNROLL values that are not a power of two
|
2017-01-11 11:56:50 +01:00 |
Ashwin Sekhar T K
|
4b55fae337
|
ARM64: Add Cavium THUNDERX2T99 Target
|
2017-01-11 11:18:40 +05:30 |
Andrew Pinski
|
95649dee28
|
THUNDERX: Add optimized version of daxpy
This is better for single core but does not change anything for multiple cores
|
2017-01-11 11:18:36 +05:30 |
Andrew Pinski
|
8fdb0655e9
|
THUNDERX: Add an optimized version of ddot
|
2017-01-10 15:01:37 +05:30 |
Andrew Pinski
|
fb200c7245
|
ARM64: Add Cavium THUNDERX Target
|
2017-01-10 15:01:37 +05:30 |
Ashwin Sekhar T K
|
0b8e876d89
|
VULCAN: Add optimized DGEMM implementation
|
2017-01-10 15:01:37 +05:30 |
Ashwin Sekhar T K
|
4713e7c47f
|
ARM64: Add the VULCAN Target
|
2017-01-10 15:01:17 +05:30 |
Ashwin Sekhar T K
|
6085386b10
|
CORTEXA57: Add assembly kernels for copy routines
|
2017-01-10 15:01:05 +05:30 |
kaustubh
|
1480f3df71
|
Add msa optimization for AXPY, COPY, SCALE, SWAP
Signed-off-by: kaustubh <kaustubh.raste@imgtec.com>
|
2017-01-09 18:27:23 +05:30 |
kaustubh
|
88afb3bc94
|
Add msa optimization for AXPY, COPY, SCALE, SWAP
Signed-off-by: kaustubh <kaustubh.raste@imgtec.com>
|
2017-01-09 18:22:09 +05:30 |
Zhang Xianyi
|
b678471d65
|
Merge branch 'z13' into develop
Conflicts:
CONTRIBUTORS.md
|
2017-01-09 05:52:42 -05:00 |
Zhang Xianyi
|
864e202afd
|
Add USE_TRMM=1 for IBM z13 in kernel/Makefile.L3
|
2017-01-09 05:48:09 -05:00 |
Abdurrauf
|
6418667818
|
dtrmm and dgemm for z13
|
2017-01-04 19:32:33 +04:00 |
Shivraj Patil
|
a9bf8a781a
|
Added prefetch to CGEMV and ZGEMV.
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
|
2016-12-27 11:33:51 +05:30 |
kaustubh
|
5f93aa5f87
|
Updated data prefetch in TRSM, ASUM, DOT functions
Signed-off-by: kaustubh <kaustubh.raste@imgtec.com>
|
2016-12-14 14:05:11 +05:30 |
kaustubh
|
9db451acd0
|
Updated data prefetch in TRSM, ASUM, DOT functions
Signed-off-by: kaustubh <kaustubh.raste@imgtec.com>
|
2016-12-13 14:02:14 +05:30 |
kaustubh
|
3eaff85191
|
Updated data prefetch in TRSM, ASUM, DOT functions
Signed-off-by: kaustubh <kaustubh.raste@imgtec.com>
|
2016-12-13 11:41:17 +05:30 |
kaustubh
|
00abce3b93
|
Add data prefetch in DOT and ASUM functions
Signed-off-by: kaustubh <kaustubh.raste@imgtec.com>
|
2016-11-22 11:21:03 +05:30 |
Andrew
|
becf8bc7a0
|
remove dead code
|
2016-10-31 12:46:56 +01:00 |
kaustubh
|
f3419e634c
|
SGEMM, DGEMM, CGEMM, ZGEMM functions data prefetch
Signed-off-by: kaustubh <kaustubh.raste@imgtec.com>
|
2016-10-17 18:29:38 +05:30 |
Zhang Xianyi
|
7472c79ea6
|
Merge pull request #984 from ksraste/develop
STRSM, DTRSM functions data prefetch
|
2016-10-17 11:33:16 +08:00 |
kaustubh
|
90e2321ac3
|
STRSM, DTRSM functions data prefetch
Signed-off-by: kaustubh <kaustubh.raste@imgtec.com>
|
2016-10-14 16:41:28 +05:30 |
Martin Kroeker
|
4998e19869
|
Change file comments to work around clang 3.9 assembler bug
|
2016-10-13 16:51:08 +02:00 |
Martin Kroeker
|
91610f3835
|
Update zdot_msa.c
|
2016-10-05 18:59:09 +02:00 |
Martin Kroeker
|
6e22ecf102
|
Update zdot.c
|
2016-10-05 18:58:03 +02:00 |
Martin Kroeker
|
6221d6df5f
|
Update zdot.c
|
2016-10-05 18:57:14 +02:00 |
Martin Kroeker
|
16446d1d23
|
Remove explicit include of complex.h
|
2016-09-29 23:45:56 +02:00 |
Martin Kroeker
|
a6e9e0b94b
|
Remove explicit include of complex.h
|
2016-09-29 23:43:28 +02:00 |
Martin Kroeker
|
3178e4fea0
|
Remove explicit include of complex.h
|
2016-09-29 23:41:43 +02:00 |
Martin Kroeker
|
95c245ddb0
|
Remove explicit include of complex.h
|
2016-09-29 23:40:36 +02:00 |
Martin Kroeker
|
4b1b27347f
|
Remove explicit include of complex.h
|
2016-09-29 23:39:35 +02:00 |
Shivraj Patil
|
54747fe24a
|
DGEMM function split and data prefetch
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
|
2016-09-22 17:25:46 +05:30 |
Zhang Xianyi
|
515bc56ea9
|
Refs #946. Use nrm2 reference implementation for Power8.
|
2016-08-18 18:59:43 -07:00 |
Zhang Xianyi
|
ae70b916f4
|
Refs #929. Deal with zero and NaNs for scale.
|
2016-08-18 10:24:42 -07:00 |
Shivraj Patil
|
9687437928
|
MIPS n32 ABI and build time mips simd support check
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
|
2016-08-10 17:44:22 +05:30 |
Shivraj Patil
|
d1c6469283
|
MIPS n32 ABI support, MSA support detection and rename ARCH, ARCHFLAGS
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
|
2016-08-08 11:58:01 +05:30 |
Ashwin Sekhar T K
|
c54a29bb48
|
Cortex A57: Improvements to DGEMM 8x4 kernel
|
2016-07-26 10:58:21 +05:30 |
Shivraj Patil
|
beb1d076a4
|
Added MSA optimization for GEMV_N, GEMV_T, ASUM, DOT functions
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
|
2016-07-15 18:38:25 +05:30 |
Zhang Xianyi
|
8a592ee386
|
Merge pull request #924 from ashwinyes/develop_aarch64_improvements_20160714
Improvements to Aarch64 kernels
|
2016-07-14 15:47:55 -04:00 |