Ashwin Sekhar T K
ca0b36b012
THUNDERX2T99: Add Optimized SNRM2 Implementation
2017-01-24 10:23:21 +05:30
Ashwin Sekhar T K
d0a79ca6e0
THUNDERX2T99: Add threaded DDOT Implementation
2017-01-19 11:11:42 +05:30
Ashwin Sekhar T K
0c07003ccf
THUNDERX2T99: Add Optimized DDOT Implementation
2017-01-19 11:11:07 +05:30
Ashwin Sekhar T K
f33fcedb30
THUNDERX2T99: Improve SGEMM
2017-01-19 11:11:07 +05:30
Ashwin Sekhar T K
0f1d6e8b39
THUNDERX2T99: Improve DGEMM
2017-01-19 11:11:07 +05:30
Ashwin Sekhar T K
981064acc6
THUNDERX2T99: Add Optimized DAXPY Implementation
2017-01-19 11:10:57 +05:30
Shivraj Patil
a4d97d980f
Added rot functions.
...
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
2017-01-17 12:15:07 +05:30
Ashwin Sekhar T K
f279ff4789
THUNDERX2T99: Add Optimized SGEMM Implementation
2017-01-16 21:44:33 +05:30
Ashwin Sekhar T K
759f37feba
ARM64: Let target VULCAN inherit THUNDERX2T99 properties
2017-01-16 21:44:19 +05:30
Zhang Xianyi
0863a0d4b4
Merge pull request #1061 from ashwinyes/develop_aarch64_vulcan_thunderx_patch
...
Add new targets for ARM64
2017-01-16 13:20:10 +08:00
Werner Saar
28e2fab33e
prepared kernel/setparam-ref.c for UNROLL values, that are not a power of two
2017-01-11 11:56:50 +01:00
Ashwin Sekhar T K
4b55fae337
ARM64: Add Cavium THUNDERX2T99 Target
2017-01-11 11:18:40 +05:30
Andrew Pinski
95649dee28
THUNDERX: Add optimized version of daxpy
...
This is better for single core but does not change anything for multiple cores
2017-01-11 11:18:36 +05:30
Andrew Pinski
8fdb0655e9
THUNDERX: Add an optimized version of ddot
2017-01-10 15:01:37 +05:30
Andrew Pinski
fb200c7245
ARM64: Add Cavium THUNDERX Target
2017-01-10 15:01:37 +05:30
Ashwin Sekhar T K
0b8e876d89
VULCAN: Add optimized DGEMM implementation
2017-01-10 15:01:37 +05:30
Ashwin Sekhar T K
4713e7c47f
ARM64: Add the VULCAN Target
2017-01-10 15:01:17 +05:30
Ashwin Sekhar T K
6085386b10
CORTEXA57: Add assembly kernels for copy routines
2017-01-10 15:01:05 +05:30
kaustubh
1480f3df71
Add msa optimization for AXPY, COPY, SCALE, SWAP
...
Signed-off-by: kaustubh <kaustubh.raste@imgtec.com>
2017-01-09 18:27:23 +05:30
kaustubh
88afb3bc94
Add msa optimization for AXPY, COPY, SCALE, SWAP
...
Signed-off-by: kaustubh <kaustubh.raste@imgtec.com>
2017-01-09 18:22:09 +05:30
Zhang Xianyi
b678471d65
Merge branch 'z13' into develop
...
Conflicts:
CONTRIBUTORS.md
2017-01-09 05:52:42 -05:00
Zhang Xianyi
864e202afd
Add USE_TRMM=1 for IBM z13 in kernel/Makefile.L3
2017-01-09 05:48:09 -05:00
Abdurrauf
6418667818
dtrmm and dgemm for z13
2017-01-04 19:32:33 +04:00
Shivraj Patil
a9bf8a781a
Added prefetch to CGEMV and ZGEMV.
...
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
2016-12-27 11:33:51 +05:30
kaustubh
5f93aa5f87
Updated data prefetch in TRSM, ASUM, DOT functions
...
Signed-off-by: kaustubh <kaustubh.raste@imgtec.com>
2016-12-14 14:05:11 +05:30
kaustubh
9db451acd0
Updated data prefetch in TRSM, ASUM, DOT functions
...
Signed-off-by: kaustubh <kaustubh.raste@imgtec.com>
2016-12-13 14:02:14 +05:30
kaustubh
3eaff85191
Updated data prefetch in TRSM, ASUM, DOT functions
...
Signed-off-by: kaustubh <kaustubh.raste@imgtec.com>
2016-12-13 11:41:17 +05:30
kaustubh
00abce3b93
Add data prefetch in DOT and ASUM functions
...
Signed-off-by: kaustubh <kaustubh.raste@imgtec.com>
2016-11-22 11:21:03 +05:30
Andrew
becf8bc7a0
remove dead code
2016-10-31 12:46:56 +01:00
kaustubh
f3419e634c
SGEMM, DGEMM, CGEMM, ZGEMM functions data prefetch
...
Signed-off-by: kaustubh <kaustubh.raste@imgtec.com>
2016-10-17 18:29:38 +05:30
Zhang Xianyi
7472c79ea6
Merge pull request #984 from ksraste/develop
...
STRSM, DTRSM functions data prefetch
2016-10-17 11:33:16 +08:00
kaustubh
90e2321ac3
STRSM, DTRSM functions data prefetch
...
Signed-off-by: kaustubh <kaustubh.raste@imgtec.com>
2016-10-14 16:41:28 +05:30
Martin Kroeker
4998e19869
Change file comments to work around clang 3.9 assembler bug
2016-10-13 16:51:08 +02:00
Martin Kroeker
91610f3835
Update zdot_msa.c
2016-10-05 18:59:09 +02:00
Martin Kroeker
6e22ecf102
Update zdot.c
2016-10-05 18:58:03 +02:00
Martin Kroeker
6221d6df5f
Update zdot.c
2016-10-05 18:57:14 +02:00
Martin Kroeker
16446d1d23
Remove explicit include of complex.h
2016-09-29 23:45:56 +02:00
Martin Kroeker
a6e9e0b94b
Remove explicit include of complex.h
2016-09-29 23:43:28 +02:00
Martin Kroeker
3178e4fea0
Remove explicit include of complex.h
2016-09-29 23:41:43 +02:00
Martin Kroeker
95c245ddb0
Remove explicit include of complex.h
2016-09-29 23:40:36 +02:00
Martin Kroeker
4b1b27347f
Remove explicit include of complex.h
2016-09-29 23:39:35 +02:00
Shivraj Patil
54747fe24a
DGEMM function split and data prefech
...
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
2016-09-22 17:25:46 +05:30
Zhang Xianyi
515bc56ea9
Refs #946 . Use nrm2 reference implementation for Power8.
2016-08-18 18:59:43 -07:00
Zhang Xianyi
ae70b916f4
Refs #929 . Deal with zero and NaNs for scale.
2016-08-18 10:24:42 -07:00
Shivraj Patil
9687437928
MIPS n32 ABI and build time mips simd support check
...
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
2016-08-10 17:44:22 +05:30
Shivraj Patil
d1c6469283
MIPS n32 ABI support, MSA support detection and rename ARCH, ARCHFLAGS
...
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
2016-08-08 11:58:01 +05:30
Ashwin Sekhar T K
c54a29bb48
Cortex A57: Improvements to DGEMM 8x4 kernel
2016-07-26 10:58:21 +05:30
Shivraj Patil
beb1d076a4
Added MSA optimization for GEMV_N, GEMV_T, ASUM, DOT functions
...
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
2016-07-15 18:38:25 +05:30
Zhang Xianyi
8a592ee386
Merge pull request #924 from ashwinyes/develop_aarch64_improvements_20160714
...
Improvements to Aarch64 kernels
2016-07-14 15:47:55 -04:00
Ashwin Sekhar T K
0a5ff9f9f9
Improvements to TRMM and GEMM kernels
2016-07-14 13:56:04 +05:30
Ashwin Sekhar T K
8a40f1355e
Improvements to GEMV kernels
2016-07-14 13:50:38 +05:30
Ashwin Sekhar T K
78782485b6
Improvements to COPY and IAMAX kernels
2016-07-14 13:49:34 +05:30
Shivraj Patil
57df7956ee
Added CGEMM, ZGEMM, STRMM, DTRMM, CTRMM, ZTRMM. Updated macros in SGEMM, DGEMM, STRMM.
...
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
2016-06-28 17:51:10 +05:30
Zhang Xianyi
4a30a2584a
Merge pull request #897 from ksraste/develop
...
STRSM optimized for MSA
2016-06-27 10:04:18 -04:00
mdong
098d8ec5d6
remove input from clobbered list
2016-06-24 16:37:58 -04:00
Werner Saar
f04af36ad0
Merge pull request #898 from wernsaar/develop
...
added experimental support for optimized lapack fortran functions
2016-05-31 14:13:52 +02:00
Kaustubh Raste
011431b9d7
STRSM optimized for MSA
...
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
2016-05-31 10:17:23 +05:30
Kaustubh Raste
c8a7860eb3
STRSM optimized
...
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
2016-05-30 21:17:00 +05:30
Zhang Xianyi
2daad2bcb5
Merge pull request #893 from biddisco/develop
...
Replace CMAKE_SOURCE_DIR/CMAKE_BINARY_DIR with PROJECT_SOURCE_DIR/PRO…
2016-05-30 14:52:58 +08:00
John Biddiscombe
053044ae4d
Replace CMAKE_SOURCE_DIR/CMAKE_BINARY_DIR with PROJECT_SOURCE_DIR/PROJECT_BINARY_DIR
...
If OpenBLAS is built using add_subdirectory(OpenBlas) as part of another project
then the paths set by CMAKE_XXX_DIR are relative to the parent project
and not the OpenBLAS project.
2016-05-25 09:13:28 +02:00
Aleksey Kuleshov
fca66262c4
mips64/axpy: fix error when INCY == 0
2016-05-23 13:30:27 +03:00
Werner Saar
412bcd187a
optimized dtrsm_logic_LT_16x4_power8.S and dtrsm_macros_LT_16x4_power8.S
2016-05-23 11:20:41 +02:00
Werner Saar
bd06b246cc
Merge pull request #890 from wernsaar/develop
...
optimized dtrsm_kernel_LT for POWER8
2016-05-22 16:01:35 +02:00
Werner Saar
8b140220c8
optimized dtrsm_kernel_LT for POWER8
2016-05-22 15:20:04 +02:00
Werner Saar
8fb5a1aaff
added optimized dtrsm_LT kernel for POWER8
2016-05-22 13:09:05 +02:00
Kaustubh Raste
ad9f317870
STRSM optimization for MIPS P5600 and I6400 using MSA
...
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
2016-05-20 10:59:03 +05:30
Shivraj Patil
c4ba40e308
SGEMM optimization for MIPS P5600 and I6400 using MSA. Unrolled k loop in DGEMM kernel function
...
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
2016-05-19 11:04:42 +05:30
Zhang Xianyi
7a19065369
Merge pull request #878 from ksraste/develop
...
DTRSM bug fix for MIPS P5600 and I6400
2016-05-19 11:16:43 +08:00
Werner Saar
6a2bde7a2d
optimized dgemm and dgetrf for POWER8
2016-05-17 14:45:27 +02:00
Kaustubh Raste
d7cbc7ac13
DTRSM bug fix for MIPS P5600 and I6400
...
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
2016-05-17 15:48:02 +05:30
Werner Saar
88011f625d
Merge pull request #876 from wernsaar/develop
...
optimized dgemm on power8 for 20 threads
2016-05-16 14:52:40 +02:00
Werner Saar
8310d4d3f7
optimized dgemm for 20 threads
2016-05-16 14:14:25 +02:00
Kaustubh Raste
edb5980c13
DTRSM optimization for MIPS P5600 and I6400 using MSA
...
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
2016-05-09 15:15:26 +05:30
Shivraj Patil
085cf236c2
conflict resolved by syncing with 'xianyi:develop'
...
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
2016-05-04 11:07:14 +05:30
Shivraj Patil
b7b3d8ec8e
DGEMM optimization for MIPS P5600 and I6400 using MSA
...
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
2016-05-03 14:42:26 +05:30
Zhang Xianyi
cd7af5260a
Merge pull request #847 from sva-img/develop
...
MIPS P5600(32 bit) and I6400(64 bit) cores support added.
2016-04-29 11:44:36 -04:00
Werner Saar
56948dbf0f
optimized dgemm for POWER8
2016-04-29 12:52:47 +02:00
Werner Saar
0d0c6f7d7d
optimized dgemm for POWER8
2016-04-27 14:01:08 +02:00
Werner Saar
298b13bba4
updated some kernel files for EXCAVATOR
2016-04-25 10:36:23 +02:00
Werner Saar
78b05f6476
bugfix for EXCAVATOR and DYNAMIC_ARCH
2016-04-25 10:13:30 +02:00
Werner Saar
a3da10662f
added sgemm_tcopy_8_power8.S
2016-04-23 10:04:41 +02:00
Werner Saar
d46f07bb4e
added cgemm_tcopy_8_power8.S
2016-04-23 07:37:18 +02:00
Werner Saar
879a51165f
Optimized zgemm and tested zgemm again
2016-04-22 13:07:12 +02:00
Shivraj Patil
2c3dfe2bf3
MIPS P5600(32 bit) and I6400(64 bit) cores support added.
...
Seperated mips and mips64 files.
Configurations support for mips 32 bit.
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
2016-04-22 14:03:18 +05:30
Werner Saar
9276c9012f
Optimized sgemm and dgemm and tested again.
2016-04-21 11:37:57 +02:00
wernsaar
6fbca2a4a1
Merge pull request #845 from wernsaar/develop
...
optimized sgemm for power8
2016-04-20 13:44:22 +02:00
Werner Saar
0001260f4b
optimized sgemm
2016-04-20 13:06:38 +02:00
Werner Saar
3c6294ca3d
added optimized sgemm_tcopy for power8
2016-04-19 16:08:54 +02:00
Zhang Xianyi
dd43661cfd
Init IBM z system (s390x) porting.
2016-04-15 18:02:24 -04:00
Zhang Xianyi
f24d5307cf
Refs #834 . Fix zgemv config bug on Steamroller.
2016-04-12 22:26:11 +08:00
Werner Saar
8037d78eed
bugfix for arm scal.c and zscal.c
2016-04-11 11:21:36 +02:00
wernsaar
0a4276bc2f
Merge pull request #837 from wernsaar/develop
...
updated zgemm- and ztrmm-kernel for POWER8
2016-04-08 11:13:27 +02:00
Werner Saar
e173c51c04
updated zgemm- and ztrmm-kernel for POWER8
2016-04-08 09:05:37 +02:00
Werner Saar
9c42f0374a
Updated cgemm- and sgemm-kernel for POWER8 SMP
2016-04-07 15:08:15 +02:00
Zhang Xianyi
d4380c1fe4
Refs xianyi/OpenBLAS-CI#10 , Fix sdot for scipy test_iterative.test_convergence test failure on AMD bulldozer and piledriver.
2016-04-07 01:44:18 +08:00
Werner Saar
a51102e9b7
bugfixes for sgemm- and cgemm-kernel
2016-04-06 11:15:21 +02:00
Werner Saar
c5b1fbcb2e
updated optimized cgemm- and ctrmm-kernel for POWER8
2016-04-04 09:12:08 +02:00
Werner Saar
d4c0330967
updated cgemm- and ctrmm-kernel for POWER8
2016-04-03 14:30:49 +02:00
Werner Saar
6a9bbfc227
updated sgemm- and strmm-kernel for POWER8
2016-04-02 17:16:36 +02:00
Werner Saar
68a69c5b50
added optimized dgemv_n kernel for POWER8
2016-03-30 11:10:53 +02:00