Martin Kroeker
3178e4fea0
Remove explicit include of complex.h
2016-09-29 23:41:43 +02:00
Martin Kroeker
95c245ddb0
Remove explicit include of complex.h
2016-09-29 23:40:36 +02:00
Martin Kroeker
4b1b27347f
Remove explicit include of complex.h
2016-09-29 23:39:35 +02:00
Shivraj Patil
54747fe24a
DGEMM function split and data prefetch
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
2016-09-22 17:25:46 +05:30
Zhang Xianyi
515bc56ea9
Refs #946 . Use nrm2 reference implementation for Power8.
2016-08-18 18:59:43 -07:00
Zhang Xianyi
ae70b916f4
Refs #929 . Deal with zero and NaNs for scale.
2016-08-18 10:24:42 -07:00
Shivraj Patil
9687437928
MIPS n32 ABI and build time mips simd support check
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
2016-08-10 17:44:22 +05:30
Shivraj Patil
d1c6469283
MIPS n32 ABI support, MSA support detection and rename ARCH, ARCHFLAGS
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
2016-08-08 11:58:01 +05:30
Ashwin Sekhar T K
c54a29bb48
Cortex A57: Improvements to DGEMM 8x4 kernel
2016-07-26 10:58:21 +05:30
Shivraj Patil
beb1d076a4
Added MSA optimization for GEMV_N, GEMV_T, ASUM, DOT functions
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
2016-07-15 18:38:25 +05:30
Zhang Xianyi
8a592ee386
Merge pull request #924 from ashwinyes/develop_aarch64_improvements_20160714
Improvements to Aarch64 kernels
2016-07-14 15:47:55 -04:00
Ashwin Sekhar T K
0a5ff9f9f9
Improvements to TRMM and GEMM kernels
2016-07-14 13:56:04 +05:30
Ashwin Sekhar T K
8a40f1355e
Improvements to GEMV kernels
2016-07-14 13:50:38 +05:30
Ashwin Sekhar T K
78782485b6
Improvements to COPY and IAMAX kernels
2016-07-14 13:49:34 +05:30
Shivraj Patil
57df7956ee
Added CGEMM, ZGEMM, STRMM, DTRMM, CTRMM, ZTRMM. Updated macros in SGEMM, DGEMM, STRMM.
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
2016-06-28 17:51:10 +05:30
Zhang Xianyi
4a30a2584a
Merge pull request #897 from ksraste/develop
STRSM optimized for MSA
2016-06-27 10:04:18 -04:00
Werner Saar
f04af36ad0
Merge pull request #898 from wernsaar/develop
added experimental support for optimized lapack fortran functions
2016-05-31 14:13:52 +02:00
Kaustubh Raste
011431b9d7
STRSM optimized for MSA
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
2016-05-31 10:17:23 +05:30
Kaustubh Raste
c8a7860eb3
STRSM optimized
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
2016-05-30 21:17:00 +05:30
Zhang Xianyi
2daad2bcb5
Merge pull request #893 from biddisco/develop
Replace CMAKE_SOURCE_DIR/CMAKE_BINARY_DIR with PROJECT_SOURCE_DIR/PRO…
2016-05-30 14:52:58 +08:00
John Biddiscombe
053044ae4d
Replace CMAKE_SOURCE_DIR/CMAKE_BINARY_DIR with PROJECT_SOURCE_DIR/PROJECT_BINARY_DIR
If OpenBLAS is built using add_subdirectory(OpenBlas) as part of another project
then the paths set by CMAKE_XXX_DIR are relative to the parent project
and not the OpenBLAS project.
2016-05-25 09:13:28 +02:00
Aleksey Kuleshov
fca66262c4
mips64/axpy: fix error when INCY == 0
2016-05-23 13:30:27 +03:00
Werner Saar
412bcd187a
optimized dtrsm_logic_LT_16x4_power8.S and dtrsm_macros_LT_16x4_power8.S
2016-05-23 11:20:41 +02:00
Werner Saar
bd06b246cc
Merge pull request #890 from wernsaar/develop
optimized dtrsm_kernel_LT for POWER8
2016-05-22 16:01:35 +02:00
Werner Saar
8b140220c8
optimized dtrsm_kernel_LT for POWER8
2016-05-22 15:20:04 +02:00
Werner Saar
8fb5a1aaff
added optimized dtrsm_LT kernel for POWER8
2016-05-22 13:09:05 +02:00
Kaustubh Raste
ad9f317870
STRSM optimization for MIPS P5600 and I6400 using MSA
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
2016-05-20 10:59:03 +05:30
Shivraj Patil
c4ba40e308
SGEMM optimization for MIPS P5600 and I6400 using MSA. Unrolled k loop in DGEMM kernel function
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
2016-05-19 11:04:42 +05:30
Zhang Xianyi
7a19065369
Merge pull request #878 from ksraste/develop
DTRSM bug fix for MIPS P5600 and I6400
2016-05-19 11:16:43 +08:00
Werner Saar
6a2bde7a2d
optimized dgemm and dgetrf for POWER8
2016-05-17 14:45:27 +02:00
Kaustubh Raste
d7cbc7ac13
DTRSM bug fix for MIPS P5600 and I6400
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
2016-05-17 15:48:02 +05:30
Werner Saar
88011f625d
Merge pull request #876 from wernsaar/develop
optimized dgemm on power8 for 20 threads
2016-05-16 14:52:40 +02:00
Werner Saar
8310d4d3f7
optimized dgemm for 20 threads
2016-05-16 14:14:25 +02:00
Kaustubh Raste
edb5980c13
DTRSM optimization for MIPS P5600 and I6400 using MSA
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
2016-05-09 15:15:26 +05:30
Shivraj Patil
085cf236c2
conflict resolved by syncing with 'xianyi:develop'
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
2016-05-04 11:07:14 +05:30
Shivraj Patil
b7b3d8ec8e
DGEMM optimization for MIPS P5600 and I6400 using MSA
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
2016-05-03 14:42:26 +05:30
Zhang Xianyi
cd7af5260a
Merge pull request #847 from sva-img/develop
MIPS P5600(32 bit) and I6400(64 bit) cores support added.
2016-04-29 11:44:36 -04:00
Werner Saar
56948dbf0f
optimized dgemm for POWER8
2016-04-29 12:52:47 +02:00
Werner Saar
0d0c6f7d7d
optimized dgemm for POWER8
2016-04-27 14:01:08 +02:00
Werner Saar
298b13bba4
updated some kernel files for EXCAVATOR
2016-04-25 10:36:23 +02:00
Werner Saar
78b05f6476
bugfix for EXCAVATOR and DYNAMIC_ARCH
2016-04-25 10:13:30 +02:00
Werner Saar
a3da10662f
added sgemm_tcopy_8_power8.S
2016-04-23 10:04:41 +02:00
Werner Saar
d46f07bb4e
added cgemm_tcopy_8_power8.S
2016-04-23 07:37:18 +02:00
Werner Saar
879a51165f
Optimized zgemm and tested zgemm again
2016-04-22 13:07:12 +02:00
Shivraj Patil
2c3dfe2bf3
MIPS P5600(32 bit) and I6400(64 bit) cores support added.
Separated mips and mips64 files.
Configuration support for mips 32 bit.
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
2016-04-22 14:03:18 +05:30
Werner Saar
9276c9012f
Optimized sgemm and dgemm and tested again.
2016-04-21 11:37:57 +02:00
wernsaar
6fbca2a4a1
Merge pull request #845 from wernsaar/develop
optimized sgemm for power8
2016-04-20 13:44:22 +02:00
Werner Saar
0001260f4b
optimized sgemm
2016-04-20 13:06:38 +02:00
Werner Saar
3c6294ca3d
added optimized sgemm_tcopy for power8
2016-04-19 16:08:54 +02:00
Zhang Xianyi
dd43661cfd
Init IBM z system (s390x) porting.
2016-04-15 18:02:24 -04:00
Zhang Xianyi
f24d5307cf
Refs #834 . Fix zgemv config bug on Steamroller.
2016-04-12 22:26:11 +08:00
Werner Saar
8037d78eed
bugfix for arm scal.c and zscal.c
2016-04-11 11:21:36 +02:00
wernsaar
0a4276bc2f
Merge pull request #837 from wernsaar/develop
updated zgemm- and ztrmm-kernel for POWER8
2016-04-08 11:13:27 +02:00
Werner Saar
e173c51c04
updated zgemm- and ztrmm-kernel for POWER8
2016-04-08 09:05:37 +02:00
Werner Saar
9c42f0374a
Updated cgemm- and sgemm-kernel for POWER8 SMP
2016-04-07 15:08:15 +02:00
Zhang Xianyi
d4380c1fe4
Refs xianyi/OpenBLAS-CI#10 , Fix sdot for scipy test_iterative.test_convergence test failure on AMD bulldozer and piledriver.
2016-04-07 01:44:18 +08:00
Werner Saar
a51102e9b7
bugfixes for sgemm- and cgemm-kernel
2016-04-06 11:15:21 +02:00
Werner Saar
c5b1fbcb2e
updated optimized cgemm- and ctrmm-kernel for POWER8
2016-04-04 09:12:08 +02:00
Werner Saar
d4c0330967
updated cgemm- and ctrmm-kernel for POWER8
2016-04-03 14:30:49 +02:00
Werner Saar
6a9bbfc227
updated sgemm- and strmm-kernel for POWER8
2016-04-02 17:16:36 +02:00
Werner Saar
68a69c5b50
added optimized dgemv_n kernel for POWER8
2016-03-30 11:10:53 +02:00
Werner Saar
c2464a7c4a
added optimized casum kernel for POWER8
2016-03-28 14:12:08 +02:00
Werner Saar
294f933869
added optimized zasum kernel for POWER8
2016-03-28 13:37:32 +02:00
Werner Saar
f59c9bd6ef
added optimized sasum kernel for POWER8
2016-03-28 12:44:25 +02:00
Werner Saar
c53be46d78
added optimized dasum kernel for POWER8
2016-03-28 12:17:15 +02:00
Werner Saar
659ed16591
added optimized cswap and zswap kernels for POWER8
2016-03-27 18:31:37 +02:00
Werner Saar
35c98a3556
added optimized zscal kernel for POWER8
2016-03-27 16:31:50 +02:00
Werner Saar
f1a5dd06c5
added optimized sscal kernel for POWER8
2016-03-27 11:05:56 +02:00
wernsaar
e125a3dc33
Merge pull request #824 from wernsaar/develop
added optimized drot-kernel and srot-kernel for POWER8
2016-03-27 10:43:17 +02:00
Werner Saar
35f1f21a7f
added drot- and srot-kernel optimized for POWER8
2016-03-27 08:57:11 +02:00
Zhang Xianyi
7b4b7179ba
Merge pull request #819 from ashwinyes/develop_20160324_fixes_optimizations
Cortex-A57: Fixes and Optimizations
2016-03-27 00:04:20 -04:00
Werner Saar
3d9a50e841
added optimized sswap kernel for POWER8
2016-03-25 17:34:55 +01:00
Werner Saar
828c849b44
added optimized ccopy kernel for POWER8
2016-03-25 16:54:25 +01:00
Werner Saar
ecc0bc9813
added optimized scopy kernel for POWER8
2016-03-25 16:06:56 +01:00
Werner Saar
12f209b7b0
added optimized zswap kernel for POWER8
2016-03-25 15:27:34 +01:00
Werner Saar
7316a87930
added optimized dswap kernel for POWER8
2016-03-25 14:35:43 +01:00
Werner Saar
0bff057a87
added optimized dcopy kernel for POWER8
2016-03-25 13:03:02 +01:00
Werner Saar
1e6cf9808c
added optimized dscal kernel for POWER8
2016-03-25 09:42:08 +01:00
Ashwin Sekhar T K
278511ad2d
Cortex-A57: Fix clang compilation errors
2016-03-24 10:42:04 +05:30
Ashwin Sekhar T K
3b5ffb49d3
Cortex-A57: Improve DGEMM 8x4 Implementation
2016-03-24 10:25:18 +05:30
Werner Saar
55eda3813b
added optimized zaxpy kernel for POWER8
2016-03-23 11:20:23 +01:00
Werner Saar
0664ba4c97
added optimized daxpy kernel for POWER8
2016-03-22 14:50:03 +01:00
Werner Saar
11c44dede1
added optimized sdot kernel for POWER8
2016-03-21 13:18:23 +01:00
Werner Saar
9e4584d069
added optimized zdot kernel for POWER8
2016-03-21 10:12:07 +01:00
Werner Saar
cd9fafc054
ddot for POWER8: updated licence information
2016-03-20 11:19:27 +01:00
Werner Saar
84b92e6373
added optimized ddot kernel for POWER8
2016-03-20 11:06:06 +01:00
wernsaar
c279a53ed8
Merge pull request #806 from wernsaar/develop
adding optimized single precision blas level3 kernels for POWER8
2016-03-18 12:46:16 +01:00
Werner Saar
e1df5a6e23
fixed sgemm- and strmm-kernel
2016-03-18 12:12:03 +01:00
Werner Saar
5c658f8746
add optimized cgemm- and ctrmm-kernel for POWER8
2016-03-18 08:17:25 +01:00
Ashwin Sekhar T K
5ac02f6dc7
Optimize Dgemm 4x4 for Cortex A57
2016-03-14 19:35:23 +05:30
Ashwin Sekhar T K
7aa1ad4923
Functional Assembly Kernels for CortexA57
Adding functional (non-optimized) kernels for Cortex-A57
with the following layouts.
SGEMM - 16x4, 8x8
CGEMM - 8x4
DGEMM - 8x4, 4x8
2016-03-14 19:33:21 +05:30
Werner Saar
dcd15b546c
BUGFIX: KERNEL.POWER8
2016-03-14 14:36:59 +01:00
Werner Saar
96284ab295
added sgemm- and strmm-kernel for POWER8
2016-03-14 13:52:44 +01:00
Werner Saar
faa5e2e5e3
FIX: forgot to add the files cgemv_n_4.c and cgemv_t_4.c
2016-03-10 11:10:38 +01:00
Werner Saar
fdf291be30
Added optimized cgemv_n and cgemv_t kernels for bulldozer, piledriver and steamroller
2016-03-10 09:42:07 +01:00
Werner Saar
c99cc41cbd
Added optimized zgemv_n kernel for bulldozer, piledriver and steamroller
2016-03-09 14:02:03 +01:00
Werner Saar
acdff55a6a
Bugfix for ztrmv
2016-03-07 09:39:34 +01:00
Zhang Xianyi
7d6b68eb4a
Refs #786 . Revert to default assembly kernel.
2016-03-07 11:34:58 +08:00
Werner Saar
cd5241d0cf
modified KERNEL for power, to use the generic DSDOT-KERNEL
2016-03-06 09:07:24 +01:00
Zhang Xianyi
8c43d7fa5f
Merge remote-tracking branch 'origin/power8' into develop
Refs #774
2016-03-05 06:03:19 -05:00