2262 Commits

Author SHA1 Message Date
Zhang Xianyi 821affb9a0 Update doc for 0.2.19. 2016-08-31 23:58:29 -04:00
Zhang Xianyi 515bc56ea9 Refs #946. Use nrm2 reference implementation for Power8. 2016-08-18 18:59:43 -07:00
Zhang Xianyi ae70b916f4 Refs #929. Deal with zero and NaNs for scale. 2016-08-18 10:24:42 -07:00
Zhang Xianyi 9ea0144482 Merge pull request #941 from sva-img/develop
MIPS n32 ABI support, MSA support detection and rename ARCH, ARCHFLAGS
2016-08-18 09:31:31 -04:00
Zhang Xianyi 1f217a6175 Merge pull request #943 from ibmsoe/IBMMASS_Support
Added support of IBM's MASS library that optimizes performance on Pow…
2016-08-12 17:20:59 -04:00
nishidha@us.ibm.com 78348a2853 Added support of IBM's MASS library that optimizes performance on Power architectures 2016-08-11 14:43:26 +05:30
Shivraj Patil 9687437928 MIPS n32 ABI and build time mips simd support check
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
2016-08-10 17:44:22 +05:30
Shivraj Patil d1c6469283 MIPS n32 ABI support, MSA support detection and rename ARCH, ARCHFLAGS
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
2016-08-08 11:58:01 +05:30
Zhang Xianyi b544be914d Merge pull request #933 from ashwinyes/develop_aarch64_20160726_Dgemm_8x4_Opts
Cortex A57: Improvements to DGEMM 8x4 kernel
2016-07-26 09:54:31 -04:00
Ashwin Sekhar T K c54a29bb48 Cortex A57: Improvements to DGEMM 8x4 kernel 2016-07-26 10:58:21 +05:30
Zhang Xianyi ff4c5deafa Merge pull request #930 from sva-img/develop
P6600/I6400 Build fix.
2016-07-22 11:42:30 -04:00
Shivraj Patil 22b9c2747d P6600/I6400 Build fix. Reverted the changes that were made to support the MIPS n32 ABI
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
2016-07-22 18:45:06 +05:30
Zhang Xianyi 27b5211ccd Merge pull request #927 from sva-img/develop
Added MSA optimization for GEMV_N, GEMV_T, ASUM, DOT functions
2016-07-15 11:17:30 -04:00
Shivraj Patil beb1d076a4 Added MSA optimization for GEMV_N, GEMV_T, ASUM, DOT functions
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
2016-07-15 18:38:25 +05:30
Zhang Xianyi 9e44f3ddd0 Refs #917 Avoid detecting gfortran bug on IBM POWER + Ubuntu 2016-07-14 13:09:36 -07:00
Zhang Xianyi eece9fd889 Merge pull request #926 from vriera/develop
Complete support for MIPS n32 ABI
2016-07-14 15:49:33 -04:00
Zhang Xianyi 5dfa0712c3 Merge pull request #925 from martin-frbg/develop
Update zgetrf2.f, cpuid_x86.c, dynamic.c
2016-07-14 15:48:58 -04:00
Zhang Xianyi 8a592ee386 Merge pull request #924 from ashwinyes/develop_aarch64_improvements_20160714
Improvements to Aarch64 kernels
2016-07-14 15:47:55 -04:00
Zhang Xianyi 7f2409a8e1 Merge pull request #918 from sva-img/develop
Added CGEMM, ZGEMM, STRMM, DTRMM, CTRMM, ZTRMM.
2016-07-14 15:45:39 -04:00
Vicente Olivert Riera 7f28cd1f88 Complete support for MIPS n32 ABI
Signed-off-by: Vicente Olivert Riera <Vincent.Riera@imgtec.com>
2016-07-14 17:51:04 +01:00
Martin Kroeker 154729908e Update cpuid_x86.c 2016-07-14 17:29:34 +02:00
Martin Kroeker 97bd1e42c8 Update cpuid_x86.c 2016-07-14 12:25:17 +02:00
Martin Kroeker 7de829f713 Update dynamic.c
Add Braswell (extended model 4, model 12) N3150 as Nehalem
2016-07-14 12:22:55 +02:00
Martin Kroeker 9b69d8a8e5 Update zgetrf2.f
Trivial typo correction (ZERBLA => XERBLA) to fix #910
2016-07-14 11:41:57 +02:00
Ashwin Sekhar T K 0a5ff9f9f9 Improvements to TRMM and GEMM kernels 2016-07-14 13:56:04 +05:30
Ashwin Sekhar T K 8a40f1355e Improvements to GEMV kernels 2016-07-14 13:50:38 +05:30
Ashwin Sekhar T K 78782485b6 Improvements to COPY and IAMAX kernels 2016-07-14 13:49:34 +05:30
Ashwin Sekhar T K 8d86d14d3f Add time prints in benchmark output 2016-07-14 13:48:13 +05:30
Ashwin Sekhar T K 925d4e1dc6 Add IAMAX and NRM2 benchmarks 2016-07-14 13:46:01 +05:30
Shivraj Patil 57df7956ee Added CGEMM, ZGEMM, STRMM, DTRMM, CTRMM, ZTRMM. Updated macros in SGEMM, DGEMM, STRMM.
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
2016-06-28 17:51:10 +05:30
Zhang Xianyi 437c7d64f2 Merge pull request #913 from dpfoose/develop
Small change to allow compiling with USE_OPENMP on MSVC
2016-06-27 10:05:30 -04:00
Zhang Xianyi ca5c25c870 Merge pull request #907 from jeromerobert/bug786
Fix z/ctrmv stack allocation on AMD bulldozer and barcelona target
2016-06-27 10:04:54 -04:00
Zhang Xianyi 4a30a2584a Merge pull request #897 from ksraste/develop
STRSM optimized for MSA
2016-06-27 10:04:18 -04:00
Daniel Patrick Foose a94f2b7848 Change to allow compiling with USE_OPENMP on MSVC
MSVC treats the declaration of omp_in_parallel and omp_get_num_procs without the modifiers __declspec(dllimport) and __cdecl as a redefinition.
2016-06-14 14:37:28 -04:00
Jerome Robert d346c533b1 Fix z/ctrmv stack allocation on AMD bulldozer and barcelona target
* Hopefully, because this was found by trial and error (dark magic)
* Ref #786
2016-06-07 16:11:09 +02:00
Werner Saar f04af36ad0 Merge pull request #898 from wernsaar/develop
added experimental support for optimized lapack fortran functions
2016-05-31 14:13:52 +02:00
Werner Saar 41000c8443 added directory for optimized lapack fortran codes and added dlaqr5.f 2016-05-31 12:53:07 +02:00
Kaustubh Raste 011431b9d7 STRSM optimized for MSA
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
2016-05-31 10:17:23 +05:30
Kaustubh Raste c8a7860eb3 STRSM optimized
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
2016-05-30 21:17:00 +05:30
Zhang Xianyi 2daad2bcb5 Merge pull request #893 from biddisco/develop
Replace CMAKE_SOURCE_DIR/CMAKE_BINARY_DIR with PROJECT_SOURCE_DIR/PRO…
2016-05-30 14:52:58 +08:00
Zhang Xianyi bac478d17e Merge pull request #891 from rndfax/develop
mips64/axpy: fix error when INCY == 0
2016-05-30 14:52:40 +08:00
John Biddiscombe 053044ae4d Replace CMAKE_SOURCE_DIR/CMAKE_BINARY_DIR with PROJECT_SOURCE_DIR/PROJECT_BINARY_DIR
If OpenBLAS is built using add_subdirectory(OpenBlas) as part of another project
then the paths set by CMAKE_XXX_DIR are relative to the parent project
and not the OpenBLAS project.
2016-05-25 09:13:28 +02:00
Aleksey Kuleshov fca66262c4 mips64/axpy: fix error when INCY == 0 2016-05-23 13:30:27 +03:00
Werner Saar 412bcd187a optimized dtrsm_logic_LT_16x4_power8.S and dtrsm_macros_LT_16x4_power8.S 2016-05-23 11:20:41 +02:00
Werner Saar bd06b246cc Merge pull request #890 from wernsaar/develop
optimized dtrsm_kernel_LT for POWER8
2016-05-22 16:01:35 +02:00
Werner Saar 8b140220c8 optimized dtrsm_kernel_LT for POWER8 2016-05-22 15:20:04 +02:00
Werner Saar 318cad9c37 added trsm benchmarks for POWER8 to benchmark/Makefile 2016-05-22 13:51:47 +02:00
Werner Saar 8fb5a1aaff added optimized dtrsm_LT kernel for POWER8 2016-05-22 13:09:05 +02:00
Zhang Xianyi 7d0358475d Merge the patch for musl libc. 2016-05-22 01:08:44 +08:00
Zhang Xianyi b46f680f01 Merge pull request #887 from ksraste/develop
STRSM optimization for MIPS P5600 and I6400 using MSA
2016-05-21 07:17:21 +08:00