Martin Kroeker
3178e4fea0
Remove explicit include of complex.h
2016-09-29 23:41:43 +02:00
Martin Kroeker
95c245ddb0
Remove explicit include of complex.h
2016-09-29 23:40:36 +02:00
Martin Kroeker
4b1b27347f
Remove explicit include of complex.h
2016-09-29 23:39:35 +02:00
Zhang Xianyi
161c927071
Merge pull request #968 from buffer51/develop
Updated CROSS_SUFFIX regex to work with CC containing arguments
2016-09-22 11:34:57 -04:00
Zhang Xianyi
662f89f059
Merge pull request #969 from sva-img/develop
DGEMM function split and data prefetch
2016-09-22 11:33:51 -04:00
Shivraj Patil
54747fe24a
DGEMM function split and data prefetch
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
2016-09-22 17:25:46 +05:30
Paul MUSTIÈRE
157ee498ac
Updated CROSS_SUFFIX regex to work with CC containing arguments
2016-09-14 11:42:22 -07:00
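The subject above describes deriving CROSS_SUFFIX from CC even when CC carries extra flags. A naive "everything up to the last hyphen" rule breaks on a value like `arm-linux-gnueabihf-gcc -march=armv7-a`, because the flag itself contains hyphens. The actual fix is a regex in the build files and is not reproduced here. The C function below is only a sketch of the same idea. It assumes the prefix is everything up to and including the last `-` in the first whitespace-separated word of CC. `cross_prefix` is a hypothetical name, not an OpenBLAS function.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper, not part of OpenBLAS: derive a cross-toolchain
 * prefix such as "arm-linux-gnueabihf-" from a CC value that may carry
 * extra arguments, e.g. "arm-linux-gnueabihf-gcc -march=armv7-a".
 * Returns a pointer to a static buffer (not thread-safe). */
const char *cross_prefix(const char *cc)
{
    static char buf[256];
    size_t len = 0;
    size_t last_dash = 0; /* one past the last '-' seen in the first word */

    assert(cc != NULL);

    /* Skip leading blanks, then scan only the first word (the compiler
     * itself); later words are compiler flags and are ignored. */
    while (isspace((unsigned char)*cc))
        cc++;
    while (cc[len] != '\0' && !isspace((unsigned char)cc[len])) {
        if (cc[len] == '-')
            last_dash = len + 1;
        len++;
    }

    /* Truncate if the prefix would not fit in the buffer. */
    if (last_dash >= sizeof buf)
        last_dash = sizeof buf - 1;
    memcpy(buf, cc, last_dash);
    buf[last_dash] = '\0';
    return buf;
}
```

With this rule, `cross_prefix("arm-linux-gnueabihf-gcc -march=armv7-a")` gives `"arm-linux-gnueabihf-"` and `cross_prefix("gcc -O2")` gives an empty prefix. Scanning the whole string instead would have cut inside `-march=armv7-a`.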
Zhang Xianyi
b09cc3b9bb
Merge pull request #958 from intelfx/remove-stabs
common_arm.h, common_mips.h: get rid of .func directives
2016-09-13 16:15:37 -04:00
Ivan Shapovalov
6c0862a94f
common_arm.h, common_mips.h: get rid of .func directives
.func/.endfunc are gcc/gas-specific directives for generating stabs
debug information (and nothing more). They are nearly useless now because
DWARF is commonly used, and they are not implemented in Clang, so building
OpenBLAS with Clang fails. There is no sane way to detect GCC vs.
anything else with preprocessor definitions.
Hence, just remove these directives.
2016-09-09 03:37:11 +03:00
Zhang Xianyi
842d842751
Update develop for 0.2.20.dev.
2016-09-01 00:01:23 -04:00
Zhang Xianyi
85636ff1a0
Merge branch 'develop'
2016-08-31 23:58:42 -04:00
Zhang Xianyi
821affb9a0
Update doc for 0.2.19.
2016-08-31 23:58:29 -04:00
Zhang Xianyi
515bc56ea9
Refs #946 . Use nrm2 reference implementation for Power8.
2016-08-18 18:59:43 -07:00
Zhang Xianyi
ae70b916f4
Refs #929 . Deal with zero and NaNs for scale.
2016-08-18 10:24:42 -07:00
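The subtle case is alpha == 0 when x already holds NaN or Inf. A kernel that short-cuts alpha == 0 by storing zeros returns 0. A literal multiplication returns NaN, because 0 * NaN and 0 * Inf are both NaN under IEEE 754. The two functions below are hypothetical and only illustrate these two behaviors. This log does not record which one the #929 change settled on.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical illustration, not OpenBLAS code: two ways a SCAL kernel
 * can treat alpha == 0. Both return x for convenience. */

/* Short-cut: alpha == 0 stores zeros, so NaN/Inf in x are discarded. */
double *scal_zero_shortcut(long n, double alpha, double *x)
{
    assert(n >= 0);
    for (long i = 0; i < n; i++)
        x[i] = (alpha == 0.0) ? 0.0 : alpha * x[i];
    return x;
}

/* Literal: always multiply, so 0 * NaN and 0 * Inf both yield NaN. */
double *scal_multiply(long n, double alpha, double *x)
{
    assert(n >= 0);
    for (long i = 0; i < n; i++)
        x[i] = alpha * x[i];
    return x;
}
```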
Zhang Xianyi
9ea0144482
Merge pull request #941 from sva-img/develop
MIPS n32 ABI support, MSA support detection and rename ARCH, ARCHFLAGS
2016-08-18 09:31:31 -04:00
Zhang Xianyi
1f217a6175
Merge pull request #943 from ibmsoe/IBMMASS_Support
Added support of IBM's MASS library that optimizes performance on Pow…
2016-08-12 17:20:59 -04:00
nishidha@us.ibm.com
78348a2853
Added support of IBM's MASS library that optimizes performance on Power architectures
2016-08-11 14:43:26 +05:30
Shivraj Patil
9687437928
MIPS n32 ABI and build time mips simd support check
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
2016-08-10 17:44:22 +05:30
Shivraj Patil
d1c6469283
MIPS n32 ABI support, MSA support detection and rename ARCH, ARCHFLAGS
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
2016-08-08 11:58:01 +05:30
Zhang Xianyi
b544be914d
Merge pull request #933 from ashwinyes/develop_aarch64_20160726_Dgemm_8x4_Opts
Cortex A57: Improvements to DGEMM 8x4 kernel
2016-07-26 09:54:31 -04:00
Ashwin Sekhar T K
c54a29bb48
Cortex A57: Improvements to DGEMM 8x4 kernel
2016-07-26 10:58:21 +05:30
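An "8x4" DGEMM kernel updates an 8-row by 4-column tile of C per call. It reads packed panels of A (8 values per k step) and B (4 values per k step), so that the 32 partial sums can stay in registers. The plain-C version below is only a reference for what such a kernel computes, not the Cortex-A57 assembly from the commit. It assumes column-major C with leading dimension ldc and the packing layout described in the comments. `dgemm_kernel_8x4_ref` is a hypothetical name.

```c
#include <assert.h>

/* Hypothetical plain-C reference for an 8x4 DGEMM micro-kernel:
 *   C[0:8, 0:4] += alpha * A_panel * B_panel
 * pa: packed A, 8 doubles per k step (column k of the 8-row A block).
 * pb: packed B, 4 doubles per k step (row k of the 4-column B block).
 * c:  column-major, leading dimension ldc >= 8. Returns c. */
double *dgemm_kernel_8x4_ref(long k, double alpha, const double *pa,
                             const double *pb, double *c, long ldc)
{
    double acc[4][8] = {{0.0}}; /* acc[j][i] accumulates C(i, j) */

    assert(ldc >= 8);
    for (long p = 0; p < k; p++) {
        const double *a = pa + 8 * p;
        const double *b = pb + 4 * p;
        /* Rank-1 update of the 8x4 tile: a (8x1) times b (1x4). */
        for (int j = 0; j < 4; j++)
            for (int i = 0; i < 8; i++)
                acc[j][i] += a[i] * b[j];
    }
    /* Write back once at the end, scaled by alpha. */
    for (int j = 0; j < 4; j++)
        for (int i = 0; i < 8; i++)
            c[i + j * ldc] += alpha * acc[j][i];
    return c;
}
```

Keeping all 32 partial sums in `acc` and touching C only once at the end is the property that lets an optimized kernel hold the whole tile in vector registers. The assembly versions unroll and vectorize these loops rather than running them as written.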
Zhang Xianyi
ff4c5deafa
Merge pull request #930 from sva-img/develop
P6600/I6400 Build fix.
2016-07-22 11:42:30 -04:00
Shivraj Patil
22b9c2747d
P6600/I6400 Build fix. Reverted the changes that were made to support the MIPS n32 ABI
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
2016-07-22 18:45:06 +05:30
Zhang Xianyi
27b5211ccd
Merge pull request #927 from sva-img/develop
Added MSA optimization for GEMV_N, GEMV_T, ASUM, DOT functions
2016-07-15 11:17:30 -04:00
Shivraj Patil
beb1d076a4
Added MSA optimization for GEMV_N, GEMV_T, ASUM, DOT functions
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
2016-07-15 18:38:25 +05:30
Zhang Xianyi
9e44f3ddd0
Refs #917 Avoid detecting gfortran bug on IBM POWER + Ubuntu
2016-07-14 13:09:36 -07:00
Zhang Xianyi
eece9fd889
Merge pull request #926 from vriera/develop
Complete support for MIPS n32 ABI
2016-07-14 15:49:33 -04:00
Zhang Xianyi
5dfa0712c3
Merge pull request #925 from martin-frbg/develop
Update zgetrf2.f, cpuid_x86.c, dynamic.c
2016-07-14 15:48:58 -04:00
Zhang Xianyi
8a592ee386
Merge pull request #924 from ashwinyes/develop_aarch64_improvements_20160714
Improvements to Aarch64 kernels
2016-07-14 15:47:55 -04:00
Zhang Xianyi
7f2409a8e1
Merge pull request #918 from sva-img/develop
Added CGEMM, ZGEMM, STRMM, DTRMM, CTRMM, ZTRMM.
2016-07-14 15:45:39 -04:00
Vicente Olivert Riera
7f28cd1f88
Complete support for MIPS n32 ABI
Signed-off-by: Vicente Olivert Riera <Vincent.Riera@imgtec.com>
2016-07-14 17:51:04 +01:00
Martin Kroeker
154729908e
Update cpuid_x86.c
2016-07-14 17:29:34 +02:00
Martin Kroeker
97bd1e42c8
Update cpuid_x86.c
2016-07-14 12:25:17 +02:00
Martin Kroeker
7de829f713
Update dynamic.c
Add Braswell (extended model 4, model 12) N3150 as Nehalem
2016-07-14 12:22:55 +02:00
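On x86, CPUID leaf 1 returns the base family in EAX bits 8-11, the model in bits 4-7 and the extended model in bits 16-19. For family 6 the effective model is (extended model << 4) | model, so "extended model 4, model 12" means model 0x4C (76). The decoder below is a hypothetical sketch that takes a raw EAX value instead of executing CPUID, so it runs anywhere. The detection code in cpuid_x86.c and dynamic.c is structured differently.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: decode family/model from CPUID leaf 1 EAX.
 * The extended model (bits 16-19) is prepended to the model (bits 4-7)
 * only when the base family is 6 or 15; the extended family
 * (bits 20-27) is added only when the base family is 15. */
unsigned cpuid_family(uint32_t eax)
{
    unsigned family = (eax >> 8) & 0xF;
    if (family == 0xF)
        family += (eax >> 20) & 0xFF;
    return family;
}

unsigned cpuid_model(uint32_t eax)
{
    unsigned family = (eax >> 8) & 0xF;
    unsigned model = (eax >> 4) & 0xF;
    if (family == 0x6 || family == 0xF)
        model |= ((eax >> 16) & 0xF) << 4;
    return model;
}

/* Braswell as described in the commit: family 6, extended model 4,
 * model 12, i.e. effective model 0x4C. */
int is_braswell(uint32_t eax)
{
    return cpuid_family(eax) == 6 && cpuid_model(eax) == 0x4C;
}
```

For example, an EAX value of 0x406C3 (stepping 3) decodes to family 6, model 0x4C, and `is_braswell` returns true. A value of 0x106A5 decodes to model 0x1A and does not match.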
Martin Kroeker
9b69d8a8e5
Update zgetrf2.f
Trivial typo correction (ZERBLA => XERBLA) to fix #910
2016-07-14 11:41:57 +02:00
Ashwin Sekhar T K
0a5ff9f9f9
Improvements to TRMM and GEMM kernels
2016-07-14 13:56:04 +05:30
Ashwin Sekhar T K
8a40f1355e
Improvements to GEMV kernels
2016-07-14 13:50:38 +05:30
Ashwin Sekhar T K
78782485b6
Improvements to COPY and IAMAX kernels
2016-07-14 13:49:34 +05:30
Ashwin Sekhar T K
8d86d14d3f
Add time prints in benchmark output
2016-07-14 13:48:13 +05:30
Ashwin Sekhar T K
925d4e1dc6
Add IAMAX and NRM2 benchmarks
2016-07-14 13:46:01 +05:30
Shivraj Patil
57df7956ee
Added CGEMM, ZGEMM, STRMM, DTRMM, CTRMM, ZTRMM. Updated macros in SGEMM, DGEMM, STRMM.
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
2016-06-28 17:51:10 +05:30
Zhang Xianyi
437c7d64f2
Merge pull request #913 from dpfoose/develop
Small change to allow compiling with USE_OPENMP on MSVC
2016-06-27 10:05:30 -04:00
Zhang Xianyi
ca5c25c870
Merge pull request #907 from jeromerobert/bug786
Fix z/ctrmv stack allocation on AMD bulldozer and barcelona target
2016-06-27 10:04:54 -04:00
Zhang Xianyi
4a30a2584a
Merge pull request #897 from ksraste/develop
STRSM optimized for MSA
2016-06-27 10:04:18 -04:00
mdong
098d8ec5d6
remove input from clobbered list
2016-06-24 16:37:58 -04:00
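GCC's extended asm rules say that clobber descriptions may not overlap with an input or output operand. A register the asm both reads and modifies should therefore be declared as a read-write output ("+r"), not listed as an input and also clobbered. The function below is a minimal, hypothetical x86-64 example of the correct form, not the code changed by this commit. It falls back to plain C on other targets.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical example of declaring a modified input correctly in GCC
 * extended asm. Wrong: pass x as an input ("r") and also name its
 * register in the clobber list. Right: make x a read-write operand with
 * the "+r" constraint so the compiler knows the asm overwrites it. */
int64_t add_one(int64_t x)
{
#if defined(__GNUC__) && defined(__x86_64__)
    __asm__("addq $1, %0" : "+r"(x) : : "cc"); /* addq also updates flags */
#else
    x += 1; /* portable fallback for other compilers/targets */
#endif
    return x;
}
```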
Daniel Patrick Foose
a94f2b7848
Change to allow compiling with USE_OPENMP on MSVC
MSVC treats the declaration of omp_in_parallel and omp_get_num_procs without the modifiers __declspec(dllimport) and __cdecl as a redefinition.
2016-06-14 14:37:28 -04:00
Jerome Robert
d346c533b1
Fix z/ctrmv stack allocation on AMD bulldozer and barcelona target
* Hopefully, because this was found by trial and error (dark magic)
* Ref #786
2016-06-07 16:11:09 +02:00
Werner Saar
f04af36ad0
Merge pull request #898 from wernsaar/develop
added experimental support for optimized lapack fortran functions
2016-05-31 14:13:52 +02:00
Werner Saar
41000c8443
added directory for optimized LAPACK Fortran code and added dlaqr5.f
2016-05-31 12:53:07 +02:00
Kaustubh Raste
011431b9d7
STRSM optimized for MSA
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
2016-05-31 10:17:23 +05:30