Commit Graph

3390 Commits

Author SHA1 Message Date
Howard Su ff1da01476 Use NPROCESSORS_CONF instead of NPROCESSORS_ONLN
to determine the number of CPUs. On ARM platforms, the number of
online CPUs increases as the workload grows,
while the configured CPU count is the maximum number of CPUs.
2016-10-13 12:37:50 +00:00
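The distinction drawn in the commit message above can be illustrated with a minimal sketch. It assumes a Linux-like system whose `sysconf` accepts `_SC_NPROCESSORS_CONF` and `_SC_NPROCESSORS_ONLN` (common extensions, not guaranteed everywhere). The helper name `configured_cpu_count` is hypothetical and is not the code from this commit.

```c
#include <unistd.h>

/* Hypothetical helper (not the OpenBLAS implementation).
 * Prefers the configured CPU count, which stays fixed while cores
 * are brought online or taken offline, and falls back to the online
 * count only if the configured count is unavailable. */
long configured_cpu_count(void) {
    long n = -1;
#ifdef _SC_NPROCESSORS_CONF
    n = sysconf(_SC_NPROCESSORS_CONF);   /* CPUs configured in the system */
#endif
#ifdef _SC_NPROCESSORS_ONLN
    if (n < 1)
        n = sysconf(_SC_NPROCESSORS_ONLN); /* CPUs currently online */
#endif
    return n < 1 ? 1 : n;                /* never report fewer than one CPU */
}
```

Sizing a thread pool from the configured count avoids under-provisioning when the library is initialized while some cores happen to be powered down.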
Zhang Xianyi ef52a9266b Fixed #979. Patch for NetBSD. 2016-10-13 10:17:07 +08:00
Zhang Xianyi 4f38ae3199 Merge pull request #970 from martin-frbg/develop
Remove implicit inclusions of complex.h in various zdot implementations
2016-10-13 10:13:56 +08:00
Zhang Xianyi 4baf0c7cfc Merge pull request #980 from kiwifb/utest_ldflags
make utest/Makefile respect LDFLAGS
2016-10-13 10:13:12 +08:00
Zhang Xianyi 595a0224e4 Merge pull request #973 from vladimir-ch/fix-lapacke-xlarfb
LAPACKE: fix wrong direction check in LAPACKE_?larfb_work
2016-10-13 10:12:35 +08:00
François Bissey f124ffab47 make utest/Makefile respect LDFLAGS 2016-10-13 09:32:25 +13:00
Martin Kroeker 91610f3835 Update zdot_msa.c 2016-10-05 18:59:09 +02:00
Martin Kroeker 6e22ecf102 Update zdot.c 2016-10-05 18:58:03 +02:00
Martin Kroeker 6221d6df5f Update zdot.c 2016-10-05 18:57:14 +02:00
Vladimir Chalupecky 117d3371d4 LAPACKE: fix wrong direction check in LAPACKE_?larfb_work
Closes #971
2016-10-01 05:31:30 +09:00
Martin Kroeker 16446d1d23 Remove explicit include of complex.h 2016-09-29 23:45:56 +02:00
Martin Kroeker a6e9e0b94b Remove explicit include of complex.h 2016-09-29 23:43:28 +02:00
Martin Kroeker 3178e4fea0 Remove explicit include of complex.h 2016-09-29 23:41:43 +02:00
Martin Kroeker 95c245ddb0 Remove explicit include of complex.h 2016-09-29 23:40:36 +02:00
Martin Kroeker 4b1b27347f Remove explicit include of complex.h 2016-09-29 23:39:35 +02:00
Zhang Xianyi 161c927071 Merge pull request #968 from buffer51/develop
Updated CROSS_SUFFIX regex to work with CC containing arguments
2016-09-22 11:34:57 -04:00
Zhang Xianyi 662f89f059 Merge pull request #969 from sva-img/develop
DGEMM function split and data prefetch
2016-09-22 11:33:51 -04:00
Shivraj Patil 54747fe24a DGEMM function split and data prefetch
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
2016-09-22 17:25:46 +05:30
Paul MUSTIÈRE 157ee498ac Updated CROSS_SUFFIX regex to work with CC containing arguments 2016-09-14 11:42:22 -07:00
Zhang Xianyi b09cc3b9bb Merge pull request #958 from intelfx/remove-stabs
common_arm.h, common_mips.h: get rid of .func directives
2016-09-13 16:15:37 -04:00
Ivan Shapovalov 6c0862a94f common_arm.h, common_mips.h: get rid of .func directives
.func/.endfunc are gcc/gas-specific directives for generating stabs
debug information (and nothing more). They are near-useless now because
DWARF is commonly used, and Clang does not implement them. Hence building
OpenBLAS with Clang fails, and there is no sane way to distinguish GCC from
anything else with preprocessor definitions.

Hence, just remove these directives.
2016-09-09 03:37:11 +03:00
Zhang Xianyi 842d842751 Update develop for 0.2.20.dev. 2016-09-01 00:01:23 -04:00
Zhang Xianyi 85636ff1a0 Merge branch 'develop' 2016-08-31 23:58:42 -04:00
Zhang Xianyi 821affb9a0 Update doc for 0.2.19. 2016-08-31 23:58:29 -04:00
Zhang Xianyi 515bc56ea9 Refs #946. Use nrm2 reference implementation for Power8. 2016-08-18 18:59:43 -07:00
Zhang Xianyi ae70b916f4 Refs #929. Deal with zero and NaNs for scale. 2016-08-18 10:24:42 -07:00
Zhang Xianyi 9ea0144482 Merge pull request #941 from sva-img/develop
MIPS n32 ABI support, MSA support detection and rename ARCH, ARCHFLAGS
2016-08-18 09:31:31 -04:00
Zhang Xianyi 1f217a6175 Merge pull request #943 from ibmsoe/IBMMASS_Support
Added support of IBM's MASS library that optimizes performance on Pow…
2016-08-12 17:20:59 -04:00
nishidha@us.ibm.com 78348a2853 Added support of IBM's MASS library that optimizes performance on Power architectures 2016-08-11 14:43:26 +05:30
Shivraj Patil 9687437928 MIPS n32 ABI and build time mips simd support check
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
2016-08-10 17:44:22 +05:30
Shivraj Patil d1c6469283 MIPS n32 ABI support, MSA support detection and rename ARCH, ARCHFLAGS
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
2016-08-08 11:58:01 +05:30
Zhang Xianyi b544be914d Merge pull request #933 from ashwinyes/develop_aarch64_20160726_Dgemm_8x4_Opts
Cortex A57: Improvements to DGEMM 8x4 kernel
2016-07-26 09:54:31 -04:00
Ashwin Sekhar T K c54a29bb48 Cortex A57: Improvements to DGEMM 8x4 kernel 2016-07-26 10:58:21 +05:30
Zhang Xianyi ff4c5deafa Merge pull request #930 from sva-img/develop
P6600/I6400 Build fix.
2016-07-22 11:42:30 -04:00
Shivraj Patil 22b9c2747d P6600/I6400 Build fix. Reverted the changes that were made to support the MIPS n32 ABI
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
2016-07-22 18:45:06 +05:30
Zhang Xianyi 27b5211ccd Merge pull request #927 from sva-img/develop
Added MSA optimization for GEMV_N, GEMV_T, ASUM, DOT functions
2016-07-15 11:17:30 -04:00
Shivraj Patil beb1d076a4 Added MSA optimization for GEMV_N, GEMV_T, ASUM, DOT functions
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
2016-07-15 18:38:25 +05:30
Zhang Xianyi 9e44f3ddd0 Refs #917 Avoid detecting gfortran bug on IBM POWER + Ubuntu 2016-07-14 13:09:36 -07:00
Zhang Xianyi eece9fd889 Merge pull request #926 from vriera/develop
Complete support for MIPS n32 ABI
2016-07-14 15:49:33 -04:00
Zhang Xianyi 5dfa0712c3 Merge pull request #925 from martin-frbg/develop
Update zgetrf2.f, cpuid_x86.c, dynamic.c
2016-07-14 15:48:58 -04:00
Zhang Xianyi 8a592ee386 Merge pull request #924 from ashwinyes/develop_aarch64_improvements_20160714
Improvements to Aarch64 kernels
2016-07-14 15:47:55 -04:00
Zhang Xianyi 7f2409a8e1 Merge pull request #918 from sva-img/develop
Added CGEMM, ZGEMM, STRMM, DTRMM, CTRMM, ZTRMM.
2016-07-14 15:45:39 -04:00
Vicente Olivert Riera 7f28cd1f88 Complete support for MIPS n32 ABI
Signed-off-by: Vicente Olivert Riera <Vincent.Riera@imgtec.com>
2016-07-14 17:51:04 +01:00
Martin Kroeker 154729908e Update cpuid_x86.c 2016-07-14 17:29:34 +02:00
Martin Kroeker 97bd1e42c8 Update cpuid_x86.c 2016-07-14 12:25:17 +02:00
Martin Kroeker 7de829f713 Update dynamic.c
Add Braswell (extended model 4, model 12) N3150 as Nehalem
2016-07-14 12:22:55 +02:00
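The "extended model 4, model 12" pair in the message above follows the usual x86 convention for CPUID leaf 1, where the two fields combine into a single display model. The sketch below shows that decoding under the standard EAX field layout. The function names are hypothetical and are not taken from dynamic.c or cpuid_x86.c.

```c
#include <stdint.h>

/* CPUID leaf 1, EAX field layout:
 *   bits  3:0  stepping      bits 19:16 extended model
 *   bits  7:4  model         bits 27:20 extended family
 *   bits 11:8  family
 * Hypothetical helpers for illustration only. */
unsigned cpuid_family(uint32_t eax) {
    unsigned family = (eax >> 8) & 0xF;
    unsigned ext_family = (eax >> 20) & 0xFF;
    /* The extended family is added only when the base family is 0xF. */
    return family == 0xF ? family + ext_family : family;
}

unsigned cpuid_display_model(uint32_t eax) {
    unsigned family = (eax >> 8) & 0xF;
    unsigned model = (eax >> 4) & 0xF;
    unsigned ext_model = (eax >> 16) & 0xF;
    /* The extended model is prepended only for families 6 and 0xF. */
    if (family == 0x6 || family == 0xF)
        return (ext_model << 4) | model;
    return model;
}
```

For example, an EAX value of 0x000406C0 (family 6, extended model 4, model 12) decodes to display model 0x4C.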
Martin Kroeker 9b69d8a8e5 Update zgetrf2.f
Trivial typo correction (ZERBLA => XERBLA) to fix #910
2016-07-14 11:41:57 +02:00
Ashwin Sekhar T K 0a5ff9f9f9 Improvements to TRMM and GEMM kernels 2016-07-14 13:56:04 +05:30
Ashwin Sekhar T K 8a40f1355e Improvements to GEMV kernels 2016-07-14 13:50:38 +05:30
Ashwin Sekhar T K 78782485b6 Improvements to COPY and IAMAX kernels 2016-07-14 13:49:34 +05:30