7452 Commits

Author SHA1 Message Date
Ashwin Sekhar T K a0128aa489 ARM64: Convert all labels to local labels
While debugging/profiling applications using perf or other tools, the
kernels appear scattered in the profile reports. This is because the labels
within the kernels are not local and each label is shown as a separate
function.

To avoid this, all the labels within the kernels are changed to local
labels.
2017-10-24 11:40:05 +00:00
Martin Kroeker 627133f9ad Merge pull request #1333 from martin-frbg/haswell32
Fix 32bit HASWELL builds
2017-10-24 11:25:03 +02:00
Martin Kroeker 0e2cf102e1 Fix 32bit HASWELL 2017-10-24 10:07:44 +02:00
Martin Kroeker 5e3e91d0fc Split the microkernel workload into chunks of 32 floats for dsdot mode to limit loss of precision 2017-10-22 18:18:51 +02:00
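
The chunking idea in commit 5e3e91d0fc can be illustrated with a small sketch. This is not the OpenBLAS code: `to_f32`, `sdot_kernel`, and `dsdot_chunked` are hypothetical names, and the single-precision microkernel is emulated by rounding every intermediate value to float32. The assumption is that each chunk of 32 elements is reduced in single precision while the per-chunk results are accumulated in double precision, so rounding error cannot build up over the whole vector.

```python
import struct

CHUNK = 32  # chunk length named in the commit message


def to_f32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sdot_kernel(x, y):
    """Hypothetical stand-in for a single-precision dot-product microkernel.

    Every product and every partial sum is rounded to float32.
    """
    acc = 0.0
    for a, b in zip(x, y):
        prod = to_f32(to_f32(a) * to_f32(b))
        acc = to_f32(acc + prod)
    return acc


def dsdot_chunked(x, y, chunk=CHUNK):
    """Dot product of float32 data with a double-precision running total.

    Each chunk is reduced by the single-precision kernel; only the
    per-chunk results are summed in double precision.
    """
    total = 0.0  # Python floats are doubles
    for start in range(0, len(x), chunk):
        total += sdot_kernel(x[start:start + chunk], y[start:start + chunk])
    return total
```

Calling `sdot_kernel` on a long vector in one go lets the float32 rounding error grow with the running sum, whereas `dsdot_chunked` bounds the single-precision part to 32 additions per chunk.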
Martin Kroeker 28c3fa8950 Add dsdot 2017-10-16 23:29:03 +02:00
Martin Kroeker 8ac87c1cb6 Implement DSDOT with unchanged sdot microkernels 2017-10-16 23:27:51 +02:00
Martin Kroeker b7cee00455 Merge pull request #1327 from martin-frbg/cmake-relapack
Make ReLAPACK available in cmake builds
2017-10-12 20:26:35 +02:00
Martin Kroeker 962b20a9bb Optionally add ReLAPACK to LIB_COMPONENTS 2017-10-12 17:02:01 +02:00
Martin Kroeker fbf83f4833 Add cmake build list file for ReLAPACK 2017-10-12 17:00:00 +02:00
Martin Kroeker 78cec6209c Add ReLAPACK option 2017-10-12 16:58:37 +02:00
Martin Kroeker c460027dbe Merge pull request #1325 from grisuthedragon/patch-1
Update README.md to include POWER8
2017-10-10 12:14:34 +02:00
Martin Köhler bfa9b9f6b2 Update README.md
Add POWER 8 to the list of additional architectures.
2017-10-10 10:12:04 +02:00
Martin Kroeker c7a8512d12 Cmake fixes for DYNAMIC_ARCH builds and whitespace in path names (#1323)
* prebuild.cmake: Put quotes around path names that may contain whitespace
(Copied from alexkaratakis' PR #1295)
* kernel/CMakeLists.txt: Fix common_lapack header inclusion and DYNAMIC_ARCH generation of ?neg_tcopy and ?laswp_ncopy files
* lapack/CMakeLists.txt: Use correct template for ?laswp_(plus,minus) functions
2017-10-09 23:34:18 +02:00
Martin Kroeker db72ad8f6a Merge pull request #1320 from timmoon10/develop
2D thread distribution for multi-threaded GEMMs
2017-10-08 23:31:33 +02:00
Martin Kroeker 97ecd4996a Merge pull request #1319 from martin-frbg/issue601
Fix out-of-bounds memory accesses exposed by xccblat3 testcase
2017-10-08 23:31:06 +02:00
Martin Kroeker 1eb43cccad Merge pull request #1317 from martin-frbg/power8-asm
Save and restore VSX registers
2017-10-08 23:30:46 +02:00
Martin Kroeker 9d92f526dd Comment out a code block that performs out-of-bounds memory accesses
...and does not appear to be needed even when it stays within the bounds of the array
2017-10-06 23:51:32 +02:00
Martin Kroeker 514d237257 Merge pull request #1279 from xsacha/develop
CMake improvements
2017-10-06 21:13:45 +02:00
Tim Moon 30486a356c Reduce number of data partitions in n. 2017-10-04 12:37:49 -07:00
Martin Kroeker e1b2502840 Merge pull request #1316 from timmoon10/develop
Variable thread count for multi-threaded GEMMs
2017-10-04 20:35:00 +02:00
Tim Moon 9de52b489a Cleaning up and documenting multi-threaded GEMM code. 2017-10-03 16:32:08 -07:00
Tim Moon 860dcfc703 Use 2D thread distribution for small GEMMs.
Allows maximum use of available cores if one of M and N is small and the other is large.
2017-10-03 13:43:39 -07:00
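
As a rough illustration of what commit 860dcfc703 describes, the sketch below picks a two-dimensional thread grid for an M x N output matrix. The function `thread_grid` and its selection rule (use as many threads as possible, then prefer blocks that are closest to square) are assumptions made for this example, not the heuristic used in the OpenBLAS source.

```python
def thread_grid(m, n, nthreads):
    """Choose (threads_m, threads_n) for splitting an m x n result matrix.

    Hypothetical heuristic: never assign more threads to a dimension than
    it has rows or columns, use as many threads as possible, and break
    ties in favour of blocks that are closest to square.
    Assumes m, n and nthreads are all at least 1.
    """
    best = (1, 1)
    best_key = None
    for tm in range(1, nthreads + 1):
        if tm > m:
            break  # cannot split m into more pieces than it has rows
        # Largest column split that keeps tm * tn within the thread budget.
        tn = min(nthreads // tm, n)
        used = tm * tn
        squareness = abs(m / tm - n / tn)
        key = (-used, squareness)
        if best_key is None or key < best_key:
            best, best_key = (tm, tn), key
    return best
```

With 16 threads, a 100000 x 4 problem yields a grid of `(16, 1)`, whereas a split along N alone could keep at most 4 threads busy; a 1000 x 1000 problem yields `(4, 4)`.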
Martin Kroeker f96afd94b0 Fix out-of-bounds accesses where the data should be zero anyway 2017-10-01 01:06:39 +02:00
Martin Kroeker ebe84215e4 Merge pull request #1318 from pv/potrf-smoketest
Add trivial smoketest for xpotrf
2017-09-30 21:31:28 +02:00
Pauli Virtanen 845e6d750f Add trivial smoketest for xpotrf 2017-09-30 19:07:54 +02:00
Tim Moon a89d6711c6 Increasing flexibility of GEMM benchmark.
m, n, and k can be set to arbitrary constants. A and B matrices can be transposed independently.
2017-09-28 12:56:29 -07:00
Martin Kroeker 9c017a2218 Save and restore VSX registers 2017-09-28 12:17:09 +02:00
Tim Moon 0e6b11b708 Merge https://github.com/timmoon10/OpenBLAS into develop 2017-09-27 19:26:38 -07:00
Tim Moon 6aaa107865 Reducing threads for multi-threaded GEMMs on small matrices. 2017-09-27 19:25:33 -07:00
Martin Kroeker 00c42dc815 Merge pull request #1314 from martin-frbg/nofortran-fix-2
Rewrite NOFORTRAN conditionals
2017-09-26 10:34:18 +02:00
Martin Kroeker 79e754e548 Rewrite NOFORTRAN conditionals
... so that they do not trigger accidentally when NOFORTRAN is empty/unset
2017-09-25 23:45:14 +02:00
Martin Kroeker 2ccd7f6e0c Merge pull request #1310 from sva-img/develop
Added mips I6500 core
2017-09-22 09:34:54 +02:00
Shivraj Patil e3d844b062 Added mips I6500 core
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
2017-09-22 11:57:43 +05:30
Martin Kroeker def146efed Merge pull request #1308 from sebastien-villemot/develop
Add support for TARGET=ZARCH_GENERIC and TARGET=Z13
2017-09-19 14:04:37 +02:00
Sébastien Villemot 7543e578a4 Add support for TARGET=ZARCH_GENERIC and TARGET=Z13 2017-09-19 12:16:42 +02:00
Martin Kroeker 601c71fe54 Merge pull request #1304 from martin-frbg/aix-build-fixes
(Plain make) build system fixes for AIX
2017-09-18 10:16:40 +02:00
Martin Kroeker 3810a6fd99 (Plain make) build system fixes for AIX
- retry fortran compiler test with aix-specific option if generic -m32/-m64 fails
- pass any custom ARFLAGS to lapack
- no addition of -m32/-m64 to the CFLAGS and FFLAGS on AIX
2017-09-18 01:29:21 +02:00
Martin Kroeker 742f54c235 Merge pull request #1303 from martin-frbg/imatcopy-rowscols
Fix cols/rows mixup in omatcopy 2nd step for BlasTrans cases
2017-09-14 21:46:26 +02:00
Martin Kroeker d674fbb4c7 Fix cols/rows mixup in omatcopy 2nd step for BlasTrans cases
Equivalent of #1244 (issue #899) for the non-complex cases. Fixes #1289
2017-09-14 19:59:05 +02:00
Martin Kroeker 2922c15f36 Merge pull request #1302 from martin-frbg/nofortran-fix
Remove default FEXTRALIBS in NOFORTRAN case
2017-09-14 11:54:20 +02:00
Martin Kroeker 3a245a376f Remove default FEXTRALIBS in NOFORTRAN case 2017-09-14 09:21:04 +02:00
Martin Kroeker 46c9357c72 Merge pull request #1288 from quickwritereader/develop
Optimized standard Blas Level-1,2 (excluding nrm2 functions) for z13 (double precision). Issue 884
2017-09-09 23:47:17 +02:00
Martin Kroeker 1c3e2d3dd5 Merge pull request #1293 from embray/cygwin/install
More canonical installation on Cygwin
2017-09-09 23:46:27 +02:00
Martin Kroeker f66d908282 Merge pull request #1299 from martin-frbg/race_fixes
Fix thread data races uncovered by gcc thread sanitizer
2017-09-09 23:41:53 +02:00
Martin Kroeker ba1f91f17b Convert another caller of "allocation" to LOCK_COMMAND
... as the "allocation" code jumped to now does UNLOCK_COMMAND instead of blas_unlock
2017-09-09 20:30:33 +02:00
Martin Kroeker f460776f0f Fix thread data races 2017-09-09 19:07:06 +02:00
Martin Kroeker e882f3d6f3 Fix thread data race in memory.c 2017-09-09 18:58:38 +02:00
Erik M. Bray dddedbab5d More canonical installation on Cygwin:
* The DLL is named cygopenblas.dll, not libopenblas.dll
* The import lib (still called libopenblas.dll.a) is installed
2017-09-07 14:18:56 +02:00
Abdurrauf 1cfdb2295d Optimized standard Blas Level-1,2 (excluding nrm2 functions) for z13 (double precision) 2017-09-06 16:41:08 +04:00
Martin Kroeker 00740c0e34 Merge pull request #1290 from martin-frbg/imatcopy
Use in-place transform shortcut only if matrix is square
2017-09-03 13:02:10 +02:00