Ashwin Sekhar T K
a0128aa489
ARM64: Convert all labels to local labels
...
While debugging/profiling applications using perf or other tools, the
kernels appear scattered in the profile reports. This is because the labels
within the kernels are not local and each label is shown as a separate
function.
To avoid this, all the labels within the kernels are changed to local
labels.
2017-10-24 11:40:05 +00:00
Martin Kroeker
627133f9ad
Merge pull request #1333 from martin-frbg/haswell32
...
Fix 32bit HASWELL builds
2017-10-24 11:25:03 +02:00
Martin Kroeker
0e2cf102e1
Fix 32bit HASWELL
2017-10-24 10:07:44 +02:00
Martin Kroeker
5e3e91d0fc
Split the microkernel workload into chunks of 32 floats for dsdot mode to limit loss of precision
2017-10-22 18:18:51 +02:00
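The chunking idea in the commit above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the function names are hypothetical, unit-stride contiguous inputs are assumed, and a plain loop stands in for the sdot microkernel. Each chunk of at most 32 floats is accumulated in single precision, and the per-chunk results are summed in double precision, so rounding error from the float accumulator cannot build up over the whole vector.

```c
#include <stddef.h>

#define CHUNK 32  /* chunk length taken from the commit message */

/* Stand-in for a single-precision sdot microkernel (hypothetical name).
   It accumulates in float, as an unchanged sdot kernel would. */
static float sdot_kernel_sketch(size_t n, const float *x, const float *y)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += x[i] * y[i];
    return acc;
}

/* dsdot-style wrapper: float inputs, double result.
   Assumes unit stride; real BLAS routines also take incx/incy. */
double dsdot_chunked_sketch(size_t n, const float *x, const float *y)
{
    double total = 0.0;
    for (size_t start = 0; start < n; start += CHUNK) {
        size_t remaining = n - start;
        size_t len = remaining < CHUNK ? remaining : CHUNK;
        /* Widen each short float partial sum before adding it. */
        total += (double)sdot_kernel_sketch(len, x + start, y + start);
    }
    return total;
}
```

The chunk size trades precision against the per-call overhead of the kernel; 32 is the value named in the commit message, and no other value is implied here.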
Martin Kroeker
28c3fa8950
Add dsdot
2017-10-16 23:29:03 +02:00
Martin Kroeker
8ac87c1cb6
Implement DSDOT with unchanged sdot microkernels
2017-10-16 23:27:51 +02:00
Martin Kroeker
b7cee00455
Merge pull request #1327 from martin-frbg/cmake-relapack
...
Make ReLAPACK available in cmake builds
2017-10-12 20:26:35 +02:00
Martin Kroeker
962b20a9bb
Optionally add ReLAPACK to LIB_COMPONENTS
2017-10-12 17:02:01 +02:00
Martin Kroeker
fbf83f4833
Add cmake build list file for ReLAPACK
2017-10-12 17:00:00 +02:00
Martin Kroeker
78cec6209c
Add ReLAPACK option
2017-10-12 16:58:37 +02:00
Martin Kroeker
c460027dbe
Merge pull request #1325 from grisuthedragon/patch-1
...
Update README.md to include POWER8
2017-10-10 12:14:34 +02:00
Martin Köhler
bfa9b9f6b2
Update README.md
...
Add POWER 8 to the list of additional architectures.
2017-10-10 10:12:04 +02:00
Martin Kroeker
c7a8512d12
Cmake fixes for DYNAMIC_ARCH builds and whitespace in path names (#1323)
...
* prebuild.cmake: Put quotes around path names that may contain whitespace
(Copied from alexkaratakis' PR #1295)
* kernel/CMakeLists.txt: Fix common_lapack header inclusion and DYNAMIC_ARCH generation of ?neg_tcopy and ?laswp_ncopy files
* lapack/CMakeLists.txt: Use correct template for ?laswp_(plus,minus) functions
2017-10-09 23:34:18 +02:00
Martin Kroeker
db72ad8f6a
Merge pull request #1320 from timmoon10/develop
...
2D thread distribution for multi-threaded GEMMs
2017-10-08 23:31:33 +02:00
Martin Kroeker
97ecd4996a
Merge pull request #1319 from martin-frbg/issue601
...
Fix out-of-bounds memory accesses exposed by xccblat3 testcase
2017-10-08 23:31:06 +02:00
Martin Kroeker
1eb43cccad
Merge pull request #1317 from martin-frbg/power8-asm
...
Save and restore VSX registers
2017-10-08 23:30:46 +02:00
Martin Kroeker
9d92f526dd
Comment out a code block that performs out-of-bounds memory accesses
...
...and does not appear to be needed even when it stays within the bounds of the array
2017-10-06 23:51:32 +02:00
Martin Kroeker
514d237257
Merge pull request #1279 from xsacha/develop
...
CMake improvements
2017-10-06 21:13:45 +02:00
Tim Moon
30486a356c
Reduce number of data partitions in n.
2017-10-04 12:37:49 -07:00
Martin Kroeker
e1b2502840
Merge pull request #1316 from timmoon10/develop
...
Variable thread count for multi-threaded GEMMs
2017-10-04 20:35:00 +02:00
Tim Moon
9de52b489a
Cleaning up and documenting multi-threaded GEMM code.
2017-10-03 16:32:08 -07:00
Tim Moon
860dcfc703
Use 2D thread distribution for small GEMMs.
...
Allows maximum use of available cores if one of M and N is small and the other is large.
2017-10-03 13:43:39 -07:00
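One way to picture a 2D thread distribution is a small routine that splits the available threads into a grid over M and N. The sketch below is an assumption made for illustration only: the type, the function name, and the "prefer near-square tiles" heuristic are hypothetical and are not taken from the OpenBLAS sources. It tries every factorisation of the thread count and keeps the one whose per-thread tile is closest to square, so a tall or wide problem still gets work for every core.

```c
/* Hypothetical result type: threads along M times threads along N. */
typedef struct {
    int threads_m;
    int threads_n;
} thread_grid;

/* Pick a threads_m x threads_n grid with threads_m * threads_n == nthreads.
   Assumes m >= 1, n >= 1 and nthreads >= 1. Illustrative heuristic only. */
thread_grid choose_thread_grid_sketch(long m, long n, int nthreads)
{
    thread_grid best = {1, nthreads};
    double best_cost = -1.0;

    for (int tm = 1; tm <= nthreads; tm++) {
        if (nthreads % tm != 0)
            continue;                      /* only exact factorisations */
        int tn = nthreads / tm;

        /* Each thread would own a tile of about (m/tm) x (n/tn). */
        double tile_m = (double)m / tm;
        double tile_n = (double)n / tn;

        /* Cost is the tile aspect ratio (>= 1); 1 means a square tile. */
        double cost = tile_m > tile_n ? tile_m / tile_n : tile_n / tile_m;

        if (best_cost < 0.0 || cost < best_cost) {
            best_cost = cost;
            best.threads_m = tm;
            best.threads_n = tn;
        }
    }
    return best;
}
```

With this heuristic a tall problem such as m = 8000, n = 8 on 8 threads puts all 8 threads along M, while a square problem on 4 threads gets a 2 x 2 grid.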
Martin Kroeker
f96afd94b0
Fix out-of-bounds accesses where the data should be zero anyway
2017-10-01 01:06:39 +02:00
Martin Kroeker
ebe84215e4
Merge pull request #1318 from pv/potrf-smoketest
...
Add trivial smoketest for xpotrf
2017-09-30 21:31:28 +02:00
Pauli Virtanen
845e6d750f
Add trivial smoketest for xpotrf
2017-09-30 19:07:54 +02:00
Tim Moon
a89d6711c6
Increasing flexibility of GEMM benchmark.
...
m, n, and k can be set to arbitrary constants. A and B matrices can be transposed independently.
2017-09-28 12:56:29 -07:00
Martin Kroeker
9c017a2218
Save and restore VSX registers
2017-09-28 12:17:09 +02:00
Tim Moon
0e6b11b708
Merge https://github.com/timmoon10/OpenBLAS into develop
2017-09-27 19:26:38 -07:00
Tim Moon
6aaa107865
Reducing threads for multi-threaded GEMMs on small matrices.
2017-09-27 19:25:33 -07:00
Martin Kroeker
00c42dc815
Merge pull request #1314 from martin-frbg/nofortran-fix-2
...
Rewrite NOFORTRAN conditionals
2017-09-26 10:34:18 +02:00
Martin Kroeker
79e754e548
Rewrite NOFORTRAN conditionals
...
... so that they do not trigger accidentally when NOFORTRAN is empty/unset
2017-09-25 23:45:14 +02:00
Martin Kroeker
2ccd7f6e0c
Merge pull request #1310 from sva-img/develop
...
Added mips I6500 core
2017-09-22 09:34:54 +02:00
Shivraj Patil
e3d844b062
Added mips I6500 core
...
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
2017-09-22 11:57:43 +05:30
Martin Kroeker
def146efed
Merge pull request #1308 from sebastien-villemot/develop
...
Add support for TARGET=ZARCH_GENERIC and TARGET=Z13
2017-09-19 14:04:37 +02:00
Sébastien Villemot
7543e578a4
Add support for TARGET=ZARCH_GENERIC and TARGET=Z13
2017-09-19 12:16:42 +02:00
Martin Kroeker
601c71fe54
Merge pull request #1304 from martin-frbg/aix-build-fixes
...
(Plain make) build system fixes for AIX
2017-09-18 10:16:40 +02:00
Martin Kroeker
3810a6fd99
(Plain make) build system fixes for AIX
...
- retry fortran compiler test with aix-specific option if generic -m32/-m64 fails
- pass any custom ARFLAGS to lapack
- no addition of -m32/-m64 to the CFLAGS and FFLAGS on AIX
2017-09-18 01:29:21 +02:00
Martin Kroeker
742f54c235
Merge pull request #1303 from martin-frbg/imatcopy-rowscols
...
Fix cols/rows mixup in omatcopy 2nd step for BlasTrans cases
2017-09-14 21:46:26 +02:00
Martin Kroeker
d674fbb4c7
Fix cols/rows mixup in omatcopy 2nd step for BlasTrans cases
...
Equivalent of #1244 (issue #899) for the non-complex cases. Fixes #1289
2017-09-14 19:59:05 +02:00
Martin Kroeker
2922c15f36
Merge pull request #1302 from martin-frbg/nofortran-fix
...
Remove default FEXTRALIBS in NOFORTRAN case
2017-09-14 11:54:20 +02:00
Martin Kroeker
3a245a376f
Remove default FEXTRALIBS in NOFORTRAN case
2017-09-14 09:21:04 +02:00
Martin Kroeker
46c9357c72
Merge pull request #1288 from quickwritereader/develop
...
Optimized standard Blas Level-1,2 (excluding nrm2 functions) for z13 (double precision). Issue 884
2017-09-09 23:47:17 +02:00
Martin Kroeker
1c3e2d3dd5
Merge pull request #1293 from embray/cygwin/install
...
More canonical installation on Cygwin
2017-09-09 23:46:27 +02:00
Martin Kroeker
f66d908282
Merge pull request #1299 from martin-frbg/race_fixes
...
Fix thread data races uncovered by gcc thread sanitizer
2017-09-09 23:41:53 +02:00
Martin Kroeker
ba1f91f17b
Convert another caller of "allocation" to LOCK_COMMAND
...
... as the "allocation" code jumped to now does UNLOCK_COMMAND instead of blas_unlock
2017-09-09 20:30:33 +02:00
Martin Kroeker
f460776f0f
Fix thread data races
2017-09-09 19:07:06 +02:00
Martin Kroeker
e882f3d6f3
Fix thread data race in memory.c
2017-09-09 18:58:38 +02:00
Erik M. Bray
dddedbab5d
More canonical installation on Cygwin:
...
* The DLL is named cygopenblas.dll, not libopenblas.dll
* The import lib (still called libopenblas.dll.a) is installed
2017-09-07 14:18:56 +02:00
Abdurrauf
1cfdb2295d
Optimized standard Blas Level-1,2 (excluding nrm2 functions) for z13 (double precision)
2017-09-06 16:41:08 +04:00
Martin Kroeker
00740c0e34
Merge pull request #1290 from martin-frbg/imatcopy
...
Use in-place transform shortcut only if matrix is square
2017-09-03 13:02:10 +02:00
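The reason the in-place shortcut is only valid for square matrices can be shown with a short sketch. The function below is hypothetical and assumes a contiguous row-major layout with no leading-dimension padding; it is not the imatcopy code itself. Swapping elements across the diagonal works when rows equals cols, because element (i, j) and element (j, i) both exist at fixed offsets. For a rectangular matrix the transposed layout has a different row length, so the sketch falls back to a temporary buffer.

```c
#include <stdlib.h>
#include <string.h>

/* Transpose a rows x cols row-major matrix stored contiguously in a.
   Afterwards a holds the cols x rows transpose.
   Returns 0 on success, -1 if the temporary buffer cannot be allocated. */
int transpose_inplace_sketch(double *a, size_t rows, size_t cols)
{
    if (rows == cols) {
        /* Square case: swap across the diagonal, no extra memory needed. */
        for (size_t i = 0; i < rows; i++) {
            for (size_t j = i + 1; j < cols; j++) {
                double tmp = a[i * cols + j];
                a[i * cols + j] = a[j * cols + i];
                a[j * cols + i] = tmp;
            }
        }
        return 0;
    }

    /* Rectangular case: the shortcut would scramble the data,
       so transpose out of place and copy the result back. */
    size_t count = rows * cols;
    if (count == 0)
        return 0;
    double *tmp = malloc(count * sizeof *tmp);
    if (tmp == NULL)
        return -1;
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            tmp[j * rows + i] = a[i * cols + j];
    memcpy(a, tmp, count * sizeof *tmp);
    free(tmp);
    return 0;
}
```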