Commit Graph

3078 Commits

Author SHA1 Message Date
Martin Kroeker 5f402b7759
Remove unused (loop?) variable j from the gemv_n_4 implementations 2017-11-14 23:29:42 +01:00
Martin Kroeker 65bf0a343c
Remove unused variable btpr 2017-11-14 23:25:50 +01:00
Martin Kroeker acf3d34bc5
Silence an unused variable warning with a cast
l2 cache size is not universally needed to assign default unrolling limits, but neither putting its declaration inside an ifdef nor cloning it into all ifdef sections that need it really makes sense here.
2017-11-14 23:23:44 +01:00
Martin Kroeker 8e75f7dcb4
Merge pull request #1353 from xoviat/patch-1
[appveyor] use flang from conda-forge
2017-11-10 22:16:31 +01:00
Martin Kroeker bd3546704c
Merge pull request #1356 from martin-frbg/lapack-issue196
Break out of potentially infinite rescaling loop after 1000 iterations
2017-11-10 22:15:27 +01:00
Martin Kroeker 2df1e3372d
Break out of potentially infinite rescaling loop after 1000 iterations
Inf values in the input vector will survive rescaling, causing an infinite loop. The value of 1000 is arbitrarily chosen as a large but finite value with the intention to never interfere with regular calculations.
2017-11-10 20:02:21 +01:00
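The guard described in this commit can be illustrated with a minimal sketch. It is not the code from the commit: the function name, the `limit` and `factor` parameters, and the halving strategy are assumptions chosen for illustration. Only the cap of 1000 iterations and the observation that Inf survives rescaling come from the message above.

```python
import math

MAX_RESCALE_ITERATIONS = 1000  # cap taken from the commit message


def rescale_until_bounded(values, limit=1.0, factor=0.5):
    """Scale a non-empty list until max |v| <= limit, or stop at the cap.

    Hypothetical helper: `limit` and `factor` are illustrative assumptions.
    Returns the scaled values and the number of rescaling passes performed.
    """
    scaled = list(values)
    iterations = 0
    while max(abs(v) for v in scaled) > limit:
        if iterations >= MAX_RESCALE_ITERATIONS:
            # inf * factor is still inf, so without this check the loop
            # would never terminate for inputs containing Inf.
            break
        scaled = [v * factor for v in scaled]
        iterations += 1
    return scaled, iterations
```

With finite input the loop ends after a few passes, well before the cap. With an Inf entry it stops after exactly 1000 passes and returns the still-infinite value to the caller.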
Martin Kroeker 4271b2b158
Merge pull request #1354 from martin-frbg/shmem
Try to handle shmget or shmat failing
2017-11-10 09:11:03 +01:00
Martin Kroeker 148493df89
Merge branch 'develop' into shmem 2017-11-09 23:25:15 +01:00
Martin Kroeker 415555a9c1
Merge branch 'develop' into shmem 2017-11-09 23:20:54 +01:00
Martin Kroeker 2a6fef9a55
Try to handle shmget or shmat failing
also replaces one verbatim sched_yield with the YIELDING macro for consistency as suggested in #1351
2017-11-09 23:16:13 +01:00
xoviat 307305aeb5
[appveyor] use flang from conda-forge
This flang will be updated in the future. We leave cmake because it's
not yet released with fortran support
2017-11-09 15:10:02 -06:00
Martin Kroeker cc26cdce0c
Merge pull request #1352 from martin-frbg/issue1351
Output an error message when shmat() fails
2017-11-09 21:08:16 +01:00
Martin Kroeker d8576826c4
Output an error message when shmat() fails
Observed in #1351 with SELinux as the likely culprit. Without the message, the user saw a segfault with no apparent reason
2017-11-09 17:31:44 +01:00
Martin Kroeker c6968edec4
Merge pull request #1350 from insertinterestingnamehere/flang
WIP: Support for Flang on Windows
2017-11-08 11:40:13 +01:00
Isuru Fernando 9268314290 Fix gensymbol script 2017-11-06 21:12:38 -06:00
Ian Henriksen 3ace0fda3f
Merge pull request #1 from xoviat/patch-1
[appveyor] fixes
2017-11-06 15:17:24 -06:00
xoviat 3cfc64404a
[appveyor] fixes 2017-11-06 15:05:20 -06:00
Ian Henriksen 72956e8950 Build MATGEN LAPACK routines by default when building with CMake. 2017-11-06 14:47:27 -06:00
Ian Henriksen 505dc08635 Update lapacke.cmake with routines added in LAPACK 3.7.0. 2017-11-06 14:43:33 -06:00
Ian Henriksen 61587b0670 Update lapack.cmake with additional routines from LAPACK version 3.7.0. 2017-11-06 14:41:02 -06:00
Ian Henriksen 632fc75d77 Allow using compilers other than gfortran in conjunction with MSVC or clang-cl. 2017-11-06 14:39:12 -06:00
Martin Kroeker 2c222f1faa
Modify complex CBLAS functions to take void pointers
Modify complex CBLAS functions to take void pointers instead of float or double arguments (to bring the prototypes in line with netlib and other implementations' cblas.h)
2017-11-05 15:53:14 +01:00
Martin Kroeker 66ac898f64
Change prototypes of all complex functions to use void*
Change prototypes of complex functions to use void pointers like the other implementations of CBLAS
2017-11-05 15:42:33 +01:00
Martin Kroeker ab87ee6b48 Merge pull request #1329 from martin-frbg/dsdot
(Trivial) optimized dsdot implementation for HASWELL
2017-10-25 19:13:38 +02:00
Martin Kroeker a07807caac Eliminate loop code when called as/from dsdot 2017-10-25 16:45:41 +02:00
Martin Kroeker b71f4fe681 Merge pull request #1334 from ashwinyes/develop_aarch64_20171024_addlocallabels
ARM64: Convert all labels to local labels
2017-10-24 19:50:03 +02:00
Ashwin Sekhar T K a0128aa489 ARM64: Convert all labels to local labels
While debugging/profiling applications using perf or other tools, the
kernels appear scattered in the profile reports. This is because the labels
within the kernels are not local and each label is shown as a separate
function.

To avoid this, all the labels within the kernels are changed to local
labels.
2017-10-24 11:40:05 +00:00
Martin Kroeker 627133f9ad Merge pull request #1333 from martin-frbg/haswell32
Fix 32bit HASWELL builds
2017-10-24 11:25:03 +02:00
Martin Kroeker 0e2cf102e1 Fix 32bit HASWELL 2017-10-24 10:07:44 +02:00
Martin Kroeker 5e3e91d0fc Split the microkernel workload into chunks of 32 floats for dsdot mode to limit loss of precision 2017-10-22 18:18:51 +02:00
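One plausible reading of this commit is sketched below: each chunk of 32 elements is summed in single precision, as an sdot microkernel would do, and the per-chunk results are accumulated in double precision. The helper names and the float32 emulation via `struct` are assumptions for illustration, not the actual kernel code.

```python
import struct

CHUNK = 32  # chunk size named in the commit message


def to_f32(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sdot_f32(x, y):
    """Dot product with a single-precision accumulator (emulated)."""
    acc = 0.0
    for a, b in zip(x, y):
        acc = to_f32(acc + to_f32(a * b))
    return acc


def dsdot_chunked(x, y, chunk=CHUNK):
    """Sum single-precision partial dot products in double precision."""
    total = 0.0  # Python floats are doubles
    for start in range(0, len(x), chunk):
        total += sdot_f32(x[start:start + chunk], y[start:start + chunk])
    return total
```

Keeping each single-precision accumulation short bounds how much rounding error can build up before the partial result is handed to the double-precision sum.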
Martin Kroeker 28c3fa8950 Add dsdot 2017-10-16 23:29:03 +02:00
Martin Kroeker 8ac87c1cb6 Implement DSDOT with unchanged sdot microkernels 2017-10-16 23:27:51 +02:00
Martin Kroeker b7cee00455 Merge pull request #1327 from martin-frbg/cmake-relapack
Make ReLAPACK available in cmake builds
2017-10-12 20:26:35 +02:00
Martin Kroeker 962b20a9bb Optionally add ReLAPACK to LIB_COMPONENTS 2017-10-12 17:02:01 +02:00
Martin Kroeker fbf83f4833 Add cmake build list file for ReLAPACK 2017-10-12 17:00:00 +02:00
Martin Kroeker 78cec6209c Add ReLAPACK option 2017-10-12 16:58:37 +02:00
Martin Kroeker c460027dbe Merge pull request #1325 from grisuthedragon/patch-1
Update README.md to include POWER8
2017-10-10 12:14:34 +02:00
Martin Köhler bfa9b9f6b2 Update README.md
Add POWER 8 to the list of additional architectures.
2017-10-10 10:12:04 +02:00
Martin Kroeker c7a8512d12 Cmake fixes for DYNAMIC_ARCH builds and whitespace in path names (#1323)
* prebuild.cmake: Put quotes around path names that may contain whitespace
(Copied from alexkaratakis' PR #1295)
* kernel/CMakeLists.txt: Fix common_lapack header inclusion and DYNAMIC_ARCH generation of ?neg_tcopy and ?laswp_ncopy files
* lapack/CMakeLists.txt: Use correct template for ?laswp_(plus,minus) functions
2017-10-09 23:34:18 +02:00
Martin Kroeker db72ad8f6a Merge pull request #1320 from timmoon10/develop
2D thread distribution for multi-threaded GEMMs
2017-10-08 23:31:33 +02:00
Martin Kroeker 97ecd4996a Merge pull request #1319 from martin-frbg/issue601
Fix out-of-bounds memory accesses exposed by xccblat3 testcase
2017-10-08 23:31:06 +02:00
Martin Kroeker 1eb43cccad Merge pull request #1317 from martin-frbg/power8-asm
Save and restore VSX registers
2017-10-08 23:30:46 +02:00
Martin Kroeker 9d92f526dd Comment out a code block that performs out-of-bounds memory accesses
...and does not appear to be needed even when it stays within the bounds of the array
2017-10-06 23:51:32 +02:00
Martin Kroeker 514d237257 Merge pull request #1279 from xsacha/develop
CMake improvements
2017-10-06 21:13:45 +02:00
Tim Moon 30486a356c Reduce number of data partitions in n. 2017-10-04 12:37:49 -07:00
Martin Kroeker e1b2502840 Merge pull request #1316 from timmoon10/develop
Variable thread count for multi-threaded GEMMs
2017-10-04 20:35:00 +02:00
Tim Moon 9de52b489a Cleaning up and documenting multi-threaded GEMM code. 2017-10-03 16:32:08 -07:00
Tim Moon 860dcfc703 Use 2D thread distribution for small GEMMs.
Allows maximum use of available cores if one of M and N is small and the other is large.
2017-10-03 13:43:39 -07:00
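The idea of a 2D thread distribution can be sketched as follows. This is a hypothetical heuristic, not the partitioning logic from the commit: the function name and the "most square block" criterion are assumptions. It only shows how splitting the thread count across both M and N keeps all threads busy when one dimension is small.

```python
def choose_thread_grid(m, n, nthreads):
    """Pick (threads_m, threads_n) with threads_m * threads_n == nthreads.

    Hypothetical heuristic: among factor pairs that fit the matrix
    dimensions, prefer the one whose per-thread block is closest to square.
    Falls back to (1, 1) when no factor pair fits.
    """
    best = (1, 1)
    best_cost = None
    for threads_m in range(1, nthreads + 1):
        if nthreads % threads_m != 0:
            continue
        threads_n = nthreads // threads_m
        if threads_m > m or threads_n > n:
            continue  # would leave some threads without any rows or columns
        cost = abs(m / threads_m - n / threads_n)
        if best_cost is None or cost < best_cost:
            best, best_cost = (threads_m, threads_n), cost
    return best
```

For a square problem with 16 threads this yields a 4 x 4 grid, while a problem with only 8 rows and many columns puts all 16 threads along N instead of capping the thread count at 8.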
Martin Kroeker f96afd94b0 Fix out-of-bounds accesses where the data should be zero anyway 2017-10-01 01:06:39 +02:00
Martin Kroeker ebe84215e4 Merge pull request #1318 from pv/potrf-smoketest
Add trivial smoketest for xpotrf
2017-09-30 21:31:28 +02:00