On systems with more than 64 CPUs, blas_quickdivide sometimes returns zero, which creates bogus workloads when the result is used for the stride calculation. Threads then spin indefinitely, waiting for a status change that never happens, as seen in #1497. This patch also fixes several data races found by helgrind and/or tsan while debugging the issue.

- getf2
- getrf
- getrs
- laswp
- lauu2
- lauum
- potf2
- potrf
- trti2
- trtri
- CMakeLists.txt
- Makefile
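
The zero result described in the commit message can be illustrated with a minimal sketch. It assumes, purely for illustration, that the fast divide multiplies by a reciprocal taken from a 64-entry table, and it models a divisor outside that table as a reciprocal of zero. The names `TABLE_SIZE`, `recip_lookup`, `quick_divide_unchecked`, and `quick_divide_guarded` are hypothetical and are not taken from the patch; the guarded variant shows one possible remedy (falling back to ordinary division), which may differ from what the patch actually does.

```c
#include <stdint.h>

/* Assumed table length, chosen to match the 64-CPU threshold in the text. */
#define TABLE_SIZE 64

/* Hypothetical stand-in for a reciprocal table lookup.
 * Returns roughly 2^32 / y for divisors the table covers,
 * and 0 to model a divisor that has no table entry. */
static uint32_t recip_lookup(uint32_t y) {
    if (y < 2 || y >= TABLE_SIZE) return 0;
    return (uint32_t)((UINT64_C(1) << 32) / y) + 1;
}

/* Fast divide with no range check: the quotient is the high 32 bits
 * of x * reciprocal. For a divisor outside the table this yields 0. */
static uint32_t quick_divide_unchecked(uint32_t x, uint32_t y) {
    if (y <= 1) return x;
    return (uint32_t)(((uint64_t)x * recip_lookup(y)) >> 32);
}

/* Guarded variant: use plain integer division when the divisor
 * is outside the table, and the reciprocal multiply otherwise. */
static uint32_t quick_divide_guarded(uint32_t x, uint32_t y) {
    if (y <= 1) return x;
    if (y >= TABLE_SIZE) return x / y;
    return (uint32_t)(((uint64_t)x * recip_lookup(y)) >> 32);
}
```

In this model, splitting 1000 units of work across 96 threads gives a per-thread width of 0 with the unchecked version and 10 with the guarded one. A width of 0 is the kind of bogus workload that can leave a thread waiting on a status change that never arrives.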