Commit Graph

3154 Commits

Author SHA1 Message Date
Martin Kroeker
474f7e9583 Add SYMBOLPREFIX and -SUFFIX options and improve help output 2018-10-06 14:28:04 +02:00
Martin Kroeker
28aa94bf4b Include thread numbers in failure message from blas_thread_init
to aid in debugging cases like #1767
2018-09-22 14:00:15 +02:00
Martin Kroeker
288aeea8a2 Fix default settings - USE_TLS and USE_SIMPLE_THREADED_LEVEL3 should both be off 2018-09-19 18:08:31 +02:00
Martin Kroeker
1ad1e79062 Catch inadvertent USE_TLS=0 declaration
for #1766
2018-09-19 18:03:43 +02:00
Martin Kroeker
b402626509 Do not use the new TLS code for non-threaded builds even if USE_TLS is set
Workaround for #1761 as that exposed a problem in the new code (which was intended to speed up multithreaded code only anyway).
2018-09-16 12:43:36 +02:00
Martin Kroeker
ec0cac1669 Merge pull request #4 from xianyi/develop
Update branch
2018-09-16 12:36:49 +02:00
Martin Kroeker
fd081a91e4 Merge pull request #1759 from martin-frbg/lapack283
Remove an unused variable from several LAPACKE 2stage_work functions
2018-09-11 13:52:09 +02:00
Martin Kroeker
094f8c3b57 remove unused variable ldb_t
Copied from Reference-LAPACK PR283
2018-09-11 10:53:47 +02:00
Martin Kroeker
5cf090f516 remove unused variable ldb_t
Copied from Reference-LAPACK PR283
2018-09-11 10:52:30 +02:00
Martin Kroeker
58363542e7 remove unused variable ldb_t
Copied from Reference-LAPACK PR283
2018-09-11 10:51:17 +02:00
Martin Kroeker
3abc22a5bf Merge pull request #1757 from brada4/develop
fix small typo in strmm_ LN
2018-09-09 22:55:15 +02:00
Andrew
1e531701b7 fix small typo 2018-09-09 16:52:25 +02:00
Martin Kroeker
5d42b6ea04 Merge pull request #1756 from martin-frbg/issue1754
Follow netlib renaming/aliasing CBLAS_ORDER to CBLAS_LAYOUT
2018-09-07 11:02:18 +02:00
Martin Kroeker
ba4f433321 Merge pull request #1749 from martin-frbg/issue1531
Fix ARMV8 cross-compilation for IOS
2018-09-07 11:02:01 +02:00
Martin Kroeker
4cf7315a5d Adjust ARMV8 SGEMM unrolling when using the C fallback kernel_2x2 for IOS 2018-09-06 21:41:54 +02:00
Martin Kroeker
b57af93792 just make CBLAS_LAYOUT an alias of the existing CBLAS_ORDER
to avoid having to change all instances of enum CBLAS_ORDER in this file
2018-09-06 16:54:31 +02:00
Martin Kroeker
8aeab0601e Follow netlib renaming/aliasing CBLAS_ORDER to CBLAS_LAYOUT
fixes #1754
2018-09-06 16:39:52 +02:00
Martin Kroeker
1cb7b9015e Conditional compilation of assembly files that IOS does not like 2018-09-04 11:06:51 +02:00
Martin Kroeker
a4bd41e9f2 Fix paths to C kernels for nrm2 2018-09-04 10:51:19 +02:00
Martin Kroeker
9e2bb0c641 Update with the changes from 0.3.3 2018-08-31 00:21:13 +02:00
Martin Kroeker
dbfd7524cd Update version to 0.3.4.dev 2018-08-31 00:19:21 +02:00
Martin Kroeker
2982ce505d Update version to 0.3.4.dev 2018-08-31 00:18:37 +02:00
Martin Kroeker
5bac15adbd Merge pull request #1746 from martin-frbg/issue1674
Assume cross-compilation if host and target os differ
2018-08-30 17:48:07 +02:00
Martin Kroeker
e17f969fa0 Assume cross-compilation if host and target os differ
fixes 1674
2018-08-30 13:28:46 +02:00
Martin Kroeker
e11126b26a Merge pull request #1745 from martin-frbg/issue1743
Set USE_TRMM for all ZARCH variants to fix TRMM faults with zarch-gen…
2018-08-29 07:43:58 +02:00
Martin Kroeker
74608e470d Merge pull request #1744 from martin-frbg/lapack272
Fix missing replacements of ILAENV by ILAENV_2STAGE (lapack PR 272)
2018-08-28 22:58:58 +02:00
Martin Kroeker
f3fd44a731 Set USE_TRMM for all ZARCH variants to fix TRMM faults with zarch-generic
fixes #1743
2018-08-28 21:34:07 +02:00
Martin Kroeker
9e917b16db Fix missing replacements of ILAENV by ILAENV_2STAGE (lapack PR 272)
This could cause spurious "parameter has an illegal value" errors in DSYEVR and related routines, see https://github.com/Reference-LAPACK/lapack/issues/262
2018-08-28 21:11:54 +02:00
Martin Kroeker
8440a4cb1a Merge pull request #1742 from martin-frbg/interim033
Add combination of old and new thread memory code selectable by new option USE_TLS
2018-08-28 08:02:15 +02:00
Martin Kroeker
b55690a659 typo fix 2018-08-26 11:31:07 +02:00
Martin Kroeker
b902a40986 Rewrite glibc version check 2018-08-26 11:18:02 +02:00
Martin Kroeker
5991d1a6cd Update memory.c 2018-08-25 22:12:40 +02:00
Martin Kroeker
b1b743f434 Merge branch 'develop' into interim033 2018-08-25 19:45:19 +02:00
Martin Kroeker
2caa2210bb Add USE_TLS option to choose between old and new implementation of memory.c 2018-08-25 19:37:11 +02:00
Martin Kroeker
2a589c4b28 Add USE_TLS option to switch between old and new memory.c 2018-08-25 19:36:12 +02:00
Martin Kroeker
fd42ca462d Combo of default pre-0.3.1 memory.c and band-aided version of PR1739 2018-08-25 19:35:16 +02:00
Martin Kroeker
52d3f7af50 Merge pull request #1738 from sharkcz/s390x
detect z14 arch on s390x
2018-08-16 09:46:34 +02:00
Dan Horák
5c6e020f49 detect z14 arch on s390x 2018-08-14 12:30:38 +02:00
Martin Kroeker
d4d3113adc Merge pull request #1731 from fenrus75/readme
add short blurb about avx512 and needed compiler to README
2018-08-13 00:01:37 +02:00
Martin Kroeker
375dff54fc Merge pull request #1733 from fenrus75/dsymv
Add an AVX512 enabled DSYMV (L) function
2018-08-12 18:18:36 +02:00
Martin Kroeker
a5f165275a Merge pull request #1732 from fenrus75/dgemv
Add an AVX512 enabled DGEMV (n)  function
2018-08-12 18:17:42 +02:00
Martin Kroeker
8c13aa495a Merge pull request #1730 from fenrus75/fix-sdot
Fix typo in sdot function
2018-08-12 18:17:01 +02:00
Martin Kroeker
1ee6d087c3 Merge pull request #1729 from fenrus75/dscal
Add an AVX512 enabled DSCAL function
2018-08-12 18:16:45 +02:00
Martin Kroeker
a95a784ab2 Merge pull request #1723 from maamountki/develop
Disable zgemv scale in gemv benchmark by default
2018-08-11 21:08:45 +02:00
Arjan van de Ven
9bec34cb67 Add an AVX512 enabled DSYMV (L) function
written in C intrinsics for best readability.
(the same C code works for Haswell as well)

For logistical reasons the code falls back to the existing
haswell AVX2 implementation if the GCC or LLVM compiler is not new enough
2018-08-11 17:46:24 +00:00
Arjan van de Ven
87bebdbd8a Add an AVX512 enabled DGEMV (n) function
written in C intrinsics for best readability.
(the same C code works for Haswell as well)

For logistical reasons the code falls back to the existing
haswell AVX2 implementation if the GCC or LLVM compiler is not new enough
2018-08-11 17:38:12 +00:00
Arjan van de Ven
9493f26309 add short blurb about avx512 and needed compiler to README 2018-08-11 17:21:46 +00:00
Arjan van de Ven
36add7570a Fix typo in sdot function
it looks like my previous pull request was short the final commit;
fix a typo in sdot
2018-08-11 17:16:45 +00:00
Arjan van de Ven
cacacc8007 Add an AVX512 enabled DSCAL function
written in C intrinsics for best readability.
(the same C code works for Haswell as well)

For logistical reasons the code falls back to the existing
haswell AVX2 implementation if the GCC or LLVM compiler is not new enough
2018-08-11 17:14:57 +00:00
Martin Kroeker
1a00ef3d27 Merge pull request #1725 from fenrus75/axpy
Add a AVX512 enabled SAXPY/DAXPY functions
2018-08-11 11:01:20 +02:00