Martin Kroeker
591cca7cb0
Check availability of immintrin.h in the AVX512 compatibility test
2018-10-04 07:35:30 +02:00
Andrew
3439158dea
address #1782 2nd loop
2018-10-03 21:20:50 +02:00
Arjan van de Ven
45fe8cb0c5
Create an AVX512-enabled version of DGEMM
This patch adds dgemm_kernel_4x8_skylakex.c which is
* dgemm_kernel_4x8_haswell.s converted to C + intrinsics
* 8x8 support added
* 8x8 kernel implemented using AVX512
Performance is a work in progress, but already shows a 10% - 20%
increase for a wide range of matrix sizes.
2018-10-03 14:45:25 +00:00
Martin Kroeker
544b069e85
Merge pull request #1780 from martin-frbg/issue1774-2
Convert fldmia/fstmia instructions to UAL syntax for clang7
2018-09-29 09:27:47 +02:00
Martin Kroeker
9b2a7ad40d
Convert fldmia/fstmia instructions to UAL syntax for clang7
second part of fix for #1774 , containing files missed in #1775
2018-09-28 23:05:15 +02:00
Martin Kroeker
10ce70701a
Merge pull request #1778 from fengrl/develop
test_axpy fails on the LOONGSON3A platform (#1777)
2018-09-26 11:14:58 +02:00
fengruilin
6fc85a6359
test_axpy fails on the LOONGSON3A platform (#1777)
2018-09-26 15:14:04 +08:00
Martin Kroeker
831c661386
Merge pull request #1775 from martin-frbg/issue1774
Convert fldmia/fstmia instructions to UAL syntax for clang7
2018-09-25 18:58:39 +02:00
Martin Kroeker
7e5df34e6a
Convert fldmia/fstmia instructions to UAL syntax for clang7
fixes #1774
2018-09-25 09:41:58 +02:00
Martin Kroeker
4f45040b89
Merge pull request #1773 from martin-frbg/issue1767
Include thread numbers in failure message from blas_thread_init
2018-09-23 23:25:15 +02:00
Martin Kroeker
28aa94bf4b
Include thread numbers in failure message from blas_thread_init
to aid in debugging cases like #1767
2018-09-22 14:00:15 +02:00
Martin Kroeker
56e7c68810
Merge pull request #1771 from staticfloat/sf/ldflags
Add `$(LDFLAGS)` to `$(CC)` and `$(FC)` invocations within `exports/Makefile`
2018-09-22 13:11:39 +02:00
Martin Kroeker
cf6df9464c
Document the stub status of the QUAD_PRECISION code (#1772)
* Document the stub status of the QUAD_PRECISION code inherited from GotoBLAS2
in response to #1769
2018-09-22 12:31:37 +02:00
Elliot Saba
6f77af2eef
Add `$(LDFLAGS)` to `$(CC)` and `$(FC)` invocations within `exports/Makefile`
2018-09-21 09:19:51 +00:00
Martin Kroeker
4d183e5567
Merge pull request #1765 from martin-frbg/issue1761
Do not use the new TLS-enabled memory allocator for non-threaded builds, and disable TLS by default in gmake as well
2018-09-19 22:02:21 +02:00
Martin Kroeker
34d55fd165
Merge pull request #1764 from yurivict/64-suffix
Allow installing the 'interface64' version alongside the regular version
2018-09-19 18:16:38 +02:00
Martin Kroeker
b991570210
Merge pull request #1762 from martin-frbg/issue1710-2
Add explicit casts to silence compiler warnings
2018-09-19 18:16:21 +02:00
Martin Kroeker
288aeea8a2
Fix default settings - USE_TLS and USE_SIMPLE_THREADED_LEVEL3 should both be off
2018-09-19 18:08:31 +02:00
Martin Kroeker
1ad1e79062
Catch inadvertent USE_TLS=0 declaration
for #1766
2018-09-19 18:03:43 +02:00
Martin Kroeker
b402626509
Do not use the new TLS code for non-threaded builds even if USE_TLS is set
Workaround for #1761 as that exposed a problem in the new code (which was intended to speed up multithreaded code only anyway).
2018-09-16 12:43:36 +02:00
Martin Kroeker
ec0cac1669
Merge pull request #4 from xianyi/develop
Update branch
2018-09-16 12:36:49 +02:00
Yuri
2349e15149
Allow installing the 'interface64' version alongside the regular version
2018-09-15 21:00:03 -07:00
Martin Kroeker
f3c262156e
Add an explicit cast to silence a warning
for #1710
2018-09-13 14:24:29 +02:00
Martin Kroeker
30f5a69ab8
Add explicit cast to silence a warning
for #1710
2018-09-13 14:23:31 +02:00
Martin Kroeker
fd081a91e4
Merge pull request #1759 from martin-frbg/lapack283
Remove an unused variable from several LAPACKE 2stage_work functions
2018-09-11 13:52:09 +02:00
Martin Kroeker
094f8c3b57
remove unused variable ldb_t
Copied from Reference-LAPACK PR283
2018-09-11 10:53:47 +02:00
Martin Kroeker
5cf090f516
remove unused variable ldb_t
Copied from Reference-LAPACK PR283
2018-09-11 10:52:30 +02:00
Martin Kroeker
58363542e7
remove unused variable ldb_t
Copied from Reference-LAPACK PR283
2018-09-11 10:51:17 +02:00
Martin Kroeker
3abc22a5bf
Merge pull request #1757 from brada4/develop
fix small typo in strmm_ LN
2018-09-09 22:55:15 +02:00
Andrew
1e531701b7
fix small typo
2018-09-09 16:52:25 +02:00
Martin Kroeker
5d42b6ea04
Merge pull request #1756 from martin-frbg/issue1754
Follow netlib renaming/aliasing CBLAS_ORDER to CBLAS_LAYOUT
2018-09-07 11:02:18 +02:00
Martin Kroeker
ba4f433321
Merge pull request #1749 from martin-frbg/issue1531
Fix ARMV8 cross-compilation for iOS
2018-09-07 11:02:01 +02:00
Martin Kroeker
4cf7315a5d
Adjust ARMV8 SGEMM unrolling when using the C fallback kernel_2x2 for iOS
2018-09-06 21:41:54 +02:00
Martin Kroeker
b57af93792
just make CBLAS_LAYOUT an alias of the existing CBLAS_ORDER
to avoid having to change all instances of enum CBLAS_ORDER in this file
2018-09-06 16:54:31 +02:00
Martin Kroeker
8aeab0601e
Follow netlib renaming/aliasing CBLAS_ORDER to CBLAS_LAYOUT
fixes #1754
2018-09-06 16:39:52 +02:00
Martin Kroeker
1cb7b9015e
Conditional compilation of assembly files that iOS does not like
2018-09-04 11:06:51 +02:00
Martin Kroeker
a4bd41e9f2
Fix paths to C kernels for nrm2
2018-09-04 10:51:19 +02:00
Martin Kroeker
9e2bb0c641
Update with the changes from 0.3.3
2018-08-31 00:21:13 +02:00
Martin Kroeker
dbfd7524cd
Update version to 0.3.4.dev
2018-08-31 00:19:21 +02:00
Martin Kroeker
2982ce505d
Update version to 0.3.4.dev
2018-08-31 00:18:37 +02:00
Martin Kroeker
fd8d1868a1
Updates for 0.3.3
2018-08-31 00:07:48 +02:00
Martin Kroeker
f0563f14ba
Version 0.3.3
2018-08-30 23:43:57 +02:00
Martin Kroeker
3197f86762
Version 0.3.3
2018-08-30 23:43:14 +02:00
Martin Kroeker
422a8fa953
Merge pull request #1747 from xianyi/develop
Merge develop into 0.3.x for 0.3.3
2018-08-30 23:42:19 +02:00
Martin Kroeker
5bac15adbd
Merge pull request #1746 from martin-frbg/issue1674
Assume cross-compilation if host and target OS differ
2018-08-30 17:48:07 +02:00
Martin Kroeker
e17f969fa0
Assume cross-compilation if host and target OS differ
fixes #1674
2018-08-30 13:28:46 +02:00
Martin Kroeker
e11126b26a
Merge pull request #1745 from martin-frbg/issue1743
Set USE_TRMM for all ZARCH variants to fix TRMM faults with zarch-gen…
2018-08-29 07:43:58 +02:00
Martin Kroeker
74608e470d
Merge pull request #1744 from martin-frbg/lapack272
Fix missing replacements of ILAENV by ILAENV_2STAGE (lapack PR 272)
2018-08-28 22:58:58 +02:00
Martin Kroeker
f3fd44a731
Set USE_TRMM for all ZARCH variants to fix TRMM faults with zarch-generic
fixes #1743
2018-08-28 21:34:07 +02:00
Martin Kroeker
9e917b16db
Fix missing replacements of ILAENV by ILAENV_2STAGE (lapack PR 272)
This could cause spurious "parameter has an illegal value" errors in DSYEVR and related routines, see https://github.com/Reference-LAPACK/lapack/issues/262
2018-08-28 21:11:54 +02:00