Commit Graph

3108 Commits

Author SHA1 Message Date
Martin Kroeker
83da278093 Update common.h 2018-06-06 09:27:49 +02:00
Martin Kroeker
358d4df2bd Merge branch 'develop' into issue1593-2 2018-06-06 09:21:41 +02:00
Martin Kroeker
06d43760e4 Restore _Atomic define before stdatomic.h for old gcc
see #1593
2018-06-06 09:18:10 +02:00
Martin Kroeker
a4af8861ff Merge pull request #1597 from martin-frbg/cmake-avx512
Check build system support for AVX512 instructions
2018-06-06 07:22:20 +02:00
Martin Kroeker
7fb62aed7e Check build system support for AVX512 instructions 2018-06-05 23:29:33 +02:00
Martin Kroeker
f6021c798d Re-enable QUIET_MAKE 2018-06-05 19:09:38 +02:00
Martin Kroeker
e8002536ec disable quiet_make for the moment 2018-06-05 18:23:01 +02:00
Martin Kroeker
ce6317f6c0 Merge pull request #1594 from martin-frbg/issue1593
Fix inverted condition in _Atomic declaration
2018-06-05 16:02:51 +02:00
Martin Kroeker
15a78d6b66 export NO_AVX512 setting 2018-06-05 15:58:34 +02:00
Martin Kroeker
354a976a59 Fix inverted condition in _Atomic declaration
fixes #1593
2018-06-05 10:31:34 +02:00
Martin Kroeker
38ad05bd04 Extend loop range to find SkylakeX in force_coretype 2018-06-05 10:26:49 +02:00
Martin Kroeker
b7feded85a Propagate NO_AVX512 via CCOMMON_OPT 2018-06-05 10:24:05 +02:00
Martin Kroeker
dc9fe05ab5 Update cpuid_x86.c 2018-06-04 17:10:19 +02:00
Martin Kroeker
8be027e4c6 Update dynamic.c 2018-06-04 14:36:39 +02:00
Martin Kroeker
ac7b6e3e9a Fix misplaced endif 2018-06-04 08:23:40 +02:00
Martin Kroeker
fc66a0ec0b Merge pull request #1590 from martin-frbg/avx512_check
Disable AVX512 (Skylake X) support if the build system is too old
2018-06-04 08:18:38 +02:00
Arjan van de Ven
89372e0993 Use AVX512 also for DGEMM
this required switching to the generic gemm_beta code (which is faster anyway on SKX)
for both DGEMM and SGEMM

The performance gain for the not-yet-retuned version is in the 30% range
2018-06-03 22:17:27 +00:00
Martin Kroeker
ef626c6824 typo fix 2018-06-04 00:13:19 +02:00
Martin Kroeker
83fec56a3f Disable AVX512 (Skylake X) support if the build system is too old 2018-06-04 00:01:11 +02:00
Martin Kroeker
5a51cf4576 Separate Skylake X from Skylake 2018-06-03 23:41:33 +02:00
Martin Kroeker
5a92b311e0 Separate Skylake X from Skylake 2018-06-03 23:29:07 +02:00
Martin Kroeker
a7d0f49cec Add SKYLAKEX to DYNAMIC_CORE list only if AVX512 is available 2018-06-03 23:13:25 +02:00
Martin Kroeker
f1fb9a4745 Propagate NO_AVX512 if needed 2018-06-03 13:48:27 +02:00
Martin Kroeker
0023515733 Typo fix (misplaced parenthesis) 2018-06-03 13:22:59 +02:00
Arjan van de Ven
99c7bba8e4 Initial support for SkylakeX / AVX512
This patch adds the basic infrastructure for adding the SkylakeX (Intel Skylake server)
target. The SkylakeX target will use the AVX512 (AVX512VL level) instruction set,
which brings 2 basic things:
1) 512 bit wide SIMD (2x width of AVX2)
2) 32 SIMD registers (2x the number on AVX2)

This initial patch only contains a trivial transformation of the Haswell SGEMM kernel
to AVX512VL; more will follow later but this patch aims to get the infrastructure
in place for this "later".

Full performance tuning has not been done yet; with more registers and wider SIMD
it's in theory possible to retune the kernels but even without that there's an
interesting enough performance increase (30-40% range) with just this change.
2018-06-03 07:58:52 +00:00
Martin Kroeker
36c4523d85 Merge pull request #1587 from matthew-brett/fix-compile-error-early-glibc
Revert "take out unused variables"
2018-06-02 10:02:38 +02:00
Matthew Brett
a8002e283a Revert "take out unused variables"
This reverts commit e5752ff9b3.

The variables i and n are used in the `#if !__GLIBC_PREREQ(2, 7)`
branch.

Closes gh-1586.
2018-06-01 23:20:00 +01:00
Martin Kroeker
401adddb2b Merge pull request #1585 from martin-frbg/lapack-253
Fixes from Lapack-Reference PR 253
2018-06-01 18:59:33 +02:00
Martin Kroeker
c5b13d4e10 Fixes from netlib PR 253 2018-06-01 15:14:45 +02:00
Martin Kroeker
677e42d7b0 Fixes from netlib PR 253
When minimal workspace is given in ?hesv_aa, ?sysv_aa, ?hesv_aa_2stage, ?sysv_aa_2stage, now no error is given
Quick return for ?laqr1
2018-06-01 15:12:59 +02:00
Martin Kroeker
e2a8c35e5a Fixes from netlib PR253
LAPACKE interfaces for Aasen's functions now call ?sytrf_aa and ?hetrf_aa instead of ?sytrf and ?hetrf
2018-06-01 15:08:14 +02:00
Martin Kroeker
1a49fb1c05 Merge pull request #1584 from martin-frbg/issue1503
Work around name clash with Windows10's winnt.h
2018-05-31 21:56:04 +02:00
Martin Kroeker
8562d5787a Merge pull request #1583 from martin-frbg/issue1575
Handle INCX=0,INCY=0 case
2018-05-31 21:55:26 +02:00
Martin Kroeker
93f1eb09c3 Merge pull request #1582 from martin-frbg/develop-031
Update version number on the develop branch to 0.3.1.dev
2018-05-31 21:55:07 +02:00
Martin Kroeker
c90bbda3df Merge pull request #1581 from martin-frbg/issue1574-2
Fix paths to LIN and EIG tests
2018-05-31 21:54:45 +02:00
Martin Kroeker
7df8c4f76f typo fix 2018-05-31 17:23:08 +02:00
Martin Kroeker
2fc748bf72 Restore optimized swap kernel now that we have a proper fix 2018-05-31 13:41:12 +02:00
Martin Kroeker
a91f1587b9 Work around name clash with Windows10's winnt.h
fixes #1503
2018-05-31 13:26:00 +02:00
Martin Kroeker
d1b7be14aa Handle INCX=0,INCY=0 case
Fixes #1575 (sswap/dswap failing the swap utest on x86) as suggested by atsampson.
2018-05-31 12:52:04 +02:00
Martin Kroeker
b491b10057 Update version to 0.3.1.dev 2018-05-31 12:44:36 +02:00
Martin Kroeker
5fae96fb70 Update version to 0.3.1.dev 2018-05-31 12:43:45 +02:00
Martin Kroeker
a7dbd4c57d Fix paths to LIN and EIG tests
should fix #1574
2018-05-31 11:19:33 +02:00
Martin Kroeker
2cae104b5e Merge pull request #1579 from martin-frbg/issue1574
Adapt lapack-test and blas-test to changes in netlib directory layout
2018-05-29 22:02:06 +02:00
Martin Kroeker
908d40be71 Adapt lapack-test and blas-test to changes in netlib directory layout
partial fix for #1574 - the problem with lapack_testing.py looks like an upstream bug
2018-05-29 14:27:46 +02:00
Zhang Xianyi
43e592ceb3 Add -lm for Android.
Conflicts:
	exports/Makefile
2018-05-24 21:02:42 +08:00
Martin Kroeker
f0f27868d8 Merge pull request #1572 from martin-frbg/issue1571
Use the new zrot.c on POWER8 for crot as well
2018-05-23 22:55:37 +02:00
Martin Kroeker
961d25e9c7 Use the new zrot.c on POWER8 for crot as well
fixes #1571 (the old zrot.S assembly does not handle incx=0 correctly)
2018-05-23 22:54:39 +02:00
Martin Kroeker
939452ea9d Merge pull request #1570 from xianyi/develop
Update release-0.3.0 branch to match develop
v0.3.0
2018-05-23 15:12:20 +02:00
Martin Kroeker
f5959f2543 Merge pull request #1567 from martin-frbg/mipstrmm
Revert " Switch mips32 target to USE_TRMM to fix complex TRMM"
2018-05-17 20:50:23 +02:00
Martin Kroeker
82012b960b Revert " Switch mips32 target to USE_TRMM to fix complex TRMM"
... as it was just a silly workaround for the issue seen in #1563, caused by #1419
2018-05-17 20:30:03 +02:00