Martin Kroeker
fc66a0ec0b
Merge pull request #1590 from martin-frbg/avx512_check
...
Disable AVX512 (Skylake X) support if the build system is too old
2018-06-04 08:18:38 +02:00
Arjan van de Ven
89372e0993
Use AVX512 also for DGEMM
...
This required switching to the generic gemm_beta code (which is faster anyway on SKX)
for both DGEMM and SGEMM.
The performance gain for the not-yet-retuned version is in the 30% range.
2018-06-03 22:17:27 +00:00
Martin Kroeker
ef626c6824
typo fix
2018-06-04 00:13:19 +02:00
Martin Kroeker
83fec56a3f
Disable AVX512 (Skylake X) support if the build system is too old
2018-06-04 00:01:11 +02:00
Martin Kroeker
5a51cf4576
Separate Skylake X from Skylake
2018-06-03 23:41:33 +02:00
Martin Kroeker
5a92b311e0
Separate Skylake X from Skylake
2018-06-03 23:29:07 +02:00
Martin Kroeker
a7d0f49cec
Add SKYLAKEX to DYNAMIC_CORE list only if AVX512 is available
2018-06-03 23:13:25 +02:00
Martin Kroeker
f1fb9a4745
Propagate NO_AVX512 if needed
2018-06-03 13:48:27 +02:00
Martin Kroeker
0023515733
Typo fix (misplaced parenthesis)
2018-06-03 13:22:59 +02:00
Arjan van de Ven
99c7bba8e4
Initial support for SkylakeX / AVX512
...
This patch adds the basic infrastructure for adding the SkylakeX (Intel Skylake server)
target. The SkylakeX target will use the AVX512 (AVX512VL level) instruction set,
which brings 2 basic things:
1) 512 bit wide SIMD (2x width of AVX2)
2) 32 SIMD registers (2x the number on AVX2)
This initial patch only contains a trivial transformation of the Haswell SGEMM kernel
to AVX512VL; more will follow later but this patch aims to get the infrastructure
in place for this "later".
Full performance tuning has not been done yet; with more registers and wider SIMD
it's in theory possible to retune the kernels but even without that there's an
interesting enough performance increase (30-40% range) with just this change.
2018-06-03 07:58:52 +00:00
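The width and register-count figures quoted in the commit message above can be checked with a little arithmetic. The sketch below is illustrative only and is not code from the patch: the function names are hypothetical, and the figure of 16 registers for AVX2 is the 64-bit-mode count implied by the message's "2x the number" statement.

```python
# Illustrative arithmetic only; these helper names are hypothetical
# and do not exist in OpenBLAS.

def lanes(simd_bits: int, element_bits: int) -> int:
    """Number of elements of the given size that fit in one SIMD register."""
    return simd_bits // element_bits


def register_file_elements(simd_bits: int, num_registers: int, element_bits: int) -> int:
    """Total number of elements that can be held in all SIMD registers at once."""
    return lanes(simd_bits, element_bits) * num_registers


# Single precision (32-bit) elements, as used by SGEMM.
avx2_lanes = lanes(256, 32)
avx512_lanes = lanes(512, 32)

# AVX2 is assumed to expose 16 registers, AVX512 32 registers.
avx2_capacity = register_file_elements(256, 16, 32)
avx512_capacity = register_file_elements(512, 32, 32)

print(avx2_lanes, avx512_lanes)          # floats per register
print(avx2_capacity, avx512_capacity)    # floats held in registers overall
```

Doubling both the width and the register count quadruples the number of values a kernel can keep in registers, which is why the message expects further gains from retuning.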
Martin Kroeker
36c4523d85
Merge pull request #1587 from matthew-brett/fix-compile-error-early-glibc
...
Revert "take out unused variables"
2018-06-02 10:02:38 +02:00
Matthew Brett
a8002e283a
Revert "take out unused variables"
...
This reverts commit e5752ff9b3.
The variables i and n are used in the `#if !__GLIBC_PREREQ(2, 7)`
branch.
Closes gh-1586.
2018-06-01 23:20:00 +01:00
Martin Kroeker
401adddb2b
Merge pull request #1585 from martin-frbg/lapack-253
...
Fixes from Lapack-Reference PR 253
2018-06-01 18:59:33 +02:00
Martin Kroeker
c5b13d4e10
Fixes from netlib PR 253
2018-06-01 15:14:45 +02:00
Martin Kroeker
677e42d7b0
Fixes from netlib PR 253
...
When minimal workspace is given in ?hesv_aa, ?sysv_aa, ?hesv_aa_2stage, ?sysv_aa_2stage, now no error is given
Quick return for ?laqr1
2018-06-01 15:12:59 +02:00
Martin Kroeker
e2a8c35e5a
Fixes from netlib PR253
...
LAPACKE interfaces for Aasen's functions now call ?sytrf_aa and ?hetrf_aa instead of ?sytrf and ?hetrf
2018-06-01 15:08:14 +02:00
Martin Kroeker
1a49fb1c05
Merge pull request #1584 from martin-frbg/issue1503
...
Work around name clash with Windows10's winnt.h
2018-05-31 21:56:04 +02:00
Martin Kroeker
8562d5787a
Merge pull request #1583 from martin-frbg/issue1575
...
Handle INCX=0,INCY=0 case
2018-05-31 21:55:26 +02:00
Martin Kroeker
93f1eb09c3
Merge pull request #1582 from martin-frbg/develop-031
...
Update version number on the develop branch to 0.3.1.dev
2018-05-31 21:55:07 +02:00
Martin Kroeker
c90bbda3df
Merge pull request #1581 from martin-frbg/issue1574-2
...
Fix paths to LIN and EIG tests
2018-05-31 21:54:45 +02:00
Martin Kroeker
7df8c4f76f
typo fix
2018-05-31 17:23:08 +02:00
Martin Kroeker
2fc748bf72
Restore optimized swap kernel now that we have a proper fix
2018-05-31 13:41:12 +02:00
Martin Kroeker
a91f1587b9
Work around name clash with Windows10's winnt.h
...
fixes #1503
2018-05-31 13:26:00 +02:00
Martin Kroeker
d1b7be14aa
Handle INCX=0,INCY=0 case
...
Fixes #1575 (sswap/dswap failing the swap utest on x86) as suggested by atsampson.
2018-05-31 12:52:04 +02:00
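For context, the textbook semantics of a strided BLAS swap can be modelled as below. This is a plain-Python sketch of the reference loop, not the OpenBLAS kernel and not the fix from this commit; the function name and list-based arguments are assumptions made for illustration. With both increments zero, every iteration exchanges the same pair of elements, so the result depends on whether n is odd or even.

```python
# A model of the reference strided swap loop; hypothetical helper,
# not part of OpenBLAS.

def swap(n, x, incx, y, incy):
    """Exchange n elements of x and y, stepping by incx and incy."""
    # Negative increments start from the far end, as in reference BLAS.
    ix = (1 - n) * incx if incx < 0 else 0
    iy = (1 - n) * incy if incy < 0 else 0
    for _ in range(n):
        x[ix], y[iy] = y[iy], x[ix]
        ix += incx
        iy += incy


# Ordinary unit-stride case: the vectors trade contents.
a, b = [1.0, 2.0], [3.0, 4.0]
swap(2, a, 1, b, 1)

# Zero increments: the same pair is exchanged n times.
p, q = [1.0], [2.0]
swap(3, p, 0, q, 0)   # odd n leaves the pair swapped
r, s = [1.0], [2.0]
swap(2, r, 0, s, 0)   # even n leaves the pair unchanged
```

An optimized kernel that processes several elements per step has to reproduce this behaviour explicitly, since it cannot assume the n accesses touch distinct elements.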
Martin Kroeker
b491b10057
Update version to 0.3.1.dev
2018-05-31 12:44:36 +02:00
Martin Kroeker
5fae96fb70
Update version to 0.3.1.dev
2018-05-31 12:43:45 +02:00
Martin Kroeker
a7dbd4c57d
Fix paths to LIN and EIG tests
...
should fix #1574
2018-05-31 11:19:33 +02:00
Martin Kroeker
2cae104b5e
Merge pull request #1579 from martin-frbg/issue1574
...
Adapt lapack-test and blas-test to changes in netlib directory layout
2018-05-29 22:02:06 +02:00
Martin Kroeker
908d40be71
Adapt lapack-test and blas-test to changes in netlib directory layout
...
partial fix for #1574 - the problem with lapack_testing.py looks like an upstream bug
2018-05-29 14:27:46 +02:00
Zhang Xianyi
43e592ceb3
Add -lm for Android.
...
Conflicts:
exports/Makefile
2018-05-24 21:02:42 +08:00
Martin Kroeker
f0f27868d8
Merge pull request #1572 from martin-frbg/issue1571
...
Use the new zrot.c on POWER8 for crot as well
2018-05-23 22:55:37 +02:00
Martin Kroeker
961d25e9c7
Use the new zrot.c on POWER8 for crot as well
...
fixes #1571 (the old zrot.S assembly does not handle incx=0 correctly)
2018-05-23 22:54:39 +02:00
Martin Kroeker
939452ea9d
Merge pull request #1570 from xianyi/develop
...
Update release-0.3.0 branch to match develop
2018-05-23 15:12:20 +02:00
Martin Kroeker
f5959f2543
Merge pull request #1567 from martin-frbg/mipstrmm
...
Revert " Switch mips32 target to USE_TRMM to fix complex TRMM"
2018-05-17 20:50:23 +02:00
Martin Kroeker
82012b960b
Revert " Switch mips32 target to USE_TRMM to fix complex TRMM"
...
... as it was just a silly workaround for the issue seen in #1563, caused by #1419
2018-05-17 20:30:03 +02:00
Martin Kroeker
8dd3515fa2
Merge pull request #1565 from martin-frbg/mipstypo
...
Remove extraneous brace from previous commit of mips dsdot fix
2018-05-17 20:22:58 +02:00
Martin Kroeker
95f7f0229c
Remove extraneous brace from previous commit
2018-05-17 18:43:59 +02:00
Martin Kroeker
5082fe4306
Merge pull request #1564 from martin-frbg/issue1563
...
Revert changes from PR#1419
2018-05-17 14:04:13 +02:00
Martin Kroeker
7a7619af6d
Revert changes from PR#1419
...
at least one of these changes apparently is an oversimplification, leading to TRMM breakage on some platforms as observed in #1563
2018-05-17 11:40:08 +02:00
Martin Kroeker
9a400b7014
Merge pull request #1562 from martin-frbg/issue1561
...
Use correct data type for initializers of v2f64, v4f32
2018-05-15 17:46:09 +02:00
Martin Kroeker
893b535540
Use correct data type for initializers of v2f64, v4f32
...
Fixes #1561
2018-05-15 14:42:12 +02:00
Martin Kroeker
6791294312
Merge pull request #1559 from martin-frbg/buildconf
...
Add build-time configuration options to pkgconfig file
2018-05-14 18:49:53 +02:00
Martin Kroeker
ddb8b124de
Merge pull request #1558 from martin-frbg/instpc
...
Overwrite any pre-existing openblas.pc rather than append to it
2018-05-14 17:38:12 +02:00
Martin Kroeker
191746c493
Merge pull request #1557 from martin-frbg/getconfig
...
Add threading and OpenMP information to output
2018-05-14 17:37:55 +02:00
Martin Kroeker
eb9b021d38
Add build-time configuration options to pkgconfig file
2018-05-14 00:10:15 +02:00
Martin Kroeker
7d7564568c
Add build-time configuration options to pkgconfig file
2018-05-14 00:09:35 +02:00
Martin Kroeker
a07843bc93
Overwrite any pre-existing openblas.pc rather than append to it
2018-05-12 22:11:27 +02:00
Martin Kroeker
41ae8e8d67
Add threading and OpenMP information to output
...
For #1416 and #1529, more information about the options OpenBLAS was built with is needed. Additionally, we may want to add this data to the openblas.pc file (but not all projects use pkgconfig, and as far as I am aware the cmake module for accessing it does not make such "private" declarations available).
2018-05-12 12:11:38 +02:00
Zhang Xianyi
9c1aa0b0fe
Merge pull request #1556 from WestAlgo/develop
...
move _Atomic define to common.h
2018-05-11 17:02:47 +08:00
zhiyong.dang
53457f222f
move _Atomic define to common.h
2018-05-11 00:13:16 -07:00