Commit Graph

3078 Commits

Author SHA1 Message Date
Craig Donner c2545b0fd6 Fixed a few more unnecessary calls to num_cpu_avail.
I don't have as many benchmarks for these as for gemm, but it should still
make a difference for small matrices.
2018-06-11 10:17:16 +01:00
Martin Kroeker e65f451409
include CMakePackageConfigHelpers 2018-06-10 15:09:43 +02:00
Martin Kroeker 02634b549b
Add template for OpenBLASConfig.cmake 2018-06-10 09:25:46 +02:00
Martin Kroeker 0bea6bb9e7
Create OpenBLASConfig.cmake from cmake as well 2018-06-10 09:24:37 +02:00
Martin Kroeker 3313e4b946
Merge pull request #1608 from martin-frbg/issue874
Enable parallel make on MS Windows by default
2018-06-09 19:57:33 +02:00
Martin Kroeker e9cd11768c
Enable parallel make on MS Windows by default
fixes #874
2018-06-09 17:54:36 +02:00
Martin Kroeker 63f7395fb4
Move some DYNAMIC_ARCH targets to new DYNAMIC_OLDER option 2018-06-09 16:31:38 +02:00
Martin Kroeker 1cbd8f3ae4
Move some DYNAMIC_ARCH targets to new DYNAMIC_OLDER option 2018-06-09 16:30:46 +02:00
Martin Kroeker 6c2d90ba77
Move some DYNAMIC_ARCH targets to new DYNAMIC_OLDER option 2018-06-09 16:29:17 +02:00
Martin Kroeker 0297b3211a
Merge pull request #1605 from oon3m0oo/develop
Improve performance of GEMM for small matrices when SMP is defined.
2018-06-09 12:42:34 +02:00
Craig Donner 66316b9f4c Improve performance of GEMM for small matrices when SMP is defined.
Always calling num_cpu_avail(), regardless of whether threading will actually
be used, adds noticeable overhead for small matrices. Most other call sites
invoke num_cpu_avail() only if threading will be used, so do the same here.
2018-06-07 15:29:13 +01:00
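The pattern described in the commit above (query the CPU count only once it is clear that threading will be used) can be sketched in C roughly as follows. This is an illustration under stated assumptions, not the OpenBLAS code: `query_cpu_count`, `pick_thread_count`, and the `SMALL_WORK_THRESHOLD` cutoff are hypothetical names and values chosen for the example.

```c
#include <stddef.h>

/* Hypothetical cutoff: jobs at or below this size are assumed not to
   benefit from threading. Not an OpenBLAS tuning value. */
#define SMALL_WORK_THRESHOLD 65536.0

/* Counts how often the (potentially expensive) CPU query runs. */
static int query_calls = 0;

/* Hypothetical stand-in for a call such as num_cpu_avail();
   returns a fixed value so the example is deterministic. */
static int query_cpu_count(void) {
    query_calls++;
    return 4;
}

/* Choose a thread count for an m x n x k GEMM-like job.
   The CPU count is queried only when the job is large enough to thread. */
static int pick_thread_count(size_t m, size_t n, size_t k) {
    double work = (double)m * (double)n * (double)k;
    if (work <= SMALL_WORK_THRESHOLD) {
        return 1;              /* small problem: skip the query entirely */
    }
    return query_cpu_count();  /* large problem: pay for the query */
}
```

In this sketch a small job such as 8 x 8 x 8 returns a single thread without ever touching the query, which is the saving the commit message describes.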
Martin Kroeker 6adc4b7b36
Merge pull request #1601 from martin-frbg/zaxpy
Use a single thread for small input size in zaxpy
2018-06-07 14:09:58 +02:00
Martin Kroeker 2ade0ef085
Merge pull request #1600 from martin-frbg/noyield
Use usleep instead of sched_yield by default
2018-06-07 12:42:00 +02:00
Martin Kroeker e8880c1699
Use a single thread for small input size
copies daxpy improvement from #27, see #1560
2018-06-07 10:26:55 +02:00
Martin Kroeker ed7c4a043b
Use usleep instead of sched_yield by default
sched_yield only burns CPU cycles; fixes #900, see also #923, #1560
2018-06-07 10:18:26 +02:00
Martin Kroeker cf234a0561
Merge pull request #1589 from fenrus75/skylakex
Initial support for SkylakeX / AVX512
2018-06-06 22:07:09 +02:00
Martin Kroeker ae2a33128b
Merge pull request #1599 from martin-frbg/c_check_avx512
Improved AVX512 test case for c_check
2018-06-06 18:42:42 +02:00
Martin Kroeker e4718b1fee
Better AVX512 test case 2018-06-06 16:51:30 +02:00
Martin Kroeker 9b87b64262
Improve AVX512 testcase
clang 3.4 managed to accept the original test code, only to fail on the actual Skylake asm later
2018-06-06 16:49:00 +02:00
Martin Kroeker 0218b884c1
Merge pull request #1598 from martin-frbg/issue1593-2
Restore _Atomic define before stdatomic.h for old gcc
2018-06-06 12:48:26 +02:00
Martin Kroeker 83da278093
Update common.h 2018-06-06 09:27:49 +02:00
Martin Kroeker 358d4df2bd
Merge branch 'develop' into issue1593-2 2018-06-06 09:21:41 +02:00
Martin Kroeker 06d43760e4
Restore _Atomic define before stdatomic.h for old gcc
see #1593
2018-06-06 09:18:10 +02:00
Martin Kroeker a4af8861ff
Merge pull request #1597 from martin-frbg/cmake-avx512
Check build system support for AVX512 instructions
2018-06-06 07:22:20 +02:00
Martin Kroeker 7fb62aed7e
Check build system support for AVX512 instructions 2018-06-05 23:29:33 +02:00
Martin Kroeker f6021c798d
Re-enable QUIET_MAKE 2018-06-05 19:09:38 +02:00
Martin Kroeker e8002536ec
disable quiet_make for the moment 2018-06-05 18:23:01 +02:00
Martin Kroeker ce6317f6c0
Merge pull request #1594 from martin-frbg/issue1593
Fix inverted condition in _Atomic declaration
2018-06-05 16:02:51 +02:00
Martin Kroeker 15a78d6b66
export NO_AVX512 setting 2018-06-05 15:58:34 +02:00
Martin Kroeker 354a976a59
Fix inverted condition in _Atomic declaration
fixes #1593
2018-06-05 10:31:34 +02:00
Martin Kroeker 38ad05bd04
Extend loop range to find SkylakeX in force_coretype 2018-06-05 10:26:49 +02:00
Martin Kroeker b7feded85a
Propagate NO_AVX512 via CCOMMON_OPT 2018-06-05 10:24:05 +02:00
Martin Kroeker dc9fe05ab5
Update cpuid_x86.c 2018-06-04 17:10:19 +02:00
Martin Kroeker 8be027e4c6
Update dynamic.c 2018-06-04 14:36:39 +02:00
Martin Kroeker ac7b6e3e9a
Fix misplaced endif 2018-06-04 08:23:40 +02:00
Martin Kroeker fc66a0ec0b
Merge pull request #1590 from martin-frbg/avx512_check
Disable AVX512 (Skylake X) support if the build system is too old
2018-06-04 08:18:38 +02:00
Arjan van de Ven 89372e0993 Use AVX512 also for DGEMM
This required switching to the generic gemm_beta code (which is faster anyway on SKX)
for both DGEMM and SGEMM.

The performance gain for the not-yet-retuned version is in the 30% range.
2018-06-03 22:17:27 +00:00
Martin Kroeker ef626c6824
typo fix 2018-06-04 00:13:19 +02:00
Martin Kroeker 83fec56a3f
Disable AVX512 (Skylake X) support if the build system is too old 2018-06-04 00:01:11 +02:00
Martin Kroeker 5a51cf4576
Separate Skylake X from Skylake 2018-06-03 23:41:33 +02:00
Martin Kroeker 5a92b311e0
Separate Skylake X from Skylake 2018-06-03 23:29:07 +02:00
Martin Kroeker a7d0f49cec
Add SKYLAKEX to DYNAMIC_CORE list only if AVX512 is available 2018-06-03 23:13:25 +02:00
Martin Kroeker f1fb9a4745
Propagate NO_AVX512 if needed 2018-06-03 13:48:27 +02:00
Martin Kroeker 0023515733
Typo fix (misplaced parenthesis) 2018-06-03 13:22:59 +02:00
Arjan van de Ven 99c7bba8e4 Initial support for SkylakeX / AVX512
This patch adds the basic infrastructure for adding the SkylakeX (Intel Skylake server)
target. The SkylakeX target will use the AVX512 (AVX512VL level) instruction set,
which brings 2 basic things:
1) 512 bit wide SIMD (2x width of AVX2)
2) 32 SIMD registers (2x the number on AVX2)

This initial patch only contains a trivial transformation of the Haswell SGEMM kernel
to AVX512VL; more will follow later but this patch aims to get the infrastructure
in place for this "later".

Full performance tuning has not been done yet; with more registers and wider SIMD
it's in theory possible to retune the kernels but even without that there's an
interesting enough performance increase (30-40% range) with just this change.
2018-06-03 07:58:52 +00:00
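The two "2x" figures in the commit message above follow from simple arithmetic on register width and register count. The C sketch below works that out for single- and double-precision elements; the helper names `lanes` and `register_file_elements` are made up for this illustration, and the AVX2 figures used for comparison (16 registers of 256 bits in 64-bit mode) are the standard architectural values rather than something stated in the commit.

```c
/* Elements per SIMD register = register width in bits / element width in bits. */
static int lanes(int register_bits, int element_bits) {
    return register_bits / element_bits;
}

/* Total number of elements the whole SIMD register file can hold at once. */
static int register_file_elements(int register_count,
                                  int register_bits,
                                  int element_bits) {
    return register_count * lanes(register_bits, element_bits);
}
```

With these helpers, `lanes(512, 32)` gives 16 floats per AVX512 register against `lanes(256, 32)` giving 8 for AVX2, and `register_file_elements(32, 512, 64)` gives 256 doubles against `register_file_elements(16, 256, 64)` giving 64, so the register file as a whole holds four times as much data.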
Martin Kroeker 36c4523d85
Merge pull request #1587 from matthew-brett/fix-compile-error-early-glibc
Revert "take out unused variables"
2018-06-02 10:02:38 +02:00
Matthew Brett a8002e283a Revert "take out unused variables"
This reverts commit e5752ff9b3.

The variables i and n are used in the `#if !__GLIBC_PREREQ(2, 7)`
branch.

Closes gh-1586.
2018-06-01 23:20:00 +01:00
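The situation behind the revert above, where variables look unused but are needed by a conditionally compiled branch, can be sketched in C as follows. This is not the code from the reverted commit: `OLD_LIBC_FALLBACK` is a hypothetical macro standing in for a check such as `!__GLIBC_PREREQ(2, 7)`, and `count_set_bits` is an invented function. The non-fallback branch assumes a GCC- or clang-compatible compiler.

```c
/* Hypothetical switch standing in for a check like !__GLIBC_PREREQ(2, 7). */
#ifndef OLD_LIBC_FALLBACK
#define OLD_LIBC_FALLBACK 1
#endif

/* Count the bits set in mask. */
static int count_set_bits(unsigned long mask) {
    /* These look unused when OLD_LIBC_FALLBACK is 0, but removing them
       breaks the build whenever the fallback branch is compiled. */
    int i, n;
#if OLD_LIBC_FALLBACK
    n = 0;
    for (i = 0; i < (int)(sizeof(mask) * 8); i++) {
        if (mask & (1UL << i)) {
            n++;
        }
    }
    return n;
#else
    /* Compiler builtin (GCC/clang); i and n are genuinely unused here. */
    return __builtin_popcountl(mask);
#endif
}
```

A build that only ever exercises the `#else` branch reports `i` and `n` as unused, which is how a cleanup can silently break the configuration that takes the other branch.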
Martin Kroeker 401adddb2b
Merge pull request #1585 from martin-frbg/lapack-253
Fixes from Lapack-Reference PR 253
2018-06-01 18:59:33 +02:00
Martin Kroeker c5b13d4e10
Fixes from netlib PR 253 2018-06-01 15:14:45 +02:00
Martin Kroeker 677e42d7b0
Fixes from netlib PR 253
No error is raised any more when minimal workspace is given in ?hesv_aa, ?sysv_aa, ?hesv_aa_2stage, ?sysv_aa_2stage
Quick return for ?laqr1
2018-06-01 15:12:59 +02:00