383 Commits

Author SHA1 Message Date
Arjan van de Ven
99c7bba8e4 Initial support for SkylakeX / AVX512
This patch adds the basic infrastructure for adding the SkylakeX (Intel Skylake server)
target. The SkylakeX target will use the AVX512 (AVX512VL level) instruction set,
which brings 2 basic things:
1) 512-bit wide SIMD (2x the width of AVX2)
2) 32 SIMD registers (2x the number on AVX2)

This initial patch only contains a trivial transformation of the Haswell SGEMM kernel
to AVX512VL; more will follow later, but this patch aims to get the infrastructure
in place for this "later".

Full performance tuning has not been done yet. With more registers and wider SIMD
it is in theory possible to retune the kernels, but even without that, this change
alone gives an interesting performance increase (in the 30-40% range).
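To illustrate what the wider registers buy, here is a minimal sketch of the multiply-accumulate step that GEMM micro-kernels are built from, written with AVX-512 intrinsics. It is not the OpenBLAS kernel (which is a transformed Haswell assembly kernel), and the function name `saxpy_sketch` is hypothetical. The vector path is compiled only when `__AVX512F__` is defined (for example with `-mavx512f` or `-march=skylake-avx512`); each `__m512` holds 16 floats, against 8 in an AVX2 `__m256`.

```c
#include <stddef.h>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

/* y[i] += alpha * x[i]: the multiply-accumulate at the heart of a GEMM
   micro-kernel. Hypothetical helper, not an OpenBLAS function. */
void saxpy_sketch(size_t n, float alpha, const float *x, float *y) {
  size_t i = 0;
#if defined(__AVX512F__)
  __m512 va = _mm512_set1_ps(alpha);        /* broadcast alpha to 16 lanes */
  for (; i + 16 <= n; i += 16) {
    __m512 vx = _mm512_loadu_ps(x + i);     /* unaligned 16-float loads */
    __m512 vy = _mm512_loadu_ps(y + i);
    vy = _mm512_fmadd_ps(va, vx, vy);       /* vy = va * vx + vy, fused */
    _mm512_storeu_ps(y + i, vy);
  }
#endif
  for (; i < n; i++)                        /* remainder, or whole loop without AVX-512 */
    y[i] += alpha * x[i];
}
```

Because the vector path uses a fused multiply-add, results for inexact inputs can differ in the last bit from the scalar fallback.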
2018-06-03 07:58:52 +00:00
Matthew Brett
a8002e283a Revert "take out unused variables"
This reverts commit e5752ff9b3.

The variables i and n are used in the `#if !__GLIBC_PREREQ(2, 7)`
branch.

Closes gh-1586.
2018-06-01 23:20:00 +01:00
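The variables only look unused when building against newer glibc, presumably because that path can rely on `CPU_COUNT_S` (the `_S` macros, including `CPU_COUNT_S`, arrived in glibc 2.7). A minimal sketch of the kind of counting loop an old-glibc branch needs, assuming a fixed-size `cpu_set_t` and `CPU_ISSET`; `count_cpus_manually` is a hypothetical name, not the OpenBLAS code:

```c
#define _GNU_SOURCE
#include <sched.h>

/* Count the CPUs set in a fixed-size cpu_set_t without CPU_COUNT_S.
   The loop needs an index i and an upper bound n, which is why such
   variables appear only in the old-glibc branch. */
int count_cpus_manually(const cpu_set_t *set, int n) {
  int count = 0;
  for (int i = 0; i < n && i < CPU_SETSIZE; i++)
    if (CPU_ISSET(i, set))
      count++;
  return count;
}
```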
Martin Kroeker
191746c493 Merge pull request #1557 from martin-frbg/getconfig
Add threading and OpenMP information to output
2018-05-14 17:37:55 +02:00
Martin Kroeker
41ae8e8d67 Add threading and OpenMP information to output
For #1416 and #1529, more information about the options OpenBLAS was built with is needed. Additionally, we may want to add this data to the openblas.pc file (but not all projects use pkg-config, and as far as I am aware the CMake module for accessing it does not make such "private" declarations available).
2018-05-12 12:11:38 +02:00
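A sketch of how threading information can be folded into a build-description string at compile time. `USE_OPENMP`, `SMP` and `MAX_CPU_NUMBER` are assumed build macros, the function names are hypothetical, and the real output format may differ:

```c
#include <stddef.h>
#include <stdio.h>

/* Assumed build macro; defaulted here so the sketch compiles standalone. */
#ifndef MAX_CPU_NUMBER
#define MAX_CPU_NUMBER 1
#endif

/* Report which threading model this build was compiled with. */
const char *threading_model_sketch(void) {
#if defined(USE_OPENMP)
  return "OpenMP";
#elif defined(SMP)
  return "native threads";
#else
  return "single-threaded";
#endif
}

/* Write e.g. "threading=OpenMP max_threads=64" into buf;
   returns the snprintf result (length that would have been written). */
int describe_build_sketch(char *buf, size_t len) {
  return snprintf(buf, len, "threading=%s max_threads=%d",
                  threading_model_sketch(), MAX_CPU_NUMBER);
}
```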
zhiyong.dang
53457f222f move _Atomic define to common.h 2018-05-11 00:13:16 -07:00
Zhiyong Dang
3716267124 Change _STDC_VERSION__ to __STDC_VERSION__
Change-Id: Id3fa4e8d9eedd4ef7230df69b611e7f397301a42
2018-05-11 12:15:08 +08:00
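The one-character typo matters because, inside `#if`, the preprocessor replaces any identifier it does not know with 0, so a misspelled `_STDC_VERSION__` silently disables the C11 branch instead of failing to compile. A sketch of a guard for `_Atomic`, assuming C11 `<stdatomic.h>`; the `SKETCH_*` macros, `sketch_flag_t` and `sketch_uses_c11_atomics` are hypothetical, not the definitions in common.h:

```c
/* Correct spelling: selects _Atomic only on C11 compilers with atomics. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L && \
    !defined(__STDC_NO_ATOMICS__)
#include <stdatomic.h>
#define SKETCH_HAVE_C11_ATOMICS 1
typedef _Atomic int sketch_flag_t;
#else
#define SKETCH_HAVE_C11_ATOMICS 0
typedef volatile int sketch_flag_t;  /* pre-C11 fallback, not truly atomic */
#endif

/* Misspelled name: normally undefined, so it evaluates to 0 and this
   #if is always false, whatever the language standard. */
#if _STDC_VERSION__ >= 201112L
#define SKETCH_MISSPELLED_BRANCH 1
#else
#define SKETCH_MISSPELLED_BRANCH 0
#endif

int sketch_uses_c11_atomics(void) { return SKETCH_HAVE_C11_ATOMICS; }
```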
Zhiyong Dang
1b83341d19 Fix race condition in blas_server_omp.c
Change-Id: Ic896276cd073d6b41930c7c5a29d66348cd1725d
2018-04-27 17:00:42 +08:00
Martin Kroeker
f29389c7ac Merge pull request #1520 from martin-frbg/cpucounts
Catch invalid cpu count returned by CPU_COUNT_S
2018-04-14 22:24:34 +02:00
Martin Kroeker
7c861605b2 Catch invalid cpu count returned by CPU_COUNT_S
mips32 was seen to return zero here, driving nthreads to zero with a subsequent FPE in blas_quickdivide
2018-04-14 18:29:10 +02:00
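A minimal sketch of the defensive check: clamp a reported CPU count to at least 1 before it is used as a divisor. `sanitize_cpu_count` and `rows_per_thread` are hypothetical helpers, and plain integer division stands in for `blas_quickdivide`:

```c
/* Treat zero or negative CPU counts (e.g. a bogus CPU_COUNT_S result)
   as a single CPU, so nthreads can never reach zero. */
int sanitize_cpu_count(int reported) {
  return reported > 0 ? reported : 1;
}

/* Rows of work per thread, rounded up. Without the clamp, nthreads == 0
   would make this an integer division by zero. */
int rows_per_thread(int m, int nthreads) {
  nthreads = sanitize_cpu_count(nthreads);
  return (m + nthreads - 1) / nthreads;
}
```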
Martin Kroeker
d636b418af Merge pull request #1504 from ararslan/aa/openbsd
Allow building on OpenBSD
2018-04-04 15:26:46 +02:00
Alex Arslan
a41d241a0e Add support for DragonFly BSD 2018-04-03 16:39:29 -07:00
Alex Arslan
8da6b6ae52 Allow building on OpenBSD
With this change, OpenBLAS builds and all tests pass on OpenBSD 6.2
using Clang. Tested on x86-64 only, with and without DYNAMIC_ARCH=1.
2018-04-02 10:48:22 -07:00
Martin Kroeker
01c4b82f04 Update memory.c 2018-03-31 22:32:06 +02:00
Martin Kroeker
93db123f7e Update memory.c 2018-03-29 13:13:49 +02:00
Martin Kroeker
752fdb5dd8 Add workaround for old gcc and clang versions
Old gcc and clang versions do not handle constructor arguments; this finally fixes #875 as discussed there, using the Fedora patch
2018-03-29 11:56:56 +02:00
Martin Kroeker
7646974227 Limit the additional locking from PRs 1052 and 1299 to non-OpenMP multithreading 2018-02-21 11:45:33 +01:00
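A sketch of how extra locking can be compiled in only for non-OpenMP threaded builds. `SMP` and `USE_OPENMP` are assumed here to be the build macros for threading and OpenMP, `SKETCH_LOCK`/`SKETCH_UNLOCK` are hypothetical names, and a pthread mutex stands in for whatever lock the actual code uses:

```c
#include <pthread.h>

#if defined(SMP) && !defined(USE_OPENMP)
/* Threaded, non-OpenMP build: take a real lock. */
static pthread_mutex_t sketch_lock = PTHREAD_MUTEX_INITIALIZER;
#define SKETCH_LOCK()   pthread_mutex_lock(&sketch_lock)
#define SKETCH_UNLOCK() pthread_mutex_unlock(&sketch_lock)
#else
/* OpenMP or single-threaded build: no extra lock, avoiding its overhead. */
#define SKETCH_LOCK()   ((void)0)
#define SKETCH_UNLOCK() ((void)0)
#endif

static int shared_counter;

/* Update shared state under whichever lock this build selected. */
void bump_counter(void) {
  SKETCH_LOCK();
  shared_counter++;
  SKETCH_UNLOCK();
}
```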
Martin Kroeker
8866e393a2 Revert "Add locks only for non-OPENMP multithreading" 2018-02-20 17:17:12 +01:00
Martin Kroeker
3119b2ab4c Add locks only for non-OPENMP multithreading
to mitigate performance problems caused by #1052 and #1299 as seen in #1461
2018-02-20 12:17:18 +01:00
Erik M. Bray
8f5f614615 On Cygwin use mmap instead of Windows native allocation functions, which are not fork-safe. 2018-02-06 12:23:27 +01:00
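A minimal sketch of allocating a buffer with an anonymous private mapping instead of a native Windows allocator. Per the commit, the Windows allocation functions are not fork-safe, while `mmap` goes through Cygwin's POSIX layer. `alloc_buffer_sketch` and `free_buffer_sketch` are hypothetical names:

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Allocate size bytes of zeroed, private, anonymous memory.
   Returns NULL on failure instead of MAP_FAILED. */
void *alloc_buffer_sketch(size_t size) {
  void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  return p == MAP_FAILED ? NULL : p;
}

/* Release a buffer obtained from alloc_buffer_sketch; 0 on success. */
int free_buffer_sketch(void *p, size_t size) {
  return munmap(p, size);
}
```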
Erik M. Bray
f5fc109fbd Perform blas_thread_shutdown with pthread_atfork() on Cygwin
Even if we're directly using the win32 threading driver and not pthreads,
pthread_atfork still works fine to register a pre-fork handler, and is
necessary to restore the threading server to a pre-initialized state.
2018-02-06 12:23:27 +01:00
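A minimal sketch of the mechanism, assuming a hypothetical `server_running` flag in place of OpenBLAS's thread-server state: the prepare handler registered with `pthread_atfork` runs in the parent immediately before `fork()`, so the child starts from an uninitialized server. A real shutdown would also stop and join the worker threads.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the thread server's "initialized" state. */
static int server_running = 0;

/* Prepare handler: runs in the parent just before fork(). */
static void shutdown_server_sketch(void) {
  server_running = 0;
}

/* Register the handler once, e.g. during library initialization. */
int install_fork_handler_sketch(void) {
  return pthread_atfork(shutdown_server_sketch, NULL, NULL);
}

/* Fork a child and return the server state it inherited (0 = reset),
   or -1 on error. */
int child_sees_server_state(void) {
  pid_t pid = fork();
  if (pid < 0)
    return -1;
  if (pid == 0)
    _exit(server_running);  /* child: report state through the exit code */
  int status;
  if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
    return -1;
  return WEXITSTATUS(status);
}
```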
Andrew
e5752ff9b3 take out unused variables 2018-01-20 11:42:31 +01:00
Andrew
8a0b086b28 add missing bracket for old glibc (cppcheck) 2018-01-12 22:35:48 +01:00
Andrew
8aafa0473c address last warnings as seen by gcc7 2018-01-01 20:57:12 +01:00
Andrew
bfc2a88594 remove unused buffer 2017-12-22 00:55:40 +01:00
Martin Kroeker
28ae3ca76f Limit MAX_CPU to 1024 for now
Some Linux distributions (notably SuSE) have raised CPU_SETSIZE to 4096, apparently disregarding API limitations.
From #1348, the highest value to survive array initialization (on a desktop system) is 3232, and 1024, which is the
more usual CPU_SETSIZE limit, was demonstrated to work fine on an actual bignuma system.
2017-12-05 12:54:15 +01:00
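A sketch of capping the limit at compile time, so that a larger `CPU_SETSIZE` from the C library cannot inflate per-CPU arrays. `SKETCH_MAX_CPU`, `cpu_mapping_sketch` and `max_cpu_sketch` are hypothetical names:

```c
#define _GNU_SOURCE
#include <sched.h>

/* Use the C library's CPU_SETSIZE, but never more than 1024. */
#if CPU_SETSIZE > 1024
#define SKETCH_MAX_CPU 1024
#else
#define SKETCH_MAX_CPU CPU_SETSIZE
#endif

/* Hypothetical per-CPU table sized by the capped limit. */
static int cpu_mapping_sketch[SKETCH_MAX_CPU];

int max_cpu_sketch(void) {
  return (int)(sizeof cpu_mapping_sketch / sizeof cpu_mapping_sketch[0]);
}
```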
Martin Kroeker
07e7c36dac Handle shmem init failures in cpu affinity setup code
Failures to obtain or attach shared memory segments would lead to an exit without explanation of the exact cause.
This change introduces a more verbose error message and tries to make the code continue without setting cpu affinity.
Fixes #1351
2017-11-18 23:57:44 +01:00
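A minimal sketch of checking both System V calls and degrading gracefully, with hypothetical names: `shmget` returns -1 and `shmat` returns `(void *) -1` on failure, and in either case the caller gets NULL plus a message naming the failed call, rather than an unexplained exit.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Attach a private shared-memory segment, or print the reason and return
   NULL so the caller can continue without setting CPU affinity. */
void *attach_segment_sketch(size_t size) {
  int id = shmget(IPC_PRIVATE, size, IPC_CREAT | 0600);
  if (id == -1) {
    fprintf(stderr, "shmget failed (%s); continuing without CPU affinity\n",
            strerror(errno));
    return NULL;
  }
  void *p = shmat(id, NULL, 0);
  if (p == (void *)-1) {
    fprintf(stderr, "shmat failed (%s); continuing without CPU affinity\n",
            strerror(errno));
    shmctl(id, IPC_RMID, NULL);  /* do not leak the unattached segment */
    return NULL;
  }
  shmctl(id, IPC_RMID, NULL);    /* on Linux: destroyed after the last detach */
  return p;
}
```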
Martin Kroeker
2a6fef9a55 Try to handle shmget or shmat failing
also replaces one verbatim sched_yield with the YIELDING macro for consistency as suggested in #1351
2017-11-09 23:16:13 +01:00
Martin Kroeker
514d237257 Merge pull request #1279 from xsacha/develop
CMake improvements
2017-10-06 21:13:45 +02:00
Martin Kroeker
ba1f91f17b Convert another caller of "allocation" to LOCK_COMMAND
... as the "allocation" code jumped to now does UNLOCK_COMMAND instead of blas_unlock
2017-09-09 20:30:33 +02:00
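The point of this commit is that lock and unlock must use the same primitive on every path into shared cleanup code. A minimal sketch of that shape, with a pthread mutex standing in for `LOCK_COMMAND`/`UNLOCK_COMMAND` and hypothetical names throughout: the label releases the lock, so every path that reaches it must have taken that same lock first.

```c
#include <pthread.h>
#include <stddef.h>

#define SKETCH_SLOTS 4

static pthread_mutex_t slot_lock = PTHREAD_MUTEX_INITIALIZER;
static void *slot_addr[SKETCH_SLOTS];

/* Record addr in the first free slot; return its index, or -1 if full. */
int claim_slot_sketch(void *addr) {
  int pos;
  pthread_mutex_lock(&slot_lock);
  for (pos = 0; pos < SKETCH_SLOTS; pos++) {
    if (slot_addr[pos] == NULL) {
      slot_addr[pos] = addr;
      goto done;                       /* jump to the shared exit, lock held */
    }
  }
  pos = -1;                            /* table full */
done:
  pthread_mutex_unlock(&slot_lock);    /* single unlock shared by every path */
  return pos;
}
```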
Martin Kroeker
f460776f0f Fix thread data races 2017-09-09 19:07:06 +02:00
Martin Kroeker
e882f3d6f3 Fix thread data race in memory.c 2017-09-09 18:58:38 +02:00
Sacha Refshauge
37858d1146 Fix threading usage in CMake: s/SMP/USE_THREAD/ 2017-08-19 15:07:42 +10:00
Isuru Fernando
2f12ea017b No strncasecmp with MSVC 2017-08-08 00:07:25 +05:30
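One common portability shim maps `strncasecmp` to MSVC's `_strnicmp`. The commit title does not say whether this is the approach taken, so treat the shim (and the hypothetical `names_match_sketch` helper) as an assumption:

```c
#include <stddef.h>
#ifdef _MSC_VER
#include <string.h>
#define strncasecmp _strnicmp  /* MSVC spelling of the same comparison */
#else
#include <strings.h>           /* POSIX declares strncasecmp here */
#endif

/* Case-insensitive comparison of the first n characters of two names. */
int names_match_sketch(const char *a, const char *b, size_t n) {
  return strncasecmp(a, b, n) == 0;
}
```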
Martin Kroeker
ebb04e3265 Merge pull request #1259 from isuruf/cmake
CMake Improvements
2017-08-02 15:31:05 +02:00
Isuru Fernando
d245caa49a Support out-of-source build 2017-08-01 15:16:14 +05:30
Martin Kroeker
63cfa32691 Rework __GLIBC_PREREQ checks to avoid breaking non-glibc builds 2017-07-31 21:02:43 +02:00
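On C libraries other than glibc, `__GLIBC_PREREQ` is undefined, so `#if __GLIBC_PREREQ(2, 7)` becomes `#if 0(2, 7)`, which is a preprocessing error. A sketch of a guard that only evaluates the macro when it exists; `SKETCH_HAVE_CPU_COUNT_S` and `have_cpu_count_s_sketch` are hypothetical names:

```c
#include <stdio.h>  /* any libc header; on glibc this defines __GLIBC__ */

#if defined(__GLIBC__) && defined(__GLIBC_PREREQ)
#  if __GLIBC_PREREQ(2, 7)   /* only evaluated when the macro exists */
#    define SKETCH_HAVE_CPU_COUNT_S 1
#  endif
#endif
#ifndef SKETCH_HAVE_CPU_COUNT_S
#  define SKETCH_HAVE_CPU_COUNT_S 0  /* musl, BSD libc, old glibc, ... */
#endif

int have_cpu_count_s_sketch(void) { return SKETCH_HAVE_CPU_COUNT_S; }
```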
Martin Kroeker
c4af196a2d Honor cgroup/cpuset limits when enumerating cpus 2017-07-25 22:47:34 +02:00
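A minimal sketch of the idea, assuming Linux: `sysconf(_SC_NPROCESSORS_ONLN)` counts every online CPU, while the affinity mask from `sched_getaffinity` reflects cpuset restrictions such as those of a container or `taskset`. CPU-time quotas (as opposed to cpusets) are not visible in the mask. The function name is hypothetical:

```c
#define _GNU_SOURCE
#include <sched.h>
#include <unistd.h>

/* Number of CPUs this process may actually run on (at least 1). */
int usable_cpus_sketch(void) {
  long online = sysconf(_SC_NPROCESSORS_ONLN);
  if (online < 1)
    online = 1;
  cpu_set_t set;
  CPU_ZERO(&set);
  if (sched_getaffinity(0, sizeof set, &set) == 0) {
    int allowed = CPU_COUNT(&set);
    if (allowed > 0 && allowed < online)
      return allowed;             /* cpuset is narrower than the machine */
  }
  return (int)online;
}
```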
Martin Kroeker
480e697681 Revert "Honor cgroup/cpuset limits when enumerating cpus" (#1246) 2017-07-24 16:17:50 +02:00
Martin Kroeker
80373ea039 More fixes for silly misedits 2017-07-15 12:48:42 +02:00
Martin Kroeker
d12b75a6c4 Fixup braces lost in previous edit 2017-07-15 11:53:28 +02:00
Martin Kroeker
7294fb1d9d Merge branch 'develop' into cgroups 2017-07-15 10:40:42 +02:00
Martin Kroeker
731c518cff Add files via upload 2017-07-11 18:42:39 +02:00
Martin Kroeker
29fc429d9a Honor cgroup/cpuset constraints when enumerating cpus 2017-07-11 18:27:33 +02:00
Martin Kroeker
3db2adf872 Merge pull request #1230 from martin-frbg/rhel5
Add sched_getcpu implementation for pre-2.6 glibc
2017-07-09 13:16:16 +02:00
Martin Kroeker
c1cf62d2c0 Add sched_getcpu implementation for pre-2.6 glibc
Fixes #1210, compilation on RHEL5 with affinity enabled
2017-07-09 09:45:38 +02:00
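`sched_getcpu()` first appeared in glibc 2.6. One way to provide it on older systems is the raw `getcpu` system call (Linux 2.6.19 and later); whether the actual fix works this way is not stated here, so this is a sketch with a hypothetical name:

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Return the CPU the calling thread is running on, or -1 if unknown. */
int sched_getcpu_sketch(void) {
#ifdef SYS_getcpu
  unsigned cpu = 0;
  /* getcpu(&cpu, &node, cache): node and cache are optional. */
  if (syscall(SYS_getcpu, &cpu, NULL, NULL) == 0)
    return (int)cpu;
#endif
  return -1;
}
```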
Neil Shipp
34513be726 Add Microsoft Windows 10 UWP build support 2017-06-23 13:07:34 -07:00
Neil Shipp
65e56cb29d Add 64bit support for Microsoft Visual Studio 2017-06-21 13:38:22 -07:00
James Cowgill
59c97cfee4 memory: Fix buffer overflow when position == NUM_BUFFERS 2017-05-05 17:47:03 +01:00
James Cowgill
5fecfe0f42 memory: switch loop condition around in blas_memory_free
Before this commit, the "position < NUM_BUFFERS" loop condition from
blas_memory_free will be completely optimized away by GCC. This is
because the condition can only be false after undefined behavior has
already been invoked (reading past the end of an array). As a
consequence of this bug, GCC also removes the subsequent if statement
and all the code after the error label because all of it is dead.

This commit switches the loop condition around so it works as intended.
2017-05-05 16:01:58 +01:00
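A minimal sketch of the corrected search, with hypothetical names and a tiny table. Testing the bound first means `&&` short-circuits before the array is indexed at the table size, and the explicit check after the loop covers the position == NUM_BUFFERS overflow fixed in 59c97cfee4:

```c
#include <stddef.h>

#define SKETCH_NUM_BUFFERS 4

struct buffer_slot { void *addr; int used; };
static struct buffer_slot memory_table[SKETCH_NUM_BUFFERS];

/* Find the slot that owns addr; -1 if none.
   Buggy order:  while (memory_table[pos].addr != addr && pos < N) pos++;
   reads memory_table[N] before the bounds test. That is undefined
   behaviour, so the compiler may assume pos < N always holds and drop the
   test along with the "not found" handling. */
int find_buffer_slot(void *addr) {
  int pos = 0;
  while (pos < SKETCH_NUM_BUFFERS && memory_table[pos].addr != addr)
    pos++;
  if (pos == SKETCH_NUM_BUFFERS)
    return -1;  /* never touch memory_table[SKETCH_NUM_BUFFERS] */
  return pos;
}
```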
Gian-Carlo Pascutto
9c884986ad Add an extra family/model combination used by AMD Steamroller (Godavari). 2017-04-19 19:15:47 +02:00
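A minimal sketch of how family and model are derived from the EAX value of CPUID leaf 1, which is what such family/model tables are matched against. For base family 0xF the extended family is added, so Steamroller-era parts report family 0x15 (15h), and the extended model supplies the high nibble of the model. `decode_cpuid_eax` and `struct cpu_sig` are hypothetical names:

```c
/* Decoded CPU signature. */
struct cpu_sig { unsigned family, model, stepping; };

/* Decode CPUID leaf 1 EAX. The extended family counts only when the base
   family is 0xF; the extended model is applied for base family 0xF (AMD
   and Intel) and 0x6 (Intel). */
struct cpu_sig decode_cpuid_eax(unsigned eax) {
  struct cpu_sig s;
  unsigned base_family = (eax >> 8) & 0xF;
  unsigned base_model  = (eax >> 4) & 0xF;
  unsigned ext_family  = (eax >> 20) & 0xFF;
  unsigned ext_model   = (eax >> 16) & 0xF;
  s.stepping = eax & 0xF;
  s.family = (base_family == 0xF) ? base_family + ext_family : base_family;
  s.model = (base_family == 0xF || base_family == 0x6)
                ? (ext_model << 4) | base_model
                : base_model;
  return s;
}
```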