326 Commits

Author SHA1 Message Date
Martin Kroeker
8be027e4c6 Update dynamic.c 2018-06-04 14:36:39 +02:00
Martin Kroeker
ac7b6e3e9a Fix misplaced endif 2018-06-04 08:23:40 +02:00
Martin Kroeker
ef626c6824 typo fix 2018-06-04 00:13:19 +02:00
Martin Kroeker
5a51cf4576 Separate Skylake X from Skylake 2018-06-03 23:41:33 +02:00
Arjan van de Ven
99c7bba8e4 Initial support for SkylakeX / AVX512
This patch adds the basic infrastructure for adding the SkylakeX (Intel Skylake server)
target. The SkylakeX target will use the AVX512 (AVX512VL level) instruction set,
which brings 2 basic things:
1) 512 bit wide SIMD (2x width of AVX2)
2) 32 SIMD registers (2x the number on AVX2)

This initial patch only contains a trivial transformation of the Haswell SGEMM kernel
to AVX512VL; more will follow later but this patch aims to get the infrastructure
in place for this "later".

Full performance tuning has not been done yet; with more registers and wider SIMD
it's in theory possible to retune the kernels but even without that there's an
interesting enough performance increase (30-40% range) with just this change.
2018-06-03 07:58:52 +00:00
Matthew Brett
a8002e283a Revert "take out unused variables"
This reverts commit e5752ff9b3.

The variables i and n are used in the `#if !__GLIBC_PREREQ(2, 7)`
branch.

Closes gh-1586.
2018-06-01 23:20:00 +01:00
Martin Kroeker
a91f1587b9 Work around name clash with Windows10's winnt.h
fixes #1503
2018-05-31 13:26:00 +02:00
Martin Kroeker
191746c493 Merge pull request #1557 from martin-frbg/getconfig
Add threading and OpenMP information to output
2018-05-14 17:37:55 +02:00
Martin Kroeker
41ae8e8d67 Add threading and OpenMP information to output
For #1416 and #1529, more information about the options OpenBLAS was built with is needed. Additionally we may want to add this data to the openblas.pc file (but not all projects use pkgconfig, and as far as I am aware the cmake module for accessing it does not make such "private" declarations available)
2018-05-12 12:11:38 +02:00
zhiyong.dang
53457f222f move _Atomic define to common.h 2018-05-11 00:13:16 -07:00
Zhiyong Dang
3716267124 Change _STDC_VERSION__ to __STDC_VERSION__
Change-Id: Id3fa4e8d9eedd4ef7230df69b611e7f397301a42
2018-05-11 12:15:08 +08:00
Zhang Xianyi
50acc40613 Merge pull request #1536 from WestAlgo/develop
Fix race condition in blas_server_omp.c
2018-05-11 10:09:14 +08:00
Martin Kroeker
802cf6b22d Merge pull request #1486 from martin-frbg/atomic
Use _Atomic instead of volatile for thread safety where C11 is supported
2018-04-27 23:09:57 +02:00
Zhiyong Dang
1b83341d19 Fix race condition in blas_server_omp.c
Change-Id: Ic896276cd073d6b41930c7c5a29d66348cd1725d
2018-04-27 17:00:42 +08:00
Martin Kroeker
f29389c7ac Merge pull request #1520 from martin-frbg/cpucounts
Catch invalid cpu count returned by CPU_COUNT_S
2018-04-14 22:24:34 +02:00
Martin Kroeker
7c861605b2 Catch invalid cpu count returned by CPU_COUNT_S
mips32 was seen to return zero here, driving nthreads to zero with subsequent fpe in blas_quickdivide
2018-04-14 18:29:10 +02:00
Martin Kroeker
20c6c38e51 Merge branch 'develop' into atomic 2018-04-07 12:09:39 +02:00
Martin Kroeker
d636b418af Merge pull request #1504 from ararslan/aa/openbsd
Allow building on OpenBSD
2018-04-04 15:26:46 +02:00
Alex Arslan
a41d241a0e Add support for DragonFly BSD 2018-04-03 16:39:29 -07:00
Alex Arslan
8da6b6ae52 Allow building on OpenBSD
With this change, OpenBLAS builds and all tests pass on OpenBSD 6.2
using Clang. Tested on x86-64 only, with and without DYNAMIC_ARCH=1.
2018-04-02 10:48:22 -07:00
Martin Kroeker
01c4b82f04 Update memory.c 2018-03-31 22:32:06 +02:00
Martin Kroeker
93db123f7e Update memory.c 2018-03-29 13:13:49 +02:00
Martin Kroeker
752fdb5dd8 Add workaround for old gcc and clang versions
Old gcc and clang versions do not handle constructor arguments. This finally fixes #875 as discussed there, using the Fedora patch.
2018-03-29 11:56:56 +02:00
Martin Kroeker
6a99fcce94 Use _Atomic instead of volatile for thread safety where C11 is supported
Suggested by dodomorandi in #660
2018-03-10 00:03:49 +01:00
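A minimal sketch of the idea in this commit: select `_Atomic` when the compiler advertises C11 atomics and fall back to `volatile` otherwise. The macro name `SKETCH_ATOMIC`, the counter, and the function are hypothetical and are not the identifiers used in OpenBLAS.

```c
/* Choose a qualifier for shared state: _Atomic on C11 compilers that
   provide atomics, volatile otherwise. SKETCH_ATOMIC is a made-up name. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L && \
    !defined(__STDC_NO_ATOMICS__)
#define SKETCH_ATOMIC _Atomic
#else
#define SKETCH_ATOMIC volatile
#endif

/* Hypothetical shared counter updated by worker threads. */
static SKETCH_ATOMIC int jobs_done = 0;

static int mark_job_done(void)
{
    /* With _Atomic this pre-increment is one atomic read-modify-write.
       With the volatile fallback it is NOT atomic and still needs a lock. */
    return ++jobs_done;
}
```

Note that the check relies on the correctly spelled `__STDC_VERSION__` macro; a misspelled name is simply undefined and would silently select the fallback branch.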
Martin Kroeker
7646974227 Limit the additional locking from PRs 1052,1299 to non-OpenMP multithreading 2018-02-21 11:45:33 +01:00
Martin Kroeker
8866e393a2 Revert "Add locks only for non-OPENMP multithreading" 2018-02-20 17:17:12 +01:00
Martin Kroeker
3119b2ab4c Add locks only for non-OPENMP multithreading
to mitigate performance problems caused by #1052 and #1299 as seen in #1461
2018-02-20 12:17:18 +01:00
Erik M. Bray
8f5f614615 On Cygwin use mmap instead of Windows native allocation functions, which are not fork-safe. 2018-02-06 12:23:27 +01:00
Erik M. Bray
f5fc109fbd Perform blas_thread_shutdown with pthread_atfork() on Cygwin
Even if we're directly using the win32 threading driver and not pthreads,
pthread_atfork still works fine to register a pre-fork handler, and is
necessary to restore the threading server to a pre-initialized state.
2018-02-06 12:23:27 +01:00
Martin Kroeker
e388459a27 Merge pull request #1419 from brada4/develop
Initialize uninitialized values for repeated calls
2018-01-31 23:48:34 +01:00
Andrew
e5752ff9b3 take out unused variables 2018-01-20 11:42:31 +01:00
Andrew
8a0b086b28 add missing bracket for old glibc (cppcheck) 2018-01-12 22:35:48 +01:00
Martin Kroeker
42285d8e70 Merge pull request #1410 from brada4/develop
Address warnings #1357
2018-01-06 20:02:46 +01:00
Andrew
8aafa0473c address last warnings as seen by gcc7 2018-01-01 20:57:12 +01:00
Andrew
11a627c54e remove surplus parentheses to silence clang5 2018-01-01 20:56:26 +01:00
Martin Kroeker
cc9500db41 Merge pull request #1403 from brada4/develop
Address few more warnings
2017-12-30 14:51:34 +01:00
Andrew
bfc2a88594 remove unused buffer 2017-12-22 00:55:40 +01:00
Martin Kroeker
177b78c8b4 Issue1388 (#1389)
* Calculation of chunk range limits was ignoring num_cpu

bug introduced by me in #1262 - should fix #1388

* Calculation of range limits was ignoring num_cpu

bug introduced by me in #1262
2017-12-09 22:29:03 +01:00
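For readers unfamiliar with this kind of bug, here is a generic sketch of splitting `n` items into per-thread chunks whose limits depend on the thread count. It is not the OpenBLAS code; `chunk_start`, `chunk_end`, and the ceiling-division scheme are assumptions chosen for illustration, and `num_cpu` is assumed to be positive.

```c
/* Hypothetical helper: first index of chunk idx when n items are split
   across num_cpu threads (num_cpu > 0 assumed). Chunks are half-open
   ranges of ceil(n / num_cpu) items, clamped so they never exceed n. */
static long chunk_start(long n, int num_cpu, int idx)
{
    long width = (n + num_cpu - 1) / num_cpu;  /* ceiling division */
    long start = (long)idx * width;
    return start < n ? start : n;
}

/* Hypothetical helper: one past the last index of chunk idx, which is
   the start of the next chunk. */
static long chunk_end(long n, int num_cpu, int idx)
{
    return chunk_start(n, num_cpu, idx + 1);
}
```

Because both limits are derived from `num_cpu`, the chunks cover `[0, n)` exactly once, and surplus threads receive empty ranges when `n` is smaller than the thread count.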
Andrew
281a2b952f warning cleanup (#1380)
* dead increments in driver/level2

* dead increments in kernel/generic

* part dead increments in kernel/x86_64
2017-12-05 19:54:10 +01:00
Martin Kroeker
c49c6b237d Merge pull request #1382 from martin-frbg/dtrmv-1332
Work around errors in multithreaded dtrmv
2017-12-05 19:53:23 +01:00
Martin Kroeker
28ae3ca76f Limit MAX_CPU to 1024 for now
Some Linux distributions (notably SuSE) have raised CPU_SETSIZE to 4096, apparently disregarding API limitations.
From #1348, the highest value to survive array initialization (on a desktop system) is 3232, and 1024, which is the
more usual CPU_SETSIZE limit, was demonstrated to work fine on an actual bignuma system.
2017-12-05 12:54:15 +01:00
Martin Kroeker
b414283f48 Disable gemv unrolling
as a (hopefully temporary) workaround for #1332
2017-12-03 22:41:54 +01:00
Andrew
ef95cd471f eliminate unread variable, after reiteration 3 of them (clang4) 2017-11-25 02:54:37 +01:00
Andrew
e14d50d86e eliminate Wunused-const gcc7 warning 2017-11-24 19:13:24 +01:00
Martin Kroeker
07e7c36dac Handle shmem init failures in cpu affinity setup code
Failures to obtain or attach shared memory segments would lead to an exit without explanation of the exact cause.
This change introduces a more verbose error message and tries to make the code continue without setting cpu affinity.
Fixes #1351
2017-11-18 23:57:44 +01:00
Martin Kroeker
2a6fef9a55 Try to handle shmget or shmat failing
also replaces one verbatim sched_yield with the YIELDING macro for consistency as suggested in #1351
2017-11-09 23:16:13 +01:00
Martin Kroeker
db72ad8f6a Merge pull request #1320 from timmoon10/develop
2D thread distribution for multi-threaded GEMMs
2017-10-08 23:31:33 +02:00
Martin Kroeker
514d237257 Merge pull request #1279 from xsacha/develop
CMake improvements
2017-10-06 21:13:45 +02:00
Tim Moon
30486a356c Reduce number of data partitions in n. 2017-10-04 12:37:49 -07:00
Tim Moon
9de52b489a Cleaning up and documenting multi-threaded GEMM code. 2017-10-03 16:32:08 -07:00