Martin Kroeker
c49c6b237d
Merge pull request #1382 from martin-frbg/dtrmv-1332
...
Work around errors in multithreaded dtrmv
2017-12-05 19:53:23 +01:00
Martin Kroeker
28ae3ca76f
Limit MAX_CPU to 1024 for now
...
Some Linux distributions (notably SuSE) have raised CPU_SETSIZE to 4096, apparently disregarding API limitations.
From #1348, the highest value to survive array initialization (on a desktop system) is 3232, and 1024, which is the
more usual CPU_SETSIZE limit, was demonstrated to work fine on an actual bignuma system.
2017-12-05 12:54:15 +01:00
Martin Kroeker
b414283f48
Disable gemv unrolling
...
as a (hopefully temporary) workaround for #1332
2017-12-03 22:41:54 +01:00
Andrew
ef95cd471f
eliminate unread variable, after reiteration 3 of them (clang4)
2017-11-25 02:54:37 +01:00
Andrew
e14d50d86e
eliminate Wunused-const gcc7 warning
2017-11-24 19:13:24 +01:00
Martin Kroeker
07e7c36dac
Handle shmem init failures in cpu affinity setup code
...
Failures to obtain or attach shared memory segments would lead to an exit without explanation of the exact cause.
This change introduces a more verbose error message and tries to make the code continue without setting cpu affinity.
Fixes #1351
2017-11-18 23:57:44 +01:00
Martin Kroeker
2a6fef9a55
Try to handle shmget or shmat failing
...
also replaces one verbatim sched_yield with the YIELDING macro for consistency as suggested in #1351
2017-11-09 23:16:13 +01:00
Martin Kroeker
db72ad8f6a
Merge pull request #1320 from timmoon10/develop
...
2D thread distribution for multi-threaded GEMMs
2017-10-08 23:31:33 +02:00
Martin Kroeker
514d237257
Merge pull request #1279 from xsacha/develop
...
CMake improvements
2017-10-06 21:13:45 +02:00
Tim Moon
30486a356c
Reduce number of data partitions in n.
2017-10-04 12:37:49 -07:00
Tim Moon
9de52b489a
Cleaning up and documenting multi-threaded GEMM code.
2017-10-03 16:32:08 -07:00
Tim Moon
860dcfc703
Use 2D thread distribution for small GEMMs.
...
Allows maximum use of available cores if one of M and N is small and the other is large.
2017-10-03 13:43:39 -07:00
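To illustrate the idea behind a 2D thread distribution, here is a minimal sketch. It is not the code from this commit: `pick_thread_grid` and its cost heuristic are hypothetical, chosen only to show how a thread count can be factored into a grid that follows the matrix shape.

```c
/* Hypothetical helper, not OpenBLAS code: pick a tm x tn thread grid with
   tm * tn == nthreads for an m x n output matrix. Each thread would own a
   tile of roughly (m / tm) x (n / tn) elements. */
static void pick_thread_grid(long m, long n, int nthreads, int *tm, int *tn)
{
    long best_cost = -1;

    *tm = 1;
    *tn = nthreads;
    for (int d = 1; d <= nthreads; d++) {
        if (nthreads % d != 0)
            continue;                 /* only exact factor pairs */
        int e = nthreads / d;
        /* m/d + n/e scaled by d*e (a constant): prefers square-ish tiles */
        long cost = m * e + n * d;
        if (best_cost < 0 || cost < best_cost) {
            best_cost = cost;
            *tm = d;
            *tn = e;
        }
    }
}
```

With this heuristic, a matrix that is short in M and long in N gets all of its threads along N, while a square matrix gets a square grid.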
Tim Moon
6aaa107865
Reducing threads for multi-threaded GEMMs on small matrices.
2017-09-27 19:25:33 -07:00
Martin Kroeker
ba1f91f17b
Convert another caller of "allocation" to LOCK_COMMAND
...
... as the "allocation" code jumped to now does UNLOCK_COMMAND instead of blas_unlock
2017-09-09 20:30:33 +02:00
Martin Kroeker
f460776f0f
Fix thread data races
2017-09-09 19:07:06 +02:00
Martin Kroeker
e882f3d6f3
Fix thread data race in memory.c
2017-09-09 18:58:38 +02:00
Sacha Refshauge
37858d1146
Fix threading usage in CMake: s/SMP/USE_THREAD/
2017-08-19 15:07:42 +10:00
Isuru Fernando
2f12ea017b
No strncasecmp with MSVC
2017-08-08 00:07:25 +05:30
Martin Kroeker
719fcc56b0
Merge pull request #1262 from martin-frbg/xmv_thread-splitting
...
Make sure that range limit of last thread never exceeds data size
2017-08-06 14:11:44 +02:00
Martin Kroeker
ebb04e3265
Merge pull request #1259 from isuruf/cmake
...
CMake Improvements
2017-08-02 15:31:05 +02:00
Martin Kroeker
0ba64cee60
Update trmv_thread.c
2017-08-02 12:03:54 +02:00
Martin Kroeker
c4e5ba1bfe
Make sure that range_n of last thread never exceeds the actual data size when splitting the workload
2017-08-02 00:37:58 +02:00
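The kind of clamp this subject line describes can be sketched as follows. This is an illustration under assumptions, not the actual trmv_thread.c code: `split_range` and its parameters are hypothetical names.

```c
/* Hypothetical helper, not OpenBLAS code: fill range[0..nthreads] with
   partition boundaries over n items. Thread i would process the
   half-open interval [range[i], range[i + 1]). */
static void split_range(long n, int nthreads, long *range)
{
    long width = (n + nthreads - 1) / nthreads;  /* ceiling division */

    range[0] = 0;
    for (int i = 0; i < nthreads; i++) {
        long end = range[i] + width;
        if (end > n)
            end = n;   /* clamp so no boundary exceeds the data size */
        range[i + 1] = end;
    }
}
```

Without the clamp, rounding the chunk width up can push the boundaries of the trailing threads past `n`; with it, those threads get a shorter (possibly empty) range.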
Martin Kroeker
a6f533b248
Revert "Fix calculated range limit exceeding actual data size for last thread"
2017-08-01 19:28:08 +02:00
Isuru Fernando
d245caa49a
Support out-of-source build
2017-08-01 15:16:14 +05:30
Martin Kroeker
e70a6b92bf
Merge pull request #1257 from martin-frbg/cgroups-prereq
...
Rework __GLIBC_PREREQ checks to avoid breaking non-glibc builds
2017-08-01 11:23:03 +02:00
Martin Kroeker
63cfa32691
Rework __GLIBC_PREREQ checks to avoid breaking non-glibc builds
2017-07-31 21:02:43 +02:00
Martin Kroeker
585c0010a5
Fix range limit exceeding actual data size in last step
2017-07-28 00:27:02 +02:00
Martin Kroeker
857f61bc5d
Fix range limit exceeding data size in last step
2017-07-28 00:21:53 +02:00
Martin Kroeker
9332042d5f
Fix range exceeding actual data size in quick_divide
2017-07-28 00:13:24 +02:00
Martin Kroeker
c4af196a2d
Honor cgroup/cpuset limits when enumerating cpus
2017-07-25 22:47:34 +02:00
Martin Kroeker
480e697681
Revert "Honor cgroup/cpuset limits when enumerating cpus" (#1246)
2017-07-24 16:17:50 +02:00
Martin Kroeker
80373ea039
More fixes for silly misedits
2017-07-15 12:48:42 +02:00
Martin Kroeker
d12b75a6c4
Fixup braces lost in previous edit
2017-07-15 11:53:28 +02:00
Martin Kroeker
7294fb1d9d
Merge branch 'develop' into cgroups
2017-07-15 10:40:42 +02:00
Zhang Xianyi
2a7c6930ac
Merge pull request #1234 from brada4/develop
...
Fix write past fixed size buffer
2017-07-13 20:27:37 +08:00
Andrew
529bfc36ec
Fix write past fixed size buffer
2017-07-12 00:59:30 +02:00
Martin Kroeker
731c518cff
Add files via upload
2017-07-11 18:42:39 +02:00
Martin Kroeker
29fc429d9a
Honor cgroup/cpuset constraints when enumerating cpus
2017-07-11 18:27:33 +02:00
Martin Kroeker
3db2adf872
Merge pull request #1230 from martin-frbg/rhel5
...
Add sched_getcpu implementation for pre-2.6 glibc
2017-07-09 13:16:16 +02:00
Martin Kroeker
c1cf62d2c0
Add sched_getcpu implementation for pre-2.6 glibc
...
Fixes #1210, compilation on RHEL5 with affinity enabled
2017-07-09 09:45:38 +02:00
Zhang Xianyi
bfe1656b8b
Merge pull request #1225 from martin-frbg/stolen_from_wernsaar_fork
...
fixed syrk_thread.c taken from wernsaar
2017-07-07 15:43:33 +08:00
Martin Kroeker
49e62c0e77
fixed syrk_thread.c taken from wernsaar
...
Stride calculation fix copied from https://github.com/wernsaar/OpenBLAS/commit/88900e1
2017-07-06 17:30:12 +02:00
Neil Shipp
34513be726
Add Microsoft Windows 10 UWP build support
2017-06-23 13:07:34 -07:00
Neil Shipp
65e56cb29d
Add 64bit support for Microsoft Visual Studio
2017-06-21 13:38:22 -07:00
James Cowgill
59c97cfee4
memory: Fix buffer overflow when position == NUM_BUFFERS
2017-05-05 17:47:03 +01:00
James Cowgill
5fecfe0f42
memory: switch loop condition around in blas_memory_free
...
Before this commit, the "position < NUM_BUFFERS" loop condition from
blas_memory_free will be completely optimized away by GCC. This is
because the condition can only be false after undefined behavior has
already been invoked (reading past the end of an array). As a
consequence of this bug, GCC also removes the subsequent if statement
and all the code after the error label because all of it is dead.
This commit switches the loop condition around so it works as intended.
2017-05-05 16:01:58 +01:00
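The ordering issue described in this commit message can be sketched in isolation. The following is a simplified illustration, not the actual blas_memory_free code: the `slot` struct, the `table` array, `find_slot`, and the small NUM_BUFFERS value are assumptions made for the example.

```c
#include <stddef.h>

#define NUM_BUFFERS 4   /* illustrative value only */

struct slot { void *addr; int used; };
static struct slot table[NUM_BUFFERS];

/* Returns the index of addr in table, or -1 if it is not present.
   The bounds check comes first, so table[position] is never read with
   position == NUM_BUFFERS. Written the other way around
   (table[position].addr != addr && position < NUM_BUFFERS), the read
   past the end happens before the check, and a compiler may assume the
   check is always true and drop it. */
static int find_slot(const void *addr)
{
    int position = 0;

    while (position < NUM_BUFFERS && table[position].addr != addr)
        position++;

    return position < NUM_BUFFERS ? position : -1;
}
```

Because `&&` short-circuits, the array element is only evaluated when the index is known to be in range, so the "not found" path after the loop stays reachable.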
Gian-Carlo Pascutto
9c884986ad
Add an extra family/model combination used by AMD Steamroller (Godavari).
2017-04-19 19:15:47 +02:00
Gian-Carlo Pascutto
0cbd2d34e4
Recognize ZEN when passed as OPENBLAS_CORETYPE.
2017-04-10 20:05:16 +02:00
Gian-Carlo Pascutto
62979fd104
Fix dynamic detection for ZEN CPUs.
2017-04-10 19:08:37 +02:00
Denis Steckelmacher
c9ff735da6
Add ZEN support (tested for auto-detected static backend)
2017-03-19 15:32:50 +01:00