Martin Kroeker
d0ec4325cf
Add cpuid for AMD Ryzen 2
2018-07-03 21:03:24 +02:00
Martin Kroeker
a49203b48c
Double MAX_ALLOCATING_THREADS to fix segfaults with Go and Octave
...
for #1641
2018-07-03 17:35:54 +02:00
Martin Kroeker
9d15a3bd16
Fix typo that broke compilation with DYNAMIC_ARCH and NO_AVX2
...
fixes #1659
2018-07-02 14:40:41 +02:00
Martin Kroeker
3d3c19717c
Merge pull request #1655 from martin-frbg/issue1641
...
Fix apparent off-by-one error in calculation of MAX_ALLOCATING_THREADS
2018-07-01 08:41:22 +02:00
Martin Kroeker
4e9c34018e
Fix apparent off-by-one error in calculation of MAX_ALLOCATING_THREADS
...
fixes #1641
2018-06-30 23:57:50 +02:00
Martin Kroeker
750162a05f
Try gradual fallback for cores not in the dynamic core list
2018-06-25 21:02:31 +02:00
Martin Kroeker
e6d93f20f1
Merge pull request #2 from martin-frbg/develop
...
merge develop
2018-06-25 20:48:10 +02:00
Martin Kroeker
1833a67071
Add support for a user-defined list of dynamic targets
2018-06-23 19:42:15 +02:00
Craig Donner
28c28ed275
Fix data races reported by TSAN.
2018-06-21 16:41:02 +01:00
oon3m0oo
a399d00425
Further improvements to memory.c. ( #1625 )
...
- Compiler TLS is now used only when the compiler supports it
- If compiler TLS is unsupported, we use platform-specific TLS
- Only one variable (an index) is now in TLS
- We only access TLS once per alloc, and never when freeing
- Allocation / release info is now stored within the allocation itself, by
over-allocating; this saves having external structures do the bookkeeping, and
reduces some of the redundant data that was being stored (such as addresses)
- We never hit the alloc lock when not using SMP or when using OpenMP (that was
my fault)
- Now that there are fewer tracking structures I think this is a bit easier to
read than before
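The over-allocation idea in the list above can be sketched as follows. This is a minimal illustration, not the code from memory.c: the names `alloc_header`, `tracked_alloc`, `tracked_header` and `tracked_free`, and the fields stored in the header, are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bookkeeping record; the real memory.c tracks different fields. */
typedef union {
    struct {
        size_t size;   /* bytes requested by the caller */
        int    slot;   /* illustrative index, e.g. into a per-thread table */
    } info;
    max_align_t align; /* keeps the user pointer suitably aligned */
} alloc_header;

/* Allocate `size` usable bytes with the header stored just in front of them. */
static void *tracked_alloc(size_t size, int slot)
{
    alloc_header *h = malloc(sizeof(alloc_header) + size);
    if (h == NULL) return NULL;
    h->info.size = size;
    h->info.slot = slot;
    return h + 1; /* user memory starts right after the header */
}

/* Recover the header from a pointer returned by tracked_alloc(). */
static alloc_header *tracked_header(void *p)
{
    assert(p != NULL);
    return (alloc_header *)p - 1;
}

/* Release the block; no external lookup table is consulted. */
static void tracked_free(void *p)
{
    if (p != NULL) free(tracked_header(p));
}
```

Because the record travels with the allocation, freeing needs neither a search through a tracking array nor a thread-local lookup, which matches the "never when freeing" point above.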
2018-06-20 22:04:03 +02:00
Martin Kroeker
2d8cc7193a
Support upcoming Intel Cannon Lake CPUs as Skylake X ( #1621 )
...
* Support upcoming Cannon Lake as Skylake X
2018-06-17 23:38:14 +02:00
Martin Kroeker
47bf0dba8f
Add build-time option for OMP scheduler; document MULTITHREAD_THRESHOLD range ( #1620 )
...
* Allow choosing the OpenMP scheduler and add range hint for GEMM_MULTITHREAD_THRESHOLD
* Amended description of GEMM_MULTITHREAD_THRESHOLD
to reflect #742 making it track floating point operations rather than matrix size
2018-06-15 11:25:05 +02:00
Craig Donner
bf40f806ef
Remove the need for most locking in memory.c.
...
Using thread local storage for tracking memory allocations means that threads
no longer have to lock at all when doing memory allocations / frees. This
particularly helps the gemm driver since it does an allocation per invocation.
Even without threading at all, this helps, since even calling a lock with
no contention has a cost:
Before this change, no threading:
```
----------------------------------------------------
Benchmark Time CPU Iterations
----------------------------------------------------
BM_SGEMM/4 102 ns 102 ns 13504412
BM_SGEMM/6 175 ns 175 ns 7997580
BM_SGEMM/8 205 ns 205 ns 6842073
BM_SGEMM/10 266 ns 266 ns 5294919
BM_SGEMM/16 478 ns 478 ns 2963441
BM_SGEMM/20 690 ns 690 ns 2144755
BM_SGEMM/32 1906 ns 1906 ns 716981
BM_SGEMM/40 2983 ns 2983 ns 473218
BM_SGEMM/64 9421 ns 9422 ns 148450
BM_SGEMM/72 12630 ns 12631 ns 112105
BM_SGEMM/80 15845 ns 15846 ns 89118
BM_SGEMM/90 25675 ns 25676 ns 54332
BM_SGEMM/100 29864 ns 29865 ns 47120
BM_SGEMM/112 37841 ns 37842 ns 36717
BM_SGEMM/128 56531 ns 56532 ns 25361
BM_SGEMM/140 75886 ns 75888 ns 18143
BM_SGEMM/150 98493 ns 98496 ns 14299
BM_SGEMM/160 102620 ns 102622 ns 13381
BM_SGEMM/170 135169 ns 135173 ns 10231
BM_SGEMM/180 146170 ns 146172 ns 9535
BM_SGEMM/189 190226 ns 190231 ns 7397
BM_SGEMM/200 194513 ns 194519 ns 7210
BM_SGEMM/256 396561 ns 396573 ns 3531
```
with this change:
```
----------------------------------------------------
Benchmark Time CPU Iterations
----------------------------------------------------
BM_SGEMM/4 95 ns 95 ns 14500387
BM_SGEMM/6 166 ns 166 ns 8381763
BM_SGEMM/8 196 ns 196 ns 7277044
BM_SGEMM/10 256 ns 256 ns 5515721
BM_SGEMM/16 463 ns 463 ns 3025197
BM_SGEMM/20 636 ns 636 ns 2070213
BM_SGEMM/32 1885 ns 1885 ns 739444
BM_SGEMM/40 2969 ns 2969 ns 472152
BM_SGEMM/64 9371 ns 9372 ns 148932
BM_SGEMM/72 12431 ns 12431 ns 112919
BM_SGEMM/80 15615 ns 15616 ns 89978
BM_SGEMM/90 25397 ns 25398 ns 55041
BM_SGEMM/100 29445 ns 29446 ns 47540
BM_SGEMM/112 37530 ns 37531 ns 37286
BM_SGEMM/128 55373 ns 55375 ns 25277
BM_SGEMM/140 76241 ns 76241 ns 18259
BM_SGEMM/150 102196 ns 102200 ns 13736
BM_SGEMM/160 101521 ns 101525 ns 13556
BM_SGEMM/170 136182 ns 136184 ns 10567
BM_SGEMM/180 146861 ns 146864 ns 9035
BM_SGEMM/189 192632 ns 192632 ns 7231
BM_SGEMM/200 198547 ns 198555 ns 6995
BM_SGEMM/256 392316 ns 392330 ns 3539
```
Before, when built with USE_THREAD=1 and GEMM_MULTITHREAD_THRESHOLD=4, the cost
of small matrix operations was overshadowed by thread locking (see sizes smaller
than 32) even when not explicitly spawning threads:
```
----------------------------------------------------
Benchmark Time CPU Iterations
----------------------------------------------------
BM_SGEMM/4 328 ns 328 ns 4170562
BM_SGEMM/6 396 ns 396 ns 3536400
BM_SGEMM/8 418 ns 418 ns 3330102
BM_SGEMM/10 491 ns 491 ns 2863047
BM_SGEMM/16 710 ns 710 ns 2028314
BM_SGEMM/20 871 ns 871 ns 1581546
BM_SGEMM/32 2132 ns 2132 ns 657089
BM_SGEMM/40 3197 ns 3196 ns 437969
BM_SGEMM/64 9645 ns 9645 ns 144987
BM_SGEMM/72 35064 ns 32881 ns 50264
BM_SGEMM/80 37661 ns 35787 ns 42080
BM_SGEMM/90 36507 ns 36077 ns 40091
BM_SGEMM/100 32513 ns 31850 ns 48607
BM_SGEMM/112 41742 ns 41207 ns 37273
BM_SGEMM/128 67211 ns 65095 ns 21933
BM_SGEMM/140 68263 ns 67943 ns 19245
BM_SGEMM/150 121854 ns 115439 ns 10660
BM_SGEMM/160 116826 ns 115539 ns 10000
BM_SGEMM/170 126566 ns 122798 ns 11960
BM_SGEMM/180 130088 ns 127292 ns 11503
BM_SGEMM/189 120309 ns 116634 ns 13162
BM_SGEMM/200 114559 ns 110993 ns 10000
BM_SGEMM/256 217063 ns 207806 ns 6417
```
and after, it's gone (note this includes my other change which reduces calls
to num_cpu_avail):
```
----------------------------------------------------
Benchmark Time CPU Iterations
----------------------------------------------------
BM_SGEMM/4 95 ns 95 ns 12347650
BM_SGEMM/6 166 ns 166 ns 8259683
BM_SGEMM/8 193 ns 193 ns 7162210
BM_SGEMM/10 258 ns 258 ns 5415657
BM_SGEMM/16 471 ns 471 ns 2981009
BM_SGEMM/20 666 ns 666 ns 2148002
BM_SGEMM/32 1903 ns 1903 ns 738245
BM_SGEMM/40 2969 ns 2969 ns 473239
BM_SGEMM/64 9440 ns 9440 ns 148442
BM_SGEMM/72 37239 ns 33330 ns 46813
BM_SGEMM/80 57350 ns 55949 ns 32251
BM_SGEMM/90 36275 ns 36249 ns 42259
BM_SGEMM/100 31111 ns 31008 ns 45270
BM_SGEMM/112 43782 ns 40912 ns 34749
BM_SGEMM/128 67375 ns 64406 ns 22443
BM_SGEMM/140 76389 ns 67003 ns 21430
BM_SGEMM/150 72952 ns 71830 ns 19793
BM_SGEMM/160 97039 ns 96858 ns 11498
BM_SGEMM/170 123272 ns 122007 ns 11855
BM_SGEMM/180 126828 ns 126505 ns 11567
BM_SGEMM/189 115179 ns 114665 ns 11044
BM_SGEMM/200 89289 ns 87259 ns 16147
BM_SGEMM/256 226252 ns 222677 ns 7375
```
I've also tested this with ThreadSanitizer and found no data races during
execution. I'm not sure why 200 is always faster than its neighbors; we must
be hitting some optimal cache size or something.
2018-06-14 16:54:58 +01:00
Martin Kroeker
63f7395fb4
Move some DYNAMIC_ARCH targets to new DYNAMIC_OLDER option
2018-06-09 16:31:38 +02:00
Martin Kroeker
38ad05bd04
Extend loop range to find SkylakeX in force_coretype
2018-06-05 10:26:49 +02:00
Martin Kroeker
8be027e4c6
Update dynamic.c
2018-06-04 14:36:39 +02:00
Martin Kroeker
ac7b6e3e9a
Fix misplaced endif
2018-06-04 08:23:40 +02:00
Martin Kroeker
ef626c6824
typo fix
2018-06-04 00:13:19 +02:00
Martin Kroeker
5a51cf4576
Separate Skylake X from Skylake
2018-06-03 23:41:33 +02:00
Arjan van de Ven
99c7bba8e4
Initial support for SkylakeX / AVX512
...
This patch adds the basic infrastructure for adding the SkylakeX (Intel Skylake server)
target. The SkylakeX target will use the AVX512 (AVX512VL level) instruction set,
which brings 2 basic things:
1) 512 bit wide SIMD (2x width of AVX2)
2) 32 SIMD registers (2x the number on AVX2)
This initial patch only contains a trivial transformation of the Haswell SGEMM kernel
to AVX512VL; more will follow later but this patch aims to get the infrastructure
in place for this "later".
Full performance tuning has not been done yet; with more registers and wider SIMD
it's in theory possible to retune the kernels but even without that there's an
interesting enough performance increase (30-40% range) with just this change.
2018-06-03 07:58:52 +00:00
Matthew Brett
a8002e283a
Revert "take out unused variables"
...
This reverts commit e5752ff9b3.
The variables i and n are used in the `#if !__GLIBC_PREREQ(2, 7)`
branch.
Closes gh-1586.
2018-06-01 23:20:00 +01:00
Martin Kroeker
191746c493
Merge pull request #1557 from martin-frbg/getconfig
...
Add threading and OpenMP information to output
2018-05-14 17:37:55 +02:00
Martin Kroeker
41ae8e8d67
Add threading and OpenMP information to output
...
For #1416 and #1529 , more information about the options OpenBLAS was built with is needed. Additionally we may want to add this data to the openblas.pc file (but not all projects use pkgconfig, and as far as I am aware the cmake module for accessing it does not make such "private" declarations available)
2018-05-12 12:11:38 +02:00
zhiyong.dang
53457f222f
move _Atomic define to common.h
2018-05-11 00:13:16 -07:00
Zhiyong Dang
3716267124
Change _STDC_VERSION__ to __STDC_VERSION__
...
Change-Id: Id3fa4e8d9eedd4ef7230df69b611e7f397301a42
2018-05-11 12:15:08 +08:00
Zhiyong Dang
1b83341d19
Fix race condition in blas_server_omp.c
...
Change-Id: Ic896276cd073d6b41930c7c5a29d66348cd1725d
2018-04-27 17:00:42 +08:00
Martin Kroeker
f29389c7ac
Merge pull request #1520 from martin-frbg/cpucounts
...
Catch invalid cpu count returned by CPU_COUNT_S
2018-04-14 22:24:34 +02:00
Martin Kroeker
7c861605b2
Catch invalid cpu count returned by CPU_COUNT_S
...
mips32 was seen to return zero here, driving nthreads to zero with subsequent fpe in blas_quickdivide
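A guard of this kind might look like the sketch below. The helper names `sanitize_cpu_count` and `items_per_thread` are hypothetical, and the plain division merely stands in for the library's own division routine.

```c
#include <assert.h>

/* Hypothetical helper: treat a non-positive CPU count as "one CPU". */
static int sanitize_cpu_count(int reported)
{
    return reported < 1 ? 1 : reported;
}

/* Split `work` items across threads; safe because the divisor is never zero. */
static int items_per_thread(int work, int reported_cpus)
{
    int nthreads = sanitize_cpu_count(reported_cpus);
    assert(nthreads > 0);
    return (work + nthreads - 1) / nthreads; /* ceiling division */
}
```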
2018-04-14 18:29:10 +02:00
Martin Kroeker
d636b418af
Merge pull request #1504 from ararslan/aa/openbsd
...
Allow building on OpenBSD
2018-04-04 15:26:46 +02:00
Alex Arslan
a41d241a0e
Add support for DragonFly BSD
2018-04-03 16:39:29 -07:00
Alex Arslan
8da6b6ae52
Allow building on OpenBSD
...
With this change, OpenBLAS builds and all tests pass on OpenBSD 6.2
using Clang. Tested on x86-64 only, with and without DYNAMIC_ARCH=1.
2018-04-02 10:48:22 -07:00
Martin Kroeker
01c4b82f04
Update memory.c
2018-03-31 22:32:06 +02:00
Martin Kroeker
93db123f7e
Update memory.c
2018-03-29 13:13:49 +02:00
Martin Kroeker
752fdb5dd8
Add workaround for old gcc and clang versions
...
Old gcc and clang do not handle constructor arguments, finally fix #875 as discussed there, using the fedora patch
2018-03-29 11:56:56 +02:00
Martin Kroeker
7646974227
Limit the additional locking from PRs 1052,1299 to non-OpenMP multithreading
2018-02-21 11:45:33 +01:00
Martin Kroeker
8866e393a2
Revert "Add locks only for non-OPENMP multithreading"
2018-02-20 17:17:12 +01:00
Martin Kroeker
3119b2ab4c
Add locks only for non-OPENMP multithreading
...
to mitigate performance problems caused by #1052 and #1299 as seen in #1461
2018-02-20 12:17:18 +01:00
Erik M. Bray
8f5f614615
On Cygwin use mmap instead of Windows native allocation functions, which are not fork-safe.
2018-02-06 12:23:27 +01:00
Erik M. Bray
f5fc109fbd
Perform blas_thread_shutdown with pthread_atfork() on Cygwin
...
Even if we're directly using the win32 threading driver and not pthreads,
pthread_atfork still works fine to register a pre-fork handler, and is
necessary to restore the threading server to a pre-initialized state.
2018-02-06 12:23:27 +01:00
Andrew
e5752ff9b3
take out unused variables
2018-01-20 11:42:31 +01:00
Andrew
8a0b086b28
add missing bracket for old glibc (cppcheck)
2018-01-12 22:35:48 +01:00
Andrew
8aafa0473c
address last warnings as seen by gcc7
2018-01-01 20:57:12 +01:00
Andrew
bfc2a88594
remove unused buffer
2017-12-22 00:55:40 +01:00
Martin Kroeker
28ae3ca76f
Limit MAX_CPU to 1024 for now
...
Some Linux distributions (notably SuSE) have raised CPU_SETSIZE to 4096, apparently disregarding API limitations.
From #1348 , the highest value to survive array initialization (on a desktop system) is 3232, and 1024 - which is the
more usual CPU_SETSIZE limit, was demonstrated to work fine on an actual bignuma system.
2017-12-05 12:54:15 +01:00
Martin Kroeker
07e7c36dac
Handle shmem init failures in cpu affinity setup code
...
Failures to obtain or attach shared memory segments would lead to an exit without explanation of the exact cause.
This change introduces a more verbose error message and tries to make the code continue without setting cpu affinity.
Fixes #1351
2017-11-18 23:57:44 +01:00
Martin Kroeker
2a6fef9a55
Try to handle shmget or shmat failing
...
also replaces one verbatim sched_yield with the YIELDING macro for consistency as suggested in #1351
2017-11-09 23:16:13 +01:00
Martin Kroeker
514d237257
Merge pull request #1279 from xsacha/develop
...
CMake improvements
2017-10-06 21:13:45 +02:00
Martin Kroeker
ba1f91f17b
Convert another caller of "allocation" to LOCK_COMMAND
...
... as the "allocation" code jumped to now does UNLOCK_COMMAND instead of blas_unlock
2017-09-09 20:30:33 +02:00
Martin Kroeker
f460776f0f
Fix thread data races
2017-09-09 19:07:06 +02:00
Martin Kroeker
e882f3d6f3
Fix thread data race in memory.c
2017-09-09 18:58:38 +02:00
Sacha Refshauge
37858d1146
Fix threading usage in CMake: s/SMP/USE_THREAD/
2017-08-19 15:07:42 +10:00
Isuru Fernando
2f12ea017b
No strncasecmp with MSVC
2017-08-08 00:07:25 +05:30
Martin Kroeker
ebb04e3265
Merge pull request #1259 from isuruf/cmake
...
CMake Improvements
2017-08-02 15:31:05 +02:00
Isuru Fernando
d245caa49a
Support out-of-source build
2017-08-01 15:16:14 +05:30
Martin Kroeker
63cfa32691
Rework __GLIBC_PREREQ checks to avoid breaking non-glibc builds
2017-07-31 21:02:43 +02:00
Martin Kroeker
c4af196a2d
Honor cgroup/cpuset limits when enumerating cpus
2017-07-25 22:47:34 +02:00
Martin Kroeker
480e697681
Revert "Honor cgroup/cpuset limits when enumerating cpus" ( #1246 )
2017-07-24 16:17:50 +02:00
Martin Kroeker
80373ea039
More fixes for silly misedits
2017-07-15 12:48:42 +02:00
Martin Kroeker
d12b75a6c4
Fixup braces lost in previous edit
2017-07-15 11:53:28 +02:00
Martin Kroeker
7294fb1d9d
Merge branch 'develop' into cgroups
2017-07-15 10:40:42 +02:00
Martin Kroeker
731c518cff
Add files via upload
2017-07-11 18:42:39 +02:00
Martin Kroeker
29fc429d9a
Honor cgroup/cpuset constraints when enumerating cpus
2017-07-11 18:27:33 +02:00
Martin Kroeker
3db2adf872
Merge pull request #1230 from martin-frbg/rhel5
...
Add sched_getcpu implementation for pre-2.6 glibc
2017-07-09 13:16:16 +02:00
Martin Kroeker
c1cf62d2c0
Add sched_getcpu implementation for pre-2.6 glibc
...
Fixes #1210 , compilation on RHEL5 with affinity enabled
2017-07-09 09:45:38 +02:00
Neil Shipp
34513be726
Add Microsoft Windows 10 UWP build support
2017-06-23 13:07:34 -07:00
Neil Shipp
65e56cb29d
Add 64bit support for Microsoft Visual Studio
2017-06-21 13:38:22 -07:00
James Cowgill
59c97cfee4
memory: Fix buffer overflow when position == NUM_BUFFERS
2017-05-05 17:47:03 +01:00
James Cowgill
5fecfe0f42
memory: switch loop condition around in blas_memory_free
...
Before this commit, the "position < NUM_BUFFERS" loop condition from
blas_memory_free will be completely optimized away by GCC. This is
because the condition can only be false after undefined behavior has
already been invoked (reading past the end of an array). As a
consequence of this bug, GCC also removes the subsequent if statement
and all the code after the error label because all of it is dead.
This commit switches the loop condition around so it works as intended.
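The pattern being described can be illustrated with a small sketch. It is not the code from memory.c: `NUM_SLOTS` and `find_slot` are hypothetical stand-ins. The point is only that the bounds test comes before the array read, so the compiler cannot assume the index is always in range.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_SLOTS 4 /* hypothetical table size, standing in for NUM_BUFFERS */

/* Returns the index of `addr` in `table`, or NUM_SLOTS if it is absent. */
static size_t find_slot(void *const table[NUM_SLOTS], const void *addr)
{
    size_t position = 0;
    assert(table != NULL);
    /* Bounds check first: table[position] is only read when in range. */
    while (position < NUM_SLOTS && table[position] != addr)
        position++;
    return position;
}
```

With the operands in the opposite order, `table[position]` would be read once with `position == NUM_SLOTS` before the bound is tested, which is the out-of-range access the message refers to.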
2017-05-05 16:01:58 +01:00
Gian-Carlo Pascutto
9c884986ad
Add an extra family/model combination used by AMD Steamroller (Godavari).
2017-04-19 19:15:47 +02:00
Gian-Carlo Pascutto
0cbd2d34e4
Recognize ZEN when passed as OPENBLAS_CORETYPE.
2017-04-10 20:05:16 +02:00
Gian-Carlo Pascutto
62979fd104
Fix dynamic detection for ZEN CPUs.
2017-04-10 19:08:37 +02:00
Denis Steckelmacher
c9ff735da6
Add ZEN support (tested for auto-detected static backend)
2017-03-19 15:32:50 +01:00
Martin Kroeker
ffc1d6c468
Merge pull request #1108 from ashwinyes/develop_20170203_thunderx2t99
...
Optimized Implementations for ThunderX2T99
2017-02-28 16:02:19 +01:00
Ashwin Sekhar T K
a86474c6f7
THUNDERX2T99: Performance fix for ZGEMM
2017-02-28 06:05:00 -08:00
Ashwin Sekhar T K
19ba133383
THUNDERX2T99: Add Optimized ZGEMM Implementation
2017-02-28 05:31:41 +00:00
Andrew
5088523786
detect apollo lake for real
2017-02-20 23:54:59 +01:00
Elliot Saba
1d8ab99e09
Add `exfamily == 9` case (Kaby Lake) to dynamic arch detection
2017-02-10 15:23:55 -08:00
Martin Koehler
76c6e33e54
Enable EXCAVATOR kernels for A12-9800
2017-02-07 21:38:28 +01:00
Ashwin Sekhar T K
2757b49767
THUNDERX2T99: Add Optimized CGEMM Implementation
2017-01-30 17:44:26 +05:30
Ashwin Sekhar T K
f279ff4789
THUNDERX2T99: Add Optimized SGEMM Implementation
2017-01-16 21:44:33 +05:30
Zhang Xianyi
0863a0d4b4
Merge pull request #1061 from ashwinyes/develop_aarch64_vulcan_thunderx_patch
...
Add new targets for ARM64
2017-01-16 13:20:10 +08:00
Werner Saar
c1c5a63d3c
prepared parameter.c for UNROLL values that are not a power of two
2017-01-11 09:50:28 +01:00
Ashwin Sekhar T K
4b55fae337
ARM64: Add Cavium THUNDERX2T99 Target
2017-01-11 11:18:40 +05:30
Ashwin Sekhar T K
0b8e876d89
VULCAN: Add optimized DGEMM implementation
2017-01-10 15:01:37 +05:30
Ashwin Sekhar T K
4713e7c47f
ARM64: Add the VULCAN Target
2017-01-10 15:01:17 +05:30
jiahaipeng
1aa1e6cb54
modify blas_l1_thread.c to support multithreading for L1 functions with a return value
2017-01-10 11:47:06 +08:00
Martin Kroeker
51aa157e64
Relocate declaration of alloc_lock outside ifdef block
2017-01-09 01:10:43 +01:00
Martin Kroeker
87c7d10b34
Fix thread data races detected by helgrind 3.12
...
Ref. #995 , may possibly help solve issues seen in 660,883
2017-01-08 23:33:51 +01:00
Martin Kroeker
0ef7841473
Update xerbla.c
2017-01-04 23:16:48 +01:00
Martin Kroeker
104ad066af
Use appropriate int32/int64 format for error number in message string
2016-12-30 00:45:59 +01:00
Alex Arslan
a16ace68f5
Include system headers on FreeBSD
2016-11-16 21:58:20 -08:00
Martin Kroeker
596ead0f8d
Add files via upload
2016-11-06 23:26:39 +01:00
Zhang Xianyi
66c9a9b33d
Merge pull request #981 from howard0su/develop
...
Use NPROCESSOR_CONF instead of NPROCESSOR_ONLN
2016-10-17 11:32:57 +08:00
Martin Kroeker
8a8f3932eb
Update dynamic.c
...
Add Bay Trail "Pentium N3520" atom
2016-10-16 22:40:00 +02:00
Howard Su
ff1da01476
Use NPROCESSOR_CONF instead of NPROCESSOR_ONLN
...
to determine the number of CPUs. On ARM platforms,
the number of online CPUs increases when there is more workload,
while the number of configured CPUs is the maximum number of CPUs.
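The difference between the two counts can be queried as in the sketch below. It assumes a system such as Linux with glibc, where `sysconf` accepts `_SC_NPROCESSORS_CONF` and `_SC_NPROCESSORS_ONLN`; the wrapper names and the fallback to 1 are illustrative and not taken from the patch.

```c
#include <unistd.h>

/* Number of CPUs the system is configured with; falls back to 1 on error. */
static long configured_cpus(void)
{
    long n = sysconf(_SC_NPROCESSORS_CONF);
    return n < 1 ? 1 : n;
}

/* Number of CPUs currently online; may be lower when cores are powered down. */
static long online_cpus(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n < 1 ? 1 : n;
}
```

Sizing a thread pool from the configured count avoids undersizing it when some cores happen to be offline at startup.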
2016-10-13 12:37:50 +00:00
Zhang Xianyi
ef52a9266b
Fixed #979 . Patch for NetBSD.
2016-10-13 10:17:07 +08:00
Martin Kroeker
7de829f713
Update dynamic.c
...
Add Braswell (extended model 4, model 12) N3150 as Nehalem
2016-07-14 12:22:55 +02:00
John Biddiscombe
053044ae4d
Replace CMAKE_SOURCE_DIR/CMAKE_BINARY_DIR with PROJECT_SOURCE_DIR/PROJECT_BINARY_DIR
...
If OpenBLAS is built using add_subdirectory(OpenBlas) as part of another project
then the paths set by CMAKE_XXX_DIR are relative to the parent project
and not the OpenBLAS project.
2016-05-25 09:13:28 +02:00
Ashwin Sekhar T K
0fb380c966
Update NUMA CPU binding
...
When all processes can be
accommodated within the current node,
use cores from the current node only.
2016-04-29 11:58:15 +05:30
Werner Saar
78b05f6476
bugfix for EXCAVATOR and DYNAMIC_ARCH
2016-04-25 10:13:30 +02:00
Werner Saar
2b967590a0
bugfix in dynamic.c
2016-04-25 09:08:38 +02:00
Theoractice
aa744dfa59
Update memory.c
2016-03-22 20:02:37 +08:00
theoractice
61cf8f74d9
Fix access violation on Windows while static linking
2016-03-22 19:14:54 +08:00
Zhang Xianyi
68eb4fa329
Add missing openblas_env makefile.
2016-03-09 14:52:47 -05:00
Zhang Xianyi
05196a8497
Refs #716 . Only call getenv at init function.
2016-03-09 12:50:07 -05:00
Zhang Xianyi
1edf30b790
Change Opteron(SSE3) to Opteron_SSE3 in the dynamic core name.
2016-03-01 20:13:08 +08:00
Zhang Xianyi
6b85dbb6dc
Refs #696 . Turn off stack limit setting on Linux.
...
I cannot reproduce SEGFAULT of lapack-test with default stack size
on ARM Linux.
2016-02-24 14:21:42 -05:00
Martin Kroeker
935356c34f
Update dynamic.c and cpuid_x86.c for Intel Avoton.
...
Second part of "support Intel Avoton via Nehalem kernel"
2016-02-02 13:42:55 -05:00
Zhang Xianyi
f5df444ceb
Merge pull request #762 from jeromerobert/bug760
...
Let openblas_get_num_threads return the number of active threads
2016-01-26 08:45:16 -06:00
Zhang Xianyi
aaa8551c57
Merge pull request #749 from lotheac/illumos_fixes
...
illumos fixes
2016-01-26 08:42:20 -06:00
Jerome Robert
0d87c1ffb6
Let openblas_get_num_threads return the number of active threads
...
... not the number of allocated threads.
Close #760
2016-01-26 13:04:16 +01:00
Lauri Tirkkonen
e737e32fd1
RLIMIT_NPROC doesn't exist on illumos
2016-01-22 18:55:51 +02:00
Lauri Tirkkonen
97cd4b8aee
illumos fixes to memory.c
2016-01-22 18:55:43 +02:00
Werner Saar
0d22551a6b
increase the stack size limit in the constructor
2015-11-20 09:23:01 +01:00
Ralph Campbell
fbc21266e6
Minor C code fixes in driver/
2015-11-09 14:15:49 +05:30
Zhang Xianyi
839395fc25
Detect AMD Trinity and Richland.
2015-10-29 02:53:29 +08:00
j-bo
6040858b22
Fix #673
...
Add missing header declarations when compiling for Android ARM7
2015-10-27 13:55:24 +01:00
Zhang Xianyi
70642fe4ed
Refs #668 . Raise the signal when pthread_create fails.
...
Thanks to James K. Lowden for the patch.
2015-10-26 19:02:51 -05:00
Zhang Xianyi
2feef49fa8
Merge branch 'develop' into cmake
...
Conflicts:
driver/others/memory.c
2015-10-26 14:54:34 -05:00
Zhang Xianyi
1ce054fcb3
Refs #669 . Fixed the build bug with gcc on Mac OS X.
2015-10-22 11:07:35 -05:00
Zhang Xianyi
94b125255f
Merge branch 'develop' into cmake
...
Conflicts:
driver/others/memory.c
2015-10-13 04:46:08 +08:00
Zhang Xianyi
11ac4665c8
Fixed #654 . Make sure the gotoblas_init function is run before all other static initializations.
2015-10-05 14:14:32 -05:00
Zhang Xianyi
cc7cab8a45
Detect other Intel Skylake cores.
...
http://users.atw.hu/instlatx64/
2015-09-09 10:47:17 -05:00
Yichao Yu
61ae47eb99
Ref #632 . Support Intel Skylake by Haswell kernels.
2015-09-09 11:07:33 -04:00
Grazvydas Ignotas
d3e2f0a1af
add missing barriers
...
should fix issue #597
2015-08-16 15:37:02 +02:00
Zhang Xianyi
f874465bb8
Use cmake to build OpenBLAS GENERIC Target on MSVC x86 64-bit.
...
Disable CBLAS and LAPACK.
2015-08-10 14:10:44 -05:00
Zhang Xianyi
dcd5ba4443
Merge branch 'cmake' of https://github.com/hpanderson/OpenBLAS into hpanderson_cmake
2015-07-22 04:06:39 +08:00
Zhang Xianyi
a11555c715
Support Android NDK armeabi-v7a-hard ABI. (-mfloat-abi=hard)
...
e.g.
make HOSTCC=gcc CC=arm-linux-androideabi-gcc NO_LAPACK=1 TARGET=ARMV7
In Android NDK, it uses armeabi-v7a-hard ABI.
TARGET_CFLAGS += -mhard-float -D_NDK_MATH_NO_SOFTFP=1
TARGET_LDFLAGS += -Wl,--no-warn-mismatch -lm_hard
For more information, please check hard-float example at
android_ndk/tests/device/hard-float/jni/.
2015-05-20 21:57:27 -05:00
Zhang Xianyi
51ff17d46e
Add AMD Excavator target.
2015-05-13 16:16:30 -05:00
powderluv
ebb9eba987
Fix build with ALLOC_SHM=0 (Android NDK)
...
Refactor such that you can build with ALLOC_SHM=0. HugeTLB
implicitly depends on ALLOC_SHM=1. This patch allows
building for Android NDK r10d.
2015-05-10 00:10:26 -07:00
Zhang Xianyi
9798481979
Refs #478 , #482 . Fix segfault bug for gemv_t with MAX_ALLOC_STACK flag.
...
For gemv_t, directly use malloc to create the buffer.
2015-04-13 19:45:27 -05:00
Zhang Xianyi
8977b3f235
Refs #529 . Support Intel Broadwell by Haswell kernels.
2015-04-02 11:08:03 -05:00
Zhang Xianyi
e95d64333a
Refs #519 . Avoid calling strncpy.
2015-03-19 15:57:22 -05:00
Ton van den Heuvel
b6438dedea
Fix issue #508
...
Fix race condition during shutdown causing a crash in
gotoblas_set_affinity().
2015-03-18 13:22:43 +01:00
Hank Anderson
5ae8993752
Added intrinsics for MSVC.
2015-02-25 11:52:51 -06:00
Hank Anderson
84d90d6ed8
Fixed some compiler errors/warnings for clang.
2015-02-25 11:52:25 -06:00
Hank Anderson
0d8e227ea7
Changed strategy for setting preprocessor definitions.
...
Instead of generating separate object files for each permutation of
defines for a source file, GenerateNamedObjects now writes an entirely
new source file and inserts the defines as #define c statements.
This solves a problem I ran into with ar.exe where it was refusing to
link objects that had the same filename despite having different paths.
2015-02-24 12:26:33 -06:00
Hank Anderson
4662a0b13a
Changed generate functions to iterate through a list of float types.
...
This will generate obj files for SINGLE/DOUBLE/COMPLEX/DOUBLE COMPLEX.
2015-02-15 17:44:37 -06:00
Hank Anderson
c94fe71278
Removed incoming-stack-boundary for MSVC.
...
Made float type optional for GenerateNamedObjects.
Called GenerateNamedObjects for a couple of driver/others files that
needed NAME/CNAME set.
2015-02-11 10:54:14 -06:00
Hank Anderson
7fa5c4e2fd
Fixed some case issues with ARCH.
...
Added some kernel and driver/others objects.
2015-02-08 15:29:18 -06:00
Zhang Xianyi
cfa9392ffa
Fix openblas_get_num_threads and openblas_get_num_procs bug with single thread.
2015-02-08 01:30:23 -06:00
Hank Anderson
2828f6630c
Added SMP sources to COMMONOBJS.
2015-02-04 14:01:36 -06:00
Erik Schnetter
65a847cd36
Introduce openblas_get_num_threads and openblas_get_num_procs
2015-02-03 12:23:41 -05:00
Hank Anderson
7194424fef
Added missing common objects to the library.
2015-02-02 15:21:29 -06:00
Hank Anderson
5057a4b4df
Added openblas add_library call that uses DBLAS_OBJS objects.
2015-01-30 15:21:21 -06:00
Hank Anderson
3e8ea7a351
Added COMMONOBJS to driver/others CMakeLists.txt.
2015-01-30 14:06:14 -06:00
Hank Anderson
8d9b196e0d
Moved loop over define combos into a function.
...
This function takes a set of sources and a set of preprocessor
definitions. It will iterate over the sources and build an object
file for each combination of preprocessor definitions for each
source file.
2015-01-30 12:14:44 -06:00
Werner Saar
0dc559ed30
bugfix in dynamic.c
2014-12-28 17:15:42 +01:00
Werner Saar
4319769b79
added target processor STEAMROLLER
2014-12-28 20:16:46 +08:00
Zhang Xianyi
2fb02626da
Update organization info.
2014-11-25 15:28:58 +08:00