525 Commits

Author SHA1 Message Date
Ashwin Sekhar T K 21f46a1cf2 ARM64: Use THUNDERX2T99 Neon Kernels for ARMV8
Currently the generic ARMV8 target uses C implementations
for many routines. Replace these with the neon implementations
written for THUNDERX2T99 target which are up to 6x faster for
certain routines.
2018-10-17 10:44:37 -07:00
Andrew 3439158dea address #1782 2nd loop 2018-10-03 21:20:50 +02:00
Martin Kroeker 28aa94bf4b
Include thread numbers in failure message from blas_thread_init
to aid in debugging cases like #1767
2018-09-22 14:00:15 +02:00
Martin Kroeker 1ad1e79062
Catch inadvertent USE_TLS=0 declaration
for #1766
2018-09-19 18:03:43 +02:00
Martin Kroeker b402626509
Do not use the new TLS code for non-threaded builds even if USE_TLS is set
Workaround for #1761 as that exposed a problem in the new code (which was intended to speed up multithreaded code only anyway).
2018-09-16 12:43:36 +02:00
Martin Kroeker b55690a659
typo fix 2018-08-26 11:31:07 +02:00
Martin Kroeker b902a40986
Rewrite glibc version check 2018-08-26 11:18:02 +02:00
Martin Kroeker 5991d1a6cd
Update memory.c 2018-08-25 22:12:40 +02:00
Martin Kroeker b1b743f434
Merge branch 'develop' into interim033 2018-08-25 19:45:19 +02:00
Martin Kroeker fd42ca462d
Combo of default pre-0.3.1 memory.c and band-aided version of PR1739 2018-08-25 19:35:16 +02:00
Zoltán Mizsei 6463bffd59 Haiku supporting patches 2018-08-02 20:49:14 +02:00
Martin Kroeker 8ef7d4fb54
Merge pull request #1706 from oon3m0oo/develop
Fix #1705 where we incorrectly calculate page locations.
2018-08-02 18:53:34 +02:00
Craig Donner 6400868e55 Fix #1705 where we incorrectly calculate page locations.
Since we now use an allocation size that isn't a multiple of PAGESIZE, finding
the pages for run_bench wasn't terminating properly.  Now we detect if we've
found enough pages for the allocation and terminate the loop.
2018-08-02 16:21:19 +01:00
Martin Kroeker 66fcdd5be8
Merge pull request #1695 from martin-frbg/issue1692
Unset memory table entry, not just the local pointer to it on shutdown
2018-07-22 16:34:09 +02:00
Martin Kroeker 43ac839c16
Unset memory table entry, not just the temporary pointer to it on shutdown
to fix crash with multiple instances of OpenBLAS, #1692
2018-07-22 09:19:19 +02:00
Martin Kroeker 7ba5936ecd
Merge pull request #1688 from martin-frbg/issue1673
Temporarily disable special handling of OPENMP thread memory allocation
2018-07-19 19:03:45 +02:00
Martin Kroeker b14f44d2ad
Temporarily disable special handling of OPENMP thread memory allocation
for issue #1673
2018-07-19 08:57:56 +02:00
Martin Kroeker 36aea5ce2d
Merge pull request #1680 from martin-frbg/snprint
Fix wrong redefinitions of snprintf for older MSVC
2018-07-12 14:05:13 +02:00
Martin Kroeker 571e9de2ac
Fix definition of snprintf for MSVC
MS _snprintf_s takes an additional argument for the size of the buffer, so is not a direct replacement (utest/ctest.h from which I copied was wrong)
2018-07-12 11:42:25 +02:00
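The point made in 571e9de2ac can be sketched as follows. This is a minimal illustration, not the OpenBLAS source: it assumes the shim is only wanted for MSVC releases before Visual Studio 2015 (`_MSC_VER < 1900`), and `format_thread_msg` is a hypothetical helper used only to exercise the macro.

```c
#include <stdio.h>

/* Assumed shim: older MSVC has no C99 snprintf. _snprintf_s takes the
   buffer size AND a maximum count, so a plain rename is not enough;
   _TRUNCATE asks it to truncate instead of failing. */
#if defined(_MSC_VER) && _MSC_VER < 1900
#define snprintf(buf, size, ...) \
    _snprintf_s((buf), (size), _TRUNCATE, __VA_ARGS__)
#endif

/* Hypothetical helper: formats a short diagnostic into buf and
   returns what snprintf returned. */
int format_thread_msg(char *buf, size_t size, int nthreads) {
    return snprintf(buf, size, "threads=%d", nthreads);
}
```

Note that the return value on truncation differs between the two functions (`_snprintf_s` returns -1), so callers that rely on the C99 return value would need extra care.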
Martin Kroeker 448ed15115
Merge pull request #1678 from martin-frbg/issue1677
Define snprintf for older versions of MSVC
2018-07-12 09:21:34 +02:00
Martin Kroeker 045fb5ea2c
Define snprintf for older versions of MSVC
for #1677
2018-07-12 07:30:58 +02:00
Martin Kroeker 504310eeb9
Merge pull request #1665 from martin-frbg/cpuid-ryzen2
Add cpuid for AMD Ryzen 2
2018-07-04 08:19:40 +02:00
Martin Kroeker ea1f39518f
Merge pull request #1663 from martin-frbg/issue1641
Double MAX_ALLOCATING_THREADS to fix segfaults with Go and Octave
2018-07-04 08:19:11 +02:00
Martin Kroeker d0ec4325cf
Add cpuid for AMD Ryzen 2 2018-07-03 21:03:24 +02:00
Martin Kroeker a49203b48c
Double MAX_ALLOCATING_THREADS to fix segfaults with Go and Octave
for #1641
2018-07-03 17:35:54 +02:00
Martin Kroeker 9d15a3bd16
Fix typo that broke compilation with DYNAMIC_ARCH and NO_AVX2
fixes #1659
2018-07-02 14:40:41 +02:00
Martin Kroeker 3d3c19717c
Merge pull request #1655 from martin-frbg/issue1641
Fix apparent off-by-one error in calculation of MAX_ALLOCATING_THREADS
2018-07-01 08:41:22 +02:00
Martin Kroeker 4e9c34018e
Fix apparent off-by-one error in calculation of MAX_ALLOCATING_THREADS
fixes #1641
2018-06-30 23:57:50 +02:00
Martin Kroeker 750162a05f
Try gradual fallback for cores not in the dynamic core list 2018-06-25 21:02:31 +02:00
Martin Kroeker e6d93f20f1
Merge pull request #2 from martin-frbg/develop
merge develop
2018-06-25 20:48:10 +02:00
Martin Kroeker 1833a67071
Add support for a user-defined list of dynamic targets 2018-06-23 19:42:15 +02:00
Craig Donner 28c28ed275 Fix data races reported by TSAN. 2018-06-21 16:41:02 +01:00
oon3m0oo a399d00425 Further improvements to memory.c. (#1625)
- Compiler TLS is now used only when the compiler supports it
- If compiler TLS is unsupported, we use platform-specific TLS
- Only one variable (an index) is now in TLS
- We only access TLS once per alloc, and never when freeing
- Allocation / release info is now stored within the allocation itself, by
  over-allocating; this saves having external structures do the bookkeeping, and
  reduces some of the redundant data that was being stored (such as addresses)
- We never hit the alloc lock when not using SMP or when using OpenMP (that was
  my fault)
- Now that there are fewer tracking structures I think this is a bit easier to
  read than before
2018-06-20 22:04:03 +02:00
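The over-allocation idea from a399d00425 (keeping the bookkeeping inside the allocation itself) might look roughly like this. It is a sketch under stated assumptions: the header layout and the names `alloc_header`, `tracked_alloc`, `tracked_size` and `tracked_free` are hypothetical, and plain `malloc` stands in for whatever allocator memory.c really uses.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical header placed in front of every payload. The union with
   max_align_t keeps the payload aligned for any object type. */
typedef union {
    struct { size_t size; int in_use; } info;
    max_align_t align;
} alloc_header;

/* Allocate size bytes plus room for the header; return the payload. */
void *tracked_alloc(size_t size) {
    alloc_header *h = malloc(sizeof *h + size);
    if (h == NULL) return NULL;
    h->info.size = size;
    h->info.in_use = 1;
    return h + 1;                 /* payload starts right after the header */
}

/* Recover the bookkeeping from the payload pointer alone. */
size_t tracked_size(const void *p) {
    return ((const alloc_header *)p - 1)->info.size;
}

void tracked_free(void *p) {
    if (p != NULL) free((alloc_header *)p - 1);
}
```

Because the header travels with the block, freeing needs no external table lookup, which is consistent with the commit's remark that TLS is never touched when freeing.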
Martin Kroeker 2d8cc7193a
Support upcoming Intel Cannon Lake CPUs as Skylake X (#1621)
* Support upcoming Cannon Lake as Skylake X
2018-06-17 23:38:14 +02:00
Martin Kroeker 47bf0dba8f
Add build-time option for OMP scheduler; document MULTITHREAD_THRESHOLD range (#1620)
* Allow choosing the OpenMP scheduler and add range hint for GEMM_MULTITHREAD_THRESHOLD
* Amended description of GEMM_MULTITHREAD_THRESHOLD
to reflect #742 making it track floating point operations rather than matrix size
2018-06-15 11:25:05 +02:00
Craig Donner bf40f806ef Remove the need for most locking in memory.c.
Using thread local storage for tracking memory allocations means that threads
no longer have to lock at all when doing memory allocations / frees. This
particularly helps the gemm driver since it does an allocation per invocation.
Even without threading at all, this helps, since even calling a lock with
no contention has a cost:

Before this change, no threading:
```
----------------------------------------------------
Benchmark             Time           CPU Iterations
----------------------------------------------------
BM_SGEMM/4          102 ns        102 ns   13504412
BM_SGEMM/6          175 ns        175 ns    7997580
BM_SGEMM/8          205 ns        205 ns    6842073
BM_SGEMM/10         266 ns        266 ns    5294919
BM_SGEMM/16         478 ns        478 ns    2963441
BM_SGEMM/20         690 ns        690 ns    2144755
BM_SGEMM/32        1906 ns       1906 ns     716981
BM_SGEMM/40        2983 ns       2983 ns     473218
BM_SGEMM/64        9421 ns       9422 ns     148450
BM_SGEMM/72       12630 ns      12631 ns     112105
BM_SGEMM/80       15845 ns      15846 ns      89118
BM_SGEMM/90       25675 ns      25676 ns      54332
BM_SGEMM/100      29864 ns      29865 ns      47120
BM_SGEMM/112      37841 ns      37842 ns      36717
BM_SGEMM/128      56531 ns      56532 ns      25361
BM_SGEMM/140      75886 ns      75888 ns      18143
BM_SGEMM/150      98493 ns      98496 ns      14299
BM_SGEMM/160     102620 ns     102622 ns      13381
BM_SGEMM/170     135169 ns     135173 ns      10231
BM_SGEMM/180     146170 ns     146172 ns       9535
BM_SGEMM/189     190226 ns     190231 ns       7397
BM_SGEMM/200     194513 ns     194519 ns       7210
BM_SGEMM/256     396561 ns     396573 ns       3531
```
with this change:
```
----------------------------------------------------
Benchmark             Time           CPU Iterations
----------------------------------------------------
BM_SGEMM/4           95 ns         95 ns   14500387
BM_SGEMM/6          166 ns        166 ns    8381763
BM_SGEMM/8          196 ns        196 ns    7277044
BM_SGEMM/10         256 ns        256 ns    5515721
BM_SGEMM/16         463 ns        463 ns    3025197
BM_SGEMM/20         636 ns        636 ns    2070213
BM_SGEMM/32        1885 ns       1885 ns     739444
BM_SGEMM/40        2969 ns       2969 ns     472152
BM_SGEMM/64        9371 ns       9372 ns     148932
BM_SGEMM/72       12431 ns      12431 ns     112919
BM_SGEMM/80       15615 ns      15616 ns      89978
BM_SGEMM/90       25397 ns      25398 ns      55041
BM_SGEMM/100      29445 ns      29446 ns      47540
BM_SGEMM/112      37530 ns      37531 ns      37286
BM_SGEMM/128      55373 ns      55375 ns      25277
BM_SGEMM/140      76241 ns      76241 ns      18259
BM_SGEMM/150     102196 ns     102200 ns      13736
BM_SGEMM/160     101521 ns     101525 ns      13556
BM_SGEMM/170     136182 ns     136184 ns      10567
BM_SGEMM/180     146861 ns     146864 ns       9035
BM_SGEMM/189     192632 ns     192632 ns       7231
BM_SGEMM/200     198547 ns     198555 ns       6995
BM_SGEMM/256     392316 ns     392330 ns       3539
```

Before, when built with USE_THREAD=1, GEMM_MULTITHREAD_THRESHOLD = 4, the cost
of small matrix operations was overshadowed by thread locking (look at sizes
smaller than 32) even when not explicitly spawning threads:
```
----------------------------------------------------
Benchmark             Time           CPU Iterations
----------------------------------------------------
BM_SGEMM/4          328 ns        328 ns    4170562
BM_SGEMM/6          396 ns        396 ns    3536400
BM_SGEMM/8          418 ns        418 ns    3330102
BM_SGEMM/10         491 ns        491 ns    2863047
BM_SGEMM/16         710 ns        710 ns    2028314
BM_SGEMM/20         871 ns        871 ns    1581546
BM_SGEMM/32        2132 ns       2132 ns     657089
BM_SGEMM/40        3197 ns       3196 ns     437969
BM_SGEMM/64        9645 ns       9645 ns     144987
BM_SGEMM/72       35064 ns      32881 ns      50264
BM_SGEMM/80       37661 ns      35787 ns      42080
BM_SGEMM/90       36507 ns      36077 ns      40091
BM_SGEMM/100      32513 ns      31850 ns      48607
BM_SGEMM/112      41742 ns      41207 ns      37273
BM_SGEMM/128      67211 ns      65095 ns      21933
BM_SGEMM/140      68263 ns      67943 ns      19245
BM_SGEMM/150     121854 ns     115439 ns      10660
BM_SGEMM/160     116826 ns     115539 ns      10000
BM_SGEMM/170     126566 ns     122798 ns      11960
BM_SGEMM/180     130088 ns     127292 ns      11503
BM_SGEMM/189     120309 ns     116634 ns      13162
BM_SGEMM/200     114559 ns     110993 ns      10000
BM_SGEMM/256     217063 ns     207806 ns       6417
```
and after, it's gone (note this includes my other change which reduces calls
to num_cpu_avail):
```
----------------------------------------------------
Benchmark             Time           CPU Iterations
----------------------------------------------------
BM_SGEMM/4           95 ns         95 ns   12347650
BM_SGEMM/6          166 ns        166 ns    8259683
BM_SGEMM/8          193 ns        193 ns    7162210
BM_SGEMM/10         258 ns        258 ns    5415657
BM_SGEMM/16         471 ns        471 ns    2981009
BM_SGEMM/20         666 ns        666 ns    2148002
BM_SGEMM/32        1903 ns       1903 ns     738245
BM_SGEMM/40        2969 ns       2969 ns     473239
BM_SGEMM/64        9440 ns       9440 ns     148442
BM_SGEMM/72       37239 ns      33330 ns      46813
BM_SGEMM/80       57350 ns      55949 ns      32251
BM_SGEMM/90       36275 ns      36249 ns      42259
BM_SGEMM/100      31111 ns      31008 ns      45270
BM_SGEMM/112      43782 ns      40912 ns      34749
BM_SGEMM/128      67375 ns      64406 ns      22443
BM_SGEMM/140      76389 ns      67003 ns      21430
BM_SGEMM/150      72952 ns      71830 ns      19793
BM_SGEMM/160      97039 ns      96858 ns      11498
BM_SGEMM/170     123272 ns     122007 ns      11855
BM_SGEMM/180     126828 ns     126505 ns      11567
BM_SGEMM/189     115179 ns     114665 ns      11044
BM_SGEMM/200      89289 ns      87259 ns      16147
BM_SGEMM/256     226252 ns     222677 ns       7375
```

I've also tested this with ThreadSanitizer and found no data races during
execution.  I'm not sure why 200 is always faster than its neighbors; we must
be hitting some optimal cache size or something.
2018-06-14 16:54:58 +01:00
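The core idea of bf40f806ef, per-thread bookkeeping that needs no lock, can be illustrated with C11 thread-local storage. This is only a sketch: `NUM_SLOTS`, `slot_acquire` and `slot_release` are made-up names, and the real memory.c tracks considerably more state.

```c
#define NUM_SLOTS 4   /* illustrative size, not OpenBLAS's NUM_BUFFERS */

/* Each thread gets its own copy of this table, so no other thread can
   observe or modify it and no mutex is required. */
static _Thread_local int slot_used[NUM_SLOTS];

/* Claim the first free slot of the calling thread; -1 if none is free. */
int slot_acquire(void) {
    for (int i = 0; i < NUM_SLOTS; i++) {
        if (!slot_used[i]) {
            slot_used[i] = 1;
            return i;
        }
    }
    return -1;
}

/* Release a slot previously returned by slot_acquire on this thread. */
void slot_release(int i) {
    if (i >= 0 && i < NUM_SLOTS) slot_used[i] = 0;
}
```

The trade-off is that a slot acquired on one thread must also be released on that thread, since the table is invisible everywhere else.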
Martin Kroeker 63f7395fb4
Move some DYNAMIC_ARCH targets to new DYNAMIC_OLDER option 2018-06-09 16:31:38 +02:00
Martin Kroeker 38ad05bd04
Extend loop range to find SkylakeX in force_coretype 2018-06-05 10:26:49 +02:00
Martin Kroeker 8be027e4c6
Update dynamic.c 2018-06-04 14:36:39 +02:00
Martin Kroeker ac7b6e3e9a
Fix misplaced endif 2018-06-04 08:23:40 +02:00
Martin Kroeker ef626c6824
typo fix 2018-06-04 00:13:19 +02:00
Martin Kroeker 5a51cf4576
Separate Skylake X from Skylake 2018-06-03 23:41:33 +02:00
Arjan van de Ven 99c7bba8e4 Initial support for SkylakeX / AVX512
This patch adds the basic infrastructure for adding the SkylakeX (Intel Skylake server)
target. The SkylakeX target will use the AVX512 (AVX512VL level) instruction set,
which brings 2 basic things:
1) 512 bit wide SIMD (2x width of AVX2)
2) 32 SIMD registers (2x the number on AVX2)

This initial patch only contains a trivial transformation of the Haswell SGEMM kernel
to AVX512VL; more will follow later but this patch aims to get the infrastructure
in place for this "later".

Full performance tuning has not been done yet; with more registers and wider SIMD
it's in theory possible to retune the kernels but even without that there's an
interesting enough performance increase (30-40% range) with just this change.
2018-06-03 07:58:52 +00:00
Matthew Brett a8002e283a Revert "take out unused variables"
This reverts commit e5752ff9b3.

The variables i and n are used in the `#if !__GLIBC_PREREQ(2, 7)`
branch.

Closes gh-1586.
2018-06-01 23:20:00 +01:00
Martin Kroeker 191746c493
Merge pull request #1557 from martin-frbg/getconfig
Add threading and OpenMP information to output
2018-05-14 17:37:55 +02:00
Martin Kroeker 41ae8e8d67
Add threading and OpenMP information to output
For #1416 and #1529, more information about the options OpenBLAS was built with is needed. Additionally we may want to add this data to the openblas.pc file (but not all projects use pkgconfig, and as far as I am aware the cmake module for accessing it does not make such "private" declarations available)
2018-05-12 12:11:38 +02:00
zhiyong.dang 53457f222f move _Atomic define to common.h 2018-05-11 00:13:16 -07:00
Zhiyong Dang 3716267124 Change _STDC_VERSION__ to __STDC_VERSION__
Change-Id: Id3fa4e8d9eedd4ef7230df69b611e7f397301a42
2018-05-11 12:15:08 +08:00
Zhiyong Dang 1b83341d19 Fix race condition in blas_server_omp.c
Change-Id: Ic896276cd073d6b41930c7c5a29d66348cd1725d
2018-04-27 17:00:42 +08:00
Martin Kroeker f29389c7ac
Merge pull request #1520 from martin-frbg/cpucounts
Catch invalid cpu count returned by CPU_COUNT_S
2018-04-14 22:24:34 +02:00
Martin Kroeker 7c861605b2
Catch invalid cpu count returned by CPU_COUNT_S
mips32 was seen to return zero here, driving nthreads to zero with subsequent fpe in blas_quickdivide
2018-04-14 18:29:10 +02:00
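A guard of the kind 7c861605b2 describes could be sketched as below. The names `usable_cpu_count` and `rows_per_thread` are hypothetical, and clamping to 1 is an assumption; the actual commit may fall back to a different source for the count.

```c
/* Treat a zero or negative CPU count as "one CPU" so that later
   divisions by the thread count cannot trap. */
int usable_cpu_count(int reported) {
    return reported < 1 ? 1 : reported;
}

/* Example consumer: split rows across threads, rounding up. */
long rows_per_thread(long rows, int reported_cpus) {
    int n = usable_cpu_count(reported_cpus);
    return (rows + n - 1) / n;
}
```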
Martin Kroeker d636b418af
Merge pull request #1504 from ararslan/aa/openbsd
Allow building on OpenBSD
2018-04-04 15:26:46 +02:00
Alex Arslan a41d241a0e
Add support for DragonFly BSD 2018-04-03 16:39:29 -07:00
Alex Arslan 8da6b6ae52
Allow building on OpenBSD
With this change, OpenBLAS builds and all tests pass on OpenBSD 6.2
using Clang. Tested on x86-64 only, with and without DYNAMIC_ARCH=1.
2018-04-02 10:48:22 -07:00
Martin Kroeker 01c4b82f04
Update memory.c 2018-03-31 22:32:06 +02:00
Martin Kroeker 93db123f7e
Update memory.c 2018-03-29 13:13:49 +02:00
Martin Kroeker 752fdb5dd8
Add workaround for old gcc and clang versions
Old gcc and clang do not handle constructor arguments, finally fix #875 as discussed there, using the fedora patch
2018-03-29 11:56:56 +02:00
Martin Kroeker 7646974227
Limit the additional locking from PRs 1052,1299 to non-OpenMP multithreading 2018-02-21 11:45:33 +01:00
Martin Kroeker 8866e393a2
Revert "Add locks only for non-OPENMP multithreading" 2018-02-20 17:17:12 +01:00
Martin Kroeker 3119b2ab4c
Add locks only for non-OPENMP multithreading
to mitigate performance problems caused by #1052 and #1299 as seen in #1461
2018-02-20 12:17:18 +01:00
Erik M. Bray 8f5f614615 On Cygwin use mmap instead of Windows native allocation functions, which are not fork-safe. 2018-02-06 12:23:27 +01:00
Erik M. Bray f5fc109fbd Perform blas_thread_shutdown with pthread_atfork() on Cygwin
Even if we're directly using the win32 threading driver and not pthreads,
pthread_atfork still works fine to register a pre-fork handler, and is
necessary to restore the threading server to a pre-initialized state.
2018-02-06 12:23:27 +01:00
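The pre-fork handler mechanism mentioned in f5fc109fbd can be demonstrated on any POSIX system. The following is a sketch with hypothetical names (`thread_server_init`, `thread_server_shutdown`, `child_sees_shutdown`); a single flag stands in for the real threading server state.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int server_running = 0;   /* stand-in for real thread-pool state */

/* Prepare handler: runs in the parent just before fork() proceeds. */
static void thread_server_shutdown(void) {
    server_running = 0;
}

/* Mark the server as running and register the pre-fork handler. */
int thread_server_init(void) {
    server_running = 1;
    return pthread_atfork(thread_server_shutdown, NULL, NULL);
}

int thread_server_is_running(void) {
    return server_running;
}

/* Fork and report whether the child observed the pre-initialized state:
   1 if yes, 0 if no, -1 on error. */
int child_sees_shutdown(void) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) _exit(server_running ? 1 : 0);
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

Since the handler runs before the fork, both parent and child end up in the pre-initialized state, and each would have to start its thread server again on next use.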
Andrew e5752ff9b3 take out unused variables 2018-01-20 11:42:31 +01:00
Andrew 8a0b086b28 add missing bracket for old glibc (cppcheck) 2018-01-12 22:35:48 +01:00
Andrew 8aafa0473c address last warnings as seen by gcc7 2018-01-01 20:57:12 +01:00
Andrew bfc2a88594 remove unused buffer 2017-12-22 00:55:40 +01:00
Martin Kroeker 28ae3ca76f
Limit MAX_CPU to 1024 for now
Some Linux distributions (notably SuSE) have raised CPU_SETSIZE to 4096, apparently disregarding API limitations.
From #1348, the highest value to survive array initialization (on a desktop system) is 3232, and 1024, which is the
more usual CPU_SETSIZE limit, was demonstrated to work fine on an actual bignuma system.
2017-12-05 12:54:15 +01:00
Martin Kroeker 07e7c36dac
Handle shmem init failures in cpu affinity setup code
Failures to obtain or attach shared memory segments would lead to an exit without explanation of the exact cause.
This change introduces a more verbose error message and tries to make the code continue without setting cpu affinity.
Fixes #1351
2017-11-18 23:57:44 +01:00
Martin Kroeker 2a6fef9a55
Try to handle shmget or shmat failing
also replaces one verbatim sched_yield with the YIELDING macro for consistency as suggested in #1351
2017-11-09 23:16:13 +01:00
Martin Kroeker 514d237257 Merge pull request #1279 from xsacha/develop
CMake improvements
2017-10-06 21:13:45 +02:00
Martin Kroeker ba1f91f17b Convert another caller of "allocation" to LOCK_COMMAND
... as the "allocation" code jumped to now does UNLOCK_COMMAND instead of blas_unlock
2017-09-09 20:30:33 +02:00
Martin Kroeker f460776f0f Fix thread data races 2017-09-09 19:07:06 +02:00
Martin Kroeker e882f3d6f3 Fix thread data race in memory.c 2017-09-09 18:58:38 +02:00
Sacha Refshauge 37858d1146 Fix threading usage in CMake: s/SMP/USE_THREAD/ 2017-08-19 15:07:42 +10:00
Isuru Fernando 2f12ea017b No strncasecmp with MSVC 2017-08-08 00:07:25 +05:30
Martin Kroeker ebb04e3265 Merge pull request #1259 from isuruf/cmake
CMake Improvements
2017-08-02 15:31:05 +02:00
Isuru Fernando d245caa49a Support out-of-source build 2017-08-01 15:16:14 +05:30
Martin Kroeker 63cfa32691 Rework __GLIBC_PREREQ checks to avoid breaking non-glibc builds 2017-07-31 21:02:43 +02:00
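A common way to keep `__GLIBC_PREREQ` from breaking non-glibc builds is to nest the version test inside a `defined` check, so that other C libraries never evaluate the function-like macro. Whether 63cfa32691 does exactly this is not shown here; the macro `HAVE_GLIBC_2_7` and the function `has_glibc_2_7` are illustrative names.

```c
#include <stdlib.h>   /* any libc header; on glibc it pulls in features.h */

/* The inner #if is only parsed when __GLIBC_PREREQ exists. Writing
   "#if defined(__GLIBC_PREREQ) && __GLIBC_PREREQ(2, 7)" on one line
   would be a preprocessor error on musl, BSD libc and others. */
#if defined(__GLIBC_PREREQ)
# if __GLIBC_PREREQ(2, 7)
#  define HAVE_GLIBC_2_7 1
# else
#  define HAVE_GLIBC_2_7 0
# endif
#else
# define HAVE_GLIBC_2_7 0
#endif

/* Report the compile-time result at run time. */
int has_glibc_2_7(void) {
    return HAVE_GLIBC_2_7;
}
```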
Martin Kroeker c4af196a2d Honor cgroup/cpuset limits when enumerating cpus 2017-07-25 22:47:34 +02:00
Martin Kroeker 480e697681 Revert "Honor cgroup/cpuset limits when enumerating cpus" (#1246) 2017-07-24 16:17:50 +02:00
Martin Kroeker 80373ea039 More fixes for silly misedits 2017-07-15 12:48:42 +02:00
Martin Kroeker d12b75a6c4 Fixup braces lost in previous edit 2017-07-15 11:53:28 +02:00
Martin Kroeker 7294fb1d9d Merge branch 'develop' into cgroups 2017-07-15 10:40:42 +02:00
Martin Kroeker 731c518cff Add files via upload 2017-07-11 18:42:39 +02:00
Martin Kroeker 29fc429d9a Honor cgroup/cpuset constraints when enumerating cpus 2017-07-11 18:27:33 +02:00
Martin Kroeker 3db2adf872 Merge pull request #1230 from martin-frbg/rhel5
Add sched_getcpu implementation for pre-2.6 glibc
2017-07-09 13:16:16 +02:00
Martin Kroeker c1cf62d2c0 Add sched_getcpu implementation for pre-2.6 glibc
Fixes #1210, compilation on RHEL5 with affinity enabled
2017-07-09 09:45:38 +02:00
Neil Shipp 34513be726 Add Microsoft Windows 10 UWP build support 2017-06-23 13:07:34 -07:00
Neil Shipp 65e56cb29d Add 64bit support for Microsoft Visual Studio 2017-06-21 13:38:22 -07:00
James Cowgill 59c97cfee4 memory: Fix buffer overflow when position == NUM_BUFFERS 2017-05-05 17:47:03 +01:00
James Cowgill 5fecfe0f42 memory: switch loop condition around in blas_memory_free
Before this commit, the "position < NUM_BUFFERS" loop condition from
blas_memory_free will be completely optimized away by GCC. This is
because the condition can only be false after undefined behavior has
already been invoked (reading past the end of an array). As a
consequence of this bug, GCC also removes the subsequent if statement
and all the code after the error label because all of it is dead.

This commit switches the loop condition around so it works as intended.
2017-05-05 16:01:58 +01:00
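The ordering issue described in 5fecfe0f42 can be shown with a small lookup loop. This is a sketch, not the code from memory.c: the table, `NUM_BUFFERS` value, `set_buffer` and `find_buffer` are stand-ins.

```c
#include <stddef.h>

#define NUM_BUFFERS 4   /* illustrative size */

static void *table[NUM_BUFFERS];

/* Record addr in slot i (ignored if i is out of range). */
void set_buffer(int i, void *addr) {
    if (i >= 0 && i < NUM_BUFFERS) table[i] = addr;
}

/* Return the slot holding addr, or -1 if it is not present.
   The bounds test comes FIRST, so table[position] is never read with
   position == NUM_BUFFERS. With the operands swapped, the out-of-range
   read would already be undefined behavior, and a compiler may then
   assume the bounds test is always true and delete it. */
int find_buffer(const void *addr) {
    int position = 0;
    while (position < NUM_BUFFERS && table[position] != addr)
        position++;
    return position < NUM_BUFFERS ? position : -1;
}
```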
Gian-Carlo Pascutto 9c884986ad Add an extra family/model combination used by AMD Steamroller (Godavari). 2017-04-19 19:15:47 +02:00
Gian-Carlo Pascutto 0cbd2d34e4 Recognize ZEN when passed as OPENBLAS_CORETYPE. 2017-04-10 20:05:16 +02:00
Gian-Carlo Pascutto 62979fd104 Fix dynamic detection for ZEN CPUs. 2017-04-10 19:08:37 +02:00
Denis Steckelmacher c9ff735da6 Add ZEN support (tested for auto-detected static backend) 2017-03-19 15:32:50 +01:00
Martin Kroeker ffc1d6c468 Merge pull request #1108 from ashwinyes/develop_20170203_thunderx2t99
Optimized Implementations for ThunderX2T99
2017-02-28 16:02:19 +01:00
Ashwin Sekhar T K a86474c6f7 THUNDERX2T99: Performance fix for ZGEMM 2017-02-28 06:05:00 -08:00
Ashwin Sekhar T K 19ba133383 THUNDERX2T99: Add Optimized ZGEMM Implementation 2017-02-28 05:31:41 +00:00
Andrew 5088523786 detect apollo lake for real 2017-02-20 23:54:59 +01:00
Elliot Saba 1d8ab99e09 Add `exfamily == 9` case (Kaby Lake) to dynamic arch detection 2017-02-10 15:23:55 -08:00
Martin Koehler 76c6e33e54 Enable EXCAVATOR kernels for A12-9800 2017-02-07 21:38:28 +01:00
Ashwin Sekhar T K 2757b49767 THUNDERX2T99: Add Optimized CGEMM Implementation 2017-01-30 17:44:26 +05:30
Ashwin Sekhar T K f279ff4789 THUNDERX2T99: Add Optimized SGEMM Implementation 2017-01-16 21:44:33 +05:30
Zhang Xianyi 0863a0d4b4 Merge pull request #1061 from ashwinyes/develop_aarch64_vulcan_thunderx_patch
Add new targets for ARM64
2017-01-16 13:20:10 +08:00
Werner Saar c1c5a63d3c prepared parameter.c for UNROLL values, that are not a power of two 2017-01-11 09:50:28 +01:00
Ashwin Sekhar T K 4b55fae337 ARM64: Add Cavium THUNDERX2T99 Target 2017-01-11 11:18:40 +05:30
Ashwin Sekhar T K 0b8e876d89 VULCAN: Add optimized DGEMM implementation 2017-01-10 15:01:37 +05:30
Ashwin Sekhar T K 4713e7c47f ARM64: Add the VULCAN Target 2017-01-10 15:01:17 +05:30
jiahaipeng 1aa1e6cb54 modify blas_l1_thread.c to support multithreading for L1 functions with a return value 2017-01-10 11:47:06 +08:00
Martin Kroeker 51aa157e64 Relocate declaration of alloc_lock outside ifdef block 2017-01-09 01:10:43 +01:00
Martin Kroeker 87c7d10b34 Fix thread data races detected by helgrind 3.12
Ref. #995, may possibly help solve issues seen in 660,883
2017-01-08 23:33:51 +01:00
Martin Kroeker 0ef7841473 Update xerbla.c 2017-01-04 23:16:48 +01:00
Martin Kroeker 104ad066af Use appropriate int32/int64 format for error number in message string 2016-12-30 00:45:59 +01:00
Alex Arslan a16ace68f5
Include system headers on FreeBSD 2016-11-16 21:58:20 -08:00
Martin Kroeker 596ead0f8d Add files via upload 2016-11-06 23:26:39 +01:00
Zhang Xianyi 66c9a9b33d Merge pull request #981 from howard0su/develop
Use NPROCESSORS_CONF instead of NPROCESSORS_ONLN
2016-10-17 11:32:57 +08:00
Martin Kroeker 8a8f3932eb Update dynamic.c
Add Bay Trail "Pentium N3520" atom
2016-10-16 22:40:00 +02:00
Howard Su ff1da01476 Use NPROCESSORS_CONF instead of NPROCESSORS_ONLN
to determine the number of CPUs. On ARM platforms, the number of
online CPUs increases when there is more workload, while the number of
configured CPUs is the maximum number of CPUs.
2016-10-13 12:37:50 +00:00
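The difference between the two counts in ff1da01476 can be queried through `sysconf`. The sketch below assumes a system that provides both `_SC_NPROCESSORS_CONF` and `_SC_NPROCESSORS_ONLN` (common on Linux and the BSDs); `configured_cpus` and `online_cpus` are hypothetical wrapper names.

```c
#include <unistd.h>

/* CPUs the system is configured with: stable even when cores are
   powered down, which is why it suits sizing a thread pool. */
int configured_cpus(void) {
    long n = sysconf(_SC_NPROCESSORS_CONF);
    return n < 1 ? 1 : (int)n;
}

/* CPUs currently online: may change at run time, for example when a
   governor brings cores up under load. */
int online_cpus(void) {
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n < 1 ? 1 : (int)n;
}
```

Both wrappers fall back to 1 if `sysconf` reports an error or a non-positive value, so callers can divide by the result safely.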
Zhang Xianyi ef52a9266b Fixed #979. Patch for NetBSD. 2016-10-13 10:17:07 +08:00
Martin Kroeker 7de829f713 Update dynamic.c
Add Braswell (extended model 4, model 12) N3150 as Nehalem
2016-07-14 12:22:55 +02:00
John Biddiscombe 053044ae4d Replace CMAKE_SOURCE_DIR/CMAKE_BINARY_DIR with PROJECT_SOURCE_DIR/PROJECT_BINARY_DIR
If OpenBLAS is built using add_subdirectory(OpenBlas) as part of another project
then the paths set by CMAKE_XXX_DIR are relative to the parent project
and not the OpenBLAS project.
2016-05-25 09:13:28 +02:00
Ashwin Sekhar T K 0fb380c966 Update NUMA CPU binding
When all the processes can be
accommodated within the current node,
use cores from the current node only.
2016-04-29 11:58:15 +05:30
Werner Saar 78b05f6476 bugfix for EXCAVATOR and DYNAMIC_ARCH 2016-04-25 10:13:30 +02:00
Werner Saar 2b967590a0 bugfix in dynamic.c 2016-04-25 09:08:38 +02:00
Theoractice aa744dfa59 Update memory.c 2016-03-22 20:02:37 +08:00
theoractice 61cf8f74d9 Fix access violation on Windows while static linking 2016-03-22 19:14:54 +08:00
Zhang Xianyi 68eb4fa329 Add missing openblas_env makefile. 2016-03-09 14:52:47 -05:00
Zhang Xianyi 05196a8497 Refs #716. Only call getenv at init function. 2016-03-09 12:50:07 -05:00
Zhang Xianyi 1edf30b790 Change Opteron(SSE3) to Opteron_SSE3 at dynamic core name. 2016-03-01 20:13:08 +08:00
Zhang Xianyi 6b85dbb6dc Refs #696. Turn off stack limit setting on Linux.
I cannot reproduce SEGFAULT of lapack-test with default stack size
on ARM Linux.
2016-02-24 14:21:42 -05:00
Martin Kroeker 935356c34f Update dynamic.c and cpuid_x86.c for Intel Avoton.
Second part of "support Intel Avoton via Nehalem kernel"
2016-02-02 13:42:55 -05:00
Zhang Xianyi f5df444ceb Merge pull request #762 from jeromerobert/bug760
Let openblas_get_num_threads return the number of active threads
2016-01-26 08:45:16 -06:00
Zhang Xianyi aaa8551c57 Merge pull request #749 from lotheac/illumos_fixes
illumos fixes
2016-01-26 08:42:20 -06:00
Jerome Robert 0d87c1ffb6 Let openblas_get_num_threads return the number of active threads
... not the number of allocated threads.

Close #760
2016-01-26 13:04:16 +01:00
Lauri Tirkkonen e737e32fd1 RLIMIT_NPROC doesn't exist on illumos 2016-01-22 18:55:51 +02:00
Lauri Tirkkonen 97cd4b8aee illumos fixes to memory.c 2016-01-22 18:55:43 +02:00
Werner Saar 0d22551a6b increase the stack size limit in the constructor 2015-11-20 09:23:01 +01:00
Ralph Campbell fbc21266e6 Minor C code fixes in driver/ 2015-11-09 14:15:49 +05:30
Zhang Xianyi 839395fc25 Detect AMD Trinity and Richland. 2015-10-29 02:53:29 +08:00
j-bo 6040858b22 Fix #673
Add lacking headers declarations when compiling for Android ARM7
2015-10-27 13:55:24 +01:00
Zhang Xianyi 70642fe4ed Refs #668. Raise the signal when pthread_create fails.
Thanks to James K. Lowden for the patch.
2015-10-26 19:02:51 -05:00
Zhang Xianyi 2feef49fa8 Merge branch 'develop' into cmake
Conflicts:
	driver/others/memory.c
2015-10-26 14:54:34 -05:00
Zhang Xianyi 1ce054fcb3 Refs #669. Fixed the build bug with gcc on Mac OS X. 2015-10-22 11:07:35 -05:00
Zhang Xianyi 94b125255f Merge branch 'develop' into cmake
Conflicts:
	driver/others/memory.c
2015-10-13 04:46:08 +08:00
Zhang Xianyi 11ac4665c8 Fixed #654. Make sure the gotoblas_init function is run before all other static initializations. 2015-10-05 14:14:32 -05:00
Zhang Xianyi cc7cab8a45 Detect other Intel Skylake cores.
http://users.atw.hu/instlatx64/
2015-09-09 10:47:17 -05:00
Yichao Yu 61ae47eb99 Ref #632. Support Intel Skylake by Haswell kernels. 2015-09-09 11:07:33 -04:00
Grazvydas Ignotas d3e2f0a1af add missing barriers
should fix issue #597
2015-08-16 15:37:02 +02:00
Zhang Xianyi f874465bb8 Use cmake to build OpenBLAS GENERIC Target on MSVC x86 64-bit.
Disable CBLAS and LAPACK.
2015-08-10 14:10:44 -05:00
Zhang Xianyi dcd5ba4443 Merge branch 'cmake' of https://github.com/hpanderson/OpenBLAS into hpanderson_cmake 2015-07-22 04:06:39 +08:00
Zhang Xianyi a11555c715 Support Android NDK armeabi-v7a-hard ABI. (-mfloat-abi=hard)
e.g.
make HOSTCC=gcc CC=arm-linux-androideabi-gcc NO_LAPACK=1 TARGET=ARMV7

In Android NDK, it uses armeabi-v7a-hard ABI.
TARGET_CFLAGS += -mhard-float -D_NDK_MATH_NO_SOFTFP=1
TARGET_LDFLAGS += -Wl,--no-warn-mismatch -lm_hard
For more information, please check hard-float example at
android_ndk/tests/device/hard-float/jni/.
2015-05-20 21:57:27 -05:00
Zhang Xianyi 51ff17d46e Add AMD Excavator target. 2015-05-13 16:16:30 -05:00
powderluv ebb9eba987 Fix build with ALLOC_SHM=0 (Android NDK)
Refactor such that you can build with ALLOC_SHM=0. HugeTLB
implicitly depends on ALLOC_SHM=1. This patch allows
building for Android NDK r10d.
2015-05-10 00:10:26 -07:00
Zhang Xianyi 9798481979 Refs #478, #482. Fix segfault bug for gemv_t with MAX_ALLOC_STACK flag.
For gemv_t, directly use malloc to create the buffer.
2015-04-13 19:45:27 -05:00
Zhang Xianyi 8977b3f235 Refs #529. Support Intel Broadwell by Haswell kernels. 2015-04-02 11:08:03 -05:00
Zhang Xianyi e95d64333a Refs #519. Avoid calling strncpy. 2015-03-19 15:57:22 -05:00
Ton van den Heuvel b6438dedea Fix issue #508
Fix race condition during shutdown causing a crash in
gotoblas_set_affinity().
2015-03-18 13:22:43 +01:00
Hank Anderson 5ae8993752 Added intrinsics for MSVC. 2015-02-25 11:52:51 -06:00
Hank Anderson 84d90d6ed8 Fixed some compiler errors/warnings for clang. 2015-02-25 11:52:25 -06:00
Hank Anderson 0d8e227ea7 Changed strategy for setting preprocessor definitions.
Instead of generating separate object files for each permutation of
defines for a source file, GenerateNamedObjects now writes an entirely
new source file and inserts the defines as #define c statements.

This solves a problem I ran into with ar.exe where it was refusing to
link objects that had the same filename despite having different paths.
2015-02-24 12:26:33 -06:00
Hank Anderson 4662a0b13a Changed generate functions to iterate through a list of float types.
This will generate obj files for SINGLE/DOUBLE/COMPLEX/DOUBLE COMPLEX.
2015-02-15 17:44:37 -06:00
Hank Anderson c94fe71278 Removed incoming-stack-boundary for MSVC.
Made float type optional for GenerateNamedObjects.

Called GenerateNamedObjects for a couple of driver/others files that
needed NAME/CNAME set.
2015-02-11 10:54:14 -06:00
Hank Anderson 7fa5c4e2fd Fixed some case issues with ARCH.
Added some kernel and driver/others objects.
2015-02-08 15:29:18 -06:00
Zhang Xianyi cfa9392ffa Fix openblas_get_num_threads and openblas_get_num_procs bug with single thread. 2015-02-08 01:30:23 -06:00
Hank Anderson 2828f6630c Added SMP sources to COMMONOBJS. 2015-02-04 14:01:36 -06:00
Erik Schnetter 65a847cd36 Introduce openblas_get_num_threads and openblas_get_num_procs 2015-02-03 12:23:41 -05:00
Hank Anderson 7194424fef Added missing common objects to the library. 2015-02-02 15:21:29 -06:00
Hank Anderson 5057a4b4df Added openblas add_library call that uses DBLAS_OBJS objects. 2015-01-30 15:21:21 -06:00
Hank Anderson 3e8ea7a351 Added COMMONOBJS to driver/others CMakeLists.txt. 2015-01-30 14:06:14 -06:00
Hank Anderson 8d9b196e0d Moved loop over define combos into a function.
This function takes a set of sources and a set of preprocessor
definitions. It will iterate over the sources and build an object
file for each combination of preprocessor definitions for each
source file.
2015-01-30 12:14:44 -06:00
Werner Saar 0dc559ed30 bugfix in dynamic.c 2014-12-28 17:15:42 +01:00
Werner Saar 4319769b79 added target processor STEAMROLLER 2014-12-28 20:16:46 +08:00
Zhang Xianyi 2fb02626da Update organization info. 2014-11-25 15:28:58 +08:00
Zhang Xianyi 695e0fa649 #463 fixed a compiling bug on AIX. 2014-11-10 14:39:56 +08:00
wernsaar a64fe9bcc9 added optimized sgemv_n kernel for sandybridge 2014-09-06 08:41:53 +02:00
wernsaar 2021d0f9d6 experimentally removed expensive function calls 2014-09-05 15:05:53 +02:00
Isaac Dunham f7eb81a846 Fix link error on Linux/musl.
get_nprocs() is a GNU convenience function equivalent to POSIX2008
sysconf(_SC_NPROCESSORS_ONLN); the latter should be available in unistd.h
on any current *nix. (OS X supports this call since 10.5, and FreeBSD
currently supports it. But this commit does not change FreeBSD or OS X
versions.)
2014-08-03 15:06:30 -07:00
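The substitution described in f7eb81a846 can be illustrated with a minimal sketch. The helper name online_cpu_count is hypothetical and is not taken from the OpenBLAS sources; only the sysconf(_SC_NPROCESSORS_ONLN) call comes from the commit message.

```c
#include <unistd.h>

/* Hypothetical helper (not OpenBLAS code): count the online processors
 * with the sysconf() call named in the commit message instead of the
 * GNU-specific get_nprocs(). */
static int online_cpu_count(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);

    /* sysconf() returns -1 on error; fall back to a single processor. */
    return (n < 1) ? 1 : (int)n;
}
```

Falling back to 1 is a design choice of this sketch, so callers never see a zero or negative thread count.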
wernsaar 793175be3a added experimental support for big numa machines 2014-08-02 13:40:16 +02:00
Zhang Xianyi c94762bb56 Refs #401. Added NO_AVX2 flag for old binutils (e.g. RHEL6) 2014-07-16 08:38:25 +08:00
Zhang Xianyi 552119c484 Fixed #407. Support outputting the CPU corename at runtime.
The user can use char * openblas_get_config() or char * openblas_get_corename().
2014-07-08 12:48:08 +08:00
wernsaar 50e99a52ea added definitions for PILEDRIVER and HASWELL 2014-07-06 12:08:27 +02:00
Zhang Xianyi 7a8949e0ce Merge branch 'develop' of https://github.com/TimothyGu/OpenBLAS into TimothyGu-develop
Conflicts:
	driver/others/memory.c
2014-06-28 20:51:31 +08:00
Timothy Gu 6c2ead30f0 Remove all trailing whitespace except lapack-netlib
Signed-off-by: Timothy Gu <timothygu99@gmail.com>
2014-06-27 12:05:18 -07:00
Jameson Nash f41f03ab83 fix #394. this cleans up some handles after using them, and doesn't disable ALL process privileges upon success 2014-06-27 12:16:57 -04:00
wernsaar 438002204d Ref #393: fix for INTERFACE64=0 and ARCH_X86 in divtable 2014-06-21 12:29:23 +02:00
wernsaar 53bfa51ee0 Ref #385: fixed warnings in dynamic.c 2014-06-12 18:17:08 +02:00
wernsaar a86d349a51 Ref #380: enhancements for dynamic_arch 2014-06-12 14:20:03 +02:00
wernsaar a35a1a9ae7 changed makefiles for lapack development 2014-05-07 11:33:02 +02:00
Olivier Grisel 2c556f093a Add cast to function pointer to remove warning 2014-02-25 11:08:32 +01:00
Olivier Grisel 3b027d2528 Do not reference pthread_atfork in non-SMP_SERVER mode 2014-02-25 11:08:32 +01:00
Olivier Grisel 49bd98f410 Do not reference pthread_atfork under windows 2014-02-19 19:25:48 +01:00
Olivier Grisel 138a841390 FIX #294: make OpenBLAS thread-pool resilient to fork via pthread_atfork 2014-02-19 19:01:15 +01:00
Olivier Grisel 046e4013cb Revert "Refs #294. Used pthread_atfork to avoid hang after a Unix fork."
This reverts commit 3617c22a56.
2014-02-19 18:32:54 +01:00
Zhang Xianyi 3617c22a56 Refs #294. Used pthread_atfork to avoid hang after a Unix fork.
The problem is the mutex we used in blas_server. Thus, we must clear
the mutex before the fork and re-init them at parent and child process.

If you use OpenMP, GOMP currently has the same problem. Please try another OpenMP
implementation.
2014-02-18 15:36:04 +08:00
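The fork problem described in 3617c22a56 can be sketched with the general pthread_atfork pattern. This is not the code from the commit: it swaps in the common lock-before / unlock-after handlers rather than re-initializing the mutex, and every name below (pool_lock, install_fork_handlers, fork_and_lock_in_child) is hypothetical.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for a mutex guarding a thread pool. */
static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;

/* Take the lock before fork() so no other thread holds it mid-fork. */
static void before_fork(void)       { pthread_mutex_lock(&pool_lock); }
/* Release it again in the parent once fork() has returned. */
static void after_fork_parent(void) { pthread_mutex_unlock(&pool_lock); }
/* Release it in the child, where only the forking thread exists. */
static void after_fork_child(void)  { pthread_mutex_unlock(&pool_lock); }

/* Register the three handlers; returns 0 on success. */
static int install_fork_handlers(void)
{
    return pthread_atfork(before_fork, after_fork_parent, after_fork_child);
}

/* Fork and let the child try to take the lock.
 * Returns 0 if the child could lock the mutex, nonzero otherwise. */
static int fork_and_lock_in_child(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        int rc = pthread_mutex_trylock(&pool_lock);
        _exit(rc == 0 ? 0 : 1);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling install_fork_handlers() once at start-up and then fork_and_lock_in_child() shows that the child can use the mutex without hanging.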
Zhang Xianyi 8c7687b419 Refs #338. Added OPENBLAS_VERBOSE environment variable on runtime
By default, OpenBLAS doesn't output the warning message. You can set
OPENBLAS_VERBOSE (e.g. export OPENBLAS_VERBOSE=1) to enable the warning
message on runtime.
2014-01-24 02:05:59 +08:00
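How a library might consult the variable from 8c7687b419 can be sketched as follows. The helper verbose_level and its parsing rules are assumptions for illustration, not the OpenBLAS implementation; only the variable name OPENBLAS_VERBOSE comes from the commit message.

```c
#include <stdlib.h>

/* Hypothetical helper (not OpenBLAS code): read OPENBLAS_VERBOSE and
 * return its integer value, or 0 when it is unset or negative, which
 * matches the "no warning by default" behaviour described above. */
static int verbose_level(void)
{
    const char *s = getenv("OPENBLAS_VERBOSE");

    if (s == NULL)
        return 0;

    int v = atoi(s);
    return (v < 0) ? 0 : v;
}
```

A caller would print its warning only when verbose_level() returns at least 1.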
Zhang Xianyi ab69443bd4 Refs #332. Added additional Intel Ivy Bridge and Haswell CPU-id. 2014-01-05 23:44:29 +08:00
Zhang Xianyi b263e096af Refs #307. Delete debug printf. 2013-12-31 15:53:13 +08:00
wernsaar 0b6e13b689 Merge remote branch 'origin/develop' into haswell 2013-12-01 13:38:11 +01:00
wernsaar 5c648a8984 Merge remote branch 'origin/develop' into haswell 2013-12-01 11:25:33 +01:00
Zhang Xianyi 5048a80032 Refs #283. Fixed the incorrect usage of long data type for Windows 64. 2013-11-14 13:46:42 +08:00
Zhang Xianyi a2942456ef Refs #307. Fixed the hang bug when freeing the OpenBLAS DLL on Windows. 2013-11-13 10:00:18 +08:00
Sébastien Villemot eae4cfa3f6 Avoid failure on qemu guests declaring an Athlon CPU without 3dnow!
The present patch verifies that, on machines declaring an Athlon CPU model and
family, the 3dnow and 3dnowext feature flags are indeed present. If they are
not, it falls back to the most generic x86 kernel. This prevents crashes due to
illegal instruction on qemu guests with a weird configuration.

Closes #272
2013-08-28 14:29:42 +02:00
Zhang Xianyi 2638370844 Init code base for Intel Haswell. 2013-08-13 00:54:59 +08:00
Zhang Xianyi 673e453b3f Enable bulldozer kernels. 2013-08-05 16:07:54 +08:00
Zhang Xianyi 143cca4dd5 Merge branch 'develop' into bulldozer 2013-08-05 15:51:53 +08:00
Zhang Xianyi 534c5ec919 Fixed #261. Use strncmp instead of a comparing trick. 2013-07-29 16:48:35 +08:00
Zhang Xianyi 5b504d6c23 Refs #263. Rollback bulldozer and piledriver kernels to barcelona kernels. 2013-07-28 17:39:24 +08:00
Zhang Xianyi 72b1edaf1b Merge branch 'develop' into bulldozer
Conflicts:
	kernel/x86_64/KERNEL.BULLDOZER
2013-07-28 06:38:25 +02:00
Zhang Xianyi 4471c77905 Fixed #261. Use strncmp instead of a comparing trick. 2013-07-26 23:43:54 +08:00
Zhang Xianyi 2a7503e563 Refs #225. Fixed a bug in GEMM OpenMP threading. 2013-07-15 09:56:19 +08:00
grisuthedragon c19a488af2 create openblas_get_parallel to retrieve information about which
parallelization model is used by OpenBLAS.
2013-07-11 21:39:19 +08:00
Zhang Xianyi f54f5bac9e Refs #248. Fixed the LSB compatibility issue for BLAS only.
For example, make CC=lsbcc NO_LAPACK=1.
2013-07-09 15:38:03 +08:00
Zhang Xianyi 5d3312142a Refs #221 #246. Fixed the overflowing stack bug in multithreading BLAS3.
This occurs when NUM_THREADS (MAX_CPU_NUMBER) is very large, e.g. 256:

typedef struct {
  volatile BLASLONG working[MAX_CPU_NUMBER][CACHE_LINE_SIZE * DIVIDE_RATE];
} job_t;

job_t          job[MAX_CPU_NUMBER];

The job array is 8MB in size.

Thus, we use malloc instead of stack allocation.
2013-07-08 01:07:05 +08:00
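The 8MB figure in 5d3312142a can be checked with a short sketch. The values CACHE_LINE_SIZE = 8, DIVIDE_RATE = 2 and an 8-byte BLASLONG are assumptions chosen because they reproduce the stated size; the helper names are hypothetical.

```c
#include <stdlib.h>

/* Assumed values (not stated in the commit message); they are picked so
 * that the arithmetic matches the 8MB quoted above. */
typedef long long BLASLONG;     /* assumed to be 8 bytes */
#define MAX_CPU_NUMBER  256
#define CACHE_LINE_SIZE 8
#define DIVIDE_RATE     2

typedef struct {
  volatile BLASLONG working[MAX_CPU_NUMBER][CACHE_LINE_SIZE * DIVIDE_RATE];
} job_t;

/* Size of job[MAX_CPU_NUMBER]: 256 * (256 * 16 * 8) bytes = 8 MiB. */
static size_t job_array_bytes(void)
{
    return sizeof(job_t) * MAX_CPU_NUMBER;
}

/* Hypothetical helper: put the array on the heap rather than the stack.
 * The caller must check for NULL and free() the result. */
static job_t *alloc_jobs(void)
{
    return malloc(job_array_bytes());
}
```

An array of this size exceeds typical default thread stack limits, which is why heap allocation avoids the overflow.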
Zhang Xianyi 886cbaf4e4 Support AMD Piledriver by bulldozer kernels. 2013-07-06 12:06:43 -03:00
Zhang Xianyi 32dbeb636d Refs #221. Set stack limit to 16MB to prevent a SEGFAULT bug on Mac OS X with DYNAMIC_ARCH=1 & NUM_THREADS=256. 2013-07-02 14:17:55 +08:00
Dan Luu 88ef307cef Refs #241. Add Haswell support (using sandybridge optimizations) 2013-06-30 22:35:14 +08:00
Zhang Xianyi 65ffead0cf Refs #124. Check XSAVE flag on x86 CPU. 2013-06-06 22:50:43 +08:00
Zhang Xianyi f1ce74ffdd Improved the message printed when the OS doesn't support AVX. 2013-03-02 14:15:54 +08:00
Zhang Xianyi d744c9590a In OpenMP threading, preallocate the thread buffer instead of allocating the buffer every time. This patch improved the performance slightly. 2013-03-01 14:36:47 +08:00
Zhang Xianyi 3cc6ae793e Refs #174. Return sb pointer when OpenMP or Windows. 2013-02-26 00:48:21 +08:00
Zhang Xianyi 5155e3f509 Refs #174. Fixed the overflowing buffer bug of multithreading hbmv and sbmv.
Instead of using thread 0 buffer, each thread uses its own sb buffer.
Thus, it can avoid overflowing thread 0 buffer.
2013-02-13 16:05:58 +08:00
Zhang Xianyi 5c8bf6ae0e Merge branch 'bulldozer' into develop 2013-02-10 01:19:42 +08:00
Zhang Xianyi 6ae2f868fd Set the affinity. Only use 1 core of each module on bulldozer. 2013-02-09 18:19:02 +01:00
Zhang Xianyi 299b5a44dc Merge branch 'develop' of github.com:xianyi/OpenBLAS into bulldozer 2013-02-09 16:22:04 +01:00
Zhang Xianyi 8cdb795438 Refs #187. Use binary code for xgetbv, which is compatible with old compiler. 2013-01-22 00:25:08 +08:00
Zhang Xianyi a4ee6f3915 Fixed #172. Support Intel Xeon E7540. 2012-12-18 08:57:46 +08:00
Zhang Xianyi fba6b590f2 Merge branch 'master' into develop 2012-12-15 22:49:37 +08:00
Julian Taylor 1138817dd2 add a sanity check on the detected cpu type
if we have 64 bit pointers we can't have a 32 bit cpu, so fall back to
the 64bit cpu fallback (prescott)
E.g. the cpu detection fails in amd qemu64 emulation (family 6 model 2)
causing it to use the uninitialized gotoblas_ATHLON
2012-12-15 13:29:46 +01:00
Zhang Xianyi bdf8d9411e Refs #163. Obtain the build configure on runtime.
openblas_get_config function returns the configure string.
So far, it supports USE64BITINT, NO_CBLAS, NO_LAPACK, NO_LAPACKE,
DYNAMIC_ARCH, NO_AFFINITY.

Example:
#include <stdio.h>
extern char * openblas_get_config(void);
int main(void)
{
  /* Print the build configuration string reported by the library. */
  printf("%s\n",openblas_get_config());
  return 0;
}
2012-12-10 15:52:51 +08:00
Zhang Xianyi bfaaa975e6 Added BULLDOZER target. So far it uses barcelona kernels. 2012-12-07 00:53:31 +08:00
Zhang Xianyi b7c0fa6bd2 Init AMD Bulldozer codebase. 2012-12-06 07:29:54 -05:00
Zhang Xianyi 6751f7b9a7 Fixed #157. Only detect the number of physical CPU cores on Mac OSX. 2012-11-13 15:48:57 +08:00
Zhang Xianyi 538c764d2b Refs #153. Restore the original CPU affinity when calling openblas_set_num_threads(1).
Please read the issue on github.com for the detail.
2012-11-06 18:21:46 +08:00
Zhang Xianyi 6c5899dff5 Don't use xgetbv instruction when NO_AVX=1 2012-10-09 14:52:35 +08:00
Zhang Xianyi 735ca38b8f Refs #139. Check OS supporting AVX on runtime. 2012-09-18 15:46:20 +08:00
Zhang Xianyi f76a384841 Refs #139. Added NO_AVX flag to use old Nehalem kernels on Sandy Bridge.
For example, make NO_AVX=1 or make DYNAMIC_ARCH=1 NO_AVX=1
2012-09-17 23:25:46 +08:00
Jameson Nash d0e731e8b8 provide support for passing CFLAGS, FFLAGS, PFLAGS, FPFLAGS to make on the command line 2012-08-21 00:31:12 -04:00
Zhang Xianyi fe4ab95cd5 Refs #136. Fixed a bug about controlling the number of threads on Windows. 2012-08-19 23:50:54 +08:00
Xianyi Zhang 801383effe Fixed a hang bug when shutting down the BLAS thread server on Windows. Added support for dynamically changing the number of threads on Windows. 2012-08-14 18:34:32 +08:00
Zhang Xianyi 54cd65e47f Use sandy bridge kernel when DYNAMIC_ARCH=1. 2012-08-13 15:25:08 +08:00
Zhang Xianyi a55821a2ec Refs #132. Kill the threads when unloading the library. 2012-08-11 21:33:15 +08:00
Zhang Xianyi d007cca61d Refs #134. Fixed the building bug on IBM Power. 2012-08-10 11:54:21 +08:00
Xianyi Zhang 25f1a573fd Fixed the build bug when DYNAMIC_ARCH=0. 2012-07-07 12:12:24 +08:00
Sylvestre Ledru 3692b4d631 Improve the detection of sparc 2012-07-02 02:51:38 +02:00
Xianyi Zhang a507b56ab1 Refs #119 #118. Fixed disabling hyper threading bug. 2012-06-29 15:53:24 +08:00
Xianyi Zhang 853d16ed7e Added openblas_set_num_threads dummy function on Windows. We plan to implement this feature in next version. 2012-06-23 13:07:38 +08:00
Zhang Xianyi 422359d09a Export openblas_set_num_threads in shared library. 2012-06-23 11:32:43 +08:00
Zhang Xianyi d3b67d0bd8 Refs #113. Fixed the typo BOBCATE -> BOBCAT 2012-05-31 22:40:15 +08:00
Zhang Xianyi d6cab3f37e Refs #113. Support AMD Bobcat using Barcelona kernel codes. Replace 3DNow! with MMX. 2012-05-31 18:17:45 +08:00
Zhang Xianyi 90d6ad569d Merge branch 'sandybridge' into develop
Just copy the kernel codes from Nehalem. The optimization is ongoing.
2012-05-31 12:44:55 +08:00
Xianyi Zhang a6adbb299d Refs #112. Improved setting thread affinity in Linux. Remove the limit (64) about the number of CPU cores. 2012-05-29 15:23:52 +08:00
Xianyi Zhang a53c6e2440 Merge branch 'develop' into sandybridge 2012-05-25 23:16:44 +08:00
Zaheer Chothia a431042475 Fix inconsistent case for OS_* macros (Refs pull request #111) 2012-05-23 00:01:14 +02:00
Mike Nolta 4e29b6ffc0 FreeBSD: fix OS_FreeBSD -> OS_FREEBSD typos 2012-05-21 16:57:19 -04:00
Xianyi Zhang 19a48b82cf Init Sandybridge codes based on Nehalem. 2012-03-30 20:01:03 +08:00
Xianyi Zhang 0b89a7a92d Ref #82. Disable outputting debug information in alloc_mmap. 2012-03-23 18:17:12 +08:00
Wang Qian 8163ab7e55 Change the block size on Loongson 3B. 2011-11-23 18:41:49 +00:00
Xianyi Zhang ef6f7f32ae Fixed mbind bug on Loongson 3B. Check the return value of my_mbind function. 2011-11-23 17:17:41 +00:00
Xianyi Zhang b95ad4cfaf Support detecting ICT Loongson-3B CPU. 2011-11-09 19:29:50 +00:00
traz 831858b883 Modify aligned address of sa and sb to improve the performance of multi-threads. 2011-09-23 20:59:48 +00:00
Xianyi Zhang 16fc083322 Refs #47. Fixed the parameter-setting bug on Loongson 3A single thread version. 2011-09-08 16:39:34 +00:00
Xianyi Zhang 3c856c0c1a Check the return value of pthread_create. Update the docs with known issue on Loongson 3A. 2011-09-06 18:27:33 +00:00
Xianyi Zhang 4727fe8abf Refs #47. On Loongson 3A, set DGEMM_R parameter depending on different number of threads. It would improve double precision BLAS3 on multi-threads. 2011-09-05 15:13:52 +00:00
Xianyi Zhang 82f5274828 Refs #39. It's unnecessary to include sys/mman.h file in blas_server_omp.c. 2011-06-22 01:52:20 +08:00
Xianyi Zhang 1496383224 Print the wall time (cycles) with enabling FUNCTION_PROFILE. 2011-06-09 10:40:15 +08:00
Xianyi Zhang af40551c9f Fixed the makefile bug about openblas_set_num_threads. 2011-05-27 21:15:30 +08:00
Xianyi Zhang 417b8ec792 Added openblas_set_num_threads for Fortran. 2011-05-06 17:03:35 +08:00
Xianyi Zhang 989c6f8b06 Fixed #14 the SEGFAULT bug on 64 cores. On SMP server, the number of CPUs or cores should be less than or equal to 64. 2011-04-07 14:48:10 +08:00
Xianyi Zhang e4bb6f2482 Fixed the detecting bug on Intel Core i5. Thanks to ggl329 for the patch. 2011-03-22 14:09:47 +08:00
Xianyi Zhang f7a5e049e2 Enable Debug flags in memory alloc and init functions. 2011-02-26 11:51:39 +08:00
Xianyi Zhang 128418f49b Fixed #10. Supported GOTO_NUM_THREADS & GOTO_THREADS_TIMEOUT environment variables. 2011-02-24 16:32:13 +08:00
Xianyi Zhang e51364edb4 Fixed #5 Detected Intel Westmere (using Nehalem codes) in build and dynamic arch build.
Thanks Cao He from Dawning supporting Intel Xeon 5660 testbed.
2011-02-19 00:03:50 +08:00
Xianyi Zhang e6c13e2b3c changed library name to openblas and modified environment variable. 2011-01-24 17:58:05 +00:00
Xianyi Zhang 5c9f1ebbf9 Fixed a bug when compiling dynamic ARCH x86 in GotoBLAS2. 2011-01-24 16:04:17 +00:00
Xianyi Zhang 342bbc3871 Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00