Martin Kroeker
7c861605b2
Catch invalid cpu count returned by CPU_COUNT_S
...
mips32 was seen to return zero here, driving nthreads to zero with subsequent fpe in blas_quickdivide
2018-04-14 18:29:10 +02:00
Martin Kroeker
d636b418af
Merge pull request #1504 from ararslan/aa/openbsd
...
Allow building on OpenBSD
2018-04-04 15:26:46 +02:00
Alex Arslan
a41d241a0e
Add support for DragonFly BSD
2018-04-03 16:39:29 -07:00
Alex Arslan
8da6b6ae52
Allow building on OpenBSD
...
With this change, OpenBLAS builds and all tests pass on OpenBSD 6.2
using Clang. Tested on x86-64 only, with and without DYNAMIC_ARCH=1.
2018-04-02 10:48:22 -07:00
Martin Kroeker
01c4b82f04
Update memory.c
2018-03-31 22:32:06 +02:00
Martin Kroeker
93db123f7e
Update memory.c
2018-03-29 13:13:49 +02:00
Martin Kroeker
752fdb5dd8
Add workaround for old gcc and clang versions
...
Old gcc and clang do not handle constructor arguments, finally fix #875 as discussed there, using the fedora patch
2018-03-29 11:56:56 +02:00
Martin Kroeker
7646974227
Limit the additional locking from PRs 1052,1299 to non-OpenMP multithreading
2018-02-21 11:45:33 +01:00
Martin Kroeker
8866e393a2
Revert "Add locks only for non-OPENMP multithreading"
2018-02-20 17:17:12 +01:00
Martin Kroeker
3119b2ab4c
Add locks only for non-OPENMP multithreading
...
to migitate performance problems caused by #1052 and #1299 as seen in #1461
2018-02-20 12:17:18 +01:00
Erik M. Bray
8f5f614615
On Cygwin use mmap instead of Windows native allocation functions, which are not fork-safe.
2018-02-06 12:23:27 +01:00
Erik M. Bray
f5fc109fbd
Perform blas_thread_shutdown with pthread_atfork() on Cygwin
...
Even if we're directly using the win32 threading driver and not pthreads,
pthread_atfork still works fine to register a pre-fork handler, and is
necessary to restore the threading server to a pre-initialized state.
2018-02-06 12:23:27 +01:00
Andrew
e5752ff9b3
take out unused variables
2018-01-20 11:42:31 +01:00
Andrew
8a0b086b28
add missing bracket for old glibc (cppcheck)
2018-01-12 22:35:48 +01:00
Andrew
8aafa0473c
address last warnings as seen by gcc7
2018-01-01 20:57:12 +01:00
Andrew
bfc2a88594
remove unused buffer
2017-12-22 00:55:40 +01:00
Martin Kroeker
28ae3ca76f
Limit MAX_CPU to 1024 for now
...
Some Linux distributions (notably SuSE) have raised CPU_SETSIZE to 4096, apparently disregarding API limitations.
From #1348 , the highest value to survive array initialization (on a desktop system) is 3232, and 1024 - which is the
more usual CPU_SETSIZE limit, was demonstrated to work fine on an actual bignuma system.
2017-12-05 12:54:15 +01:00
Martin Kroeker
07e7c36dac
Handle shmem init failures in cpu affinity setup code
...
Failures to obtain or attach shared memory segments would lead to an exit without explanation of the exact cause.
This change introduces a more verbose error message and tries to make the code continue without setting cpu affinity.
Fixes #1351
2017-11-18 23:57:44 +01:00
Martin Kroeker
2a6fef9a55
Try to handle shmget or shmat failing
...
also replaces one verbatim sched_yield with the YIELDING macro for consistency as suggested in #1351
2017-11-09 23:16:13 +01:00
Martin Kroeker
514d237257
Merge pull request #1279 from xsacha/develop
...
CMake improvements
2017-10-06 21:13:45 +02:00
Martin Kroeker
ba1f91f17b
Convert another caller of "allocation" to LOCK_COMMAND
...
... as the "allocation" code jumped to now does UNLOCK_COMMAND instead of blas_unlock
2017-09-09 20:30:33 +02:00
Martin Kroeker
f460776f0f
Fix thread data races
2017-09-09 19:07:06 +02:00
Martin Kroeker
e882f3d6f3
Fix thread data race in memory.c
2017-09-09 18:58:38 +02:00
Sacha Refshauge
37858d1146
Fix threading usage in CMake: s/SMP/USE_THREAD/
2017-08-19 15:07:42 +10:00
Isuru Fernando
2f12ea017b
No strncasecmp with MSVC
2017-08-08 00:07:25 +05:30
Martin Kroeker
ebb04e3265
Merge pull request #1259 from isuruf/cmake
...
CMake Improvements
2017-08-02 15:31:05 +02:00
Isuru Fernando
d245caa49a
Support out-of-source build
2017-08-01 15:16:14 +05:30
Martin Kroeker
63cfa32691
Rework __GLIBC_PREREQ checks to avoid breaking non-glibc builds
2017-07-31 21:02:43 +02:00
Martin Kroeker
c4af196a2d
Honor cgroup/cpuset limits when enumerating cpus
2017-07-25 22:47:34 +02:00
Martin Kroeker
480e697681
Revert "Honor cgroup/cpuset limits when enumerating cpus" ( #1246 )
2017-07-24 16:17:50 +02:00
Martin Kroeker
80373ea039
More fixes for silly misedits
2017-07-15 12:48:42 +02:00
Martin Kroeker
d12b75a6c4
Fixup braces lost in previous edit
2017-07-15 11:53:28 +02:00
Martin Kroeker
7294fb1d9d
Merge branch 'develop' into cgroups
2017-07-15 10:40:42 +02:00
Martin Kroeker
731c518cff
Add files via upload
2017-07-11 18:42:39 +02:00
Martin Kroeker
29fc429d9a
Honor cgroup/cpuset constraints when enumerating cpus
2017-07-11 18:27:33 +02:00
Martin Kroeker
3db2adf872
Merge pull request #1230 from martin-frbg/rhel5
...
Add sched_getcpu implementation for pre-2.6 glibc
2017-07-09 13:16:16 +02:00
Martin Kroeker
c1cf62d2c0
Add sched_getcpu implementation for pre-2.6 glibc
...
Fixes #1210 , compilation on RHEL5 with affinity enabled
2017-07-09 09:45:38 +02:00
Neil Shipp
34513be726
Add Microsoft Windows 10 UWP build support
2017-06-23 13:07:34 -07:00
Neil Shipp
65e56cb29d
Add 64bit support for Microsoft Visual Studio
2017-06-21 13:38:22 -07:00
James Cowgill
59c97cfee4
memory: Fix buffer overflow when position == NUM_BUFFERS
2017-05-05 17:47:03 +01:00
James Cowgill
5fecfe0f42
memory: switch loop condition around in blas_memory_free
...
Before this commit, the "position < NUM_BUFFERS" loop condition from
blas_memory_free will be completely optimized away by GCC. This is
because the condition can only be false after undefined behavior has
already been invoked (reading past the end of an array). As a
consequence of this bug, GCC also removes the subsequent if statement
and all the code after the error label because all of it is dead.
This commit switches the loop condition around so it works as intended.
2017-05-05 16:01:58 +01:00
Gian-Carlo Pascutto
9c884986ad
Add an extra familiy/model combination used by AMD Steamrolller (Godavari).
2017-04-19 19:15:47 +02:00
Gian-Carlo Pascutto
0cbd2d34e4
Recognize ZEN when passed as OPENBLAS_CORETYPE.
2017-04-10 20:05:16 +02:00
Gian-Carlo Pascutto
62979fd104
Fix dynamic detection for ZEN CPUs.
2017-04-10 19:08:37 +02:00
Denis Steckelmacher
c9ff735da6
Add ZEN support (tested for auto-detected static backend)
2017-03-19 15:32:50 +01:00
Martin Kroeker
ffc1d6c468
Merge pull request #1108 from ashwinyes/develop_20170203_thunderx2t99
...
Optimized Implementations for ThunderX2T99
2017-02-28 16:02:19 +01:00
Ashwin Sekhar T K
a86474c6f7
THUNDERX2T99: Performance fix for ZGEMM
2017-02-28 06:05:00 -08:00
Ashwin Sekhar T K
19ba133383
THUNDERX2T99: Add Optimized ZGEMM Implementation
2017-02-28 05:31:41 +00:00
Andrew
5088523786
detect apollo lake for real
2017-02-20 23:54:59 +01:00
Elliot Saba
1d8ab99e09
Add `exfamily == 9` case (Kaby Lake) to dynamic arch detection
2017-02-10 15:23:55 -08:00
Martin Koehler
76c6e33e54
Enable EXCAVATOR kernels for A12-9800
2017-02-07 21:38:28 +01:00
Ashwin Sekhar T K
2757b49767
THUNDERX2T99: Add Optimized CGEMM Implementation
2017-01-30 17:44:26 +05:30
Ashwin Sekhar T K
f279ff4789
THUNDERX2T99: Add Optimized SGEMM Implementation
2017-01-16 21:44:33 +05:30
Zhang Xianyi
0863a0d4b4
Merge pull request #1061 from ashwinyes/develop_aarch64_vulcan_thunderx_patch
...
Add new targets for ARM64
2017-01-16 13:20:10 +08:00
Werner Saar
c1c5a63d3c
prepared parameter.c for UNROLL values, that are not a power of two
2017-01-11 09:50:28 +01:00
Ashwin Sekhar T K
4b55fae337
ARM64: Add Cavium THUNDERX2T99 Target
2017-01-11 11:18:40 +05:30
Ashwin Sekhar T K
0b8e876d89
VULCAN: Add optimized DGEMM implementation
2017-01-10 15:01:37 +05:30
Ashwin Sekhar T K
4713e7c47f
ARM64: Add the VULCAN Target
2017-01-10 15:01:17 +05:30
jiahaipeng
1aa1e6cb54
modify the blas_l1_thread.c for support multi-threded for L1 fuction with return value
2017-01-10 11:47:06 +08:00
Martin Kroeker
51aa157e64
Relocate declaration of alloc_lock outside ifdef block
2017-01-09 01:10:43 +01:00
Martin Kroeker
87c7d10b34
Fix thread data races detected by helgrind 3.12
...
Ref. #995 , may possibly help solve issues seen in 660,883
2017-01-08 23:33:51 +01:00
Martin Kroeker
0ef7841473
Update xerbla.c
2017-01-04 23:16:48 +01:00
Martin Kroeker
104ad066af
Use appropriate int32/int64 format for error number in message string
2016-12-30 00:45:59 +01:00
Alex Arslan
a16ace68f5
Include system headers on FreeBSD
2016-11-16 21:58:20 -08:00
Martin Kroeker
596ead0f8d
Add files via upload
2016-11-06 23:26:39 +01:00
Zhang Xianyi
66c9a9b33d
Merge pull request #981 from howard0su/develop
...
USE NPROCESSOR_CONF instaed of NPORCESSOR_ONLN
2016-10-17 11:32:57 +08:00
Martin Kroeker
8a8f3932eb
Update dynamic.c
...
Add Bay Trail "Pentium N3520" atom
2016-10-16 22:40:00 +02:00
Howard Su
ff1da01476
USE NPROCESSOR_CONF instaed of NPORCESSOR_ONLN
...
to determine the number of CPU. In ARM platform,
online CPU will increasing when there is more workload.
while configure cpu is the max number of CPU.
2016-10-13 12:37:50 +00:00
Zhang Xianyi
ef52a9266b
Fixed #979 . Patch for NetBSD.
2016-10-13 10:17:07 +08:00
Martin Kroeker
7de829f713
Update dynamic.c
...
Add Braswell (extended model 4, model 12) N3150 as Nehalem
2016-07-14 12:22:55 +02:00
John Biddiscombe
053044ae4d
Replace CMAKE_SOURCE_DIR/CMAKE_BINARY_DIR with PROJECT_SOURCE_DIR/PROJECT_BINARY_DIR
...
If OpenBLAS is built using add_subdirectory(OpenBlas) as part of another project
then the paths set by CMAKE_XXX_DIR are relative to the parent project
and not the OpenBLAS project.
2016-05-25 09:13:28 +02:00
Ashwin Sekhar T K
0fb380c966
Update NUMA CPU binding
...
When the number of process can all be
accommodated within the current node,
then use cores from the current node only.
2016-04-29 11:58:15 +05:30
Werner Saar
78b05f6476
bugfix for EXCAVATOR and DYNAMIC_ARCH
2016-04-25 10:13:30 +02:00
Werner Saar
2b967590a0
bugfix in dynamic.c
2016-04-25 09:08:38 +02:00
Theoractice
aa744dfa59
Update memory.c
2016-03-22 20:02:37 +08:00
theoractice
61cf8f74d9
Fix access violation on Windows while static linking
2016-03-22 19:14:54 +08:00
Zhang Xianyi
68eb4fa329
Add missing openblas_env makefile.
2016-03-09 14:52:47 -05:00
Zhang Xianyi
05196a8497
Refs #716 . Only call getenv at init function.
2016-03-09 12:50:07 -05:00
Zhang Xianyi
1edf30b790
Change Opteron(SSE3) to Opteron_SSE3 at dyanmaic core name.
2016-03-01 20:13:08 +08:00
Zhang Xianyi
6b85dbb6dc
Refs #696 . Turn off stack limit setting on Linux.
...
I cannot reproduce SEGFAULT of lapack-test with default stack size
on ARM Linux.
2016-02-24 14:21:42 -05:00
Martin Kroeker
935356c34f
Update dynamic.c and cpuid_x86.c for Intel Avoton.
...
Second part of "support Intel Avoton via Nehalem kernel"
2016-02-02 13:42:55 -05:00
Zhang Xianyi
f5df444ceb
Merge pull request #762 from jeromerobert/bug760
...
Let openblas_get_num_threads return the number of active threads
2016-01-26 08:45:16 -06:00
Zhang Xianyi
aaa8551c57
Merge pull request #749 from lotheac/illumos_fixes
...
illumos fixes
2016-01-26 08:42:20 -06:00
Jerome Robert
0d87c1ffb6
Let openblas_get_num_threads return the number of active threads
...
... not the number of allocated threads.
Close #760
2016-01-26 13:04:16 +01:00
Lauri Tirkkonen
e737e32fd1
RLIMIT_NPROC doesn't exist on illumos
2016-01-22 18:55:51 +02:00
Lauri Tirkkonen
97cd4b8aee
illumos fixes to memory.c
2016-01-22 18:55:43 +02:00
Werner Saar
0d22551a6b
increase the stack size limit in the constructor
2015-11-20 09:23:01 +01:00
Ralph Campbell
fbc21266e6
Minor C code fixes in driver/
2015-11-09 14:15:49 +05:30
Zhang Xianyi
839395fc25
Detect AMD Trinity and Richland.
2015-10-29 02:53:29 +08:00
j-bo
6040858b22
Fix #673
...
Add lacking headers declarations when compiling for Android ARM7
2015-10-27 13:55:24 +01:00
Zhang Xianyi
70642fe4ed
Refs #668 . Raise the signal when pthread_create fails.
...
Thank James K. Lowden for the patch.
2015-10-26 19:02:51 -05:00
Zhang Xianyi
2feef49fa8
Merge branch 'develop' into cmake
...
Conflicts:
driver/others/memory.c
2015-10-26 14:54:34 -05:00
Zhang Xianyi
1ce054fcb3
Refs #669 . Fixed the build bug with gcc on Mac OS X.
2015-10-22 11:07:35 -05:00
Zhang Xianyi
94b125255f
Merge branch 'develop' into cmake
...
Conflicts:
driver/others/memory.c
2015-10-13 04:46:08 +08:00
Zhang Xianyi
11ac4665c8
Fixed #654 . Make sure the gotoblas_init function is run before all other static initializations.
2015-10-05 14:14:32 -05:00
Zhang Xianyi
cc7cab8a45
Detect other Intel Skylake cores.
...
http://users.atw.hu/instlatx64/
2015-09-09 10:47:17 -05:00
Yichao Yu
61ae47eb99
Ref #632 . Support Intel Skylake by Haswell kernels.
2015-09-09 11:07:33 -04:00
Grazvydas Ignotas
d3e2f0a1af
add missing barriers
...
should fix issue #597
2015-08-16 15:37:02 +02:00
Zhang Xianyi
f874465bb8
Use cmake to build OpenBLAS GENERIC Target on MSVC x86 64-bit.
...
Disable CBLAS and LAPACK.
2015-08-10 14:10:44 -05:00
Zhang Xianyi
dcd5ba4443
Merge branch 'cmake' of https://github.com/hpanderson/OpenBLAS into hpanderson_cmake
2015-07-22 04:06:39 +08:00
Zhang Xianyi
a11555c715
Support Android NDK armeabi-v7a-hard ABI. (-mfloat-abi=hard)
...
e.g.
make HOSTCC=gcc CC=arm-linux-androideabi-gcc NO_LAPACK=1 TARGET=ARMV7
In Android NDK, it uses armeabi-v7a-hard ABI.
TARGET_CFLAGS += -mhard-float -D_NDK_MATH_NO_SOFTFP=1
TARGET_LDFLAGS += -Wl,--no-warn-mismatch -lm_hard
For more information, please check hard-float example at
android_ndk/tests/device/hard-float/jni/.
2015-05-20 21:57:27 -05:00
Zhang Xianyi
51ff17d46e
Add AMD Excavator target.
2015-05-13 16:16:30 -05:00
powderluv
ebb9eba987
Fix build with ALLOC_SHM=0 (Android NDK)
...
Refactor such that you can build with ALLOC_SHM=0. HughTLB
implicity depends on ALLOC_SHM=1. This patch allows
building for Android NDK r10d.
2015-05-10 00:10:26 -07:00
Zhang Xianyi
9798481979
Refs #478 , #482 . Fix segfault bug for gemv_t with MAX_ALLOC_STACK flag.
...
For gemv_t, directly use malloc to create the buffer.
2015-04-13 19:45:27 -05:00
Zhang Xianyi
8977b3f235
Refs #529 . Support Intel Broadwell by Haswell kernels.
2015-04-02 11:08:03 -05:00
Zhang Xianyi
e95d64333a
Refs #519 . Avoid calling strncpy.
2015-03-19 15:57:22 -05:00
Ton van den Heuvel
b6438dedea
Fix issue #508
...
Fix race condition during shutdown causing a crash in
gotoblas_set_affinity().
2015-03-18 13:22:43 +01:00
Hank Anderson
5ae8993752
Added intrinsics for MSVC.
2015-02-25 11:52:51 -06:00
Hank Anderson
84d90d6ed8
Fixed some compiler errors/warnings for clang.
2015-02-25 11:52:25 -06:00
Hank Anderson
0d8e227ea7
Changed strategy for setting preprocessor definitions.
...
Instead of generating separate object files for each permutation of
defines for a source file, GenerateNamedObjects now writes an entirely
new source file and inserts the defines as #define c statements.
This solves a problem I ran into with ar.exe where it was refusing to
link objects that had the same filename despite having different paths.
2015-02-24 12:26:33 -06:00
Hank Anderson
4662a0b13a
Changed generate functions to iterate through a list of float types.
...
This will generate obj files for SINGLE/DOUBLE/COMPLEX/DOUBLE COMPLEX.
2015-02-15 17:44:37 -06:00
Hank Anderson
c94fe71278
Removed incoming-stack-boundary for MSVC.
...
Made float type optional for GenerateNamedObjects.
Called GenerateNamedObjects for a couple of driver/others files that
needed NAME/CNAME set.
2015-02-11 10:54:14 -06:00
Hank Anderson
7fa5c4e2fd
Fixed some case issues with ARCH.
...
Added some kernel and driver/others objects.
2015-02-08 15:29:18 -06:00
Zhang Xianyi
cfa9392ffa
Fix openblas_get_num_threads and openblas_get_num_procs bug with single thread.
2015-02-08 01:30:23 -06:00
Hank Anderson
2828f6630c
Added SMP sources to COMMONOBJS.
2015-02-04 14:01:36 -06:00
Erik Schnetter
65a847cd36
Introduce openblas_get_num_threads and openblas_get_num_procs
2015-02-03 12:23:41 -05:00
Hank Anderson
7194424fef
Added missing common objects to the library.
2015-02-02 15:21:29 -06:00
Hank Anderson
5057a4b4df
Added openblas add_library call that uses DBLAS_OBJS ojbects.
2015-01-30 15:21:21 -06:00
Hank Anderson
3e8ea7a351
Added COMMONOBJS to driver/others CMakeLists.txt.
2015-01-30 14:06:14 -06:00
Hank Anderson
8d9b196e0d
Moved loop over define combos into a function.
...
This function takes a set of sources and a set of preprocessor
definitions. It will iterate over the sources and build an object
file for each combination of preprocessor definitions for each
source file.
2015-01-30 12:14:44 -06:00
Werner Saar
0dc559ed30
bugfix in dynamic.c
2014-12-28 17:15:42 +01:00
Werner Saar
4319769b79
added target processor STEAMROLLER
2014-12-28 20:16:46 +08:00
Zhang Xianyi
2fb02626da
Update organization info.
2014-11-25 15:28:58 +08:00
Zhang Xianyi
695e0fa649
#463 fixed a compiling bug on AIX.
2014-11-10 14:39:56 +08:00
wernsaar
a64fe9bcc9
added optimized sgemv_n kernel for sandybridge
2014-09-06 08:41:53 +02:00
wernsaar
2021d0f9d6
experimentally removed expensive function calls
2014-09-05 15:05:53 +02:00
Isaac Dunham
f7eb81a846
Fix link error on Linux/musl.
...
get_nprocs() is a GNU convenience function equivalent to POSIX2008
sysconf(_SC_NPROCESSORS_ONLN); the latter should be available in unistd.h
on any current *nix. (OS X supports this call since 10.5, and FreeBSD
currently supports it. But this commit does not change FreeBSD or OS X
versions.)
2014-08-03 15:06:30 -07:00
wernsaar
793175be3a
added experimental support for big numa machines
2014-08-02 13:40:16 +02:00
Zhang Xianyi
c94762bb56
Refs #401 . Added NO_AVX2 flag for old binutils (e.g. RHEL6)
2014-07-16 08:38:25 +08:00
Zhang Xianyi
552119c484
Fixed #407 . Support outputing the CPU corename on runtime.
...
The user can use char * openblas_get_config() or char * openblas_get_corename().
2014-07-08 12:48:08 +08:00
wernsaar
50e99a52ea
added definitions for PILEDRIVER and HASWELL
2014-07-06 12:08:27 +02:00
Zhang Xianyi
7a8949e0ce
Merge branch 'develop' of https://github.com/TimothyGu/OpenBLAS into TimothyGu-develop
...
Conflicts:
driver/others/memory.c
2014-06-28 20:51:31 +08:00
Timothy Gu
6c2ead30f0
Remove all trailing whitespace except lapack-netlib
...
Signed-off-by: Timothy Gu <timothygu99@gmail.com>
2014-06-27 12:05:18 -07:00
Jameson Nash
f41f03ab83
fix #394 . this cleans up some handles after using them, and doesn't disable ALL process privileges upon success
2014-06-27 12:16:57 -04:00
wernsaar
438002204d
Ref #393 : fix for INTERFACE64=0 and ARCH_X86 in divtable
2014-06-21 12:29:23 +02:00
wernsaar
53bfa51ee0
Ref #385 : fixed warnings in dynamic.c
2014-06-12 18:17:08 +02:00
wernsaar
a86d349a51
Ref #380 : enhancements for dynamic_arch
2014-06-12 14:20:03 +02:00
wernsaar
a35a1a9ae7
changed makefiles for lapack development
2014-05-07 11:33:02 +02:00
Olivier Grisel
2c556f093a
Add cast to function pointer to remove warning
2014-02-25 11:08:32 +01:00
Olivier Grisel
3b027d2528
Do not reference pthread_atfork in non-SMP_SERVER mode
2014-02-25 11:08:32 +01:00
Olivier Grisel
49bd98f410
Do not reference pthread_atfork under windows
2014-02-19 19:25:48 +01:00
Olivier Grisel
138a841390
FIX #294 : make OpenBLAS thread-pool resilient to fork via pthread_atfork
2014-02-19 19:01:15 +01:00
Olivier Grisel
046e4013cb
Revert "Refs #294 . Used pthread_atfork to avoid hang after a Unix fork."
...
This reverts commit 3617c22a56
.
2014-02-19 18:32:54 +01:00
Zhang Xianyi
3617c22a56
Refs #294 . Used pthread_atfork to avoid hang after a Unix fork.
...
The problem is the mutex we used in blas_server. Thus, we must clear
the mutex before the fork and re-init them at parent and child process.
If you used OpenMP, GOMP has the same problem by now. Please try other OpenMP
implemantation.
2014-02-18 15:36:04 +08:00
Zhang Xianyi
8c7687b419
Refs #338 . Added OPENBLAS_VERBOSE environment variable on runtime
...
By default, OpenBLAS doesn't output the warning message. You can set
OPENBLAS_VERBOSE (e.g. export OPENBLAS_VERBOSE=1) to enable the warning
message on runtime.
2014-01-24 02:05:59 +08:00
Zhang Xianyi
ab69443bd4
Refs #332 . Added addtional Intel Ivy Bridge and Haswell CPU-id.
2014-01-05 23:44:29 +08:00
Zhang Xianyi
b263e096af
Refs #307 . Delete debug printf.
2013-12-31 15:53:13 +08:00
wernsaar
0b6e13b689
Merge remote branch 'origin/develop' into haswell
2013-12-01 13:38:11 +01:00
wernsaar
5c648a8984
Merge remote branch 'origin/develop' into haswell
2013-12-01 11:25:33 +01:00
Zhang Xianyi
5048a80032
Refs #283 . Fixed the incorrect usage of long data type for Windows 64.
2013-11-14 13:46:42 +08:00
Zhang Xianyi
a2942456ef
Refs #307 . Fixed the hang bug when free OpenBLAS dll in Windows.
2013-11-13 10:00:18 +08:00
Sébastien Villemot
eae4cfa3f6
Avoid failure on qemu guests declaring an Athlon CPU without 3dnow!
...
The present patch verifies that, on machines declaring an Athlon CPU model and
family, the 3dnow and 3dnowext feature flags are indeed present. If they are
not, it fallbacks on the most generic x86 kernel. This prevents crashes due to
illegal instruction on qemu guests with a weird configuration.
Closes #272
2013-08-28 14:29:42 +02:00
Zhang Xianyi
2638370844
Init code base for Intel Haswell.
2013-08-13 00:54:59 +08:00
Zhang Xianyi
673e453b3f
Enable bulldozer kernels.
2013-08-05 16:07:54 +08:00
Zhang Xianyi
143cca4dd5
Merge branch 'develop' into bulldozer
2013-08-05 15:51:53 +08:00
Zhang Xianyi
534c5ec919
Fixed #261 . Use strncmp instead of a comparing trick.
2013-07-29 16:48:35 +08:00
Zhang Xianyi
5b504d6c23
Refs #263 . Rollback bulldozer and piledriver kernels to barcelona kernels.
2013-07-28 17:39:24 +08:00
Zhang Xianyi
72b1edaf1b
Merge branch 'develop' into bulldozer
...
Conflicts:
kernel/x86_64/KERNEL.BULLDOZER
2013-07-28 06:38:25 +02:00
Zhang Xianyi
4471c77905
Fixed #261 . Use strncmp instead of a comparing trick.
2013-07-26 23:43:54 +08:00
Zhang Xianyi
2a7503e563
Refs #225 . Fixed a bug in GEMM OpenMP threading.
2013-07-15 09:56:19 +08:00
grisuthedragon
c19a488af2
create openblas_get_parallel to retrieve information which
...
parallelization model is used by OpenBLAS.
2013-07-11 21:39:19 +08:00
Zhang Xianyi
f54f5bac9e
Refs #248 . Fixed the LSB compatiable issue for BLAS only.
...
For example, make CC=lsbcc NO_LAPACK=1.
2013-07-09 15:38:03 +08:00
Zhang Xianyi
5d3312142a
Refs #221 #246 . Fixed the overflowing stack bug in mutlithreading BLAS3.
...
When NUM_THREADS(MAX_CPU_NUNBERS) is very large ,e.g. 256.
typedef struct {
volatile BLASLONG working[MAX_CPU_NUMBER][CACHE_LINE_SIZE * DIVIDE_RATE];
} job_t;
job_t job[MAX_CPU_NUMBER];
The job array is equal 8MB.
Thus, We use malloc instead of stack allocation.
2013-07-08 01:07:05 +08:00
Zhang Xianyi
886cbaf4e4
Support AMD Piledriver by bulldozer kernels.
2013-07-06 12:06:43 -03:00
Zhang Xianyi
32dbeb636d
Refs #221 . Set stack limit to 16MB to prevent a SEGFAULT bug on Mac OS X with DYNAMIC_ARCH=1 & NUM_THREADS=256.
2013-07-02 14:17:55 +08:00
Dan Luu
88ef307cef
Refs #241 . Add Haswell support (using sandybridge optimizations)
2013-06-30 22:35:14 +08:00
Zhang Xianyi
65ffead0cf
Refs #124 . Check XSAVE flag on x86 CPU.
2013-06-06 22:50:43 +08:00
Zhang Xianyi
f1ce74ffdd
Improved the print when OS don't support AVX.
2013-03-02 14:15:54 +08:00
Zhang Xianyi
d744c9590a
In OpenMP threading, preallocate the thread buffer instead of allocating the buffer every time. This patch improved the performance slightly.
2013-03-01 14:36:47 +08:00
Zhang Xianyi
3cc6ae793e
Refs #174 . Return sb pointer when OpenMP or Windows.
2013-02-26 00:48:21 +08:00
Zhang Xianyi
5155e3f509
Refs #174 . Fixed the overflowing buffer bug of multithreading hbmv and sbmv.
...
Instead of using thread 0 buffer, each thread uses its own sb buffer.
Thus, it can avoid overflowing thread 0 buffer.
2013-02-13 16:05:58 +08:00
Zhang Xianyi
5c8bf6ae0e
Merge branch 'bulldozer' into develop
2013-02-10 01:19:42 +08:00
Zhang Xianyi
6ae2f868fd
Set the affinity. Only use 1 core of each module on bulldozer.
2013-02-09 18:19:02 +01:00
Zhang Xianyi
299b5a44dc
Merge branch 'develop' of github.com:xianyi/OpenBLAS into bulldozer
2013-02-09 16:22:04 +01:00
Zhang Xianyi
8cdb795438
Refs #187 . Use binary code for xgetbv, which is compatible with old compiler.
2013-01-22 00:25:08 +08:00
Zhang Xianyi
a4ee6f3915
Fixed #172 . Support Intel Xeon E7540.
2012-12-18 08:57:46 +08:00
Zhang Xianyi
fba6b590f2
Merge branch 'master' into develop
2012-12-15 22:49:37 +08:00
Julian Taylor
1138817dd2
add a sanity check on the detected cpu type
...
if we have 64 bit pointers we can't have a 32 bit cpu, so fall back to
the 64bit cpu fallback (prescott)
E.g. the cpu detection fails in amd qemu64 emulation (family 6 model 2)
causing it to use the uninitialized gotoblas_ATHLON
2012-12-15 13:29:46 +01:00
Zhang Xianyi
bdf8d9411e
Refs #163 . Obtain the build configure on runtime.
...
openblas_get_config function returns the configure string.
So far, it supports USE64BITINT, NO_CBLAS, NO_LAPACK, NO_LAPACKE,
DYNAMIC_ARCH, NO_AFFINITY.
Example:
#include <stdio.h>
extern char * openblas_get_config();
void main()
{
printf("%s\n",openblas_get_config());
return;
}
2012-12-10 15:52:51 +08:00
Zhang Xianyi
bfaaa975e6
Added BULLDOZER target. So far it uses barcelona kernels.
2012-12-07 00:53:31 +08:00
Zhang Xianyi
b7c0fa6bd2
Init AMD Bulldozer codebase.
2012-12-06 07:29:54 -05:00
Zhang Xianyi
6751f7b9a7
Fixed #157 . Only detect the number of physical CPU cores on Mac OSX.
2012-11-13 15:48:57 +08:00
Zhang Xianyi
538c764d2b
Refs #153 . Restore the original CPU affinity when calling openblas_set_num_threads(1).
...
Please read the issue on github.com for the detail.
2012-11-06 18:21:46 +08:00
Zhang Xianyi
6c5899dff5
Don't use xgetbv instruction when NO_AVX=1
2012-10-09 14:52:35 +08:00
Zhang Xianyi
735ca38b8f
Refs #139 . Check OS supporting AVX on runtime.
2012-09-18 15:46:20 +08:00
Zhang Xianyi
f76a384841
Refs #139 . Added NO_AVX flag to use old Nehalem kernels on Sandy Bridge.
...
For example, make NO_AVX=1 or make DYNAMIC_ARCH=1 NO_AVX=1
2012-09-17 23:25:46 +08:00
Jameson Nash
d0e731e8b8
provide support for passing CFLAGS, FFLAGS, PFLAGS, FPFLAGS to make on the command line
2012-08-21 00:31:12 -04:00
Zhang Xianyi
fe4ab95cd5
Refs #136 . Fixed a bug about controlling the number of threads on Windows.
2012-08-19 23:50:54 +08:00
Xianyi Zhang
801383effe
Fixed a hang bug when shutdown blas threads server on Windows. Added the feature about dynamic changing the number of threads on Windows.
2012-08-14 18:34:32 +08:00
Zhang Xianyi
54cd65e47f
Use sandy bridge kernel when DYNAMIC_ARCH=1.
2012-08-13 15:25:08 +08:00
Zhang Xianyi
a55821a2ec
Refs #132 . Kill the threads when unload the library.
2012-08-11 21:33:15 +08:00
Zhang Xianyi
d007cca61d
Refs #134 . Fixed the building bug on IBM Power.
2012-08-10 11:54:21 +08:00
Xianyi Zhang
25f1a573fd
Fixed the build bug when DYNAMIC_ARCH=0.
2012-07-07 12:12:24 +08:00
Sylvestre Ledru
3692b4d631
Improve the detection of sparc
2012-07-02 02:51:38 +02:00
Xianyi Zhang
a507b56ab1
Refs #119 #118 . Fixed disabling hyper threading bug.
2012-06-29 15:53:24 +08:00
Xianyi Zhang
853d16ed7e
Added openblas_set_num_threads dummy function on Windows. We plan to implement this feature in next version.
2012-06-23 13:07:38 +08:00
Zhang Xianyi
422359d09a
Export openblas_set_num_threads in shared library.
2012-06-23 11:32:43 +08:00
Zhang Xianyi
d3b67d0bd8
Refs #113 . Fixed the typo BOBCATE -> BOBCAT
2012-05-31 22:40:15 +08:00
Zhang Xianyi
d6cab3f37e
Refs #113 . Support AMD Bobcate using Barcelona kernel codes. Replace 3DNow! with MMX.
2012-05-31 18:17:45 +08:00
Zhang Xianyi
90d6ad569d
Merge branch 'sandybridge' into develop
...
Just copy the kernel codes from Nehalem. The optimization is ongoing.
2012-05-31 12:44:55 +08:00
Xianyi Zhang
a6adbb299d
Refs #112 . Improved setting thread affinity in Linux. Remove the limit (64) about the number of CPU cores.
2012-05-29 15:23:52 +08:00
Xianyi Zhang
a53c6e2440
Merge branch 'develop' into sandybridge
2012-05-25 23:16:44 +08:00
Zaheer Chothia
a431042475
Fix inconsistent case for OS_* macros (Refs pull request #111 )
2012-05-23 00:01:14 +02:00
Mike Nolta
4e29b6ffc0
FreeBSD: fix OS_FreeBSD -> OS_FREEBSD typos
2012-05-21 16:57:19 -04:00
Xianyi Zhang
19a48b82cf
Init Sandybridge codes based on Nehalem.
2012-03-30 20:01:03 +08:00
Xianyi Zhang
0b89a7a92d
Ref #82 . Disable outputing debug information in alloc_mmap.
2012-03-23 18:17:12 +08:00
Wang Qian
8163ab7e55
Change the block size on Loongson 3B.
2011-11-23 18:41:49 +00:00
Xianyi Zhang
ef6f7f32ae
Fixed mbind bug on Loongson 3B. Check the return value of my_mbind function.
2011-11-23 17:17:41 +00:00
Xianyi Zhang
b95ad4cfaf
Support detecting ICT Loongson-3B CPU.
2011-11-09 19:29:50 +00:00
traz
831858b883
Modify aligned address of sa and sb to improve the performance of multi-threads.
2011-09-23 20:59:48 +00:00
Xianyi Zhang
16fc083322
Refs #47 . Fixed the seting parameter bug on Loongson 3A single thread version.
2011-09-08 16:39:34 +00:00
Xianyi Zhang
3c856c0c1a
Check the return value of pthread_create. Update the docs with known issue on Loongson 3A.
2011-09-06 18:27:33 +00:00
Xianyi Zhang
4727fe8abf
Refs #47 . On Loongson 3A, set DGEMM_R parameter depending on different number of threads. It would improve double precision BLAS3 on multi-threads.
2011-09-05 15:13:52 +00:00
Xianyi Zhang
82f5274828
Refs #39 . It's unnecessary to include sys/mman.h file in blas_server_omp.c.
2011-06-22 01:52:20 +08:00
Xianyi Zhang
1496383224
Print the wall time (cycles) with enabling FUNCTION_PROFILE.
2011-06-09 10:40:15 +08:00
Xianyi Zhang
af40551c9f
Fixed the makefile bug about openblas_set_num_threads.
2011-05-27 21:15:30 +08:00
Xianyi Zhang
417b8ec792
Added openblas_set_num_threads for Fortran.
2011-05-06 17:03:35 +08:00
Xianyi Zhang
989c6f8b06
Fixed #14 the SEGFAULT bug on 64 cores. On SMP server, the number of CPUs or cores should be less than or equal to 64.
2011-04-07 14:48:10 +08:00
Xianyi Zhang
e4bb6f2482
Fixed the detecting bug on Intel Core i5. Thank ggl329 for the patch.
2011-03-22 14:09:47 +08:00
Xianyi Zhang
f7a5e049e2
Enable Debug flags in memory alloc and init functions.
2011-02-26 11:51:39 +08:00
Xianyi Zhang
128418f49b
Fixed #10 . Supported GOTO_NUM_THREADS & GOTO_THREADS_TIMEOUT environment variables.
2011-02-24 16:32:13 +08:00
Xianyi Zhang
e51364edb4
Fixed #5 Detected Intel Westmere (using Nehalem codes) in build and dynamic arch build.
...
Thanks Cao He from Dawning supporting Intel Xeon 5660 testbed.
2011-02-19 00:03:50 +08:00
Xianyi Zhang
e6c13e2b3c
changed library name to openblas and modified environment variable.
2011-01-24 17:58:05 +00:00
Xianyi Zhang
5c9f1ebbf9
Fixed a bug when compiling dynamic ARCH x86 in GotoBLAS2.
2011-01-24 16:04:17 +00:00
Xianyi Zhang
342bbc3871
Import GotoBLAS2 1.13 BSD version codes.
2011-01-24 14:54:24 +00:00