Martin Kroeker
585c0010a5
Fix range limit exceeding actual data size in last step
2017-07-28 00:27:02 +02:00
Martin Kroeker
857f61bc5d
Fix range limit exceeding data size in last step
2017-07-28 00:21:53 +02:00
Martin Kroeker
9332042d5f
Fix range exceeding actual data size in quick_divide
2017-07-28 00:13:24 +02:00
Martin Kroeker
c4af196a2d
Honor cgroup/cpuset limits when enumerating cpus
2017-07-25 22:47:34 +02:00
Martin Kroeker
480e697681
Revert "Honor cgroup/cpuset limits when enumerating cpus" ( #1246 )
2017-07-24 16:17:50 +02:00
Martin Kroeker
80373ea039
More fixes for silly misedits
2017-07-15 12:48:42 +02:00
Martin Kroeker
d12b75a6c4
Fixup braces lost in previous edit
2017-07-15 11:53:28 +02:00
Martin Kroeker
7294fb1d9d
Merge branch 'develop' into cgroups
2017-07-15 10:40:42 +02:00
Zhang Xianyi
2a7c6930ac
Merge pull request #1234 from brada4/develop
...
Fix write past fixed size buffer
2017-07-13 20:27:37 +08:00
Andrew
529bfc36ec
Fix write past fixed size buffer
2017-07-12 00:59:30 +02:00
Martin Kroeker
731c518cff
Add files via upload
2017-07-11 18:42:39 +02:00
Martin Kroeker
29fc429d9a
Honor cgroup/cpuset constraints when enumerating cpus
2017-07-11 18:27:33 +02:00
Martin Kroeker
3db2adf872
Merge pull request #1230 from martin-frbg/rhel5
...
Add sched_getcpu implementation for pre-2.6 glibc
2017-07-09 13:16:16 +02:00
Martin Kroeker
c1cf62d2c0
Add sched_getcpu implementation for pre-2.6 glibc
...
Fixes #1210 , compilation on RHEL5 with affinity enabled
2017-07-09 09:45:38 +02:00
Zhang Xianyi
bfe1656b8b
Merge pull request #1225 from martin-frbg/stolen_from_wernsaar_fork
...
fixed syrk_thread.c taken from wernsaar
2017-07-07 15:43:33 +08:00
Martin Kroeker
49e62c0e77
fixed syrk_thread.c taken from wernsaar
...
Stride calculation fix copied from https://github.com/wernsaar/OpenBLAS/commit/88900e1
2017-07-06 17:30:12 +02:00
Neil Shipp
34513be726
Add Microsoft Windows 10 UWP build support
2017-06-23 13:07:34 -07:00
Neil Shipp
65e56cb29d
Add 64bit support for Microsoft Visual Studio
2017-06-21 13:38:22 -07:00
James Cowgill
59c97cfee4
memory: Fix buffer overflow when position == NUM_BUFFERS
2017-05-05 17:47:03 +01:00
James Cowgill
5fecfe0f42
memory: switch loop condition around in blas_memory_free
...
Before this commit, the "position < NUM_BUFFERS" loop condition from
blas_memory_free will be completely optimized away by GCC. This is
because the condition can only be false after undefined behavior has
already been invoked (reading past the end of an array). As a
consequence of this bug, GCC also removes the subsequent if statement
and all the code after the error label because all of it is dead.
This commit switches the loop condition around so it works as intended.
2017-05-05 16:01:58 +01:00
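The reordering described in the commit above can be illustrated with a minimal sketch. The table type, its size, and the function name are hypothetical stand-ins, not the actual blas_memory_free code; the only point is that the bounds test precedes the array read, so the compiler cannot treat it as dead.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_SLOTS 4  /* hypothetical size, stands in for NUM_BUFFERS */

/* Hypothetical slot table entry; not the real OpenBLAS structure. */
struct slot { void *addr; };

/* Returns the index of the slot holding addr, or -1 if it is absent.
 * The bounds check is evaluated first, so table[pos] is never read
 * with pos == NUM_SLOTS. With the operands swapped, the out-of-bounds
 * read would happen before the check, and a compiler may then assume
 * the check can never fail. */
static int find_slot(const struct slot table[NUM_SLOTS], const void *addr)
{
    int pos = 0;
    assert(table != NULL);
    while (pos < NUM_SLOTS && table[pos].addr != addr)
        pos++;
    return pos < NUM_SLOTS ? pos : -1;
}
```

A caller would treat -1 as the "not found" error path that the original loop condition was meant to reach.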
Gian-Carlo Pascutto
9c884986ad
Add an extra family/model combination used by AMD Steamroller (Godavari).
2017-04-19 19:15:47 +02:00
Gian-Carlo Pascutto
0cbd2d34e4
Recognize ZEN when passed as OPENBLAS_CORETYPE.
2017-04-10 20:05:16 +02:00
Gian-Carlo Pascutto
62979fd104
Fix dynamic detection for ZEN CPUs.
2017-04-10 19:08:37 +02:00
Denis Steckelmacher
c9ff735da6
Add ZEN support (tested for auto-detected static backend)
2017-03-19 15:32:50 +01:00
Martin Kroeker
ffc1d6c468
Merge pull request #1108 from ashwinyes/develop_20170203_thunderx2t99
...
Optimized Implementations for ThunderX2T99
2017-02-28 16:02:19 +01:00
Ashwin Sekhar T K
a86474c6f7
THUNDERX2T99: Performance fix for ZGEMM
2017-02-28 06:05:00 -08:00
Ashwin Sekhar T K
19ba133383
THUNDERX2T99: Add Optimized ZGEMM Implementation
2017-02-28 05:31:41 +00:00
Andrew
5088523786
detect apollo lake for real
2017-02-20 23:54:59 +01:00
Elliot Saba
1d8ab99e09
Add `exfamily == 9` case (Kaby Lake) to dynamic arch detection
2017-02-10 15:23:55 -08:00
Martin Koehler
76c6e33e54
Enable EXCAVATOR kernels for A12-9800
2017-02-07 21:38:28 +01:00
Ashwin Sekhar T K
2757b49767
THUNDERX2T99: Add Optimized CGEMM Implementation
2017-01-30 17:44:26 +05:30
Ashwin Sekhar T K
f279ff4789
THUNDERX2T99: Add Optimized SGEMM Implementation
2017-01-16 21:44:33 +05:30
Zhang Xianyi
0863a0d4b4
Merge pull request #1061 from ashwinyes/develop_aarch64_vulcan_thunderx_patch
...
Add new targets for ARM64
2017-01-16 13:20:10 +08:00
Werner Saar
c1c5a63d3c
prepared parameter.c for UNROLL values, that are not a power of two
2017-01-11 09:50:28 +01:00
Ashwin Sekhar T K
4b55fae337
ARM64: Add Cavium THUNDERX2T99 Target
2017-01-11 11:18:40 +05:30
Ashwin Sekhar T K
0b8e876d89
VULCAN: Add optimized DGEMM implementation
2017-01-10 15:01:37 +05:30
Ashwin Sekhar T K
4713e7c47f
ARM64: Add the VULCAN Target
2017-01-10 15:01:17 +05:30
jiahaipeng
1aa1e6cb54
Modify blas_l1_thread.c to support multi-threading for L1 functions with a return value
2017-01-10 11:47:06 +08:00
Werner Saar
b9bb009236
Merge pull request #1053 from wernsaar/develop
...
prepared driver/level3 functions for UNROLL values, that are not a po…
2017-01-09 11:17:38 +01:00
Werner Saar
a2672d5589
prepared driver/level3 functions for UNROLL values, that are not a power of two
2017-01-09 10:38:15 +01:00
Martin Kroeker
51aa157e64
Relocate declaration of alloc_lock outside ifdef block
2017-01-09 01:10:43 +01:00
Martin Kroeker
87c7d10b34
Fix thread data races detected by helgrind 3.12
...
Ref. #995 , may possibly help solve issues seen in 660,883
2017-01-08 23:33:51 +01:00
Martin Kroeker
0ef7841473
Update xerbla.c
2017-01-04 23:16:48 +01:00
Martin Kroeker
104ad066af
Use appropriate int32/int64 format for error number in message string
2016-12-30 00:45:59 +01:00
Alex Arslan
a16ace68f5
Include system headers on FreeBSD
2016-11-16 21:58:20 -08:00
Martin Kroeker
596ead0f8d
Add files via upload
2016-11-06 23:26:39 +01:00
Zhang Xianyi
66c9a9b33d
Merge pull request #981 from howard0su/develop
...
Use NPROCESSOR_CONF instead of NPROCESSOR_ONLN
2016-10-17 11:32:57 +08:00
Martin Kroeker
8a8f3932eb
Update dynamic.c
...
Add Bay Trail "Pentium N3520" atom
2016-10-16 22:40:00 +02:00
Howard Su
ff1da01476
Use NPROCESSOR_CONF instead of NPROCESSOR_ONLN
...
to determine the number of CPUs. On ARM platforms, the number of
online CPUs increases as the workload grows, while the number of
configured CPUs is the maximum number of CPUs.
2016-10-13 12:37:50 +00:00
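The difference between the two queries can be sketched as follows. This is an illustration only, assuming a Unix-like system whose unistd.h provides the widely available _SC_NPROCESSORS_CONF and _SC_NPROCESSORS_ONLN sysconf names; the helper names are hypothetical and this is not the code from the commit.

```c
#include <unistd.h>

/* Number of processors configured on the system. This can include
 * cores that are currently offline, e.g. parked by power management. */
static long configured_cpus(void)
{
    long n = sysconf(_SC_NPROCESSORS_CONF);
    return n > 0 ? n : 1;  /* fall back to 1 if the query fails */
}

/* Number of processors online right now. On some platforms this
 * value changes at run time as cores are brought up or shut down. */
static long online_cpus(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? n : 1;  /* fall back to 1 if the query fails */
}
```

Sizing a thread pool from configured_cpus() gives a stable upper bound, whereas online_cpus() may under-count if it is sampled while the machine is idle.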
Zhang Xianyi
ef52a9266b
Fixed #979 . Patch for NetBSD.
2016-10-13 10:17:07 +08:00
Martin Kroeker
7de829f713
Update dynamic.c
...
Add Braswell (extended model 4, model 12) N3150 as Nehalem
2016-07-14 12:22:55 +02:00
John Biddiscombe
053044ae4d
Replace CMAKE_SOURCE_DIR/CMAKE_BINARY_DIR with PROJECT_SOURCE_DIR/PROJECT_BINARY_DIR
...
If OpenBLAS is built using add_subdirectory(OpenBlas) as part of another project
then the paths set by CMAKE_XXX_DIR are relative to the parent project
and not the OpenBLAS project.
2016-05-25 09:13:28 +02:00
Ashwin Sekhar T K
0fb380c966
Update NUMA CPU binding
...
When the number of processes can all be
accommodated within the current node,
then use cores from the current node only.
2016-04-29 11:58:15 +05:30
Werner Saar
78b05f6476
bugfix for EXCAVATOR and DYNAMIC_ARCH
2016-04-25 10:13:30 +02:00
Werner Saar
2b967590a0
bugfix in dynamic.c
2016-04-25 09:08:38 +02:00
Theoractice
aa744dfa59
Update memory.c
2016-03-22 20:02:37 +08:00
theoractice
61cf8f74d9
Fix access violation on Windows while static linking
2016-03-22 19:14:54 +08:00
Zhang Xianyi
68eb4fa329
Add missing openblas_env makefile.
2016-03-09 14:52:47 -05:00
Zhang Xianyi
05196a8497
Refs #716 . Only call getenv at init function.
2016-03-09 12:50:07 -05:00
Jerome Robert
53ba1a77c8
ztrmv_L.c: no longer need a 4kB buffer
...
Fix #786
2016-03-05 19:07:03 +01:00
Zhang Xianyi
1edf30b790
Change Opteron(SSE3) to Opteron_SSE3 in the dynamic core name.
2016-03-01 20:13:08 +08:00
Zhang Xianyi
6b85dbb6dc
Refs #696 . Turn off stack limit setting on Linux.
...
I cannot reproduce SEGFAULT of lapack-test with default stack size
on ARM Linux.
2016-02-24 14:21:42 -05:00
Zhang Xianyi
d06b92906a
Add gemm3m building for CMake.
2016-02-12 05:02:51 +08:00
Jerome Robert
78dcf5c3d5
Improve performance of ztrmv on small matrices
...
* Use stack allocation
* Disable multi-threading
* Ref #727
2016-02-08 11:25:02 +01:00
Martin Kroeker
935356c34f
Update dynamic.c and cpuid_x86.c for Intel Avoton.
...
Second part of "support Intel Avoton via Nehalem kernel"
2016-02-02 13:42:55 -05:00
Zhang Xianyi
f5df444ceb
Merge pull request #762 from jeromerobert/bug760
...
Let openblas_get_num_threads return the number of active threads
2016-01-26 08:45:16 -06:00
Zhang Xianyi
aaa8551c57
Merge pull request #749 from lotheac/illumos_fixes
...
illumos fixes
2016-01-26 08:42:20 -06:00
Jerome Robert
0d87c1ffb6
Let openblas_get_num_threads return the number of active threads
...
... not the number of allocated threads.
Close #760
2016-01-26 13:04:16 +01:00
Lauri Tirkkonen
e737e32fd1
RLIMIT_NPROC doesn't exist on illumos
2016-01-22 18:55:51 +02:00
Lauri Tirkkonen
97cd4b8aee
illumos fixes to memory.c
2016-01-22 18:55:43 +02:00
Werner Saar
b07d733a71
added updates for syrk and syr2k
2016-01-21 13:16:44 +01:00
Zhang Xianyi
055b481386
Fixed CMake bug for single core.
2016-01-15 06:42:54 +08:00
Werner Saar
0d22551a6b
increase the stack size limit in the constructor
2015-11-20 09:23:01 +01:00
Ralph Campbell
fbc21266e6
Minor C code fixes in driver/
2015-11-09 14:15:49 +05:30
Zhang Xianyi
839395fc25
Detect AMD Trinity and Richland.
2015-10-29 02:53:29 +08:00
j-bo
6040858b22
Fix #673
...
Add lacking headers declarations when compiling for Android ARM7
2015-10-27 13:55:24 +01:00
Zhang Xianyi
70642fe4ed
Refs #668 . Raise the signal when pthread_create fails.
...
Thanks to James K. Lowden for the patch.
2015-10-26 19:02:51 -05:00
Zhang Xianyi
2feef49fa8
Merge branch 'develop' into cmake
...
Conflicts:
driver/others/memory.c
2015-10-26 14:54:34 -05:00
Zhang Xianyi
1ce054fcb3
Refs #669 . Fixed the build bug with gcc on Mac OS X.
2015-10-22 11:07:35 -05:00
Zhang Xianyi
d8392c1245
Fixed cmake config bugs.
2015-10-20 04:30:55 +08:00
Zhang Xianyi
94b125255f
Merge branch 'develop' into cmake
...
Conflicts:
driver/others/memory.c
2015-10-13 04:46:08 +08:00
Zhang Xianyi
11ac4665c8
Fixed #654 . Make sure the gotoblas_init function is run before all other static initializations.
2015-10-05 14:14:32 -05:00
Zhang Xianyi
cc7cab8a45
Detect other Intel Skylake cores.
...
http://users.atw.hu/instlatx64/
2015-09-09 10:47:17 -05:00
Yichao Yu
61ae47eb99
Ref #632 . Support Intel Skylake by Haswell kernels.
2015-09-09 11:07:33 -04:00
Grazvydas Ignotas
d3e2f0a1af
add missing barriers
...
should fix issue #597
2015-08-16 15:37:02 +02:00
Zhang Xianyi
f8eba3d548
Fixed cmake build bugs on Linux.
2015-08-11 16:25:16 -05:00
Zhang Xianyi
f874465bb8
Use cmake to build OpenBLAS GENERIC Target on MSVC x86 64-bit.
...
Disable CBLAS and LAPACK.
2015-08-10 14:10:44 -05:00
Zhang Xianyi
dcd5ba4443
Merge branch 'cmake' of https://github.com/hpanderson/OpenBLAS into hpanderson_cmake
2015-07-22 04:06:39 +08:00
Zhang Xianyi
a11555c715
Support Android NDK armeabi-v7a-hard ABI. (-mfloat-abi=hard)
...
e.g.
make HOSTCC=gcc CC=arm-linux-androideabi-gcc NO_LAPACK=1 TARGET=ARMV7
In Android NDK, it uses armeabi-v7a-hard ABI.
TARGET_CFLAGS += -mhard-float -D_NDK_MATH_NO_SOFTFP=1
TARGET_LDFLAGS += -Wl,--no-warn-mismatch -lm_hard
For more information, please check hard-float example at
android_ndk/tests/device/hard-float/jni/.
2015-05-20 21:57:27 -05:00
Zhang Xianyi
51ff17d46e
Add AMD Excavator target.
2015-05-13 16:16:30 -05:00
powderluv
ebb9eba987
Fix build with ALLOC_SHM=0 (Android NDK)
...
Refactor so that you can build with ALLOC_SHM=0. HugeTLB
implicitly depends on ALLOC_SHM=1. This patch allows
building for Android NDK r10d.
2015-05-10 00:10:26 -07:00
Zhang Xianyi
8e5a1083bb
Refs #532 . Improve gemv parallelism for the small m, large n case.
...
Split the matrix and perform a reduction.
2015-05-08 05:33:17 +08:00
Zhang Xianyi
9798481979
Refs #478 , #482 . Fix segfault bug for gemv_t with MAX_ALLOC_STACK flag.
...
For gemv_t, directly use malloc to create the buffer.
2015-04-13 19:45:27 -05:00
Zhang Xianyi
8977b3f235
Refs #529 . Support Intel Broadwell by Haswell kernels.
2015-04-02 11:08:03 -05:00
Zhang Xianyi
e95d64333a
Refs #519 . Avoid calling strncpy.
2015-03-19 15:57:22 -05:00
Ton van den Heuvel
b6438dedea
Fix issue #508
...
Fix race condition during shutdown causing a crash in
gotoblas_set_affinity().
2015-03-18 13:22:43 +01:00
Hank Anderson
5ae8993752
Added intrinsics for MSVC.
2015-02-25 11:52:51 -06:00
Hank Anderson
84d90d6ed8
Fixed some compiler errors/warnings for clang.
2015-02-25 11:52:25 -06:00
Hank Anderson
9eaea02f33
Added additional gemm defines for complex types.
2015-02-25 09:39:11 -06:00
Hank Anderson
ab7043373f
Fixed bug generating trmv complex source names.
2015-02-24 15:18:41 -06:00
Hank Anderson
0553476fba
Added TRANS defines for complex sources in lapack.
2015-02-24 14:30:35 -06:00
Hank Anderson
2416d9dbac
Fixed TRANSA defines for complex sources in driver/level2.
2015-02-24 13:18:07 -06:00
Hank Anderson
0d8e227ea7
Changed strategy for setting preprocessor definitions.
...
Instead of generating separate object files for each permutation of
defines for a source file, GenerateNamedObjects now writes an entirely
new source file and inserts the defines as #define c statements.
This solves a problem I ran into with ar.exe where it was refusing to
link objects that had the same filename despite having different paths.
2015-02-24 12:26:33 -06:00
Hank Anderson
1b7f427401
Added conj gemv objects for complex build.
2015-02-23 10:24:31 -06:00
Hank Anderson
fb5d5bb971
Added defines for complex trmv.
2015-02-21 12:39:03 -06:00
Hank Anderson
371071d461
Added CONJ defines for trmm/trsm.
2015-02-21 10:59:02 -06:00
Hank Anderson
8a143516e3
Added alternate_name to a couple of the name mangling schemes.
...
Added zherk_k sources to driver/level3.
2015-02-20 17:03:33 -06:00
Hank Anderson
e5897ecb9b
Added zherk_kernel.c objects to driver/level3.
2015-02-19 16:19:56 -06:00
Hank Anderson
33c5e8db7f
Added a helper function for setting the L1 kernel defaults.
...
Added loop to build objects with different KERNEL defines.
2015-02-17 21:36:23 -06:00
Hank Anderson
4662a0b13a
Changed generate functions to iterate through a list of float types.
...
This will generate obj files for SINGLE/DOUBLE/COMPLEX/DOUBLE COMPLEX.
2015-02-15 17:44:37 -06:00
Hank Anderson
e74462a3f5
Moved declarations to start of functions to satisfy MSVC C89 implementation.
2015-02-11 11:16:57 -06:00
Hank Anderson
056ba26755
Changed a number of inline calls to use __inline.
...
MSVC doesn't implement C99, so can't use the inline keyword. __inline
appears to work in MSVC and GCC.
2015-02-11 11:13:17 -06:00
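One common way to express the same portability idea is a small wrapper macro. The sketch below is an assumption-laden illustration, not what the commit did (the commit replaced inline with __inline directly); the macro and function names are hypothetical.

```c
/* Hypothetical macro: pick an inline spelling each compiler accepts.
 * MSVC's C89 mode and GCC both understand __inline; other compilers
 * are assumed here to support the C99 inline keyword. */
#if defined(_MSC_VER) || defined(__GNUC__)
#  define SKETCH_INLINE __inline
#else
#  define SKETCH_INLINE inline
#endif

/* Example helper declared with the portable spelling. */
static SKETCH_INLINE int clamp_nonneg(int x)
{
    return x < 0 ? 0 : x;
}
```

Keeping the choice in one macro means a later compiler-specific change touches a single definition rather than every call site.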
Hank Anderson
c94fe71278
Removed incoming-stack-boundary for MSVC.
...
Made float type optional for GenerateNamedObjects.
Called GenerateNamedObjects for a couple of driver/others files that
needed NAME/CNAME set.
2015-02-11 10:54:14 -06:00
Hank Anderson
e8c39138c6
Removed return value from GenerateNamedObjects.
...
It sets DBLAS_OBJS directly to save a bunch of list appending in the
CMakeLists.txt files.
2015-02-09 12:28:09 -06:00
Hank Anderson
7fa5c4e2fd
Fixed some case issues with ARCH.
...
Added some kernel and driver/others objects.
2015-02-08 15:29:18 -06:00
Zhang Xianyi
cfa9392ffa
Fix openblas_get_num_threads and openblas_get_num_procs bug with single thread.
2015-02-08 01:30:23 -06:00
Hank Anderson
2f59135eb6
Added gemv to level2 CMakeLists.txt.
2015-02-07 21:15:21 -06:00
Hank Anderson
6b5d26e07b
Added SMP sources to level2 CMakeLists.txt.
2015-02-06 16:52:19 -06:00
Hank Anderson
627d5e7401
Added SMP objects to driver/level3.
2015-02-05 12:22:48 -06:00
Hank Anderson
943fa2fb58
Fixed object names in level2.
2015-02-05 10:49:11 -06:00
Hank Anderson
461e691127
Codes when define is absent are now a parameter to AllCombinations.
...
The level3 object names should now be correct.
2015-02-05 09:23:47 -06:00
Hank Anderson
cfaf1c678f
Added option to append define codes with an underscore.
...
Fixed the code array not getting reset on subsequent AllCombinations
calls.
2015-02-05 09:17:18 -06:00
Hank Anderson
0d7bad1f35
Changed GenerateObjects to append combination codes (e.g. dtrmm_TU).
2015-02-05 09:02:54 -06:00
Hank Anderson
2828f6630c
Added SMP sources to COMMONOBJS.
2015-02-04 14:01:36 -06:00
Erik Schnetter
65a847cd36
Introduce openblas_get_num_threads and openblas_get_num_procs
2015-02-03 12:23:41 -05:00
Hank Anderson
7194424fef
Added missing common objects to the library.
2015-02-02 15:21:29 -06:00
Hank Anderson
d11bde60d0
DOUBLE define for DBLAS objects is now set in main CMakeLists.txt.
...
Since the objects are the same, could generate SINGLE/COMPLEX/etc here
without having to rewrite all the object enumeration code again.
2015-02-02 15:00:44 -06:00
Hank Anderson
5057a4b4df
Added openblas add_library call that uses DBLAS_OBJS objects.
2015-01-30 15:21:21 -06:00
Hank Anderson
3e8ea7a351
Added COMMONOBJS to driver/others CMakeLists.txt.
2015-01-30 14:06:14 -06:00
Hank Anderson
d3dcdddf75
Moved functions into util cmake file.
2015-01-30 13:47:40 -06:00
Hank Anderson
e5e7595bf9
Added parameter to GenerateObjects for defines that affect all sources.
2015-01-30 13:31:13 -06:00
Hank Anderson
7693887d61
Added empty set to the combinations generated by AllCombinations.
2015-01-30 13:01:11 -06:00
Hank Anderson
8d9b196e0d
Moved loop over define combos into a function.
...
This function takes a set of sources and a set of preprocessor
definitions. It will iterate over the sources and build an object
file for each combination of preprocessor definitions for each
source file.
2015-01-30 12:14:44 -06:00
Hank Anderson
a6cf8aafc0
Updated level3/CMakeLists with correct defines using all combos.
2015-01-30 11:21:50 -06:00
Hank Anderson
dbdca7bf0c
Added first pass at driver/level3 Makefile conversion.
...
Added a rather convoluted CMake function to find all combinations
of a given list. This will be useful for the object files that are
compiled multiple times with different combinations of preprocessor
definitions.
2015-01-29 22:53:11 -06:00
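The "all combinations of a given list" idea from the commit above can be sketched with a bitmask walk over subsets. This is written in C purely for illustration (the actual helper is a CMake function), the function names are hypothetical, and the "T"/"U" codes are borrowed from the dtrmm_TU example mentioned in a later commit. It assumes the list has fewer than 32 entries.

```c
#include <stdio.h>
#include <string.h>

/* Builds the code string for the subset of `codes` selected by `mask`,
 * e.g. mask 3 over {"T", "U"} yields "TU". Returns the string length.
 * Output is truncated rather than overflowing `out`. */
static size_t subset_code(const char *const codes[], size_t n,
                          unsigned mask, char *out, size_t cap)
{
    size_t len = 0;
    if (cap == 0)
        return 0;
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        if (mask & (1u << i)) {
            size_t l = strlen(codes[i]);
            if (len + l + 1 > cap)
                break;
            memcpy(out + len, codes[i], l + 1);  /* copies the NUL too */
            len += l;
        }
    }
    return len;
}

/* Prints one object name per subset, including the empty subset,
 * and returns how many names were printed (2^n). */
static unsigned print_all_combinations(const char *base,
                                       const char *const codes[], size_t n)
{
    unsigned count = 0;
    char suffix[64];
    for (unsigned mask = 0; mask < (1u << n); mask++) {
        subset_code(codes, n, mask, suffix, sizeof suffix);
        printf("%s%s%s\n", base, suffix[0] ? "_" : "", suffix);
        count++;
    }
    return count;
}
```

Each bit of the mask says whether the corresponding define is present, so every source file gets one object per subset of its defines.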
Hank Anderson
8c23965da3
prebuild.cmake now reads the output from getarch into CMake vars.
2015-01-28 22:57:44 -06:00
Hank Anderson
8ede4a8da4
getarch now compiles and sets config.h defines properly.
...
Still isn't parsed into CMake variables, and getarch_2 needs to
get the same treatment.
2015-01-28 17:18:26 -06:00
Hank Anderson
1c5b6bb4f7
Added CORE define to config.h in prebuild.cmake (temporarily).
2015-01-28 16:33:48 -06:00
Hank Anderson
9a508abdc7
Added first pass at driver/level2 makefile conversion.
2015-01-28 14:52:15 -06:00
Werner Saar
0dc559ed30
bugfix in dynamic.c
2014-12-28 17:15:42 +01:00
Werner Saar
4319769b79
added target processor STEAMROLLER
2014-12-28 20:16:46 +08:00
Zhang Xianyi
2fb02626da
Update organization info.
2014-11-25 15:28:58 +08:00
Zhang Xianyi
695e0fa649
#463 fixed a compiling bug on AIX.
2014-11-10 14:39:56 +08:00
wernsaar
7aae4a62e7
enabled use of GEMM3M functions
2014-09-20 14:27:10 +02:00
wernsaar
a64fe9bcc9
added optimized sgemv_n kernel for sandybridge
2014-09-06 08:41:53 +02:00
wernsaar
2021d0f9d6
experimentally removed expensive function calls
2014-09-05 15:05:53 +02:00
Isaac Dunham
f7eb81a846
Fix link error on Linux/musl.
...
get_nprocs() is a GNU convenience function equivalent to POSIX2008
sysconf(_SC_NPROCESSORS_ONLN); the latter should be available in unistd.h
on any current *nix. (OS X supports this call since 10.5, and FreeBSD
currently supports it. But this commit does not change FreeBSD or OS X
versions.)
2014-08-03 15:06:30 -07:00
wernsaar
793175be3a
added experimental support for big numa machines
2014-08-02 13:40:16 +02:00
wernsaar
1d33547222
optimized zgemm kernel for haswell
2014-07-27 11:51:42 +02:00
wernsaar
3ea4dadd30
optimizations for trsm
2014-07-25 11:59:17 +02:00