Werner Saar
dd6212e684
updated some level1 funcions, that are not thread save
2017-01-10 14:05:07 +01:00
jiahaipeng
84b8170bfb
Adding multi-threading for copy, dot, rot, and asum funcitons
2017-01-10 11:48:58 +08:00
Werner Saar
ae4ac6f984
removed obj-files, that are moved to lapack 3.7.0
2017-01-06 16:14:53 +01:00
Jerome Robert
d346c533b1
Fix z/ctrmv stack allocation on AMD bulldozer and barcelona target
...
* Hopefully, because this was found by error and trial (dark magic)
* Ref #786
2016-06-07 16:11:09 +02:00
Werner Saar
f04af36ad0
Merge pull request #898 from wernsaar/develop
...
added experimental support for optimized lapack fortran functions
2016-05-31 14:13:52 +02:00
Werner Saar
41000c8443
added directory for optimized lapack fortan codes and added dlaqr5.f
2016-05-31 12:53:07 +02:00
John Biddiscombe
053044ae4d
Replace CMAKE_SOURCE_DIR/CMAKE_BINARY_DIR with PROJECT_SOURCE_DIR/PROJECT_BINARY_DIR
...
If OpenBLAS is built using add_subdirectory(OpenBlas) as part of another project
then the paths set by CMAKE_XXX_DIR are relative to the parent project
and not the OpenBLAS project.
2016-05-25 09:13:28 +02:00
Jerome Robert
40af513669
Disable multi-threading in swap
...
* Close #873
2016-05-16 13:07:55 +00:00
Jerome Robert
16ec5323c9
Fix zgemv.c compilation when stack allocation is disabled
2016-02-08 12:05:02 +01:00
Jerome Robert
5fc2203d8a
zgemv: Add a workaround for #746
2016-02-08 11:25:15 +01:00
Jerome Robert
78dcf5c3d5
Improve performances of ztrmv on small matrices
...
* Use stack allocation
* Disable multi-threading
* Ref #727
2016-02-08 11:25:02 +01:00
Jerome Robert
32f793195f
Use stack allocation in zgemv and zger
...
For better performance with small matrices
Ref #727
2016-02-08 11:24:21 +01:00
Jerome Robert
1fe3aab047
Use GEMM_MULTITHREAD_THRESHOLD as a number of ops
...
...not a matrix size. For GEMM_MULTITHREAD_THRESHOLD=4
(the default value) this does not change anything but
for other values it make the GEMM and GEMV thresholds
changing in the same way.
Close #742
2016-01-24 11:31:40 +01:00
Jerome Robert
1a1935507b
[z]ger: increase multithread threshold
...
The ones given in 3ae30cd
was by far to low because I
mixed m and m*n in my measures. Note that the new ones
are closed to the [z]gemv ones which is comforting
that both are right.
2016-01-24 10:46:35 +01:00
Jerome Robert
66eafb16cf
swap: disable multi-threading for small matrices
...
Close #731
2016-01-19 17:14:46 +01:00
Jerome Robert
3ae30cd6b9
Disable multi-threading for small matrices in [z]ger
...
Ref #731
2016-01-19 17:14:31 +01:00
Jerome Robert
87a2ccc37c
Factorize MAX_STACK_ALLOC code to common_stackalloc.h
...
Ref #727
2016-01-08 16:03:52 +01:00
Jerome Robert
f9890a6452
Fix compilation when MAX_STACK_ALLOC is not set
...
Close #722
2015-12-31 14:43:09 +01:00
Zhang Xianyi
285d042b10
Fixed rotg bug on ARM.
2015-12-14 10:07:01 -06:00
Zhang Xianyi
640cccc2b1
Refs #697 . Fixed gemv bug for Windows.
...
Thank matzeri's patch.
2015-11-30 15:19:45 -06:00
Ralph Campbell
55a0b27c01
Minor C code fixes in interface/
2015-11-09 14:15:49 +05:30
Zhang Xianyi
2feef49fa8
Merge branch 'develop' into cmake
...
Conflicts:
driver/others/memory.c
2015-10-26 14:54:34 -05:00
Zhang Xianyi
5a291606ad
Refs #671 . the return of i?max cannot larger than N.
2015-10-24 01:16:34 +08:00
Zhang Xianyi
8fade093aa
Fixed cmake bug on Visual Studio.
2015-10-20 14:37:22 -05:00
Zhang Xianyi
94b125255f
Merge branch 'develop' into cmake
...
Conflicts:
driver/others/memory.c
2015-10-13 04:46:08 +08:00
Zhang Xianyi
baec8f5cac
Refs #638 . Fixed compiling bug with clang on Mac OS X.
2015-09-10 10:32:07 -05:00
Martin Koehler
711ca33bc6
Improved Ximatcopy when lda==ldb.
...
The Ximatcopy functions create a copy of the input matrix
although they seem to work inplace. The new routines
XIMATCOPY_K_YY perform the operations inplace if the leading
dimension does not change.
2015-09-07 14:36:16 +02:00
Zhang Xianyi
f874465bb8
Use cmake to build OpenBLAS GENERIC Target on MSVC x86 64-bit.
...
Disable CBLAS and LAPACK.
2015-08-10 14:10:44 -05:00
Zhang Xianyi
dcd5ba4443
Merge branch 'cmake' of https://github.com/hpanderson/OpenBLAS into hpanderson_cmake
2015-07-22 04:06:39 +08:00
Werner Saar
f8f2e261fe
use only 1 thread if m or n < 2*GEMM_MULTITHREAD_THRESHOLD
2015-05-06 10:41:53 +02:00
Jerome Robert
ab567d8443
gemv: Ensure stack buffer is large enough to handle memory alignment
...
Ref #478
2015-04-24 10:12:49 +02:00
Zhang Xianyi
847e19c04e
Refs #478,#482, Enable stack alloc for s/dgemv_t.(revert 9798491)
2015-04-20 23:22:40 -05:00
Zhang Xianyi
fd9fd42936
Refs #478 , #482 . Fixed bug on previous commit.
2015-04-13 23:22:27 -05:00
Zhang Xianyi
9798481979
Refs #478 , #482 . Fix segfault bug for gemv_t with MAX_ALLOC_STACK flag.
...
For gemv_t, directly use malloc to create the buffer.
2015-04-13 19:45:27 -05:00
Zhang Xianyi
cdefdb21cd
Refs #492 . Fixed c/zsyr bug with negative incx.
2015-02-26 06:37:03 +08:00
Hank Anderson
0d8e227ea7
Changed strategy for setting preprocessor definitions.
...
Instead of generating separate object files for each permutation of
defines for a source file, GenerateNamedObjects now writes an entirely
new source file and inserts the defines as #define c statements.
This solves a problem I ran into with ar.exe where it was refusing to
link objects that had the same filename despite having different paths.
2015-02-24 12:26:33 -06:00
Hank Anderson
b2284647a3
More complex objects.
2015-02-23 07:51:05 -06:00
Hank Anderson
a6116e5859
Added some more complex-only objects.
2015-02-22 17:49:28 -06:00
Hank Anderson
67e39bd8fb
Added mangled complex filenames to interface and lapack CMakeLists.txt.
2015-02-17 13:12:30 -06:00
Hank Anderson
9eb1499095
Added another param to GenerateNamedObjects to mangle complex source names.
...
There are a lot of sources for complex float types that are the same
names as the real sources, except with z prepended.
2015-02-17 10:30:28 -06:00
Martin Koehler
39cc6b21d3
Add ATLAS-style ?geadd function
2015-02-16 13:46:20 +01:00
Hank Anderson
4662a0b13a
Changed generate functions to iterate through a list of float types.
...
This will generate obj files for SINGLE/DOUBLE/COMPLEX/DOUBLE COMPLEX.
2015-02-15 17:44:37 -06:00
Hank Anderson
e74462a3f5
Moved declarations to start of functions to satisfy MSVC C89 implementation.
2015-02-11 11:16:57 -06:00
Hank Anderson
e8c39138c6
Removed return value from GenerateNamedObjects.
...
It sets DBLAS_OBJS directly to save a bunch of list appending in the
CMakeLists.txt files.
2015-02-09 12:28:09 -06:00
Hank Anderson
58cff2fed8
Added CBLAS define/naming convention to GenerateNamedObjects.
2015-02-04 11:30:15 -06:00
Hank Anderson
5690cf3f0e
Added override for function names in GenerateNamedObjects.
...
The BLAS interface folder should now be generated the correct objects
for the DOUBLE case.
2015-02-04 10:52:19 -06:00
Hank Anderson
a0aeda6187
Added function to set defines for the object names (e.g. -DNAME=dgemm).
2015-02-04 10:37:34 -06:00
Hank Anderson
20e593a44a
Added cblas_ objects to interface CMakeLists.
...
Naming isn't right, though, not seeing cblas_xxxx exports in the
resulting library.
2015-02-02 16:25:30 -06:00
Hank Anderson
9e154aba58
Added LAPACK object files to interface CMakeLists.
2015-02-02 12:31:15 -06:00
Hank Anderson
5057a4b4df
Added openblas add_library call that uses DBLAS_OBJS ojbects.
2015-01-30 15:21:21 -06:00