Commit Graph

189 Commits

Author SHA1 Message Date
Martin Kroeker 9251a2efde
Merge pull request #1359 from brada4/develop
Eliminate mode variable where not needed in syrk interface
2017-11-18 23:47:17 +01:00
Martin Kroeker b46e2b57cc
Make return parameter of cblas_Xdotc_sub, cblas_Xdotu_sub a void pointer as well 2017-11-18 20:28:02 +01:00
Martin Kroeker 3ce401f51b
Make last parameter of cblas_Xdotc_sub/cblas_Xdotu_sub a void pointer as well 2017-11-18 18:58:40 +01:00
Andrew 27575d200a Eliminate mode variable where not needed 2017-11-15 15:32:38 +01:00
Martin Kroeker 2c222f1faa
Modify complex CBLAS functions to take void pointers
Modify complex CBLAS functions to take void pointers instead of float or double arguments (to bring the prototypes in line with netlib and other implementations' cblas.h)
2017-11-05 15:53:14 +01:00
Martin Kroeker 742f54c235 Merge pull request #1303 from martin-frbg/imatcopy-rowscols
Fix cols/rows mixup in omatcopy 2nd step for BlasTrans cases
2017-09-14 21:46:26 +02:00
Martin Kroeker d674fbb4c7 Fix cols/rows mixup in omatcopy 2nd step for BlasTrans cases
Equivalent of #1244 (issue #899) for the non-complex cases. Fixes #1289
2017-09-14 19:59:05 +02:00
Martin Kroeker 46c9357c72 Merge pull request #1288 from quickwritereader/develop
Optimized standard Blas Level-1,2 (excluding nrm2 functions) for z13 (double precision). Issue 884
2017-09-09 23:47:17 +02:00
Abdurrauf 1cfdb2295d Optimized standard Blas Level-1,2 (excluding nrm2 functions) for z13 (double precision) 2017-09-06 16:41:08 +04:00
Martin Kroeker 00740c0e34 Merge pull request #1290 from martin-frbg/imatcopy
Use in-place transform shortcut only if matrix is square
2017-09-03 13:02:10 +02:00
Martin Kroeker 254db9bd7c Use in-place transform shortcut only if matrix is square 2017-09-03 09:52:55 +02:00
Isuru Fernando d245caa49a Support out-of-source build 2017-08-01 15:16:14 +05:30
Martin Kroeker 376048156b Use in-place transform shortcut only if matrix is square 2017-07-21 11:20:15 +02:00
Martin Kroeker d1c5b8f913 Add files via upload 2017-07-20 20:51:06 +02:00
Martin Kroeker 91bde7d315 Exchange rows and cols in final omatcopy with BlasTrans
This is MicMuc's patch from #899
2017-07-15 22:02:53 +02:00
Martin Kroeker 1e06b49854 Update xerbla.c 2017-04-26 20:29:30 +02:00
Martin Kroeker 7f546f54fa Add cblas_xerbla 2017-04-26 20:01:34 +02:00
Martin Kroeker a809431e34 Add cblas_xerbla() 2017-04-26 19:58:59 +02:00
Andrew 99880f7906 Address unlikely memleak in zimatcopy interface (#1129)
* fix unlikely memleak in zimatcopy interface

* fix only unlikely memleak in zimatcopy interface

* fix only unlikely memleak in zimatcopy interface
2017-03-16 13:13:31 +01:00
Martin Kroeker 211d2eceb5 Update zdot.c 2017-03-13 18:08:00 +01:00
Martin Kroeker 5813ed095b Update zdot.c 2017-03-13 17:49:07 +01:00
Martin Kroeker e44b028fe5 Replace gnu _real_, _imag_ extensions in initializers 2017-03-13 00:40:11 +01:00
Ashwin Sekhar T K 071a830e8b THUNDERX2T99: Add optimized S/D/C/Z SWAP Implementations 2017-02-03 03:55:06 -08:00
Werner Saar dd6212e684 updated some level1 funcions, that are not thread save 2017-01-10 14:05:07 +01:00
jiahaipeng 84b8170bfb Adding multi-threading for copy, dot, rot, and asum funcitons 2017-01-10 11:48:58 +08:00
Werner Saar ae4ac6f984 removed obj-files, that are moved to lapack 3.7.0 2017-01-06 16:14:53 +01:00
Jerome Robert d346c533b1 Fix z/ctrmv stack allocation on AMD bulldozer and barcelona target
* Hopefully, because this was found by error and trial (dark magic)
* Ref #786
2016-06-07 16:11:09 +02:00
Werner Saar f04af36ad0 Merge pull request #898 from wernsaar/develop
added experimental support for optimized lapack fortran functions
2016-05-31 14:13:52 +02:00
Werner Saar 41000c8443 added directory for optimized lapack fortan codes and added dlaqr5.f 2016-05-31 12:53:07 +02:00
John Biddiscombe 053044ae4d Replace CMAKE_SOURCE_DIR/CMAKE_BINARY_DIR with PROJECT_SOURCE_DIR/PROJECT_BINARY_DIR
If OpenBLAS is built using add_subdirectory(OpenBlas) as part of another project
then the paths set by CMAKE_XXX_DIR are relative to the parent project
and not the OpenBLAS project.
2016-05-25 09:13:28 +02:00
Jerome Robert 40af513669 Disable multi-threading in swap
* Close #873
2016-05-16 13:07:55 +00:00
Jerome Robert 16ec5323c9 Fix zgemv.c compilation when stack allocation is disabled 2016-02-08 12:05:02 +01:00
Jerome Robert 5fc2203d8a zgemv: Add a workaround for #746 2016-02-08 11:25:15 +01:00
Jerome Robert 78dcf5c3d5 Improve performances of ztrmv on small matrices
* Use stack allocation
* Disable multi-threading
* Ref #727
2016-02-08 11:25:02 +01:00
Jerome Robert 32f793195f Use stack allocation in zgemv and zger
For better performance with small matrices
Ref #727
2016-02-08 11:24:21 +01:00
Jerome Robert 1fe3aab047 Use GEMM_MULTITHREAD_THRESHOLD as a number of ops
...not a matrix size. For GEMM_MULTITHREAD_THRESHOLD=4
(the default value) this does not change anything but
for other values it make the GEMM and GEMV thresholds
changing in the same way.

Close #742
2016-01-24 11:31:40 +01:00
Jerome Robert 1a1935507b [z]ger: increase multithread threshold
The ones given in 3ae30cd was by far to low because I
mixed m and m*n in my measures. Note that the new ones
are closed to the [z]gemv ones which is comforting
that both are right.
2016-01-24 10:46:35 +01:00
Jerome Robert 66eafb16cf swap: disable multi-threading for small matrices
Close #731
2016-01-19 17:14:46 +01:00
Jerome Robert 3ae30cd6b9 Disable multi-threading for small matrices in [z]ger
Ref #731
2016-01-19 17:14:31 +01:00
Jerome Robert 87a2ccc37c Factorize MAX_STACK_ALLOC code to common_stackalloc.h
Ref #727
2016-01-08 16:03:52 +01:00
Jerome Robert f9890a6452 Fix compilation when MAX_STACK_ALLOC is not set
Close #722
2015-12-31 14:43:09 +01:00
Zhang Xianyi 285d042b10 Fixed rotg bug on ARM. 2015-12-14 10:07:01 -06:00
Zhang Xianyi 640cccc2b1 Refs #697. Fixed gemv bug for Windows.
Thank matzeri's patch.
2015-11-30 15:19:45 -06:00
Ralph Campbell 55a0b27c01 Minor C code fixes in interface/ 2015-11-09 14:15:49 +05:30
Zhang Xianyi 2feef49fa8 Merge branch 'develop' into cmake
Conflicts:
	driver/others/memory.c
2015-10-26 14:54:34 -05:00
Zhang Xianyi 5a291606ad Refs #671. the return of i?max cannot larger than N. 2015-10-24 01:16:34 +08:00
Zhang Xianyi 8fade093aa Fixed cmake bug on Visual Studio. 2015-10-20 14:37:22 -05:00
Zhang Xianyi 94b125255f Merge branch 'develop' into cmake
Conflicts:
	driver/others/memory.c
2015-10-13 04:46:08 +08:00
Zhang Xianyi baec8f5cac Refs #638. Fixed compiling bug with clang on Mac OS X. 2015-09-10 10:32:07 -05:00
Martin Koehler 711ca33bc6 Improved Ximatcopy when lda==ldb.
The Ximatcopy functions create a copy of the input matrix
although they seem to work inplace. The new routines
XIMATCOPY_K_YY perform the operations inplace if the leading
dimension does not change.
2015-09-07 14:36:16 +02:00