Commit Graph

116 Commits

Author SHA1 Message Date
Zhang Xianyi cdefdb21cd Refs #492. Fixed c/zsyr bug with negative incx. 2015-02-26 06:37:03 +08:00
Martin Koehler 39cc6b21d3 Add ATLAS-style ?geadd function 2015-02-16 13:46:20 +01:00
Jerome Robert b17ccb4c5c Fix a segfault in gemv when MAX_STACK_ALLOC is set
* stack_alloc_size is needed after the implementation call,
but it may be overwritten if it is optimized into a register,
because some gemv implementations (e.g. dgemv_n.S) do not
restore all registers (e.g. r10).
* Do the same in ger.c for the same reason, even though the bug
has not been observed there.
2015-01-29 09:55:57 +01:00
Jerome Robert e9d9a8eae3 Allow gemv and ger buffer allocation on the stack
ger and gemv call blas_memory_alloc/free, which in turn
call blas_lock. blas_lock creates thread contention when matrices
are small and the number of threads is high enough. We avoid
calling blas_memory_alloc by replacing it with stack allocation.
This can be enabled with:
make -DMAX_STACK_ALLOC=2048
The given size (in bytes) must be high enough to avoid thread contention
and small enough to avoid stack overflow.

Fix #478
2014-12-27 14:33:12 +01:00
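
The idea described in the two commit messages above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from these commits: `pool_alloc`, `pool_free`, `pool_alloc_calls` and `scaled_sum` are hypothetical names standing in for the library's locked allocator and a gemv-like routine that needs a scratch buffer. Only `MAX_STACK_ALLOC` and its value of 2048 bytes come from the commit message. The `volatile` qualifier is one assumed way to keep the "which allocator did I use" decision in memory, so it is still correct after a callee that clobbers registers.

```c
#include <stddef.h>
#include <stdlib.h>

#ifndef MAX_STACK_ALLOC
#define MAX_STACK_ALLOC 2048 /* bytes; value taken from the commit message */
#endif

/* Hypothetical stand-ins for a shared, lock-protected buffer allocator. */
static int pool_alloc_calls = 0;

static void *pool_alloc(size_t bytes)
{
    pool_alloc_calls++; /* count how often the slow path is taken */
    return malloc(bytes);
}

static void pool_free(void *p)
{
    free(p);
}

/* Hypothetical routine that needs an n-element scratch buffer:
 * it scales x by alpha into the buffer and returns the sum. */
double scaled_sum(const double *x, size_t n, double alpha)
{
    /* Fixed-size stack buffer; small requests never touch the allocator. */
    double stack_buf[MAX_STACK_ALLOC / sizeof(double)];

    /* Kept volatile (an assumption of this sketch) so the decision is
     * re-read from memory after the work is done. */
    volatile int use_stack = (n * sizeof(double) <= MAX_STACK_ALLOC);

    double *buf = use_stack ? stack_buf : pool_alloc(n * sizeof(double));
    double sum = 0.0;
    size_t i;

    if (buf == NULL)
        return 0.0; /* allocation failed; nothing to compute */

    for (i = 0; i < n; i++)
        buf[i] = alpha * x[i];
    for (i = 0; i < n; i++)
        sum += buf[i];

    if (!use_stack)
        pool_free(buf); /* only release what the pool handed out */
    return sum;
}
```

With the default threshold, a call on 4 doubles (32 bytes) stays on the stack and leaves `pool_alloc_calls` unchanged, while a call on 1024 doubles (8192 bytes) falls back to the allocator once. The sketch always reserves the full threshold on the stack, which is the trade-off the commit message points at: a larger threshold avoids more allocator calls but raises the risk of stack overflow.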
wernsaar 9e829ce98f enabled cblas gemm3m functions 2014-09-20 17:20:02 +02:00
wernsaar d49fd33885 disabled SYMM3M and HEMM3M functions because of segmentation violations 2014-09-20 15:27:40 +02:00
wernsaar 7aae4a62e7 enabled use of GEMM3M functions 2014-09-20 14:27:10 +02:00
wernsaar 3300f5ebff optimized multithreading lower limits 2014-09-15 11:38:25 +02:00
wernsaar fd2478c9e2 optimized interface/zgemv.c for multithreading 2014-09-12 19:18:23 +02:00
Zhang Xianyi 1cba8e7b11 Merge pull request #446 from grisuthedragon/cblas_matcopy
Add a CBLAS interface for the BLAS extension s/d/c/z*matcopy routines.
2014-09-10 16:31:31 +08:00
Martin Koehler a057e5434d add CBLAS interface for s/d/c/zimatcopy 2014-09-09 09:52:13 +02:00
Martin Köhler 7794766d3c Add cblas_(s/d/c/z)omatcopy in order to have cblas interface for them. 2014-09-08 17:57:44 +02:00
wernsaar f511807fc0 modified multithreading threshold 2014-09-08 12:27:32 +02:00
wernsaar d1800397f5 optimized interface/gemv.c for multithreading 2014-09-02 17:36:07 +02:00
wernsaar f4ff889491 updated interface/gemv.c for multithreading 2014-09-02 16:30:04 +02:00
wernsaar 51413925bd adjust number of threads for small size in cgemv and zgemv 2014-07-15 16:27:02 +02:00
wernsaar b985cea65d adjust number of threads for sgemv and dgemv 2014-07-15 16:04:46 +02:00
wernsaar d286daa2ba adjusted number of threads for small size 2014-07-15 14:41:35 +02:00
wernsaar cedc1f4b14 Ref #410: disabled optimized potri functions ( single threading bug) 2014-07-10 13:42:32 +02:00
wernsaar 02a504c0b8 fixed my bug in ger.c 2014-07-02 10:39:33 +02:00
wernsaar be94db096c disabled *3M functions for x86_64 platforms 2014-07-01 16:18:05 +02:00
wernsaar aee61456a4 disabled SMP for sbmv and zsbmv again 2014-06-29 21:18:38 +02:00
wernsaar 01a119abfc enabled SMP for sbmv and zsbmv, but only for 64bit binaries 2014-06-29 20:35:56 +02:00
wernsaar 1fad2b759f enabled smp for ger.c and zger.c, but only for 64bit binaries 2014-06-29 16:43:04 +02:00
Timothy Gu 6c2ead30f0 Remove all trailing whitespace except lapack-netlib
Signed-off-by: Timothy Gu <timothygu99@gmail.com>
2014-06-27 12:05:18 -07:00
wernsaar 15d5dfa92c fixed compiler warnings 2014-06-25 11:32:44 +02:00
wernsaar 86d8c8978b Ref #391: disabled SMP in ger.c and zger.c 2014-06-22 12:01:24 +02:00
wernsaar a19d209005 Ref #103: enhancement for small matrix dimensions 2014-06-18 15:04:11 +02:00
wernsaar faeab93df0 Ref #51: added blas extensions simatcopy, dimatcopy, cimatcopy, zimatcopy 2014-06-10 16:14:34 +02:00
wernsaar cee257f384 Ref #51: added blas extensions zomatcopy and comatcopy 2014-06-10 10:34:54 +02:00
wernsaar 7bfb3011e8 Ref #51: added blas extension somatcopy 2014-06-09 20:21:13 +02:00
wernsaar 8c8f596238 Ref #51: added blas extension domatcopy as non-optimized reference 2014-06-09 17:11:07 +02:00
wernsaar bff575d0b1 Ref #375: added workaround for small sizes to scal.c and zscal.c 2014-06-08 13:49:19 +02:00
wernsaar faf3ac0aad Ref #285: added axpby kernels 2014-06-08 11:54:24 +02:00
Zhang Xianyi b31ec99372 Fixed #374.
Merge branch 'TimothyGu-develop' into develop
2014-06-05 17:01:44 +08:00
wernsaar 25e899b60b fixed function profile in zpotri.c 2014-05-25 09:15:22 +02:00
wernsaar 89da450800 enabled and tested optimized potri lapack functions 2014-05-23 12:14:30 +02:00
wernsaar c26bbee489 enabled and tested optimized trtri lapack functions 2014-05-23 10:55:39 +02:00
Timothy Gu ced13574a0 Random "walk (a)round" --> "work-around" typo fixes
Signed-off-by: Timothy Gu <timothygu99@gmail.com>
2014-05-22 18:11:52 -07:00
wernsaar a748d3a75d enabled optimized trti2 lapack functions again 2014-05-21 11:02:07 +02:00
wernsaar a5ab231ad4 enabled optimized complex lauum lapack functions again 2014-05-21 10:35:28 +02:00
wernsaar dbaeea7b59 enabled lauu2 and lauum lapack functions again 2014-05-21 09:49:18 +02:00
wernsaar 0d75f3b6a2 enabled and tested optimized gesv lapack functions 2014-05-19 14:44:53 +02:00
wernsaar abad6f66d6 marked trti2.c and ztrti2.c as bad 2014-05-19 13:50:02 +02:00
wernsaar 2ff66e661d enabled and tested optimized laswp lapack function 2014-05-19 13:35:32 +02:00
wernsaar 5e55034922 marked zlauu2.c and zlauum.c as bad 2014-05-19 12:53:22 +02:00
wernsaar 9a9e810239 marked trtri.c and ztrtri.c as bad 2014-05-19 12:42:52 +02:00
wernsaar 45be9ac111 moved trtri.c and ztrtri.c to the directory lapack 2014-05-19 12:29:29 +02:00
wernsaar 9f201558c9 marked lauu2.c and lauum.c as bad 2014-05-19 12:00:16 +02:00
wernsaar d4237cb7f3 marked larf.c as obsolete 2014-05-19 11:23:17 +02:00