Zhang Xianyi
cdefdb21cd
Refs #492 . Fixed c/zsyr bug with negative incx.
2015-02-26 06:37:03 +08:00
Martin Koehler
39cc6b21d3
Add ATLAS-style ?geadd function
2015-02-16 13:46:20 +01:00
Jerome Robert
b17ccb4c5c
Fix a segfault in gemv when MAX_STACK_ALLOC is set
...
* stack_alloc_size is needed after the implementation call,
but it may be overwritten if it is optimized into a register,
because some gemv implementations (e.g. dgemv_n.S) do not
restore all registers (e.g. r10).
* do the same in ger.c for the same reason, even though the bug
has not been observed there.
2015-01-29 09:55:57 +01:00
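The register problem described in the entry above can be illustrated with a minimal sketch. It assumes the size is kept in memory by declaring it volatile, so it is reloaded after the kernel call instead of being trusted in a register; the commit message does not state the mechanism, so treat this as one possible guard. The names run_kernel, kernel_stub and SKETCH_STACK_LIMIT are hypothetical and are not OpenBLAS identifiers.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical limit, in elements, for the stack path. */
#define SKETCH_STACK_LIMIT 256

/* Hypothetical stand-in for an assembly kernel such as dgemv_n.S. */
static void kernel_stub(double *buf, size_t n)
{
    for (size_t i = 0; i < n; i++)
        buf[i] = (double)i;
}

/* Returns 1 if the stack buffer was used, 0 if the heap was used,
 * and -1 on invalid input or allocation failure. */
static int run_kernel(size_t n, double *last)
{
    assert(last != NULL);
    if (n == 0)
        return -1;

    /* Assumption: volatile keeps the value in memory, so the test after
     * the call does not depend on a register the kernel may have
     * clobbered. */
    volatile size_t stack_elems = (n <= SKETCH_STACK_LIMIT) ? n : 0;
    double stack_buf[SKETCH_STACK_LIMIT];
    double *buf = stack_elems ? stack_buf : malloc(n * sizeof(double));
    if (buf == NULL)
        return -1;

    kernel_stub(buf, n);
    *last = buf[n - 1];

    /* The size is read again here, after the implementation call. */
    if (!stack_elems) {
        free(buf);
        return 0;
    }
    return 1;
}
```

In portable C a plain call cannot clobber a caller's register, so this only matters when the callee is hand-written assembly that breaks the calling convention, as the entry describes.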
Jerome Robert
e9d9a8eae3
Allow gemv and ger buffer allocation on the stack
...
ger and gemv call blas_memory_alloc/free, which in turn
call blas_lock. blas_lock creates thread contention when matrices
are small and the number of threads is high enough. We avoid
calling blas_memory_alloc by replacing it with stack allocation.
This can be enabled with:
make -DMAX_STACK_ALLOC=2048
The given size (in bytes) must be high enough to avoid thread contention
and small enough to avoid stack overflow.
Fix #478
2014-12-27 14:33:12 +01:00
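The trade-off described in the entry above can be sketched as follows. This is a minimal illustration, not the code from ger.c or gemv.c: small scratch buffers come from the stack and larger ones fall back to the heap, with malloc standing in for blas_memory_alloc. The names sum_with_scratch and SKETCH_MAX_STACK_ALLOC are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical threshold in bytes; plays the role of MAX_STACK_ALLOC. */
#define SKETCH_MAX_STACK_ALLOC 2048

/* Sums n doubles through a scratch copy of the input.
 * Returns 1 if the scratch buffer lived on the stack, 0 if it came from
 * the heap, and -1 on allocation failure. */
static int sum_with_scratch(const double *x, size_t n, double *out)
{
    assert(x != NULL && out != NULL);

    size_t bytes = n * sizeof(double);
    int on_stack = (bytes <= SKETCH_MAX_STACK_ALLOC);

    /* Fixed-size stack buffer: no lock and no allocator call. */
    double stack_buf[SKETCH_MAX_STACK_ALLOC / sizeof(double)];

    /* Heap fallback for requests that would not fit on the stack. */
    double *buf = on_stack ? stack_buf : malloc(bytes);
    if (buf == NULL)
        return -1;

    memcpy(buf, x, bytes);

    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += buf[i];
    *out = sum;

    if (!on_stack)
        free(buf);
    return on_stack;
}
```

With the threshold at 2048 bytes, a vector of 4 doubles takes the stack path and a vector of 1024 doubles (8192 bytes) takes the heap path.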
wernsaar
9e829ce98f
enabled cblas gemm3m functions
2014-09-20 17:20:02 +02:00
wernsaar
d49fd33885
disabled SYMM3M and HEMM3M functions because of segmentation violations
2014-09-20 15:27:40 +02:00
wernsaar
7aae4a62e7
enabled use of GEMM3M functions
2014-09-20 14:27:10 +02:00
wernsaar
3300f5ebff
optimized multithreading lower limits
2014-09-15 11:38:25 +02:00
wernsaar
fd2478c9e2
optimized interface/zgemv.c for multithreading
2014-09-12 19:18:23 +02:00
Zhang Xianyi
1cba8e7b11
Merge pull request #446 from grisuthedragon/cblas_matcopy
...
Add a CBLAS interface for the BLAS extension s/d/c/z*matcopy routines.
2014-09-10 16:31:31 +08:00
Martin Koehler
a057e5434d
add CBLAS interface for s/d/c/zimatcopy
2014-09-09 09:52:13 +02:00
Martin Köhler
7794766d3c
Add cblas_(s/d/c/z)omatcopy in order to have cblas interface for them.
2014-09-08 17:57:44 +02:00
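For readers unfamiliar with the omatcopy extension named above: it performs an out-of-place scaled copy, B = alpha * op(A), where op is either the identity or a transpose. The sketch below shows those semantics for real, row-major data only. The function sketch_domatcopy and its parameter list are hypothetical and deliberately simpler than the cblas_domatcopy interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper illustrating omatcopy semantics (row-major only).
 *   trans == 0: B (rows x cols) = alpha * A
 *   trans != 0: B (cols x rows) = alpha * A^T
 * lda and ldb are the row strides of A and B, in elements. */
static void sketch_domatcopy(int trans, size_t rows, size_t cols,
                             double alpha,
                             const double *a, size_t lda,
                             double *b, size_t ldb)
{
    assert(a != NULL && b != NULL);

    for (size_t i = 0; i < rows; i++) {
        for (size_t j = 0; j < cols; j++) {
            double v = alpha * a[i * lda + j];
            if (trans)
                b[j * ldb + i] = v;   /* element (i, j) lands at (j, i) */
            else
                b[i * ldb + j] = v;   /* element keeps its position */
        }
    }
}
```

The imatcopy variants do the same operation in place, which this sketch does not attempt.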
wernsaar
f511807fc0
modified multithreading threshold
2014-09-08 12:27:32 +02:00
wernsaar
d1800397f5
optimized interface/gemv.c for multithreading
2014-09-02 17:36:07 +02:00
wernsaar
f4ff889491
updated interface/gemv.c for multithreading
2014-09-02 16:30:04 +02:00
wernsaar
51413925bd
adjust number of threads for small size in cgemv and zgemv
2014-07-15 16:27:02 +02:00
wernsaar
b985cea65d
adjust number of threads for sgemv and dgemv
2014-07-15 16:04:46 +02:00
wernsaar
d286daa2ba
adjusted number of threads for small size
2014-07-15 14:41:35 +02:00
wernsaar
cedc1f4b14
Ref #410 : disabled optimized potri functions (single threading bug)
2014-07-10 13:42:32 +02:00
wernsaar
02a504c0b8
fixed my bug in ger.c
2014-07-02 10:39:33 +02:00
wernsaar
be94db096c
disabled *3M functions for x86_64 platforms
2014-07-01 16:18:05 +02:00
wernsaar
aee61456a4
disabled SMP for sbmv and zsbmv again
2014-06-29 21:18:38 +02:00
wernsaar
01a119abfc
enabled SMP for sbmv and zsbmv, but only for 64bit binaries
2014-06-29 20:35:56 +02:00
wernsaar
1fad2b759f
enabled smp for ger.c and zger.c, but only for 64bit binaries
2014-06-29 16:43:04 +02:00
Timothy Gu
6c2ead30f0
Remove all trailing whitespace except lapack-netlib
...
Signed-off-by: Timothy Gu <timothygu99@gmail.com>
2014-06-27 12:05:18 -07:00
wernsaar
15d5dfa92c
fixed compiler warnings
2014-06-25 11:32:44 +02:00
wernsaar
86d8c8978b
Ref #391 : disabled SMP in ger.c and zger.c
2014-06-22 12:01:24 +02:00
wernsaar
a19d209005
Ref #103 : enhancement for small matrix dimensions
2014-06-18 15:04:11 +02:00
wernsaar
faeab93df0
Ref #51 : added blas extensions simatcopy, dimatcopy, cimatcopy, zimatcopy
2014-06-10 16:14:34 +02:00
wernsaar
cee257f384
Ref #51 : added blas extensions zomatcopy and comatcopy
2014-06-10 10:34:54 +02:00
wernsaar
7bfb3011e8
Ref #51 : added blas extension somatcopy
2014-06-09 20:21:13 +02:00
wernsaar
8c8f596238
Ref #51 : added blas extension domatcopy as not optimized reference
2014-06-09 17:11:07 +02:00
wernsaar
bff575d0b1
Ref #375 : added workaround for small sizes to scal.c and zscal.c
2014-06-08 13:49:19 +02:00
wernsaar
faf3ac0aad
Ref #285 : added axpby kernels
2014-06-08 11:54:24 +02:00
Zhang Xianyi
b31ec99372
Fixed #374 .
...
Merge branch 'TimothyGu-develop' into develop
2014-06-05 17:01:44 +08:00
wernsaar
25e899b60b
fixed function profile in zpotri.c
2014-05-25 09:15:22 +02:00
wernsaar
89da450800
enabled and tested optimized potri lapack functions
2014-05-23 12:14:30 +02:00
wernsaar
c26bbee489
enabled and tested optimized trtri lapack functions
2014-05-23 10:55:39 +02:00
Timothy Gu
ced13574a0
Random "walk (a)round" --> "work-around" typo fixes
...
Signed-off-by: Timothy Gu <timothygu99@gmail.com>
2014-05-22 18:11:52 -07:00
wernsaar
a748d3a75d
enabled optimized trti2 lapack functions again
2014-05-21 11:02:07 +02:00
wernsaar
a5ab231ad4
enabled optimized complex lauum lapack functions again
2014-05-21 10:35:28 +02:00
wernsaar
dbaeea7b59
enabled lauu2 and lauum lapack functions again
2014-05-21 09:49:18 +02:00
wernsaar
0d75f3b6a2
enabled and tested optimized gesv lapack functions
2014-05-19 14:44:53 +02:00
wernsaar
abad6f66d6
marked trti2.c and ztrti2.c as bad
2014-05-19 13:50:02 +02:00
wernsaar
2ff66e661d
enabled and tested optimized laswp lapack function
2014-05-19 13:35:32 +02:00
wernsaar
5e55034922
marked zlauu2.c and zlauum.c as bad
2014-05-19 12:53:22 +02:00
wernsaar
9a9e810239
marked trtri.c and ztrtri.c as bad
2014-05-19 12:42:52 +02:00
wernsaar
45be9ac111
moved trtri.c and ztrtri.c to the directory lapack
2014-05-19 12:29:29 +02:00
wernsaar
9f201558c9
marked lauu2.c and lauum.c as bad
2014-05-19 12:00:16 +02:00
wernsaar
d4237cb7f3
marked larf.c as obsolete
2014-05-19 11:23:17 +02:00