Hank Anderson
5690cf3f0e
Added override for function names in GenerateNamedObjects.
...
The BLAS interface folder should now generate the correct objects
for the DOUBLE case.
2015-02-04 10:52:19 -06:00
Hank Anderson
a0aeda6187
Added function to set defines for the object names (e.g. -DNAME=dgemm).
2015-02-04 10:37:34 -06:00
Hank Anderson
20e593a44a
Added cblas_ objects to interface CMakeLists.
...
Naming isn't right, though: I'm not seeing cblas_xxxx exports in the
resulting library.
2015-02-02 16:25:30 -06:00
Hank Anderson
9e154aba58
Added LAPACK object files to interface CMakeLists.
2015-02-02 12:31:15 -06:00
Hank Anderson
5057a4b4df
Added openblas add_library call that uses DBLAS_OBJS objects.
2015-01-30 15:21:21 -06:00
Hank Anderson
a6cf8aafc0
Updated level3/CMakeLists with correct defines using all combos.
2015-01-30 11:21:50 -06:00
Jerome Robert
b17ccb4c5c
Fix a segfault in gemv when MAX_STACK_ALLOC is set
...
* stack_alloc_size is needed after the implementation call,
but it may be overwritten if it is optimized into a register,
because some gemv implementations (e.g. dgemv_n.S) do not
restore all registers (e.g. r10).
* do the same in ger.c for the same reason, even though the bug
has not been observed there.
2015-01-29 09:55:57 +01:00
Hank Anderson
5eefe18ae4
Added CMakeLists.txt for the first of the BLAS folders.
...
It only does the double precision compile currently.
I realized I didn't finish converting Makefile.system yet, so I made
a note of that.
2015-01-27 16:17:17 -06:00
Jerome Robert
e9d9a8eae3
Allow gemv and ger buffer allocation on the stack
...
ger and gemv call blas_memory_alloc/free, which in turn
call blas_lock. blas_lock creates thread contention when matrices
are small and the number of threads is high enough. We avoid
calling blas_memory_alloc by replacing it with stack allocation.
This can be enabled with:
make -DMAX_STACK_ALLOC=2048
The given size (in bytes) must be high enough to avoid thread contention
and small enough to avoid stack overflow.
Fix #478
2014-12-27 14:33:12 +01:00
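The stack-versus-allocator choice described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code from ger.c or gemv.c: `scaled_sum` and `STACK_DOUBLES` are hypothetical names, `malloc` stands in for `blas_memory_alloc`, and the 2048-byte default simply mirrors the value in the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumed limit in bytes; mirrors the 2048 used in the commit message. */
#ifndef MAX_STACK_ALLOC
#define MAX_STACK_ALLOC 2048
#endif

/* Number of doubles that fit into the stack scratch area. */
#define STACK_DOUBLES (MAX_STACK_ALLOC / sizeof(double))

/* Hypothetical routine (not an OpenBLAS function): scales x by alpha into a
 * scratch buffer and returns the sum of the scaled values. *used_stack is
 * set to 1 when the scratch buffer lived on the stack, 0 when it came from
 * the heap, and -1 when the heap allocation failed. */
double scaled_sum(const double *x, size_t n, double alpha, int *used_stack)
{
    double stack_buffer[STACK_DOUBLES];
    double *buffer;
    double sum = 0.0;
    int on_stack = (n <= STACK_DOUBLES);

    assert(x != NULL || n == 0);
    assert(used_stack != NULL);

    /* Small requests never touch the allocator (and therefore no lock);
     * larger ones fall back to the heap. malloc is a stand-in here. */
    buffer = on_stack ? stack_buffer : (double *)malloc(n * sizeof(double));
    if (buffer == NULL) {
        *used_stack = -1;
        return 0.0;
    }

    for (size_t i = 0; i < n; i++) {
        buffer[i] = alpha * x[i];
        sum += buffer[i];
    }

    if (!on_stack)
        free(buffer);
    *used_stack = on_stack;
    return sum;
}
```

With the assumed 2048-byte limit, inputs of up to 256 doubles stay on the stack; anything larger takes the allocator path, which is the trade-off between contention and stack overflow that the commit message describes.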
wernsaar
9e829ce98f
enabled cblas gemm3m functions
2014-09-20 17:20:02 +02:00
wernsaar
d49fd33885
disabled SYMM3M and HEMM3M functions because of segmentation violations
2014-09-20 15:27:40 +02:00
wernsaar
7aae4a62e7
enabled use of GEMM3M functions
2014-09-20 14:27:10 +02:00
wernsaar
3300f5ebff
optimized multithreading lower limits
2014-09-15 11:38:25 +02:00
wernsaar
fd2478c9e2
optimized interface/zgemv.c for multithreading
2014-09-12 19:18:23 +02:00
Zhang Xianyi
1cba8e7b11
Merge pull request #446 from grisuthedragon/cblas_matcopy
...
Add a CBLAS interface for the BLAS extension s/d/c/z*matcopy routines.
2014-09-10 16:31:31 +08:00
Martin Koehler
a057e5434d
add CBLAS interface for s/d/c/zimatcopy
2014-09-09 09:52:13 +02:00
Martin Köhler
7794766d3c
Add cblas_(s/d/c/z)omatcopy in order to have cblas interface for them.
2014-09-08 17:57:44 +02:00
wernsaar
f511807fc0
modified multithreading threshold
2014-09-08 12:27:32 +02:00
wernsaar
d1800397f5
optimized interface/gemv.c for multithreading
2014-09-02 17:36:07 +02:00
wernsaar
f4ff889491
updated interface/gemv.c for multithreading
2014-09-02 16:30:04 +02:00
wernsaar
51413925bd
adjust number of threads for small size in cgemv and zgemv
2014-07-15 16:27:02 +02:00
wernsaar
b985cea65d
adjust number of threads for sgemv and dgemv
2014-07-15 16:04:46 +02:00
wernsaar
d286daa2ba
adjusted number of threads for small size
2014-07-15 14:41:35 +02:00
wernsaar
cedc1f4b14
Ref #410 : disabled optimized potri functions (single-threading bug)
2014-07-10 13:42:32 +02:00
wernsaar
02a504c0b8
fixed my bug in ger.c
2014-07-02 10:39:33 +02:00
wernsaar
be94db096c
disabled *3M functions for x86_64 platforms
2014-07-01 16:18:05 +02:00
wernsaar
aee61456a4
disabled SMP for sbmv and zsbmv again
2014-06-29 21:18:38 +02:00
wernsaar
01a119abfc
enabled SMP for sbmv and zsbmv, but only for 64bit binaries
2014-06-29 20:35:56 +02:00
wernsaar
1fad2b759f
enabled smp for ger.c and zger.c, but only for 64bit binaries
2014-06-29 16:43:04 +02:00
Timothy Gu
6c2ead30f0
Remove all trailing whitespace except lapack-netlib
...
Signed-off-by: Timothy Gu <timothygu99@gmail.com>
2014-06-27 12:05:18 -07:00
wernsaar
15d5dfa92c
fixed compiler warnings
2014-06-25 11:32:44 +02:00
wernsaar
86d8c8978b
Ref #391 : disabled SMP in ger.c and zger.c
2014-06-22 12:01:24 +02:00
wernsaar
a19d209005
Ref #103 : enhancement for small matrix dimensions
2014-06-18 15:04:11 +02:00
wernsaar
faeab93df0
Ref #51 : added blas extensions simatcopy, dimatcopy, cimatcopy, zimatcopy
2014-06-10 16:14:34 +02:00
wernsaar
cee257f384
Ref #51 : added blas extensions zomatcopy and comatcopy
2014-06-10 10:34:54 +02:00
wernsaar
7bfb3011e8
Ref #51 : added blas extension somatcopy
2014-06-09 20:21:13 +02:00
wernsaar
8c8f596238
Ref #51 : added blas extension domatcopy as a non-optimized reference
2014-06-09 17:11:07 +02:00
wernsaar
bff575d0b1
Ref #375 : added workaround for small sizes to scal.c and zscal.c
2014-06-08 13:49:19 +02:00
wernsaar
faf3ac0aad
Ref #285 : added axpby kernels
2014-06-08 11:54:24 +02:00
Zhang Xianyi
b31ec99372
Fixed #374 .
...
Merge branch 'TimothyGu-develop' into develop
2014-06-05 17:01:44 +08:00
wernsaar
25e899b60b
fixed function profile in zpotri.c
2014-05-25 09:15:22 +02:00
wernsaar
89da450800
enabled and tested optimized potri lapack functions
2014-05-23 12:14:30 +02:00
wernsaar
c26bbee489
enabled and tested optimized trtri lapack functions
2014-05-23 10:55:39 +02:00
Timothy Gu
ced13574a0
Random "walk (a)round" --> "work-around" typo fixes
...
Signed-off-by: Timothy Gu <timothygu99@gmail.com>
2014-05-22 18:11:52 -07:00
wernsaar
a748d3a75d
enabled optimized trti2 lapack functions again
2014-05-21 11:02:07 +02:00
wernsaar
a5ab231ad4
enabled optimized complex lauum lapack functions again
2014-05-21 10:35:28 +02:00
wernsaar
dbaeea7b59
enabled lauu2 and lauum lapack functions again
2014-05-21 09:49:18 +02:00
wernsaar
0d75f3b6a2
enabled and tested optimized gesv lapack functions
2014-05-19 14:44:53 +02:00
wernsaar
abad6f66d6
marked trti2.c and ztrti2.c as bad
2014-05-19 13:50:02 +02:00
wernsaar
2ff66e661d
enabled and tested optimized laswp lapack function
2014-05-19 13:35:32 +02:00
wernsaar
5e55034922
marked zlauu2.c and zlauum.c as bad
2014-05-19 12:53:22 +02:00
wernsaar
9a9e810239
marked trtri.c and ztrtri.c as bad
2014-05-19 12:42:52 +02:00
wernsaar
45be9ac111
moved trtri.c and ztrtri.c to the directory lapack
2014-05-19 12:29:29 +02:00
wernsaar
9f201558c9
marked lauu2.c and lauum.c as bad
2014-05-19 12:00:16 +02:00
wernsaar
d4237cb7f3
marked larf.c as obsolete
2014-05-19 11:23:17 +02:00
wernsaar
aaa9d7fbf8
marked potri functions as bad because of a lot of errors
2014-05-18 23:41:13 +02:00
wernsaar
ebc95e6f11
enabled and tested optimized potf2 lapack functions
2014-05-18 22:41:43 +02:00
wernsaar
61a2c50e8e
enabled and tested optimized getf2 lapack functions
2014-05-18 22:21:16 +02:00
wernsaar
4f98f8c9b3
enabled and tested optimized potrf lapack functions
2014-05-18 21:42:37 +02:00
wernsaar
536875d463
enabled and tested optimized getrs lapack functions
2014-05-18 21:13:56 +02:00
wernsaar
65f2fba4c3
enabled and tested optimized cgetrf lapack function
2014-05-18 20:32:27 +02:00
wernsaar
eea6f51df9
enabled and tested optimized sgetrf lapack function
2014-05-18 20:01:23 +02:00
wernsaar
6fc4646709
enabled and tested optimized zgetrf lapack function
2014-05-18 19:36:32 +02:00
wernsaar
ac029f81b3
enabled and tested optimized dgetrf function
2014-05-18 19:07:51 +02:00
wernsaar
c0cf875a82
added optimized lapack files from OpenBLAS
2014-05-18 14:09:22 +02:00
wernsaar
189ca1bcee
removed lapack objects from interface/Makefile
2014-05-11 12:09:34 +02:00
wernsaar
4c1caa7454
checked, that zhpr is OK
2014-05-11 11:21:23 +02:00
wernsaar
7bb19cf90e
checked, that zhpr2 is OK
2014-05-11 11:11:05 +02:00
wernsaar
2a94aaaf2e
checked, that zhpmv is OK
2014-05-11 10:46:48 +02:00
wernsaar
5e4b4f6712
checked, that zher is OK
2014-05-11 10:36:34 +02:00
wernsaar
47e8950e77
checked, that zher2 is OK
2014-05-11 10:26:05 +02:00
wernsaar
f45f2c8465
checked, that zhemv is OK
2014-05-11 10:15:06 +02:00
wernsaar
10780ae650
marked zhbmv as smp bug
2014-05-11 09:58:16 +02:00
wernsaar
9bae50f700
checked, that zscal and zswap are OK
2014-05-11 09:30:18 +02:00
wernsaar
0758c1a374
checked, that trtri is OK
2014-05-11 09:11:20 +02:00
wernsaar
564ff395f6
checked, that trsm is OK
2014-05-11 08:59:33 +02:00
wernsaar
7fb78a5f01
checked, that trmv is OK
2014-05-11 08:47:44 +02:00
wernsaar
8204ab4aa8
checked, that tpmv is OK
2014-05-11 08:35:34 +02:00
wernsaar
48d1325784
checked, that tbmv is OK
2014-05-11 08:22:00 +02:00
wernsaar
57bbc586ef
checked, that syrk is OK
2014-05-11 08:10:25 +02:00
wernsaar
bfef3c5dd1
checked, that syr is OK
2014-05-11 07:46:22 +02:00
wernsaar
d972f4a60a
checked, that syr2k is OK
2014-05-11 01:04:46 +02:00
wernsaar
eebce01cf2
checked, that syr2 is OK
2014-05-11 00:48:49 +02:00
wernsaar
e2c39a4a8e
checked, that symv is OK
2014-05-11 00:36:56 +02:00
wernsaar
1e8e6faa7e
checked, that symm is OK
2014-05-11 00:22:40 +02:00
wernsaar
c7eb901496
checked, that spr is OK
2014-05-11 00:07:07 +02:00
wernsaar
2ed03ea0a2
checked, that spr2 is OK
2014-05-10 23:55:43 +02:00
wernsaar
de00e2937a
marked as smp bug
2014-05-10 23:18:35 +02:00
wernsaar
e187b5e9d0
removed gesv.c from interface
2014-05-10 22:55:44 +02:00
wernsaar
0947fc1c89
checked, that ger is OK
2014-05-10 22:49:53 +02:00
wernsaar
4d61607c9e
checked, that gbmv is OK
2014-05-10 22:38:09 +02:00
wernsaar
781bfb6e66
checked, that gemv is OK
2014-05-10 22:24:05 +02:00
wernsaar
79a82ba7f1
checked that axpy is OK
2014-05-10 22:09:49 +02:00
wernsaar
d63bd7fa5e
checked that gemm.c is OK
2014-05-10 21:51:44 +02:00
wernsaar
e265c4ec86
added C files in interface
2014-05-10 21:27:47 +02:00
wernsaar
0732238213
removed all C files in interface
2014-05-10 21:25:17 +02:00
wernsaar
320c805905
fixed incorrect parameter 2 errors
2014-05-08 11:06:32 +02:00
wernsaar
025fc914cc
fixed 2 bugs as reported by Brendan Tracey
2014-05-02 11:34:26 +02:00
wernsaar
9db0fb8b02
bugfix for sdsdot
2014-02-28 14:59:36 +01:00
wernsaar
692b14cecd
rewrote rotmg.c instead of modifying very old code
2014-02-28 14:43:28 +01:00
Zhang Xianyi
3e0a7b931c
Refs #333 . Detect the wrong parameter for zherk/zher2k.
2014-01-21 01:27:51 +08:00
Zhang Xianyi
73770e60b8
Refs #309 . Fixed trtri_U single thread computational bug.
2013-11-07 01:08:39 +08:00
Lars Buitinck
3f7b0cd994
Merge pull request #290 from larsmans/missing-threshold
...
check if GEMM_MULTITHREAD_THRESHOLD is defined in gemm.c
Set a fallback value.
2013-08-29 00:33:55 +08:00
Zhang Xianyi
c92ae012a6
Refs #279 . Provide ONLY_CBLAS flag. If you only need CBLAS without
a Fortran compiler, please try make ONLY_CBLAS=1.
This mode compiles only CBLAS, without the BLAS Fortran interface and LAPACK.
2013-08-21 00:03:25 +08:00
Zhang Xianyi
a07cc39571
Refs #266 . Fixed the compiling bug with Open64 5.0.
2013-07-31 14:41:39 +08:00
Zhang Xianyi
b5c2ac4fd6
Fixed #264 the memory leak bug in dtrtri_U.
2013-07-29 23:21:10 +08:00
Elliot Saba
6f5b395009
Fix xianyi/OpenBLAS#256
2013-07-22 17:02:06 -07:00
Zhang Xianyi
fd0c388681
Refs #191 . A workaround for the dtrtri_U single-thread bug.
...
This function caused the failure of ERKALE serial test.
I replaced it with LAPACK source code.
2013-07-14 22:16:30 +08:00
Jameson Nash
d0e731e8b8
provide support for passing CFLAGS, FFLAGS, PFLAGS, FPFLAGS to make on the command line
2012-08-21 00:31:12 -04:00
Xianyi Zhang
83ecfbb9b3
Merge branch 'loongson3a' into release-0.1.0
2012-03-23 01:26:27 +08:00
Xianyi Zhang
31c836ac25
Ref #79 Added GEMM_MULTITHREAD_THRESHOLD flag to use a single thread in the gemm function for small matrices.
2012-03-23 01:17:41 +08:00
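A threshold of this kind can be sketched as below. The commit message does not spell out the exact comparison, so the per-dimension test, the fallback value of 4, and the helper name `choose_gemm_threads` are all assumptions made for illustration rather than the logic in gemm.c.

```c
#include <assert.h>

/* Illustrative fallback in case the build did not define the flag;
 * the value 4 is an assumption for this sketch. */
#ifndef GEMM_MULTITHREAD_THRESHOLD
#define GEMM_MULTITHREAD_THRESHOLD 4
#endif

/* Hypothetical helper (not an OpenBLAS function): returns the number of
 * threads to use for an m x k by k x n product. If any dimension is at or
 * below the threshold, threading overhead is assumed to outweigh the
 * benefit and a single thread is used. */
int choose_gemm_threads(long m, long n, long k, int max_threads)
{
    assert(max_threads >= 1);

    if (m <= GEMM_MULTITHREAD_THRESHOLD ||
        n <= GEMM_MULTITHREAD_THRESHOLD ||
        k <= GEMM_MULTITHREAD_THRESHOLD)
        return 1;

    return max_threads;
}
```

Keeping the threshold a compile-time macro with an `#ifndef` fallback lets the build system override it without touching the source.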
Xianyi Zhang
722dd08703
ref #80 . On a P4 CPU with 32-bit Windows XP, Octave crashed with OpenBLAS. Workaround: use the netlib reference gemv instead of our own functions.
...
For example, make USE_NETLIB_GEMV=1
2012-03-16 20:29:39 +08:00
traz
a4292976e9
Adding detection of complex situations in symm.c, otherwise the buffer address of sb will overlap the end of sa.
2011-12-05 14:54:25 +00:00
Xianyi Zhang
aeed8d6225
Fixed #27 . Temporarily work around axpy's low-performance issue with small input sizes and multiple threads.
2011-06-19 11:55:29 +08:00
Xianyi Zhang
1496383224
Print the wall time (cycles) when FUNCTION_PROFILE is enabled.
2011-06-09 10:40:15 +08:00
Xianyi Zhang
fcb5ce011b
Fixed #28 . Convert the result to double precision in MIPS64 dsdot_k kernel.
2011-05-17 21:24:00 +00:00
Xianyi Zhang
fa8e4fd879
Fixed #26 the wrong result of rotmg. Used fabs() instead of abs().
2011-05-11 01:12:32 +08:00
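The difference between the two functions is easy to reproduce in isolation. In C, `abs()` takes an `int`, so a `double` argument is converted to an integer first and its fractional part is lost, whereas `fabs()` operates on the `double` directly. The helpers below are hypothetical and only demonstrate that general C behaviour; they are not taken from rotmg.c.

```c
#include <assert.h>
#include <limits.h>
#include <math.h>
#include <stdlib.h>

/* Hypothetical helper: what abs() effectively computes when handed a
 * double. The conversion to int is written out explicitly here; with a
 * plain abs(x) call the compiler would perform it implicitly. */
double magnitude_via_abs(double x)
{
    /* Converting an out-of-range double to int is undefined behaviour. */
    assert(x > (double)INT_MIN && x <= (double)INT_MAX);
    return (double)abs((int)x);
}

/* Hypothetical helper: the floating-point absolute value. */
double magnitude_via_fabs(double x)
{
    return fabs(x);
}
```

For an input of -0.75 the first helper yields 0.0 and the second 0.75, so any magnitude comparison built on `abs()` silently treats all values in (-1, 1) as zero.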
Xianyi Zhang
8f1090d32a
Support NO_LAPACK=1 to build the lib without LAPACK functions.
2011-03-04 11:51:32 +08:00
Xianyi Zhang
0cfd29a819
Fixed #7 . 1) Disabled multithreading and 2) modified the kernel code to avoid loop unrolling in the axpy function when incx==0 or incy==0.
2011-02-21 00:24:21 +08:00
Xianyi Zhang
78da0e0a0c
Fixed #6 . Disable multi-thread swap when incx==0 or incy==0.
2011-02-20 17:14:38 +08:00
Xianyi Zhang
342bbc3871
Import GotoBLAS2 1.13 BSD version code.
2011-01-24 14:54:24 +00:00