221 Commits

Author SHA1 Message Date
Hank Anderson 5690cf3f0e Added override for function names in GenerateNamedObjects.
The BLAS interface folder should now generate the correct objects
for the DOUBLE case.
2015-02-04 10:52:19 -06:00
Hank Anderson a0aeda6187 Added function to set defines for the object names (e.g. -DNAME=dgemm). 2015-02-04 10:37:34 -06:00
Hank Anderson 20e593a44a Added cblas_ objects to interface CMakeLists.
Naming isn't right, though, not seeing cblas_xxxx exports in the
resulting library.
2015-02-02 16:25:30 -06:00
Hank Anderson 9e154aba58 Added LAPACK object files to interface CMakeLists. 2015-02-02 12:31:15 -06:00
Hank Anderson 5057a4b4df Added openblas add_library call that uses DBLAS_OBJS objects. 2015-01-30 15:21:21 -06:00
Hank Anderson a6cf8aafc0 Updated level3/CMakeLists with correct defines using all combos. 2015-01-30 11:21:50 -06:00
Jerome Robert b17ccb4c5c Fix a segfault in gemv when MAX_STACK_ALLOC is set
* stack_alloc_size is needed after the implementation call
but it may be overwritten if it's optimized to a register,
because some gemv implementations (e.g. dgemv_n.S) do not
restore all registers (e.g. r10).
* do the same in ger.c for the same reasons even if the bug
has not been observed.
2015-01-29 09:55:57 +01:00
Hank Anderson 5eefe18ae4 Added CMakeLists.txt for the first of the BLAS folders.
It only does the double precision compile currently.

I realized I didn't finish converting Makefile.system yet, so I made
a note of that.
2015-01-27 16:17:17 -06:00
Jerome Robert e9d9a8eae3 Allow gemv and ger buffer allocation on the stack
ger and gemv call blas_memory_alloc/free, which in turn
call blas_lock. blas_lock creates thread contention when matrices
are small and the number of threads is high enough. We avoid
calling blas_memory_alloc by replacing it with stack allocation.
This can be enabled with:
make -DMAX_STACK_ALLOC=2048
The given size (in bytes) must be high enough to avoid thread contention
and small enough to avoid stack overflow.

Fix #478
2014-12-27 14:33:12 +01:00
wernsaar 9e829ce98f enabled cblas gemm3m functions 2014-09-20 17:20:02 +02:00
wernsaar d49fd33885 disabled SYMM3M and HEMM3M functions because of segmentation violations 2014-09-20 15:27:40 +02:00
wernsaar 7aae4a62e7 enabled use of GEMM3M functions 2014-09-20 14:27:10 +02:00
wernsaar 3300f5ebff optimized multithreading lower limits 2014-09-15 11:38:25 +02:00
wernsaar fd2478c9e2 optimized interface/zgemv.c for multithreading 2014-09-12 19:18:23 +02:00
Zhang Xianyi 1cba8e7b11 Merge pull request #446 from grisuthedragon/cblas_matcopy
Add a CBLAS interface for the BLAS extension s/d/c/z*matcopy routines.
2014-09-10 16:31:31 +08:00
Martin Koehler a057e5434d add CBLAS interface for s/d/c/zimatcopy 2014-09-09 09:52:13 +02:00
Martin Köhler 7794766d3c Add cblas_(s/d/c/z)omatcopy in order to have cblas interface for them. 2014-09-08 17:57:44 +02:00
wernsaar f511807fc0 modified multithreading threshold 2014-09-08 12:27:32 +02:00
wernsaar d1800397f5 optimized interface/gemv.c for multithreading 2014-09-02 17:36:07 +02:00
wernsaar f4ff889491 updated interface/gemv.c for multithreading 2014-09-02 16:30:04 +02:00
wernsaar 51413925bd adjust number of threads for small size in cgemv and zgemv 2014-07-15 16:27:02 +02:00
wernsaar b985cea65d adjust number of threads for sgemv and dgemv 2014-07-15 16:04:46 +02:00
wernsaar d286daa2ba adjusted number of threads for small size 2014-07-15 14:41:35 +02:00
wernsaar cedc1f4b14 Ref #410: disabled optimized potri functions (single threading bug) 2014-07-10 13:42:32 +02:00
wernsaar 02a504c0b8 fixed my bug in ger.c 2014-07-02 10:39:33 +02:00
wernsaar be94db096c disabled *3M functions for x86_64 platforms 2014-07-01 16:18:05 +02:00
wernsaar aee61456a4 disabled SMP for sbmv and zsbmv again 2014-06-29 21:18:38 +02:00
wernsaar 01a119abfc enabled SMP for sbmv and zsbmv, but only for 64bit binaries 2014-06-29 20:35:56 +02:00
wernsaar 1fad2b759f enabled smp for ger.c and zger.c, but only for 64bit binaries 2014-06-29 16:43:04 +02:00
Timothy Gu 6c2ead30f0 Remove all trailing whitespace except lapack-netlib
Signed-off-by: Timothy Gu <timothygu99@gmail.com>
2014-06-27 12:05:18 -07:00
wernsaar 15d5dfa92c fixed compiler warnings 2014-06-25 11:32:44 +02:00
wernsaar 86d8c8978b Ref #391: disabled SMP in ger.c and zger.c 2014-06-22 12:01:24 +02:00
wernsaar a19d209005 Ref #103: enhancement for small matrix dimensions 2014-06-18 15:04:11 +02:00
wernsaar faeab93df0 Ref #51: added blas extensions simatcopy, dimatcopy, cimatcopy, zimatcopy 2014-06-10 16:14:34 +02:00
wernsaar cee257f384 Ref #51: added blas extensions zomatcopy and comatcopy 2014-06-10 10:34:54 +02:00
wernsaar 7bfb3011e8 Ref #51: added blas extension somatcopy 2014-06-09 20:21:13 +02:00
wernsaar 8c8f596238 Ref #51: added blas extension domatcopy as a non-optimized reference 2014-06-09 17:11:07 +02:00
wernsaar bff575d0b1 Ref #375: added workaround for small sizes to scal.c and zscal.c 2014-06-08 13:49:19 +02:00
wernsaar faf3ac0aad Ref #285: added axpby kernels 2014-06-08 11:54:24 +02:00
Zhang Xianyi b31ec99372 Fixed #374.
Merge branch 'TimothyGu-develop' into develop
2014-06-05 17:01:44 +08:00
wernsaar 25e899b60b fixed function profile in zpotri.c 2014-05-25 09:15:22 +02:00
wernsaar 89da450800 enabled and tested optimized potri lapack functions 2014-05-23 12:14:30 +02:00
wernsaar c26bbee489 enabled and tested optimized trtri lapack functions 2014-05-23 10:55:39 +02:00
Timothy Gu ced13574a0 Random "walk (a)round" --> "work-around" typo fixes
Signed-off-by: Timothy Gu <timothygu99@gmail.com>
2014-05-22 18:11:52 -07:00
wernsaar a748d3a75d enabled optimized trti2 lapack functions again 2014-05-21 11:02:07 +02:00
wernsaar a5ab231ad4 enabled optimized complex lauum lapack functions again 2014-05-21 10:35:28 +02:00
wernsaar dbaeea7b59 enabled lauu2 and lauum lapack functions again 2014-05-21 09:49:18 +02:00
wernsaar 0d75f3b6a2 enabled and tested optimized gesv lapack functions 2014-05-19 14:44:53 +02:00
wernsaar abad6f66d6 marked trti2.c and ztrti2.c as bad 2014-05-19 13:50:02 +02:00
wernsaar 2ff66e661d enabled and tested optimized laswp lapack function 2014-05-19 13:35:32 +02:00
wernsaar 5e55034922 marked zlauu2.c and zlauum.c as bad 2014-05-19 12:53:22 +02:00
wernsaar 9a9e810239 marked trtri.c and ztrtri as bad 2014-05-19 12:42:52 +02:00
wernsaar 45be9ac111 moved trtri.c and ztrtri.c to the directory lapack 2014-05-19 12:29:29 +02:00
wernsaar 9f201558c9 marked lauu2.c and lauum.c as bad 2014-05-19 12:00:16 +02:00
wernsaar d4237cb7f3 marked larf.c as obsolete 2014-05-19 11:23:17 +02:00
wernsaar aaa9d7fbf8 marked potri functions as bad because of a lot of errors 2014-05-18 23:41:13 +02:00
wernsaar ebc95e6f11 enabled and tested optimized potf2 lapack functions 2014-05-18 22:41:43 +02:00
wernsaar 61a2c50e8e enabled and tested optimized getf2 lapack functions 2014-05-18 22:21:16 +02:00
wernsaar 4f98f8c9b3 enabled and tested optimized potrf lapack functions 2014-05-18 21:42:37 +02:00
wernsaar 536875d463 enabled and tested optimized getrs lapack functions 2014-05-18 21:13:56 +02:00
wernsaar 65f2fba4c3 enabled and tested optimized cgetrf lapack function 2014-05-18 20:32:27 +02:00
wernsaar eea6f51df9 enabled and tested optimized sgetrf lapack function 2014-05-18 20:01:23 +02:00
wernsaar 6fc4646709 enabled and tested optimized zgetrf lapack function 2014-05-18 19:36:32 +02:00
wernsaar ac029f81b3 enabled and tested optimized dgetrf function 2014-05-18 19:07:51 +02:00
wernsaar c0cf875a82 added optimized lapack files from OpenBLAS 2014-05-18 14:09:22 +02:00
wernsaar 189ca1bcee removed lapack objects from interface/Makefile 2014-05-11 12:09:34 +02:00
wernsaar 4c1caa7454 checked, that zhpr is OK 2014-05-11 11:21:23 +02:00
wernsaar 7bb19cf90e checked, that zhpr2 is OK 2014-05-11 11:11:05 +02:00
wernsaar 2a94aaaf2e checked, that zhpmv is OK 2014-05-11 10:46:48 +02:00
wernsaar 5e4b4f6712 checked, that zher is OK 2014-05-11 10:36:34 +02:00
wernsaar 47e8950e77 checked, that zher2 is OK 2014-05-11 10:26:05 +02:00
wernsaar f45f2c8465 checked, that zhemv is OK 2014-05-11 10:15:06 +02:00
wernsaar 10780ae650 marked zhbmv as smp bug 2014-05-11 09:58:16 +02:00
wernsaar 9bae50f700 checked, that zscal and zswap are OK 2014-05-11 09:30:18 +02:00
wernsaar 0758c1a374 checked, that trtri is OK 2014-05-11 09:11:20 +02:00
wernsaar 564ff395f6 checked, that trsm is OK 2014-05-11 08:59:33 +02:00
wernsaar 7fb78a5f01 checked, that trmv is OK 2014-05-11 08:47:44 +02:00
wernsaar 8204ab4aa8 checked, that tpmv is OK 2014-05-11 08:35:34 +02:00
wernsaar 48d1325784 checked, that tbmv is OK 2014-05-11 08:22:00 +02:00
wernsaar 57bbc586ef checked, that syrk is OK 2014-05-11 08:10:25 +02:00
wernsaar bfef3c5dd1 checked, that syr is OK 2014-05-11 07:46:22 +02:00
wernsaar d972f4a60a checked, that syr2k is OK 2014-05-11 01:04:46 +02:00
wernsaar eebce01cf2 checked, that syr2 is OK 2014-05-11 00:48:49 +02:00
wernsaar e2c39a4a8e checked, that symv is OK 2014-05-11 00:36:56 +02:00
wernsaar 1e8e6faa7e checked, that symm is OK 2014-05-11 00:22:40 +02:00
wernsaar c7eb901496 checked, that spr is OK 2014-05-11 00:07:07 +02:00
wernsaar 2ed03ea0a2 checked, that spr2 is OK 2014-05-10 23:55:43 +02:00
wernsaar de00e2937a marked as smp bug 2014-05-10 23:18:35 +02:00
wernsaar e187b5e9d0 removed gesv.c from interface 2014-05-10 22:55:44 +02:00
wernsaar 0947fc1c89 checked, that ger is OK 2014-05-10 22:49:53 +02:00
wernsaar 4d61607c9e checked, that gbmv is OK 2014-05-10 22:38:09 +02:00
wernsaar 781bfb6e66 checked, that gemv is OK 2014-05-10 22:24:05 +02:00
wernsaar 79a82ba7f1 checked that axpy is OK 2014-05-10 22:09:49 +02:00
wernsaar d63bd7fa5e checked that gemm.c is OK 2014-05-10 21:51:44 +02:00
wernsaar e265c4ec86 added C files in interface 2014-05-10 21:27:47 +02:00
wernsaar 0732238213 removed all C files in interface 2014-05-10 21:25:17 +02:00
wernsaar 320c805905 fixed incorrect parameter 2 errors 2014-05-08 11:06:32 +02:00
wernsaar 025fc914cc fixed 2 bugs as reported by Brendan Tracey 2014-05-02 11:34:26 +02:00
wernsaar 9db0fb8b02 bugfix for sdsdot 2014-02-28 14:59:36 +01:00
wernsaar 692b14cecd rewrote rotmg.c instead of modifying very old code 2014-02-28 14:43:28 +01:00
Zhang Xianyi 3e0a7b931c Refs #333. Detect the wrong parameter for zherk/zher2k. 2014-01-21 01:27:51 +08:00
Zhang Xianyi 73770e60b8 Refs #309. Fixed trtri_U single thread computational bug. 2013-11-07 01:08:39 +08:00
Lars Buitinck 3f7b0cd994 Merge pull request #290 from larsmans/missing-threshold
check if GEMM_MULTITHREAD_THRESHOLD defined in gemm.c
Set a fallback value.
2013-08-29 00:33:55 +08:00
Zhang Xianyi c92ae012a6 Refs #279. Provide ONLY_CBLAS flag. If you only need CBLAS without
a fortran compiler, please try make ONLY_CBLAS=1.

This mode only compiles CBLAS, without the BLAS Fortran interface and LAPACK.
2013-08-21 00:03:25 +08:00
Zhang Xianyi a07cc39571 Refs #266. Fixed the compiling bug with Open64 5.0. 2013-07-31 14:41:39 +08:00
Zhang Xianyi b5c2ac4fd6 Fixed #264 the memory leak bug in dtrtri_U. 2013-07-29 23:21:10 +08:00
Elliot Saba 6f5b395009 Fix xianyi/OpenBLAS#256 2013-07-22 17:02:06 -07:00
Zhang Xianyi fd0c388681 Refs #191. A workaround for dtrtri_U single thread bug.
This function caused the failure of ERKALE serial test.
I replaced it with LAPACK source code.
2013-07-14 22:16:30 +08:00
Jameson Nash d0e731e8b8 provide support for passing CFLAGS, FFLAGS, PFLAGS, FPFLAGS to make on the command line 2012-08-21 00:31:12 -04:00
Xianyi Zhang 83ecfbb9b3 Merge branch 'loongson3a' into release-0.1.0 2012-03-23 01:26:27 +08:00
Xianyi Zhang 31c836ac25 Ref #79 Added GEMM_MULTITHREAD_THRESHOLD flag to use single thread in gemm function with small matrices. 2012-03-23 01:17:41 +08:00
Xianyi Zhang 722dd08703 ref #80. On P4 CPU with 32-bit Windows XP, Octave crashed with OpenBLAS. Workaround: Use netlib reference gemv instead of own functions.
For example, make USE_NETLIB_GEMV=1
2012-03-16 20:29:39 +08:00
traz a4292976e9 Adding detection of complex situations in symm.c, otherwise the buffer address of sb will overlap the end of sa. 2011-12-05 14:54:25 +00:00
Xianyi Zhang aeed8d6225 Fixed #27. Temporarily work around axpy's low performance issue with small input size & multithreads. 2011-06-19 11:55:29 +08:00
Xianyi Zhang 1496383224 Print the wall time (cycles) with enabling FUNCTION_PROFILE. 2011-06-09 10:40:15 +08:00
Xianyi Zhang fcb5ce011b Fixed #28. Convert the result to double precision in MIPS64 dsdot_k kernel. 2011-05-17 21:24:00 +00:00
Xianyi Zhang fa8e4fd879 Fixed #26 the wrong result of rotmg. Used fabs() instead of abs(). 2011-05-11 01:12:32 +08:00
Xianyi Zhang 8f1090d32a Support NO_LAPACK=1 to build the lib without LAPACK functions. 2011-03-04 11:51:32 +08:00
Xianyi Zhang 0cfd29a819 Fixed #7. 1)Disable the multi-thread and 2) Modified kernel codes to avoid unloop in axpy function when incx==0 or incy==0. 2011-02-21 00:24:21 +08:00
Xianyi Zhang 78da0e0a0c Fixed #6. Disable multi-thread swap when incx==0 or incy==0. 2011-02-20 17:14:38 +08:00
Xianyi Zhang 342bbc3871 Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00