OpenBLAS/interface
Arjan van de Ven cdc668d82b Add a "sgemm direct" mode for small matrixes
OpenBLAS has a fancy algorithm for copying the input data while laying
it out in a more CPU friendly memory layout.

This is great for large matrixes; the cost of the copy is easily
amortized by the gains from the better memory layout.

But for small matrixes (on CPUs that can do efficient unaligned loads) this
copy can be a net loss.

This patch adds (for SKYLAKEX initially) a "sgemm direct" mode that bypasses
the whole copy machinery for ALPHA=1/BETA=0/... standard arguments,
for small matrixes only.

What is small? For the non-threaded case the break-even point has been
measured to be in the M*N*K = 28 * 512 * 512 range, while in the threaded
case it is lower, around M*N*K = 1 * 512 * 512.
2018-12-13 13:47:31 +00:00
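
The size test described in the commit message can be sketched in C as follows. This is only an illustration under stated assumptions, not the code in gemm.c: the names `sgemm_direct_eligible`, `sgemm_direct_naive`, `DIRECT_LIMIT_SINGLE` and `DIRECT_LIMIT_THREADED` are hypothetical, the limits are the approximate figures quoted above treated as hard cut-offs, and only the ALPHA=1/BETA=0 part of the "standard arguments" condition is checked because the message elides the rest.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limits, taken from the approximate figures in the commit
 * message. The real thresholds in OpenBLAS may be expressed differently. */
#define DIRECT_LIMIT_SINGLE   (28LL * 512 * 512)
#define DIRECT_LIMIT_THREADED (1LL * 512 * 512)

/* Returns true when skipping the packing copy looks worthwhile.
 * Assumes m*n*k fits in 64 bits, which holds for any "small" problem. */
static bool sgemm_direct_eligible(int64_t m, int64_t n, int64_t k,
                                  float alpha, float beta, bool threaded)
{
    if (m <= 0 || n <= 0 || k <= 0)
        return false;
    /* Only the plain C = A * B case; other conditions are not modelled. */
    if (alpha != 1.0f || beta != 0.0f)
        return false;

    int64_t work  = m * n * k;
    int64_t limit = threaded ? DIRECT_LIMIT_THREADED : DIRECT_LIMIT_SINGLE;
    return work <= limit;
}

/* Stand-in for a direct kernel: a naive row-major C = A * B that reads
 * A and B in place, with no intermediate packed copy.
 * A is m x k, B is k x n, C is m x n. */
static void sgemm_direct_naive(int m, int n, int k,
                               const float *a, const float *b, float *c)
{
    for (int i = 0; i < m; i++) {
        for (int j = 0; j < n; j++) {
            float sum = 0.0f;
            for (int p = 0; p < k; p++)
                sum += a[i * k + p] * b[p * n + j];
            c[i * n + j] = sum;  /* BETA=0: C is overwritten, not accumulated */
        }
    }
}
```

A caller would test `sgemm_direct_eligible(...)` first and fall back to the regular copy-based GEMM path when it returns false. The naive triple loop only stands in for the vectorized kernel; the point is that it reads the inputs where they are rather than from a packed copy.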
..
lapack Add an explicit cast to silence a warning 2018-09-13 14:24:29 +02:00
netlib ref #80. On P4 CPU with 32-bit Windows XP, Octave crashed with OpenBLAS. Workaround: Use netlib reference gemv instead of own functions. 2012-03-16 20:29:39 +08:00
CMakeLists.txt Support out-of-source build 2017-08-01 15:16:14 +05:30
Makefile Build cblas_iXamin interfaces 2018-06-23 13:27:30 +02:00
asum.c Modify complex CBLAS functions to take void pointers 2017-11-05 15:53:14 +01:00
axpby.c Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
axpy.c Handle special case INCX=0,INCY=0 in the axpy interface 2018-11-13 13:57:18 +01:00
copy.c Modify complex CBLAS functions to take void pointers 2017-11-05 15:53:14 +01:00
create Print the wall time (cycles) with enabling FUNCTION_PROFILE. 2011-06-09 10:40:15 +08:00
dot.c updated some level1 functions that are not thread safe 2017-01-10 14:05:07 +01:00
dsdot.c Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
gbmv.c Use blasabs to switch between abs and labs as needed for INTERFACE64 2018-08-04 20:06:49 +02:00
geadd.c Add ATLAS-style ?geadd function 2015-02-16 13:46:20 +01:00
gemm.c Add a "sgemm direct" mode for small matrixes 2018-12-13 13:47:31 +00:00
gemv.c Use blasabs to switch between abs and labs as needed for INTERFACE64 2018-08-04 20:06:49 +02:00
ger.c [z]ger: increase multithread threshold 2016-01-24 10:46:35 +01:00
imatcopy.c Fix cols/rows mixup in omatcopy 2nd step for BlasTrans cases 2017-09-14 19:59:05 +02:00
imax.c Modify complex CBLAS functions to take void pointers 2017-11-05 15:53:14 +01:00
max.c Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
nrm2.c Modify complex CBLAS functions to take void pointers 2017-11-05 15:53:14 +01:00
omatcopy.c Add cblas_(s/d/c/z)omatcopy in order to have cblas interface for them. 2014-09-08 17:57:44 +02:00
rot.c updated some level1 functions that are not thread safe 2017-01-10 14:05:07 +01:00
rotg.c fabs -> fabsl 2018-08-03 13:00:10 -04:00
rotm.c Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
rotmg.c Rewrite ROTMG to address cases not covered by the netlib algorithm (#1480) 2018-03-04 17:39:56 +01:00
sbmv.c Use blasabs to switch between abs and labs as needed for INTERFACE64 2018-08-04 20:06:49 +02:00
scal.c Fixed a few more unnecessary calls to num_cpu_avail. 2018-06-11 10:17:16 +01:00
sdsdot.c Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
spmv.c Use blasabs to switch between abs and labs as needed for INTERFACE64 2018-08-04 20:06:49 +02:00
spr.c Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
spr2.c Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
swap.c ARM64: Use THUNDERX2T99 Neon Kernels for ARMV8 2018-10-17 10:44:37 -07:00
symm.c Modify complex CBLAS functions to take void pointers 2017-11-05 15:53:14 +01:00
symv.c Use blasabs to switch between abs and labs as needed for INTERFACE64 2018-08-04 20:06:49 +02:00
syr.c Minor C code fixes in interface/ 2015-11-09 14:15:49 +05:30
syr2.c Minor C code fixes in interface/ 2015-11-09 14:15:49 +05:30
syr2k.c Modify complex CBLAS functions to take void pointers 2017-11-05 15:53:14 +01:00
syrk.c Merge pull request #1359 from brada4/develop 2017-11-18 23:47:17 +01:00
tbmv.c Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
tbsv.c Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
tpmv.c Modify complex CBLAS functions to take void pointers 2017-11-05 15:53:14 +01:00
tpsv.c Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
trmv.c fix couple of dead assignment warnings 2017-12-22 00:56:35 +01:00
trsm.c Improve performance of GEMM for small matrices when SMP is defined. 2018-06-07 15:29:13 +01:00
trsv.c Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
xerbla.c Update xerbla.c 2017-04-26 20:29:30 +02:00
zaxpby.c Modify complex CBLAS functions to take void pointers 2017-11-05 15:53:14 +01:00
zaxpy.c Handle special case INCX=0,INCY=0 in the axpy interface 2018-11-13 13:57:18 +01:00
zdot.c Make return parameter of cblas_Xdotc_sub, cblas_Xdotu_sub a void pointer as well 2017-11-18 20:28:02 +01:00
zgbmv.c Use blasabs to switch between abs and labs as needed for INTERFACE64 2018-08-04 20:06:49 +02:00
zgeadd.c Add ATLAS-style ?geadd function 2015-02-16 13:46:20 +01:00
zgemv.c Use blasabs to switch between abs and labs as needed for INTERFACE64 2018-08-04 20:06:49 +02:00
zger.c Modify complex CBLAS functions to take void pointers 2017-11-05 15:53:14 +01:00
zhbmv.c Use blasabs to switch between abs and labs as needed for INTERFACE64 2018-08-04 20:06:49 +02:00
zhemv.c re-arrange new code for readability 2018-10-20 21:37:53 +03:00
zher.c Modify complex CBLAS functions to take void pointers 2017-11-05 15:53:14 +01:00
zher2.c Modify complex CBLAS functions to take void pointers 2017-11-05 15:53:14 +01:00
zhpmv.c Use blasabs to switch between abs and labs as needed for INTERFACE64 2018-08-04 20:06:49 +02:00
zhpr.c Modify complex CBLAS functions to take void pointers 2017-11-05 15:53:14 +01:00
zhpr2.c Modify complex CBLAS functions to take void pointers 2017-11-05 15:53:14 +01:00
zimatcopy.c Use in-place transform shortcut only if matrix is square 2017-07-21 11:20:15 +02:00
zomatcopy.c Add cblas_(s/d/c/z)omatcopy in order to have cblas interface for them. 2014-09-08 17:57:44 +02:00
zrot.c Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zrotg.c fabs -> fabsl 2018-08-04 20:14:51 +02:00
zsbmv.c Use blasabs to switch between abs and labs as needed for INTERFACE64 2018-08-04 20:06:49 +02:00
zscal.c Fixed a few more unnecessary calls to num_cpu_avail. 2018-06-11 10:17:16 +01:00
zspmv.c Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zspr.c Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zspr2.c Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zswap.c disable threading in C/ZSWAP copying from S/DSWAP 2018-10-22 23:21:49 +03:00
zsymv.c Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
zsyr.c Fixed cmake bug on Visual Studio. 2015-10-20 14:37:22 -05:00
zsyr2.c Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
ztbmv.c Modify complex CBLAS functions to take void pointers 2017-11-05 15:53:14 +01:00
ztbsv.c Modify complex CBLAS functions to take void pointers 2017-11-05 15:53:14 +01:00
ztpmv.c Modify complex CBLAS functions to take void pointers 2017-11-05 15:53:14 +01:00
ztpsv.c Modify complex CBLAS functions to take void pointers 2017-11-05 15:53:14 +01:00
ztrmv.c Disable multithreading in ztrmv 2018-04-25 22:35:46 +02:00
ztrsv.c Modify complex CBLAS functions to take void pointers 2017-11-05 15:53:14 +01:00