OpenBLAS

Commit Graph

Author	SHA1	Message	Date
Arjan van de Ven	cdc668d82b	Add a "sgemm direct" mode for small matrixes. OpenBLAS has a fancy algorithm for copying the input data while laying it out in a more CPU-friendly memory layout. This is great for large matrixes; the cost of the copy is easily amortized by the gains from the better memory layout. But for small matrixes (on CPUs that can do efficient unaligned loads) this copy can be a net loss. This patch adds (for SKYLAKEX initially) a "sgemm direct" mode that bypasses the whole copy machinery for ALPHA=1/BETA=0/... standard arguments, for small matrixes only. What is small? For the non-threaded case this has been measured to be in the MNK = 28 * 512 * 512 range, while in the threaded case it's less, around MNK = 1 * 512 * 512	2018-12-13 13:47:31 +00:00
Martin Koehler	711ca33bc6	Improved Ximatcopy when lda==ldb. The Ximatcopy functions create a copy of the input matrix although they seem to work inplace. The new routines XIMATCOPY_K_YY perform the operations inplace if the leading dimension does not change.	2015-09-07 14:36:16 +02:00
Martin Koehler	39cc6b21d3	Add ATLAS-style ?geadd function	2015-02-16 13:46:20 +01:00
Timothy Gu	6c2ead30f0	Remove all trailing whitespace except lapack-netlib Signed-off-by: Timothy Gu <timothygu99@gmail.com>	2014-06-27 12:05:18 -07:00
wernsaar	cee257f384	Ref #51 : added blas extensions zomatcopy and comatcopy	2014-06-10 10:34:54 +02:00
wernsaar	7bfb3011e8	Ref #51 : added blas extension somatcopy	2014-06-09 20:21:13 +02:00
wernsaar	8c8f596238	Ref #51 : added blas extension domatcopy as not optimized reference	2014-06-09 17:11:07 +02:00
Xianyi Zhang	342bbc3871	Import GotoBLAS2 1.13 BSD version codes.	2011-01-24 14:54:24 +00:00

8 Commits
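
The size heuristic described in commit cdc668d82b can be illustrated with a minimal sketch. This is not the code from the patch: the names `use_direct_path` and `sgemm_direct_sketch`, the `<=` comparison, and the row-major layout are all assumptions made for illustration (BLAS itself is column-major, and the real kernel is SKYLAKEX-specific). Only the two M*N*K limits and the ALPHA=1/BETA=0 condition come from the commit message.

```c
#include <stdbool.h>

/* M*N*K limits quoted in the commit message. */
#define DIRECT_LIMIT_SINGLE   (28LL * 512LL * 512LL)  /* non-threaded case */
#define DIRECT_LIMIT_THREADED ( 1LL * 512LL * 512LL)  /* threaded case */

/* Hypothetical helper: decide whether to skip the packing/copy step.
 * Only the plain C = A * B case (alpha == 1, beta == 0) qualifies. */
bool use_direct_path(long m, long n, long k,
                     float alpha, float beta, bool threaded)
{
    if (alpha != 1.0f || beta != 0.0f)
        return false;

    long long limit = threaded ? DIRECT_LIMIT_THREADED : DIRECT_LIMIT_SINGLE;
    return (long long)m * n * k <= limit;
}

/* Hypothetical direct kernel: reads A and B in place, with no repacking.
 * Row-major layout is assumed here purely to keep the sketch short. */
void sgemm_direct_sketch(long m, long n, long k,
                         const float *a, long lda,
                         const float *b, long ldb,
                         float *c, long ldc)
{
    for (long i = 0; i < m; i++) {
        for (long j = 0; j < n; j++) {
            float acc = 0.0f;
            for (long p = 0; p < k; p++)
                acc += a[i * lda + p] * b[p * ldb + j];
            c[i * ldc + j] = acc;  /* beta == 0, so C is overwritten */
        }
    }
}
```

A caller would test `use_direct_path` first and fall back to the regular copy-based path whenever it returns false, so larger problems still benefit from the packed memory layout.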