OpenBLAS

Commit Graph

Author	SHA1	Message	Date
Arjan van de Ven	cdc668d82b	Add a "sgemm direct" mode for small matrixes. OpenBLAS has a fancy algorithm for copying the input data while laying it out in a more CPU friendly memory layout. This is great for large matrixes; the cost of the copy is easily amortized by the gains from the better memory layout. But for small matrixes (on CPUs that can do efficient unaligned loads) this copy can be a net loss. This patch adds (for SKYLAKEX initially) a "sgemm direct" mode that bypasses the whole copy machinery for ALPHA=1/BETA=0/... standard arguments, for small matrixes only. What is small? For the non-threaded case this has been measured to be in the MNK = 28 * 512 * 512 range, while in the threaded case it's less, around MNK = 1 * 512 * 512	2018-12-13 13:47:31 +00:00
Craig Donner	66316b9f4c	Improve performance of GEMM for small matrices when SMP is defined. Always checking num_cpu_avail() regardless of whether threading will actually be used adds noticeable overhead for small matrices. Most other uses of num_cpu_avail() do so only if threading will be used, so do the same here.	2018-06-07 15:29:13 +01:00
Martin Kroeker	2c222f1faa	Modify complex CBLAS functions to take void pointers instead of float or double arguments (to bring the prototypes in line with netlib and other implementations' cblas.h)	2017-11-05 15:53:14 +01:00
Hank Anderson	e74462a3f5	Moved declarations to start of functions to satisfy MSVC C89 implementation.	2015-02-11 11:16:57 -06:00
wernsaar	3300f5ebff	optimized multithreading lower limits	2014-09-15 11:38:25 +02:00
wernsaar	d286daa2ba	adjusted number of threads for small size	2014-07-15 14:41:35 +02:00
Timothy Gu	6c2ead30f0	Remove all trailing whitespace except lapack-netlib Signed-off-by: Timothy Gu <timothygu99@gmail.com>	2014-06-27 12:05:18 -07:00
wernsaar	a19d209005	Ref #103 : enhancement for small matrix dimensions	2014-06-18 15:04:11 +02:00
Lars Buitinck	3f7b0cd994	Merge pull request #290 from larsmans/missing-threshold: check if GEMM_MULTITHREAD_THRESHOLD defined in gemm.c. Set a fallback value.	2013-08-29 00:33:55 +08:00
Xianyi Zhang	31c836ac25	Ref #79 Added GEMM_MULTITHREAD_THRESHOLD flag to use single thread in gemm function with small matrices.	2012-03-23 01:17:41 +08:00
Xianyi Zhang	342bbc3871	Import GotoBLAS2 1.13 BSD version codes.	2011-01-24 14:54:24 +00:00

11 Commits
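
The size heuristic described in commit cdc668d82b can be sketched roughly as follows. This is only an illustration of the idea, not the code from the patch: the function name, the macro names, and the use of "less than or equal" as the comparison are all assumptions, and the commit message gives the thresholds only as approximate measured ranges. The message also elides further conditions ("ALPHA=1/BETA=0/..."), which are not reproduced here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; the values are the approximate M*N*K figures
 * quoted in the commit message, not constants taken from OpenBLAS. */
#define DIRECT_MNK_LIMIT_SINGLE   (28LL * 512 * 512)
#define DIRECT_MNK_LIMIT_THREADED ( 1LL * 512 * 512)

/* Returns true when a problem looks small enough to skip the copy
 * step, under the assumptions stated above. Any additional conditions
 * elided in the commit message are deliberately not modeled. */
static bool sgemm_direct_eligible(int64_t m, int64_t n, int64_t k,
                                  float alpha, float beta, bool threaded)
{
    if (m <= 0 || n <= 0 || k <= 0)
        return false;

    /* Only the plain C = A * B case is considered. */
    if (alpha != 1.0f || beta != 0.0f)
        return false;

    int64_t limit = threaded ? DIRECT_MNK_LIMIT_THREADED
                             : DIRECT_MNK_LIMIT_SINGLE;

    /* Compare m*n*k against the limit using division so the product
     * cannot overflow for very large dimensions. */
    if (m > limit / n)
        return false;
    return m * n <= limit / k;
}
```

With these assumed limits, a 28 x 512 x 512 problem qualifies in the non-threaded case but not in the threaded one, where the limit is 28 times smaller.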