Add a "sgemm direct" mode for small matrixes

OpenBLAS has a fancy algorithm that copies the input data into a more
CPU-friendly memory layout.

This is great for large matrixes; the cost of the copy is easily
amortized by the gains from the better memory layout.

But for small matrixes (on CPUs that can do efficient unaligned loads) this
copy can be a net loss.

This patch adds (for SKYLAKEX initially) a "sgemm direct" mode that bypasses
the whole copy machinery for ALPHA=1/BETA=0/... standard arguments,
for small matrixes only.
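
To illustrate what skipping the copy means, here is a minimal sketch in plain C. It is not the SKYLAKEX kernel this commit adds: the name sgemm_direct_sketch is hypothetical, and column-major, non-transposed inputs are an assumption. It computes C = A * B (alpha = 1, beta = 0) by reading A and B in place rather than from a repacked buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration, not the kernel added by this commit.
 * Computes C = A * B (alpha = 1, beta = 0) for column-major,
 * non-transposed matrices. A and B are read where they live;
 * no packed copy is made first. */
void sgemm_direct_sketch(size_t m, size_t n, size_t k,
                         const float *a, size_t lda,
                         const float *b, size_t ldb,
                         float *c, size_t ldc)
{
    /* Leading dimensions must cover the column height. */
    assert(lda >= m && ldb >= k && ldc >= m);

    for (size_t j = 0; j < n; j++) {
        /* beta = 0: overwrite the output column, never read old C. */
        for (size_t i = 0; i < m; i++)
            c[i + j * ldc] = 0.0f;

        for (size_t p = 0; p < k; p++) {
            const float bpj = b[p + j * ldb];
            /* Unit-stride walk down one column of A and one of C. */
            for (size_t i = 0; i < m; i++)
                c[i + j * ldc] += a[i + p * lda] * bpj;
        }
    }
}
```

A real kernel would vectorize the inner loop with unaligned loads; the sketch only shows that no intermediate buffer is involved.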

What is small? For the non-threaded case this has been measured to be
in the M*N*K = 28 * 512 * 512 range, while in the threaded case it's
less, around M*N*K = 1 * 512 * 512.
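
The size check described above could look roughly like the following sketch. The function name, the macro names, and the use of the quoted figures as hard cutoffs are all assumptions made for illustration; the message only gives approximate ranges, and the remaining "standard arguments" conditions it elides are not modelled here.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative cutoffs taken from the figures quoted in the message;
 * the limits the patch really uses may differ. */
#define DIRECT_LIMIT_SINGLE   (28.0 * 512.0 * 512.0)
#define DIRECT_LIMIT_THREADED ( 1.0 * 512.0 * 512.0)

/* Hypothetical predicate: should this call take the direct path? */
bool use_sgemm_direct_sketch(long m, long n, long k,
                             float alpha, float beta, bool threaded)
{
    assert(m >= 0 && n >= 0 && k >= 0);

    /* Only the plain C = A * B case qualifies. */
    if (alpha != 1.0f || beta != 0.0f)
        return false;

    /* Multiply in double so huge dimensions cannot overflow an integer. */
    double volume = (double)m * (double)n * (double)k;
    double limit = threaded ? DIRECT_LIMIT_THREADED : DIRECT_LIMIT_SINGLE;
    return volume <= limit;
}
```

With these assumed cutoffs, a 28 x 512 x 512 problem takes the direct path in a non-threaded build but falls back to the copying path in a threaded one.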

Arjan van de Ven

2018-12-12 16:45:57 +00:00

parent 87718807f0

commit cdc668d82b

4 changed files with 483 additions and 1 deletions

									

param.h
									
												
				@@ -1628,6 +1628,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

				#define SWITCH_RATIO	32

				#define GEMM_PREFERED_SIZE	32

				#define USE_SGEMM_KERNEL_DIRECT 1

				#ifdef ARCH_X86
