OpenBLAS/param.h at 667fed579d8d1adb9d20f7e6bebd2faf57a1a8bf

Arjan van de Ven 5b708e5eb1 sgemm/dgemm: add a way for an arch kernel to specify preferred sizes

The current gemm threading code can make very unfortunate choices. For
example, on my 10-core system a 1024x1024x1024 matrix multiply ends up
chunked into blocks of 102, which is not a vector-friendly size,
and performance ends up horrible.

This patch adds a helper define with which an architecture can specify
a preference for size multiples.
This differs from the existing defines, which specify minimum sizes and such.
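
To illustrate the idea, here is a minimal sketch of rounding each thread's
chunk up to a preferred multiple. It is not the code from this patch: the
names PREFERRED_MULTIPLE, round_up_chunk and partition are hypothetical, and
the value 16 is only an assumed example of a vector-friendly multiple.

```c
#include <assert.h>

/* Hypothetical stand-in for an arch-specific preference (assumed value). */
#define PREFERRED_MULTIPLE 16

/* Round width up to a multiple of `multiple`, never exceeding `remaining`.
 * If less than one full multiple is left, hand out whatever remains fits. */
static int round_up_chunk(int remaining, int width, int multiple)
{
    if (multiple <= 1 || remaining < multiple || width >= remaining)
        return width < remaining ? width : remaining;
    int rounded = ((width + multiple - 1) / multiple) * multiple;
    return rounded < remaining ? rounded : remaining;
}

/* Split n columns across at most nthreads chunks.
 * Chunk widths are written to out[]; the number of chunks is returned. */
static int partition(int n, int nthreads, int multiple, int *out)
{
    assert(nthreads > 0);
    int remaining = n;
    int used = 0;
    while (remaining > 0 && used < nthreads) {
        int threads_left = nthreads - used;
        /* Even split of the remaining work, rounded up. */
        int width = (remaining + threads_left - 1) / threads_left;
        width = round_up_chunk(remaining, width, multiple);
        out[used++] = width;
        remaining -= width;
    }
    return used;
}
```

Under these assumptions, partition(1024, 10, PREFERRED_MULTIPLE, out) yields
four chunks of 112 followed by six chunks of 96, so every thread gets a
multiple of 16 instead of a block of roughly 102.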

The performance increase with this patch for the 1024x1024x1024 sgemm
is 2.3x (!!)

2018-11-01 01:43:20 +00:00

69 KiB
