Tune param.h for SkylakeX
param.h defines a per-platform SWITCH_RATIO, which is used as a measure for how fine grained the blocks for gemm need to be split up. Many platforms define this to 4. The reality is that the gemm low level implementation for SkylakeX likes bigger blocks due to the nature of SIMD... by tuning the SWITCH_RATIO to 32 the threading performance improves significantly: Before Matrix SGEMM cycles MPC DGEMM cycles MPC 48 x 48 10756.0 10.5 -0.5% 18296.7 6.1 -1.7% 64 x 64 20490.0 12.9 1.4% 40615.0 6.5 0.0% 65 x 65 83528.3 3.3 -210.9% 96319.0 2.9 -83.3% 80 x 80 101453.5 5.1 -166.3% 128021.7 4.0 -76.6% 96 x 96 149795.1 5.9 -143.1% 168059.4 5.3 -47.4% 112 x 112 191481.2 7.3 -105.8% 204165.0 6.9 -14.6% 128 x 128 265019.2 7.9 -99.0% 272006.4 7.7 -5.3% After Matrix SGEMM cycles MPC DGEMM cycles MPC 48 x 48 10666.3 10.6 0.4% 18236.9 6.2 -1.4% 64 x 64 20410.1 13.0 1.8% 39925.8 6.6 1.7% 65 x 65 34983.0 7.9 -30.2% 51494.6 5.4 2.0% 80 x 80 39769.1 13.0 -4.4% 63805.2 8.1 12.0% 96 x 96 45169.6 19.7 26.7% 80065.8 11.1 29.8% 112 x 112 57026.1 24.7 38.7% 99535.5 14.2 44.1% 128 x 128 64789.8 32.5 51.3% 117407.2 17.9 54.6% With this change, threading starts to be a win already at 96x96
This commit is contained in:
parent
d148ec4ea1
commit
5c6f008365