Files
OpenBLAS/kernel
Arjan van de Ven 1938819c25 skylake dgemm: Add a 16x8 kernel
The next step for the avx512 dgemm code is adding a 16x8 kernel.
In the 8x8 kernel, each FMA has a matching load (the broadcast);
in the 16x8 kernel we can reuse this load for 2 FMAs, which
in turn reduces pressure on the load ports of the CPU and gives
a nice performance boost (in the 25% range).
2018-10-05 13:11:35 +00:00
..
2018-09-09 16:52:25 +02:00
2018-06-03 13:22:59 +02:00
2018-10-05 13:11:35 +00:00