OpenBLAS/kernel/x86_64
Arjan van de Ven 1938819c25 skylake dgemm: Add a 16x8 kernel
The next step for the avx512 dgemm code is adding a 16x8 kernel.
In the 8x8 kernel, each FMA has a matching load (the broadcast);
in the 16x8 kernel we can reuse this load for 2 FMAs, which
in turn reduces pressure on the load ports of the CPU and gives
a nice performance boost (in the 25% range).
2018-10-05 13:11:35 +00:00
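
The load-reuse argument in the commit message can be illustrated with a scalar model. This is only a sketch under stated assumptions, not the kernel from the commit: the name `tile_kernel`, the `op_count` struct, the packed layouts of `a` and `b`, and the choice of B as the broadcast operand are all hypothetical. Each inner `for (i ...)` loop stands in for one 512-bit FMA on 8 doubles, and the counters record how many broadcast loads and FMAs one tile update issues.

```c
#include <assert.h>
#include <stddef.h>

#define VLEN  8   /* doubles per 512-bit vector */
#define NCOLS 8   /* columns of the C tile */

/* Hypothetical counters for the model; not part of OpenBLAS. */
typedef struct {
    long a_loads;     /* vector loads of A */
    long broadcasts;  /* broadcast loads of single B elements */
    long fmas;        /* vector FMAs issued */
} op_count;

/*
 * Scalar model of an (mvecs*VLEN) x NCOLS micro-kernel.
 * Assumed layouts (illustrative only):
 *   a: for each k, mvecs*VLEN contiguous doubles
 *   b: for each k, NCOLS contiguous doubles
 *   c: column-major tile, mvecs*VLEN rows by NCOLS columns
 * mvecs = 1 models an 8x8 tile, mvecs = 2 a 16x8 tile.
 */
static op_count tile_kernel(int mvecs, size_t k_len,
                            const double *a, const double *b, double *c)
{
    assert(mvecs >= 1);
    op_count n = {0, 0, 0};
    size_t rows = (size_t)mvecs * VLEN;

    for (size_t k = 0; k < k_len; k++) {
        const double *ak = a + k * rows;
        n.a_loads += mvecs;                    /* A vectors stay in registers */

        for (int j = 0; j < NCOLS; j++) {
            double bval = b[k * NCOLS + j];    /* one broadcast load */
            n.broadcasts++;

            /* The same broadcast value feeds every A vector. */
            for (int v = 0; v < mvecs; v++) {
                for (int i = 0; i < VLEN; i++)  /* models one vector FMA */
                    c[(size_t)j * rows + (size_t)v * VLEN + i] +=
                        ak[v * VLEN + i] * bval;
                n.fmas++;
            }
        }
    }
    return n;
}
```

In this model, `mvecs = 1` issues one broadcast per FMA, while `mvecs = 2` issues one broadcast per two FMAs, which is the reuse the commit message describes. The model only counts operations; it says nothing about the measured speedup.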