6 Commits

Author SHA1 Message Date
Martin Kroeker
2dfb804cb9 Replace vpermpd with vpermilpd in the Haswell DTRMM kernel
to improve performance on AMD Zen (#2180), applying wjc404's improvement of the DGEMM kernel from #2186
2019-07-28 23:17:28 +02:00
Martin Kroeker
2359c7c1a9 Use .p2align instead of .align for portability
The OSX assembler apparently mishandles a decimal argument to .align, leading to a significant loss of performance,
as observed in #730, #901 and most recently #1470
2018-02-24 17:50:13 +01:00
Andrew
4938faa822 core.IdenticalExpr clang501 checker
2018-01-19 23:15:58 +01:00
Martin Kroeker
3fea849bbf Remove unused variables from Haswell dtrmm and Bulldozer dtrsm
2017-11-14 23:35:10 +01:00
Zhang Xianyi
69363622a8 Fix DYNAMIC_ARCH=1 bug.
2015-10-27 05:10:40 +08:00
Werner Saar
e7c969e164 added optimized dtrmm_kernel for haswell
2015-06-13 16:16:29 +02:00