6 Commits

Author SHA1 Message Date
Martin Kroeker 2dfb804cb9
Replace vpermpd with vpermilpd in the Haswell DTRMM kernel
to improve performance on AMD Zen (#2180), applying wjc404's improvement of the DGEMM kernel from #2186
2019-07-28 23:17:28 +02:00
Martin Kroeker 2359c7c1a9
Use .p2align instead of .align for portability
The OSX assembler apparently mishandles the argument to decimal .align, leading to a significant loss of performance
as observed in #730, #901 and most recently #1470
2018-02-24 17:50:13 +01:00
Andrew 4938faa822 core.IdenticalExpr clang501 checker 2018-01-19 23:15:58 +01:00
Martin Kroeker 3fea849bbf Remove unused variables from Haswell dtrmm and Bulldozer dtrsm 2017-11-14 23:35:10 +01:00
Zhang Xianyi 69363622a8 Fix DYNAMIC_ARCH=1 bug. 2015-10-27 05:10:40 +08:00
Werner Saar e7c969e164 added optimized dtrmm_kernel for haswell 2015-06-13 16:16:29 +02:00