Martin Kroeker
2dfb804cb9
Replace vpermpd with vpermilpd in the Haswell DTRMM kernel
to improve performance on AMD Zen (#2180), applying wjc404's improvement of the DGEMM kernel from #2186
2019-07-28 23:17:28 +02:00
Martin Kroeker
2359c7c1a9
Use .p2align instead of .align for portability
The OSX assembler apparently mishandles the argument to decimal .align, leading to a significant loss of performance
as observed in #730, #901 and most recently #1470
2018-02-24 17:50:13 +01:00
Andrew
4938faa822
core.IdenticalExpr clang501 checker
2018-01-19 23:15:58 +01:00
Martin Kroeker
3fea849bbf
Remove unused variables from Haswell dtrmm and Bulldozer dtrsm
2017-11-14 23:35:10 +01:00
Zhang Xianyi
69363622a8
Fix DYNAMIC_ARCH=1 bug.
2015-10-27 05:10:40 +08:00
Werner Saar
e7c969e164
added optimized dtrmm_kernel for haswell
2015-06-13 16:16:29 +02:00