412 Commits

Author SHA1 Message Date
Martin Kroeker 497f0c3d8a
Replace .align with .p2align in the Nehalem microkernels 2018-02-26 20:58:33 +01:00
Martin Kroeker ea37db828e
Convert .align to .p2align for OSX compatibility 2018-02-26 20:48:03 +01:00
Martin Kroeker 7c1925acec
Use .p2align instead of .align for compatibility on Sandybridge as well 2018-02-24 19:43:15 +01:00
Martin Kroeker 2359c7c1a9
Use .p2align instead of .align for portability
The OSX assembler apparently mishandles the argument to decimal .align, leading to a significant loss of performance 
as observed in #730, #901 and most recently #1470
2018-02-24 17:50:13 +01:00
Martin Kroeker e388459a27
Merge pull request #1419 from brada4/develop
Initialize uninitialized values for repeated calls
2018-01-31 23:48:34 +01:00
Andrew 4938faa822 core.IdenticalExpr clang501 checker 2018-01-19 23:15:58 +01:00
Martin Kroeker 42285d8e70
Merge pull request #1410 from brada4/develop
Address warnings #1357
2018-01-06 20:02:46 +01:00
Andrew 4d0b005e5b Eliminate remaining unused results in kernels (clang5 analyzer) 2018-01-01 20:54:39 +01:00
Martin Kroeker b81656936f
Merge pull request #1409 from martin-frbg/issue1292-2
Tag %1 and %2 as both input and output operands
2017-12-31 20:18:48 +01:00
Martin Kroeker b973990df2
Tag %1 and %2 as both input and output operands
fix from #1292 extended to the other gemv microkernels
2017-12-31 18:03:36 +01:00
Martin Kroeker 1e31124eb0
Merge pull request #1406 from martin-frbg/issue1292
Tag %1 and %2 as both input and output
2017-12-30 14:52:03 +01:00
Martin Kroeker 723f396a20
Tag %1 and %2 as both input and output
The inline assembly modifies its input operands, so mark them as output to avoid surprises with optimization. Fixes #1292
2017-12-29 23:56:41 +01:00
Martin Kroeker 43c0622e7b
Retire Piledriver/Steamroller/Excavator daxpy microkernels as well
related to issue #1332
2017-12-13 18:40:39 +01:00
Martin Kroeker 0623636c98
Use Sandybridge daxpy kernel on Haswell and Zen for now
The testcase from #1332 exposes a problem in daxpy_microk_haswell-2.c that is not seen with
any of the other Intel x86_64 microkernels.
2017-12-10 19:24:31 +01:00
Andrew 281a2b952f warning cleanup (#1380)
* dead increments in driver/level2

* dead increments in kernel/generic

* part dead increments in kernel/x86_64
2017-12-05 19:54:10 +01:00
Martin Kroeker 6c77b5f267
Merge pull request #1369 from martin-frbg/dsdot
Add optimized dsdot to all other x86_64 kernels that use sdot.c
2017-11-28 18:15:31 +01:00
Martin Kroeker c92cd6d162
Add trivially optimized dsdot based on sdot 2017-11-24 20:05:27 +01:00
Martin Kroeker cae5d9a20b
Add trivially optimized dsdot based on sdot 2017-11-24 20:04:29 +01:00
Martin Kroeker 3d891c3106
Add trivially optimized dsdot based on sdot 2017-11-24 20:03:40 +01:00
Martin Kroeker 4fbdcfa823
Add trivially optimized dsdot based on sdot 2017-11-24 20:02:28 +01:00
Martin Kroeker 1bb6a96ebc
Add trivially optimized dsdot based on sdot 2017-11-24 20:01:42 +01:00
Martin Kroeker 6bd163f37a
Add trivially optimized dsdot based on sdot 2017-11-24 20:00:23 +01:00
Martin Kroeker f0333333d1
Add trivially optimized dsdot based on sdot 2017-11-24 19:59:28 +01:00
Andrew e89b979b2c fix spurious compiler warning fix (no code change) 2017-11-24 18:39:04 +01:00
Andrew 7e9b29b9b8 fix spurious compiler warning (no code change) 2017-11-24 18:36:37 +01:00
Martin Kroeker 6157d0902a
Merge pull request #1358 from martin-frbg/unused_vars
Clean up spurious unused variables in the kernels
2017-11-15 11:31:43 +01:00
Martin Kroeker 3fea849bbf
Remove unused variables from Haswell dtrmm and Bulldozer dtrsm 2017-11-14 23:35:10 +01:00
Martin Kroeker 8f177621bc
Remove unused variables at0...at3 from ?symv_U 2017-11-14 23:32:25 +01:00
Martin Kroeker 5f402b7759
Remove unused (loop?) variable j from the gemv_n_4 implementations 2017-11-14 23:29:42 +01:00
Martin Kroeker a07807caac Eliminate loop code when called as/from dsdot 2017-10-25 16:45:41 +02:00
Martin Kroeker 5e3e91d0fc Split the microkernel workload into chunks of 32 floats for dsdot mode to limit loss of precision 2017-10-22 18:18:51 +02:00
Martin Kroeker 28c3fa8950 Add dsdot 2017-10-16 23:29:03 +02:00
Martin Kroeker 8ac87c1cb6 Implement DSDOT with unchanged sdot microkernels 2017-10-16 23:27:51 +02:00
Isuru Fernando 505b218829 Merge remote-tracking branch 'upstream/develop' into dyn 2017-08-06 19:07:00 +05:30
Isuru Fernando 1d1854032b Add missing EXCAVATOR 2017-08-02 19:03:04 +05:30
Isuru Fernando 2c51a990ac Fix extra whitespaces. CMake parser macro fails with it
TODO: Fix the parser macro to strip trailing whitespaces
2017-08-02 18:26:57 +05:30
Isuru Fernando ca17b4b75c Fix complex support for MSVC headers 2017-07-28 11:50:29 +05:30
Denis Steckelmacher c9ff735da6 Add ZEN support (tested for auto-detected static backend) 2017-03-19 15:32:50 +01:00
Martin Kroeker a6efabf155 Replace gnu __real__, __imag__ extensions in initializers 2017-03-13 00:38:37 +01:00
Martin Kroeker dc34a0da96 Merge pull request #915 from mdong/small_fix_for_icc
remove input from clobbered list
2017-02-23 20:00:22 +01:00
Martin Kroeker 4998e19869 Change file comments to work around clang 3.9 assembler bug 2016-10-13 16:51:08 +02:00
Martin Kroeker 16446d1d23 Remove explicit include of complex.h 2016-09-29 23:45:56 +02:00
mdong 098d8ec5d6 remove input from clobbered list 2016-06-24 16:37:58 -04:00
Werner Saar 298b13bba4 updated some kernel files for EXCAVATOR 2016-04-25 10:36:23 +02:00
Zhang Xianyi f24d5307cf Refs #834. Fix zgemv config bug on Steamroller. 2016-04-12 22:26:11 +08:00
Zhang Xianyi d4380c1fe4 Refs xianyi/OpenBLAS-CI#10 , Fix sdot for scipy test_iterative.test_convergence test failure on AMD bulldozer and piledriver. 2016-04-07 01:44:18 +08:00
Werner Saar faa5e2e5e3 FIX: forgot to add the files cgemv_n_4.c and cgemv_t_4.c 2016-03-10 11:10:38 +01:00
Werner Saar fdf291be30 Added optimized cgemv_n and cgemv_t kernels for bulldozer, piledriver and steamroller 2016-03-10 09:42:07 +01:00
Werner Saar c99cc41cbd Added optimized zgemv_n kernel for bulldozer, piledriver and steamroller 2016-03-09 14:02:03 +01:00
Werner Saar acdff55a6a Bugfix for ztrmv 2016-03-07 09:39:34 +01:00
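
Note on the dsdot commits above (8ac87c1cb6, 5e3e91d0fc): the idea of reusing a single-precision microkernel for dsdot while splitting the workload into chunks of 32 floats can be sketched roughly as below. This is an illustration only, not the code from the repository. sdot_chunk is a hypothetical stand-in for a single-precision microkernel, and dsdot_chunked and CHUNK are names invented for this sketch. Each chunk is summed in float, and the per-chunk results are added to a double accumulator, so single-precision rounding error builds up over at most 32 terms at a time. Handling the remainder directly in double is an assumption of this sketch.

```c
#include <stddef.h>

/* Assumed chunk size, taken from the commit message ("chunks of 32 floats"). */
#define CHUNK 32

/* Hypothetical stand-in for a single-precision microkernel:
   multiplies and accumulates entirely in float. */
static float sdot_chunk(const float *x, const float *y, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += x[i] * y[i];
    return acc;
}

/* Sketch of a dsdot-style wrapper: float inputs, double result.
   Full chunks go through the float kernel; their partial sums are
   widened to double before being accumulated. */
static double dsdot_chunked(const float *x, const float *y, size_t n)
{
    double total = 0.0;
    size_t i = 0;

    /* Process whole chunks of CHUNK elements with the float kernel. */
    for (; i + CHUNK <= n; i += CHUNK)
        total += (double)sdot_chunk(x + i, y + i, CHUNK);

    /* Remaining elements (fewer than CHUNK) are handled in double. */
    for (; i < n; i++)
        total += (double)x[i] * (double)y[i];

    return total;
}
```

Calling dsdot_chunked(x, y, n) on two float arrays of length n returns their dot product as a double. A smaller CHUNK trades microkernel throughput for less single-precision accumulation per call.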