Commit Graph

1010 Commits

Author SHA1 Message Date
Martin Kroeker 9c5518319a
Revert "Fix 32bit HASWELL builds" 2018-04-22 20:20:04 +02:00
Martin Kroeker 2ca0faf495
Merge pull request #1515 from martin-frbg/mipsdot
Correct precision of mips dsdot
2018-04-11 08:21:25 +02:00
Martin Kroeker 0fe434598b
Fix precision of mips dsdot 2018-04-10 23:30:59 +02:00
Martin Kroeker c7b55b6082
Merge pull request #1499 from quickwritereader/develop
Implemented missing vsx simd  kernels for power8 blas1/2 double. z13 modifications
2018-03-27 21:43:23 +02:00
Martin Kroeker 840e01061f
Merge pull request #1491 from martin-frbg/ddot_mt
Add multithreading support for Haswell DDOT
2018-03-27 21:43:05 +02:00
QWR QWR 28ca97015d power8:Added initial zgemv_(t|n) ,i(d|z)amax,i(d|z)amin,dgemv_t(transposed),zrot
z13: improved zgemv_(t|n)_4,zscal,zaxpy
2018-03-27 14:54:41 +00:00
Martin Kroeker 6a6ffaff1e
Merge pull request #1494 from martin-frbg/x86_dsdot
Use generic/dot.c instead of the inferior arm/dot.c for x86 DSDOT
2018-03-17 15:26:47 +01:00
Martin Kroeker 28ac9ea5a6
Use generic/dot.c instead of the inferior arm/dot.c for x86 DSDOT
to resolve dsdot utest failure seen in #1492
2018-03-17 13:49:15 +01:00
Martin Kroeker a55694dd5b
Declare dot_compute static to avoid conflicts in multiarch builds 2018-03-16 22:23:36 +01:00
Martin Kroeker 85a41e9cdb
Add multithreading support for Haswell DDOT
copied from ashwinyes' implementation in dot_thunderx2t99.c
2018-03-16 16:58:47 +01:00
Martin Kroeker 81215711a2
Re-enable DAXPY microkernels for x86_64
as the inaccuracies seen in the original testcase for #1332 appear to be due to an artefact that amplifies the very small rounding differences between FMA and discrete multiply+add
2018-03-04 19:37:03 +01:00
Martin Kroeker 22167170b3
Merge pull request #1477 from quickwritereader/develop
Power8 blas3 copy-pack routines
2018-02-28 18:46:54 +01:00
Ashwin Sekhar T K fa9ca65c0e ARM64: Fix utest dsdot errors 2018-02-27 10:47:55 +00:00
Martin Kroeker 719b68f077
Merge pull request #1473 from martin-frbg/p2align
Replace .align with .p2aligns in dscal.c and the Nehalem microkernels as well
2018-02-27 08:28:20 +01:00
Martin Kroeker fe9f15f2d8
Merge pull request #1472 from martin-frbg/utest-fixes
Fix limited DSDOT precision on arm,aarch64 and zarch
2018-02-26 22:48:07 +01:00
Martin Kroeker 497f0c3d8a
Replace .align with .p2align in the Nehalem microkernels 2018-02-26 20:58:33 +01:00
Martin Kroeker ea37db828e
Convert .align to .p2align for OSX compatibility 2018-02-26 20:48:03 +01:00
Martin Kroeker 6e70287776
Use generic/dot.c for DSDOT on ARMV5 and above
The default arm/dot.c is less precise when used for DSDOT, as shown by utest
2018-02-25 19:57:23 +01:00
Martin Kroeker 58f236ad73
Use generic/dot.c for DSDOT on zarch 2018-02-25 19:52:14 +01:00
Martin Kroeker e207107150
Use generic/dot.c for DSDOT on z13
The implementation in arm/dot.c has lower precision, as shown by the utest for dsdot.
2018-02-25 19:51:25 +01:00
Martin Kroeker c9d408064a
Use dot.S also for DSDOT on CORTEXA57 2018-02-25 19:48:09 +01:00
Martin Kroeker 288d1a3f6e
Use dot.S also for DSDOT on ARMV8 2018-02-25 19:45:16 +01:00
Martin Kroeker 7c1925acec
Use .p2align instead of .align for compatibility on Sandybridge as well 2018-02-24 19:43:15 +01:00
Martin Kroeker 2359c7c1a9
Use .p2align instead of .align for portability
The OSX assembler apparently mishandles the argument to decimal .align, leading to a significant loss of performance 
as observed in #730, #901 and most recently #1470
2018-02-24 17:50:13 +01:00
Martin Kroeker e7366a4161
Restore the remaining utests (#1462)
* Restore the remaining utests

* Try fork test on Cygwin and Linux only, it hangs on at least ARMv8/Android as well

* Use generic sswap/dswap kernels for NEHALEM 32bit to fix fault found by the restored swap utest

* Disable zdotu test for MS cl to work around runtime error -1073741819 on AppVeyor for now
(probably coding error in the initialization of the complex numbers or wrong choice of zdotu API)
2018-02-20 10:07:17 +01:00
the mslm 2c0a008281 dgemm_ncopy_4_ save/restore 2018-02-18 01:30:17 +00:00
the mslm c5425daa6b power8 ?gemm_tcopy save/restore 2018-02-16 23:36:46 +00:00
Martin Kroeker b47e6822aa
Enable most assembly kernels in the generic ARMV8 target
ref #1439
2018-02-06 11:42:58 +01:00
Abdelrauf 60596a1abc
Merge branch 'develop' into develop 2018-01-31 16:17:04 -08:00
Abdelrauf afd514c25d small fix inside ifdef z13mvc . (z13mvc code is not used in production) 2018-01-31 18:30:59 -05:00
Martin Kroeker f45776ec1f
Merge pull request #1440 from quickwritereader/develop
small corrections
2018-01-31 23:48:47 +01:00
Martin Kroeker e388459a27
Merge pull request #1419 from brada4/develop
Initialize unitialized values for repeated calls
2018-01-31 23:48:34 +01:00
Abdelrauf f653e7a18d
small fix
small fix inside ifdef z13mvc . (z13mvc code is not used in production)
2018-01-31 07:49:38 -08:00
the mslm f946a89432 zscal (case: real alpha=0 ) mikrokernel shift&mem fix , da_i as input reg. small typo fixes 2018-01-26 19:25:27 -08:00
Martin Kroeker 485df77612
Make USE_TRMM depend on TARGET_CORE not TARGET
Fixes #1432 (and possibly other DTRMM-related failures on Haswell and related architectures when built with cmake)
2018-01-26 23:20:00 +01:00
Martin Kroeker e4c71a799a
Merge pull request #1426 from quickwritereader/develop
(Z13 ) Blas1 mikrokernels can be inlined by gcc. Refactoring,fixes,tunings
2018-01-20 17:34:54 +01:00
the mslm 2619ad7ea5 Blas1 mikrokernels can be inlined by gcc. Refactoring ( symbolic operan
names). Some fixes and tunings
2018-01-19 19:24:35 -08:00
Andrew e5cc3d72c0 core.IdenticalExpr clang501 checker 2018-01-19 23:17:43 +01:00
Andrew 4938faa822 core.IdenticalExpr clang501 checker 2018-01-19 23:15:58 +01:00
Andrew 9fa986337d add missing brackets to silence indentation warnings gcc721 2018-01-19 23:11:12 +01:00
Andrew 3eed97f6b9 Initialize values to silence cppcheck 2018-01-12 22:35:00 +01:00
Andrew 13e137fbc9 Initialize uninitialized variables (cppcheck) 2018-01-12 22:33:41 +01:00
Martin Kroeker 3d23f45107
Merge pull request #1415 from quickwritereader/develop
(Z systems Z13) small fixes, some (i(dz)amin,i(dz)amax,(dz)dot,(dz)asum) mikrokernels…
2018-01-11 08:35:02 +01:00
Abdelrauf 87669d1c0a small fixes, some (i(dz)amin,i(dz)amax,(dz)dot,(dz)asum) mikrokernels can be inlined 2018-01-10 20:36:53 -05:00
Martin Kroeker 42285d8e70
Merge pull request #1410 from brada4/develop
Address warnings #1357
2018-01-06 20:02:46 +01:00
Andrew d602b99386 LAPACK helpers in C that need care too 2018-01-02 14:38:50 +01:00
Andrew 4d0b005e5b Eliminate remaining unused results in kernels (clang5 analyzer) 2018-01-01 20:54:39 +01:00
Martin Kroeker b81656936f
Merge pull request #1409 from martin-frbg/issue1292-2
Tag %1 and %2 as both input and output operands
2017-12-31 20:18:48 +01:00
Martin Kroeker b973990df2
Tag %1 and %2 as both input and output operands
fix from #1292 extended to the other gemv microkernels
2017-12-31 18:03:36 +01:00
Martin Kroeker 1e31124eb0
Merge pull request #1406 from martin-frbg/issue1292
Tag %1 and %2 as both input and output
2017-12-30 14:52:03 +01:00
Martin Kroeker cc9500db41
Merge pull request #1403 from brada4/develop
Address few more warnings
2017-12-30 14:51:34 +01:00
Martin Kroeker 723f396a20
Tag %1 and %2 as both input and output
The inline assembly modifies its input operands, so mark them as output to avoid surprises with optimization. Fixes #1292
2017-12-29 23:56:41 +01:00
Andrew 03e5ff0687 initialize potentially unitialized variables (clang5) 2017-12-26 09:24:24 +01:00
Andrew 47deec2c1a fix couple of dead assignment warnings 2017-12-22 00:56:35 +01:00
Martin Kroeker 43c0622e7b
Retire Piledriver/Steamroller/Excavator daxpy microkernels as well
related to issue #1332
2017-12-13 18:40:39 +01:00
Martin Kroeker 0623636c98
Use Sandybridge daxpy kernel on Haswell and Zen for now
The testcase from #1332 exposes a problem in daxpy_microk_haswell-2.c that is not seen with
any of the other Intel x86_64 microkernels.
2017-12-10 19:24:31 +01:00
Andrew 281a2b952f warning cleanup (#1380)
* dead increments in driver/level2

* dead increments in kernel/generic

* part dead increments in kernel/x86_64
2017-12-05 19:54:10 +01:00
Martin Kroeker 8213385ab8
Work around compiler warnings for unused variables in the generic zgemm3m_Xcopy kernels 2017-12-02 22:51:58 +01:00
Martin Kroeker db00a51e6b
Merge pull request #1371 from martin-frbg/develop
Add trivially optimized DSDOT for POWER8
2017-11-29 19:55:21 +01:00
martin 7a4b3cfbf8 Add trivially optimized DSDOT for POWER8 2017-11-28 18:38:07 +01:00
Martin Kroeker 6c77b5f267
Merge pull request #1369 from martin-frbg/dsdot
Add optimized dsdot to all other x86_64 kernels that use sdot.c
2017-11-28 18:15:31 +01:00
Andrew 441a9c8385 more dead increments clang4 scan-build deadcode.deadstores 2017-11-26 17:24:08 +01:00
Andrew 1236dbe5a6 Eliminate 2-8 dead increments code 2017-11-26 13:26:11 +01:00
Martin Kroeker c92cd6d162
Add trivially optimized dsdot based on sdot 2017-11-24 20:05:27 +01:00
Martin Kroeker cae5d9a20b
Add trivially optimized dsdot based on sdot 2017-11-24 20:04:29 +01:00
Martin Kroeker 3d891c3106
Add trivially optimized dsdot based on sdot 2017-11-24 20:03:40 +01:00
Martin Kroeker 4fbdcfa823
Add trivially optimized dsdot based on sdot 2017-11-24 20:02:28 +01:00
Martin Kroeker 1bb6a96ebc
Add trivially optimized dsdot based on sdot 2017-11-24 20:01:42 +01:00
Martin Kroeker 6bd163f37a
Add trivially optimized dsdot based on sdot 2017-11-24 20:00:23 +01:00
Martin Kroeker f0333333d1
Add trivially optimized dsdot based on sdot 2017-11-24 19:59:28 +01:00
Andrew e89b979b2c fix spurious compiler warning fix (no code change) 2017-11-24 18:39:04 +01:00
Andrew 7e9b29b9b8 fix spurious compiler warning (no code change) 2017-11-24 18:36:37 +01:00
Martin Kroeker 6157d0902a
Merge pull request #1358 from martin-frbg/unused_vars
Clean up spurious unused variables in the kernels
2017-11-15 11:31:43 +01:00
Martin Kroeker 3fea849bbf
Remove unused variables from Haswell dtrmm and Bulldozer dtrsm 2017-11-14 23:35:10 +01:00
Martin Kroeker 8f177621bc
Remove unused variables at0...at3 from ?symv_U 2017-11-14 23:32:25 +01:00
Martin Kroeker 5f402b7759
Remove unused (loop?) variable j from the gemv_n_4 implementations 2017-11-14 23:29:42 +01:00
Martin Kroeker 65bf0a343c
Remove unused variable btpr 2017-11-14 23:25:50 +01:00
Martin Kroeker acf3d34bc5
Silence an unused variable warning with a cast
l2 cache size is not universally needed to assign default unrolling limits, but neither putting its declaration inside an ifdef nor cloning it into all ifdef sections that need it really makes sense here.
2017-11-14 23:23:44 +01:00
Martin Kroeker ab87ee6b48 Merge pull request #1329 from martin-frbg/dsdot
(Trivial) optimized dsdot implementation for HASWELL
2017-10-25 19:13:38 +02:00
Martin Kroeker a07807caac Eliminate loop code when called as/from dsdot 2017-10-25 16:45:41 +02:00
Ashwin Sekhar T K a0128aa489 ARM64: Convert all labels to local labels
While debugging/profiling applications using perf or other tools, the
kernels appear scattered in the profile reports. This is because the labels
within the kernels are not local and each label is shown as a separate
function.

To avoid this, all the labels within the kernels are changed to local
labels.
2017-10-24 11:40:05 +00:00
Martin Kroeker 0e2cf102e1 Fix 32bit HASWELL 2017-10-24 10:07:44 +02:00
Martin Kroeker 5e3e91d0fc Split the microkernel workload into chunks of 32 floats for dsdot mode to limit loss of precision 2017-10-22 18:18:51 +02:00
Martin Kroeker 28c3fa8950 Add dsdot 2017-10-16 23:29:03 +02:00
Martin Kroeker 8ac87c1cb6 Implement DSDOT with unchanged sdot microkernels 2017-10-16 23:27:51 +02:00
Martin Kroeker c7a8512d12 Cmake fixes for DYNAMIC_ARCH builds and whitespace in path names (#1323)
* prebuild.cmake: Put quotes around path names that may contain whitespace
(Copied from alexkaratakis' PR #1295)
* kernel/CMakeLists.txt: Fix common_lapack header inclusion and DYNAMIC_ARCH generation of ?neg_tcopy and ?laswp_ncopy files
* lapack/CMakeLists.txt: Use correct template for ?laswp_(plus,minus) functions
2017-10-09 23:34:18 +02:00
Martin Kroeker 97ecd4996a Merge pull request #1319 from martin-frbg/issue601
Fix out-of-bounds memory accesses exposed by xccblat3 testcase
2017-10-08 23:31:06 +02:00
Martin Kroeker 1eb43cccad Merge pull request #1317 from martin-frbg/power8-asm
Save and restore VSX registers
2017-10-08 23:30:46 +02:00
Martin Kroeker 9d92f526dd Comment out a code block that performs out-of-bounds memory accesses
...and does not appear to be needed even when it stays within the bounds of the array
2017-10-06 23:51:32 +02:00
Martin Kroeker 514d237257 Merge pull request #1279 from xsacha/develop
CMake improvements
2017-10-06 21:13:45 +02:00
Martin Kroeker f96afd94b0 Fix out-of-bounds accesses where the data should be zero anyway 2017-10-01 01:06:39 +02:00
Martin Kroeker 9c017a2218 Save and restore VSX registers 2017-09-28 12:17:09 +02:00
Shivraj Patil e3d844b062 Added mips I6500 core
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
2017-09-22 11:57:43 +05:30
Martin Kroeker 46c9357c72 Merge pull request #1288 from quickwritereader/develop
Optimized standard Blas Level-1,2 (excluding nrm2 functions) for z13 (double precision). Issue 884
2017-09-09 23:47:17 +02:00
Abdurrauf 1cfdb2295d Optimized standard Blas Level-1,2 (excluding nrm2 functions) for z13 (double precision) 2017-09-06 16:41:08 +04:00
Sacha Refshauge 47ebce4d1a Clean up, fix old typos. Simplify arch usages. Move system arch check to earlier position. 2017-08-21 00:37:29 +10:00
Sacha Refshauge 69b560751c Improvements to previous commit (cross-compile).
Fix typos and bad if statements discovered in 0.2.20.
2017-08-20 22:50:31 +10:00
Sacha Refshauge 11911fd941 Add kernel/Makefile.LA to CMake 2017-08-20 00:59:14 +10:00
Isuru Fernando d3b677fe87 Add commonobjs 2017-08-07 23:12:40 +05:30
Isuru Fernando 505b218829 Merge remote-tracking branch 'upstream/develop' into dyn 2017-08-06 19:07:00 +05:30