Martin Kroeker
95f7f0229c
Remove extraneous brace from previous commit
2018-05-17 18:43:59 +02:00
Martin Kroeker
5082fe4306
Merge pull request #1564 from martin-frbg/issue1563
...
Revert changes from PR#1419
2018-05-17 14:04:13 +02:00
Martin Kroeker
7a7619af6d
Revert changes from PR#1419
...
At least one of these changes is apparently an oversimplification, leading to TRMM breakage on some platforms as observed in #1563
2018-05-17 11:40:08 +02:00
Martin Kroeker
893b535540
Use correct data type for initializers of v2f64, v4f32
...
Fixes #1561
2018-05-15 14:42:12 +02:00
Martin Kroeker
018f2dad27
Switch mips32 target to USE_TRMM to fix complex TRMM
2018-05-02 20:25:32 +02:00
Martin Kroeker
9d5098dbc9
Add MIPS 1004K target (Mediatek MT7621 SOC)
2018-05-02 20:20:44 +02:00
Martin Kroeker
954f1832de
Merge pull request #1540 from martin-frbg/mips32-zasum
...
Fix typo in MIPS P5600 complex ASUM code selection
2018-04-25 23:23:00 +02:00
Martin Kroeker
941ad280a8
Fix typo in MIPS P5600 complex ASUM code selection
2018-04-25 22:50:10 +02:00
Martin Kroeker
1da365312a
Merge pull request #1538 from martin-frbg/arm7utest
...
Fix handling of zero INCX, INCY in ArmV7 AXPY and ROT
2018-04-25 08:38:58 +02:00
Martin Kroeker
2d0929fa7c
Move the test for zero incx,incy in ARMV7 ROT
...
to pass the related utest (see #1469)
2018-04-24 22:43:00 +02:00
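A minimal sketch of what a zero-stride ROT is expected to compute, modelled on the Netlib reference DROT loop (our assumption about what the utest checks; `drot_ref` is a hypothetical name, not an OpenBLAS symbol). With `incx == 0` and `incy == 0` the same pair of elements is simply rotated `n` times, so a kernel that bails out early on zero increments would diverge from it.

```c
/* Reference-style plane rotation (hypothetical drot_ref).
   Negative strides start at the far end, as in Netlib DROT;
   a zero stride keeps reusing the same element. */
static void drot_ref(long n, double *x, long incx, double *y, long incy,
                     double c, double s) {
    if (n <= 0) return;
    long ix = incx < 0 ? (1 - n) * incx : 0;
    long iy = incy < 0 ? (1 - n) * incy : 0;
    for (long i = 0; i < n; i++) {
        double t = c * x[ix] + s * y[iy];
        y[iy] = c * y[iy] - s * x[ix];
        x[ix] = t;
        ix += incx;
        iy += incy;
    }
}
```

With `c = 0, s = 1` each step maps `(x, y)` to `(y, -x)`, so four applications with zero strides restore the original pair.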
Martin Kroeker
125343cc88
Drop test for zero incx,incy in armv7 AXPY
...
...to pass the related utest (see #1469)
2018-04-24 22:39:50 +02:00
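For AXPY the same reference-style reading gives two distinct cases: `incx == 0` adds `alpha * x[0]` to every selected `y` element, while `incy == 0` accumulates all `n` updates into `y[0]`. The sketch below uses a hypothetical `daxpy_ref`, modelled on the Netlib loop including its early return for `n <= 0` or `alpha == 0`; it is an illustration of the expected results, not the ARMv7 kernel.

```c
/* Reference-style y := alpha*x + y with arbitrary (including zero) strides. */
static void daxpy_ref(long n, double alpha, const double *x, long incx,
                      double *y, long incy) {
    if (n <= 0 || alpha == 0.0) return;
    long ix = incx < 0 ? (1 - n) * incx : 0;  /* Netlib start for negative stride */
    long iy = incy < 0 ? (1 - n) * incy : 0;
    for (long i = 0; i < n; i++) {
        y[iy] += alpha * x[ix];  /* zero stride: same element every iteration */
        ix += incx;
        iy += incy;
    }
}
```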
Martin Kroeker
8a3b6fa108
Use generic zrot.c on ppc64/POWER6 to work around utest failure from … ( #1535 )
...
* Use generic C implementation of zrot on ppc64/POWER6 to work around utest failure from #1469
2018-04-23 19:05:49 +02:00
Martin Kroeker
9c5518319a
Revert "Fix 32bit HASWELL builds"
2018-04-22 20:20:04 +02:00
Martin Kroeker
2ca0faf495
Merge pull request #1515 from martin-frbg/mipsdot
...
Correct precision of mips dsdot
2018-04-11 08:21:25 +02:00
Martin Kroeker
0fe434598b
Fix precision of mips dsdot
2018-04-10 23:30:59 +02:00
Martin Kroeker
c7b55b6082
Merge pull request #1499 from quickwritereader/develop
...
Implemented missing vsx simd kernels for power8 blas1/2 double. z13 modifications
2018-03-27 21:43:23 +02:00
Martin Kroeker
840e01061f
Merge pull request #1491 from martin-frbg/ddot_mt
...
Add multithreading support for Haswell DDOT
2018-03-27 21:43:05 +02:00
QWR QWR
28ca97015d
power8: Added initial zgemv_(t|n), i(d|z)amax, i(d|z)amin, dgemv_t (transposed), zrot
...
z13: improved zgemv_(t|n)_4, zscal, zaxpy
2018-03-27 14:54:41 +00:00
Martin Kroeker
6a6ffaff1e
Merge pull request #1494 from martin-frbg/x86_dsdot
...
Use generic/dot.c instead of the inferior arm/dot.c for x86 DSDOT
2018-03-17 15:26:47 +01:00
Martin Kroeker
28ac9ea5a6
Use generic/dot.c instead of the inferior arm/dot.c for x86 DSDOT
...
to resolve dsdot utest failure seen in #1492
2018-03-17 13:49:15 +01:00
Martin Kroeker
a55694dd5b
Declare dot_compute static to avoid conflicts in multiarch builds
2018-03-16 22:23:36 +01:00
Martin Kroeker
85a41e9cdb
Add multithreading support for Haswell DDOT
...
copied from ashwinyes' implementation in dot_thunderx2t99.c
2018-03-16 16:58:47 +01:00
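The split-and-reduce structure behind a multithreaded DDOT can be sketched with plain POSIX threads. This is an illustration only: it assumes pthreads, the `dot_job` and `ddot_threaded` names are hypothetical, and OpenBLAS presumably dispatches level-1 work through its own thread server rather than spawning threads per call. The per-slice helper is declared `static`, which is the point of the `dot_compute` commit listed above: internal linkage keeps a generic helper name from clashing when several per-architecture kernels are linked into one multiarch (DYNAMIC_ARCH) library.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 8

typedef struct {
    const double *x;
    const double *y;
    size_t n;
    double partial;  /* written by the worker, read after join */
} dot_job;

/* static: private to this translation unit, so no symbol clash across kernels */
static void *dot_compute(void *arg) {
    dot_job *job = (dot_job *)arg;
    double s = 0.0;
    for (size_t i = 0; i < job->n; i++)
        s += job->x[i] * job->y[i];
    job->partial = s;
    return NULL;
}

static double ddot_threaded(const double *x, const double *y, size_t n, int nthreads) {
    pthread_t tid[MAX_THREADS];
    int started[MAX_THREADS] = {0};
    dot_job jobs[MAX_THREADS];
    if (nthreads < 1) nthreads = 1;
    if (nthreads > MAX_THREADS) nthreads = MAX_THREADS;
    size_t chunk = n / (size_t)nthreads, start = 0;
    for (int t = 0; t < nthreads; t++) {
        /* the last thread also takes the remainder */
        size_t len = (t == nthreads - 1) ? n - start : chunk;
        jobs[t].x = x + start;
        jobs[t].y = y + start;
        jobs[t].n = len;
        jobs[t].partial = 0.0;
        start += len;
        if (pthread_create(&tid[t], NULL, dot_compute, &jobs[t]) == 0)
            started[t] = 1;
        else
            dot_compute(&jobs[t]);  /* could not spawn: compute this slice inline */
    }
    double sum = 0.0;
    for (int t = 0; t < nthreads; t++) {
        if (started[t]) pthread_join(tid[t], NULL);
        sum += jobs[t].partial;  /* fixed reduction order */
    }
    return sum;
}
```

Reducing the partial sums in thread order makes the result deterministic for a given thread count, although it may differ in the last bits from a single-threaded summation order.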
Martin Kroeker
81215711a2
Re-enable DAXPY microkernels for x86_64
...
as the inaccuracies seen in the original testcase for #1332 appear to be due to an artefact that amplifies the very small rounding differences between FMA and discrete multiply+add
2018-03-04 19:37:03 +01:00
Martin Kroeker
22167170b3
Merge pull request #1477 from quickwritereader/develop
...
Power8 blas3 copy-pack routines
2018-02-28 18:46:54 +01:00
Ashwin Sekhar T K
fa9ca65c0e
ARM64: Fix utest dsdot errors
2018-02-27 10:47:55 +00:00
Martin Kroeker
719b68f077
Merge pull request #1473 from martin-frbg/p2align
...
Replace .align with .p2align in dscal.c and the Nehalem microkernels as well
2018-02-27 08:28:20 +01:00
Martin Kroeker
fe9f15f2d8
Merge pull request #1472 from martin-frbg/utest-fixes
...
Fix limited DSDOT precision on arm,aarch64 and zarch
2018-02-26 22:48:07 +01:00
Martin Kroeker
497f0c3d8a
Replace .align with .p2align in the Nehalem microkernels
2018-02-26 20:58:33 +01:00
Martin Kroeker
ea37db828e
Convert .align to .p2align for OSX compatibility
2018-02-26 20:48:03 +01:00
Martin Kroeker
6e70287776
Use generic/dot.c for DSDOT on ARMV5 and above
...
The default arm/dot.c is less precise when used for DSDOT, as shown by the utest
2018-02-25 19:57:23 +01:00
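Why the choice of dot kernel matters here: BLAS DSDOT takes single-precision vectors but accumulates in double precision. A kernel that accumulates in `float`, as an `sdot`-style loop does, drops low-order contributions once the running sum is large. A minimal sketch with hypothetical function names:

```c
#include <stddef.h>

/* sdot-style: products and running sum both in float */
static float sdot_float_acc(size_t n, const float *x, const float *y) {
    float s = 0.0f;
    for (size_t i = 0; i < n; i++) s += x[i] * y[i];
    return s;
}

/* DSDOT semantics: float inputs, products and sum in double
   (a product of two floats is exact in double) */
static double dsdot_double_acc(size_t n, const float *x, const float *y) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++) s += (double)x[i] * (double)y[i];
    return s;
}
```

At 2^24 the spacing between adjacent floats is 2, so adding 1.0f rounds back to 2^24 under round-to-nearest-even; the double accumulator keeps both units.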
Martin Kroeker
58f236ad73
Use generic/dot.c for DSDOT on zarch
2018-02-25 19:52:14 +01:00
Martin Kroeker
e207107150
Use generic/dot.c for DSDOT on z13
...
The implementation in arm/dot.c has lower precision, as shown by the utest for dsdot.
2018-02-25 19:51:25 +01:00
Martin Kroeker
c9d408064a
Use dot.S also for DSDOT on CORTEXA57
2018-02-25 19:48:09 +01:00
Martin Kroeker
288d1a3f6e
Use dot.S also for DSDOT on ARMV8
2018-02-25 19:45:16 +01:00
Martin Kroeker
7c1925acec
Use .p2align instead of .align for compatibility on Sandybridge as well
2018-02-24 19:43:15 +01:00
Martin Kroeker
2359c7c1a9
Use .p2align instead of .align for portability
...
The OSX assembler apparently mishandles the decimal argument to .align, leading to a significant loss of performance
as observed in #730, #901 and most recently #1470
2018-02-24 17:50:13 +01:00
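As we read the GNU as and Apple assembler documentation, `.p2align n` always requests a 2^n-byte boundary, whereas the meaning of `.align n` varies by target: a byte count on ELF x86, but a power-of-two exponent for the Mach-O assembler on macOS. Under that reading a `.align 16` intended as 16 bytes becomes a 64 KiB boundary, which is one plausible source of the slowdown described above. The small model below (hypothetical helper names, not an assembler API) only computes the padding each interpretation implies:

```c
#include <stdint.h>

/* Padding bytes needed to move offset up to a multiple of align (a power of two). */
static uint64_t pad_to(uint64_t offset, uint64_t align) {
    return (align - (offset & (align - 1))) & (align - 1);
}

/* .p2align n: the argument is always an exponent. */
static uint64_t p2align_boundary(unsigned n) { return (uint64_t)1 << n; }

/* .align n: assumed model, byte count on ELF x86, exponent on Mach-O. */
static uint64_t align_boundary(unsigned n, int exponent_semantics) {
    return exponent_semantics ? (uint64_t)1 << n : (uint64_t)n;
}
```

Writing `.p2align 4` instead of `.align 16` removes the ambiguity, since both readings of `.p2align` agree.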
Martin Kroeker
e7366a4161
Restore the remaining utests ( #1462 )
...
* Restore the remaining utests
* Run the fork test on Cygwin and Linux only; it hangs on at least ARMv8/Android as well
* Use generic sswap/dswap kernels for NEHALEM 32bit to fix a fault found by the restored swap utest
* Disable zdotu test for MS cl to work around runtime error -1073741819 on AppVeyor for now
(probably a coding error in the initialization of the complex numbers or a wrong choice of zdotu API)
2018-02-20 10:07:17 +01:00
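For reference when debugging such a test: ZDOTU computes the unconjugated sum of x_i * y_i, while ZDOTC conjugates x first. How a complex value is returned from a Fortran-style `zdotu` differs between compilers and ABIs, which is one reason CBLAS also offers `cblas_zdotu_sub` with an output pointer. A C99 sketch of the two definitions (hypothetical names, unit stride only; it assumes GCC or Clang, since MSVC does not support the C99 `double complex` type):

```c
#include <complex.h>

/* Unconjugated dot product: sum of x[i] * y[i]. */
static double complex zdotu_ref(int n, const double complex *x, const double complex *y) {
    double complex s = 0.0;
    for (int i = 0; i < n; i++) s += x[i] * y[i];
    return s;
}

/* Conjugated dot product: sum of conj(x[i]) * y[i]. */
static double complex zdotc_ref(int n, const double complex *x, const double complex *y) {
    double complex s = 0.0;
    for (int i = 0; i < n; i++) s += conj(x[i]) * y[i];
    return s;
}
```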
the mslm
2c0a008281
dgemm_ncopy_4_ save/restore
2018-02-18 01:30:17 +00:00
the mslm
c5425daa6b
power8 ?gemm_tcopy save/restore
2018-02-16 23:36:46 +00:00
Martin Kroeker
b47e6822aa
Enable most assembly kernels in the generic ARMV8 target
...
ref #1439
2018-02-06 11:42:58 +01:00
Abdelrauf
60596a1abc
Merge branch 'develop' into develop
2018-01-31 16:17:04 -08:00
Abdelrauf
afd514c25d
small fix inside ifdef z13mvc (z13mvc code is not used in production)
2018-01-31 18:30:59 -05:00
Martin Kroeker
f45776ec1f
Merge pull request #1440 from quickwritereader/develop
...
small corrections
2018-01-31 23:48:47 +01:00
Martin Kroeker
e388459a27
Merge pull request #1419 from brada4/develop
...
Initialize uninitialized values for repeated calls
2018-01-31 23:48:34 +01:00
Abdelrauf
f653e7a18d
small fix
...
small fix inside ifdef z13mvc (z13mvc code is not used in production)
2018-01-31 07:49:38 -08:00
the mslm
f946a89432
zscal (case: real alpha=0) microkernel shift & mem fix, da_i as input reg. Small typo fixes
2018-01-26 19:25:27 -08:00
Martin Kroeker
485df77612
Make USE_TRMM depend on TARGET_CORE not TARGET
...
Fixes #1432 (and possibly other DTRMM-related failures on Haswell and related architectures when built with cmake)
2018-01-26 23:20:00 +01:00
Martin Kroeker
e4c71a799a
Merge pull request #1426 from quickwritereader/develop
...
(Z13) BLAS1 microkernels can be inlined by gcc. Refactoring, fixes, tunings
2018-01-20 17:34:54 +01:00
the mslm
2619ad7ea5
BLAS1 microkernels can be inlined by gcc. Refactoring (symbolic operand names). Some fixes and tunings
2018-01-19 19:24:35 -08:00
Andrew
e5cc3d72c0
core.IdenticalExpr clang501 checker
2018-01-19 23:17:43 +01:00
Andrew
4938faa822
core.IdenticalExpr clang501 checker
2018-01-19 23:15:58 +01:00
Andrew
9fa986337d
add missing brackets to silence indentation warnings gcc721
2018-01-19 23:11:12 +01:00
Andrew
3eed97f6b9
Initialize values to silence cppcheck
2018-01-12 22:35:00 +01:00
Andrew
13e137fbc9
Initialize uninitialized variables (cppcheck)
2018-01-12 22:33:41 +01:00
Martin Kroeker
3d23f45107
Merge pull request #1415 from quickwritereader/develop
...
(Z systems Z13) small fixes, some (i(dz)amin, i(dz)amax, (dz)dot, (dz)asum) microkernels…
2018-01-11 08:35:02 +01:00
Abdelrauf
87669d1c0a
small fixes, some (i(dz)amin, i(dz)amax, (dz)dot, (dz)asum) microkernels can be inlined
2018-01-10 20:36:53 -05:00
Martin Kroeker
42285d8e70
Merge pull request #1410 from brada4/develop
...
Address warnings #1357
2018-01-06 20:02:46 +01:00
Andrew
d602b99386
LAPACK helpers in C that need care too
2018-01-02 14:38:50 +01:00
Andrew
4d0b005e5b
Eliminate remaining unused results in kernels (clang5 analyzer)
2018-01-01 20:54:39 +01:00
Martin Kroeker
b81656936f
Merge pull request #1409 from martin-frbg/issue1292-2
...
Tag %1 and %2 as both input and output operands
2017-12-31 20:18:48 +01:00
Martin Kroeker
b973990df2
Tag %1 and %2 as both input and output operands
...
fix from #1292 extended to the other gemv microkernels
2017-12-31 18:03:36 +01:00
Martin Kroeker
1e31124eb0
Merge pull request #1406 from martin-frbg/issue1292
...
Tag %1 and %2 as both input and output
2017-12-30 14:52:03 +01:00
Martin Kroeker
cc9500db41
Merge pull request #1403 from brada4/develop
...
Address few more warnings
2017-12-30 14:51:34 +01:00
Martin Kroeker
723f396a20
Tag %1 and %2 as both input and output
...
The inline assembly modifies its input operands, so mark them as output to avoid surprises with optimization. Fixes #1292
2017-12-29 23:56:41 +01:00
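A minimal sketch of the constraint fix, assuming x86_64 with GCC or Clang extended asm (the loop and the `sum_asm` name are hypothetical, not the gemv microkernel). The asm advances the pointer and decrements the counter it was handed, so both are declared with the `+r` (read-write) constraint; declaring them as plain `r` inputs would let the optimizer assume those registers still hold the original values afterwards. Other targets fall back to a plain C loop.

```c
/* Sums n doubles; on x86_64 the loop runs in inline asm that modifies
   its pointer and counter operands, hence "+r" rather than "r". */
static double sum_asm(const double *x, long n) {
    double acc = 0.0;
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    if (n > 0) {
        __asm__ volatile(
            "1:\n\t"
            "addsd (%1), %0\n\t"  /* acc += *x            */
            "add $8, %1\n\t"      /* x++ (modifies %1)    */
            "dec %2\n\t"          /* n-- (modifies %2)    */
            "jnz 1b\n\t"
            : "+x"(acc), "+r"(x), "+r"(n)  /* all three are read and written */
            :
            : "cc", "memory");
    }
#else
    for (long i = 0; i < n; i++) acc += x[i];
#endif
    return acc;
}
```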
Andrew
03e5ff0687
initialize potentially uninitialized variables (clang5)
2017-12-26 09:24:24 +01:00
Andrew
47deec2c1a
fix a couple of dead assignment warnings
2017-12-22 00:56:35 +01:00
Martin Kroeker
43c0622e7b
Retire Piledriver/Steamroller/Excavator daxpy microkernels as well
...
related to issue #1332
2017-12-13 18:40:39 +01:00
Martin Kroeker
0623636c98
Use Sandybridge daxpy kernel on Haswell and Zen for now
...
The testcase from #1332 exposes a problem in daxpy_microk_haswell-2.c that is not seen with
any of the other Intel x86_64 microkernels.
2017-12-10 19:24:31 +01:00
Andrew
281a2b952f
warning cleanup ( #1380 )
...
* dead increments in driver/level2
* dead increments in kernel/generic
* some of the dead increments in kernel/x86_64
2017-12-05 19:54:10 +01:00
Martin Kroeker
8213385ab8
Work around compiler warnings for unused variables in the generic zgemm3m_Xcopy kernels
2017-12-02 22:51:58 +01:00
Martin Kroeker
db00a51e6b
Merge pull request #1371 from martin-frbg/develop
...
Add trivially optimized DSDOT for POWER8
2017-11-29 19:55:21 +01:00
martin
7a4b3cfbf8
Add trivially optimized DSDOT for POWER8
2017-11-28 18:38:07 +01:00
Martin Kroeker
6c77b5f267
Merge pull request #1369 from martin-frbg/dsdot
...
Add optimized dsdot to all other x86_64 kernels that use sdot.c
2017-11-28 18:15:31 +01:00
Andrew
441a9c8385
more dead increments clang4 scan-build deadcode.deadstores
2017-11-26 17:24:08 +01:00
Andrew
1236dbe5a6
Eliminate 2-8 dead increments code
2017-11-26 13:26:11 +01:00
Martin Kroeker
c92cd6d162
Add trivially optimized dsdot based on sdot
2017-11-24 20:05:27 +01:00
Martin Kroeker
cae5d9a20b
Add trivially optimized dsdot based on sdot
2017-11-24 20:04:29 +01:00
Martin Kroeker
3d891c3106
Add trivially optimized dsdot based on sdot
2017-11-24 20:03:40 +01:00
Martin Kroeker
4fbdcfa823
Add trivially optimized dsdot based on sdot
2017-11-24 20:02:28 +01:00
Martin Kroeker
1bb6a96ebc
Add trivially optimized dsdot based on sdot
2017-11-24 20:01:42 +01:00
Martin Kroeker
6bd163f37a
Add trivially optimized dsdot based on sdot
2017-11-24 20:00:23 +01:00
Martin Kroeker
f0333333d1
Add trivially optimized dsdot based on sdot
2017-11-24 19:59:28 +01:00
Andrew
e89b979b2c
fix spurious compiler warning fix (no code change)
2017-11-24 18:39:04 +01:00
Andrew
7e9b29b9b8
fix spurious compiler warning (no code change)
2017-11-24 18:36:37 +01:00
Martin Kroeker
6157d0902a
Merge pull request #1358 from martin-frbg/unused_vars
...
Clean up spurious unused variables in the kernels
2017-11-15 11:31:43 +01:00
Martin Kroeker
3fea849bbf
Remove unused variables from Haswell dtrmm and Bulldozer dtrsm
2017-11-14 23:35:10 +01:00
Martin Kroeker
8f177621bc
Remove unused variables at0...at3 from ?symv_U
2017-11-14 23:32:25 +01:00
Martin Kroeker
5f402b7759
Remove unused (loop?) variable j from the gemv_n_4 implementations
2017-11-14 23:29:42 +01:00
Martin Kroeker
65bf0a343c
Remove unused variable btpr
2017-11-14 23:25:50 +01:00
Martin Kroeker
acf3d34bc5
Silence an unused variable warning with a cast
...
The L2 cache size is not universally needed to assign default unrolling limits, but neither putting its declaration inside an ifdef nor cloning it into all ifdef sections that need it really makes sense here.
2017-11-14 23:23:44 +01:00
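The cast idiom described above, as a minimal sketch (hypothetical names; `USE_L2_HEURISTIC` stands in for whichever configuration macros actually read the value):

```c
/* Hypothetical stand-in for a cache-size query; not an OpenBLAS function. */
static int get_l2_size_kb(void) { return 256; }

/* Returns a default unroll factor. Only some configurations consult the
   L2 size, but the variable is declared once for all of them. */
static int default_unroll(void) {
    int l2size = get_l2_size_kb();
    (void)l2size; /* marks the variable as used, silencing the warning when no branch reads it */
#if defined(USE_L2_HEURISTIC)
    return l2size >= 512 ? 8 : 4;
#else
    return 4;
#endif
}
```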
Martin Kroeker
ab87ee6b48
Merge pull request #1329 from martin-frbg/dsdot
...
(Trivial) optimized dsdot implementation for HASWELL
2017-10-25 19:13:38 +02:00
Martin Kroeker
a07807caac
Eliminate loop code when called as/from dsdot
2017-10-25 16:45:41 +02:00
Ashwin Sekhar T K
a0128aa489
ARM64: Convert all labels to local labels
...
While debugging/profiling applications using perf or other tools, the
kernels appear scattered in the profile reports. This is because the labels
within the kernels are not local and each label is shown as a separate
function.
To avoid this, all the labels within the kernels are changed to local
labels.
2017-10-24 11:40:05 +00:00
Martin Kroeker
0e2cf102e1
Fix 32bit HASWELL
2017-10-24 10:07:44 +02:00
Martin Kroeker
5e3e91d0fc
Split the microkernel workload into chunks of 32 floats for dsdot mode to limit loss of precision
2017-10-22 18:18:51 +02:00
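A sketch of the chunking idea, assuming an unchanged `float`-accumulating microkernel (stood in for here by the hypothetical `sdot_kernel`): each chunk of 32 products is summed in single precision and the partial result is added to a `double` total. Rounding inside a chunk still happens, so this limits rather than removes the precision loss.

```c
#include <stddef.h>

#define DSDOT_CHUNK 32

/* Hypothetical stand-in for an sdot microkernel: float accumulation. */
static float sdot_kernel(size_t n, const float *x, const float *y) {
    float s = 0.0f;
    for (size_t i = 0; i < n; i++) s += x[i] * y[i];
    return s;
}

/* dsdot mode: run the float kernel on chunks of at most 32 elements
   and accumulate the chunk results in double. */
static double dsdot_chunked(size_t n, const float *x, const float *y) {
    double total = 0.0;
    for (size_t i = 0; i < n; i += DSDOT_CHUNK) {
        size_t len = (n - i < DSDOT_CHUNK) ? n - i : DSDOT_CHUNK;
        total += (double)sdot_kernel(len, x + i, y + i);
    }
    return total;
}
```

With a large product in the first chunk and 32 unit products in the second, a single float pass loses all 32 units, while the chunked version keeps them because they are summed exactly within their own chunk.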
Martin Kroeker
28c3fa8950
Add dsdot
2017-10-16 23:29:03 +02:00
Martin Kroeker
8ac87c1cb6
Implement DSDOT with unchanged sdot microkernels
2017-10-16 23:27:51 +02:00
Martin Kroeker
c7a8512d12
Cmake fixes for DYNAMIC_ARCH builds and whitespace in path names ( #1323 )
...
* prebuild.cmake: Put quotes around path names that may contain whitespace
(Copied from alexkaratakis' PR #1295)
* kernel/CMakeLists.txt: Fix common_lapack header inclusion and DYNAMIC_ARCH generation of ?neg_tcopy and ?laswp_ncopy files
* lapack/CMakeLists.txt: Use correct template for ?laswp_(plus,minus) functions
2017-10-09 23:34:18 +02:00
Martin Kroeker
97ecd4996a
Merge pull request #1319 from martin-frbg/issue601
...
Fix out-of-bounds memory accesses exposed by xccblat3 testcase
2017-10-08 23:31:06 +02:00
Martin Kroeker
1eb43cccad
Merge pull request #1317 from martin-frbg/power8-asm
...
Save and restore VSX registers
2017-10-08 23:30:46 +02:00