Commit Graph

975 Commits

Author SHA1 Message Date
Martin Kroeker
f5959f2543 Merge pull request #1567 from martin-frbg/mipstrmm
Revert " Switch mips32 target to USE_TRMM to fix complex TRMM"
2018-05-17 20:50:23 +02:00
Martin Kroeker
82012b960b Revert " Switch mips32 target to USE_TRMM to fix complex TRMM"
... as it was just a silly workaround for the issue seen in #1563, caused by #1419
2018-05-17 20:30:03 +02:00
Martin Kroeker
8dd3515fa2 Merge pull request #1565 from martin-frbg/mipstypo
Remove extraneous brace from previous commit of mips dsdot fix
2018-05-17 20:22:58 +02:00
Martin Kroeker
95f7f0229c Remove extraneous brace from previous commit 2018-05-17 18:43:59 +02:00
Martin Kroeker
5082fe4306 Merge pull request #1564 from martin-frbg/issue1563
Revert changes from PR#1419
2018-05-17 14:04:13 +02:00
Martin Kroeker
7a7619af6d Revert changes from PR#1419
at least one of these changes apparently is an oversimplification, leading to TRMM breakage on some platforms as observed in #1563
2018-05-17 11:40:08 +02:00
Martin Kroeker
893b535540 Use correct data type for initializers of v2f64, v4f32
Fixes #1561
2018-05-15 14:42:12 +02:00
Martin Kroeker
018f2dad27 Switch mips32 target to USE_TRMM to fix complex TRMM 2018-05-02 20:25:32 +02:00
Martin Kroeker
9d5098dbc9 Add MIPS 1004K target (Mediatek MT7621 SOC) 2018-05-02 20:20:44 +02:00
Martin Kroeker
954f1832de Merge pull request #1540 from martin-frbg/mips32-zasum
Fix typo in MIPS P5600 complex ASUM code selection
2018-04-25 23:23:00 +02:00
Martin Kroeker
941ad280a8 Fix typo in MIPS P5600 complex ASUM code selection 2018-04-25 22:50:10 +02:00
Martin Kroeker
1da365312a Merge pull request #1538 from martin-frbg/arm7utest
Fix handling of zero INCX, INCY in ArmV7 AXPY and ROT
2018-04-25 08:38:58 +02:00
Martin Kroeker
2d0929fa7c Move the test for zero incx,incy in ARMV7 ROT
to pass the related utest (see #1469)
2018-04-24 22:43:00 +02:00
Martin Kroeker
125343cc88 Drop test for zero incx,incy in armv7 AXPY
...to pass the related utest (see #1469)
2018-04-24 22:39:50 +02:00
Martin Kroeker
8a3b6fa108 Use generic zrot.c on ppc64/POWER6 to work around utest failure from … (#1535)
* Use generic C implementation of zrot on ppc64/POWER6 to work around utest failure from #1469
2018-04-23 19:05:49 +02:00
Martin Kroeker
9c5518319a Revert "Fix 32bit HASWELL builds" 2018-04-22 20:20:04 +02:00
Martin Kroeker
2ca0faf495 Merge pull request #1515 from martin-frbg/mipsdot
Correct precision of mips dsdot
2018-04-11 08:21:25 +02:00
Martin Kroeker
0fe434598b Fix precision of mips dsdot 2018-04-10 23:30:59 +02:00
Martin Kroeker
c7b55b6082 Merge pull request #1499 from quickwritereader/develop
Implemented missing vsx simd  kernels for power8 blas1/2 double. z13 modifications
2018-03-27 21:43:23 +02:00
Martin Kroeker
840e01061f Merge pull request #1491 from martin-frbg/ddot_mt
Add multithreading support for Haswell DDOT
2018-03-27 21:43:05 +02:00
QWR QWR
28ca97015d power8:Added initial zgemv_(t|n) ,i(d|z)amax,i(d|z)amin,dgemv_t(transposed),zrot
z13: improved zgemv_(t|n)_4,zscal,zaxpy
2018-03-27 14:54:41 +00:00
Martin Kroeker
6a6ffaff1e Merge pull request #1494 from martin-frbg/x86_dsdot
Use generic/dot.c instead of the inferior arm/dot.c for x86 DSDOT
2018-03-17 15:26:47 +01:00
Martin Kroeker
28ac9ea5a6 Use generic/dot.c instead of the inferior arm/dot.c for x86 DSDOT
to resolve dsdot utest failure seen in #1492
2018-03-17 13:49:15 +01:00
Martin Kroeker
a55694dd5b Declare dot_compute static to avoid conflicts in multiarch builds 2018-03-16 22:23:36 +01:00
Martin Kroeker
85a41e9cdb Add multithreading support for Haswell DDOT
copied from ashwinyes' implementation in dot_thunderx2t99.c
2018-03-16 16:58:47 +01:00
Martin Kroeker
81215711a2 Re-enable DAXPY microkernels for x86_64
as the inaccuracies seen in the original testcase for #1332 appear to be due to an artefact that amplifies the very small rounding differences between FMA and discrete multiply+add
2018-03-04 19:37:03 +01:00
Martin Kroeker
22167170b3 Merge pull request #1477 from quickwritereader/develop
Power8 blas3 copy-pack routines
2018-02-28 18:46:54 +01:00
Ashwin Sekhar T K
fa9ca65c0e ARM64: Fix utest dsdot errors 2018-02-27 10:47:55 +00:00
Martin Kroeker
719b68f077 Merge pull request #1473 from martin-frbg/p2align
Replace .align with .p2aligns in dscal.c and the Nehalem microkernels as well
2018-02-27 08:28:20 +01:00
Martin Kroeker
fe9f15f2d8 Merge pull request #1472 from martin-frbg/utest-fixes
Fix limited DSDOT precision on arm,aarch64 and zarch
2018-02-26 22:48:07 +01:00
Martin Kroeker
497f0c3d8a Replace .align with .p2align in the Nehalem microkernels 2018-02-26 20:58:33 +01:00
Martin Kroeker
ea37db828e Convert .align to .p2align for OSX compatibility 2018-02-26 20:48:03 +01:00
Martin Kroeker
6e70287776 Use generic/dot.c for DSDOT on ARMV5 and above
The default arm/dot.c is less precise when used for DSDOT, as shown by utest
2018-02-25 19:57:23 +01:00
Martin Kroeker
58f236ad73 Use generic/dot.c for DSDOT on zarch 2018-02-25 19:52:14 +01:00
Martin Kroeker
e207107150 Use generic/dot.c for DSDOT on z13
The implementation in arm/dot.c has lower precision, as shown by the utest for dsdot.
2018-02-25 19:51:25 +01:00
Martin Kroeker
c9d408064a Use dot.S also for DSDOT on CORTEXA57 2018-02-25 19:48:09 +01:00
Martin Kroeker
288d1a3f6e Use dot.S also for DSDOT on ARMV8 2018-02-25 19:45:16 +01:00
Martin Kroeker
7c1925acec Use .p2align instead of .align for compatibility on Sandybridge as well 2018-02-24 19:43:15 +01:00
Martin Kroeker
2359c7c1a9 Use .p2align instead of .align for portability
The OSX assembler apparently mishandles the argument to decimal .align, leading to a significant loss of performance 
as observed in #730, #901 and most recently #1470
2018-02-24 17:50:13 +01:00
Martin Kroeker
e7366a4161 Restore the remaining utests (#1462)
* Restore the remaining utests

* Try fork test on Cygwin and Linux only, it hangs on at least ARMv8/Android as well

* Use generic sswap/dswap kernels for NEHALEM 32bit to fix fault found by the restored swap utest

* Disable zdotu test for MS cl to work around runtime error -1073741819 on AppVeyor for now
(probably coding error in the initialization of the complex numbers or wrong choice of zdotu API)
2018-02-20 10:07:17 +01:00
the mslm
2c0a008281 dgemm_ncopy_4_ save/restore 2018-02-18 01:30:17 +00:00
the mslm
c5425daa6b power8 ?gemm_tcopy save/restore 2018-02-16 23:36:46 +00:00
Martin Kroeker
b47e6822aa Enable most assembly kernels in the generic ARMV8 target
ref #1439
2018-02-06 11:42:58 +01:00
Abdelrauf
60596a1abc Merge branch 'develop' into develop 2018-01-31 16:17:04 -08:00
Abdelrauf
afd514c25d small fix inside ifdef z13mvc . (z13mvc code is not used in production) 2018-01-31 18:30:59 -05:00
Martin Kroeker
f45776ec1f Merge pull request #1440 from quickwritereader/develop
small corrections
2018-01-31 23:48:47 +01:00
Martin Kroeker
e388459a27 Merge pull request #1419 from brada4/develop
Initialize unitialized values for repeated calls
2018-01-31 23:48:34 +01:00
Abdelrauf
f653e7a18d small fix
small fix inside ifdef z13mvc . (z13mvc code is not used in production)
2018-01-31 07:49:38 -08:00
the mslm
f946a89432 zscal (case: real alpha=0 ) mikrokernel shift&mem fix , da_i as input reg. small typo fixes 2018-01-26 19:25:27 -08:00
Martin Kroeker
485df77612 Make USE_TRMM depend on TARGET_CORE not TARGET
Fixes #1432 (and possibly other DTRMM-related failures on Haswell and related architectures when built with cmake)
2018-01-26 23:20:00 +01:00