Martin Kroeker
40160ff3c1
Use _Atomic instead of volatile for thread safety where C11 is supported
2018-03-10 00:15:44 +01:00
Martin Kroeker
6a99fcce94
Use _Atomic instead of volatile for thread safety where C11 is supported
...
Suggested by dodomorandi in #660
2018-03-10 00:03:49 +01:00
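As an illustration of the change described in the two commits above, here is a minimal sketch of selecting `_Atomic` over `volatile` when the compiler advertises C11 atomics. The macro name `SHARED_QUALIFIER` and the counter are hypothetical and are not taken from the OpenBLAS sources.

```c
/* Hypothetical macro: use _Atomic where C11 atomics are available,
 * otherwise fall back to volatile (which does NOT make updates atomic). */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L && \
    !defined(__STDC_NO_ATOMICS__)
#define SHARED_QUALIFIER _Atomic
#else
#define SHARED_QUALIFIER volatile
#endif

/* Illustrative shared counter, zero-initialized as a static object. */
static SHARED_QUALIFIER int counter;

/* With _Atomic, the pre-increment is a single atomic read-modify-write.
 * With volatile, it is a separate load and store that can race. */
int bump(void) {
    return ++counter;
}
```

With the `volatile` fallback, two threads calling `bump()` concurrently may lose an update; the `_Atomic` variant avoids that for this simple counter.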
Martin Kroeker
2c7392f07b
Merge pull request #1482 from martin-frbg/haswell_axpy
...
Re-enable DAXPY AVX microkernels for x86_64
2018-03-04 22:21:18 +01:00
Martin Kroeker
81215711a2
Re-enable DAXPY microkernels for x86_64
...
as the inaccuracies seen in the original testcase for #1332 appear to be due to an artefact that amplifies the very small rounding differences between FMA and discrete multiply+add
2018-03-04 19:37:03 +01:00
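The rounding difference mentioned in the commit above can be sketched as follows. This is an assumption-laden illustration, not the DAXPY kernel: it uses single precision so that a single-rounded result can be emulated in double precision without calling `fma()`, and the function names are hypothetical. The emulation is only exact when the double-precision sum needs no rounding of its own, as in the values suggested below.

```c
/* Discrete multiply then add: the product is rounded to float
 * before the addition, so two roundings take place. */
float axpy_step_discrete(float a, float x, float y) {
    volatile float p = a * x;   /* volatile forces the product to be rounded */
    return p + y;
}

/* Emulated fused multiply-add: the product of two floats is exact in
 * double, so only the final conversion back to float rounds.
 * (Hypothetical helper; a real kernel would use a hardware FMA.) */
float axpy_step_fused(float a, float x, float y) {
    return (float)((double)a * (double)x + (double)y);
}
```

For example, with `a = 1 + 2^-13`, `x = 1 - 2^-13` and `y = -1`, the exact product is `1 - 2^-26`, which rounds to `1.0f` in the discrete path and cancels to zero, while the fused path keeps the residual `-2^-26`. Differences of this size are what a sensitive test case can amplify.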
Martin Kroeker
809fd0d451
Rewrite ROTMG to address cases not covered by the netlib algorithm (#1480)
...
* Rewrite ROTMG based on the new implementation in GONUM based on the algorithm proposed by Tim Hopkins, see issue 1452 for the reference
* Correct ROTMG utest for issue1452 and add another from gonum, also correct transposition of expected and observed values in error messages
2018-03-04 17:39:56 +01:00
Martin Kroeker
72e65157df
Merge pull request #1481 from martin-frbg/utest-fixup
...
Fix transposition of expected and computed values in error message
2018-03-03 22:43:56 +01:00
Martin Kroeker
69a8aa6de2
Fix transposition of expected and computed values in error message
2018-03-03 18:01:51 +01:00
Martin Kroeker
0ab5bf1746
Merge pull request #1476 from xsacha/patch-1
...
Fix CMake cross-compiling
2018-02-28 18:47:57 +01:00
Martin Kroeker
22167170b3
Merge pull request #1477 from quickwritereader/develop
...
Power8 blas3 copy-pack routines
2018-02-28 18:46:54 +01:00
Martin Kroeker
69d9f36ff4
Merge pull request #1468 from martin-frbg/martin-frbg-patch-1
...
Limit the additional locking from PRs 1052,1299 to non-OpenMP cases
2018-02-28 18:40:31 +01:00
Sacha
f81815e48a
Fix CMake cross-compiling
...
Without specifying thread count, NUM_THREADS would not be defined and CMake would fail.
This is because core count cannot be determined when cross-compiling.
2018-02-28 10:25:25 +10:00
Martin Kroeker
5f855d965d
Merge pull request #1475 from ashwinyes/develop_20180227_utest_dsdot_fixes
...
ARM64: Fix utest dsdot errors
2018-02-27 14:04:16 +01:00
Ashwin Sekhar T K
fa9ca65c0e
ARM64: Fix utest dsdot errors
2018-02-27 10:47:55 +00:00
Martin Kroeker
719b68f077
Merge pull request #1473 from martin-frbg/p2align
...
Replace .align with .p2align in dscal.c and the Nehalem microkernels as well
2018-02-27 08:28:20 +01:00
Martin Kroeker
fe9f15f2d8
Merge pull request #1472 from martin-frbg/utest-fixes
...
Fix limited DSDOT precision on arm,aarch64 and zarch
2018-02-26 22:48:07 +01:00
Martin Kroeker
497f0c3d8a
Replace .align with .p2align in the Nehalem microkernels
2018-02-26 20:58:33 +01:00
Martin Kroeker
ea37db828e
Convert .align to .p2align for OSX compatibility
2018-02-26 20:48:03 +01:00
Martin Kroeker
e6a0a3de73
Merge pull request #1471 from martin-frbg/p2align
...
Use .p2align instead of .align for portability on Haswell and Sandybridge
2018-02-26 12:28:01 +01:00
Martin Kroeker
6e70287776
Use generic/dot.c for DSDOT on ARMV5 and above
...
The default arm/dot.c is less precise when used for DSDOT, as shown by utest
2018-02-25 19:57:23 +01:00
Martin Kroeker
58f236ad73
Use generic/dot.c for DSDOT on zarch
2018-02-25 19:52:14 +01:00
Martin Kroeker
e207107150
Use generic/dot.c for DSDOT on z13
...
The implementation in arm/dot.c has lower precision, as shown by the utest for dsdot.
2018-02-25 19:51:25 +01:00
Martin Kroeker
c9d408064a
Use dot.S also for DSDOT on CORTEXA57
2018-02-25 19:48:09 +01:00
Martin Kroeker
288d1a3f6e
Use dot.S also for DSDOT on ARMV8
2018-02-25 19:45:16 +01:00
Martin Kroeker
7c1925acec
Use .p2align instead of .align for compatibility on Sandybridge as well
2018-02-24 19:43:15 +01:00
Martin Kroeker
2359c7c1a9
Use .p2align instead of .align for portability
...
The OSX assembler apparently mishandles the argument to decimal .align, leading to a significant loss of performance as observed in #730, #901 and most recently #1470
2018-02-24 17:50:13 +01:00
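To make the portability concern in the commit above concrete: the GNU assembler documentation notes that the argument of `.align` is a byte count on some targets and a power-of-two exponent on others, whereas `.p2align` always takes the exponent. The sketch below only does the arithmetic for the two readings; the function names are hypothetical, and which reading a given assembler applies is an assumption to verify against its documentation.

```c
/* ".p2align n" always requests 2^n bytes of alignment. */
unsigned long p2align_bytes(unsigned n) {
    return 1UL << n;
}

/* ".align n" on an assembler that reads n as a byte count. */
unsigned long align_bytes_as_count(unsigned n) {
    return n;
}

/* ".align n" on an assembler that reads n as an exponent. */
unsigned long align_bytes_as_exponent(unsigned n) {
    return 1UL << n;
}
```

Under these assumptions, `.align 16` means 16 bytes under the byte-count reading but 65536 bytes under the exponent reading, while `.p2align 4` means 16 bytes everywhere.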
Martin Kroeker
7646974227
Limit the additional locking from PRs 1052,1299 to non-OpenMP multithreading
2018-02-21 11:45:33 +01:00
Martin Kroeker
e3a80e6aa8
Merge pull request #1466 from xianyi/revert-1464-issue1461
...
Revert "Add locks only for non-OPENMP multithreading"
2018-02-20 17:17:38 +01:00
Martin Kroeker
8866e393a2
Revert "Add locks only for non-OPENMP multithreading"
2018-02-20 17:17:12 +01:00
Martin Kroeker
9e87e6f3d8
Merge pull request #1464 from martin-frbg/issue1461
...
Add locks only for non-OPENMP multithreading
2018-02-20 14:08:24 +01:00
Martin Kroeker
3119b2ab4c
Add locks only for non-OPENMP multithreading
...
to mitigate performance problems caused by #1052 and #1299 as seen in #1461
2018-02-20 12:17:18 +01:00
Martin Kroeker
e7366a4161
Restore the remaining utests (#1462)
...
* Restore the remaining utests
* Try fork test on Cygwin and Linux only, it hangs on at least ARMv8/Android as well
* Use generic sswap/dswap kernels for NEHALEM 32bit to fix fault found by the restored swap utest
* Disable zdotu test for MS cl to work around runtime error -1073741819 on AppVeyor for now
(probably coding error in the initialization of the complex numbers or wrong choice of zdotu API)
2018-02-20 10:07:17 +01:00
Martin Kroeker
6c0f79b787
Merge pull request #1463 from martin-frbg/rotmg2
...
Fix wrong conditionals in scaling loops of rotmg and update BLAS1 tests from netlib
2018-02-18 15:12:42 +01:00
Martin Kroeker
7cd7acf71e
Merge pull request #1460 from martin-frbg/issue1425
...
Revert insidious suppression of the -fopenmp flag in the LAPACK subtree
2018-02-18 15:12:20 +01:00
Martin Kroeker
72f14a0363
Fix conditionals in the rescaling against GAMSQ
2018-02-18 12:54:52 +01:00
Martin Kroeker
53026dc63a
Update single and double precision BLAS1 tests from LAPACK 3.8.0
...
adding tests for SROTMG, SROTM, SDSDOT, DROTMG, DROTM, DSDOT
2018-02-18 12:44:14 +01:00
Martin Kroeker
798f1595d5
Fix condition in both second scaling loops
2018-02-18 12:37:09 +01:00
the mslm
2c0a008281
dgemm_ncopy_4_ save/restore
2018-02-18 01:30:17 +00:00
the mslm
c5425daa6b
power8 ?gemm_tcopy save/restore
2018-02-16 23:36:46 +00:00
Martin Kroeker
eaab622f03
Make "OMP task depend" sections conditional on OpenMP4, not just OpenMP
...
To allow compiling with gcc versions older than 4.9
2018-02-14 22:58:14 +01:00
Martin Kroeker
3cda1ce50a
Revert insidious suppression of the -fopenmp flag in the LAPACK subtree
...
This was added in #1046 citing a problem with mingw, but in effect it quietly reduces thread safety on all non-Windows platforms (while -fopenmp is already disabled for Windows builds through the toplevel Makefile.system). Removing the filter fixes #1425
2018-02-13 22:44:45 +01:00
Martin Kroeker
0391c07b17
Merge pull request #1457 from martin-frbg/issue1456
...
test_fork is not meant to be run (nor expected to work) with OpenMP
2018-02-11 23:13:18 +01:00
Martin Kroeker
f4b095b1bb
test_fork is not meant (nor expected) to be run with OpenMP
...
Fixes #1456
2018-02-11 20:58:27 +01:00
Martin Kroeker
6940c59a88
Merge pull request #1454 from martin-frbg/issue1452
...
Keep the flag handling separate from the scaling loops in rotmg
2018-02-11 20:48:04 +01:00
Martin Kroeker
42514fa24c
Merge pull request #1449 from martin-frbg/armv8
...
Enable assembly kernels for the generic ARMV8 target and treat CortexA53,A72 as A57
2018-02-11 20:47:49 +01:00
Martin Kroeker
650077074a
Add tests for rotmg
2018-02-10 14:18:21 +01:00
Martin Kroeker
fe16a94fc2
Add rotmg tests for CMAKE MSVC+CLANG build
2018-02-10 14:17:41 +01:00
Martin Kroeker
632b8e0f05
Merge current Makefile from develop
2018-02-10 14:10:55 +01:00
Martin Kroeker
a1bc0fcf07
Resurrect utest for rotmg and add testcase for issue 1452
2018-02-10 12:48:03 +01:00
Martin Kroeker
0464aa6784
Remove debug printfs
2018-02-09 23:06:50 +01:00
Martin Kroeker
55840f0bc9
Keep the flag handling separate from the scaling loops
...
Fixes #1452 and is more in line with how ATLAS does it. The earlier fix from #356 only moved the bug elsewhere, but we will never want the iterative rescaling to change the dflag setting and variable associations with each cycle.
2018-02-09 23:00:03 +01:00