Martin Kroeker
2359c7c1a9
Use .p2align instead of .align for portability
...
The OSX assembler apparently mishandles the decimal argument to .align, leading to a significant loss of performance,
as observed in #730, #901 and most recently #1470
2018-02-24 17:50:13 +01:00
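Why `.p2align` is the portable choice comes down to arithmetic. `.p2align n` always requests 2^n bytes, while the meaning of `.align n` depends on the assembler. GNU as on x86 ELF reads `n` as a byte count, and other assemblers read it as a power of two. The sketch below assumes that the OS X assembler uses the power-of-two reading, which would turn an intended 16-byte `.align 16` into a 64 KiB alignment. The helper names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Byte alignment requested by ".p2align n": always 2^n bytes. */
static uint64_t p2align_bytes(unsigned n) {
    assert(n < 64);
    return (uint64_t)1 << n;
}

/* Byte alignment requested by ".align n", which is assembler-dependent:
   a byte count on some targets (e.g. GNU as on x86 ELF), a power of two
   on others (assumed here for the OS X assembler). */
static uint64_t align_bytes(unsigned n, int power_of_two_semantics) {
    return power_of_two_semantics ? p2align_bytes(n) : (uint64_t)n;
}

/* Padding bytes needed to move address addr up to a multiple of align. */
static uint64_t padding(uint64_t addr, uint64_t align) {
    assert(align != 0);
    return (align - addr % align) % align;
}
```

For a loop label at address 4100, `.align 16` read as bytes inserts 12 bytes of padding, while the power-of-two reading asks for a 65536-byte boundary and inserts 61436 bytes. `.p2align 4` yields the intended 16-byte alignment on either kind of assembler.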
Martin Kroeker
7646974227
Limit the additional locking from PRs #1052 and #1299 to non-OpenMP multithreading
2018-02-21 11:45:33 +01:00
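The general pattern of restricting a lock to non-OpenMP builds can be sketched with conditional compilation. This is not the actual OpenBLAS code. The `USE_OPENMP` switch, the slot table and the function names are assumptions chosen for illustration: the mutex is compiled in only when `USE_OPENMP` is undefined, and in OpenMP builds the lock macros expand to no-ops.

```c
#include <assert.h>

/* Assumed build switch: defined for OpenMP builds, where the extra lock
   is compiled out as the commit describes. */
#if defined(USE_OPENMP)
#define BUF_LOCK()   ((void)0)
#define BUF_UNLOCK() ((void)0)
#else
#include <pthread.h>
static pthread_mutex_t buf_mutex = PTHREAD_MUTEX_INITIALIZER;
#define BUF_LOCK()   pthread_mutex_lock(&buf_mutex)
#define BUF_UNLOCK() pthread_mutex_unlock(&buf_mutex)
#endif

/* Hypothetical shared table of buffer slots guarded by the lock. */
#define NUM_SLOTS 4
static int slot_in_use[NUM_SLOTS];

/* Claims the first free slot; returns its index or -1 if none is free. */
static int acquire_slot(void) {
    int found = -1;
    BUF_LOCK();
    for (int i = 0; i < NUM_SLOTS; i++) {
        if (!slot_in_use[i]) {
            slot_in_use[i] = 1;
            found = i;
            break;
        }
    }
    BUF_UNLOCK();
    return found;
}

/* Returns a previously claimed slot to the pool. */
static void release_slot(int i) {
    assert(i >= 0 && i < NUM_SLOTS);
    BUF_LOCK();
    slot_in_use[i] = 0;
    BUF_UNLOCK();
}
```

Keeping the lock behind macros means the call sites stay identical in both build modes, so only the macro definitions decide whether the locking cost is paid.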
Martin Kroeker
e3a80e6aa8
Merge pull request #1466 from xianyi/revert-1464-issue1461
...
Revert "Add locks only for non-OPENMP multithreading"
2018-02-20 17:17:38 +01:00
Martin Kroeker
8866e393a2
Revert "Add locks only for non-OPENMP multithreading"
2018-02-20 17:17:12 +01:00
Martin Kroeker
9e87e6f3d8
Merge pull request #1464 from martin-frbg/issue1461
...
Add locks only for non-OPENMP multithreading
2018-02-20 14:08:24 +01:00
Martin Kroeker
3119b2ab4c
Add locks only for non-OPENMP multithreading
...
to mitigate performance problems caused by #1052 and #1299, as seen in #1461
2018-02-20 12:17:18 +01:00
Martin Kroeker
e7366a4161
Restore the remaining utests ( #1462 )
...
* Restore the remaining utests
* Try the fork test on Cygwin and Linux only, as it hangs on at least ARMv8/Android as well
* Use generic sswap/dswap kernels for NEHALEM 32bit to fix fault found by the restored swap utest
* Disable zdotu test for MS cl to work around runtime error -1073741819 on AppVeyor for now
(probably a coding error in the initialization of the complex numbers or the wrong choice of zdotu API)
2018-02-20 10:07:17 +01:00
Martin Kroeker
6c0f79b787
Merge pull request #1463 from martin-frbg/rotmg2
...
Fix wrong conditionals in scaling loops of rotmg and update BLAS1 tests from netlib
2018-02-18 15:12:42 +01:00
Martin Kroeker
7cd7acf71e
Merge pull request #1460 from martin-frbg/issue1425
...
Revert insidious suppression of the -fopenmp flag in the LAPACK subtree
2018-02-18 15:12:20 +01:00
Martin Kroeker
72f14a0363
Fix conditionals in the rescaling against GAMSQ
2018-02-18 12:54:52 +01:00
Martin Kroeker
53026dc63a
Update single and double precision BLAS1 tests from LAPACK 3.8.0
...
adding tests for SROTMG, SROTM, SDSDOT, DROTMG, DROTM, DSDOT
2018-02-18 12:44:14 +01:00
Martin Kroeker
798f1595d5
Fix condition in both second scaling loops
2018-02-18 12:37:09 +01:00
the mslm
2c0a008281
dgemm_ncopy_4_ save/restore
2018-02-18 01:30:17 +00:00
the mslm
c5425daa6b
power8 ?gemm_tcopy save/restore
2018-02-16 23:36:46 +00:00
Martin Kroeker
eaab622f03
Make "OMP task depend" sections conditional on OpenMP4, not just OpenMP
...
To allow compiling with gcc versions older than 4.9
2018-02-14 22:58:14 +01:00
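The distinction between "OpenMP" and "OpenMP 4" can be expressed through the value of the standard `_OPENMP` macro, which OpenMP 4.0 compilers define as 201307 or later. The `depend` clause on `omp task` first appeared in OpenMP 4.0, so checking only `defined(_OPENMP)` would also accept OpenMP 3.x compilers such as gcc releases before 4.9. The sketch below assumes this macro test; the exact condition used in the OpenBLAS sources may differ, and `two_step_sum` is a hypothetical example function.

```c
#include <assert.h>

/* 201307 is the _OPENMP value for OpenMP 4.0, the first version with
   "depend" clauses on tasks. */
#if defined(_OPENMP) && _OPENMP >= 201307
#define HAVE_TASK_DEPEND 1
#else
#define HAVE_TASK_DEPEND 0
#endif

/* Computes (a + b) * 2 in two steps. With OpenMP 4 the steps run as
   dependent tasks; otherwise they run sequentially. */
static int two_step_sum(int a, int b) {
    int partial = 0, result = 0;
#if HAVE_TASK_DEPEND
#pragma omp parallel shared(partial, result)
#pragma omp single
    {
#pragma omp task depend(out: partial) shared(partial)
        partial = a + b;
#pragma omp task depend(in: partial) shared(partial, result)
        result = partial * 2;
    } /* the implicit barrier after single waits for both tasks */
#else
    partial = a + b;
    result = partial * 2;
#endif
    return result;
}
```

Both branches produce the same result, so older compilers still build a working, if less parallel, version of the code.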
Martin Kroeker
3cda1ce50a
Revert insidious suppression of the -fopenmp flag in the LAPACK subtree
...
This was added in #1046, citing a problem with mingw, but in effect it quietly reduces thread safety on all non-Windows platforms (while -fopenmp is already disabled for Windows builds through the top-level Makefile.system). Removing the filter fixes #1425.
2018-02-13 22:44:45 +01:00
Martin Kroeker
0391c07b17
Merge pull request #1457 from martin-frbg/issue1456
...
test_fork is not meant to be run (nor expected to work) with OpenMP
2018-02-11 23:13:18 +01:00
Martin Kroeker
f4b095b1bb
test_fork is not meant (nor expected) to be run with OpenMP
...
Fixes #1456
2018-02-11 20:58:27 +01:00
Martin Kroeker
6940c59a88
Merge pull request #1454 from martin-frbg/issue1452
...
Keep the flag handling separate from the scaling loops in rotmg
2018-02-11 20:48:04 +01:00
Martin Kroeker
42514fa24c
Merge pull request #1449 from martin-frbg/armv8
...
Enable assembly kernels for the generic ARMV8 target and treat Cortex-A53 and Cortex-A72 as Cortex-A57
2018-02-11 20:47:49 +01:00
Martin Kroeker
650077074a
Add tests for rotmg
2018-02-10 14:18:21 +01:00
Martin Kroeker
fe16a94fc2
Add rotmg tests for CMAKE MSVC+CLANG build
2018-02-10 14:17:41 +01:00
Martin Kroeker
632b8e0f05
Merge current Makefile from develop
2018-02-10 14:10:55 +01:00
Martin Kroeker
a1bc0fcf07
Resurrect utest for rotmg and add testcase for issue 1452
2018-02-10 12:48:03 +01:00
Martin Kroeker
0464aa6784
Remove debug printfs
2018-02-09 23:06:50 +01:00
Martin Kroeker
55840f0bc9
Keep the flag handling separate from the scaling loops
...
Fixes #1452 and is more in line with how ATLAS does it. The earlier fix from #356 only moved the bug elsewhere, but we will never want the iterative rescaling to change the dflag setting and variable associations with each cycle.
2018-02-09 23:00:03 +01:00
Martin Kroeker
a7485f3222
Merge pull request #1453 from martin-frbg/netlib228
...
Remove spurious EXTERNAL reference
2018-02-08 22:39:24 +01:00
Martin Kroeker
d31f62cf02
Merge pull request #1450 from embray/cygwin/forking
...
Fix issues with forking on Cygwin
2018-02-08 22:39:11 +01:00
Martin Kroeker
150c7294a6
Remove spurious EXTERNAL reference
...
From Reference-LAPACK issue 228, remove spurious EXTERNAL reference to unused and nonexistent function xLACGV that could cause linking problems.
2018-02-08 14:57:13 +01:00
Erik M. Bray
8f5f614615
On Cygwin use mmap instead of Windows native allocation functions, which are not fork-safe.
2018-02-06 12:23:27 +01:00
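The property this commit relies on is that anonymous private `mmap` memory is duplicated into a child created by `fork()`, which the commit states is not the case for the Windows native allocation functions under Cygwin. The sketch below demonstrates that property on a POSIX system; `alloc_buffer`, `free_buffer` and `child_reads_buffer` are hypothetical names, not OpenBLAS functions.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON
#endif

/* Allocates a private anonymous mapping; returns NULL on failure. */
static void *alloc_buffer(size_t size) {
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

static void free_buffer(void *p, size_t size) {
    if (p) munmap(p, size);
}

/* Returns 1 if a forked child can read what the parent wrote before the
   fork, 0 if it cannot, and -1 if allocation or fork() failed. */
static int child_reads_buffer(void) {
    const size_t size = 1 << 16;
    unsigned char *buf = alloc_buffer(size);
    if (!buf) return -1;
    memset(buf, 0x5A, size);
    pid_t pid = fork();
    if (pid < 0) {
        free_buffer(buf, size);
        return -1;
    }
    if (pid == 0) _exit(buf[size - 1] == 0x5A ? 0 : 1); /* child */
    int status = 0;
    pid_t w = waitpid(pid, &status, 0);
    free_buffer(buf, size);
    return w == pid && WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```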
Erik M. Bray
f5fc109fbd
Perform blas_thread_shutdown with pthread_atfork() on Cygwin
...
Even if we're directly using the win32 threading driver and not pthreads,
pthread_atfork still works fine to register a pre-fork handler, and is
necessary to restore the threading server to a pre-initialized state.
2018-02-06 12:23:27 +01:00
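The mechanism described above can be sketched with POSIX `pthread_atfork()`. A "prepare" handler runs in the parent immediately before `fork()`, so resetting the thread server there means the child starts from a pre-initialized state. In this sketch, `thread_server_init`, `thread_server_shutdown` and the `server_initialized` flag are hypothetical stand-ins for the library's real thread-server code. It also assumes, as an illustration only, that the parent re-initializes the server lazily on its next use.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the thread server's state. */
static int server_initialized = 0;

static void thread_server_init(void) { server_initialized = 1; }
static void thread_server_shutdown(void) { server_initialized = 0; }

/* Pre-fork handler: runs in the parent just before fork(). */
static void prepare_fork(void) { thread_server_shutdown(); }

/* Registers only a prepare handler; no parent/child handlers needed. */
static int register_fork_handler(void) {
    return pthread_atfork(prepare_fork, NULL, NULL);
}

/* Returns 1 if the child starts with an uninitialized server,
   0 if it does not, and -1 if fork() or waitpid() failed. */
static int child_starts_clean(void) {
    thread_server_init();
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) _exit(server_initialized == 0 ? 0 : 1); /* child */
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

Without the registered handler, the child would inherit `server_initialized == 1` while none of the parent's worker threads exist in the child, which is the inconsistent state the commit avoids.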
Erik M. Bray
ce2028b425
Rewrite this test to work with ctest and re-enable it on the appropriate platforms (including Cygwin, which has fork())
2018-02-06 12:23:27 +01:00
Martin Kroeker
b47e6822aa
Enable most assembly kernels in the generic ARMV8 target
...
ref #1439
2018-02-06 11:42:58 +01:00
Martin Kroeker
0ae5e14923
Detect CORTEX A53 and A72 as CORTEXA57
2018-02-06 11:38:18 +01:00
Martin Kroeker
95f2cea45b
Merge pull request #1447 from martin-frbg/sparcfix
...
Generate CHAR_CORENAME for SPARC
2018-02-02 09:02:49 +01:00
Martin Kroeker
e3c50643bb
Fix my copy-paste blunder with get_corename
2018-02-01 22:06:04 +01:00
Martin Kroeker
efa84afd00
Use get_corename for SPARC as well
2018-02-01 18:20:38 +01:00
Martin Kroeker
d1b512a01a
Return a corename for SPARC
2018-02-01 18:15:15 +01:00
Martin Kroeker
e0b02789ff
Merge pull request #1445 from quickwritereader/develop
...
small fix inside ifdef z13mvc (z13mvc code is not used in production)
2018-02-01 08:28:53 +01:00
Abdelrauf
60596a1abc
Merge branch 'develop' into develop
2018-01-31 16:17:04 -08:00
Abdelrauf
afd514c25d
small fix inside ifdef z13mvc (z13mvc code is not used in production)
2018-01-31 18:30:59 -05:00
Martin Kroeker
25c7e3992f
Merge pull request #1443 from martin-frbg/sparcfix
...
Also #define SPARC in config.h when autodetecting
2018-01-31 23:49:01 +01:00
Martin Kroeker
f45776ec1f
Merge pull request #1440 from quickwritereader/develop
...
small corrections
2018-01-31 23:48:47 +01:00
Martin Kroeker
e388459a27
Merge pull request #1419 from brada4/develop
...
Initialize uninitialized values for repeated calls
2018-01-31 23:48:34 +01:00
Martin Kroeker
0ac824f6a5
Also #define SPARC in config.h when autodetecting
...
Fixes #1442
2018-01-31 22:02:00 +01:00
Abdelrauf
f653e7a18d
small fix
...
small fix inside ifdef z13mvc (z13mvc code is not used in production)
2018-01-31 07:49:38 -08:00
Martin Kroeker
09e397e4f1
Merge pull request #1434 from xoviat/flang-wall
...
CMake: Remove unused wall option when FC=flang
2018-01-27 14:26:49 +01:00
Martin Kroeker
913e9546c0
Merge pull request #1436 from martin-frbg/cmaketrmm
...
Make USE_TRMM depend on TARGET_CORE not TARGET
2018-01-27 10:04:39 +01:00
the mslm
f946a89432
zscal (case: real alpha=0) microkernel shift & mem fix, da_i as input reg; small typo fixes
2018-01-26 19:25:27 -08:00
Martin Kroeker
485df77612
Make USE_TRMM depend on TARGET_CORE not TARGET
...
Fixes #1432 (and possibly other DTRMM-related failures on Haswell and related architectures when built with cmake)
2018-01-26 23:20:00 +01:00