Commit Graph

3078 Commits

Author SHA1 Message Date
Martin Kroeker 2359c7c1a9
Use .p2align instead of .align for portability
The OSX assembler apparently mishandles the argument to decimal .align, leading to a significant loss of performance 
as observed in #730, #901 and most recently #1470
2018-02-24 17:50:13 +01:00
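The portability concern behind this change can be illustrated with a small sketch. It assumes the two readings of the operand described in the GNU assembler manual: `.p2align n` always requests 2^n bytes, while plain `.align n` means n bytes on some targets and 2^n bytes on others. The helper names are hypothetical and are not part of OpenBLAS.

```c
#include <assert.h>
#include <stddef.h>

/* Alignment in bytes when the operand is read as a plain byte count
 * (one possible meaning of ".align n"). */
static size_t align_as_bytes(unsigned operand) {
    return (size_t)operand;
}

/* Alignment in bytes when the operand is read as a power-of-two exponent
 * (the other possible meaning of ".align n", and the only meaning of
 * ".p2align n"). */
static size_t align_as_power_of_two(unsigned exponent) {
    assert(exponent < sizeof(size_t) * 8);  /* keep the shift well defined */
    return (size_t)1 << exponent;
}
```

Under these assumptions an operand of 16 gives 16 bytes with the first reading but 65536 bytes with the second, whereas `.p2align 4` requests 16 bytes under either assembler.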
Martin Kroeker 7646974227
Limit the additional locking from PRs #1052 and #1299 to non-OpenMP multithreading 2018-02-21 11:45:33 +01:00
Martin Kroeker e3a80e6aa8
Merge pull request #1466 from xianyi/revert-1464-issue1461
Revert "Add locks only for non-OPENMP multithreading"
2018-02-20 17:17:38 +01:00
Martin Kroeker 8866e393a2
Revert "Add locks only for non-OPENMP multithreading" 2018-02-20 17:17:12 +01:00
Martin Kroeker 9e87e6f3d8
Merge pull request #1464 from martin-frbg/issue1461
Add locks only for non-OPENMP multithreading
2018-02-20 14:08:24 +01:00
Martin Kroeker 3119b2ab4c
Add locks only for non-OPENMP multithreading
to mitigate performance problems caused by #1052 and #1299 as seen in #1461
2018-02-20 12:17:18 +01:00
Martin Kroeker e7366a4161
Restore the remaining utests (#1462)
* Restore the remaining utests

* Try fork test on Cygwin and Linux only, it hangs on at least ARMv8/Android as well

* Use generic sswap/dswap kernels for NEHALEM 32bit to fix fault found by the restored swap utest

* Disable zdotu test for MS cl to work around runtime error -1073741819 on AppVeyor for now
(probably coding error in the initialization of the complex numbers or wrong choice of zdotu API)
2018-02-20 10:07:17 +01:00
Martin Kroeker 6c0f79b787
Merge pull request #1463 from martin-frbg/rotmg2
Fix wrong conditionals in scaling loops of rotmg and update BLAS1 tests from netlib
2018-02-18 15:12:42 +01:00
Martin Kroeker 7cd7acf71e
Merge pull request #1460 from martin-frbg/issue1425
Revert insidious suppression of the -fopenmp flag in the LAPACK subtree
2018-02-18 15:12:20 +01:00
Martin Kroeker 72f14a0363
Fix conditionals in the rescaling against GAMSQ 2018-02-18 12:54:52 +01:00
Martin Kroeker 53026dc63a
Update single and double precision BLAS1 tests from LAPACK 3.8.0
adding tests for SROTMG, SROTM, SDSDOT, DROTMG, DROTM, DSDOT
2018-02-18 12:44:14 +01:00
Martin Kroeker 798f1595d5
Fix condition in both second scaling loops 2018-02-18 12:37:09 +01:00
the mslm 2c0a008281 dgemm_ncopy_4_ save/restore 2018-02-18 01:30:17 +00:00
the mslm c5425daa6b power8 ?gemm_tcopy save/restore 2018-02-16 23:36:46 +00:00
Martin Kroeker eaab622f03
Make "OMP task depend" sections conditional on OpenMP4, not just OpenMP
To allow compiling with gcc versions older than 4.9
2018-02-14 22:58:14 +01:00
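A version guard of this kind could look like the sketch below. It relies on the `_OPENMP` macro, which the OpenMP specification defines as a release date (201307 for OpenMP 4.0). The function name `have_openmp4` is hypothetical, and the actual guard used in OpenBLAS may be written differently.

```c
/* Returns 1 when the compiler reports OpenMP 4.0 or newer, else 0.
 * have_openmp4 is an illustrative name, not an OpenBLAS function. */
static int have_openmp4(void) {
#if defined(_OPENMP) && (_OPENMP >= 201307)
    return 1;   /* "task depend" clauses are available */
#else
    return 0;   /* no OpenMP, or an implementation older than 4.0 */
#endif
}
```

Testing only `defined(_OPENMP)` would also be true for older compilers that implement OpenMP 3.x, which do not accept `depend` clauses.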
Martin Kroeker 3cda1ce50a
Revert insidious suppression of the -fopenmp flag in the LAPACK subtree
This was added in #1046 citing a problem with mingw, but in effect it quietly reduces thread safety on all non-Windows platforms (while -fopenmp is already disabled for Windows builds through the toplevel Makefile.system). Removing the filter fixes #1425
2018-02-13 22:44:45 +01:00
Martin Kroeker 0391c07b17
Merge pull request #1457 from martin-frbg/issue1456
test_fork is not meant to be run (nor expected to work) with OpenMP
2018-02-11 23:13:18 +01:00
Martin Kroeker f4b095b1bb
test_fork is not meant (nor expected) to be run with OpenMP
Fixes #1456
2018-02-11 20:58:27 +01:00
Martin Kroeker 6940c59a88
Merge pull request #1454 from martin-frbg/issue1452
Keep the flag handling separate from the scaling loops in rotmg
2018-02-11 20:48:04 +01:00
Martin Kroeker 42514fa24c
Merge pull request #1449 from martin-frbg/armv8
Enable assembly kernels for the generic ARMV8 target and treat CortexA53,A72 as A57
2018-02-11 20:47:49 +01:00
Martin Kroeker 650077074a
Add tests for rotmg 2018-02-10 14:18:21 +01:00
Martin Kroeker fe16a94fc2
Add rotmg tests for CMAKE MSVC+CLANG build 2018-02-10 14:17:41 +01:00
Martin Kroeker 632b8e0f05
Merge current Makefile from develop 2018-02-10 14:10:55 +01:00
Martin Kroeker a1bc0fcf07
Resurrect utest for rotmg and add testcase for issue 1452 2018-02-10 12:48:03 +01:00
Martin Kroeker 0464aa6784
Remove debug printfs 2018-02-09 23:06:50 +01:00
Martin Kroeker 55840f0bc9
Keep the flag handling separate from the scaling loops
Fixes #1452 and is more in line with how ATLAS does it. The earlier fix from #356 only moved the bug elsewhere, but we will never want the iterative rescaling to change the dflag setting and variable associations with each cycle.
2018-02-09 23:00:03 +01:00
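The idea of keeping flag handling out of the scaling loops can be sketched as follows. This is a simplified illustration, not the OpenBLAS rotmg code: it rescales only d1 and x1 (the real routine also scales matrix entries), it assumes the reference BLAS constant GAM = 4096, and `demo_rescale` and its `flag` argument are hypothetical names.

```c
#include <math.h>

#define GAM    4096.0
#define GAMSQ  (GAM * GAM)      /* 16777216 */
#define RGAMSQ (1.0 / GAMSQ)    /* about 5.96e-8 */

/* Brings |d1| into (RGAMSQ, GAMSQ) while keeping d1 * x1 * x1 unchanged.
 * The flag is decided once, before the loops, and is never modified
 * inside them, however many iterations they take. */
static void demo_rescale(double *d1, double *x1, double *flag) {
    if (*d1 == 0.0 || !isfinite(*d1)) return;   /* nothing sensible to scale */

    if (fabs(*d1) <= RGAMSQ || fabs(*d1) >= GAMSQ)
        *flag = -1.0;                           /* record that rescaling occurs */

    while (fabs(*d1) <= RGAMSQ) {               /* too small: scale up */
        *d1 *= GAMSQ;
        *x1 /= GAM;
    }
    while (fabs(*d1) >= GAMSQ) {                /* too large: scale down */
        *d1 /= GAMSQ;
        *x1 *= GAM;
    }
}
```

Because the flag is set before the first iteration, repeated passes through either loop cannot change it again, which is the property the commit message describes.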
Martin Kroeker a7485f3222
Merge pull request #1453 from martin-frbg/netlib228
Remove spurious EXTERNAL reference
2018-02-08 22:39:24 +01:00
Martin Kroeker d31f62cf02
Merge pull request #1450 from embray/cygwin/forking
Fix issues with forking on Cygwin
2018-02-08 22:39:11 +01:00
Martin Kroeker 150c7294a6
Remove spurious EXTERNAL reference
From Reference-LAPACK issue 228, remove spurious EXTERNAL reference to unused and nonexistent function xLACGV that could cause linking problems.
2018-02-08 14:57:13 +01:00
Erik M. Bray 8f5f614615 On Cygwin use mmap instead of Windows native allocation functions, which are not fork-safe. 2018-02-06 12:23:27 +01:00
Erik M. Bray f5fc109fbd Perform blas_thread_shutdown with pthread_atfork() on Cygwin
Even if we're directly using the win32 threading driver and not pthreads,
pthread_atfork still works fine to register a pre-fork handler, and is
necessary to restore the threading server to a pre-initialized state.
2018-02-06 12:23:27 +01:00
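The mechanism described here can be sketched with the POSIX call `pthread_atfork(prepare, parent, child)`, whose prepare handler runs in the calling process just before `fork()`. The state flag and the `demo_*` functions below are hypothetical stand-ins and do not reproduce the OpenBLAS threading server.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the state of a threading server. */
static int demo_server_running = 0;

static void demo_thread_init(void)     { demo_server_running = 1; }
static void demo_thread_shutdown(void) { demo_server_running = 0; }

/* Registers the shutdown routine as a pre-fork ("prepare") handler.
 * Returns 0 on success, like pthread_atfork itself. */
static int demo_register_prefork(void) {
    return pthread_atfork(demo_thread_shutdown, NULL, NULL);
}

/* Forks once; the child exits immediately and the parent reaps it.
 * Returns the parent's view of the state flag after fork(), or -1 on error. */
static int demo_fork_and_check(void) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) _exit(0);
    if (waitpid(pid, NULL, 0) < 0) return -1;
    return demo_server_running;
}
```

After `demo_register_prefork()` and `demo_thread_init()`, a call to `demo_fork_and_check()` finds the flag cleared, because the prepare handler ran before the fork and both processes start from the pre-initialized state.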
Erik M. Bray ce2028b425 Rewrite this test to work with ctest and re-enable it on the appropriate platforms (including Cygwin, which has fork()) 2018-02-06 12:23:27 +01:00
Martin Kroeker b47e6822aa
Enable most assembly kernels in the generic ARMV8 target
ref #1439
2018-02-06 11:42:58 +01:00
Martin Kroeker 0ae5e14923
Detect CORTEX A53 and A72 as CORTEXA57 2018-02-06 11:38:18 +01:00
Martin Kroeker 95f2cea45b
Merge pull request #1447 from martin-frbg/sparcfix
Generate CHAR_CORENAME for SPARC
2018-02-02 09:02:49 +01:00
Martin Kroeker e3c50643bb
Fix my copypaste blunder with get_corename 2018-02-01 22:06:04 +01:00
Martin Kroeker efa84afd00
Use get_corename for SPARC as well 2018-02-01 18:20:38 +01:00
Martin Kroeker d1b512a01a
Return a corename for SPARC 2018-02-01 18:15:15 +01:00
Martin Kroeker e0b02789ff
Merge pull request #1445 from quickwritereader/develop
small fix inside ifdef z13mvc. (z13mvc code is not used in production)
2018-02-01 08:28:53 +01:00
Abdelrauf 60596a1abc
Merge branch 'develop' into develop 2018-01-31 16:17:04 -08:00
Abdelrauf afd514c25d small fix inside ifdef z13mvc. (z13mvc code is not used in production) 2018-01-31 18:30:59 -05:00
Martin Kroeker 25c7e3992f
Merge pull request #1443 from martin-frbg/sparcfix
Also #define SPARC in config.h when autodetecting
2018-01-31 23:49:01 +01:00
Martin Kroeker f45776ec1f
Merge pull request #1440 from quickwritereader/develop
small corrections
2018-01-31 23:48:47 +01:00
Martin Kroeker e388459a27
Merge pull request #1419 from brada4/develop
Initialize uninitialized values for repeated calls
2018-01-31 23:48:34 +01:00
Martin Kroeker 0ac824f6a5
Also #define SPARC in config.h when autodetecting
Fixes #1442
2018-01-31 22:02:00 +01:00
Abdelrauf f653e7a18d
small fix
small fix inside ifdef z13mvc. (z13mvc code is not used in production)
2018-01-31 07:49:38 -08:00
Martin Kroeker 09e397e4f1
Merge pull request #1434 from xoviat/flang-wall
CMake: Remove unused wall option when FC=flang
2018-01-27 14:26:49 +01:00
Martin Kroeker 913e9546c0
Merge pull request #1436 from martin-frbg/cmaketrmm
Make USE_TRMM depend on TARGET_CORE not TARGET
2018-01-27 10:04:39 +01:00
the mslm f946a89432 zscal (case: real alpha=0) microkernel shift&mem fix, da_i as input reg. small typo fixes 2018-01-26 19:25:27 -08:00
Martin Kroeker 485df77612
Make USE_TRMM depend on TARGET_CORE not TARGET
Fixes #1432 (and possibly other DTRMM-related failures on Haswell and related architectures when built with cmake)
2018-01-26 23:20:00 +01:00