Commit Graph

3567 Commits

Author SHA1 Message Date
Martin Kroeker
4741ce803b Merge pull request #2045 from martin-frbg/2033-3
Do not compile in AVX512 check if AVX support is disabled
2019-03-06 22:40:26 +01:00
Martin Kroeker
11cfd0bd75 Do not compile in AVX512 check if AVX support is disabled
The xgetbv function depends on NO_AVX being undefined - we could change that too, but that combo is unlikely to work anyway
2019-03-05 16:04:25 +01:00
Martin Kroeker
651ab01d2b Merge pull request #2044 from martin-frbg/issue2043
Fix module definition conflicts between LAPACK and ReLAPACK
2019-03-05 12:11:32 +01:00
Martin Kroeker
d7b2c53c0b Merge pull request #2039 from brada4/meminit
Address warning in memory.c
2019-03-05 12:11:15 +01:00
Martin Kroeker
e4864a8933 Fix module definition conflicts between LAPACK and ReLAPACK
for #2043
2019-03-04 21:17:08 +01:00
Martin Kroeker
10d841d8b9 Merge pull request #2026 from martin-frbg/trmv_threads
Correct range limiting in trmv_thread and re-enable TRMV multithreading
2019-03-04 15:08:31 +01:00
Martin Kroeker
12f2b76748 Merge pull request #2038 from martin-frbg/issue2035
Improve handling of NO_STATIC and NO_SHARED
2019-03-04 15:07:48 +01:00
Martin Kroeker
6c83b878f6 Merge pull request #2040 from martin-frbg/locks2002
Restore locking optimizations for OpenMP case
2019-03-04 15:07:14 +01:00
Martin Kroeker
af480b02a4 Restore locking optimizations for OpenMP case
restore another accidentally dropped part of #1468 that was missed in #2004 to address performance regression reported in #1461
2019-03-03 14:17:07 +01:00
Andrew
e4a79be6bb address warning introed with #1814 et al
2019-03-03 09:05:11 +02:00
Andrew
e5c316c6b9 init
2019-03-03 08:59:27 +02:00
Martin Kroeker
25427926bc Improve handling of NO_STATIC and NO_SHARED
to avoid surprises from defining either as zero. Fixes #2035 by addressing some concerns from #1422
2019-03-02 23:36:36 +01:00
Martin Kroeker
edb8143141 Merge pull request #2037 from martin-frbg/issue2033-2
Make sure that AVX512 is disabled in 32bit builds
2019-03-01 11:45:02 +01:00
Martin Kroeker
c4868d11c0 Make sure that AVX512 is disabled in 32bit builds
for #2033
2019-03-01 09:23:03 +01:00
Martin Kroeker
4c321ae571 Merge pull request #2034 from martin-frbg/issue2033
Make x86_32 imply NO_AVX2, NO_AVX512 in addition to NO_AVX
2019-02-28 22:10:12 +01:00
Martin Kroeker
2ffb727187 Keep xcode8.3 for osx BINARY=32 build
as xcode10 deprecated i386
2019-02-28 10:51:54 +01:00
Martin Kroeker
d66214c946 Make x86_32 imply NO_AVX2, NO_AVX512 in addition to NO_AVX
fixes #2033
2019-02-28 09:58:25 +01:00
Martin Kroeker
fd34820b99 Fix AVX512 test always returning false due to missing compiler option
2019-02-25 17:58:31 +01:00
Martin Kroeker
918a0cc4d1 Fix missing -c option in AVX512 test
2019-02-25 17:55:36 +01:00
Martin Kroeker
0db9c03e7e Merge pull request #2028 from brada4/mv
Move one of clobber fixes to right place
2019-02-24 19:50:23 +01:00
Andrew
6eee1beac5 move fix to right place
2019-02-24 20:41:02 +02:00
Andrew
e5df5958cc init
2019-02-24 20:39:25 +02:00
Martin Kroeker
343b301d14 Reduce list of kernels in the dynamic arch build
to make compilation complete reliably within the 1h limit again
2019-02-20 10:27:48 +01:00
Martin Kroeker
45333d5793 Fix error introduced during cleanup
2019-02-19 22:16:33 +01:00
Martin Kroeker
e29b0cfcc4 Allow multithreading TRMV again
revert workaround introduced for issue #1332 as the actual cause appears to be my incorrect fix from #1262 (see #1388)
2019-02-19 21:03:30 +01:00
Martin Kroeker
78d9910236 Correct range_n limiting
same bug as seen in #1388, somehow missed in corresponding PR #1389
2019-02-19 20:59:48 +01:00
Martin Kroeker
e12cdf58ef Merge pull request #2024 from martin-frbg/gcc9fixes4
Fix inline assembly constraints in Bulldozer TRSM kernels
2019-02-17 11:49:15 +01:00
Martin Kroeker
1860c9456d Merge pull request #2023 from martin-frbg/gcc9fixes3
Fix inline assembly constraints in various x86_64 GEMVN kernels
2019-02-17 11:48:57 +01:00
Martin Kroeker
aec905498f Merge pull request #1988 from TiborGY/patch-1
Reword/expand comments in Makefile.rule
2019-02-17 11:36:04 +01:00
TiborGY
56089991e2 fix the the
2019-02-16 23:26:13 +01:00
Martin Kroeker
f9bb76d29a Fix inline assembly constraints in Bulldozer TRSM kernels
rework indices to allow marking i, as and bs as both input and output (marked operand n1 as well for simplicity). For #2009
2019-02-16 20:06:48 +01:00
Martin Kroeker
8242b1fe3f Fix inline assembly constraints
2019-02-16 18:51:09 +01:00
Martin Kroeker
efb9038f72 Fix inline assembly constraints
2019-02-16 18:46:17 +01:00
Martin Kroeker
e976557d29 Fix inline assembly constraints
rework indices to allow marking argument lda as input and output.
2019-02-16 18:36:39 +01:00
Martin Kroeker
9d8be15789 Fix inline assembly constraints
rework indices to allow marking argument lda4 as input and output. For #2009
2019-02-16 18:24:11 +01:00
Martin Kroeker
d752799a0f Merge pull request #2021 from martin-frbg/gcc9fixes2
Fix wrong constraints in inline assembly of Haswell DTRSM kernel
2019-02-16 18:05:40 +01:00
TiborGY
f209fc7fa9 Update Makefile.rule
add note about NUM_THREADS for package maintainers, add examples of programs that cause affinity troubles
2019-02-16 12:12:39 +01:00
Martin Kroeker
c26c0b77a7 Fix wrong constraints in inline assembly
for #2009
2019-02-15 15:08:16 +01:00
Martin Kroeker
1c6da2d03c Merge pull request #2019 from martin-frbg/gcc9fixes
Fix unannounced modification of input operand 8 (lda4) in Haswell GEMVN microkernel
2019-02-15 15:02:54 +01:00
Martin Kroeker
4255a58cd2 Rename operands to put lda on the input/output constraint list
2019-02-15 10:10:04 +01:00
Martin Kroeker
d3e4725548 Merge pull request #2020 from martin-frbg/issue1956
With the Intel compiler on Linux, prefer ifort for the final link step
2019-02-15 09:57:59 +01:00
Martin Kroeker
adb419ed67 With the Intel compiler on Linux, prefer ifort for the final link step
icc has known problems with mixed-language builds that ifort can handle just fine. Fixes #1956
2019-02-14 22:57:30 +01:00
Martin Kroeker
46e415b140 Save and restore input argument 8 (lda4)
Fixes miscompilation with gcc9 -ftree-vectorize (related to issue #2009)
2019-02-14 22:43:18 +01:00
Martin Kroeker
cd5a59b9cf Merge pull request #2018 from bartoldeman/fix-dgemv-znver1-tree-vectorize
dgemv_kernel_4x4(Haswell): add missing clobbers for xmm0,xmm1,xmm2,xmm3
2019-02-14 21:55:11 +01:00
Bart Oldeman
69a97ca7b9 dgemv_kernel_4x4(Haswell): add missing clobbers for xmm0,xmm1,xmm2,xmm3
This fixes a crash in dblat2 when OpenBLAS is compiled using
-march=znver1 -ftree-vectorize -O2

See also:
https://github.com/easybuilders/easybuild-easyconfigs/issues/7180
2019-02-14 16:27:58 +00:00
Martin Kroeker
b55c586fac Fix missing clobber in x86/x86_64 blas_quickdivide inline assembly function (#2017)
* Fix missing clobber in blas_quickdivide assembly
2019-02-14 15:21:36 +01:00
Martin Kroeker
056917d616 Merge pull request #2013 from martin-frbg/issue2011
Fix invalid memory access in PPC gemm_beta
2019-02-14 09:29:34 +01:00
Martin Kroeker
718efcec6f Fix out-of-bounds memory access in gemm_beta
Fixes #2011 (as suggested by davemq), assuming typo by K.Goto
2019-02-13 22:08:37 +01:00
Martin Kroeker
f9d67bb5e8 Fix out-of-bounds memory access in gemm_beta
Fixes #2011 (as suggested by davemq) presuming typo by K.Goto
2019-02-13 22:06:41 +01:00
Martin Kroeker
76bb74fcd4 Merge pull request #2012 from maamountki/z14
[ZARCH] Many improvements
2019-02-13 20:15:56 +01:00