Martin Kroeker
|
6c83b878f6
|
Merge pull request #2040 from martin-frbg/locks2002
Restore locking optimizations for OpenMP case
|
2019-03-04 15:07:14 +01:00 |
maomao194313
|
fb4dae7124
|
add TARGET support for HiSilicon tsv110 CPUs
|
2019-03-04 16:48:49 +08:00 |
maomao194313
|
760842dda1
|
add TARGET support for HiSilicon tsv110 CPUs
|
2019-03-04 16:45:22 +08:00 |
maomao194313
|
53f482ee72
|
add TARGET support for HiSilicon tsv110 CPUs
|
2019-03-04 16:41:21 +08:00 |
maomao194313
|
783ba8058f
|
HiSilicon tsv110 CPUs optimization branch
add HiSilicon tsv110 CPUs optimization branch
|
2019-03-04 16:30:50 +08:00 |
Martin Kroeker
|
af480b02a4
|
Restore locking optimizations for OpenMP case
restore another accidentally dropped part of #1468 that was missed in #2004 to address performance regression reported in #1461
|
2019-03-03 14:17:07 +01:00 |
Andrew
|
e4a79be6bb
|
address warning introduced with #1814 et al
|
2019-03-03 09:05:11 +02:00 |
Andrew
|
e5c316c6b9
|
init
|
2019-03-03 08:59:27 +02:00 |
Martin Kroeker
|
25427926bc
|
Improve handling of NO_STATIC and NO_SHARED
to avoid surprises from defining either as zero. Fixes #2035 by addressing some concerns from #1422
|
2019-03-02 23:36:36 +01:00 |
Martin Kroeker
|
edb8143141
|
Merge pull request #2037 from martin-frbg/issue2033-2
Make sure that AVX512 is disabled in 32bit builds
|
2019-03-01 11:45:02 +01:00 |
Martin Kroeker
|
c4868d11c0
|
Make sure that AVX512 is disabled in 32bit builds
for #2033
|
2019-03-01 09:23:03 +01:00 |
Martin Kroeker
|
4c321ae571
|
Merge pull request #2034 from martin-frbg/issue2033
Make x86_32 imply NO_AVX2, NO_AVX512 in addition to NO_AVX
|
2019-02-28 22:10:12 +01:00 |
Martin Kroeker
|
2ffb727187
|
Keep xcode8.3 for osx BINARY=32 build
as xcode10 deprecated i386
|
2019-02-28 10:51:54 +01:00 |
Martin Kroeker
|
d66214c946
|
Make x86_32 imply NO_AVX2, NO_AVX512 in addition to NO_AVX
fixes #2033
|
2019-02-28 09:58:25 +01:00 |
Martin Kroeker
|
fd34820b99
|
Fix AVX512 test always returning false due to missing compiler option
|
2019-02-25 17:58:31 +01:00 |
Martin Kroeker
|
918a0cc4d1
|
Fix missing -c option in AVX512 test
|
2019-02-25 17:55:36 +01:00 |
Martin Kroeker
|
0db9c03e7e
|
Merge pull request #2028 from brada4/mv
Move one of clobber fixes to right place
|
2019-02-24 19:50:23 +01:00 |
Andrew
|
6eee1beac5
|
move fix to right place
|
2019-02-24 20:41:02 +02:00 |
Andrew
|
e5df5958cc
|
init
|
2019-02-24 20:39:25 +02:00 |
Martin Kroeker
|
343b301d14
|
Reduce list of kernels in the dynamic arch build
to make compilation complete reliably within the 1h limit again
|
2019-02-20 10:27:48 +01:00 |
Martin Kroeker
|
45333d5793
|
Fix error introduced during cleanup
|
2019-02-19 22:16:33 +01:00 |
Martin Kroeker
|
e29b0cfcc4
|
Allow multithreading TRMV again
revert workaround introduced for issue #1332 as the actual cause appears to be my incorrect fix from #1262 (see #1388)
|
2019-02-19 21:03:30 +01:00 |
Martin Kroeker
|
78d9910236
|
Correct range_n limiting
same bug as seen in #1388, somehow missed in corresponding PR #1389
|
2019-02-19 20:59:48 +01:00 |
Martin Kroeker
|
e12cdf58ef
|
Merge pull request #2024 from martin-frbg/gcc9fixes4
Fix inline assembly constraints in Bulldozer TRSM kernels
|
2019-02-17 11:49:15 +01:00 |
Martin Kroeker
|
1860c9456d
|
Merge pull request #2023 from martin-frbg/gcc9fixes3
Fix inline assembly constraints in various x86_64 GEMVN kernels
|
2019-02-17 11:48:57 +01:00 |
Martin Kroeker
|
aec905498f
|
Merge pull request #1988 from TiborGY/patch-1
Reword/expand comments in Makefile.rule
|
2019-02-17 11:36:04 +01:00 |
TiborGY
|
56089991e2
|
fix the the
|
2019-02-16 23:26:13 +01:00 |
Martin Kroeker
|
f9bb76d29a
|
Fix inline assembly constraints in Bulldozer TRSM kernels
rework indices to allow marking i, as and bs as both input and output (marked operand n1 as well for simplicity). For #2009
|
2019-02-16 20:06:48 +01:00 |
Martin Kroeker
|
8242b1fe3f
|
Fix inline assembly constraints
|
2019-02-16 18:51:09 +01:00 |
Martin Kroeker
|
efb9038f72
|
Fix inline assembly constraints
|
2019-02-16 18:46:17 +01:00 |
Martin Kroeker
|
e976557d29
|
Fix inline assembly constraints
rework indices to allow marking argument lda as input and output.
|
2019-02-16 18:36:39 +01:00 |
Martin Kroeker
|
9d8be15789
|
Fix inline assembly constraints
rework indices to allow marking argument lda4 as input and output. For #2009
|
2019-02-16 18:24:11 +01:00 |
Martin Kroeker
|
d752799a0f
|
Merge pull request #2021 from martin-frbg/gcc9fixes2
Fix wrong constraints in inline assembly of Haswell DTRSM kernel
|
2019-02-16 18:05:40 +01:00 |
TiborGY
|
f209fc7fa9
|
Update Makefile.rule
add note about NUM_THREADS for package maintainers, add examples of programs that cause affinity troubles
|
2019-02-16 12:12:39 +01:00 |
Martin Kroeker
|
c26c0b77a7
|
Fix wrong constraints in inline assembly
for #2009
|
2019-02-15 15:08:16 +01:00 |
Martin Kroeker
|
1c6da2d03c
|
Merge pull request #2019 from martin-frbg/gcc9fixes
Fix unannounced modification of input operand 8 (lda4) in Haswell GEMVN microkernel
|
2019-02-15 15:02:54 +01:00 |
Martin Kroeker
|
4255a58cd2
|
Rename operands to put lda on the input/output constraint list
|
2019-02-15 10:10:04 +01:00 |
Martin Kroeker
|
d3e4725548
|
Merge pull request #2020 from martin-frbg/issue1956
With the Intel compiler on Linux, prefer ifort for the final link step
|
2019-02-15 09:57:59 +01:00 |
Martin Kroeker
|
adb419ed67
|
With the Intel compiler on Linux, prefer ifort for the final link step
icc has known problems with mixed-language builds that ifort can handle just fine. Fixes #1956
|
2019-02-14 22:57:30 +01:00 |
Martin Kroeker
|
46e415b140
|
Save and restore input argument 8 (lda4)
Fixes miscompilation with gcc9 -ftree-vectorize (related to issue #2009)
|
2019-02-14 22:43:18 +01:00 |
Martin Kroeker
|
cd5a59b9cf
|
Merge pull request #2018 from bartoldeman/fix-dgemv-znver1-tree-vectorize
dgemv_kernel_4x4(Haswell): add missing clobbers for xmm0,xmm1,xmm2,xmm3
|
2019-02-14 21:55:11 +01:00 |
Bart Oldeman
|
69a97ca7b9
|
dgemv_kernel_4x4(Haswell): add missing clobbers for xmm0,xmm1,xmm2,xmm3
This fixes a crash in dblat2 when OpenBLAS is compiled using
-march=znver1 -ftree-vectorize -O2
See also:
https://github.com/easybuilders/easybuild-easyconfigs/issues/7180
|
2019-02-14 16:27:58 +00:00 |
Martin Kroeker
|
b55c586fac
|
Fix missing clobber in x86/x86_64 blas_quickdivide inline assembly function (#2017)
* Fix missing clobber in blas_quickdivide assembly
|
2019-02-14 15:21:36 +01:00 |
Martin Kroeker
|
056917d616
|
Merge pull request #2013 from martin-frbg/issue2011
Fix invalid memory access in PPC gemm_beta
|
2019-02-14 09:29:34 +01:00 |
Martin Kroeker
|
718efcec6f
|
Fix out-of-bounds memory access in gemm_beta
Fixes #2011 (as suggested by davemq), assuming typo by K.Goto
|
2019-02-13 22:08:37 +01:00 |
Martin Kroeker
|
f9d67bb5e8
|
Fix out-of-bounds memory access in gemm_beta
Fixes #2011 (as suggested by davemq) presuming typo by K.Goto
|
2019-02-13 22:06:41 +01:00 |
Martin Kroeker
|
76bb74fcd4
|
Merge pull request #2012 from maamountki/z14
[ZARCH] Many improvements
|
2019-02-13 20:15:56 +01:00 |
maamountki
|
0a54c98b9d
|
[ZARCH] Modify constraints
|
2019-02-13 21:06:25 +02:00 |
maamountki
|
bec54ae366
|
[ZARCH] Fix caxpy
|
2019-02-13 12:54:35 +02:00 |
Martin Kroeker
|
63d7bad8a5
|
Merge pull request #2010 from martin-frbg/issue2009
Fix declaration of input arguments in x86_64 GEMV, SYMV and DSCAL
|
2019-02-12 23:24:02 +01:00 |