Martin Kroeker
d3e4725548
Merge pull request #2020 from martin-frbg/issue1956
...
With the Intel compiler on Linux, prefer ifort for the final link step
2019-02-15 09:57:59 +01:00
Martin Kroeker
adb419ed67
With the Intel compiler on Linux, prefer ifort for the final link step
...
icc has known problems with mixed-language builds that ifort can handle just fine. Fixes #1956
2019-02-14 22:57:30 +01:00
Martin Kroeker
46e415b140
Save and restore input argument 8 (lda4)
...
Fixes miscompilation with gcc9 -ftree-vectorize (related to issue #2009)
2019-02-14 22:43:18 +01:00
Martin Kroeker
cd5a59b9cf
Merge pull request #2018 from bartoldeman/fix-dgemv-znver1-tree-vectorize
...
dgemv_kernel_4x4(Haswell): add missing clobbers for xmm0,xmm1,xmm2,xmm3
2019-02-14 21:55:11 +01:00
Bart Oldeman
69a97ca7b9
dgemv_kernel_4x4(Haswell): add missing clobbers for xmm0,xmm1,xmm2,xmm3
...
This fixes a crash in dblat2 when OpenBLAS is compiled using
-march=znver1 -ftree-vectorize -O2
See also:
https://github.com/easybuilders/easybuild-easyconfigs/issues/7180
2019-02-14 16:27:58 +00:00
Martin Kroeker
b55c586fac
Fix missing clobber in x86/x86_64 blas_quickdivide inline assembly function (#2017)
...
* Fix missing clobber in blas_quickdivide assembly
2019-02-14 15:21:36 +01:00
Martin Kroeker
056917d616
Merge pull request #2013 from martin-frbg/issue2011
...
Fix invalid memory access in PPC gemm_beta
2019-02-14 09:29:34 +01:00
Martin Kroeker
718efcec6f
Fix out-of-bounds memory access in gemm_beta
...
Fixes #2011 (as suggested by davemq), assuming typo by K.Goto
2019-02-13 22:08:37 +01:00
Martin Kroeker
f9d67bb5e8
Fix out-of-bounds memory access in gemm_beta
...
Fixes #2011 (as suggested by davemq) presuming typo by K.Goto
2019-02-13 22:06:41 +01:00
Martin Kroeker
76bb74fcd4
Merge pull request #2012 from maamountki/z14
...
[ZARCH] Many improvements
2019-02-13 20:15:56 +01:00
maamountki
0a54c98b9d
[ZARCH] Modify constraints
2019-02-13 21:06:25 +02:00
maamountki
bec54ae366
[ZARCH] Fix caxpy
2019-02-13 12:54:35 +02:00
Martin Kroeker
63d7bad8a5
Merge pull request #2010 from martin-frbg/issue2009
...
Fix declaration of input arguments in x86_64 GEMV, SYMV and DSCAL
2019-02-12 23:24:02 +01:00
Martin Kroeker
ab1630f9fa
Fix declaration of arguments in inline assembly
...
Argument 0 is modified so should be input and output
2019-02-12 16:14:02 +01:00
Martin Kroeker
b824fa70eb
Fix declaration of assembly arguments in SSYMV and DSYMV microkernels
...
Arguments 0 and 1 are both input and output
2019-02-12 16:00:18 +01:00
Martin Kroeker
91481a3e4e
Fix declaration of input arguments in inline assembly
...
Argument 0 is modified as it doubles as a counter
2019-02-12 15:51:43 +01:00
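The constraint fixes in this group of commits all follow one rule of GCC extended inline assembly: an operand that the assembly both reads and writes must be declared as an in/out operand ("+r"), not as a plain input. A minimal sketch of that rule follows. The function name countdown_sum is hypothetical and is not taken from the OpenBLAS kernels, and the non-x86_64 fallback is added only so the sketch builds elsewhere.

```c
/* Hypothetical illustration, not OpenBLAS code.
   Returns n + (n-1) + ... + 1, or 0 when n <= 0. */
long countdown_sum(long n)
{
    long acc = 0;
#if defined(__x86_64__) && defined(__GNUC__)
    __asm__ volatile(
        "test %0, %0   \n\t"   /* set flags from n                    */
        "jle  2f       \n\t"   /* nothing to do when n <= 0           */
        "1:            \n\t"
        "add  %0, %1   \n\t"   /* acc += n                            */
        "dec  %0       \n\t"   /* n doubles as the loop counter       */
        "jnz  1b       \n\t"
        "2:            \n\t"
        : "+r"(n), "+r"(acc)   /* both operands are read AND written  */
        :                      /* no input-only operands              */
        : "cc");               /* the flags register is modified      */
#else
    /* Portable fallback with the same result. */
    while (n > 0)
        acc += n--;
#endif
    return acc;
}
```

Had n been listed as an input-only operand, the compiler would be entitled to assume its register still holds the original value after the asm statement, which is the kind of miscompilation these commits address.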
Martin Kroeker
dc6ac9eab0
Fix declaration of input arguments in the x86_64 s/dGEMV_T and s/dGEMV_N kernels
...
Arguments 0 and 1 need to be tagged as both input and output
2019-02-12 15:33:48 +01:00
maamountki
f583674109
[ZARCH] Fix cgemv_t_4
2019-02-12 13:12:28 +02:00
maamountki
77fe70019f
[ZARCH] Fix constraints and source code formatting
2019-02-11 16:01:13 +02:00
Martin Kroeker
03a2bf2602
Fix potential memory leak in cpu enumeration on Linux (#2008)
...
* Fix potential memory leak in cpu enumeration with glibc
An early return after a failed call to sched_getaffinity would leak the previously allocated cpu_set_t. An incorrectly calculated size argument in that call increased the likelihood of that failure. Fixes #2003
2019-02-10 23:24:45 +01:00
Martin Kroeker
69edc5bbe7
Restore dropped patches in the non-TLS branch of memory.c (#2004)
...
* Restore dropped patches in the non-TLS branch of memory.c
As discovered in #2002, the reintroduction of the "original" non-TLS version of memory.c as an alternate branch had inadvertently used ba1f91f rather than a8002e2, thereby dropping the commits for #1450, #1468, #1501, #1504 and #1520.
2019-02-07 20:06:13 +01:00
maamountki
7039770165
[ZARCH] Undo the last commit
2019-02-06 20:11:44 +02:00
Martin Kroeker
641767f846
Merge pull request #2001 from martin-frbg/cmake-dynlist
...
Support DYNAMIC_LIST option in cmake
2019-02-06 08:39:24 +01:00
Martin Kroeker
af6e2253a2
Merge pull request #2000 from martin-frbg/issue1989
...
Make c_check robust against old or incomplete perl installations
2019-02-06 00:29:30 +01:00
Martin Kroeker
5952e586ce
Support DYNAMIC_LIST option in cmake
...
e.g. cmake -DDYNAMIC_ARCH=1 -DDYNAMIC_LIST="NEHALEM;HASWELL;ZEN" ..
original issue was #1639
2019-02-05 23:51:40 +01:00
Martin Kroeker
f10408aae8
Merge pull request #1999 from martin-frbg/issue1996-2
...
fix second instance of complex.h for c++ as well
2019-02-05 22:02:11 +01:00
Martin Kroeker
d70ae3ab43
Make c_check robust against old or incomplete perl installations
...
by catching and working around failures to load modules, and avoiding object-oriented syntax in tempfile creation.
Fixes #1989
2019-02-05 20:06:34 +01:00
Martin Kroeker
1391fc46d2
fix second instance of complex.h for c++ as well
2019-02-05 19:29:33 +01:00
maamountki
11a43e8116
[ZARCH] Set alignment hint for vl/vst
2019-02-05 19:17:08 +02:00
Martin Kroeker
817fe9865c
Merge pull request #1998 from martin-frbg/issue1992
...
Include complex rather than complex.h in C++ contexts
2019-02-05 17:39:59 +01:00
Martin Kroeker
f4b82d7bc4
Include complex rather than complex.h in C++ contexts
...
to avoid name clashes e.g. with boost headers that use I as a generic placeholder.
Fixes #1992 as suggested by aprokop in that issue ticket.
2019-02-05 13:30:13 +01:00
maamountki
61526480f9
[ZARCH] Fix copy constraint
2019-02-05 07:51:19 +02:00
maamountki
81daf6bc38
[ZARCH] Format source code, Fix constraints
2019-02-05 07:30:38 +02:00
maamountki
a38aa56e76
Merge pull request #1 from xianyi/develop
...
Update
2019-02-05 07:25:38 +02:00
Martin Kroeker
729e925174
Merge pull request #1996 from quickwritereader/develop
...
NBMAX=4096 for gemvn, added sgemvn 8x8 for future
2019-02-04 16:52:04 +01:00
Ubuntu
498ac98581
Note for unused kernels
2019-02-04 15:41:56 +00:00
Ubuntu
cd9ea45463
NBMAX=4096 for gemvn, added sgemvn 8x8 for future
2019-02-04 06:57:11 +00:00
Martin Kroeker
f9c5023e04
Merge pull request #1994 from quickwritereader/develop
...
sgemv cgemv pairs
2019-02-01 21:04:47 +01:00
Ubuntu
4abc375a91
sgemv cgemv pairs
2019-02-01 13:45:00 +00:00
Martin Kroeker
874df65491
Fix incorrect sgemv results for IBM z14
...
part of PR #1993 that was inadvertently misplaced into the toplevel directory
2019-02-01 12:58:59 +01:00
Martin Kroeker
1f4b61f572
Delete misplaced file sgemv_t_4.c
...
from #1993; the file should have gone into kernel/zarch
2019-02-01 12:57:01 +01:00
Martin Kroeker
282230c303
Merge pull request #1993 from martin-frbg/aarnes-zarch
...
Various fixes for the new Z14 target
2019-01-31 21:27:00 +01:00
Martin Kroeker
cce574c3e0
Improve the z14 SGEMVT kernel
...
from patch provided by aarnez in #991
2019-01-31 21:24:55 +01:00
Martin Kroeker
877023e1e1
Fix precision of zarch DSDOT
...
from patch provided by aarnez in #991
2019-01-31 21:22:26 +01:00
Martin Kroeker
265142edd5
Fix typo in the zarch min/max kernels
...
from patch provided by aarnez in #991
2019-01-31 21:21:40 +01:00
Martin Kroeker
885a3c4350
USE_TRMM on Z14
...
from patch provided by aarnez in #991
2019-01-31 21:18:09 +01:00
Martin Kroeker
4b512f84dd
Add cache sizes for Z14
...
from patch provided by aarnez in #991
2019-01-31 21:16:44 +01:00
Martin Kroeker
72d3e7c9b4
Add FORCE Z14
...
from patch provided by aarnez in #991
2019-01-31 21:15:50 +01:00
Martin Kroeker
bdc73a49e0
Add parameters for Z14
...
from patch provided by aarnez in #991
2019-01-31 21:14:37 +01:00
Martin Kroeker
1249ee1fd0
Add Z14 target
...
from patch provided by aarnez in #991
2019-01-31 21:13:46 +01:00