Martin Kroeker
f9bb76d29a
Fix inline assembly constraints in Bulldozer TRSM kernels
rework indices to allow marking i, as and bs as both input and output (marked operand n1 as well, for simplicity). For #2009
2019-02-16 20:06:48 +01:00
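Here is a hedged illustration of the constraint issue behind this fix. It is not the actual Bulldozer TRSM kernel code. In GCC-style extended asm, any operand that the asm body modifies must be declared as a read-write output with the "+" modifier. Declaring it as a plain input tells the compiler the value is unchanged.

The function name `advance` and the loop are hypothetical. The asm path assumes x86-64 with a GCC- or Clang-compatible compiler. Other targets use a plain C fallback.

```c
/* Hypothetical example, not OpenBLAS code: advance p by n doubles using a
   counted loop in inline asm. The asm writes both p and n, so both are
   listed as read-write outputs ("+r"). Listing them as plain inputs ("r")
   would tell the compiler they are unchanged, which is wrong here.
   n must be >= 0. */
static const double *advance(const double *p, long long n)
{
#if defined(__GNUC__) && defined(__x86_64__)
    __asm__(
        "1:\n\t"
        "testq %1, %1\n\t"   /* stop when the counter reaches zero */
        "jz 2f\n\t"
        "addq $8, %0\n\t"    /* p += 1 element (8 bytes) */
        "decq %1\n\t"        /* n -= 1 */
        "jmp 1b\n"
        "2:"
        : "+r"(p), "+r"(n)   /* both read and written by the asm */
        :
        : "cc");             /* the flags register is modified */
#else
    p += n;                  /* portable fallback with the same result */
#endif
    return p;
}
```

With `"+r"`, the compiler knows that `p` and `n` hold new values after the asm statement. It will therefore not reuse a stale copy of either one afterwards.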
Martin Kroeker
69edc5bbe7
Restore dropped patches in the non-TLS branch of memory.c ( #2004 )
* Restore dropped patches in the non-TLS branch of memory.c
As discovered in #2002, the reintroduction of the "original" non-TLS version of memory.c as an alternate branch had inadvertently used ba1f91f rather than a8002e2, thereby dropping the commits for #1450, #1468, #1501, #1504 and #1520.
2019-02-07 20:06:13 +01:00
Martin Kroeker
641767f846
Merge pull request #2001 from martin-frbg/cmake-dynlist
Support DYNAMIC_LIST option in cmake
2019-02-06 08:39:24 +01:00
Martin Kroeker
af6e2253a2
Merge pull request #2000 from martin-frbg/issue1989
Make c_check robust against old or incomplete perl installations
2019-02-06 00:29:30 +01:00
Martin Kroeker
5952e586ce
Support DYNAMIC_LIST option in cmake
e.g. cmake -DDYNAMIC_ARCH=1 -DDYNAMIC_LIST="NEHALEM;HASWELL;ZEN" ..
original issue was #1639
2019-02-05 23:51:40 +01:00
Martin Kroeker
f10408aae8
Merge pull request #1999 from martin-frbg/issue1996-2
fix second instance of complex.h for c++ as well
2019-02-05 22:02:11 +01:00
Martin Kroeker
d70ae3ab43
Make c_check robust against old or incomplete perl installations
by catching and working around failures to load modules, and avoiding object-oriented syntax in tempfile creation.
Fixes #1989
2019-02-05 20:06:34 +01:00
Martin Kroeker
1391fc46d2
fix second instance of complex.h for c++ as well
2019-02-05 19:29:33 +01:00
Martin Kroeker
817fe9865c
Merge pull request #1998 from martin-frbg/issue1992
Include complex rather than complex.h in C++ contexts
2019-02-05 17:39:59 +01:00
Martin Kroeker
f4b82d7bc4
Include complex rather than complex.h in C++ contexts
to avoid name clashes e.g. with boost headers that use I as a generic placeholder.
Fixes #1992 as suggested by aprokop in that issue ticket.
2019-02-05 13:30:13 +01:00
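Here is a minimal sketch of the header pattern this commit describes, assuming a header shared by C and C++ code. The typedef `zcomplex_t` and the helper `zreal` are hypothetical names for illustration.

In C, the C99 header <complex.h> defines the macro `I`. That macro can clash with C++ code, such as Boost, that uses `I` as an identifier. Selecting <complex> when `__cplusplus` is defined keeps the macro out of C++ translation units.

```c
/* Hypothetical shared-header fragment (names are illustrative). */
#ifdef __cplusplus
#include <complex>
typedef std::complex<double> zcomplex_t;   /* no `I` macro in this branch */
static inline double zreal(zcomplex_t z) { return std::real(z); }
#else
#include <complex.h>                        /* C99: defines the macro `I` */
typedef double _Complex zcomplex_t;
static inline double zreal(zcomplex_t z) { return creal(z); }
#endif
```

C callers keep using the native `_Complex` type. C++ callers get `std::complex<double>`, which has the same layout but brings in no `I` macro.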
Martin Kroeker
729e925174
Merge pull request #1996 from quickwritereader/develop
NBMAX=4096 for gemvn, added sgemvn 8x8 for future
2019-02-04 16:52:04 +01:00
Ubuntu
498ac98581
Note for unused kernels
2019-02-04 15:41:56 +00:00
Ubuntu
cd9ea45463
NBMAX=4096 for gemvn, added sgemvn 8x8 for future
2019-02-04 06:57:11 +00:00
Martin Kroeker
f9c5023e04
Merge pull request #1994 from quickwritereader/develop
sgemv cgemv pairs
2019-02-01 21:04:47 +01:00
Ubuntu
4abc375a91
sgemv cgemv pairs
2019-02-01 13:45:00 +00:00
Martin Kroeker
874df65491
Fix incorrect sgemv results for IBM z14
part of PR #1993 that was inadvertently misplaced into the toplevel directory
2019-02-01 12:58:59 +01:00
Martin Kroeker
1f4b61f572
Delete misplaced file sgemv_t_4.c
from #1993, file should have gone into kernel/zarch
2019-02-01 12:57:01 +01:00
Martin Kroeker
282230c303
Merge pull request #1993 from martin-frbg/aarnes-zarch
Various fixes for the new Z14 target
2019-01-31 21:27:00 +01:00
Martin Kroeker
cce574c3e0
Improve the z14 SGEMVT kernel
from patch provided by aarnez in #991
2019-01-31 21:24:55 +01:00
Martin Kroeker
877023e1e1
Fix precision of zarch DSDOT
from patch provided by aarnez in #991
2019-01-31 21:22:26 +01:00
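For context on the DSDOT precision fix: DSDOT takes single-precision vectors but is specified to accumulate the dot product in double precision and return a double. Below is a minimal portable C sketch, not the zarch kernel. It assumes positive strides only; reference BLAS also handles negative increments. `dsdot_sketch` is a hypothetical name.

```c
/* Hypothetical reference-style DSDOT, positive strides only:
   each product and the running sum are computed in double, not float. */
static double dsdot_sketch(int n, const float *x, int incx,
                           const float *y, int incy)
{
    double sum = 0.0;
    for (int i = 0; i < n; ++i)
        sum += (double)x[i * incx] * (double)y[i * incy];
    return sum;
}
```

Take x = {1e8f, 1.0f, -1e8f} with y all ones. This sketch returns 1.0. A float accumulator would typically round 1e8f + 1 back to 1e8f and end with 0.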
Martin Kroeker
265142edd5
Fix typo in the zarch min/max kernels
from patch provided by aarnez in #991
2019-01-31 21:21:40 +01:00
Martin Kroeker
885a3c4350
USE_TRMM on Z14
from patch provided by aarnez in #991
2019-01-31 21:18:09 +01:00
Martin Kroeker
4b512f84dd
Add cache sizes for Z14
from patch provided by aarnez in #991
2019-01-31 21:16:44 +01:00
Martin Kroeker
72d3e7c9b4
Add FORCE Z14
from patch provided by aarnez in #991
2019-01-31 21:15:50 +01:00
Martin Kroeker
bdc73a49e0
Add parameters for Z14
from patch provided by aarnez in #991
2019-01-31 21:14:37 +01:00
Martin Kroeker
1249ee1fd0
Add Z14 target
from patch provided by aarnez in #991
2019-01-31 21:13:46 +01:00
Martin Kroeker
42df9efa0c
Merge pull request #1991 from maamountki/z14
[ZARCH] Z14 Support, BLAS 1/2 single precision implementations
2019-01-31 19:10:03 +01:00
maamountki
82124729af
Merge branch 'develop' into z14
2019-01-31 19:36:41 +02:00
maamountki
29416cb5a3
[ZARCH] Add Z13 version for max/min functions
2019-01-31 19:11:11 +02:00
maamountki
48b9b94f7f
[ZARCH] Improve loading performance for camax/icamax
2019-01-31 18:52:11 +02:00
Martin Kroeker
86a824c97f
Fix wrong comparison that made IMIN identical to IMAX
as reported by aarnez in #1990
2019-01-31 15:27:21 +01:00
Martin Kroeker
808410c2c7
Fix wrong comparison that made IMIN identical to IMAX
as suggested in #1990
2019-01-31 15:25:15 +01:00
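Here is a hedged illustration of the class of bug named in the two commits above. The actual kernel change is not reproduced here. An index-of-minimum routine differs from its index-of-maximum counterpart only in the direction of the comparison. If that comparison is left unflipped, IMIN returns the same index as IMAX.

`imin_sketch` is a hypothetical portable version that uses the 1-based BLAS index convention. As the name suggests, it assumes that IMIN compares raw values, whereas IAMIN compares absolute values.

```c
#include <stddef.h>

/* Hypothetical: 1-based index of the smallest element, 0 when n == 0.
   The strict '<' keeps the first occurrence on ties. An IMAX routine
   would use '>' here; leaving '>' in place is what turns IMIN into IMAX. */
static size_t imin_sketch(const double *x, size_t n)
{
    if (n == 0) return 0;
    size_t best_i = 0;
    double best = x[0];
    for (size_t i = 1; i < n; ++i) {
        if (x[i] < best) {   /* '<' for IMIN */
            best = x[i];
            best_i = i;
        }
    }
    return best_i + 1;
}
```

For x = {3, -1, 4, -1, 5}, this returns 2. With the comparison flipped to '>', it would return 5, the IMAX result.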
maamountki
eaf20f0e7a
Remove ztest
2019-01-31 09:26:50 +02:00
maamountki
fcd814a8d2
[ZARCH] Fix bug in max/min functions
2019-01-29 17:59:38 +02:00
maamountki
dc4d3bccd5
[ZARCH] Fix icamax/icamin
2019-01-29 03:47:49 +02:00
maamountki
c7143c1019
[ZARCH] Fix iamax/imax single precision
2019-01-28 17:52:23 +02:00
maamountki
04873bb174
[ZARCH] Undo the last commit
2019-01-28 17:32:24 +02:00
maamountki
c8ef9fb220
[ZARCH] Fix bug in iamax/iamin/imax/imin
2019-01-28 17:16:18 +02:00
Martin Kroeker
5be61f4b47
Merge pull request #1985 from martin-frbg/issue1984
Correct naming of getrf_parallel object
2019-01-28 15:44:57 +01:00
Martin Kroeker
3d155cff83
Merge pull request #1981 from edisongustavo/develop
Fix include directory of exported targets
2019-01-28 15:44:42 +01:00
Martin Kroeker
7d47f0a82d
Merge pull request #1978 from danielgindi/feature/msvc_cmake
Better support for MSVC/Windows in CMake (v0.3.x)
2019-01-28 15:43:35 +01:00
Martin Kroeker
a529c71a74
Merge pull request #1962 from brada4/r
Modernize R benchmarks slightly
2019-01-28 15:42:57 +01:00
Martin Kroeker
89b60dab8a
Merge pull request #1987 from martin-frbg/issue1961
Change ARMV8 target with BINARY=32 to ARMV7 automatically
2019-01-26 22:25:29 +01:00
Martin Kroeker
58dd7e4501
Change ARMV8 target to ARMV7 for BINARY=32
2019-01-26 17:52:33 +01:00
Martin Kroeker
36b844af88
Change ARMV8 target to ARMV7 when BINARY32 is set
fixes #1961
2019-01-26 17:47:22 +01:00
Martin Kroeker
e882b239aa
Correct naming of getrf_parallel object
fixes #1984
2019-01-26 00:45:45 +01:00
Martin Kroeker
3f7bb87a2a
Merge pull request #1971 from martin-frbg/trsm-threshold
Shift transition to multithreading towards larger matrix sizes
2019-01-24 09:17:48 +01:00
Edison Gustavo Muenz
e908ac2a51
Fix include directory of exported targets
2019-01-23 15:09:13 +01:00
Martin Kroeker
8533aca964
Avoid penalizing tall skinny matrices
2019-01-23 10:03:00 +01:00
Martin Kroeker
16494cb7c4
Merge pull request #1980 from martin-frbg/issue1979
Report SkylakeX as Haswell if compiler does not support AVX512
2019-01-22 21:10:38 +01:00
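The fallback described in this merge can be sketched as a small target-selection function. This is a hypothetical illustration, not OpenBLAS's actual logic, which lives in its build scripts. `effective_target` and the `compiler_has_avx512` flag are assumed names. How the flag is determined is left out; one option is test-compiling an AVX512 instruction.

```c
#include <string.h>

/* Hypothetical: if the CPU is detected as SkylakeX but the compiler cannot
   emit AVX512 code, build for the Haswell (AVX2) target instead. */
static const char *effective_target(const char *detected, int compiler_has_avx512)
{
    if (strcmp(detected, "SKYLAKEX") == 0 && !compiler_has_avx512)
        return "HASWELL";
    return detected;
}
```

Downgrading only the target name keeps the rest of the detection untouched. A SkylakeX machine still gets the best kernels the compiler can actually build.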