7452 Commits

Author SHA1 Message Date
Ubuntu 498ac98581 Note for unused kernels 2019-02-04 15:41:56 +00:00
Ubuntu cd9ea45463 NBMAX=4096 for gemvn, added sgemvn 8x8 for future 2019-02-04 06:57:11 +00:00
Martin Kroeker f9c5023e04
Merge pull request #1994 from quickwritereader/develop
sgemv cgemv pairs
2019-02-01 21:04:47 +01:00
Ubuntu 4abc375a91 sgemv cgemv pairs 2019-02-01 13:45:00 +00:00
Martin Kroeker 874df65491
Fix incorrect sgemv results for IBM z14
part of PR #1993 that was inadvertently misplaced into the toplevel directory
2019-02-01 12:58:59 +01:00
Martin Kroeker 1f4b61f572
Delete misplaced file sgemv_t_4.c
from #1993, file should have gone into kernel/zarch
2019-02-01 12:57:01 +01:00
Martin Kroeker 282230c303
Merge pull request #1993 from martin-frbg/aarnes-zarch
Various fixes for the new Z14 target
2019-01-31 21:27:00 +01:00
Martin Kroeker cce574c3e0
Improve the z14 SGEMVT kernel
from patch provided by aarnez in #991
2019-01-31 21:24:55 +01:00
Martin Kroeker 877023e1e1
Fix precision of zarch DSDOT
from patch provided by aarnez in #991
2019-01-31 21:22:26 +01:00
Martin Kroeker 265142edd5
Fix typo in the zarch min/max kernels
from patch provided by aarnez in #991
2019-01-31 21:21:40 +01:00
Martin Kroeker 885a3c4350
USE_TRMM on Z14
from patch provided by aarnez in #991
2019-01-31 21:18:09 +01:00
Martin Kroeker 4b512f84dd
Add cache sizes for Z14
from patch provided by aarnez in #991
2019-01-31 21:16:44 +01:00
Martin Kroeker 72d3e7c9b4
Add FORCE Z14
from patch provided by aarnez in #991
2019-01-31 21:15:50 +01:00
Martin Kroeker bdc73a49e0
Add parameters for Z14
from patch provided by aarnez in #991
2019-01-31 21:14:37 +01:00
Martin Kroeker 1249ee1fd0
Add Z14 target
from patch provided by aarnez in #991
2019-01-31 21:13:46 +01:00
Martin Kroeker 42df9efa0c
Merge pull request #1991 from maamountki/z14
[ZARCH] Z14 Support, BLAS 1/2 single precision implementations
2019-01-31 19:10:03 +01:00
maamountki 82124729af Merge branch 'develop' into z14 2019-01-31 19:36:41 +02:00
maamountki 29416cb5a3 [ZARCH] Add Z13 version for max/min functions 2019-01-31 19:11:11 +02:00
maamountki 48b9b94f7f [ZARCH] Improve loading performance for camax/icamax 2019-01-31 18:52:11 +02:00
Martin Kroeker 86a824c97f
Fix wrong comparison that made IMIN identical to IMAX
as reported by aarnez in #1990
2019-01-31 15:27:21 +01:00
Martin Kroeker 808410c2c7
Fix wrong comparison that made IMIN identical to IMAX
as suggested in #1990
2019-01-31 15:25:15 +01:00
maamountki eaf20f0e7a Remove ztest 2019-01-31 09:26:50 +02:00
maamountki fcd814a8d2 [ZARCH] Fix bug in max/min functions 2019-01-29 17:59:38 +02:00
maamountki dc4d3bccd5 [ZARCH] Fix icamax/icamin 2019-01-29 03:47:49 +02:00
maamountki c7143c1019 [ZARCH] Fix iamax/imax single precision 2019-01-28 17:52:23 +02:00
maamountki 04873bb174 [ZARCH] Undo the last commit 2019-01-28 17:32:24 +02:00
maamountki c8ef9fb220 [ZARCH] Fix bug in iamax/iamin/imax/imin 2019-01-28 17:16:18 +02:00
Martin Kroeker 5be61f4b47
Merge pull request #1985 from martin-frbg/issue1984
Correct naming of getrf_parallel object
2019-01-28 15:44:57 +01:00
Martin Kroeker 3d155cff83
Merge pull request #1981 from edisongustavo/develop
Fix include directory of exported targets
2019-01-28 15:44:42 +01:00
Martin Kroeker 7d47f0a82d
Merge pull request #1978 from danielgindi/feature/msvc_cmake
Better support for MSVC/Windows in CMake (v0.3.x)
2019-01-28 15:43:35 +01:00
Martin Kroeker a529c71a74
Merge pull request #1962 from brada4/r
Modernize R benchmarks slightly
2019-01-28 15:42:57 +01:00
TiborGY ea1716ce2a
Update Makefile.rule
Revert generate to install, explain the nature of the affinity conflict
2019-01-27 17:22:26 +01:00
TiborGY 0f24b39ebf
Reword/expand comments in Makefile.rule
Lots of small changes in the wording of the comments, plus an expansion of the NUM_THREADS and NO_AFFINITY sections.
2019-01-27 15:33:00 +01:00
Martin Kroeker 89b60dab8a
Merge pull request #1987 from martin-frbg/issue1961
Change ARMV8 target with BINARY=32 to ARMV7 automatically
2019-01-26 22:25:29 +01:00
Martin Kroeker 58dd7e4501 Change ARMV8 target to ARMV7 for BINARY=32 2019-01-26 17:52:33 +01:00
Martin Kroeker 36b844af88
Change ARMV8 target to ARMV7 when BINARY32 is set
fixes #1961
2019-01-26 17:47:22 +01:00
Martin Kroeker e882b239aa
Correct naming of getrf_parallel object
fixes #1984
2019-01-26 00:45:45 +01:00
Martin Kroeker 3f7bb87a2a
Merge pull request #1971 from martin-frbg/trsm-threshold
Shift transition to multithreading towards larger matrix sizes
2019-01-24 09:17:48 +01:00
Edison Gustavo Muenz e908ac2a51 Fix include directory of exported targets 2019-01-23 15:09:13 +01:00
Martin Kroeker 8533aca964 Avoid penalizing tall skinny matrices 2019-01-23 10:03:00 +01:00
Martin Kroeker 16494cb7c4
Merge pull request #1980 from martin-frbg/issue1979
Report SkylakeX as Haswell if compiler does not support AVX512
2019-01-22 21:10:38 +01:00
Martin Kroeker b56b34a75c Syntax fix 2019-01-22 18:55:43 +01:00
Martin Kroeker 21eda8b577
Report SkylakeX as Haswell if compiler does not support AVX512
... or make was invoked with NO_AVX512=1
2019-01-22 18:47:12 +01:00
Daniel Cohen Gindi 24288803b3 Adjust test script for correct deployment 2019-01-22 14:38:01 +02:00
Martin Kroeker f0d834b824 Use VERSION_LESS for comparisons involving software version numbers 2019-01-22 12:32:24 +01:00
Daniel Cohen Gindi 63bbd7b0d7 Better support for MSVC/Windows in CMake 2019-01-21 17:47:47 +02:00
maamountki b111829226 [ZARCH] Update max/min functions 2019-01-21 15:56:04 +02:00
Martin Kroeker 010d59bfee
Merge pull request #1973 from martin-frbg/issue1464
Increase Zen SWITCH_RATIO to 16
2019-01-20 20:30:11 +01:00
Martin Kroeker 83b5c6b92d
Fix compilation with NO_AVX=1 set
fixes #1974
2019-01-20 12:18:53 +01:00
Martin Kroeker bbfdd6c0fe
Increase Zen SWITCH_RATIO to 16
following GEMM benchmarks on Ryzen2700X. For #1464
2019-01-19 23:01:31 +01:00