Commit Graph

1113 Commits

Author SHA1 Message Date
Martin Kroeker
f9d67bb5e8 Fix out-of-bounds memory access in gemm_beta
Fixes #2011 (as suggested by davemq) presuming typo by K.Goto
2019-02-13 22:06:41 +01:00
Martin Kroeker
729e925174 Merge pull request #1996 from quickwritereader/develop
NBMAX=4096 for gemvn, added sgemvn 8x8 for future
2019-02-04 16:52:04 +01:00
Ubuntu
498ac98581 Note for unused kernels 2019-02-04 15:41:56 +00:00
Ubuntu
cd9ea45463 NBMAX=4096 for gemvn, added sgemvn 8x8 for future 2019-02-04 06:57:11 +00:00
Martin Kroeker
f9c5023e04 Merge pull request #1994 from quickwritereader/develop
sgemv cgemv pairs
2019-02-01 21:04:47 +01:00
Ubuntu
4abc375a91 sgemv cgemv pairs 2019-02-01 13:45:00 +00:00
Martin Kroeker
874df65491 Fix incorrect sgemv results for IBM z14
part of PR #1993 that was inadvertently misplaced into the top-level directory
2019-02-01 12:58:59 +01:00
Martin Kroeker
877023e1e1 Fix precision of zarch DSDOT
from patch provided by aarnez in #991
2019-01-31 21:22:26 +01:00
Martin Kroeker
265142edd5 Fix typo in the zarch min/max kernels
from patch provided by aarnez in #991
2019-01-31 21:21:40 +01:00
Martin Kroeker
885a3c4350 USE_TRMM on Z14
from patch provided by aarnez in #991
2019-01-31 21:18:09 +01:00
maamountki
82124729af Merge branch 'develop' into z14 2019-01-31 19:36:41 +02:00
maamountki
29416cb5a3 [ZARCH] Add Z13 version for max/min functions 2019-01-31 19:11:11 +02:00
maamountki
48b9b94f7f [ZARCH] Improve loading performance for camax/icamax 2019-01-31 18:52:11 +02:00
Martin Kroeker
86a824c97f Fix wrong comparison that made IMIN identical to IMAX
as reported by aarnez in #1990
2019-01-31 15:27:21 +01:00
Martin Kroeker
808410c2c7 Fix wrong comparison that made IMIN identical to IMAX
as suggested in #1990
2019-01-31 15:25:15 +01:00
maamountki
fcd814a8d2 [ZARCH] Fix bug in max/min functions 2019-01-29 17:59:38 +02:00
maamountki
dc4d3bccd5 [ZARCH] Fix icamax/icamin 2019-01-29 03:47:49 +02:00
maamountki
c7143c1019 [ZARCH] Fix iamax/imax single precision 2019-01-28 17:52:23 +02:00
maamountki
04873bb174 [ZARCH] Undo the last commit 2019-01-28 17:32:24 +02:00
maamountki
c8ef9fb220 [ZARCH] Fix bug in iamax/iamin/imax/imin 2019-01-28 17:16:18 +02:00
maamountki
b111829226 [ZARCH] Update max/min functions 2019-01-21 15:56:04 +02:00
Martin Kroeker
32b0f1168e Fix declaration of input arguments in the Sandybridge GER microkernels (#1967)
* Tag arguments 0 and 1 as both input and output
2019-01-18 08:11:39 +01:00
Martin Kroeker
b495e54310 Fix declaration of input arguments in the x86_64 SCAL microkernels (#1966)
* Tag arguments 0 and 1 as both input and output (see #1964)
2019-01-18 08:11:07 +01:00
Martin Kroeker
d5e6940253 Fix declaration of input arguments in the x86_64 microkernels for DOT and AXPY (#1965)
* Tag operands 0 and 1 as both input and output

For #1964 (basically a continuation of coding problems first seen in #1292)
2019-01-17 23:20:32 +01:00
Ubuntu
43a4572038 crot fix 2019-01-17 14:45:31 +00:00
Abdelrauf
a034e65512 Merge branch 'develop' into develop 2019-01-16 19:25:13 +04:00
Ubuntu
8c3386be87 Added missing BLAS1 single-precision kernels {saxpy, caxpy, cdot, crot (refactored version of srot), isamax, isamin, icamax, icamin},
Fixed idamin, icamin to choose the index of the first occurrence among equal minima
2019-01-16 15:16:21 +00:00
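The first-occurrence rule mentioned in the commit above can be shown with a scalar reference loop. This is a minimal sketch, not the OpenBLAS vector kernel: the function name `ref_idamin` is hypothetical, and it assumes the usual BLAS conventions of 1-based results and a return value of 0 for empty input. The strict `<` comparison is what keeps the earliest index when several elements share the minimum absolute value.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical reference sketch (not the OpenBLAS kernel).
   Returns the 1-based index of the first element of x (stride incx)
   whose absolute value is smallest; returns 0 when n or incx is 0. */
static size_t ref_idamin(size_t n, const double *x, size_t incx)
{
    if (n == 0 || incx == 0) return 0;
    size_t best = 0;
    double best_val = fabs(x[0]);
    for (size_t i = 1; i < n; i++) {
        double v = fabs(x[i * incx]);
        /* Strict '<': a later element equal to the current minimum
           does not replace it, so ties resolve to the first index. */
        if (v < best_val) {
            best_val = v;
            best = i;
        }
    }
    return best + 1;
}
```

Using `<=` instead would silently return the last of several equal minima, which is the kind of discrepancy the commit describes.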
maamountki
b815a04c87 [ZARCH] fix a bug in max/min functions 2019-01-15 21:04:22 +02:00
maamountki
1a7925b3a3 [ZARCH] Update dgemv_n_4.c 2019-01-11 17:43:11 +02:00
maamountki
406f835f00 [ZARCH] update cgemv_n_4.c 2019-01-11 17:39:17 +02:00
maamountki
621dedb37b [ZARCH] Update cgemv_t_4.c 2019-01-11 17:37:11 +02:00
maamountki
b731e8246f Update sgemv_t_4.c 2019-01-11 17:14:04 +02:00
maamountki
ecc31b743f Update dgemv_t_4.c 2019-01-11 17:13:02 +02:00
maamountki
5d89d6b143 [ZARCH] fix sgemv_n_4.c 2019-01-11 17:08:24 +02:00
maamountki
67432b23c2 [ZARCH] fix cgemv_n_4.c 2019-01-11 16:44:46 +02:00
maamountki
be66f5d5c2 [ZARCH] fix data prefetch type in sdot 2019-01-09 16:50:07 +02:00
maamountki
c2ffef8156 [ZARCH] fix data prefetch type in ddot 2019-01-09 16:49:44 +02:00
maamountki
e7455f500c [ZARCH] fix dsdot.c 2019-01-09 16:33:54 +02:00
maamountki
3eafcfa650 [ZARCH] fix cgemv_n_4.c 2019-01-09 07:43:45 +02:00
maamountki
94cd946b96 [ZARCH] fix cgemv_n_4.c 2019-01-04 17:45:56 +02:00
maamountki
1aa840a0a2 [ZARCH] fix sgemv_t_4.c 2019-01-04 01:38:18 +02:00
Arjan van de Ven
795285c587 Fix thinko in skylake beta handling
casting ints is cheaper, but it has a rounding effect, not a memory-casting effect, resulting in
an invalid outcome
2018-12-24 18:49:50 +00:00
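The distinction drawn in the commit above, between a value cast and a memory (bit-pattern) cast, can be illustrated generically. This is a sketch under that reading of the message, not the skylakex beta code itself; the helper names `convert_float` and `float_bits` are hypothetical. A C cast converts the value (truncating toward zero), while `memcpy` into an integer of the same width keeps the original 32 bits unchanged.

```c
#include <stdint.h>
#include <string.h>

/* Value conversion: the float is rounded toward zero, so the
   result is a different number with a different bit pattern. */
static int32_t convert_float(float f)
{
    return (int32_t)f;
}

/* Bit reinterpretation ("memory cast"): the same 32 bits are
   copied into an integer without any numeric conversion. */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

For example, `convert_float(0.5f)` yields 0, so a test that relies on the converted value cannot tell 0.5 apart from 0.0, whereas `float_bits` distinguishes them.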
Arjan van de Ven
d321448a63 dgemm: use dgemm_ncopy_8_skylakex.c also for Haswell
The dgemm_ncopy_8_skylakex.c code is not avx512 specific and gives
a nice performance boost for medium-sized matrices
2018-12-16 23:09:22 +00:00
Arjan van de Ven
c43331ad0a dgemm: Use the skylakex beta function also for haswell
it's more efficient for certain tall/skinny matrices
2018-12-16 23:09:17 +00:00
Martin Kroeker
c4e23dd016 Update Makefile 2018-12-16 18:14:40 +01:00
Martin Kroeker
cfc4acc221 typo 2018-12-16 16:19:51 +01:00
Martin Kroeker
545c2b1bbb Add -mavx2 on Haswell only if the compiler supports it 2018-12-16 13:09:19 +01:00
Arjan van de Ven
69d206440a Make the skylakex/haswell sgemm code compile and run even with compilers without avx2 support 2018-12-16 00:19:41 +00:00
Martin Kroeker
3843e3e017 use -mavx2 on Haswell 2018-12-15 23:30:31 +01:00
Martin Kroeker
fbcb14a74b should be core-avx2 2018-12-15 20:18:59 +01:00