Martin Kroeker
|
246ca29679
|
Add ZARCH implementation of ?sum
as trivial copies of the respective ?asum kernels with the ABS and vflpsb calls removed
|
2019-03-30 22:49:05 +01:00 |
Martin Kroeker
|
9d717cb5ee
|
Add x86_64 implementation of ?sum
as trivial copy of ?asum with the fabs calls removed
|
2019-03-30 22:27:04 +01:00 |
Martin Kroeker
|
e3bc83f2a8
|
Add x86 implementation of ?sum
as trivial copy of ?asum with the fabs calls removed
|
2019-03-30 22:26:10 +01:00 |
Martin Kroeker
|
70f2a4e0d7
|
Add SPARC implementation of ?sum
as trivial copy of ?asum with the fabs replaced by fmov to preserve code structure
|
2019-03-30 22:25:06 +01:00 |
Martin Kroeker
|
706dfe263b
|
Add POWER implementation of ?sum
as trivial copy of ?asum with the fabs replaced by fmr to preserve code structure
|
2019-03-30 22:23:42 +01:00 |
Martin Kroeker
|
688fa9201c
|
Add MIPS64 implementation of ?sum
as trivial copy of ?asum with the fabs replaced by mov to preserve code structure
|
2019-03-30 22:22:15 +01:00 |
Martin Kroeker
|
cdbe0f0235
|
Add MIPS implementation of ?sum
as trivial copy of ?asum with the fabs calls removed
|
2019-03-30 22:20:14 +01:00 |
Martin Kroeker
|
f8b82bc6dc
|
Add ia64 implementation of ?sum
as trivial copy of ?asum with the fabs calls removed
|
2019-03-30 22:18:03 +01:00 |
Martin Kroeker
|
3e3ccb9011
|
Add ARM64 implementations of ?sum
as trivial copies of the respective ?asum kernels with the fabs calls removed
|
2019-03-30 22:13:36 +01:00 |
Martin Kroeker
|
94ab4e6fb2
|
Add ARM implementations of ?sum
(trivial copies of the respective ?asum with the fabs calls removed)
|
2019-03-30 22:11:38 +01:00 |
Martin Kroeker
|
c3cfc6986b
|
Add implementations of ssum/dsum and csum/zsum
as trivial copies of asum/zsasum with the fabs calls replaced by fmov to preserve code structure
|
2019-03-30 22:05:11 +01:00 |
Martin Kroeker
|
b9f4943a14
|
Add ?sum
|
2019-03-30 22:01:13 +01:00 |
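The ?sum commits above all describe the same transformation: take an ?asum kernel and drop the absolute-value step. A minimal portable C sketch of that idea follows. The names `ref_dasum` and `ref_dsum` are hypothetical and are not the OpenBLAS kernel entry points; the real kernels are architecture-specific assembly or intrinsics.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical reference version of dasum: sum of |x[i]| over n elements
 * taken with stride incx. */
double ref_dasum(size_t n, const double *x, size_t incx)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += fabs(x[i * incx]);
    return acc;
}

/* The same loop with the fabs call removed: a plain sum of the elements. */
double ref_dsum(size_t n, const double *x, size_t incx)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += x[i * incx];
    return acc;
}
```

Where a commit says fabs was "replaced by fmov" (or fmr, mov), the absolute-value instruction was swapped for a plain register move so the surrounding instruction schedule stays unchanged.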
Martin Kroeker
|
32c7063cb0
|
Merge pull request #2061 from martin-frbg/martin-frbg-patch-1
Disable the AVX512 DGEMM kernel (again)
|
2019-03-30 21:21:38 +01:00 |
Martin Kroeker
|
7c51cc8527
|
Merge branch 'develop' into develop
|
2019-03-29 19:36:29 +01:00 |
AbdelRauf
|
853a18bc17
|
POWER9 makefile. DGEMM based on the POWER8 kernel with the following changes: 32x unrolled 16x4 kernel and 8x4 kernel using lxv/stxv and a butterfly rank-1 update. Improvement from 17 to 22-23 GFLOPS. DTRMM cases were added into DGEMM itself
|
2019-03-29 15:49:40 +00:00 |
Martin Kroeker
|
e608d4f7fe
|
Disable the AVX512 DGEMM kernel (again)
Due to as yet unresolved errors seen in #1955 and #2029
|
2019-03-13 22:10:28 +01:00 |
Martin Kroeker
|
03d7110900
|
Merge pull request #2042 from maomao194313/develop
add TARGET support for HiSilicon tsv110 CPUs
|
2019-03-12 22:57:39 +01:00 |
Martin Kroeker
|
f18ab6c17b
|
Merge pull request #2051 from martin-frbg/issue2048
Make TARGET=GENERIC compatible with DYNAMIC_ARCH=1
|
2019-03-09 16:39:35 +01:00 |
Martin Kroeker
|
5b95534afc
|
Make TARGET=GENERIC compatible with DYNAMIC_ARCH=1
for issue #2048
|
2019-03-09 11:21:16 +01:00 |
Celelibi
|
b7f59da42d
|
Fix crash in sgemm SSE/nano kernel on x86_64
Fix bug #2047.
Signed-off-by: Celelibi <celelibi@gmail.com>
|
2019-03-07 16:55:13 +01:00 |
maomao194313
|
783ba8058f
|
HiSilicon tsv110 CPUs optimization branch
add HiSilicon tsv110 CPUs optimization branch
|
2019-03-04 16:30:50 +08:00 |
Andrew
|
6eee1beac5
|
move fix to right place
|
2019-02-24 20:41:02 +02:00 |
Martin Kroeker
|
e12cdf58ef
|
Merge pull request #2024 from martin-frbg/gcc9fixes4
Fix inline assembly constraints in Bulldozer TRSM kernels
|
2019-02-17 11:49:15 +01:00 |
Martin Kroeker
|
1860c9456d
|
Merge pull request #2023 from martin-frbg/gcc9fixes3
Fix inline assembly constraints in various x86_64 GEMVN kernels
|
2019-02-17 11:48:57 +01:00 |
Martin Kroeker
|
f9bb76d29a
|
Fix inline assembly constraints in Bulldozer TRSM kernels
Rework indices to allow marking i, as and bs as both input and output (operand n1 is marked as well for simplicity). For #2009
|
2019-02-16 20:06:48 +01:00 |
Martin Kroeker
|
efb9038f72
|
Fix inline assembly constraints
|
2019-02-16 18:46:17 +01:00 |
Martin Kroeker
|
e976557d29
|
Fix inline assembly constraints
rework indices to allow marking argument lda as input and output.
|
2019-02-16 18:36:39 +01:00 |
Martin Kroeker
|
9d8be15789
|
Fix inline assembly constraints
rework indices to allow marking argument lda4 as input and output. For #2009
|
2019-02-16 18:24:11 +01:00 |
Martin Kroeker
|
d752799a0f
|
Merge pull request #2021 from martin-frbg/gcc9fixes2
Fix wrong constraints in inline assembly of Haswell DTRSM kernel
|
2019-02-16 18:05:40 +01:00 |
Martin Kroeker
|
c26c0b77a7
|
Fix wrong constraints in inline assembly
for #2009
|
2019-02-15 15:08:16 +01:00 |
Martin Kroeker
|
1c6da2d03c
|
Merge pull request #2019 from martin-frbg/gcc9fixes
Fix unannounced modification of input operand 8 (lda4) in Haswell GEMVN microkernel
|
2019-02-15 15:02:54 +01:00 |
Martin Kroeker
|
4255a58cd2
|
Rename operands to put lda on the input/output constraint list
|
2019-02-15 10:10:04 +01:00 |
Martin Kroeker
|
46e415b140
|
Save and restore input argument 8 (lda4)
Fixes miscompilation with gcc9 -ftree-vectorize (related to issue #2009)
|
2019-02-14 22:43:18 +01:00 |
Bart Oldeman
|
69a97ca7b9
|
dgemv_kernel_4x4(Haswell): add missing clobbers for xmm0,xmm1,xmm2,xmm3
This fixes a crash in dblat2 when OpenBLAS is compiled using
-march=znver1 -ftree-vectorize -O2
See also:
https://github.com/easybuilders/easybuild-easyconfigs/issues/7180
|
2019-02-14 16:27:58 +00:00 |
Martin Kroeker
|
056917d616
|
Merge pull request #2013 from martin-frbg/issue2011
Fix invalid memory access in PPC gemm_beta
|
2019-02-14 09:29:34 +01:00 |
Martin Kroeker
|
718efcec6f
|
Fix out-of-bounds memory access in gemm_beta
Fixes #2011 (as suggested by davemq), assuming typo by K.Goto
|
2019-02-13 22:08:37 +01:00 |
Martin Kroeker
|
f9d67bb5e8
|
Fix out-of-bounds memory access in gemm_beta
Fixes #2011 (as suggested by davemq) presuming typo by K.Goto
|
2019-02-13 22:06:41 +01:00 |
Martin Kroeker
|
76bb74fcd4
|
Merge pull request #2012 from maamountki/z14
[ZARCH] Many improvements
|
2019-02-13 20:15:56 +01:00 |
maamountki
|
0a54c98b9d
|
[ZARCH] Modify constraints
|
2019-02-13 21:06:25 +02:00 |
maamountki
|
bec54ae366
|
[ZARCH] Fix caxpy
|
2019-02-13 12:54:35 +02:00 |
Martin Kroeker
|
ab1630f9fa
|
Fix declaration of arguments in inline assembly
Argument 0 is modified so should be input and output
|
2019-02-12 16:14:02 +01:00 |
Martin Kroeker
|
b824fa70eb
|
Fix declaration of assembly arguments in SSYMV and DSYMV microkernels
Arguments 0 and 1 are both input and output
|
2019-02-12 16:00:18 +01:00 |
Martin Kroeker
|
91481a3e4e
|
Fix declaration of input arguments in inline assembly
Argument 0 is modified as it doubles as a counter
|
2019-02-12 15:51:43 +01:00 |
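The constraint fixes in this group share one pattern: an operand that the assembly modifies (a pointer that is advanced, a length that doubles as a counter) was declared input-only, so the compiler could assume its register still held the original value. The sketch below illustrates the corrected form with GCC extended asm, assuming an x86_64 target with a GCC-compatible compiler. The function `sum_i64` is hypothetical and not taken from any OpenBLAS kernel; a plain C loop is used on other targets.

```c
#include <stdint.h>

/* Hypothetical example: sum n 64-bit integers starting at p. */
int64_t sum_i64(const int64_t *p, int64_t n)
{
    int64_t acc = 0;
    if (n <= 0)
        return 0;
#if defined(__GNUC__) && defined(__x86_64__)
    __asm__ volatile(
        "1:\n\t"
        "addq (%[p]), %[acc]\n\t"   /* acc += *p                        */
        "addq $8, %[p]\n\t"         /* advance the pointer (modifies p) */
        "decq %[n]\n\t"             /* n doubles as the loop counter    */
        "jnz 1b\n\t"
        /* All three operands are read AND written by the asm, so each is
         * declared "+r" (input and output) rather than as a plain input. */
        : [acc] "+r"(acc), [p] "+r"(p), [n] "+r"(n)
        :
        : "cc", "memory");
#else
    for (int64_t i = 0; i < n; i++)
        acc += p[i];
#endif
    return acc;
}
```

Declaring `p` or `n` as input-only would compile, but it would leave the compiler free to reuse those registers afterwards as if they were unchanged, which is the kind of miscompilation these commits report with gcc9 and -ftree-vectorize.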
Martin Kroeker
|
dc6ac9eab0
|
Fix declaration of input arguments in the x86_64 s/dGEMV_T and s/dGEMV_N kernels
Arguments 0 and 1 need to be tagged as both input and output
|
2019-02-12 15:33:48 +01:00 |
maamountki
|
f583674109
|
[ZARCH] Fix cgemv_t_4
|
2019-02-12 13:12:28 +02:00 |
maamountki
|
77fe70019f
|
[ZARCH] Fix constraints and source code formatting
|
2019-02-11 16:01:13 +02:00 |
maamountki
|
7039770165
|
[ZARCH] Undo the last commit
|
2019-02-06 20:11:44 +02:00 |
maamountki
|
11a43e8116
|
[ZARCH] Set alignment hint for vl/vst
|
2019-02-05 19:17:08 +02:00 |
maamountki
|
61526480f9
|
[ZARCH] Fix copy constraint
|
2019-02-05 07:51:19 +02:00 |
maamountki
|
81daf6bc38
|
[ZARCH] Format source code, Fix constraints
|
2019-02-05 07:30:38 +02:00 |
Martin Kroeker
|
729e925174
|
Merge pull request #1996 from quickwritereader/develop
NBMAX=4096 for gemvn, added sgemvn 8x8 for future
|
2019-02-04 16:52:04 +01:00 |
Ubuntu
|
498ac98581
|
Note for unused kernels
|
2019-02-04 15:41:56 +00:00 |
Ubuntu
|
cd9ea45463
|
NBMAX=4096 for gemvn, added sgemvn 8x8 for future
|
2019-02-04 06:57:11 +00:00 |
Martin Kroeker
|
f9c5023e04
|
Merge pull request #1994 from quickwritereader/develop
sgemv cgemv pairs
|
2019-02-01 21:04:47 +01:00 |
Ubuntu
|
4abc375a91
|
sgemv cgemv pairs
|
2019-02-01 13:45:00 +00:00 |
Martin Kroeker
|
874df65491
|
Fix incorrect sgemv results for IBM z14
part of PR #1993 that was inadvertently misplaced into the toplevel directory
|
2019-02-01 12:58:59 +01:00 |
Martin Kroeker
|
877023e1e1
|
Fix precision of zarch DSDOT
from patch provided by aarnez in #991
|
2019-01-31 21:22:26 +01:00 |
Martin Kroeker
|
265142edd5
|
Fix typo in the zarch min/max kernels
from patch provided by aarnez in #991
|
2019-01-31 21:21:40 +01:00 |
Martin Kroeker
|
885a3c4350
|
USE_TRMM on Z14
from patch provided by aarnez in #991
|
2019-01-31 21:18:09 +01:00 |
maamountki
|
82124729af
|
Merge branch 'develop' into z14
|
2019-01-31 19:36:41 +02:00 |
maamountki
|
29416cb5a3
|
[ZARCH] Add Z13 version for max/min functions
|
2019-01-31 19:11:11 +02:00 |
maamountki
|
48b9b94f7f
|
[ZARCH] Improve loading performance for camax/icamax
|
2019-01-31 18:52:11 +02:00 |
Martin Kroeker
|
86a824c97f
|
Fix wrong comparison that made IMIN identical to IMAX
as reported by aarnez in #1990
|
2019-01-31 15:27:21 +01:00 |
Martin Kroeker
|
808410c2c7
|
Fix wrong comparison that made IMIN identical to IMAX
as suggested in #1990
|
2019-01-31 15:25:15 +01:00 |
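The IMIN/IMAX fix above comes down to the direction of a single comparison. A minimal C sketch of the distinction, using hypothetical reference functions (`ref_idmax`, `ref_idmin`) rather than the actual kernel code, and assuming the BLAS convention of 1-based result indices:

```c
#include <stddef.h>

/* Returns the 1-based index of the first largest element, or 0 if n == 0. */
size_t ref_idmax(size_t n, const double *x)
{
    if (n == 0)
        return 0;
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (x[i] > x[best])   /* strict '>' keeps the first of equal maxima */
            best = i;
    return best + 1;
}

/* Returns the 1-based index of the first smallest element, or 0 if n == 0.
 * Using '>' here by mistake would make this function identical to ref_idmax. */
size_t ref_idmin(size_t n, const double *x)
{
    if (n == 0)
        return 0;
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (x[i] < x[best])   /* strict '<' keeps the first of equal minima */
            best = i;
    return best + 1;
}
```

The strict comparisons also give the "first occurrence" behaviour for ties that a later entry in this log mentions for idamin/icamin.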
maamountki
|
fcd814a8d2
|
[ZARCH] Fix bug in max/min functions
|
2019-01-29 17:59:38 +02:00 |
maamountki
|
dc4d3bccd5
|
[ZARCH] Fix icamax/icamin
|
2019-01-29 03:47:49 +02:00 |
maamountki
|
c7143c1019
|
[ZARCH] Fix iamax/imax single precision
|
2019-01-28 17:52:23 +02:00 |
maamountki
|
04873bb174
|
[ZARCH] Undo the last commit
|
2019-01-28 17:32:24 +02:00 |
maamountki
|
c8ef9fb220
|
[ZARCH] Fix bug in iamax/iamin/imax/imin
|
2019-01-28 17:16:18 +02:00 |
maamountki
|
b111829226
|
[ZARCH] Update max/min functions
|
2019-01-21 15:56:04 +02:00 |
Martin Kroeker
|
32b0f1168e
|
Fix declaration of input arguments in the Sandybridge GER microkernels (#1967)
* Tag arguments 0 and 1 as both input and output
|
2019-01-18 08:11:39 +01:00 |
Martin Kroeker
|
b495e54310
|
Fix declaration of input arguments in the x86_64 SCAL microkernels (#1966)
* Tag arguments 0 and 1 as both input and output (see #1964)
|
2019-01-18 08:11:07 +01:00 |
Martin Kroeker
|
d5e6940253
|
Fix declaration of input arguments in the x86_64 microkernels for DOT and AXPY (#1965)
* Tag operands 0 and 1 as both input and output
For #1964 (basically a continuation of coding problems first seen in #1292)
|
2019-01-17 23:20:32 +01:00 |
Ubuntu
|
43a4572038
|
crot fix
|
2019-01-17 14:45:31 +00:00 |
Abdelrauf
|
a034e65512
|
Merge branch 'develop' into develop
|
2019-01-16 19:25:13 +04:00 |
Ubuntu
|
8c3386be87
|
Added missing BLAS1 single-precision kernels {saxpy, caxpy, cdot, crot (refactored version of srot), isamax, isamin, icamax, icamin}.
Fixed idamin and icamin to choose the index of the first occurrence among equal minima
|
2019-01-16 15:16:21 +00:00 |
maamountki
|
b815a04c87
|
[ZARCH] fix a bug in max/min functions
|
2019-01-15 21:04:22 +02:00 |
maamountki
|
1a7925b3a3
|
[ZARCH] Update dgemv_n_4.c
|
2019-01-11 17:43:11 +02:00 |
maamountki
|
406f835f00
|
[ZARCH] update cgemv_n_4.c
|
2019-01-11 17:39:17 +02:00 |
maamountki
|
621dedb37b
|
[ZARCH] Update cgemv_t_4.c
|
2019-01-11 17:37:11 +02:00 |
maamountki
|
b731e8246f
|
Update sgemv_t_4.c
|
2019-01-11 17:14:04 +02:00 |
maamountki
|
ecc31b743f
|
Update dgemv_t_4.c
|
2019-01-11 17:13:02 +02:00 |
maamountki
|
5d89d6b143
|
[ZARCH] fix sgemv_n_4.c
|
2019-01-11 17:08:24 +02:00 |
maamountki
|
67432b23c2
|
[ZARCH] fix cgemv_n_4.c
|
2019-01-11 16:44:46 +02:00 |
maamountki
|
be66f5d5c2
|
[ZARCH] fix data prefetch type in sdot
|
2019-01-09 16:50:07 +02:00 |
maamountki
|
c2ffef8156
|
[ZARCH] fix data prefetch type in ddot
|
2019-01-09 16:49:44 +02:00 |
maamountki
|
e7455f500c
|
[ZARCH] fix dsdot.c
|
2019-01-09 16:33:54 +02:00 |
maamountki
|
3eafcfa650
|
[ZARCH] fix cgemv_n_4.c
|
2019-01-09 07:43:45 +02:00 |
maamountki
|
94cd946b96
|
[ZARCH] fix cgemv_n_4.c
|
2019-01-04 17:45:56 +02:00 |
maamountki
|
1aa840a0a2
|
[ZARCH] fix sgemv_t_4.c
|
2019-01-04 01:38:18 +02:00 |
Arjan van de Ven
|
795285c587
|
Fix thinko in skylake beta handling
Casting ints is cheaper, but it has a rounding effect rather than a memory-casting effect,
resulting in an invalid outcome
|
2018-12-24 18:49:50 +00:00 |
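The distinction this commit message draws, between converting a value and reinterpreting its memory, can be shown in a few lines of C. This is a generic illustration assuming IEEE 754 doubles; it is not the code from the Skylake beta kernel, and both function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Value conversion: the cast produces a new integer value and discards the
 * fractional part (truncation toward zero). */
int64_t convert_value(double d)
{
    return (int64_t)d;
}

/* Memory reinterpretation: the same 64 bits viewed as an integer.
 * memcpy is the portable way to do this without aliasing problems. */
uint64_t reinterpret_bits(double d)
{
    uint64_t u;
    memcpy(&u, &d, sizeof u);
    return u;
}
```

For example, `convert_value(1.5)` yields 1, while `reinterpret_bits(1.0)` yields the IEEE 754 bit pattern 0x3FF0000000000000. Using one where the other is intended silently changes the data.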
Arjan van de Ven
|
d321448a63
|
dgemm: use dgemm_ncopy_8_skylakex.c also for Haswell
The dgemm_ncopy_8_skylakex.c code is not avx512 specific and gives
a nice performance boost for medium sized matrices
|
2018-12-16 23:09:22 +00:00 |
Arjan van de Ven
|
c43331ad0a
|
dgemm: Use the skylakex beta function also for haswell
it's more efficient for certain tall/skinny matrices
|
2018-12-16 23:09:17 +00:00 |
Martin Kroeker
|
c4e23dd016
|
Update Makefile
|
2018-12-16 18:14:40 +01:00 |
Martin Kroeker
|
cfc4acc221
|
typo
|
2018-12-16 16:19:51 +01:00 |
Martin Kroeker
|
545c2b1bbb
|
Add -mavx2 on Haswell only if the compiler supports it
|
2018-12-16 13:09:19 +01:00 |
Arjan van de Ven
|
69d206440a
|
Make the skylakex/haswell sgemm code compile and run even with compilers without avx2 support
|
2018-12-16 00:19:41 +00:00 |
Martin Kroeker
|
3843e3e017
|
use -mavx2 on haswell
|
2018-12-15 23:30:31 +01:00 |
Martin Kroeker
|
fbcb14a74b
|
should be core-avx2
|
2018-12-15 20:18:59 +01:00 |
Martin Kroeker
|
2a3190dc76
|
fix elseifeq and use older option core2-avx for compatibility
|
2018-12-15 20:17:44 +01:00 |