Martin Kroeker
f16e39554d
Change PPCG4 CGEMM_M to match kernel change
2020-06-03 09:15:29 +02:00
张丹枫
ea5bdc3f72
split cortex-a53 param to match 8x8 kernel
2020-05-20 22:34:47 +08:00
Marius Hillenbrand
1b0b4349a1
s390x/Z14: Change register blocking for SGEMM to 16x4
...
Change register blocking for SGEMM (and STRMM) on z14 from 8x4 to 16x4
by adjusting SGEMM_DEFAULT_UNROLL_M and choosing the appropriate copy
implementations. Actually make KERNEL.Z14 more flexible, so that the
change in param.h suffices. As a result, performance for SGEMM improves
by around 30% on z15.
On z14, FP SIMD instructions can operate on float-sized scalars in
vector registers, while z13 could do that for double-sized scalars only.
Thus, we can double the number of elements of C that are held in
registers in an SGEMM kernel.
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-05-12 15:59:51 +02:00
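The register arithmetic behind this blocking change can be sketched in C as follows. This is an illustrative calculation only, not code from the commit: it assumes the 128-bit vector registers of the z/Architecture vector facility, and the helper name c_tile_vector_regs is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: z/Architecture vector registers are 128 bits (16 bytes) wide. */
#define VECTOR_REG_BYTES 16

/* Hypothetical helper: number of vector registers needed to hold an
 * unroll_m x unroll_n tile of C whose elements are elem_bytes wide. */
static int c_tile_vector_regs(int unroll_m, int unroll_n, size_t elem_bytes)
{
    assert(elem_bytes > 0 && elem_bytes <= VECTOR_REG_BYTES);
    int lanes = (int)(VECTOR_REG_BYTES / elem_bytes); /* elements per register */
    int elems = unroll_m * unroll_n;                  /* elements in the C tile */
    return (elems + lanes - 1) / lanes;               /* round up to whole registers */
}
```

Under these assumptions, an 8x4 tile of floats occupies 8 registers and a 16x4 tile occupies 16, so the larger tile keeps twice as many elements of C in registers. A 16x4 float tile needs the same 16 registers as an 8x4 tile of doubles.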
Martin Kroeker
03ff213c51
Increase POWER8 ZGEMM_R and use same R values for POWER9
...
fixes lapack-test zger failures seen in #2299 after application of my PR #2551
2020-04-24 21:46:54 +02:00
Martin Kroeker
00172d440b
Typo fix in MIPS24K addition
2020-04-18 21:16:49 +02:00
Martin Kroeker
61bbae3ac1
Handle MIPS24K like P5600
...
and allow enforcing TARGET=1004K as well (omission from earlier 1004K merge and later introduction of TARGET check)
2020-04-18 21:09:32 +02:00
Martin Kroeker
a33d177430
Increase default BUFFER_SIZE on ARM, ZARCH and newer x86_64, add GEMM_R for POWER8/9
...
As shown in #2538, default buffer sizes on some platforms were smaller than required in memory.c
and the requirement could never be fulfilled for a calculated GEMM_R on PPC given the formula used
2020-04-12 19:44:48 +02:00
Martin Kroeker
567d2760e6
Merge pull request #2520 from wjc404/develop
...
Fix avx512 sgemm performance bug when ldc is a multiple of 1024
2020-03-30 20:15:59 +02:00
wjc404
64daad4365
Update param.h
2020-03-20 21:46:18 +00:00
Martin Kroeker
ea8eec5d17
Merge pull request #2422 from wjc404/develop
...
Adjust SkylakeX GEMM3M parameters, add an AVX512 STRMM kernel and fix performance bugs in AVX2 s/c/z GEMM
2020-02-29 19:07:35 +01:00
Ali Saidi
c623a965f9
Add Neoverse-N1 core
...
The implementation is a hybrid of the ARMV8 one with some of the
improved TX2 routines along with specifying -march=v8.2-a
2020-02-29 03:22:04 +00:00
Xianyi Zhang
265ab484c8
Change default RISC-V 64-bit corename to RISCV64_GENERIC
...
e.g. make CC=riscv64-unknown-linux-gnu-gcc FC=riscv64-unknown-linux-gnu-gfortran TARGET=RISCV64_GENERIC HOSTCC=gcc
2020-02-27 14:46:15 +08:00
Xianyi Zhang
4aa2d89217
Merge branch 'develop' into risc-v
2020-02-27 13:53:49 +08:00
Martin Kroeker
8164fd1328
Always assume server-class cpu count for TSV110 and EMAG8180
2020-02-26 22:19:57 +01:00
Martin Kroeker
71e5669c3e
Add preliminary support for EMAG8180 ARMV8 processor
2020-02-19 18:57:26 +01:00
wjc404
b0558c11b9
Update param.h
2020-02-16 23:01:31 +08:00
wjc404
83b6be7976
Update param.h
2020-02-04 19:55:26 +08:00
wjc404
f3f969f681
Update param.h
2020-02-03 21:34:12 +08:00
Wang,Long
fbf4f48f4a
Fix a few performance drops for some matrix sizes per data type
...
Signed-off-by: Wang,Long <long1.wang@intel.com>
2020-01-22 15:15:04 +00:00
wjc404
1c67567008
improve skylakex paralleled sgemm performance
2020-01-13 16:26:03 +08:00
wjc404
b7b408a120
optimize AVX2 SGEMM
2020-01-06 12:16:09 +08:00
wjc404
6362c34ee6
Update param.h
2019-12-30 16:08:19 +08:00
wjc404
64639f440f
Update param.h
2019-12-27 18:06:42 +08:00
wjc404
611445c7f8
Update param.h
2019-12-23 23:44:55 +08:00
wjc404
105e26e12a
Adjust Haswell ZGEMM blocking parameters
2019-12-21 14:38:51 +08:00
wjc404
e20709e976
Update param.h
2019-11-28 19:57:50 +08:00
Martin Kroeker
6082e556cd
Use "generic" S/CGEMM unroll M on big-endian PPC970
...
as the respective PPC970 "altivec" kernels give wrong results when compiled for big endian
2019-11-17 15:10:26 +01:00
Martin Kroeker
4c6a457358
Merge pull request #2300 from wjc404/develop
...
Optimize SGEMM on SKYLAKEX CPUs
2019-11-06 07:27:33 +01:00
wjc404
ae43b75a6a
Add files via upload
2019-11-02 10:09:19 +08:00
wjc404
274ff5cdb8
update sgemm_q on skylakex cpus
2019-11-01 23:59:18 +08:00
Martin Kroeker
df857551c0
Remove special parameter set for obsolete IOS/ARMV8 workaround
2019-10-25 23:07:00 +02:00
wjc404
5da9484d93
Add files via upload
2019-10-16 02:01:13 +08:00
Martin Kroeker
6b83079368
Count cpu cores on ARMV8 and use that to pick the GEMM_PQ parameters (#2267)
...
There is currently no simple way to query cache sizes on ARMV8, so this takes the number of cores as a rough indication of whether the target is a server-class device with a big cache or just a single-board toy or smartphone.
2019-09-25 23:13:24 +02:00
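The core-count heuristic described in this entry might look roughly like the C sketch below. Everything concrete in it is an assumption made for illustration: the threshold of 16 cores, the two P/Q value pairs, and the names gemm_pq_sketch, pick_gemm_pq and online_core_count are all hypothetical, and the commit does not say how the cores are counted. The sketch uses sysconf(_SC_NPROCESSORS_ONLN), which is available on Linux, the BSDs and macOS.

```c
#include <unistd.h>

/* Hypothetical container for the two GEMM blocking parameters. */
typedef struct {
    long p;
    long q;
} gemm_pq_sketch;

/* Assumption: at or above this many cores, treat the target as server-class. */
#define SERVER_CORE_THRESHOLD_SKETCH 16

/* Pick a blocking pair from the core count alone; the values are
 * placeholders, not the ones used in param.h. */
static gemm_pq_sketch pick_gemm_pq(long ncores)
{
    gemm_pq_sketch small_cache = {128, 256};
    gemm_pq_sketch big_cache = {512, 512};
    return (ncores >= SERVER_CORE_THRESHOLD_SKETCH) ? big_cache : small_cache;
}

/* Count online cores, falling back to 1 if the query fails. */
static long online_core_count(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return (n > 0) ? n : 1;
}
```

A caller would combine the two as pick_gemm_pq(online_core_count()). Keeping the decision in a pure function of the core count makes the heuristic easy to test without the hardware at hand.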
Martin Kroeker
6b6c9b1441
Merge pull request #2172 from quickwritereader/develop
...
power9 cgemm/ctrmm. new sgemm 8x16
2019-07-01 21:06:02 +02:00
AbdelRauf
a97b301aaa
cgemm/ctrmm power9
2019-07-01 14:07:54 +00:00
pkubaj
7c7505a778
Fix build for PPC970 on FreeBSD pt.2
...
FreeBSD needs those macros too.
2019-06-28 10:31:45 +00:00
AbdelRauf
cdbfb891da
new sgemm 8x16
2019-06-17 15:33:38 +00:00
AbdelRauf
d0c3543c3f
power9 zgemm ztrmm optimized
2019-06-05 20:07:16 +00:00
AbdelRauf
a469b32cf4
sgemm pipeline improved, zgemm rewritten without inner packs, ABI lxvx v20 fixed with vs52
2019-06-04 07:11:30 +00:00
AbdelRauf
8fe794f059
improved zgemm power9 based on power8
2019-05-30 15:31:25 +00:00
xoviat
6cfd6195c5
param: define constant as blaslong to prevent overflow
2019-05-05 13:10:36 -05:00
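The general C pattern behind this kind of fix can be sketched as below. The commit does not say which constant was changed, so the macro BLOCK_ELEMS_SKETCH, the type blaslong_sketch (a stand-in for the library's own long integer type) and the function scratch_bytes are all hypothetical. The point is only that casting the constant to a wide type makes every product that involves it be computed in that wide type.

```c
#include <limits.h>

/* Hypothetical stand-in for a 64-bit "blaslong" integer type. */
typedef long long blaslong_sketch;

/* Because the constant itself is wide, the usual arithmetic conversions
 * promote the int operands multiplied with it, so the product cannot
 * overflow int. A plain 65536 here would leave the multiplication in int. */
#define BLOCK_ELEMS_SKETCH ((blaslong_sketch)65536)

/* Size in bytes of `blocks` blocks of elements, computed in 64 bits. */
static blaslong_sketch scratch_bytes(int blocks, int elem_bytes)
{
    return BLOCK_ELEMS_SKETCH * blocks * elem_bytes;
}

/* True when a size would not have fit into a plain int. */
static int exceeds_int(blaslong_sketch value)
{
    return value > INT_MAX;
}
```

For example, scratch_bytes(65536, 8) is 2^35 bytes, a value that a 32-bit int cannot represent.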
AbdelRauf
628b335e83
Merge branch 'develop' of https://github.com/quickwritereader/OpenBLAS into develop
2019-04-29 08:57:44 +00:00
AbdelRauf
0f105dd8a5
sgemm/strmm
2019-04-29 08:49:50 +00:00
Martin Kroeker
7c51cc8527
Merge branch 'develop' into develop
2019-03-29 19:36:29 +01:00
AbdelRauf
853a18bc17
power9 makefile. dgemm based on power8 kernel with the following changes: 32x unrolled 16x4 kernel and 8x4 kernel using (lxv stxv butterfly rank1 update). Improvement from 17 to 22-23 GFLOPS. dtrmm cases were added into dgemm itself
2019-03-29 15:49:40 +00:00
Martin Kroeker
03d7110900
Merge pull request #2042 from maomao194313/develop
...
add TARGET support for HiSilicon tsv110 CPUs
2019-03-12 22:57:39 +01:00
maomao194313
7e3eb9b25d
make DYNAMIC_ARCH=1 package work on TSV110
2019-03-12 16:11:01 +08:00
ken-cunningham-webuse
b0c714ef60
param.h : enable defines for PPC970 on DarwinOS
...
fixes:
gemm.c: In function 'sgemm_':
../common_param.h:981:18: error: 'SGEMM_DEFAULT_P' undeclared (first use in this function)
#define SGEMM_P SGEMM_DEFAULT_P
^
2019-03-07 12:03:25 -08:00
Martin Kroeker
bdc73a49e0
Add parameters for Z14
...
from patch provided by aarnez in #991
2019-01-31 21:14:37 +01:00
Martin Kroeker
bbfdd6c0fe
Increase Zen SWITCH_RATIO to 16
...
following GEMM benchmarks on Ryzen2700X. For #1464
2019-01-19 23:01:31 +01:00