Martin Kroeker
4cf7315a5d
Adjust ARMV8 SGEMM unrolling when using the C fallback kernel_2x2 for IOS
2018-09-06 21:41:54 +02:00
Arjan van de Ven
6eb4b9ae7c
Tune HASWELL SWITCH_RATIO as well
...
Similar to the SKYLAKEX patch, 32 seems to work best
(much better than 4 or 16)
Before (4)
Matrix SGEMM cycles MPC DGEMM cycles MPC
48 x 48 15554.3 7.2 0.2% 30353.8 3.7 0.3%
64 x 64 30346.8 8.7 1.6% 63495.0 4.1 -0.1%
65 x 65 81668.1 3.4 -123.3% 82705.2 3.3 -21.2%
80 x 80 105045.9 4.9 -95.5% 115226.0 4.5 -2.2%
96 x 96 152461.2 5.8 -74.3% 148156.3 6.0 16.4%
112 x 112 188505.2 7.5 -42.2% 171187.3 8.2 36.4%
128 x 128 257884.0 8.1 -39.5% 224764.8 9.3 46.0%
Intermediate (16)
Matrix SGEMM cycles MPC DGEMM cycles MPC
48 x 48 15565.7 7.2 0.2% 30378.9 3.7 0.2%
64 x 64 30430.2 8.7 1.3% 63046.4 4.2 0.6%
65 x 65 27306.0 10.1 25.3% 38879.2 7.1 43.0%
80 x 80 51008.7 10.1 5.1% 61007.6 8.4 45.9%
96 x 96 70856.7 12.5 19.0% 83403.1 10.6 53.0%
112 x 112 84769.9 16.6 36.0% 99920.1 14.1 62.9%
128 x 128 84213.2 25.0 54.5% 113024.2 18.6 72.8%
After (32)
Matrix SGEMM cycles MPC DGEMM cycles MPC
48 x 48 15537.3 7.2 0.3% 30537.0 3.6 -0.3%
64 x 64 30352.7 8.7 1.6% 62597.8 4.2 1.3%
65 x 65 36857.0 7.5 -0.8% 56167.6 4.9 17.7%
80 x 80 42552.6 12.1 20.8% 69536.7 7.4 38.3%
96 x 96 52101.5 17.1 40.5% 91016.1 9.7 48.7%
112 x 112 63853.7 22.1 51.8% 110507.4 12.7 58.9%
128 x 128 73966.1 28.4 60.0% 163146.4 12.9 60.8%
2018-06-17 17:08:36 +00:00
Arjan van de Ven
5c6f008365
Tune param.h for SkylakeX
...
param.h defines a per-platform SWITCH_RATIO, which is used as a measure for how fine
grained the blocks for gemm need to be split up. Many platforms define this to 4.
The reality is that the gemm low level implementation for SkylakeX likes bigger blocks
due to the nature of SIMD... by tuning the SWITCH_RATIO to 32 the threading performance
improves significantly:
Before
Matrix SGEMM cycles MPC DGEMM cycles MPC
48 x 48 10756.0 10.5 -0.5% 18296.7 6.1 -1.7%
64 x 64 20490.0 12.9 1.4% 40615.0 6.5 0.0%
65 x 65 83528.3 3.3 -210.9% 96319.0 2.9 -83.3%
80 x 80 101453.5 5.1 -166.3% 128021.7 4.0 -76.6%
96 x 96 149795.1 5.9 -143.1% 168059.4 5.3 -47.4%
112 x 112 191481.2 7.3 -105.8% 204165.0 6.9 -14.6%
128 x 128 265019.2 7.9 -99.0% 272006.4 7.7 -5.3%
After
Matrix SGEMM cycles MPC DGEMM cycles MPC
48 x 48 10666.3 10.6 0.4% 18236.9 6.2 -1.4%
64 x 64 20410.1 13.0 1.8% 39925.8 6.6 1.7%
65 x 65 34983.0 7.9 -30.2% 51494.6 5.4 2.0%
80 x 80 39769.1 13.0 -4.4% 63805.2 8.1 12.0%
96 x 96 45169.6 19.7 26.7% 80065.8 11.1 29.8%
112 x 112 57026.1 24.7 38.7% 99535.5 14.2 44.1%
128 x 128 64789.8 32.5 51.3% 117407.2 17.9 54.6%
With this change, threading starts to be a win already at 96x96
2018-06-17 15:47:50 +00:00
Arjan van de Ven
99c7bba8e4
Initial support for SkylakeX / AVX512
...
This patch adds the basic infrastructure for adding the SkylakeX (Intel Skylake server)
target. The SkylakeX target will use the AVX512 (AVX512VL level) instruction set,
which brings 2 basic things:
1) 512 bit wide SIMD (2x width of AVX2)
2) 32 SIMD registers (2x the number on AVX2)
This initial patch only contains a trivial transofrmation of the Haswell SGEMM kernel
to AVX512VL; more will follow later but this patch aims to get the infrastructure
in place for this "later".
Full performance tuning has not been done yet; with more registers and wider SIMD
it's in theory possible to retune the kernels but even without that there's an
interesting enough performance increase (30-40% range) with just this change.
2018-06-03 07:58:52 +00:00
Martin Kroeker
d94d7baf7e
Add mips32r2 api target
2018-05-02 20:17:26 +02:00
Shivraj Patil
e3d844b062
Added mips I6500 core
...
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
2017-09-22 11:57:43 +05:30
Gian-Carlo Pascutto
832a272784
Revert Zen param.h to Haswell values (instead of Excavator).
2017-04-18 12:40:25 +02:00
Denis Steckelmacher
c9ff735da6
Add ZEN support (tested for auto-detected static backend)
2017-03-19 15:32:50 +01:00
Martin Kroeker
cd135e2b59
Merge pull request #1130 from quickwritereader/develop
...
Blas 3 for single precision
2017-03-15 10:00:52 +01:00
Abdurrauf
08786c4b95
strmm and ctrmm
2017-03-13 01:23:16 +04:00
Abdurrauf
82e80fa82b
initial strmm(sgemm). not tuned yet
2017-03-06 04:27:40 +04:00
Martin Kroeker
ffc1d6c468
Merge pull request #1108 from ashwinyes/develop_20170203_thunderx2t99
...
Optimized Implementations for ThunderX2T99
2017-02-28 16:02:19 +01:00
Ashwin Sekhar T K
19ba133383
THUNDERX2T99: Add Optimized ZGEMM Implementation
2017-02-28 05:31:41 +00:00
Abdurrauf
0d96b0e2a7
Merge branch 'z13' into develop
2017-02-26 06:17:33 +04:00
Abdurrauf
848cb27b1e
ztrmm kernel.
2017-02-26 06:14:12 +04:00
Ashwin Sekhar T K
2757b49767
THUNDERX2T99: Add Optimized CGEMM Implementation
2017-01-30 17:44:26 +05:30
Ashwin Sekhar T K
f279ff4789
THUNDERX2T99: Add Optimized SGEMM Implementation
2017-01-16 21:44:33 +05:30
Ashwin Sekhar T K
4b55fae337
ARM64: Add Cavium THUNDERX2T99 Target
2017-01-11 11:18:40 +05:30
Andrew Pinski
fb200c7245
ARM64: Add Cavium THUNDERX Target
2017-01-10 15:01:37 +05:30
Ashwin Sekhar T K
4713e7c47f
ARM64: Add the VULCAN Target
2017-01-10 15:01:17 +05:30
Zhang Xianyi
b678471d65
Merge branch 'z13' into develop
...
Conflicts:
CONTRIBUTORS.md
2017-01-09 05:52:42 -05:00
Abdurrauf
6418667818
dtrmm and dgemm for z13
2017-01-04 19:32:33 +04:00
Shivraj Patil
9687437928
MIPS n32 ABI and build time mips simd support check
...
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
2016-08-10 17:44:22 +05:30
Shivraj Patil
d1c6469283
MIPS n32 ABI support, MSA support detection and rename ARCH, ARCHFLAGS
...
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
2016-08-08 11:58:01 +05:30
Shivraj Patil
beb1d076a4
Added MSA optimization for GEMV_N, GEMV_T, ASUM, DOT functions
...
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
2016-07-15 18:38:25 +05:30
Zhang Xianyi
8a592ee386
Merge pull request #924 from ashwinyes/develop_aarch64_improvements_20160714
...
Improvements to Aarch64 kernels
2016-07-14 15:47:55 -04:00
Ashwin Sekhar T K
0a5ff9f9f9
Improvements to TRMM and GEMM kernels
2016-07-14 13:56:04 +05:30
Shivraj Patil
57df7956ee
Added CGEMM, ZGEMM, STRMM, DTRMM, CTRMM, ZTRMM. Updated macros in SGEMM, DGEMM, STRMM.
...
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
2016-06-28 17:51:10 +05:30
Shivraj Patil
c4ba40e308
SGEMM optimization for MIPS P5600 and I6400 using MSA. Unrolled k loop in DGEMM kernel function
...
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
2016-05-19 11:04:42 +05:30
Werner Saar
88011f625d
Merge pull request #876 from wernsaar/develop
...
optimized dgemm on power8 for 20 threads
2016-05-16 14:52:40 +02:00
Werner Saar
8310d4d3f7
optimized dgemm for 20 threads
2016-05-16 14:14:25 +02:00
Shivraj Patil
085cf236c2
conflict resolved by syncing with 'xianyi:develop'
...
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
2016-05-04 11:07:14 +05:30
Shivraj Patil
b7b3d8ec8e
DGEMM optimization for MIPS P5600 and I6400 using MSA
...
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
2016-05-03 14:42:26 +05:30
Zhang Xianyi
cd7af5260a
Merge pull request #847 from sva-img/develop
...
MIPS P5600(32 bit) and I6400(64 bit) cores support added.
2016-04-29 11:44:36 -04:00
Werner Saar
782f75ba94
optimized param.h for POWER8
2016-04-27 15:48:09 +02:00
Werner Saar
0d0c6f7d7d
optimized dgemm for POWER8
2016-04-27 14:01:08 +02:00
Werner Saar
40ac64ae4f
updated param.h for EXCAVATOR
2016-04-25 10:40:04 +02:00
Werner Saar
089aad57f7
updated param.h for POWER8
2016-04-23 14:26:24 +02:00
Werner Saar
879a51165f
Optimized zgemm and tested zgemm again
2016-04-22 13:07:12 +02:00
Shivraj Patil
2c3dfe2bf3
MIPS P5600(32 bit) and I6400(64 bit) cores support added.
...
Seperated mips and mips64 files.
Configurations support for mips 32 bit.
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
2016-04-22 14:03:18 +05:30
Werner Saar
3c6294ca3d
added optimized sgemm_tcopy for power8
2016-04-19 16:08:54 +02:00
Zhang Xianyi
dd43661cfd
Init IBM z system (s390x) porting.
2016-04-15 18:02:24 -04:00
Werner Saar
e173c51c04
updated zgemm- and ztrmm-kernel for POWER8
2016-04-08 09:05:37 +02:00
Werner Saar
9c42f0374a
Updated cgemm- and sgemm-kernel for POWER8 SMP
2016-04-07 15:08:15 +02:00
Werner Saar
a51102e9b7
bugfixes for sgemm- and cgemm-kernel
2016-04-06 11:15:21 +02:00
Werner Saar
c5b1fbcb2e
updated optimized cgemm- and ctrmm-kernel for POWER8
2016-04-04 09:12:08 +02:00
Werner Saar
6a9bbfc227
updated sgemm- and strmm-kernel for POWER8
2016-04-02 17:16:36 +02:00
Werner Saar
e1df5a6e23
fixed sgemm- and strmm-kernel
2016-03-18 12:12:03 +01:00
Werner Saar
5c658f8746
add optimized cgemm- and ctrmm-kernel for POWER8
2016-03-18 08:17:25 +01:00
Werner Saar
96284ab295
added sgemm- and strmm-kernel for POWER8
2016-03-14 13:52:44 +01:00