Martin Kroeker
|
b36018be6d
|
Merge pull request #2365 from wjc404/develop
Fix SKYLAKEX STRMM issues
|
2020-01-09 23:23:09 +01:00 |
wjc404
|
3a100b2797
|
Update KERNEL.SKYLAKEX
|
2020-01-09 13:48:41 +08:00 |
Martin Kroeker
|
38742d5547
|
Merge pull request #2361 from wjc404/develop
Optimize AVX2 SGEMM & STRMM
|
2020-01-08 16:20:28 +01:00 |
wjc404
|
bd4c032f52
|
Update sgemm_kernel_8x4_haswell.c
|
2020-01-07 11:22:46 +08:00 |
wjc404
|
9dc9b7b95e
|
Update sgemm_kernel_8x4_haswell.c
|
2020-01-06 20:11:36 +08:00 |
wjc404
|
92b10212de
|
optimize AVX2 SGEMM
|
2020-01-06 12:11:21 +08:00 |
wjc404
|
b73bf01378
|
optimize AVX2 SGEMM
|
2020-01-06 12:09:14 +08:00 |
wjc404
|
eb3c9f1db9
|
optimize AVX2 SGEMM
|
2020-01-06 12:07:02 +08:00 |
Martin Kroeker
|
456ee2e1f0
|
Merge pull request #2357 from chenxuqiang/dgemm_beta_zero
kernel/arm64/dgemm_beta.S: add beta == zero branch
|
2020-01-02 22:28:36 +01:00 |
shengyang
|
80db5f11e1
|
update
|
2020-01-02 11:01:57 +08:00 |
chenxuqiang
|
52de4cc8fd
|
kernel/arm64/dgemm_beta.S: add beta == zero branch
Added a beta == zero branch, so the C matrix no longer needs to be loaded in that case.
Signed by: Xuqiang Chen <chenxuqiang3@hisilicon.com>
|
2020-01-01 21:50:45 -05:00 |
Martin Kroeker
|
44028581cc
|
Merge pull request #2355 from Zeyiii/dev-zeyi2
Use arm neon instructions to optimize sgemm_beta operation
|
2020-01-01 22:14:16 +01:00 |
Martin Kroeker
|
86ab939936
|
Merge pull request #2354 from ZuoQ3/develop
[WIP] Use arm neon instructions to optimize tcopy operation
|
2020-01-01 22:13:37 +01:00 |
Martin Kroeker
|
6c85cb1869
|
Merge pull request #2352 from wjc404/develop
AVX2 ZGEMM3M kernel
|
2019-12-31 18:08:10 +01:00 |
Martin Kroeker
|
995768bbc5
|
Merge pull request #2351 from Zeyiii/develop
prefetching for dgemm_beta
|
2019-12-31 18:07:37 +01:00 |
int_13h
|
96ad579428
|
add in runtime cpu detection for zarch (#2349)
|
2019-12-31 18:03:27 +01:00 |
shengyang
|
8d84403205
|
Use arm neon instructions to optimize ncopy operation
modified: KERNEL.ARMV8
modified: KERNEL.TSV110
new file: sgemm_ncopy_4.S
|
2019-12-31 17:06:35 +08:00 |
w00421467
|
0833a4846a
|
Use arm neon instructions to optimize sgemm_beta operation
|
2019-12-31 10:42:03 +08:00 |
zq
|
50f7fc1401
|
[WIP] Use arm neon instructions to optimize tcopy operation
|
2019-12-31 10:21:23 +08:00 |
w00421467
|
d1b53806be
|
Merge remote-tracking branch 'pub/develop' into develop
|
2019-12-31 10:13:24 +08:00 |
wjc404
|
a0f0a802fc
|
Update zgemm3m_kernel_4x4_haswell.c
|
2019-12-30 17:33:42 +08:00 |
wjc404
|
700fe5b5ee
|
Add files via upload
|
2019-12-30 17:18:59 +08:00 |
wjc404
|
f60840c420
|
Update KERNEL.ZEN
|
2019-12-30 16:04:23 +08:00 |
wjc404
|
109e18cd96
|
Update KERNEL.HASWELL
|
2019-12-30 16:03:24 +08:00 |
wjc404
|
ae1579be13
|
Create zgemm3m_kernel_4x4_haswell.c
|
2019-12-30 16:02:51 +08:00 |
w00421467
|
3ccf8885ac
|
prefetching for dgemm_beta
|
2019-12-30 11:45:49 +08:00 |
wjc404
|
cd765f094b
|
Update cgemm3m_kernel_8x4_haswell.c
|
2019-12-27 18:23:29 +08:00 |
wjc404
|
3a66c8cac1
|
Update KERNEL.ZEN
|
2019-12-27 18:04:08 +08:00 |
wjc404
|
ed9af2f7da
|
Update KERNEL.HASWELL
|
2019-12-27 18:01:38 +08:00 |
wjc404
|
5fd1edead9
|
Create cgemm3m_kernel_8x4_haswell.c
|
2019-12-27 18:00:55 +08:00 |
wjc404
|
eeecd623d8
|
Update cgemm_kernel_8x2_haswell.c
|
2019-12-24 00:40:16 +08:00 |
wjc404
|
2cd9306bb5
|
Update KERNEL.ZEN
|
2019-12-23 23:42:30 +08:00 |
wjc404
|
c418c81224
|
Update KERNEL.HASWELL
|
2019-12-23 23:41:44 +08:00 |
wjc404
|
025741f16a
|
Fast Haswell CGEMM kernel
|
2019-12-23 23:40:03 +08:00 |
wjc404
|
f41d52665d
|
Fast Haswell ZGEMM kernel
|
2019-12-21 14:37:06 +08:00 |
wjc404
|
d573d24de7
|
Fast Haswell ZGEMM kernel
|
2019-12-21 14:35:15 +08:00 |
w00421467
|
b7cc69ee62
|
declare DGEMM_BETA in KERNEL.ARMV8 rather than the generic KERNEL
|
2019-12-20 10:11:50 +08:00 |
w00421467
|
aeef942c4f
|
use arm neon instructions to optimize gemm beta operation
|
2019-12-17 10:00:13 +08:00 |
Martin Kroeker
|
1a6ea8ee6d
|
Merge pull request #2338 from kavanabhat/aix_mod
Changes to build on AIX in POWER8 mode
|
2019-12-09 17:54:49 +01:00 |
Kavana Bhat
|
6baa9b07d7
|
AIX changes for Power8
|
2019-12-06 04:33:32 -06:00 |
Kavana Bhat
|
3938e59569
|
AIX changes for Power8
|
2019-12-04 00:23:46 -06:00 |
Isuru Fernando
|
b863b32ac5
|
Workaround an ICE in clang 9.0.0
This bug is present neither in 8.x nor in the 9.0 daily snapshot.
|
2019-12-01 12:59:46 -06:00 |
Martin Kroeker
|
dd04143d4a
|
Merge pull request #2328 from martin-frbg/ppc9
Fix precompiled kernels on POWER9 and make their use conditional on (old) gcc version
|
2019-11-30 12:23:57 +01:00 |
Martin Kroeker
|
f3a6164bff
|
Merge pull request #2324 from antonblanchard/power9_segv
Fix SEGV in cdot_power9
|
2019-11-30 00:03:42 +01:00 |
Martin Kroeker
|
dedd822d1a
|
Fix caxpy/caxpyc naming in localentry
|
2019-11-29 23:56:57 +01:00 |
Martin Kroeker
|
2181fb7047
|
Fix caxpy/caxpyc naming in localentry
|
2019-11-29 23:54:15 +01:00 |
Martin Kroeker
|
a9b62c03f8
|
Substitute precompiled gcc7 codes only when gcc is older than 9.x
|
2019-11-29 23:49:50 +01:00 |
Martin Kroeker
|
97762234f9
|
Add variable for gcc >=9 test
used in KERNEL.POWER9
|
2019-11-29 23:47:23 +01:00 |
wjc404
|
934e601e93
|
Update dgemm_kernel_4x8_skylakex_2.c
|
2019-11-28 19:56:35 +08:00 |
Anton Blanchard
|
cf2a8e410c
|
Fix SEGV in cdot_power9
We were corrupting r2 because the local entry wasn't being
set up correctly.
|
2019-11-26 21:55:04 -07:00 |