wjc404 | 3447d04eaf | Update dgemm_kernel_16x2_skylakex.c | 2020-02-06 02:14:10 +00:00
wjc404 | 8b5cdcc64c | Update sgemm_kernel_8x4_haswell.c | 2020-02-06 01:47:46 +00:00
wjc404 | 4e00d96a78 | Update dgemm_kernel_16x2_skylakex.c | 2020-02-06 01:46:36 +00:00
wjc404 | 096da2f51a | Update dgemm_kernel_16x2_skylakex.c | 2020-02-05 13:36:57 +08:00
wjc404 | 081b188529 | Update KERNEL.SKYLAKEX | 2020-02-03 21:38:08 +08:00
wjc404 | 8019e70211 | AVX512 16x2 DGEMM kernel | 2020-02-03 21:32:56 +08:00
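
A "16x2" DGEMM kernel computes one 16-row by 2-column tile of C per pass over the shared dimension. The sketch below is a scalar reference for such a register-blocked micro-kernel. It assumes a packing layout in which each k step stores 16 consecutive values of A and 2 consecutive values of B. The names `micro_kernel_16x2`, `a_pack` and `b_pack` are hypothetical, and this is not the committed AVX512 code. In an AVX512 version the 32 accumulators would fit in four 512-bit registers (two per column, eight doubles each).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar reference for a 16x2 DGEMM micro-kernel.
 * a_pack: k blocks of 16 doubles (a 16-row slice of A per k step).
 * b_pack: k blocks of 2 doubles (a 2-column slice of B per k step).
 * c: column-major tile, leading dimension ldc >= 16.
 * Computes C(16x2) += alpha * A_pack * B_pack. */
void micro_kernel_16x2(size_t k, double alpha,
                       const double *a_pack, const double *b_pack,
                       double *c, size_t ldc)
{
    assert(ldc >= 16);
    double acc[2][16] = {{0.0}}; /* 32 accumulators; 4 zmm registers with AVX512 */

    for (size_t p = 0; p < k; ++p) {
        const double *a = a_pack + 16 * p;
        const double b0 = b_pack[2 * p];     /* value broadcast for column 0 */
        const double b1 = b_pack[2 * p + 1]; /* value broadcast for column 1 */
        for (int i = 0; i < 16; ++i) {       /* vectorized: one FMA per 8 rows */
            acc[0][i] += a[i] * b0;
            acc[1][i] += a[i] * b1;
        }
    }
    /* Scale by alpha and add the accumulated tile into C. */
    for (int j = 0; j < 2; ++j)
        for (int i = 0; i < 16; ++i)
            c[i + j * ldc] += alpha * acc[j][i];
}
```

Keeping the whole tile in registers means each packed A value is loaded once and reused for both columns. This reuse is the reason for blocking the kernel rather than looping over C element by element.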
Qiyu8 | ff42e68652 | Optimize genenal Gemm Beta | 2020-01-20 11:49:42 +08:00
Martin Kroeker | 70f45749b9 | Merge pull request #2367 from wjc404/develop | 2020-01-15 21:13:43 +01:00
    Improve paralleled SGEMM performance on SKYLAKEX CPUs
wjc404 | e5dcdeb550 | Update sgemm_direct_skylakex.c | 2020-01-13 16:59:23 +08:00
wjc404 | 952cc2ba38 | Update sgemm_kernel_16x4_skylakex_2.c | 2020-01-13 16:58:54 +08:00
wjc404 | feaafbedd3 | make skylakex sgemm code more friendly for readers | 2020-01-13 16:28:41 +08:00
    BTW some kernels were adjusted to improve performance
Martin Kroeker | b36018be6d | Merge pull request #2365 from wjc404/develop | 2020-01-09 23:23:09 +01:00
    Fix SKYLAKEX STRMM issues
wjc404 | 3a100b2797 | Update KERNEL.SKYLAKEX | 2020-01-09 13:48:41 +08:00
Martin Kroeker | 38742d5547 | Merge pull request #2361 from wjc404/develop | 2020-01-08 16:20:28 +01:00
    Optimize AVX2 SGEMM & STRMM
wjc404 | bd4c032f52 | Update sgemm_kernel_8x4_haswell.c | 2020-01-07 11:22:46 +08:00
wjc404 | 9dc9b7b95e | Update sgemm_kernel_8x4_haswell.c | 2020-01-06 20:11:36 +08:00
wjc404 | 92b10212de | optimize AVX2 SGEMM | 2020-01-06 12:11:21 +08:00
wjc404 | b73bf01378 | optimize AVX2 SGEMM | 2020-01-06 12:09:14 +08:00
wjc404 | eb3c9f1db9 | optimize AVX2 SGEMM | 2020-01-06 12:07:02 +08:00
Martin Kroeker | 456ee2e1f0 | Merge pull request #2357 from chenxuqiang/dgemm_beta_zero | 2020-01-02 22:28:36 +01:00
    kernel/arm64/dgemm_beta.S: add beta == zero branch
shengyang | 80db5f11e1 | update | 2020-01-02 11:01:57 +08:00
chenxuqiang | 52de4cc8fd | kernel/arm64/dgemm_beta.S: add beta == zero branch | 2020-01-01 21:50:45 -05:00
    added beta == zero branch, and no need to load C matrix.
    Signed by: Xuqiang Chen <chenxuqiang3@hisilicon.com>
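
The BLAS convention says C need not be set on input when beta is zero. A plain scaling loop would still read C, and because 0 * NaN is NaN, stale or uninitialized data could leak into the result. A separate store-only branch avoids that read entirely. Below is a portable C sketch of a GEMM beta step (C := beta * C) with such a branch. The function name `gemm_beta_sketch` and its column-major `ldc` interface are assumptions for illustration, not the arm64 assembly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: scale an m x n column-major matrix C (leading
 * dimension ldc) in place by beta, with a store-only path for beta == 0. */
void gemm_beta_sketch(size_t m, size_t n, double beta, double *c, size_t ldc)
{
    assert(ldc >= m);
    for (size_t j = 0; j < n; ++j) {
        double *col = c + j * ldc;
        if (beta == 0.0) {
            /* C is never loaded, so any NaN/Inf garbage is simply overwritten. */
            for (size_t i = 0; i < m; ++i)
                col[i] = 0.0;
        } else {
            /* General case: load, multiply, store. */
            for (size_t i = 0; i < m; ++i)
                col[i] *= beta;
        }
    }
}
```

Elements between row m and ldc in each column are left untouched, so the routine can operate on a submatrix of a larger array.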
Martin Kroeker | 44028581cc | Merge pull request #2355 from Zeyiii/dev-zeyi2 | 2020-01-01 22:14:16 +01:00
    Use arm neon instructions to optimize sgemm_beta operation
Martin Kroeker | 86ab939936 | Merge pull request #2354 from ZuoQ3/develop | 2020-01-01 22:13:37 +01:00
    [WIP] Use arm neon instructions to optimize tcopy operation
Martin Kroeker | 6c85cb1869 | Merge pull request #2352 from wjc404/develop | 2019-12-31 18:08:10 +01:00
    AVX2 ZGEMM3M kernel
Martin Kroeker | 995768bbc5 | Merge pull request #2351 from Zeyiii/develop | 2019-12-31 18:07:37 +01:00
    prefetching for dgemm_beta
int_13h | 96ad579428 | add in runtime cpu detection for zarch (#2349) | 2019-12-31 18:03:27 +01:00
    add in runtime cpu detection for zarch
shengyang | 8d84403205 | Use arm neon instructions to optimize ncopy operation | 2019-12-31 17:06:35 +08:00
    modified: KERNEL.ARMV8
    modified: KERNEL.TSV110
    new file: sgemm_ncopy_4.S
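
An ncopy routine packs a panel of a column-major matrix into a contiguous buffer before the compute kernel runs, so the kernel can read its operands sequentially. The sketch below is a scalar 4-column ncopy. It assumes the interleaved layout used by OpenBLAS's generic C reference: within each group of columns, row i contributes the group's column values consecutively, and leftover columns are packed in groups of 2 and then 1. The name `ncopy_4_sketch` is hypothetical. The exact layout produced by the NEON `sgemm_ncopy_4.S` should be checked against that file.

```c
#include <assert.h>
#include <stddef.h>

/* Pack w adjacent columns (starting at a) row by row into b; returns the new end of b. */
static float *pack_group(size_t m, size_t w, const float *a, size_t lda, float *b)
{
    for (size_t i = 0; i < m; ++i)
        for (size_t jj = 0; jj < w; ++jj)
            *b++ = a[i + jj * lda];
    return b;
}

/* Hypothetical sketch: pack an m x n column-major matrix a (leading dim lda)
 * into b in groups of 4 columns, then a group of 2, then a single column. */
void ncopy_4_sketch(size_t m, size_t n, const float *a, size_t lda, float *b)
{
    assert(lda >= m);
    size_t j = 0;
    for (; j + 4 <= n; j += 4)
        b = pack_group(m, 4, a + j * lda, lda, b);
    if (n - j >= 2) {
        b = pack_group(m, 2, a + j * lda, lda, b);
        j += 2;
    }
    if (n - j >= 1)
        pack_group(m, 1, a + j * lda, lda, b);
}
```

For a 2 x 5 matrix whose entry (i, j) is 10*j + i, the packed buffer is 0, 10, 20, 30, 1, 11, 21, 31, 40, 41. The single-column tail (40, 41) follows the 4-column group.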
w00421467 | 0833a4846a | Use arm neon instructions to optimize sgemm_beta operation | 2019-12-31 10:42:03 +08:00
zq | 50f7fc1401 | [WIP] Use arm neon instructions to optimize tcopy operation | 2019-12-31 10:21:23 +08:00
w00421467 | d1b53806be | Merge remote-tracking branch 'pub/develop' into develop | 2019-12-31 10:13:24 +08:00
wjc404 | a0f0a802fc | Update zgemm3m_kernel_4x4_haswell.c | 2019-12-30 17:33:42 +08:00
wjc404 | 700fe5b5ee | Add files via upload | 2019-12-30 17:18:59 +08:00
wjc404 | f60840c420 | Update KERNEL.ZEN | 2019-12-30 16:04:23 +08:00
wjc404 | 109e18cd96 | Update KERNEL.HASWELL | 2019-12-30 16:03:24 +08:00
wjc404 | ae1579be13 | Create zgemm3m_kernel_4x4_haswell.c | 2019-12-30 16:02:51 +08:00
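
The "3m" complex GEMM variants are named after the textbook 3M method. It computes (Ar + i*Ai)(Br + i*Bi) with three real matrix products instead of four: T1 = Ar*Br, T2 = Ai*Bi and T3 = (Ar + Ai)(Br + Bi). The real part is then T1 - T2 and the imaginary part is T3 - T1 - T2. The sketch below checks that identity on small column-major matrices held as separate real and imaginary arrays. A naive real GEMM stands in for an optimized kernel. The names `real_gemm` and `zgemm3m_sketch` are hypothetical. The sketch does not show how OpenBLAS divides this work between its packing routines and `zgemm3m_kernel_4x4_haswell.c`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Naive column-major real GEMM: C(m x n) = A(m x k) * B(k x n), overwriting C. */
static void real_gemm(size_t m, size_t n, size_t k,
                      const double *a, const double *b, double *c)
{
    for (size_t j = 0; j < n; ++j)
        for (size_t i = 0; i < m; ++i) {
            double s = 0.0;
            for (size_t p = 0; p < k; ++p)
                s += a[i + p * m] * b[p + j * k];
            c[i + j * m] = s;
        }
}

/* Hypothetical 3M complex GEMM on split arrays: (cr + i*ci) = (ar + i*ai)(br + i*bi).
 * Returns 0 on success, -1 if a temporary buffer cannot be allocated. */
int zgemm3m_sketch(size_t m, size_t n, size_t k,
                   const double *ar, const double *ai,
                   const double *br, const double *bi,
                   double *cr, double *ci)
{
    assert(m > 0 && n > 0 && k > 0);
    double *sa = malloc(m * k * sizeof *sa);  /* Ar + Ai */
    double *sb = malloc(k * n * sizeof *sb);  /* Br + Bi */
    double *t3 = malloc(m * n * sizeof *t3);  /* (Ar + Ai)(Br + Bi) */
    if (!sa || !sb || !t3) { free(sa); free(sb); free(t3); return -1; }

    for (size_t x = 0; x < m * k; ++x) sa[x] = ar[x] + ai[x];
    for (size_t x = 0; x < k * n; ++x) sb[x] = br[x] + bi[x];

    real_gemm(m, n, k, ar, br, cr);  /* T1, held in cr for now */
    real_gemm(m, n, k, ai, bi, ci);  /* T2, held in ci for now */
    real_gemm(m, n, k, sa, sb, t3);  /* T3 */

    for (size_t x = 0; x < m * n; ++x) {
        double t1 = cr[x], t2 = ci[x];
        cr[x] = t1 - t2;             /* real part */
        ci[x] = t3[x] - t1 - t2;     /* imaginary part */
    }
    free(sa); free(sb); free(t3);
    return 0;
}
```

The method saves one of four real multiplications at the cost of extra additions. This pays off when the multiplications dominate, as they do in large GEMMs.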
w00421467 | 3ccf8885ac | prefetching for dgemm_beta | 2019-12-30 11:45:49 +08:00
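
Prefetching in a beta-scaling loop asks the memory system to start fetching data a fixed distance ahead of the element being processed, so the multiply is less likely to stall on a cache miss. The sketch below is portable C and uses the GCC/Clang `__builtin_prefetch` builtin rather than assembly prefetch instructions. The prefetch distance `PREFETCH_AHEAD` (64 doubles) and the 64-byte cache line (8 doubles) are illustrative assumptions, not values taken from the commit. `scale_column_prefetch` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

#define PREFETCH_AHEAD 64  /* hypothetical prefetch distance, in doubles */
#define LINE_DOUBLES 8     /* assumes a 64-byte cache line */

/* Scale one contiguous column of length m by beta, prefetching ahead.
 * Requires GCC or Clang for __builtin_prefetch. */
void scale_column_prefetch(size_t m, double beta, double *col)
{
    assert(col != NULL || m == 0);
    for (size_t i = 0; i < m; ++i) {
        /* One prefetch per cache line, only for in-bounds addresses:
         * rw = 1 (will be written), locality = 3 (keep in all cache levels). */
        if (i % LINE_DOUBLES == 0 && i + PREFETCH_AHEAD < m)
            __builtin_prefetch(&col[i + PREFETCH_AHEAD], 1, 3);
        col[i] *= beta;
    }
}
```

The prefetch is only a hint and does not change the result. The best distance depends on memory latency and loop throughput, and would have to be tuned per core.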
wjc404 | cd765f094b | Update cgemm3m_kernel_8x4_haswell.c | 2019-12-27 18:23:29 +08:00
wjc404 | 3a66c8cac1 | Update KERNEL.ZEN | 2019-12-27 18:04:08 +08:00
wjc404 | ed9af2f7da | Update KERNEL.HASWELL | 2019-12-27 18:01:38 +08:00
wjc404 | 5fd1edead9 | Create cgemm3m_kernel_8x4_haswell.c | 2019-12-27 18:00:55 +08:00
wjc404 | eeecd623d8 | Update cgemm_kernel_8x2_haswell.c | 2019-12-24 00:40:16 +08:00
wjc404 | 2cd9306bb5 | Update KERNEL.ZEN | 2019-12-23 23:42:30 +08:00
wjc404 | c418c81224 | Update KERNEL.HASWELL | 2019-12-23 23:41:44 +08:00
wjc404 | 025741f16a | Fast Haswell CGEMM kernel | 2019-12-23 23:40:03 +08:00
wjc404 | f41d52665d | Fast Haswell ZGEMM kernel | 2019-12-21 14:37:06 +08:00
wjc404 | d573d24de7 | Fast Haswell ZGEMM kernel | 2019-12-21 14:35:15 +08:00
w00421467 | b7cc69ee62 | declare DGEMM_BETA in KERNEL.ARMV8 rather than the generic KERNEL | 2019-12-20 10:11:50 +08:00
w00421467 | aeef942c4f | use arm neon instructions to optimize gemm beta operation | 2019-12-17 10:00:13 +08:00
Martin Kroeker | 1a6ea8ee6d | Merge pull request #2338 from kavanabhat/aix_mod | 2019-12-09 17:54:49 +01:00
    Changes to build on AIX in POWER8 mode