Commit Graph

1313 Commits

Author SHA1 Message Date
Martin Kroeker b36018be6d
Merge pull request #2365 from wjc404/develop
Fix SKYLAKEX STRMM issues
2020-01-09 23:23:09 +01:00
wjc404 3a100b2797
Update KERNEL.SKYLAKEX 2020-01-09 13:48:41 +08:00
Martin Kroeker 38742d5547
Merge pull request #2361 from wjc404/develop
Optimize AVX2 SGEMM & STRMM
2020-01-08 16:20:28 +01:00
wjc404 bd4c032f52
Update sgemm_kernel_8x4_haswell.c 2020-01-07 11:22:46 +08:00
wjc404 9dc9b7b95e
Update sgemm_kernel_8x4_haswell.c 2020-01-06 20:11:36 +08:00
wjc404 92b10212de
optimize AVX2 SGEMM 2020-01-06 12:11:21 +08:00
wjc404 b73bf01378
optimize AVX2 SGEMM 2020-01-06 12:09:14 +08:00
wjc404 eb3c9f1db9
optimize AVX2 SGEMM 2020-01-06 12:07:02 +08:00
Martin Kroeker 456ee2e1f0
Merge pull request #2357 from chenxuqiang/dgemm_beta_zero
kernel/arm64/dgemm_beta.S: add beta == zero branch
2020-01-02 22:28:36 +01:00
shengyang 80db5f11e1 update 2020-01-02 11:01:57 +08:00
chenxuqiang 52de4cc8fd kernel/arm64/dgemm_beta.S: add beta == zero branch
added beta == zero branch, and no need to load C matrix.

Signed by: Xuqiang Chen <chenxuqiang3@hisilicon.com>
2020-01-01 21:50:45 -05:00
Martin Kroeker 44028581cc
Merge pull request #2355 from Zeyiii/dev-zeyi2
Use arm neon instructions to optimize sgemm_beta operation
2020-01-01 22:14:16 +01:00
Martin Kroeker 86ab939936
Merge pull request #2354 from ZuoQ3/develop
[WIP] Use arm neon instructions to optimize tcopy operation
2020-01-01 22:13:37 +01:00
Martin Kroeker 6c85cb1869
Merge pull request #2352 from wjc404/develop
AVX2 ZGEMM3M kernel
2019-12-31 18:08:10 +01:00
Martin Kroeker 995768bbc5
Merge pull request #2351 from Zeyiii/develop
prefetching for dgemm_beta
2019-12-31 18:07:37 +01:00
int_13h 96ad579428 add in runtime cpu detection for zarch (#2349)
add in runtime cpu detection for zarch
2019-12-31 18:03:27 +01:00
shengyang 8d84403205 Use arm neon instructions to optimize ncopy operation
modified: KERNEL.ARMV8
modified: KERNEL.TSV110
new file: sgemm_ncopy_4.S
2019-12-31 17:06:35 +08:00
w00421467 0833a4846a Use arm neon instructions to optimize sgemm_beta operation 2019-12-31 10:42:03 +08:00
zq 50f7fc1401 [WIP] Use arm neon instructions to optimize tcopy operation 2019-12-31 10:21:23 +08:00
w00421467 d1b53806be Merge remote-tracking branch 'pub/develop' into develop 2019-12-31 10:13:24 +08:00
wjc404 a0f0a802fc
Update zgemm3m_kernel_4x4_haswell.c 2019-12-30 17:33:42 +08:00
wjc404 700fe5b5ee
Add files via upload 2019-12-30 17:18:59 +08:00
wjc404 f60840c420
Update KERNEL.ZEN 2019-12-30 16:04:23 +08:00
wjc404 109e18cd96
Update KERNEL.HASWELL 2019-12-30 16:03:24 +08:00
wjc404 ae1579be13
Create zgemm3m_kernel_4x4_haswell.c 2019-12-30 16:02:51 +08:00
w00421467 3ccf8885ac prefetching for dgemm_beta 2019-12-30 11:45:49 +08:00
wjc404 cd765f094b
Update cgemm3m_kernel_8x4_haswell.c 2019-12-27 18:23:29 +08:00
wjc404 3a66c8cac1
Update KERNEL.ZEN 2019-12-27 18:04:08 +08:00
wjc404 ed9af2f7da
Update KERNEL.HASWELL 2019-12-27 18:01:38 +08:00
wjc404 5fd1edead9
Create cgemm3m_kernel_8x4_haswell.c 2019-12-27 18:00:55 +08:00
wjc404 eeecd623d8
Update cgemm_kernel_8x2_haswell.c 2019-12-24 00:40:16 +08:00
wjc404 2cd9306bb5
Update KERNEL.ZEN 2019-12-23 23:42:30 +08:00
wjc404 c418c81224
Update KERNEL.HASWELL 2019-12-23 23:41:44 +08:00
wjc404 025741f16a
Fast Haswell CGEMM kernel 2019-12-23 23:40:03 +08:00
wjc404 f41d52665d
Fast Haswell ZGEMM kernel 2019-12-21 14:37:06 +08:00
wjc404 d573d24de7
Fast Haswell ZGEMM kernel 2019-12-21 14:35:15 +08:00
w00421467 b7cc69ee62 declare DGEMM_BETA in KERNEL.ARMV8 rather than the generic KERNEL 2019-12-20 10:11:50 +08:00
w00421467 aeef942c4f use arm neon instructions to optimize gemm beta operation 2019-12-17 10:00:13 +08:00
Martin Kroeker 1a6ea8ee6d
Merge pull request #2338 from kavanabhat/aix_mod
Changes to build on AIX in POWER8 mode
2019-12-09 17:54:49 +01:00
Kavana Bhat 6baa9b07d7 AIX changes for Power8 2019-12-06 04:33:32 -06:00
Kavana Bhat 3938e59569 AIX changes for Power8 2019-12-04 00:23:46 -06:00
Isuru Fernando b863b32ac5 Workaround an ICE in clang 9.0.0
This bug is not there in 8.x nor in the 9.0 daily snapshot.
2019-12-01 12:59:46 -06:00
Martin Kroeker dd04143d4a
Merge pull request #2328 from martin-frbg/ppc9
Fix precompiled kernels on POWER9 and make their use conditional on (old) gcc version
2019-11-30 12:23:57 +01:00
Martin Kroeker f3a6164bff
Merge pull request #2324 from antonblanchard/power9_segv
Fix SEGV in cdot_power9
2019-11-30 00:03:42 +01:00
Martin Kroeker dedd822d1a
Fix caxpy/caxpyc naming in localentry 2019-11-29 23:56:57 +01:00
Martin Kroeker 2181fb7047
Fix caxpy/caxpyc naming in localentry 2019-11-29 23:54:15 +01:00
Martin Kroeker a9b62c03f8
Substitute precompiled gcc7 codes only when gcc is older than 9.x 2019-11-29 23:49:50 +01:00
Martin Kroeker 97762234f9
Add variable for gcc >=9 test
used in KERNEL.POWER9
2019-11-29 23:47:23 +01:00
wjc404 934e601e93
Update dgemm_kernel_4x8_skylakex_2.c 2019-11-28 19:56:35 +08:00
Anton Blanchard cf2a8e410c Fix SEGV in cdot_power9
We were corrupting r2 because the local entry wasn't being
set up correctly.
2019-11-26 21:55:04 -07:00