Commit Graph

1336 Commits

Author SHA1 Message Date
Martin Kroeker cafdd999b8
Update caxpy_power8.S 2020-02-13 22:44:09 +01:00
Martin Kroeker 92ca92a46c
Update caxpy_power8.S 2020-02-13 21:24:54 +01:00
Martin Kroeker 486c35c5dc
Update icamin_power8.S 2020-02-13 18:38:43 +01:00
Martin Kroeker 5ba3699f41
Update isamin_power8.S 2020-02-13 00:00:32 +01:00
Martin Kroeker 8eefa530cd
Update isamax_power8.S 2020-02-12 23:59:50 +01:00
Martin Kroeker de40d47edf
Update isamin_power8.S 2020-02-12 23:57:48 +01:00
Martin Kroeker 7c162b8a21
Update isamax_power8.S 2020-02-12 23:56:57 +01:00
Martin Kroeker 0544cbc806
Fix syntax of endianness conditional 2020-02-12 20:00:29 +01:00
Martin Kroeker 120d20731f
Fix syntax of endianness conditional 2020-02-12 19:58:42 +01:00
Martin Kroeker dc345d84df
Fix syntax of endianness conditional and add gcc version check for workaround 2020-02-12 19:56:52 +01:00
Bart Oldeman 7ea5e07d1c Fix inline asm in dscal: mark x, x1 as clobbered. Fixes #2408
The leaq instructions in dscal_kernel_inc_8 modify x and x1 so they
must be declared as input/output constraints, otherwise the compiler
may assume the corresponding registers are not modified.
2020-02-12 14:11:44 +00:00
Martin Kroeker 7e5cbb6f35
Fix bad conditional syntax that caused spurious application of USE_TRMM 2020-02-10 21:17:39 +01:00
wjc404 3447d04eaf
Update dgemm_kernel_16x2_skylakex.c 2020-02-06 02:14:10 +00:00
wjc404 8b5cdcc64c
Update sgemm_kernel_8x4_haswell.c 2020-02-06 01:47:46 +00:00
wjc404 4e00d96a78
Update dgemm_kernel_16x2_skylakex.c 2020-02-06 01:46:36 +00:00
wjc404 096da2f51a
Update dgemm_kernel_16x2_skylakex.c 2020-02-05 13:36:57 +08:00
wjc404 081b188529
Update KERNEL.SKYLAKEX 2020-02-03 21:38:08 +08:00
wjc404 8019e70211
AVX512 16x2 DGEMM kernel 2020-02-03 21:32:56 +08:00
Qiyu8 ff42e68652 Optimize genenal Gemm Beta 2020-01-20 11:49:42 +08:00
Martin Kroeker 70f45749b9
Merge pull request #2367 from wjc404/develop
Improve paralleled SGEMM performance on SKYLAKEX CPUs
2020-01-15 21:13:43 +01:00
wjc404 e5dcdeb550
Update sgemm_direct_skylakex.c 2020-01-13 16:59:23 +08:00
wjc404 952cc2ba38
Update sgemm_kernel_16x4_skylakex_2.c 2020-01-13 16:58:54 +08:00
wjc404 feaafbedd3
make skylakex sgemm code more friendly for readers
BTW some kernels were adjusted to improve performance
2020-01-13 16:28:41 +08:00
Martin Kroeker b36018be6d
Merge pull request #2365 from wjc404/develop
Fix SKYLAKEX STRMM issues
2020-01-09 23:23:09 +01:00
wjc404 3a100b2797
Update KERNEL.SKYLAKEX 2020-01-09 13:48:41 +08:00
Martin Kroeker 38742d5547
Merge pull request #2361 from wjc404/develop
Optimize AVX2 SGEMM & STRMM
2020-01-08 16:20:28 +01:00
wjc404 bd4c032f52
Update sgemm_kernel_8x4_haswell.c 2020-01-07 11:22:46 +08:00
wjc404 9dc9b7b95e
Update sgemm_kernel_8x4_haswell.c 2020-01-06 20:11:36 +08:00
wjc404 92b10212de
optimize AVX2 SGEMM 2020-01-06 12:11:21 +08:00
wjc404 b73bf01378
optimize AVX2 SGEMM 2020-01-06 12:09:14 +08:00
wjc404 eb3c9f1db9
optimize AVX2 SGEMM 2020-01-06 12:07:02 +08:00
Martin Kroeker 456ee2e1f0
Merge pull request #2357 from chenxuqiang/dgemm_beta_zero
kernel/arm64/dgemm_beta.S: add beta == zero branch
2020-01-02 22:28:36 +01:00
shengyang 80db5f11e1 update 2020-01-02 11:01:57 +08:00
chenxuqiang 52de4cc8fd kernel/arm64/dgemm_beta.S: add beta == zero branch
added beta == zero branch, and no need to load C matrix.

Signed by: Xuqiang Chen <chenxuqiang3@hisilicon.com>
2020-01-01 21:50:45 -05:00
Martin Kroeker 44028581cc
Merge pull request #2355 from Zeyiii/dev-zeyi2
Use arm neon instructions to optimize sgemm_beta operation
2020-01-01 22:14:16 +01:00
Martin Kroeker 86ab939936
Merge pull request #2354 from ZuoQ3/develop
[WIP] Use arm neon instructions to optimize tcopy operation
2020-01-01 22:13:37 +01:00
Martin Kroeker 6c85cb1869
Merge pull request #2352 from wjc404/develop
AVX2 ZGEMM3M kernel
2019-12-31 18:08:10 +01:00
Martin Kroeker 995768bbc5
Merge pull request #2351 from Zeyiii/develop
prefetching for dgemm_beta
2019-12-31 18:07:37 +01:00
int_13h 96ad579428 add in runtime cpu detection for zarch (#2349)
add in runtime cpu detection for zarch
2019-12-31 18:03:27 +01:00
shengyang 8d84403205 Use arm neon instructions to optimize ncopy operation
modified:   KERNEL.ARMV8
	modified:   KERNEL.TSV110
	new file:   sgemm_ncopy_4.S
2019-12-31 17:06:35 +08:00
w00421467 0833a4846a Use arm neon instructions to optimize sgemm_beta operation 2019-12-31 10:42:03 +08:00
zq 50f7fc1401 [WIP] Use arm neon instructions to optimize tcopy operation 2019-12-31 10:21:23 +08:00
w00421467 d1b53806be Merge remote-tracking branch 'pub/develop' into develop 2019-12-31 10:13:24 +08:00
wjc404 a0f0a802fc
Update zgemm3m_kernel_4x4_haswell.c 2019-12-30 17:33:42 +08:00
wjc404 700fe5b5ee
Add files via upload 2019-12-30 17:18:59 +08:00
wjc404 f60840c420
Update KERNEL.ZEN 2019-12-30 16:04:23 +08:00
wjc404 109e18cd96
Update KERNEL.HASWELL 2019-12-30 16:03:24 +08:00
wjc404 ae1579be13
Create zgemm3m_kernel_4x4_haswell.c 2019-12-30 16:02:51 +08:00
w00421467 3ccf8885ac prefetching for dgemm_beta 2019-12-30 11:45:49 +08:00
wjc404 cd765f094b
Update cgemm3m_kernel_8x4_haswell.c 2019-12-27 18:23:29 +08:00