Commit Graph

7452 Commits

Author SHA1 Message Date
wjc404 1c67567008
improve skylakex paralleled sgemm performance 2020-01-13 16:26:03 +08:00
Martin Kroeker 4e979bf75b
Merge pull request #2366 from martin-frbg/install390
Add new file lapack.h from LAPACK 3.9.0 to installable headers
2020-01-13 09:00:21 +01:00
Martin Kroeker daa4310db5
Install new lapack.h
new file in LAPACK 3.9.0, split off from lapacke.h
2020-01-12 22:00:50 +01:00
Martin Kroeker b8f3605132
Merge pull request #23 from xianyi/develop
rebase
2020-01-12 21:57:23 +01:00
Martin Kroeker b36018be6d
Merge pull request #2365 from wjc404/develop
Fix SKYLAKEX STRMM issues
2020-01-09 23:23:09 +01:00
wjc404 3a100b2797
Update KERNEL.SKYLAKEX 2020-01-09 13:48:41 +08:00
Martin Kroeker 38742d5547
Merge pull request #2361 from wjc404/develop
Optimize AVX2 SGEMM & STRMM
2020-01-08 16:20:28 +01:00
wjc404 bd4c032f52
Update sgemm_kernel_8x4_haswell.c 2020-01-07 11:22:46 +08:00
wjc404 9dc9b7b95e
Update sgemm_kernel_8x4_haswell.c 2020-01-06 20:11:36 +08:00
wjc404 9f5cdc49d4
Update CONTRIBUTORS.md 2020-01-06 12:28:43 +08:00
wjc404 b7b408a120
optimize AVX2 SGEMM 2020-01-06 12:16:09 +08:00
wjc404 92b10212de
optimize AVX2 SGEMM 2020-01-06 12:11:21 +08:00
wjc404 b73bf01378
optimize AVX2 SGEMM 2020-01-06 12:09:14 +08:00
wjc404 eb3c9f1db9
optimize AVX2 SGEMM 2020-01-06 12:07:02 +08:00
Martin Kroeker fd2ff2714f
Merge pull request #2359 from martin-frbg/lapack-pr330
Apply LAPACKE fix for eigenvector transposition in symmetric eigensolvers
2020-01-03 15:03:30 +01:00
Martin Kroeker 2ea2bd99c7
Apply LAPACKE fix for eigenvector transposition in symmetric eigensolvers
from Reference-LAPACK PR 330
2020-01-03 11:10:00 +01:00
Martin Kroeker fbb894948c
Merge pull request #22 from xianyi/develop
rebase
2020-01-03 10:23:25 +01:00
Martin Kroeker e711659c90
Merge pull request #2358 from shengyang-3390/develop
Test all 7 declared values of N in float and double cblas3 tests
2020-01-03 09:02:03 +01:00
shengyang 893e6e57c4 modified: ctest/din3 ctest/sin3 2020-01-03 10:03:33 +08:00
Martin Kroeker 456ee2e1f0
Merge pull request #2357 from chenxuqiang/dgemm_beta_zero
kernel/arm64/dgemm_beta.S: add beta == zero branch
2020-01-02 22:28:36 +01:00
Martin Kroeker 9998f8ed8b
Merge pull request #2356 from shengyang-3390/develop
Use arm neon instructions to optimize ncopy operation (and enable 7th column in float+complex cblas3 test drivers)
2020-01-02 22:27:44 +01:00
shengyang 80db5f11e1 update 2020-01-02 11:01:57 +08:00
chenxuqiang 52de4cc8fd kernel/arm64/dgemm_beta.S: add beta == zero branch
added beta == zero branch, and no need to load C matrix.

Signed by: Xuqiang Chen <chenxuqiang3@hisilicon.com>
2020-01-01 21:50:45 -05:00
Martin Kroeker 44028581cc
Merge pull request #2355 from Zeyiii/dev-zeyi2
Use arm neon instructions to optimize sgemm_beta operation
2020-01-01 22:14:16 +01:00
Martin Kroeker 86ab939936
Merge pull request #2354 from ZuoQ3/develop
[WIP] Use arm neon instructions to optimize tcopy operation
2020-01-01 22:13:37 +01:00
Martin Kroeker 375b1875c8
[WIP] Update LAPACK to 3.9.0 (#2353)
* Update make.inc entries for LAPACK 3.9.0

Reference-LAPACK PR 347 changed some variable names and relative paths

* Update LAPACK to 3.9.0

* Add new functions from LAPACK 3.9.0

* Add new functions from LAPACK 3.9.0

* Restore LOADER command 

as it makes it easier to specify pthread as needed

* Restore LOADER

* Restore EIG/LIN prefixes in cmdbase

* add binary path to lapack_testing.py call

* Restore OpenMP version check

* Restore OpenMP version check

* Restore fix for out-of-bounds array accesses

from #2096
2020-01-01 13:18:53 +01:00
Martin Kroeker 6c85cb1869
Merge pull request #2352 from wjc404/develop
AVX2 ZGEMM3M kernel
2019-12-31 18:08:10 +01:00
Martin Kroeker 995768bbc5
Merge pull request #2351 from Zeyiii/develop
prefetching for dgemm_beta
2019-12-31 18:07:37 +01:00
int_13h 96ad579428 add in runtime cpu detection for zarch (#2349)
add in runtime cpu detection for zarch
2019-12-31 18:03:27 +01:00
shengyang 8d84403205 Use arm neon instructions to optimize ncopy operation
modified:   KERNEL.ARMV8
	modified:   KERNEL.TSV110
	new file:   sgemm_ncopy_4.S
2019-12-31 17:06:35 +08:00
shengyang 8729db117c modified: ctest/din3
modified:   ctest/sin3
2019-12-31 15:59:52 +08:00
w00421467 0833a4846a Use arm neon instructions to optimize sgemm_beta operation 2019-12-31 10:42:03 +08:00
zq 50f7fc1401 [WIP] Use arm neon instructions to optimize tcopy operation 2019-12-31 10:21:23 +08:00
w00421467 d1b53806be Merge remote-tracking branch 'pub/develop' into develop 2019-12-31 10:13:24 +08:00
wjc404 a0f0a802fc
Update zgemm3m_kernel_4x4_haswell.c 2019-12-30 17:33:42 +08:00
wjc404 700fe5b5ee
Add files via upload 2019-12-30 17:18:59 +08:00
wjc404 bb2729c855
Update CONTRIBUTORS.md 2019-12-30 16:11:37 +08:00
wjc404 aae44d040d
Update CONTRIBUTORS.md 2019-12-30 16:10:08 +08:00
wjc404 6362c34ee6
Update param.h 2019-12-30 16:08:19 +08:00
wjc404 f60840c420
Update KERNEL.ZEN 2019-12-30 16:04:23 +08:00
wjc404 109e18cd96
Update KERNEL.HASWELL 2019-12-30 16:03:24 +08:00
wjc404 ae1579be13
Create zgemm3m_kernel_4x4_haswell.c 2019-12-30 16:02:51 +08:00
w00421467 3ccf8885ac prefetching for dgemm_beta 2019-12-30 11:45:49 +08:00
Martin Kroeker 454847588e
Update LAPACK to 3.9.0 2019-12-29 21:27:18 +01:00
Martin Kroeker 0257f26488
Merge pull request #21 from xianyi/develop
rebase
2019-12-29 18:08:55 +01:00
Martin Kroeker c45b7aef14
Merge pull request #2348 from wjc404/develop
AVX2 CGEMM3M kernel
2019-12-28 20:07:56 +01:00
wjc404 312060d0d6
Update CONTRIBUTORS.md 2019-12-27 23:36:13 +08:00
wjc404 cd765f094b
Update cgemm3m_kernel_8x4_haswell.c 2019-12-27 18:23:29 +08:00
wjc404 64639f440f
Update param.h 2019-12-27 18:06:42 +08:00
wjc404 3a66c8cac1
Update KERNEL.ZEN 2019-12-27 18:04:08 +08:00