4617 Commits

Author SHA1 Message Date
wjc404 2f96a2c55b
Update trmm_R.c 2020-02-05 10:15:02 +08:00
wjc404 833bd0f8ff
Update trmm_L.c 2020-02-05 10:09:41 +08:00
wjc404 77b8f49556
Update level3_thread.c 2020-02-04 20:33:08 +08:00
wjc404 1c3e20ce48
Update level3.c 2020-02-04 20:30:23 +08:00
wjc404 83b6be7976
Update param.h 2020-02-04 19:55:26 +08:00
wjc404 081b188529
Update KERNEL.SKYLAKEX 2020-02-03 21:38:08 +08:00
wjc404 f3f969f681
Update param.h 2020-02-03 21:34:12 +08:00
wjc404 8019e70211
AVX512 16x2 DGEMM kernel 2020-02-03 21:32:56 +08:00
Martin Kroeker 8d2a796f49
Merge pull request #2378 from martin-frbg/issue2377
Add -march option for AVX512 in cmake as well
2020-01-30 17:07:19 +01:00
Martin Kroeker 8dc9fd4dfe
Add -march option for AVX512 2020-01-30 12:41:18 +01:00
Martin Kroeker abc67bdd74
Merge pull request #2375 from ewanglong/master
fix a few performance drop in some matrix size per data type
2020-01-30 10:27:29 +01:00
Martin Kroeker 1f62a82789
Merge pull request #2376 from wjc404/develop
Fix remaining bugs in parallel GEMM3M
2020-01-23 21:50:19 +01:00
wjc404 e9fb8f62b1
Update level3_gemm3m_thread.c 2020-01-22 17:40:03 +00:00
Wang,Long fbf4f48f4a
fix a few performance drop in some matrix size per data type
Signed-off-by: Wang,Long <long1.wang@intel.com>
2020-01-22 15:15:04 +00:00
Martin Kroeker b9ad450295
Merge pull request #2373 from Qiyu8/optimize#gemmbeta
Optimize genenal Gemm Beta
2020-01-21 15:05:38 +01:00
Martin Kroeker e011ad820a
Merge pull request #2372 from martin-frbg/winexit
Do not run any cleanup if the program is exiting anyway
2020-01-21 14:56:45 +01:00
Qiyu8 ff42e68652
Optimize genenal Gemm Beta 2020-01-20 11:49:42 +08:00
Martin Kroeker 23f322f997
Do not run any cleanup if the program is exiting anyway
From keno's PR #2350 - this avoids the potential hang in blas_thread_shutdown where we may wait for threads to exit while they are waiting on the loader lock from DllMain
2020-01-19 13:28:27 +01:00
Martin Kroeker 093d37de8d
Merge pull request #2371 from martin-frbg/issue2370
Free Windows thread memory with MEM_RELEASE rather than MEM_DECOMMIT
2020-01-18 20:39:34 +01:00
Martin Kroeker d65e9a2bbd
Merge pull request #2253 from thrasibule/xerbla
fix error messages
2020-01-18 20:39:04 +01:00
Martin Kroeker 78100b8093
Free Windows thread memory with MEM_RELEASE rather than MEM_DECOMMIT
as suggested by hjmndv in #2370
2020-01-18 15:06:39 +01:00
Martin Kroeker 70f45749b9
Merge pull request #2367 from wjc404/develop
Improve paralleled SGEMM performance on SKYLAKEX CPUs
2020-01-15 21:13:43 +01:00
wjc404 e5dcdeb550
Update sgemm_direct_skylakex.c 2020-01-13 16:59:23 +08:00
wjc404 952cc2ba38
Update sgemm_kernel_16x4_skylakex_2.c 2020-01-13 16:58:54 +08:00
wjc404 feaafbedd3
make skylakex sgemm code more friendly for readers
BTW some kernels were adjusted to improve performance
2020-01-13 16:28:41 +08:00
wjc404 1c67567008
improve skylakex paralleled sgemm performance 2020-01-13 16:26:03 +08:00
Martin Kroeker 4e979bf75b
Merge pull request #2366 from martin-frbg/install390
Add new file lapack.h from LAPACK 3.9.0 to installable headers
2020-01-13 09:00:21 +01:00
Martin Kroeker daa4310db5
Install new lapack.h
new file in LAPACK 3.9.0, split off from lapacke.h
2020-01-12 22:00:50 +01:00
Martin Kroeker b8f3605132
Merge pull request #23 from xianyi/develop
rebase
2020-01-12 21:57:23 +01:00
Martin Kroeker b36018be6d
Merge pull request #2365 from wjc404/develop
Fix SKYLAKEX STRMM issues
2020-01-09 23:23:09 +01:00
wjc404 3a100b2797
Update KERNEL.SKYLAKEX 2020-01-09 13:48:41 +08:00
Martin Kroeker 38742d5547
Merge pull request #2361 from wjc404/develop
Optimize AVX2 SGEMM & STRMM
2020-01-08 16:20:28 +01:00
wjc404 bd4c032f52
Update sgemm_kernel_8x4_haswell.c 2020-01-07 11:22:46 +08:00
wjc404 9dc9b7b95e
Update sgemm_kernel_8x4_haswell.c 2020-01-06 20:11:36 +08:00
wjc404 9f5cdc49d4
Update CONTRIBUTORS.md 2020-01-06 12:28:43 +08:00
wjc404 b7b408a120
optimize AVX2 SGEMM 2020-01-06 12:16:09 +08:00
wjc404 92b10212de
optimize AVX2 SGEMM 2020-01-06 12:11:21 +08:00
wjc404 b73bf01378
optimize AVX2 SGEMM 2020-01-06 12:09:14 +08:00
wjc404 eb3c9f1db9
optimize AVX2 SGEMM 2020-01-06 12:07:02 +08:00
Martin Kroeker fd2ff2714f
Merge pull request #2359 from martin-frbg/lapack-pr330
Apply LAPACKE fix for eigenvector transposition in symmetric eigensolvers
2020-01-03 15:03:30 +01:00
Martin Kroeker 2ea2bd99c7
Apply LAPACKE fix for eigenvector transposition in symmetric eigensolvers
from Reference-LAPACK PR 330
2020-01-03 11:10:00 +01:00
Martin Kroeker fbb894948c
Merge pull request #22 from xianyi/develop
rebase
2020-01-03 10:23:25 +01:00
Martin Kroeker e711659c90
Merge pull request #2358 from shengyang-3390/develop
Test all 7 declared values of N in float and double cblas3 tests
2020-01-03 09:02:03 +01:00
shengyang 893e6e57c4
modified: ctest/din3 ctest/sin3 2020-01-03 10:03:33 +08:00
Martin Kroeker 456ee2e1f0
Merge pull request #2357 from chenxuqiang/dgemm_beta_zero
kernel/arm64/dgemm_beta.S: add beta == zero branch
2020-01-02 22:28:36 +01:00
Martin Kroeker 9998f8ed8b
Merge pull request #2356 from shengyang-3390/develop
Use arm neon instructions to optimize ncopy operation (and enable 7th column in float+complex cblas3 test drivers)
2020-01-02 22:27:44 +01:00
shengyang 80db5f11e1
update 2020-01-02 11:01:57 +08:00
chenxuqiang 52de4cc8fd
kernel/arm64/dgemm_beta.S: add beta == zero branch
added beta == zero branch, and no need to load C matrix.

Signed by: Xuqiang Chen <chenxuqiang3@hisilicon.com>
2020-01-01 21:50:45 -05:00
Martin Kroeker 44028581cc
Merge pull request #2355 from Zeyiii/dev-zeyi2
Use arm neon instructions to optimize sgemm_beta operation
2020-01-01 22:14:16 +01:00
Martin Kroeker 86ab939936
Merge pull request #2354 from ZuoQ3/develop
[WIP] Use arm neon instructions to optimize tcopy operation
2020-01-01 22:13:37 +01:00