Martin Kroeker
b3cbd60d7a
Remove PGI from list again as it is actually still not capable
2020-02-08 10:20:13 +01:00
Martin Kroeker
70199d1905
Merge pull request #2389 from Zeyiii/develop
...
Fix bugs in benchmark of gemv
2020-02-07 16:05:46 +01:00
Martin Kroeker
cfe63d8cc2
Remove OpenMP libraries from link list
2020-02-07 16:03:51 +01:00
Martin Kroeker
d55b10830f
Remove OpenMP libraries from link list
2020-02-07 16:02:17 +01:00
Martin Kroeker
c1c10cbb21
Merge pull request #2384 from wjc404/develop
...
Optimize AVX512 DGEMM (& DTRMM)
2020-02-07 13:47:12 +01:00
Martin Kroeker
5989841524
Add PGI to avx512-supporting compilers
2020-02-07 13:01:31 +01:00
Martin Kroeker
68a43db358
Fix utest compilation with PGI
2020-02-07 10:15:18 +01:00
Martin Kroeker
9694037b23
Set SUFFIX in tempfile commands, fix bad architecture option for PGI compiler in avx512 test
2020-02-07 10:09:25 +01:00
Martin Kroeker
71faa1c1a7
Merge pull request #24 from xianyi/develop
...
rebase
2020-02-07 10:03:02 +01:00
wjc404
3447d04eaf
Update dgemm_kernel_16x2_skylakex.c
2020-02-06 02:14:10 +00:00
wjc404
8b5cdcc64c
Update sgemm_kernel_8x4_haswell.c
2020-02-06 01:47:46 +00:00
wjc404
4e00d96a78
Update dgemm_kernel_16x2_skylakex.c
2020-02-06 01:46:36 +00:00
w00421467
ce9ea8f826
Fix another branch
2020-02-05 15:07:18 +08:00
w00421467
0b909203cb
Fix bugs in benchmark of gemv
2020-02-05 14:53:37 +08:00
wjc404
096da2f51a
Update dgemm_kernel_16x2_skylakex.c
2020-02-05 13:36:57 +08:00
wjc404
2f96a2c55b
Update trmm_R.c
2020-02-05 10:15:02 +08:00
wjc404
833bd0f8ff
Update trmm_L.c
2020-02-05 10:09:41 +08:00
wjc404
77b8f49556
Update level3_thread.c
2020-02-04 20:33:08 +08:00
wjc404
1c3e20ce48
Update level3.c
2020-02-04 20:30:23 +08:00
wjc404
83b6be7976
Update param.h
2020-02-04 19:55:26 +08:00
wjc404
081b188529
Update KERNEL.SKYLAKEX
2020-02-03 21:38:08 +08:00
wjc404
f3f969f681
Update param.h
2020-02-03 21:34:12 +08:00
wjc404
8019e70211
AVX512 16x2 DGEMM kernel
2020-02-03 21:32:56 +08:00
Martin Kroeker
8d2a796f49
Merge pull request #2378 from martin-frbg/issue2377
...
Add -march option for AVX512 in cmake as well
2020-01-30 17:07:19 +01:00
Martin Kroeker
8dc9fd4dfe
Add -march option for AVX512
2020-01-30 12:41:18 +01:00
Martin Kroeker
abc67bdd74
Merge pull request #2375 from ewanglong/master
...
fix a few performance drops for some matrix sizes per data type
2020-01-30 10:27:29 +01:00
Martin Kroeker
1f62a82789
Merge pull request #2376 from wjc404/develop
...
Fix remaining bugs in parallel GEMM3M
2020-01-23 21:50:19 +01:00
wjc404
e9fb8f62b1
Update level3_gemm3m_thread.c
2020-01-22 17:40:03 +00:00
Wang,Long
fbf4f48f4a
fix a few performance drops for some matrix sizes per data type
...
Signed-off-by: Wang,Long <long1.wang@intel.com>
2020-01-22 15:15:04 +00:00
Martin Kroeker
b9ad450295
Merge pull request #2373 from Qiyu8/optimize#gemmbeta
...
Optimize general Gemm Beta
2020-01-21 15:05:38 +01:00
Martin Kroeker
e011ad820a
Merge pull request #2372 from martin-frbg/winexit
...
Do not run any cleanup if the program is exiting anyway
2020-01-21 14:56:45 +01:00
Qiyu8
ff42e68652
Optimize general Gemm Beta
2020-01-20 11:49:42 +08:00
Martin Kroeker
23f322f997
Do not run any cleanup if the program is exiting anyway
...
From keno's PR #2350 - this avoids the potential hang in blas_thread_shutdown where we may wait for threads to exit while they are waiting on the loader lock from DllMain
2020-01-19 13:28:27 +01:00
Martin Kroeker
093d37de8d
Merge pull request #2371 from martin-frbg/issue2370
...
Free Windows thread memory with MEM_RELEASE rather than MEM_DECOMMIT
2020-01-18 20:39:34 +01:00
Martin Kroeker
d65e9a2bbd
Merge pull request #2253 from thrasibule/xerbla
...
fix error messages
2020-01-18 20:39:04 +01:00
Martin Kroeker
78100b8093
Free Windows thread memory with MEM_RELEASE rather than MEM_DECOMMIT
...
as suggested by hjmndv in #2370
2020-01-18 15:06:39 +01:00
Martin Kroeker
70f45749b9
Merge pull request #2367 from wjc404/develop
...
Improve paralleled SGEMM performance on SKYLAKEX CPUs
2020-01-15 21:13:43 +01:00
wjc404
e5dcdeb550
Update sgemm_direct_skylakex.c
2020-01-13 16:59:23 +08:00
wjc404
952cc2ba38
Update sgemm_kernel_16x4_skylakex_2.c
2020-01-13 16:58:54 +08:00
wjc404
feaafbedd3
make skylakex sgemm code more friendly for readers
...
BTW some kernels were adjusted to improve performance
2020-01-13 16:28:41 +08:00
wjc404
1c67567008
improve skylakex paralleled sgemm performance
2020-01-13 16:26:03 +08:00
Martin Kroeker
4e979bf75b
Merge pull request #2366 from martin-frbg/install390
...
Add new file lapack.h from LAPACK 3.9.0 to installable headers
2020-01-13 09:00:21 +01:00
Martin Kroeker
daa4310db5
Install new lapack.h
...
new file in LAPACK 3.9.0, split off from lapacke.h
2020-01-12 22:00:50 +01:00
Martin Kroeker
b8f3605132
Merge pull request #23 from xianyi/develop
...
rebase
2020-01-12 21:57:23 +01:00
Martin Kroeker
b36018be6d
Merge pull request #2365 from wjc404/develop
...
Fix SKYLAKEX STRMM issues
2020-01-09 23:23:09 +01:00
wjc404
3a100b2797
Update KERNEL.SKYLAKEX
2020-01-09 13:48:41 +08:00
Martin Kroeker
38742d5547
Merge pull request #2361 from wjc404/develop
...
Optimize AVX2 SGEMM & STRMM
2020-01-08 16:20:28 +01:00
wjc404
bd4c032f52
Update sgemm_kernel_8x4_haswell.c
2020-01-07 11:22:46 +08:00
wjc404
9dc9b7b95e
Update sgemm_kernel_8x4_haswell.c
2020-01-06 20:11:36 +08:00
wjc404
9f5cdc49d4
Update CONTRIBUTORS.md
2020-01-06 12:28:43 +08:00