Commit Graph

4151 Commits

Author SHA1 Message Date
Martin Kroeker
169be3f097 Merge pull request #2403 from martin-frbg/issue2400
Fix coretype identification of Intel Cannon Lake, Ice Lake and Goldmont
2020-02-11 13:00:16 +01:00
Martin Kroeker
6ccbb089c2 Merge pull request #2402 from gxw-loongson/develop
Avoid printing the following information on mips and mips64 when check msa
2020-02-11 12:59:53 +01:00
Martin Kroeker
59ebe3636a Merge pull request #2399 from martin-frbg/buffersize
Make BUFFER_SIZE configurable at build time
2020-02-11 12:56:56 +01:00
Martin Kroeker
303bdb673b Fix coretype detection for Intel extended models 6 and 7
affecting Goldmont, Cannon Lake, Ice Lake autodetection
2020-02-10 19:17:32 +01:00
gxw
754433f420 Avoid printing the following information on mips and mips64 when check msa:
"unrecognized command line option ‘-mmsa’"
2020-02-10 19:11:45 +08:00
Martin Kroeker
7f0d523b42 Make BUFFER_SIZE configurable 2020-02-09 23:32:57 +01:00
Martin Kroeker
c353d8b106 Make BUFFER_SIZE configurable 2020-02-09 23:30:22 +01:00
Martin Kroeker
579be3aa9d Add configuration option for BUFFER_SIZE 2020-02-09 23:28:04 +01:00
Martin Kroeker
449e8ea443 Merge pull request #26 from xianyi/develop
rebase
2020-02-09 23:23:55 +01:00
Martin Kroeker
3bec250cf9 Increment version to 0.3.9.dev 2020-02-09 23:18:44 +01:00
Martin Kroeker
f03dd23e90 Increment version to 0.3.9.dev 2020-02-09 23:18:07 +01:00
Martin Kroeker
fa93d63365 Merge branch 'release-0.3.0' into develop 2020-02-09 23:16:06 +01:00
Martin Kroeker
90e6c66a57 Merge pull request #2397 from martin-frbg/038changes
Update Changelog with changes from 0.3.8
2020-02-09 23:01:52 +01:00
Martin Kroeker
32d97330b3 Update with changes from 0.3.8 2020-02-09 23:00:36 +01:00
Martin Kroeker
29eaf4b6d7 Merge pull request #25 from xianyi/develop
rebase
2020-02-09 22:48:15 +01:00
Martin Kroeker
47c1bf7f4d typo fixes 2020-02-09 01:06:40 +01:00
Martin Kroeker
2b55f0ad30 Merge pull request #2393 from martin-frbg/issue2388
Provide more documentation in README.md
2020-02-09 01:00:33 +01:00
Martin Kroeker
a5b32ab06c Merge pull request #2390 from martin-frbg/pgi
Small corrections for compilation with PGI compilers
2020-02-09 00:13:40 +01:00
Martin Kroeker
50545b19d0 Update CPU and OS support and document DYNAMIC_ARCH option in README.md
prompted by #2388
2020-02-09 00:06:07 +01:00
Martin Kroeker
b3cbd60d7a Remove PGI from list again as it is actually still not capable 2020-02-08 10:20:13 +01:00
Martin Kroeker
70199d1905 Merge pull request #2389 from Zeyiii/develop
Fix bugs in benchmark of gemv
2020-02-07 16:05:46 +01:00
Martin Kroeker
cfe63d8cc2 Remove OpenMP libraries from link list 2020-02-07 16:03:51 +01:00
Martin Kroeker
d55b10830f Remove OpenMP libraries from link list 2020-02-07 16:02:17 +01:00
Martin Kroeker
c1c10cbb21 Merge pull request #2384 from wjc404/develop
Optimize AVX512 DGEMM (& DTRMM)
2020-02-07 13:47:12 +01:00
Martin Kroeker
5989841524 Add PGI to avx512-supporting compilers 2020-02-07 13:01:31 +01:00
Martin Kroeker
68a43db358 Fix utest compilation with PGI 2020-02-07 10:15:18 +01:00
Martin Kroeker
9694037b23 Set SUFFIX in tempfile commands, fix bad architecture option for PGI compiler in avx512 test 2020-02-07 10:09:25 +01:00
Martin Kroeker
71faa1c1a7 Merge pull request #24 from xianyi/develop
rebase
2020-02-07 10:03:02 +01:00
wjc404
3447d04eaf Update dgemm_kernel_16x2_skylakex.c 2020-02-06 02:14:10 +00:00
wjc404
8b5cdcc64c Update sgemm_kernel_8x4_haswell.c 2020-02-06 01:47:46 +00:00
wjc404
4e00d96a78 Update dgemm_kernel_16x2_skylakex.c 2020-02-06 01:46:36 +00:00
w00421467
ce9ea8f826 Fix another branch 2020-02-05 15:07:18 +08:00
w00421467
0b909203cb Fix bugs in benchmark of gemv 2020-02-05 14:53:37 +08:00
wjc404
096da2f51a Update dgemm_kernel_16x2_skylakex.c 2020-02-05 13:36:57 +08:00
wjc404
2f96a2c55b Update trmm_R.c 2020-02-05 10:15:02 +08:00
wjc404
833bd0f8ff Update trmm_L.c 2020-02-05 10:09:41 +08:00
wjc404
77b8f49556 Update level3_thread.c 2020-02-04 20:33:08 +08:00
wjc404
1c3e20ce48 Update level3.c 2020-02-04 20:30:23 +08:00
wjc404
83b6be7976 Update param.h 2020-02-04 19:55:26 +08:00
wjc404
081b188529 Update KERNEL.SKYLAKEX 2020-02-03 21:38:08 +08:00
wjc404
f3f969f681 Update param.h 2020-02-03 21:34:12 +08:00
wjc404
8019e70211 AVX512 16x2 DGEMM kernel 2020-02-03 21:32:56 +08:00
Martin Kroeker
8d2a796f49 Merge pull request #2378 from martin-frbg/issue2377
Add -march option for AVX512 in cmake as well
2020-01-30 17:07:19 +01:00
Martin Kroeker
8dc9fd4dfe Add -march option for AVX512 2020-01-30 12:41:18 +01:00
Martin Kroeker
abc67bdd74 Merge pull request #2375 from ewanglong/master
fix a few performance drop in some matrix size per data type
2020-01-30 10:27:29 +01:00
Martin Kroeker
1f62a82789 Merge pull request #2376 from wjc404/develop
Fix remaining bugs in parallel GEMM3M
2020-01-23 21:50:19 +01:00
wjc404
e9fb8f62b1 Update level3_gemm3m_thread.c 2020-01-22 17:40:03 +00:00
Wang,Long
fbf4f48f4a fix a few performance drop in some matrix size per data type
Signed-off-by: Wang,Long <long1.wang@intel.com>
2020-01-22 15:15:04 +00:00
Martin Kroeker
b9ad450295 Merge pull request #2373 from Qiyu8/optimize#gemmbeta
Optimize general Gemm Beta
2020-01-21 15:05:38 +01:00
Martin Kroeker
e011ad820a Merge pull request #2372 from martin-frbg/winexit
Do not run any cleanup if the program is exiting anyway
2020-01-21 14:56:45 +01:00