Commit Graph

4058 Commits

Author SHA1 Message Date
wjc404
e5dcdeb550 Update sgemm_direct_skylakex.c 2020-01-13 16:59:23 +08:00
wjc404
952cc2ba38 Update sgemm_kernel_16x4_skylakex_2.c 2020-01-13 16:58:54 +08:00
wjc404
feaafbedd3 make skylakex sgemm code more friendly for readers
BTW some kernels were adjusted to improve performance
2020-01-13 16:28:41 +08:00
wjc404
1c67567008 improve skylakex paralleled sgemm performance 2020-01-13 16:26:03 +08:00
wjc404
3a100b2797 Update KERNEL.SKYLAKEX 2020-01-09 13:48:41 +08:00
wjc404
bd4c032f52 Update sgemm_kernel_8x4_haswell.c 2020-01-07 11:22:46 +08:00
wjc404
9dc9b7b95e Update sgemm_kernel_8x4_haswell.c 2020-01-06 20:11:36 +08:00
wjc404
9f5cdc49d4 Update CONTRIBUTORS.md 2020-01-06 12:28:43 +08:00
wjc404
b7b408a120 optimize AVX2 SGEMM 2020-01-06 12:16:09 +08:00
wjc404
92b10212de optimize AVX2 SGEMM 2020-01-06 12:11:21 +08:00
wjc404
b73bf01378 optimize AVX2 SGEMM 2020-01-06 12:09:14 +08:00
wjc404
eb3c9f1db9 optimize AVX2 SGEMM 2020-01-06 12:07:02 +08:00
wjc404
a0f0a802fc Update zgemm3m_kernel_4x4_haswell.c 2019-12-30 17:33:42 +08:00
wjc404
700fe5b5ee Add files via upload 2019-12-30 17:18:59 +08:00
wjc404
bb2729c855 Update CONTRIBUTORS.md 2019-12-30 16:11:37 +08:00
wjc404
aae44d040d Update CONTRIBUTORS.md 2019-12-30 16:10:08 +08:00
wjc404
6362c34ee6 Update param.h 2019-12-30 16:08:19 +08:00
wjc404
f60840c420 Update KERNEL.ZEN 2019-12-30 16:04:23 +08:00
wjc404
109e18cd96 Update KERNEL.HASWELL 2019-12-30 16:03:24 +08:00
wjc404
ae1579be13 Create zgemm3m_kernel_4x4_haswell.c 2019-12-30 16:02:51 +08:00
wjc404
312060d0d6 Update CONTRIBUTORS.md 2019-12-27 23:36:13 +08:00
wjc404
cd765f094b Update cgemm3m_kernel_8x4_haswell.c 2019-12-27 18:23:29 +08:00
wjc404
64639f440f Update param.h 2019-12-27 18:06:42 +08:00
wjc404
3a66c8cac1 Update KERNEL.ZEN 2019-12-27 18:04:08 +08:00
wjc404
4c35b8dbaa Update gemm3m_level3.c 2019-12-27 18:03:01 +08:00
wjc404
ed9af2f7da Update KERNEL.HASWELL 2019-12-27 18:01:38 +08:00
wjc404
5fd1edead9 Create cgemm3m_kernel_8x4_haswell.c 2019-12-27 18:00:55 +08:00
wjc404
eeecd623d8 Update cgemm_kernel_8x2_haswell.c 2019-12-24 00:40:16 +08:00
wjc404
3ce6bcdb5f Update CONTRIBUTORS.md 2019-12-24 00:30:16 +08:00
wjc404
6fbe51072b Update CONTRIBUTORS.md 2019-12-24 00:24:40 +08:00
wjc404
611445c7f8 Update param.h 2019-12-23 23:44:55 +08:00
wjc404
2cd9306bb5 Update KERNEL.ZEN 2019-12-23 23:42:30 +08:00
wjc404
c418c81224 Update KERNEL.HASWELL 2019-12-23 23:41:44 +08:00
wjc404
025741f16a Fast Haswell CGEMM kernel 2019-12-23 23:40:03 +08:00
wjc404
105e26e12a Adjust Haswell ZGEMM blocking parameters 2019-12-21 14:38:51 +08:00
wjc404
f41d52665d Fast Haswell ZGEMM kernel 2019-12-21 14:37:06 +08:00
wjc404
d573d24de7 Fast Haswell ZGEMM kernel 2019-12-21 14:35:15 +08:00
Martin Kroeker
31d6c2eb7d Merge pull request #2340 from Zeyiii/develop
[WIP] Use arm neon instructions to optimize gemm beta operation
2019-12-20 08:38:57 +01:00
w00421467
b7cc69ee62 declare DGEMM_BETA in KERNEL.ARMV8 rather than the generic KERNEL 2019-12-20 10:11:50 +08:00
w00421467
aeef942c4f use arm neon instructions to optimize gemm beta operation 2019-12-17 10:00:13 +08:00
Martin Kroeker
445ca2f418 Merge pull request #2339 from Jehan/wip/Jehan/fix-timeout
driver: more reasonable thread wait timeout on Windows.
2019-12-13 14:57:26 +01:00
Jehan
13226e3101 driver: more reasonable thread wait timeout on Windows.
It used to be 5ms, which might not be long enough in some cases for the
thread to exit well, but then when set to 5000 (5s), it would slow down
any program depending on OpenBLAS.

Let's just set it to 50ms, which is at least 10 times longer than
originally, but still reasonable in case of failed thread termination.
2019-12-13 09:52:33 +01:00
Martin Kroeker
1a6ea8ee6d Merge pull request #2338 from kavanabhat/aix_mod
Changes to build on AIX in POWER8 mode
2019-12-09 17:54:49 +01:00
Martin Kroeker
c6ecb195e6 Merge pull request #2337 from martin-frbg/issue2336
Support two-digit version numbers in gcc version check
2019-12-07 09:38:06 +01:00
Martin Kroeker
b28db31429 Support two-digit version numbers in gcc version check
fixes #2336 (non-recognition of gcc 10) with patch provided by JeffreyALaw.
2019-12-06 21:23:56 +01:00
Kavana Bhat
6baa9b07d7 AIX changes for Power8 2019-12-06 04:33:32 -06:00
Martin Kroeker
a4896b5538 Update DYNAMIC_ARCH support for ARM64 and PPC (#2332)
* Update DYNAMIC_ARCH list of ARM64 targets for gmake
* Update arm64 cpu list for runtime detection
* Update DYNAMIC_ARCH list of ARM64 targets for cmake and add POWERPC targets
2019-12-04 11:06:03 +01:00
Kavana Bhat
3938e59569 AIX changes for Power8 2019-12-04 00:23:46 -06:00
Martin Kroeker
9d5079008f Merge pull request #2334 from martin-frbg/fix2228
Remove misplaced file
2019-12-03 22:23:52 +01:00
Martin Kroeker
3518617f5b Add Intel Goldmont+ cpuid
was originally in #2228 but that PR had misplaced the file in the toplevel directory
2019-12-03 08:32:29 +01:00