Commit Graph

7452 Commits

Author SHA1 Message Date
wjc404 4c35b8dbaa
Update gemm3m_level3.c 2019-12-27 18:03:01 +08:00
wjc404 ed9af2f7da
Update KERNEL.HASWELL 2019-12-27 18:01:38 +08:00
wjc404 5fd1edead9
Create cgemm3m_kernel_8x4_haswell.c 2019-12-27 18:00:55 +08:00
Martin Kroeker 26478eb0d0
Merge pull request #2345 from wjc404/develop
Optimize AVX2 CGEMM
2019-12-25 22:26:41 +01:00
wjc404 eeecd623d8
Update cgemm_kernel_8x2_haswell.c 2019-12-24 00:40:16 +08:00
wjc404 3ce6bcdb5f
Update CONTRIBUTORS.md 2019-12-24 00:30:16 +08:00
wjc404 6fbe51072b
Update CONTRIBUTORS.md 2019-12-24 00:24:40 +08:00
wjc404 611445c7f8
Update param.h 2019-12-23 23:44:55 +08:00
wjc404 2cd9306bb5
Update KERNEL.ZEN 2019-12-23 23:42:30 +08:00
wjc404 c418c81224
Update KERNEL.HASWELL 2019-12-23 23:41:44 +08:00
wjc404 025741f16a
Fast Haswell CGEMM kernel 2019-12-23 23:40:03 +08:00
Martin Kroeker 0ae49d2990
Merge pull request #2344 from wjc404/develop
Optimize AVX2 ZGEMM
2019-12-21 12:16:55 +01:00
wjc404 105e26e12a
Adjust Haswell ZGEMM blocking parameters 2019-12-21 14:38:51 +08:00
wjc404 f41d52665d
Fast Haswell ZGEMM kernel 2019-12-21 14:37:06 +08:00
wjc404 d573d24de7
Fast Haswell ZGEMM kernel 2019-12-21 14:35:15 +08:00
Martin Kroeker 31d6c2eb7d
Merge pull request #2340 from Zeyiii/develop
[WIP] Use arm neon instructions to optimize gemm beta operation
2019-12-20 08:38:57 +01:00
w00421467 b7cc69ee62 declare DGEMM_BETA in KERNEL.ARMV8 rather than the generic KERNEL 2019-12-20 10:11:50 +08:00
w00421467 aeef942c4f use arm neon instructions to optimize gemm beta operation 2019-12-17 10:00:13 +08:00
Martin Kroeker 445ca2f418
Merge pull request #2339 from Jehan/wip/Jehan/fix-timeout
driver: more reasonable thread wait timeout on Windows.
2019-12-13 14:57:26 +01:00
Jehan 13226e3101 driver: more reasonable thread wait timeout on Windows.
It used to be 5ms, which might not be long enough in some cases for the
thread to exit well, but then when set to 5000 (5s), it would slow down
any program depending on OpenBlas.

Let's just set it to 50ms, which is at least 10 times longer than
originally, but still reasonable in case of failed thread termination.
2019-12-13 09:52:33 +01:00
Martin Kroeker 1a6ea8ee6d
Merge pull request #2338 from kavanabhat/aix_mod
Changes to build on AIX in POWER8 mode
2019-12-09 17:54:49 +01:00
Martin Kroeker c6ecb195e6
Merge pull request #2337 from martin-frbg/issue2336
Support two-digit version numbers in gcc version check
2019-12-07 09:38:06 +01:00
Martin Kroeker b28db31429
Support two-digit version numbers in gcc version check
fixes #2336 (non-recognition of gcc 10) with patch provided by JeffreyALaw.
2019-12-06 21:23:56 +01:00
Kavana Bhat 6baa9b07d7 AIX changes for Power8 2019-12-06 04:33:32 -06:00
Martin Kroeker a4896b5538
Update DYNAMIC_ARCH support for ARM64 and PPC (#2332)
* Update DYNAMIC_ARCH list of ARM64 targets for gmake
* Update arm64 cpu list for runtime detection
* Update DYNAMIC_ARCH list of ARM64 targets for cmake and add POWERPC targets
2019-12-04 11:06:03 +01:00
Kavana Bhat 3938e59569 AIX changes for Power8 2019-12-04 00:23:46 -06:00
Martin Kroeker 9d5079008f
Merge pull request #2334 from martin-frbg/fix2228
Remove misplaced file
2019-12-03 22:23:52 +01:00
Martin Kroeker 3518617f5b
Add Intel Goldmont+ cpuid
was originally in #2228 but that PR had misplaced the file in the toplevel directory
2019-12-03 08:32:29 +01:00
Martin Kroeker 715f4650d9
Delete stray copy of dynamic.c from PR 2228 2019-12-03 08:24:10 +01:00
Martin Kroeker 10705183ce
Merge pull request #20 from xianyi/develop
Rebase
2019-12-03 08:22:40 +01:00
Martin Kroeker 235599f17a
Merge pull request #2329 from isuruf/patch-1
Workaround an ICE in clang 9.0.0
2019-12-02 08:30:43 +01:00
Isuru Fernando b863b32ac5 Workaround an ICE in clang 9.0.0
This bug is not there in 8.x nor in the 9.0 daily snapshot.
2019-12-01 12:59:46 -06:00
Martin Kroeker dd04143d4a
Merge pull request #2328 from martin-frbg/ppc9
Fix precompiled kernels on POWER9 and make their use conditional on (old) gcc version
2019-11-30 12:23:57 +01:00
Martin Kroeker f3a6164bff
Merge pull request #2324 from antonblanchard/power9_segv
Fix SEGV in cdot_power9
2019-11-30 00:03:42 +01:00
Martin Kroeker dedd822d1a
Fix caxpy/caxpyc naming in localentry 2019-11-29 23:56:57 +01:00
Martin Kroeker 2181fb7047
Fix caxpy/caxpyc naming in localentry 2019-11-29 23:54:15 +01:00
Martin Kroeker a9b62c03f8
Substitute precompiled gcc7 codes only when gcc is older than 9.x 2019-11-29 23:49:50 +01:00
Martin Kroeker 97762234f9
Add variable for gcc >=9 test
used in KERNEL.POWER9
2019-11-29 23:47:23 +01:00
Martin Kroeker 948d11fc51
Merge pull request #19 from xianyi/develop
rebase
2019-11-29 23:44:09 +01:00
Martin Kroeker c815b8fb85
Merge pull request #2323 from wjc404/develop
some optimizations of AVX512 DGEMM
2019-11-28 20:55:16 +01:00
wjc404 e20709e976
Update param.h 2019-11-28 19:57:50 +08:00
wjc404 934e601e93
Update dgemm_kernel_4x8_skylakex_2.c 2019-11-28 19:56:35 +08:00
Martin Kroeker a4c3668f99
Merge pull request #2321 from martin-frbg/issue2319
Fix race conditions in multithreaded GEMM3M
2019-11-28 09:30:24 +01:00
Martin Kroeker 867232c6a4
Merge pull request #2327 from martin-frbg/travisosx
Cleanup IOS build and disable FORTRAN on 32bit and ios builds for now
2019-11-28 08:43:45 +01:00
Martin Kroeker 5aaf70ef95
Merge pull request #2326 from xianyi/revert-2325-travisosx
Revert "Cleanup Travis IOS xbuild and disable FORTRAN on 32bit and ios builds for now"
2019-11-28 00:17:19 +01:00
Martin Kroeker ae2a0995cc
Cleanup IOS build and disable FORTRAN on 32bit and ios builds for now
Travis recently appears unable to find a matching homebrew package for 32bit gfortran,
and the IOS crossbuild suffered from excessive output due to the known problem with "ASMNAME redefined"
warnings when CFLAGS is set in the environment
2019-11-28 00:15:36 +01:00
Martin Kroeker 83dae28ae2
Revert "Cleanup Travis IOS xbuild and disable FORTRAN on 32bit and ios builds for now" 2019-11-28 00:09:06 +01:00
Martin Kroeker da986d2e83
Merge pull request #2325 from martin-frbg/travisosx
Cleanup Travis IOS xbuild and disable FORTRAN on 32bit and ios builds for now
2019-11-27 21:59:36 +01:00
Martin Kroeker 6bc487de35
Cleanup IOS build and disable FORTRAN on 32bit and ios builds for now
Travis recently appears unable to find a matching homebrew package for 32bit gfortran,
and the IOS crossbuild suffered from excessive output due to the known problem with "ASMNAME redefined"
warnings when CFLAGS is set in the environment
2019-11-27 15:10:57 +01:00
Anton Blanchard cf2a8e410c Fix SEGV in cdot_power9
We were corrupting r2 because the local entry wasn't being
setup correctly.
2019-11-26 21:55:04 -07:00