Commit Graph

4019 Commits

Author SHA1 Message Date
w00421467
aeef942c4f use arm neon instructions to optimize gemm beta operation 2019-12-17 10:00:13 +08:00
Martin Kroeker
445ca2f418 Merge pull request #2339 from Jehan/wip/Jehan/fix-timeout
driver: more reasonable thread wait timeout on Windows.
2019-12-13 14:57:26 +01:00
Jehan
13226e3101 driver: more reasonable thread wait timeout on Windows.
It used to be 5ms, which might not be long enough in some cases for the
thread to exit well, but then when set to 5000 (5s), it would slow down
any program depending on OpenBlas.

Let's just set it to 50ms, which is at least 10 times longer than
originally, but still reasonable in case of failed thread termination.
2019-12-13 09:52:33 +01:00
Martin Kroeker
1a6ea8ee6d Merge pull request #2338 from kavanabhat/aix_mod
Changes to build on AIX in POWER8 mode
2019-12-09 17:54:49 +01:00
Martin Kroeker
c6ecb195e6 Merge pull request #2337 from martin-frbg/issue2336
Support two-digit version numbers in gcc version check
2019-12-07 09:38:06 +01:00
Martin Kroeker
b28db31429 Support two-digit version numbers in gcc version check
fixes #2336 (non-recognition of gcc 10) with patch provided by JeffreyALaw.
2019-12-06 21:23:56 +01:00
Kavana Bhat
6baa9b07d7 AIX changes for Power8 2019-12-06 04:33:32 -06:00
Martin Kroeker
a4896b5538 Update DYNAMIC_ARCH support for ARM64 and PPC (#2332)
* Update DYNAMIC_ARCH list of ARM64 targets for gmake
* Update arm64 cpu list for runtime detection
* Update DYNAMIC_ARCH list of ARM64 targets for cmake and add POWERPC targets
2019-12-04 11:06:03 +01:00
Kavana Bhat
3938e59569 AIX changes for Power8 2019-12-04 00:23:46 -06:00
Martin Kroeker
9d5079008f Merge pull request #2334 from martin-frbg/fix2228
Remove misplaced file
2019-12-03 22:23:52 +01:00
Martin Kroeker
3518617f5b Add Intel Goldmont+ cpuid
was originally in #2228 but that PR had misplaced the file in the toplevel directory
2019-12-03 08:32:29 +01:00
Martin Kroeker
715f4650d9 Delete stray copy of dynamic.c from PR 2228 2019-12-03 08:24:10 +01:00
Martin Kroeker
10705183ce Merge pull request #20 from xianyi/develop
Rebase
2019-12-03 08:22:40 +01:00
Martin Kroeker
235599f17a Merge pull request #2329 from isuruf/patch-1
Workaround an ICE in clang 9.0.0
2019-12-02 08:30:43 +01:00
Isuru Fernando
b863b32ac5 Workaround an ICE in clang 9.0.0
This bug is not there in 8.x nor in the 9.0 daily snapshot.
2019-12-01 12:59:46 -06:00
Martin Kroeker
dd04143d4a Merge pull request #2328 from martin-frbg/ppc9
Fix precompiled kernels on POWER9 and make their use conditional on (old) gcc version
2019-11-30 12:23:57 +01:00
Martin Kroeker
f3a6164bff Merge pull request #2324 from antonblanchard/power9_segv
Fix SEGV in cdot_power9
2019-11-30 00:03:42 +01:00
Martin Kroeker
dedd822d1a Fix caxpy/caxpyc naming in localentry 2019-11-29 23:56:57 +01:00
Martin Kroeker
2181fb7047 Fix caxpy/caxpyc naming in localentry 2019-11-29 23:54:15 +01:00
Martin Kroeker
a9b62c03f8 Substitute precompiled gcc7 codes only when gcc is older than 9.x 2019-11-29 23:49:50 +01:00
Martin Kroeker
97762234f9 Add variable for gcc >=9 test
used in KERNEL.POWER9
2019-11-29 23:47:23 +01:00
Martin Kroeker
948d11fc51 Merge pull request #19 from xianyi/develop
rebase
2019-11-29 23:44:09 +01:00
Martin Kroeker
c815b8fb85 Merge pull request #2323 from wjc404/develop
some optimizations of AVX512 DGEMM
2019-11-28 20:55:16 +01:00
wjc404
e20709e976 Update param.h 2019-11-28 19:57:50 +08:00
wjc404
934e601e93 Update dgemm_kernel_4x8_skylakex_2.c 2019-11-28 19:56:35 +08:00
Martin Kroeker
a4c3668f99 Merge pull request #2321 from martin-frbg/issue2319
Fix race conditions in multithreaded GEMM3M
2019-11-28 09:30:24 +01:00
Martin Kroeker
867232c6a4 Merge pull request #2327 from martin-frbg/travisosx
Cleanup IOS build and disable FORTRAN on 32bit and ios builds for now
2019-11-28 08:43:45 +01:00
Martin Kroeker
5aaf70ef95 Merge pull request #2326 from xianyi/revert-2325-travisosx
Revert "Cleanup Travis IOS xbuild and disable FORTRAN on 32bit and ios builds for now"
2019-11-28 00:17:19 +01:00
Martin Kroeker
ae2a0995cc Cleanup IOS build and disable FORTRAN on 32bit and ios builds for now
Travis recently appears unable to find a matching homebrew package for 32bit gfortran,
and the IOS crossbuild suffered from excessive output due to the known problem with "ASMNAME redefined"
warnings when CFLAGS is set in the environment
2019-11-28 00:15:36 +01:00
Martin Kroeker
83dae28ae2 Revert "Cleanup Travis IOS xbuild and disable FORTRAN on 32bit and ios builds for now" 2019-11-28 00:09:06 +01:00
Martin Kroeker
da986d2e83 Merge pull request #2325 from martin-frbg/travisosx
Cleanup Travis IOS xbuild and disable FORTRAN on 32bit and ios builds for now
2019-11-27 21:59:36 +01:00
Martin Kroeker
6bc487de35 Cleanup IOS build and disable FORTRAN on 32bit and ios builds for now
Travis recently appears unable to find a matching homebrew package for 32bit gfortran,
and the IOS crossbuild suffered from excessive output due to the known problem with "ASMNAME redefined"
warnings when CFLAGS is set in the environment
2019-11-27 15:10:57 +01:00
Anton Blanchard
cf2a8e410c Fix SEGV in cdot_power9
We were corrupting r2 because the local entry wasn't being
setup correctly.
2019-11-26 21:55:04 -07:00
wjc404
eb1e9c8c92 some optimizations 2019-11-26 14:12:20 +08:00
Martin Kroeker
f95989cbc1 Fix AVX512 capability test (always returning zero)
from #2322
2019-11-23 22:38:07 +01:00
Martin Kroeker
f3065a0eed Fix race conditions in multithreaded GEMM3M
by adding barriers (and a mutex lock for the non-OpenMP case) like it was already done for GEMM in level3_thread.c some time ago
2019-11-23 19:54:56 +01:00
Martin Kroeker
04226f1e97 Add the cpuid of the business/rackmount version of z15 as well 2019-11-21 18:14:29 +01:00
Martin Kroeker
0925ef70db Merge pull request #2316 from sharkcz/s390x
zarch: treat z15 as z14 instead of generic
2019-11-21 18:03:00 +01:00
Martin Kroeker
371e6f73d4 Merge pull request #2317 from aarnez/develop
Change bad usage of "asum" to "sum" in ZARCH versions of ?sum
2019-11-21 17:59:21 +01:00
Andreas Arnez
d117dfd505 Change bad usage of "asum" to "sum" in ZARCH versions of ?sum
The ZARCH implementations of ?sum contain a cut & paste-error: An inline
assembly argument is named "sum", but the assembly references "asum"
instead.  The mismatch causes a build error.  This is fixed.
2019-11-21 13:49:13 +01:00
Dan Horák
883c39773a zarch: treat z15 as z14 instead of generic
Signed-off-by: Dan Horák <dan@danny.cz>
2019-11-21 12:53:23 +01:00
Martin Kroeker
b09b5be0a4 Merge pull request #2315 from ewanglong/develop
revised fix windows compatible for #2313
2019-11-21 05:06:44 +01:00
Wang, Long
bfb5fbdb4d revised fix windows compatible for #2313
Signed-off-by: Wang, Long <long1.wang@intel.com>
2019-11-21 10:22:58 +08:00
Martin Kroeker
3da6d66da9 Merge pull request #2314 from Jehan/wip/Jehan/fix-openblas-crash
Fix usage of TerminateThread() causing critical section corruption.
2019-11-20 16:16:35 +01:00
Martin Kroeker
08fa83aba2 Merge pull request #2312 from martin-frbg/power8be
Further Power8 big-endian corrections
2019-11-20 15:12:06 +01:00
Martin Kroeker
63d3ee8dfc Merge pull request #2313 from ewanglong/develop
Fix the integer overflow issue for large matrix size
2019-11-20 14:49:15 +01:00
Wang, Long
1191db1a49 For the sake of windows compatible, used "unsigned long long" to ensure 64-bit length
Signed-off-by: Wang, Long <long1.wang@intel.com>
2019-11-20 21:30:47 +08:00
Jehan
1f6071590d Fix usage of TerminateThread() causing critical section corruption.
This patch was submitted to the GIMP project by a publisher wishing to
keep confidentiality (hence anonymously). I just pass along the patch.
Here is the patch explanation which came with:

First they remind us what Microsoft documentation says about
TerminateThread:
> TerminateThread is a dangerous function that should only be used in
> the most extreme cases. You should call TerminateThread only if you
> know exactly what the target thread is doing, and you control all of
> the code that the target thread could possibly be running at the time
> of the termination.
(https://docs.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-terminatethread)

Then they say that 5 milliseconds time-out might not be long enough for
the thread to exit gracefully. They propose to set it to a much higher
value (for instance here 5 seconds).

And finally you should always check the return value of
WaitForSingleObject(). In particular you want to run TerminateThread()
only if WaitForSingleObject() failed, not on success case.
2019-11-20 13:00:49 +01:00
Wang, Long
0caf1434c9 Fix the integer overflow issue for large matrix size
For large matrix, e.g. M=N=K, and M>1290, int mnk=M*N*K will overflow.
This will lead to wrong branching to single-threading. The performance
is downgraded significantly.

Signed-off-by: Wang, Long <long1.wang@intel.com>
2019-11-20 14:11:17 +08:00
Martin Kroeker
73128f3883 Merge pull request #2310 from martin-frbg/ppc440
Fix PPC440 big-endian support and disable the QCDOC qalloc routine by default
2019-11-17 23:19:48 +01:00