Martin Kroeker
1a6ea8ee6d
Merge pull request #2338 from kavanabhat/aix_mod
Changes to build on AIX in POWER8 mode
2019-12-09 17:54:49 +01:00
Martin Kroeker
c6ecb195e6
Merge pull request #2337 from martin-frbg/issue2336
Support two-digit version numbers in gcc version check
2019-12-07 09:38:06 +01:00
Martin Kroeker
b28db31429
Support two-digit version numbers in gcc version check
Fixes #2336 (gcc 10 was not recognized), using a patch provided by JeffreyALaw.
2019-12-06 21:23:56 +01:00
Kavana Bhat
6baa9b07d7
AIX changes for Power8
2019-12-06 04:33:32 -06:00
Martin Kroeker
a4896b5538
Update DYNAMIC_ARCH support for ARM64 and PPC ( #2332 )
* Update DYNAMIC_ARCH list of ARM64 targets for gmake
* Update arm64 cpu list for runtime detection
* Update DYNAMIC_ARCH list of ARM64 targets for cmake and add POWERPC targets
2019-12-04 11:06:03 +01:00
Kavana Bhat
3938e59569
AIX changes for Power8
2019-12-04 00:23:46 -06:00
Martin Kroeker
9d5079008f
Merge pull request #2334 from martin-frbg/fix2228
Remove misplaced file
2019-12-03 22:23:52 +01:00
Martin Kroeker
3518617f5b
Add Intel Goldmont+ cpuid
This was originally in #2228, but that PR had misplaced the file in the top-level directory.
2019-12-03 08:32:29 +01:00
Martin Kroeker
715f4650d9
Delete stray copy of dynamic.c from PR 2228
2019-12-03 08:24:10 +01:00
Martin Kroeker
10705183ce
Merge pull request #20 from xianyi/develop
Rebase
2019-12-03 08:22:40 +01:00
Martin Kroeker
235599f17a
Merge pull request #2329 from isuruf/patch-1
Workaround an ICE in clang 9.0.0
2019-12-02 08:30:43 +01:00
Isuru Fernando
b863b32ac5
Workaround an ICE in clang 9.0.0
This bug is not present in 8.x or in the 9.0 daily snapshot.
2019-12-01 12:59:46 -06:00
Martin Kroeker
dd04143d4a
Merge pull request #2328 from martin-frbg/ppc9
Fix precompiled kernels on POWER9 and make their use conditional on (old) gcc version
2019-11-30 12:23:57 +01:00
Martin Kroeker
f3a6164bff
Merge pull request #2324 from antonblanchard/power9_segv
Fix SEGV in cdot_power9
2019-11-30 00:03:42 +01:00
Martin Kroeker
dedd822d1a
Fix caxpy/caxpyc naming in localentry
2019-11-29 23:56:57 +01:00
Martin Kroeker
2181fb7047
Fix caxpy/caxpyc naming in localentry
2019-11-29 23:54:15 +01:00
Martin Kroeker
a9b62c03f8
Substitute precompiled gcc7 codes only when gcc is older than 9.x
2019-11-29 23:49:50 +01:00
Martin Kroeker
97762234f9
Add variable for gcc >=9 test
used in KERNEL.POWER9
2019-11-29 23:47:23 +01:00
Martin Kroeker
948d11fc51
Merge pull request #19 from xianyi/develop
rebase
2019-11-29 23:44:09 +01:00
Martin Kroeker
c815b8fb85
Merge pull request #2323 from wjc404/develop
some optimizations of AVX512 DGEMM
2019-11-28 20:55:16 +01:00
wjc404
e20709e976
Update param.h
2019-11-28 19:57:50 +08:00
wjc404
934e601e93
Update dgemm_kernel_4x8_skylakex_2.c
2019-11-28 19:56:35 +08:00
Martin Kroeker
a4c3668f99
Merge pull request #2321 from martin-frbg/issue2319
Fix race conditions in multithreaded GEMM3M
2019-11-28 09:30:24 +01:00
Martin Kroeker
867232c6a4
Merge pull request #2327 from martin-frbg/travisosx
Cleanup IOS build and disable FORTRAN on 32bit and ios builds for now
2019-11-28 08:43:45 +01:00
Martin Kroeker
5aaf70ef95
Merge pull request #2326 from xianyi/revert-2325-travisosx
Revert "Cleanup Travis IOS xbuild and disable FORTRAN on 32bit and ios builds for now"
2019-11-28 00:17:19 +01:00
Martin Kroeker
ae2a0995cc
Cleanup IOS build and disable FORTRAN on 32bit and ios builds for now
Travis recently appears to be unable to find a matching Homebrew package for 32-bit gfortran,
and the iOS cross-build suffered from excessive output due to the known problem with "ASMNAME redefined"
warnings when CFLAGS is set in the environment.
2019-11-28 00:15:36 +01:00
Martin Kroeker
83dae28ae2
Revert "Cleanup Travis IOS xbuild and disable FORTRAN on 32bit and ios builds for now"
2019-11-28 00:09:06 +01:00
Martin Kroeker
da986d2e83
Merge pull request #2325 from martin-frbg/travisosx
Cleanup Travis IOS xbuild and disable FORTRAN on 32bit and ios builds for now
2019-11-27 21:59:36 +01:00
Martin Kroeker
6bc487de35
Cleanup IOS build and disable FORTRAN on 32bit and ios builds for now
Travis recently appears to be unable to find a matching Homebrew package for 32-bit gfortran,
and the iOS cross-build suffered from excessive output due to the known problem with "ASMNAME redefined"
warnings when CFLAGS is set in the environment.
2019-11-27 15:10:57 +01:00
Anton Blanchard
cf2a8e410c
Fix SEGV in cdot_power9
We were corrupting r2 because the local entry wasn't being
set up correctly.
2019-11-26 21:55:04 -07:00
wjc404
eb1e9c8c92
some optimizations
2019-11-26 14:12:20 +08:00
Martin Kroeker
f95989cbc1
Fix AVX512 capability test (always returning zero)
from #2322
2019-11-23 22:38:07 +01:00
Martin Kroeker
f3065a0eed
Fix race conditions in multithreaded GEMM3M
...
This adds barriers (and a mutex lock for the non-OpenMP case), as was already done for GEMM in level3_thread.c some time ago.
2019-11-23 19:54:56 +01:00
Martin Kroeker
04226f1e97
Add the cpuid of the business/rackmount version of z15 as well
2019-11-21 18:14:29 +01:00
Martin Kroeker
0925ef70db
Merge pull request #2316 from sharkcz/s390x
zarch: treat z15 as z14 instead of generic
2019-11-21 18:03:00 +01:00
Martin Kroeker
371e6f73d4
Merge pull request #2317 from aarnez/develop
Change bad usage of "asum" to "sum" in ZARCH versions of ?sum
2019-11-21 17:59:21 +01:00
Andreas Arnez
d117dfd505
Change bad usage of "asum" to "sum" in ZARCH versions of ?sum
The ZARCH implementations of ?sum contain a copy-and-paste error: an inline
assembly argument is named "sum", but the assembly references "asum"
instead. The mismatch causes a build error, which this commit fixes.
2019-11-21 13:49:13 +01:00
Dan Horák
883c39773a
zarch: treat z15 as z14 instead of generic
Signed-off-by: Dan Horák <dan@danny.cz>
2019-11-21 12:53:23 +01:00
Martin Kroeker
b09b5be0a4
Merge pull request #2315 from ewanglong/develop
Revised Windows-compatibility fix for #2313
2019-11-21 05:06:44 +01:00
Wang, Long
bfb5fbdb4d
Revised Windows-compatibility fix for #2313
Signed-off-by: Wang, Long <long1.wang@intel.com>
2019-11-21 10:22:58 +08:00
Martin Kroeker
3da6d66da9
Merge pull request #2314 from Jehan/wip/Jehan/fix-openblas-crash
Fix usage of TerminateThread() causing critical section corruption.
2019-11-20 16:16:35 +01:00
Martin Kroeker
08fa83aba2
Merge pull request #2312 from martin-frbg/power8be
Further Power8 big-endian corrections
2019-11-20 15:12:06 +01:00
Martin Kroeker
63d3ee8dfc
Merge pull request #2313 from ewanglong/develop
Fix the integer overflow issue for large matrix size
2019-11-20 14:49:15 +01:00
Wang, Long
1191db1a49
For Windows compatibility, use "unsigned long long" to guarantee a 64-bit type
Signed-off-by: Wang, Long <long1.wang@intel.com>
2019-11-20 21:30:47 +08:00
Jehan
1f6071590d
Fix usage of TerminateThread() causing critical section corruption.
This patch was submitted to the GIMP project anonymously by a publisher
wishing to keep their identity confidential; I am just passing it along.
Here is the explanation that came with the patch.
First, they remind us what the Microsoft documentation says about
TerminateThread:
> TerminateThread is a dangerous function that should only be used in
> the most extreme cases. You should call TerminateThread only if you
> know exactly what the target thread is doing, and you control all of
> the code that the target thread could possibly be running at the time
> of the termination.
(https://docs.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-terminatethread )
Next, they point out that a 5-millisecond timeout might not be long enough
for the thread to exit gracefully, and propose a much higher value
(5 seconds in this patch).
Finally, the return value of WaitForSingleObject() should always be checked.
In particular, TerminateThread() should be called only if
WaitForSingleObject() failed, not when it succeeded.
2019-11-20 13:00:49 +01:00
Wang, Long
0caf1434c9
Fix the integer overflow issue for large matrix size
For large matrices (e.g. M = N = K with M > 1290), int mnk = M*N*K overflows.
This causes the code to wrongly branch to the single-threaded path, which
degrades performance significantly.
Signed-off-by: Wang, Long <long1.wang@intel.com>
2019-11-20 14:11:17 +08:00
Martin Kroeker
73128f3883
Merge pull request #2310 from martin-frbg/ppc440
Fix PPC440 big-endian support and disable the QCDOC qalloc routine by default
2019-11-17 23:19:48 +01:00
Martin Kroeker
cad0d150db
Define alternate kernels for big-endian POWER8
2019-11-17 23:12:10 +01:00
Martin Kroeker
eba0aeb7cd
Fix compilation for big-endian POWER8
2019-11-17 22:58:32 +01:00
Martin Kroeker
0c07c356c1
Define alternate kernels for big-endian PPC440
2019-11-17 19:25:08 +01:00