Martin Kroeker
375b1875c8
[WIP] Update LAPACK to 3.9.0 (#2353)
...
* Update make.inc entries for LAPACK 3.9.0
Reference-LAPACK PR 347 changed some variable names and relative paths
* Update LAPACK to 3.9.0
* Add new functions from LAPACK 3.9.0
* Add new functions from LAPACK 3.9.0
* Restore LOADER command
as it makes it easier to specify pthread as needed
* Restore LOADER
* Restore EIG/LIN prefixes in cmdbase
* add binary path to lapack_testing.py call
* Restore OpenMP version check
* Restore OpenMP version check
* Restore fix for out-of-bounds array accesses
from #2096
2020-01-01 13:18:53 +01:00
Martin Kroeker
6c85cb1869
Merge pull request #2352 from wjc404/develop
...
AVX2 ZGEMM3M kernel
2019-12-31 18:08:10 +01:00
Martin Kroeker
995768bbc5
Merge pull request #2351 from Zeyiii/develop
...
prefetching for dgemm_beta
2019-12-31 18:07:37 +01:00
int_13h
96ad579428
add in runtime cpu detection for zarch (#2349)
...
add in runtime cpu detection for zarch
2019-12-31 18:03:27 +01:00
shengyang
8d84403205
Use arm neon instructions to optimize ncopy operation
...
modified: KERNEL.ARMV8
modified: KERNEL.TSV110
new file: sgemm_ncopy_4.S
2019-12-31 17:06:35 +08:00
shengyang
8729db117c
modified: ctest/din3
...
modified: ctest/sin3
2019-12-31 15:59:52 +08:00
w00421467
0833a4846a
Use arm neon instructions to optimize sgemm_beta operation
2019-12-31 10:42:03 +08:00
zq
50f7fc1401
[WIP] Use arm neon instructions to optimize tcopy operation
2019-12-31 10:21:23 +08:00
w00421467
d1b53806be
Merge remote-tracking branch 'pub/develop' into develop
2019-12-31 10:13:24 +08:00
wjc404
a0f0a802fc
Update zgemm3m_kernel_4x4_haswell.c
2019-12-30 17:33:42 +08:00
wjc404
700fe5b5ee
Add files via upload
2019-12-30 17:18:59 +08:00
wjc404
bb2729c855
Update CONTRIBUTORS.md
2019-12-30 16:11:37 +08:00
wjc404
aae44d040d
Update CONTRIBUTORS.md
2019-12-30 16:10:08 +08:00
wjc404
6362c34ee6
Update param.h
2019-12-30 16:08:19 +08:00
wjc404
f60840c420
Update KERNEL.ZEN
2019-12-30 16:04:23 +08:00
wjc404
109e18cd96
Update KERNEL.HASWELL
2019-12-30 16:03:24 +08:00
wjc404
ae1579be13
Create zgemm3m_kernel_4x4_haswell.c
2019-12-30 16:02:51 +08:00
w00421467
3ccf8885ac
prefetching for dgemm_beta
2019-12-30 11:45:49 +08:00
Martin Kroeker
454847588e
Update LAPACK to 3.9.0
2019-12-29 21:27:18 +01:00
Martin Kroeker
0257f26488
Merge pull request #21 from xianyi/develop
...
rebase
2019-12-29 18:08:55 +01:00
Martin Kroeker
c45b7aef14
Merge pull request #2348 from wjc404/develop
...
AVX2 CGEMM3M kernel
2019-12-28 20:07:56 +01:00
wjc404
312060d0d6
Update CONTRIBUTORS.md
2019-12-27 23:36:13 +08:00
wjc404
cd765f094b
Update cgemm3m_kernel_8x4_haswell.c
2019-12-27 18:23:29 +08:00
wjc404
64639f440f
Update param.h
2019-12-27 18:06:42 +08:00
wjc404
3a66c8cac1
Update KERNEL.ZEN
2019-12-27 18:04:08 +08:00
wjc404
4c35b8dbaa
Update gemm3m_level3.c
2019-12-27 18:03:01 +08:00
wjc404
ed9af2f7da
Update KERNEL.HASWELL
2019-12-27 18:01:38 +08:00
wjc404
5fd1edead9
Create cgemm3m_kernel_8x4_haswell.c
2019-12-27 18:00:55 +08:00
Martin Kroeker
26478eb0d0
Merge pull request #2345 from wjc404/develop
...
Optimize AVX2 CGEMM
2019-12-25 22:26:41 +01:00
wjc404
eeecd623d8
Update cgemm_kernel_8x2_haswell.c
2019-12-24 00:40:16 +08:00
wjc404
3ce6bcdb5f
Update CONTRIBUTORS.md
2019-12-24 00:30:16 +08:00
wjc404
6fbe51072b
Update CONTRIBUTORS.md
2019-12-24 00:24:40 +08:00
wjc404
611445c7f8
Update param.h
2019-12-23 23:44:55 +08:00
wjc404
2cd9306bb5
Update KERNEL.ZEN
2019-12-23 23:42:30 +08:00
wjc404
c418c81224
Update KERNEL.HASWELL
2019-12-23 23:41:44 +08:00
wjc404
025741f16a
Fast Haswell CGEMM kernel
2019-12-23 23:40:03 +08:00
Martin Kroeker
0ae49d2990
Merge pull request #2344 from wjc404/develop
...
Optimize AVX2 ZGEMM
2019-12-21 12:16:55 +01:00
wjc404
105e26e12a
Adjust Haswell ZGEMM blocking parameters
2019-12-21 14:38:51 +08:00
wjc404
f41d52665d
Fast Haswell ZGEMM kernel
2019-12-21 14:37:06 +08:00
wjc404
d573d24de7
Fast Haswell ZGEMM kernel
2019-12-21 14:35:15 +08:00
Martin Kroeker
31d6c2eb7d
Merge pull request #2340 from Zeyiii/develop
...
[WIP] Use arm neon instructions to optimize gemm beta operation
2019-12-20 08:38:57 +01:00
w00421467
b7cc69ee62
declare DGEMM_BETA in KERNEL.ARMV8 rather than the generic KERNEL
2019-12-20 10:11:50 +08:00
w00421467
aeef942c4f
use arm neon instructions to optimize gemm beta operation
2019-12-17 10:00:13 +08:00
Martin Kroeker
445ca2f418
Merge pull request #2339 from Jehan/wip/Jehan/fix-timeout
...
driver: more reasonable thread wait timeout on Windows.
2019-12-13 14:57:26 +01:00
Jehan
13226e3101
driver: more reasonable thread wait timeout on Windows.
...
The timeout used to be 5 ms, which might not be long enough in some cases for the
thread to exit cleanly, but when set to 5000 ms (5 s), it would slow down
any program depending on OpenBLAS.
Let's just set it to 50 ms, which is 10 times longer than
originally, but still reasonable in case of failed thread termination.
2019-12-13 09:52:33 +01:00
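The trade-off described in this commit message can be made concrete with a little arithmetic. The following is only a sketch, not code from the driver: `worst_case_shutdown_ms` is a hypothetical helper, and it assumes that worker threads are waited on one after another and that, in the worst case, every wait runs to its full timeout.

```c
/* Hypothetical helper (not part of OpenBLAS): upper bound, in milliseconds,
 * on how long shutdown can block when each of n_threads workers is waited
 * on in turn for at most timeout_ms.
 * Assumption: waits are sequential and no worker exits before its timeout. */
long worst_case_shutdown_ms(int n_threads, long timeout_ms)
{
    if (n_threads <= 0 || timeout_ms <= 0)
        return 0;
    return (long)n_threads * timeout_ms;
}
```

Under these assumptions, and taking 8 workers as an arbitrary example, a 5 ms timeout bounds the delay at 40 ms, 50 ms at 400 ms, and 5000 ms at 40 s, which shows why the 5 s value was noticeable to programs using the library.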
Martin Kroeker
1a6ea8ee6d
Merge pull request #2338 from kavanabhat/aix_mod
...
Changes to build on AIX in POWER8 mode
2019-12-09 17:54:49 +01:00
Martin Kroeker
c6ecb195e6
Merge pull request #2337 from martin-frbg/issue2336
...
Support two-digit version numbers in gcc version check
2019-12-07 09:38:06 +01:00
Martin Kroeker
b28db31429
Support two-digit version numbers in gcc version check
...
fixes #2336 (non-recognition of gcc 10) with patch provided by JeffreyALaw.
2019-12-06 21:23:56 +01:00
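The kind of problem this commit addresses can be illustrated with a small sketch. It is not the project's actual version check (the patch itself is not reproduced in this log); `naive_major` and `parse_major` are hypothetical helpers, written in C purely for illustration. A check that reads only the first character of a version string sees gcc 10 as version 1, while parsing the whole leading number handles any number of digits.

```c
#include <stdlib.h>

/* Hypothetical helper: looks only at the first character, so it cannot
 * represent a two-digit major version ("10.2.0" yields 1). */
int naive_major(const char *version)
{
    if (version[0] < '0' || version[0] > '9')
        return -1;
    return version[0] - '0';
}

/* Hypothetical helper: parses all leading digits with strtol, so
 * "10.2.0" yields 10. Returns -1 if the string does not start with
 * a digit or the value is implausibly large. */
int parse_major(const char *version)
{
    char *end = NULL;
    long major;

    if (version[0] < '0' || version[0] > '9')
        return -1;
    major = strtol(version, &end, 10);
    if (end == version || major > 1000)
        return -1;
    return (int)major;
}
```

With a single-digit version such as "9.3.0" both helpers agree; they differ only once the major version reaches 10.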
Kavana Bhat
6baa9b07d7
AIX changes for Power8
2019-12-06 04:33:32 -06:00
Martin Kroeker
a4896b5538
Update DYNAMIC_ARCH support for ARM64 and PPC (#2332)
...
* Update DYNAMIC_ARCH list of ARM64 targets for gmake
* Update arm64 cpu list for runtime detection
* Update DYNAMIC_ARCH list of ARM64 targets for cmake and add POWERPC targets
2019-12-04 11:06:03 +01:00