CodesWithWolves
d2bda3b56a
Remove Unnecessary/Erroneous Reads In sgemm_tcopy_16.S COPY1x8 Macro
...
Some code appears to have been left over when this macro was copied from
the COPY2x8 macro above: we read 8 bytes into each of d4-d7 directly
after reading 4 bytes into each of s4-s7. The 32 bytes in d4-d7 are
unused, and the reads can overrun the boundary of allocated memory.
Valgrind detected this for a 128,1 copy, which is what drew my attention
to it.
Additionally, there is no need to update the addresses stored in A0-A7,
as the only possible paths after running this macro either overwrite
A0-A7 (when looping to the next 8 rows) or overwrite A0-A3 (when moving
to 4 rows), in which case A4-A7 are unused.
2021-03-31 15:44:25 -04:00
Martin Kroeker
903fd85c85
Merge pull request #3167 from xianyi/fix3126
...
Fix compilation of the benchmarks on older OSX versions
2021-03-27 12:40:42 +01:00
Martin Kroeker
d57c681a6d
Fix compilation on older OSX versions
2021-03-26 22:29:29 +01:00
Martin Kroeker
d7efe5857c
Merge pull request #3165 from martin-frbg/azure-osx
...
Add OSX build to Azure
2021-03-24 14:05:34 +01:00
Martin Kroeker
8fd694c18f
Update .travis.yml
2021-03-24 10:36:29 +01:00
Martin Kroeker
e69b0b1771
Update azure-pipelines.yml
2021-03-24 10:34:24 +01:00
Martin Kroeker
9dc0bfd617
Update azure-pipelines.yml
2021-03-24 08:54:30 +01:00
Martin Kroeker
e6664ec2c9
Update azure-pipelines.yml
2021-03-24 08:41:48 +01:00
Martin Kroeker
dbb33f412f
Update azure-pipelines.yml
2021-03-24 08:30:48 +01:00
Martin Kroeker
70b89a6205
Add OSX build to Azure
2021-03-24 07:50:35 +01:00
Martin Kroeker
07b144855a
Merge pull request #3164 from martin-frbg/travisosxomp
...
Fix xcode12 build on Travis and add OSX/OpenMP job
2021-03-24 06:56:10 +01:00
Martin Kroeker
292a0aed66
Fix xcode12 build and add OSX/OpenMP
2021-03-24 06:55:14 +01:00
Martin Kroeker
42f0201e21
Merge pull request #20 from xianyi/develop
...
rebase
2021-03-22 17:53:43 +01:00
Martin Kroeker
22db876d48
Merge pull request #3158 from austinpagan/Gemm.CZPQ
...
Changed default P/Q values for CGEMM and ZGEMM (Power10 only)
2021-03-19 20:53:21 +01:00
Martin Kroeker
bdd6e3a153
Merge pull request #3157 from martin-frbg/issue3020-final
...
Add workaround for LAPACK testsuite failures with the NVIDIA HPC compiler on PPC
2021-03-19 15:23:12 +01:00
Martin Kroeker
7b8f580941
Merge pull request #3156 from martin-frbg/omatcopy_d
...
Move x86_64 DOMATCOPY_RT back to the C implementation
2021-03-19 15:22:48 +01:00
Gordon Fossum
198adea961
Changed default P/Q values for CGEMM and ZGEMM (Power10 only)
2021-03-19 10:05:23 -04:00
Martin Kroeker
86c5a0013f
Add workaround for LAPACK testsuite failures with the NVIDIA HPC compiler
2021-03-19 11:47:58 +01:00
Martin Kroeker
ef85c22474
Add workaround for LAPACK test failures with the NVIDIA HPC compiler
2021-03-19 11:46:25 +01:00
Martin Kroeker
d3555d2e50
Add workaround for LAPACK test failures with the NVIDIA HPC compiler
2021-03-19 11:44:31 +01:00
Martin Kroeker
c4b91bfcf1
Merge pull request #3155 from martin-frbg/issue3152
...
Fix recent SGEMM_direct breakage on SkylakeX and Cooperlake
2021-03-19 09:55:31 +01:00
Martin Kroeker
0f5e86a0d9
Remove premature entry for DOMATCOPY_RT
2021-03-18 21:53:50 +01:00
Martin Kroeker
7b294a99fd
Move common.h back to the top of the file so that SKYLAKEX (from config.h) is defined in time
2021-03-18 21:28:19 +01:00
Martin Kroeker
1e4b2e98d9
Merge pull request #3154 from martin-frbg/issue3153
...
Fix premature include in getarch_2nd
2021-03-18 12:35:47 +01:00
Martin Kroeker
3fd6ccdf76
Include just the definition of BLASLONG rather than all of common.h
2021-03-18 07:50:19 +01:00
Martin Kroeker
fa9a30b491
Merge pull request #19 from xianyi/develop
...
rebase
2021-03-18 07:47:03 +01:00
Martin Kroeker
d90ca75a6c
Update version to 0.3.14.dev
2021-03-17 21:14:42 +01:00
Martin Kroeker
e107454454
Update version to 0.3.14.dev
2021-03-17 21:14:05 +01:00
Martin Kroeker
d43962d013
Merge pull request #3151 from xianyi/release-0.3.0
...
Merge 0.3.14 release branch back into develop to acquire tag
2021-03-17 21:13:25 +01:00
Martin Kroeker
2f6d35c3d4
Merge pull request #3150 from xianyi/develop
...
Update branch from develop for 0.3.14 release
2021-03-17 20:21:42 +01:00
Martin Kroeker
86de5f768b
Update version to 0.3.14 for release
2021-03-17 20:20:34 +01:00
Martin Kroeker
2663e44724
Update version to 0.3.14 for release
2021-03-17 20:20:00 +01:00
Martin Kroeker
6f2900c164
Merge pull request #3149 from martin-frbg/changelog14
...
Update Changelog for 0.3.14
2021-03-17 20:14:50 +01:00
Martin Kroeker
7888b5127c
Update Changelog for 0.3.14
2021-03-17 16:17:55 +01:00
Martin Kroeker
8808c291b9
Merge pull request #3148 from martin-frbg/issue3145
...
Add workaround for older gcc on big-endian ppc64 not supporting casts in defines
2021-03-17 09:05:43 +01:00
Martin Kroeker
8cdf0825de
Add workaround for older gcc on ppc64be not supporting casts in defines
2021-03-16 21:20:05 +01:00
Martin Kroeker
9e0dbe8e59
Merge pull request #18 from xianyi/develop
...
rebase
2021-03-16 21:09:45 +01:00
Martin Kroeker
52f99d3944
Merge pull request #3147 from martin-frbg/issue3146
...
Fix DYNAMIC_ARCH builds with CLANG on ppc64
2021-03-16 20:25:42 +01:00
Martin Kroeker
186368ddc3
Fix compilation with CLANG
2021-03-16 16:52:57 +01:00
Martin Kroeker
c0b94ae1df
Merge pull request #3143 from martin-frbg/fix3088
...
Resolve circular dependency between common.h and param.h
2021-03-14 23:12:55 +01:00
Martin Kroeker
ddd86309a1
Merge pull request #3144 from xoviat/fix-test
...
disable openmp
2021-03-14 23:12:33 +01:00
xoviat
e9d453b623
disable openmp
2021-03-14 16:34:02 -05:00
Martin Kroeker
ecb4babcf4
remove inclusion of common.h again to avoid circular dependency
2021-03-14 17:36:51 +01:00
Martin Kroeker
34753eaebb
Include common.h (and indirectly param.h) rather than just param.h to have BLASLONG available w/o circular dependencies
2021-03-14 17:28:43 +01:00
Martin Kroeker
efa72a631b
Merge pull request #17 from xianyi/develop
...
rebase
2021-03-14 17:20:49 +01:00
Martin Kroeker
30d835168a
Merge pull request #3088 from xoviat/msvc
...
add misc fixes.
2021-03-14 17:14:28 +01:00
Martin Kroeker
8f6a744807
Merge pull request #3141 from martin-frbg/nagfor-2
...
Leave out ARM64 march/mtune options when compiling with nagfor
2021-03-13 23:04:53 +01:00
Martin Kroeker
6726771645
Support compilation with NAG fortran
2021-03-13 20:16:18 +01:00
Martin Kroeker
a51cae6b2e
Merge pull request #3140 from martin-frbg/issue3139
...
Fix compilation on older x86_64 targets with old compilers that lack intrinsics support
2021-03-12 15:35:58 +01:00
Martin Kroeker
d30b943251
Merge pull request #3138 from martin-frbg/nagfor
...
Add support for compilation with the NAG Fortran compiler
2021-03-12 12:46:19 +01:00