Commit Graph

5621 Commits

Author SHA1 Message Date
CodesWithWolves d2bda3b56a Remove Unnecessary/Erroneous Reads In sgemm_tcopy_16.S COPY1x8 Macro
Some code appears to have been carried over when copying from the COPY2x8
macro above: we read 8 bytes into each of d4-d7 directly after reading
4 bytes into each of s4-s7. These 32 bytes in d4-d7 are unused and can
overrun the boundary of allocated memory. Valgrind detected this for a
128,1 copy, which is what drew my attention to it.

Additionally, there is no need to update the addresses stored in A0-A7:
every path taken after this macro either overwrites A0-A7 (when looping
to the next 8 rows) or overwrites A0-A3 (when moving on to 4 rows), in
which case A4-A7 are unused.
2021-03-31 15:44:25 -04:00
Martin Kroeker 903fd85c85
Merge pull request #3167 from xianyi/fix3126
Fix compilation of the benchmarks on older OSX versions
2021-03-27 12:40:42 +01:00
Martin Kroeker d57c681a6d
Fix compilation on older OSX versions 2021-03-26 22:29:29 +01:00
Martin Kroeker d7efe5857c
Merge pull request #3165 from martin-frbg/azure-osx
Add OSX build to Azure
2021-03-24 14:05:34 +01:00
Martin Kroeker 8fd694c18f
Update .travis.yml 2021-03-24 10:36:29 +01:00
Martin Kroeker e69b0b1771
Update azure-pipelines.yml 2021-03-24 10:34:24 +01:00
Martin Kroeker 9dc0bfd617
Update azure-pipelines.yml 2021-03-24 08:54:30 +01:00
Martin Kroeker e6664ec2c9
Update azure-pipelines.yml 2021-03-24 08:41:48 +01:00
Martin Kroeker dbb33f412f
Update azure-pipelines.yml 2021-03-24 08:30:48 +01:00
Martin Kroeker 70b89a6205
Add OSX build to Azure 2021-03-24 07:50:35 +01:00
Martin Kroeker 07b144855a
Merge pull request #3164 from martin-frbg/travisosxomp
Fix xcode12 build on Travis and add OSX/OpenMP job
2021-03-24 06:56:10 +01:00
Martin Kroeker 292a0aed66
Fix xcode12 build and add OSX/OpenMP 2021-03-24 06:55:14 +01:00
Martin Kroeker 42f0201e21
Merge pull request #20 from xianyi/develop
rebase
2021-03-22 17:53:43 +01:00
Martin Kroeker 22db876d48
Merge pull request #3158 from austinpagan/Gemm.CZPQ
Changed default P/Q values for CGEMM and ZGEMM (Power10 only)
2021-03-19 20:53:21 +01:00
Martin Kroeker bdd6e3a153
Merge pull request #3157 from martin-frbg/issue3020-final
Add workaround for LAPACK testsuite failures with the NVIDIA HPC compiler on PPC
2021-03-19 15:23:12 +01:00
Martin Kroeker 7b8f580941
Merge pull request #3156 from martin-frbg/omatcopy_d
Move x86_64 DOMATCOPY_RT back to the C implementation
2021-03-19 15:22:48 +01:00
Gordon Fossum 198adea961 Changed default P/Q values for CGEMM and ZGEMM (Power10 only) 2021-03-19 10:05:23 -04:00
Martin Kroeker 86c5a0013f
Add workaround for LAPACK testsuite failures with the NVIDIA HPC compiler 2021-03-19 11:47:58 +01:00
Martin Kroeker ef85c22474
Add workaround for LAPACK test failures with the NVIDIA HPC compiler 2021-03-19 11:46:25 +01:00
Martin Kroeker d3555d2e50
Add workaround for LAPACK test failures with the NVIDIA HPC compiler 2021-03-19 11:44:31 +01:00
Martin Kroeker c4b91bfcf1
Merge pull request #3155 from martin-frbg/issue3152
Fix recent SGEMM_direct breakage on SkylakeX and Cooperlake
2021-03-19 09:55:31 +01:00
Martin Kroeker 0f5e86a0d9
Remove premature entry for DOMATCOPY_RT 2021-03-18 21:53:50 +01:00
Martin Kroeker 7b294a99fd
Move common.h back to the top of the file so that SKYLAKEX (from config.h) is defined in time 2021-03-18 21:28:19 +01:00
Martin Kroeker 1e4b2e98d9
Merge pull request #3154 from martin-frbg/issue3153
Fix premature include in getarch_2nd
2021-03-18 12:35:47 +01:00
Martin Kroeker 3fd6ccdf76
Include just the definition of BLASLONG rather than all of common.h 2021-03-18 07:50:19 +01:00
Martin Kroeker fa9a30b491
Merge pull request #19 from xianyi/develop
rebase
2021-03-18 07:47:03 +01:00
Martin Kroeker d90ca75a6c
Update version to 0.3.14.dev 2021-03-17 21:14:42 +01:00
Martin Kroeker e107454454
Update version to 0.3.14.dev 2021-03-17 21:14:05 +01:00
Martin Kroeker d43962d013
Merge pull request #3151 from xianyi/release-0.3.0
Merge 0.3.14 release branch back into develop to acquire tag
2021-03-17 21:13:25 +01:00
Martin Kroeker 2f6d35c3d4
Merge pull request #3150 from xianyi/develop
Update branch from develop for 0.3.14 release
2021-03-17 20:21:42 +01:00
Martin Kroeker 86de5f768b
Update version to 0.3.14 for release 2021-03-17 20:20:34 +01:00
Martin Kroeker 2663e44724
Update version to 0.3.14 for release 2021-03-17 20:20:00 +01:00
Martin Kroeker 6f2900c164
Merge pull request #3149 from martin-frbg/changelog14
Update Changelog for 0.3.14
2021-03-17 20:14:50 +01:00
Martin Kroeker 7888b5127c
Update Changelog for 0.3.14 2021-03-17 16:17:55 +01:00
Martin Kroeker 8808c291b9
Merge pull request #3148 from martin-frbg/issue3145
Add workaround for older gcc on big-endian ppc64 not supporting casts in defines
2021-03-17 09:05:43 +01:00
Martin Kroeker 8cdf0825de
Add workaround for older gcc on ppc64be not supporting casts in defines 2021-03-16 21:20:05 +01:00
Martin Kroeker 9e0dbe8e59
Merge pull request #18 from xianyi/develop
rebase
2021-03-16 21:09:45 +01:00
Martin Kroeker 52f99d3944
Merge pull request #3147 from martin-frbg/issue3146
Fix DYNAMIC_ARCH builds with CLANG on ppc64
2021-03-16 20:25:42 +01:00
Martin Kroeker 186368ddc3
Fix compilation with CLANG 2021-03-16 16:52:57 +01:00
Martin Kroeker c0b94ae1df
Merge pull request #3143 from martin-frbg/fix3088
Resolve circular dependency between common.h and param.h
2021-03-14 23:12:55 +01:00
Martin Kroeker ddd86309a1
Merge pull request #3144 from xoviat/fix-test
disable openmp
2021-03-14 23:12:33 +01:00
xoviat e9d453b623 disable openmp 2021-03-14 16:34:02 -05:00
Martin Kroeker ecb4babcf4
remove inclusion of common.h again to avoid circular dependency 2021-03-14 17:36:51 +01:00
Martin Kroeker 34753eaebb
Include common.h (and indirectly param.h) rather than just param.h to have BLASLONG available w/o circular dependencies 2021-03-14 17:28:43 +01:00
Martin Kroeker efa72a631b
Merge pull request #17 from xianyi/develop
rebase
2021-03-14 17:20:49 +01:00
Martin Kroeker 30d835168a
Merge pull request #3088 from xoviat/msvc
add misc fixes.
2021-03-14 17:14:28 +01:00
Martin Kroeker 8f6a744807
Merge pull request #3141 from martin-frbg/nagfor-2
Leave out ARM64 march/mtune options when compiling with nagfor
2021-03-13 23:04:53 +01:00
Martin Kroeker 6726771645
Support compilation with NAG fortran 2021-03-13 20:16:18 +01:00
Martin Kroeker a51cae6b2e
Merge pull request #3140 from martin-frbg/issue3139
Fix compilation on older x86_64 targets with old compilers that lack intrinsics support
2021-03-12 15:35:58 +01:00
Martin Kroeker d30b943251
Merge pull request #3138 from martin-frbg/nagfor
Add support for compilation with the NAG Fortran compiler
2021-03-12 12:46:19 +01:00