OpenBLAS/kernel
CodesWithWolves d2bda3b56a Remove Unnecessary/Erroneous Reads In sgemm_tcopy_16.S COPY1x8 Macro
Some code appears to have been left over from copying the COPY2x8 macro
above: we read 8 bytes into each of d4-d7 directly after reading 4 bytes
into each of s4-s7. The 32 bytes loaded into d4-d7 are unused, and the
loads can overrun the boundary of allocated memory. Valgrind detected
this on a 128,1 copy, which is what drew my attention to it.

Additionally, there is no need to update the addresses stored in A0-A7,
because every path taken after this macro either overwrites A0-A7 (when
looping to the next 8 rows) or overwrites A0-A3 (when moving on to 4
rows), in which case A4-A7 are unused.
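
For illustration only, here is a rough C analogue of what a 1-column by
8-row copy step is expected to do once the stray loads are gone. It is a
sketch, not the arm64 assembly from sgemm_tcopy_16.S: the function name
copy1x8 and the rows/dst parameters are hypothetical, and the eight row
pointers merely stand in for A0-A7.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C analogue of a 1-column by 8-row copy step.
 * rows[0..7] stand in for the row addresses held in A0-A7;
 * dst stands in for the packed destination buffer. */
static void copy1x8(const float *rows[8], float *dst)
{
    assert(rows != NULL && dst != NULL);

    for (size_t r = 0; r < 8; ++r) {
        /* Exactly one 4-byte load per row. An extra 8-byte load from
         * the same address would touch memory past the element and
         * could cross the end of the allocation on the last row. */
        dst[r] = rows[r][0];
        /* The row pointers are deliberately not advanced: in this
         * sketch the caller recomputes them before the next step. */
    }
}
```

The sketch reads 32 bytes in total (8 rows of 4 bytes each) and leaves
the row pointers untouched, mirroring the two changes described above.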
2021-03-31 15:44:25 -04:00
..
alpha Add implementations of ssum/dsum and csum/zsum 2019-03-30 22:05:11 +01:00
arm Typo fix 2021-02-23 13:14:35 +01:00
arm64 Remove Unnecessary/Erroneous Reads In sgemm_tcopy_16.S COPY1x8 Macro 2021-03-31 15:44:25 -04:00
generic Add the support for RISC-V Vector. 2020-10-15 16:09:02 +08:00
ia64 Add ia64 implementation of ?sum 2019-03-30 22:18:03 +01:00
mips Add msa support for loongson 2020-12-09 10:28:46 +08:00
mips64 Add msa support for loongson 2020-12-09 10:28:46 +08:00
power Add workaround for LAPACK testsuite failures with the NVIDIA HPC compiler 2021-03-19 11:47:58 +01:00
riscv64 Refs #2899 2020-11-10 09:38:04 +08:00
simd fix the CI failure of lack the head 2020-11-12 17:35:17 +08:00
sparc Work around DOT and SWAP test failures 2020-12-06 19:15:37 +01:00
x86 Enable COOPERLAKE build target 2020-08-13 06:18:00 +08:00
x86_64 Merge pull request #3156 from martin-frbg/omatcopy_d 2021-03-19 15:22:48 +01:00
zarch s390x: fix cscal and zscal implementations 2020-09-21 13:10:05 +02:00
CMakeLists.txt Fix building "generic" TRMM kernel with CMake 2021-01-14 10:00:49 +01:00
Makefile Amend SkylakeX options to support the NVIDIA compiler 2020-12-19 22:11:49 +01:00
Makefile.L1 Conditionally add -mfma to compiler options where needed 2020-12-17 11:34:05 +01:00
Makefile.L2 Implementation of BF16 based gemv 2020-10-29 02:08:23 +08:00
Makefile.L3 Add msa support for loongson 2020-12-09 10:28:46 +08:00
Makefile.LA Support NO_LAPACK=1 to build the lib without LAPACK functions. 2011-03-04 11:51:32 +08:00
setparam-ref.c Add msa support for loongson 2020-12-09 10:28:46 +08:00