Commit Graph

32 Commits

Author SHA1 Message Date
Andrew
1e531701b7 fix small typo 2018-09-09 16:52:25 +02:00
Martin Kroeker
7a7619af6d Revert changes from PR#1419
at least one of these changes apparently is an oversimplification, leading to TRMM breakage on some platforms as observed in #1563
2018-05-17 11:40:08 +02:00
Andrew
e5cc3d72c0 core.IdenticalExpr clang501 checker 2018-01-19 23:17:43 +01:00
Andrew
9fa986337d add missing brackets to silence indentation warnings gcc721 2018-01-19 23:11:12 +01:00
Andrew
3eed97f6b9 Initialize values to silence cppcheck 2018-01-12 22:35:00 +01:00
Andrew
d602b99386 LAPACK helpers in C that need care too 2018-01-02 14:38:50 +01:00
Andrew
4d0b005e5b Eliminate remaining unused results in kernels (clang5 analyzer) 2018-01-01 20:54:39 +01:00
Andrew
03e5ff0687 initialize potentially unitialized variables (clang5) 2017-12-26 09:24:24 +01:00
Andrew
47deec2c1a fix couple of dead assignment warnings 2017-12-22 00:56:35 +01:00
Andrew
281a2b952f warning cleanup (#1380)
* dead increments in driver/level2

* dead increments in kernel/generic

* part dead increments in kernel/x86_64
2017-12-05 19:54:10 +01:00
Martin Kroeker
8213385ab8 Work around compiler warnings for unused variables in the generic zgemm3m_Xcopy kernels 2017-12-02 22:51:58 +01:00
Andrew
441a9c8385 more dead increments clang4 scan-build deadcode.deadstores 2017-11-26 17:24:08 +01:00
Andrew
1236dbe5a6 Eliminate 2-8 dead increments code 2017-11-26 13:26:11 +01:00
Martin Kroeker
65bf0a343c Remove unused variable btpr 2017-11-14 23:25:50 +01:00
Martin Kroeker
9d92f526dd Comment out a code block that performs out-of-bounds memory accesses
...and does not appear to be needed even when it stays within the bounds of the array
2017-10-06 23:51:32 +02:00
Martin Kroeker
f96afd94b0 Fix out-of-bounds accesses where the data should be zero anyway 2017-10-01 01:06:39 +02:00
Andrew
becf8bc7a0 remove dead code 2016-10-31 12:46:56 +01:00
Yichao Yu
594b9f4c73 Do not use vsub to clear the register values since it doesn't work with non-normal numbers. 2016-01-05 16:54:05 +00:00
Ashwin Sekhar T K
45f78963ac Optimized cgemm kernel for CORTEXA57
Also, add a generic ztrmm 4x4 kernel
2015-11-09 14:15:53 +05:30
Martin Koehler
711ca33bc6 Improved Ximatcopy when lda==ldb.
The Ximatcopy functions create a copy of the input matrix
although they seem to work inplace. The new routines
XIMATCOPY_K_YY perform the operations inplace if the leading
dimension does not change.
2015-09-07 14:36:16 +02:00
Zhang Xianyi
1cf2b10224 Use pure C generic target on x86 and x86_64.
make TARGET=GENERIC

?gemm3m is unimplemented on generic target.
2015-08-03 23:55:56 -05:00
Werner Saar
9bd962f655 modified haswell parameter dgemm_unroll_n 2015-06-13 10:28:27 +02:00
Zhang Xianyi
ea7f9dacf4 Refs #509. Fixed geadd building bug with DYNAMIC_ARCH=1. 2015-02-26 01:47:11 +08:00
Zhang Xianyi
2fb02626da Update organization info. 2014-11-25 15:28:58 +08:00
Benedikt Huber
58c90d5937 # The first commit's message is:
Optimizations for APM's xgene-1 (aarch64).

1) general system updates to support armv8 better.  Make all did not work, one needed to supply TARGET=ARMV8.
2) sgem 4x4 kernel in assembler using SIMD, and configuration changes to use it.
3) strmm 4x4 kernel in C.  Since the sgem kernel does 4x4, the trmm kernel must also do 4xN.

Added Dave Nuechterlein to the contributors list.
2014-11-11 22:19:23 +08:00
wernsaar
b079df9ef4 added optimized sdot- and dsdot-kernel, written in C 2014-06-30 14:46:38 +02:00
Timothy Gu
6c2ead30f0 Remove all trailing whitespace except lapack-netlib
Signed-off-by: Timothy Gu <timothygu99@gmail.com>
2014-06-27 12:05:18 -07:00
Zhang Xianyi
6c4a7d0828 Import AMD Piledriver DGEMM kernel generated by AUGEM.
So far, this kernel doesn't deal with edge.

AUGEM: Automatically Generate High Performance Dense Linear Algebra
Kernels on x86 CPUs.
Qian Wang, Xianyi Zhang, Yunquan Zhang, and Qing Yi. In the
International Conference for High Performance Computing, Networking,
Storage and Analysis (SC'13). Denver, CO. Nov, 2013.
2013-08-25 10:16:01 -03:00
wernsaar
cff70a666d added generic trmm kernels and modified Makefile.L3 2013-07-30 20:18:57 +02:00
wangqian
f76f952547 Refs #83 #53. Adding Intel Sandy Bridge (AVX supported) kernel codes for BLAS level 3 functions. 2012-06-19 16:37:12 +08:00
Wang Qian
8e53b57bb2 Appending gemmkernel and trmmkernel C code in kernel/generic, this code can be used to execute on a new platform which dose not have optimized assemble kernel. 2012-01-10 17:16:13 +00:00
Xianyi Zhang
342bbc3871 Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00