Commit Graph

55 Commits

Author SHA1 Message Date
Martin Kroeker cfa0a80664
Restore initialization of data variables 2023-07-13 23:23:12 +02:00
Martin Kroeker 9567305e4c
Restore initialization of data01,data02 2023-07-13 23:21:18 +02:00
Sergei Lewis cb0a70e0e2 dot.c early bail fix 2023-03-02 09:51:10 +00:00
Ivan Pribec 802e71bf05 Add const attribute to lsame 2022-08-08 15:15:52 +02:00
Martin Kroeker ef24712030
Move a conditionally used variable 2021-09-11 14:37:44 +02:00
Wangyang Guo 619588fbab sbgemm: remove unnecessary b0 files 2021-08-30 17:55:01 +08:00
Wangyang Guo 1d83ca4bca Small Matrix: support BFLOAT16 data type 2021-08-30 17:40:20 +08:00
Wangyang Guo 989e6bbdd3 Small Matrix: reduce generic kernel source files 2021-08-13 03:17:38 +00:00
Wangyang Guo 6b58bca18b Small Matrix: disable low performance default kernel 2021-08-03 06:49:03 +00:00
Wangyang Guo 5dc7c3c8e5 Small Matrix: add GEMM_SMALL_MATRIX_PERMIT to tune small matrics case 2021-08-02 07:06:54 +00:00
Xianyi Zhang 6022e5629c Refs #2587 fix small matrix c/zgemm bug. 2021-08-02 07:06:54 +00:00
Xianyi Zhang 57ed58cefe Refs #2587 Add small matrix optimization reference kernel for c/zgemm. 2021-08-02 07:06:54 +00:00
Xianyi Zhang 17d32a4a82 Change a1b0 gemm to b0 gemm. 2021-08-02 07:06:54 +00:00
Xianyi Zhang be3349405d Add alpha=1.0 beta=0.0 for small gemm. 2021-08-02 07:01:47 +00:00
Xianyi Zhang 0a2077901c Add small marix optimization kernel interface.
make SMALL_MATRIX_OPT=1
2021-08-02 07:01:47 +00:00
damonyu ef8e7d0279 Add the support for RISC-V Vector.
Change-Id: Iae7800a32f5af3903c330882cdf6f292d885f266
2020-10-15 16:09:02 +08:00
Martin Kroeker 756062afa5
Rename "HALF" and "sh" to "BFLOAT16" and "sb" 2020-10-11 23:56:17 +02:00
Qiyu8 60e6c68e38 Adapt ARM architect 2020-09-29 16:36:14 +08:00
Qiyu8 1b1a757f5f Optimize the performance of dot by using universal intrinsics in X86/ARM 2020-09-28 20:36:53 +08:00
Rajalakshmi Srinivasaraghavan d23419accc powerpc: Optimized SHGEMM kernel for POWER10
This patch introduces new optimized version of SHGEMM kernel
using power10 Matrix-Multiply Assist (MMA) feature introduced in
POWER ISA v3.1. This patch makes use of new POWER10 compute instructions
for matrix multiplication operation.

Tested on simulator and there are no new test failures.
2020-06-25 22:19:08 -05:00
Rajalakshmi Srinivasaraghavan a87793e03c Fix DYNAMIC_ARCH compilation errors 2020-04-15 09:09:50 -05:00
Rajalakshmi Srinivasaraghavan 7eb55504b1 RFC : Add half precision gemm for bfloat16 in OpenBLAS
This patch adds support for bfloat16 data type matrix multiplication kernel.
For architectures that don't support bfloat16, it is defined as unsigned short
(2 bytes).  Default unroll sizes can be changed as per architecture as done for
SGEMM and for now 8 and 4 are used for M and N.  Size of ncopy/tcopy can be
changed as per architecture requirement and for now, size 2 is used.

Added shgemm in kernel/power/KERNEL.POWER9 and tested in powerpc64le and
powerpc64.  For reference, added a small test compare_sgemm_shgemm.c to compare
sgemm and shgemm output.

This patch does not cover OpenBLAS test, benchmark and lapack tests for shgemm.
Complex type implementation can be discussed and added once this is approved.
2020-04-14 14:55:08 -05:00
Qiyu8 ff42e68652 Optimize genenal Gemm Beta 2020-01-20 11:49:42 +08:00
Andrew 1e531701b7 fix small typo 2018-09-09 16:52:25 +02:00
Martin Kroeker 7a7619af6d
Revert changes from PR#1419
at least one of these changes apparently is an oversimplification, leading to TRMM breakage on some platforms as observed in #1563
2018-05-17 11:40:08 +02:00
Andrew e5cc3d72c0 core.IdenticalExpr clang501 checker 2018-01-19 23:17:43 +01:00
Andrew 9fa986337d add missing brackets to silence indentation warnings gcc721 2018-01-19 23:11:12 +01:00
Andrew 3eed97f6b9 Initialize values to silence cppcheck 2018-01-12 22:35:00 +01:00
Andrew d602b99386 LAPACK helpers in C that need care too 2018-01-02 14:38:50 +01:00
Andrew 4d0b005e5b Eliminate remaining unused results in kernels (clang5 analyzer) 2018-01-01 20:54:39 +01:00
Andrew 03e5ff0687 initialize potentially unitialized variables (clang5) 2017-12-26 09:24:24 +01:00
Andrew 47deec2c1a fix couple of dead assignment warnings 2017-12-22 00:56:35 +01:00
Andrew 281a2b952f warning cleanup (#1380)
* dead increments in driver/level2

* dead increments in kernel/generic

* part dead increments in kernel/x86_64
2017-12-05 19:54:10 +01:00
Martin Kroeker 8213385ab8
Work around compiler warnings for unused variables in the generic zgemm3m_Xcopy kernels 2017-12-02 22:51:58 +01:00
Andrew 441a9c8385 more dead increments clang4 scan-build deadcode.deadstores 2017-11-26 17:24:08 +01:00
Andrew 1236dbe5a6 Eliminate 2-8 dead increments code 2017-11-26 13:26:11 +01:00
Martin Kroeker 65bf0a343c
Remove unused variable btpr 2017-11-14 23:25:50 +01:00
Martin Kroeker 9d92f526dd Comment out a code block that performs out-of-bounds memory accesses
...and does not appear to be needed even when it stays within the bounds of the array
2017-10-06 23:51:32 +02:00
Martin Kroeker f96afd94b0 Fix out-of-bounds accesses where the data should be zero anyway 2017-10-01 01:06:39 +02:00
Andrew becf8bc7a0 remove dead code 2016-10-31 12:46:56 +01:00
Yichao Yu 594b9f4c73 Do not use vsub to clear the register values since it doesn't work with non-normal numbers. 2016-01-05 16:54:05 +00:00
Ashwin Sekhar T K 45f78963ac Optimized cgemm kernel for CORTEXA57
Also, add a generic ztrmm 4x4 kernel
2015-11-09 14:15:53 +05:30
Martin Koehler 711ca33bc6 Improved Ximatcopy when lda==ldb.
The Ximatcopy functions create a copy of the input matrix
although they seem to work inplace. The new routines
XIMATCOPY_K_YY perform the operations inplace if the leading
dimension does not change.
2015-09-07 14:36:16 +02:00
Zhang Xianyi 1cf2b10224 Use pure C generic target on x86 and x86_64.
make TARGET=GENERIC

?gemm3m is unimplemented on generic target.
2015-08-03 23:55:56 -05:00
Werner Saar 9bd962f655 modified haswell parameter dgemm_unroll_n 2015-06-13 10:28:27 +02:00
Zhang Xianyi ea7f9dacf4 Refs #509. Fixed geadd building bug with DYNAMIC_ARCH=1. 2015-02-26 01:47:11 +08:00
Zhang Xianyi 2fb02626da Update organization info. 2014-11-25 15:28:58 +08:00
Benedikt Huber 58c90d5937 # The first commit's message is:
Optimizations for APM's xgene-1 (aarch64).

1) general system updates to support armv8 better.  Make all did not work, one needed to supply TARGET=ARMV8.
2) sgem 4x4 kernel in assembler using SIMD, and configuration changes to use it.
3) strmm 4x4 kernel in C.  Since the sgem kernel does 4x4, the trmm kernel must also do 4xN.

Added Dave Nuechterlein to the contributors list.
2014-11-11 22:19:23 +08:00
wernsaar b079df9ef4 added optimized sdot- and dsdot-kernel, written in C 2014-06-30 14:46:38 +02:00
Timothy Gu 6c2ead30f0 Remove all trailing whitespace except lapack-netlib
Signed-off-by: Timothy Gu <timothygu99@gmail.com>
2014-06-27 12:05:18 -07:00