Commit Graph

19 Commits

Author SHA1 Message Date
Honglin Zhu 4989e039a5 Define SBGEMM_ALIGN_K for DYNAMIC_ARCH build 2022-10-27 14:10:26 +08:00
Honglin Zhu b00d5b9746 New sbgemm implementation for Neoverse N2
1. Use UZP instructions but not gather load and scatter store instructions to get lower latency.
    2. Padding k to a power of 4.
2022-10-26 15:09:41 +08:00
Wangyang Guo 3dc6052c7e initial support for Sapphire Rapids platform 2021-10-12 01:30:40 -07:00
Martin Kroeker 5d26223f4a
remove extra/intermediate size step of min_jj from PR747 2020-12-08 20:59:56 +01:00
Chen, Guobing e740c4873d Enable COOPERLAKE build target
Enable new build target platform -- COOPERLAKE. This target platform
supports all the SKYLAKEX supported ISAs + avx512bf16. So all the
SKYLAKEX specific kernels/drivers and related code are now extended
to be also active on COOPERLAKE. Besides, new BF16 related kernels
are active under this target.
2020-08-13 06:18:00 +08:00
Rajalakshmi Srinivasaraghavan 7eb55504b1 RFC : Add half precision gemm for bfloat16 in OpenBLAS
This patch adds support for bfloat16 data type matrix multiplication kernel.
For architectures that don't support bfloat16, it is defined as unsigned short
(2 bytes).  Default unroll sizes can be changed as per architecture as done for
SGEMM and for now 8 and 4 are used for M and N.  Size of ncopy/tcopy can be
changed as per architecture requirement and for now, size 2 is used.

Added shgemm in kernel/power/KERNEL.POWER9 and tested in powerpc64le and
powerpc64.  For reference, added a small test compare_sgemm_shgemm.c to compare
sgemm and shgemm output.

This patch does not cover OpenBLAS test, benchmark and lapack tests for shgemm.
Complex type implementation can be discussed and added once this is approved.
2020-04-14 14:55:08 -05:00
wjc404 1c3e20ce48
Update level3.c 2020-02-04 20:30:23 +08:00
Andrew 11a627c54e remove surplus parentheses to silence clang5 2018-01-01 20:56:26 +01:00
Andrew bfc2a88594 remove unused buffer 2017-12-22 00:55:40 +01:00
Werner Saar a2672d5589 prepared driver/level3 functions for UNROLL values, that are not a power of two 2017-01-09 10:38:15 +01:00
Werner Saar b07d733a71 added updates for syrk and syr2k 2016-01-21 13:16:44 +01:00
wernsaar 1d33547222 optimized zgemm kernel for haswell 2014-07-27 11:51:42 +02:00
Timothy Gu 6c2ead30f0 Remove all trailing whitespace except lapack-netlib
Signed-off-by: Timothy Gu <timothygu99@gmail.com>
2014-06-27 12:05:18 -07:00
wernsaar c947ab85dc changed level3.c 2013-12-01 13:46:30 +01:00
wernsaar 2840d56aeb added dgemm_kernel for Piledriver 2013-10-19 09:47:15 +02:00
wernsaar 6f008abcef replaced defined(DOUBLE) by !defined(XDOUBLE) 2013-07-09 18:17:50 +02:00
wernsaar 25491e42f9 New dgemm kernel for BULLDOZER: dgemm_kernel_8x2_bulldozer.S 2013-06-08 09:40:17 +02:00
Xianyi Zhang 1b97ec1a7c Added DEBUG option in Makefile.rule. Fixed DEBUG typo mistakes. 2011-02-26 11:19:54 +08:00
Xianyi Zhang 342bbc3871 Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00