Commit Graph

82 Commits

Author SHA1 Message Date
Martin Kroeker 61d803547a Apply USE_TRMM to MIPS64_GENERIC as to GENERIC 2023-08-06 15:17:38 +02:00
Martin Kroeker 898cf5faf3 Add Elbrus e2k architecture support 2022-01-22 18:55:10 +01:00
Bine Brank b6a445cfd8 adapt Makefile for SVE trsm 2022-01-16 21:40:56 +01:00
Bine Brank bb33446b40 fix makefile.L3 2022-01-06 10:26:11 +01:00
Bine Brank 07fa6fa3b1 configure Makefile for sve 2022-01-05 08:57:51 +01:00
Bine Brank 0140373802 add sve ztrmm 2022-01-02 19:15:33 +01:00
Bine Brank 774267fdac adjust Makefile.L3 for SVE 2021-12-11 16:35:08 +01:00
Bine Brank 86ae89bf33 add sgemm kernel and copy functions for sgemm and ssymm 2021-11-28 18:12:47 +01:00
Bine Brank 9b9cb90bb1 modify Makefile for SVE copy 2021-11-22 09:54:20 +01:00
Bine Brank 9388f05a3c configure SVE Makefile 2021-11-21 18:33:43 +01:00
Wangyang Guo 3dc6052c7e initial support for Sapphire Rapids platform 2021-10-12 01:30:40 -07:00
Martin Kroeker f1e3305974 Add workaround for Windows10 macro name clash 2021-09-01 21:36:50 +02:00
Wangyang Guo 619588fbab sbgemm: remove unnecessary b0 files 2021-08-30 17:55:01 +08:00
Wangyang Guo 1d83ca4bca Small Matrix: support BFLOAT16 data type 2021-08-30 17:40:20 +08:00
Wangyang Guo 989e6bbdd3 Small Matrix: reduce generic kernel source files 2021-08-13 03:17:38 +00:00
Wangyang Guo 5dc7c3c8e5 Small Matrix: add GEMM_SMALL_MATRIX_PERMIT to tune the small matrix case 2021-08-02 07:06:54 +00:00
Xianyi Zhang 57ed58cefe Refs #2587 Add small matrix optimization reference kernel for c/zgemm. 2021-08-02 07:06:54 +00:00
Xianyi Zhang 17d32a4a82 Change a1b0 gemm to b0 gemm. 2021-08-02 07:06:54 +00:00
Xianyi Zhang 59cb5de46b Refs #2587 Fix typos. 2021-08-02 07:06:54 +00:00
Xianyi Zhang be3349405d Add alpha=1.0 beta=0.0 for small gemm. 2021-08-02 07:01:47 +00:00
Xianyi Zhang 0a2077901c Add small matrix optimization kernel interface.
make SMALL_MATRIX_OPT=1
2021-08-02 07:01:47 +00:00
Martin Kroeker c4da892ba0 Only filter out -mavx on Sandybridge ZGEMM/ZTRMM kernels 2021-05-14 23:19:10 +02:00
Martin Kroeker bd60fb6ffc filter out -mavx flag on zgemm kernels as it can cause problems with older gcc 2021-05-13 23:05:00 +02:00
gxw 4b548857d6 Add msa support for loongson
1. Using core loongson3r3 and loongson3r4 for loongson
2. Add DYNAMIC_ARCH for loongson

Change-Id: I1c6b54dbeca3a0cc31d1222af36a7e9bd6ab54c1
2020-12-09 10:28:46 +08:00
Zhang Xianyi d7ba7679b6 Merge branch 'develop' into risc-v 2020-10-16 23:27:38 +08:00
Rajalakshmi Srinivasaraghavan b5d30b390d Fix build issues with bfloat16
This patch fixes compilation errors due to recent renaming from SH to SB
with BUILD_BFLOAT16.
2020-10-13 11:00:22 -05:00
Martin Kroeker 3aecafad80 Change "HALF" and "sh" to "BFLOAT16" and "sb" 2020-10-12 00:00:55 +02:00
Martin Kroeker 6b6adf8a4a Allow compiling only a subset of kernels for specific variable types 2020-10-11 14:52:09 +02:00
Martin Kroeker 9ee21a0a39 Merge pull request #2780 from Guobing-Chen/CPL_build_support
Enable COOPERLAKE build target
2020-08-20 19:54:29 +02:00
Martin Kroeker 75eeb265d7 [WIP] Refactor the driver code for direct SGEMM (#2782)
Move "direct SGEMM" functionality out of the SkylakeX SGEMM kernel and make it available
(on x86_64 targets only for now) in DYNAMIC_ARCH builds
* Add sgemm_direct targets in the kernel Makefile.L3 and CMakeLists.txt
* Add direct_sgemm functions to the gotoblas struct in common_param.h
* Move sgemm_direct_performant helper to separate file
* Update gemm.c to use macros for sgemm_direct to support dynamic_arch naming via common_s.h
* (Conditionally) add sgemm_direct functions in setparam-ref.c
2020-08-19 14:51:09 +02:00
Chen, Guobing e740c4873d Enable COOPERLAKE build target
Enable new build target platform -- COOPERLAKE. This target platform
supports all the SKYLAKEX supported ISAs + avx512bf16. So all the
SKYLAKEX specific kernels/drivers and related code are now extended
to be also active on COOPERLAKE. Besides, new BF16 related kernels
are active under this target.
2020-08-13 06:18:00 +08:00
Rajalakshmi Srinivasaraghavan 475b5c95b9 Remove extra symbol in Makefile
While trying out different unroll values, noted that
make failed due to this extra symbol.
2020-08-07 15:27:44 -05:00
Martin Kroeker da17abec87 fix trailing whitespace 2020-07-14 18:20:03 +02:00
Martin Kroeker b144423f0f Do not define USE_TRMM for 32bit POWER8 2020-07-14 18:10:12 +02:00
Martin Kroeker ed7e155c35 Merge branch 'develop' into aix 2020-07-07 18:52:06 +02:00
Martin Kroeker c854ef5471 Fix variable names in conditional 2020-06-25 13:29:52 +02:00
Martin Kroeker c0afc11742 Fix POWERPC builds on AIX (gcc/gfortran 7)
1. macro preprocessing for POWER8 and later kernels only
2. default buffer size used by AIX version of m4 is too small
2020-06-25 13:12:36 +02:00
Kavana Bhat df4ade070f Fix for #2671 2020-06-24 04:25:47 -05:00
Rajalakshmi Srinivasaraghavan 9fe930f205 powerpc: Add support for future processor
This is the initial patch to support build infrastructure
for POWER10 architecture.
2020-06-11 15:47:20 -05:00
Martin Kroeker 5dd14e3d48 Make building the bfloat16 functions conditional on option BUILD_HALF (#2590)
* make building the bfloat16 BLAS functions conditional on BUILD_HALF

* pass the BUILD_HALF option to gensymbol

* Pass BUILD_HALF as a compiler define for dynamic_arch builds
2020-05-01 09:58:30 +02:00
Rajalakshmi Srinivasaraghavan ff010f496e Build shgemm for all architecture 2020-04-14 20:38:53 -05:00
Rajalakshmi Srinivasaraghavan 7eb55504b1 RFC : Add half precision gemm for bfloat16 in OpenBLAS
This patch adds support for bfloat16 data type matrix multiplication kernel.
For architectures that don't support bfloat16, it is defined as unsigned short
(2 bytes).  Default unroll sizes can be changed as per architecture as done for
SGEMM and for now 8 and 4 are used for M and N.  Size of ncopy/tcopy can be
changed as per architecture requirement and for now, size 2 is used.

Added shgemm in kernel/power/KERNEL.POWER9 and tested in powerpc64le and
powerpc64.  For reference, added a small test compare_sgemm_shgemm.c to compare
sgemm and shgemm output.

This patch does not cover OpenBLAS test, benchmark and lapack tests for shgemm.
Complex type implementation can be discussed and added once this is approved.
2020-04-14 14:55:08 -05:00
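Note on commit 7eb55504b1: the message says bfloat16 is defined as unsigned short (2 bytes) on architectures without native support. A minimal sketch of what such a storage type implies is shown here, assuming the standard bfloat16 layout (the upper 16 bits of an IEEE-754 binary32 value). The names `bf16_t`, `float_to_bf16`, and `bf16_to_float` are hypothetical and are not taken from the OpenBLAS sources, and the conversion truncates rather than rounds.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 2-byte storage type standing in for bfloat16,
 * mirroring the "unsigned short" fallback the commit describes. */
typedef unsigned short bf16_t;

/* Truncating conversion: keep the sign, the 8 exponent bits and the
 * top 7 mantissa bits of the binary32 pattern (no rounding). */
static bf16_t float_to_bf16(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* type-pun without aliasing issues */
    return (bf16_t)(bits >> 16);
}

/* Widening conversion: place the 16 stored bits in the upper half
 * of a binary32 pattern and zero-fill the low mantissa bits. */
static float bf16_to_float(bf16_t h) {
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Values whose low 16 mantissa bits are zero, such as 1.0f or -2.5f, survive the round trip exactly; other values lose precision.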
Xianyi Zhang 4aa2d89217 Merge branch 'develop' into risc-v 2020-02-27 13:53:49 +08:00
Martin Kroeker 1a6ea8ee6d Merge pull request #2338 from kavanabhat/aix_mod
Changes to build on AIX in POWER8 mode
2019-12-09 17:54:49 +01:00
Kavana Bhat 6baa9b07d7 AIX changes for Power8 2019-12-06 04:33:32 -06:00
Kavana Bhat 3938e59569 AIX changes for Power8 2019-12-04 00:23:46 -06:00
Martin Kroeker e7c4d6705a Revert #2051 and replace with a better fix (#2261)
* Revert #2051 and add a better fix for TARGET=generic with DYNAMIC_ARCH
fixes #2257 without breaking #2048 again
2019-09-17 18:56:04 +02:00
Kavana Bhat 3dc6b26eff AIX changes for Power8 2019-08-20 06:51:35 -05:00
Martin Kroeker 7c51cc8527 Merge branch 'develop' into develop 2019-03-29 19:36:29 +01:00
AbdelRauf 853a18bc17 power9 makefile. dgemm based on power8 kernel with the following changes: 32x unrolled 16x4 kernel and 8x4 kernel using (lxv stxv butterfly rank1 update). improvement from 17 to 22-23 GFLOPS. dtrmm cases were added into dgemm itself 2019-03-29 15:49:40 +00:00