OpenBLAS

Commit Graph

Author	SHA1	Message	Date
Rajalakshmi Srinivasaraghavan	b5d30b390d	Fix build issues with bfloat16 This patch fixes compilation errors due to recent renaming from SH to SB with BUILD_BFLOAT16.	2020-10-13 11:00:22 -05:00
Martin Kroeker	3aecafad80	Change "HALF" and "sh" to "BFLOAT16" and "sb"	2020-10-12 00:00:55 +02:00
Martin Kroeker	6b6adf8a4a	Allow compiling only a subset of kernels for specific variable types	2020-10-11 14:52:09 +02:00
Martin Kroeker	9ee21a0a39	Merge pull request #2780 from Guobing-Chen/CPL_build_support Enable COOPERLAKE build target	2020-08-20 19:54:29 +02:00
Martin Kroeker	75eeb265d7	[WIP] Refactor the driver code for direct SGEMM (#2782) Move "direct SGEMM" functionality out of the SkylakeX SGEMM kernel and make it available (on x86_64 targets only for now) in DYNAMIC_ARCH builds * Add sgemm_direct targets in the kernel Makefile.L3 and CMakeLists.txt * Add direct_sgemm functions to the gotoblas struct in common_param.h * Move sgemm_direct_performant helper to separate file * Update gemm.c to macros for sgemm_direct to support dynamic_arch naming via common_s.h * (Conditionally) add sgemm_direct functions in setparam-ref.c	2020-08-19 14:51:09 +02:00
Chen, Guobing	e740c4873d	Enable COOPERLAKE build target Enable new build target platform -- COOPERLAKE. This target platform supports all the SKYLAKEX supported ISAs + avx512bf16. So all the SKYLAKEX specific kernels/drivers and related code are now extended to be also active on COOPERLAKE. Besides, new BF16 related kernels are active under this target.	2020-08-13 06:18:00 +08:00
Rajalakshmi Srinivasaraghavan	475b5c95b9	Remove extra symbol in Makefile While trying out different unroll values, noted that make failed due to this extra symbol.	2020-08-07 15:27:44 -05:00
Martin Kroeker	da17abec87	fix trailing whitespace	2020-07-14 18:20:03 +02:00
Martin Kroeker	b144423f0f	Do not define USE_TRMM for 32bit POWER8	2020-07-14 18:10:12 +02:00
Martin Kroeker	ed7e155c35	Merge branch 'develop' into aix	2020-07-07 18:52:06 +02:00
Martin Kroeker	c854ef5471	Fix variable names in conditional	2020-06-25 13:29:52 +02:00
Martin Kroeker	c0afc11742	Fix POWERPC builds on AIX (gcc/gfortran 7) 1. macro preprocessing for POWER8 and later kernels only 2. default buffer size used by AIX version of m4 is too small	2020-06-25 13:12:36 +02:00
Kavana Bhat	df4ade070f	Fix for #2671	2020-06-24 04:25:47 -05:00
Rajalakshmi Srinivasaraghavan	9fe930f205	powerpc: Add support for future processor This is the initial patch to support build infrastructure for POWER10 architecture.	2020-06-11 15:47:20 -05:00
Martin Kroeker	5dd14e3d48	Make building the bfloat16 functions conditional on option BUILD_HALF (#2590) * make building the bfloat16 BLAS functions conditional on BUILD_HALF * pass the BUILD_HALF option to gensymbol * Pass BUILD_HALF as a compiler define for dynamic_arch builds	2020-05-01 09:58:30 +02:00
Rajalakshmi Srinivasaraghavan	ff010f496e	Build shgemm for all architecture	2020-04-14 20:38:53 -05:00
Rajalakshmi Srinivasaraghavan	7eb55504b1	RFC : Add half precision gemm for bfloat16 in OpenBLAS This patch adds support for bfloat16 data type matrix multiplication kernel. For architectures that don't support bfloat16, it is defined as unsigned short (2 bytes). Default unroll sizes can be changed as per architecture as done for SGEMM and for now 8 and 4 are used for M and N. Size of ncopy/tcopy can be changed as per architecture requirement and for now, size 2 is used. Added shgemm in kernel/power/KERNEL.POWER9 and tested in powerpc64le and powerpc64. For reference, added a small test compare_sgemm_shgemm.c to compare sgemm and shgemm output. This patch does not cover OpenBLAS test, benchmark and lapack tests for shgemm. Complex type implementation can be discussed and added once this is approved.	2020-04-14 14:55:08 -05:00
Martin Kroeker	1a6ea8ee6d	Merge pull request #2338 from kavanabhat/aix_mod Changes to build on AIX in POWER8 mode	2019-12-09 17:54:49 +01:00
Kavana Bhat	6baa9b07d7	AIX changes for Power8	2019-12-06 04:33:32 -06:00
Kavana Bhat	3938e59569	AIX changes for Power8	2019-12-04 00:23:46 -06:00
Martin Kroeker	e7c4d6705a	Revert #2051 and replace with a better fix (#2261) * Revert #2051 and add a better fix for TARGET=generic with DYNAMIC_ARCH fixes #2257 without breaking #2048 again	2019-09-17 18:56:04 +02:00
Kavana Bhat	3dc6b26eff	AIX changes for Power8	2019-08-20 06:51:35 -05:00
Martin Kroeker	7c51cc8527	Merge branch 'develop' into develop	2019-03-29 19:36:29 +01:00
AbdelRauf	853a18bc17	power9 makefile. dgemm based on power8 kernel with following changes : 32x unrolled 16x4 kernel and 8x4 kernel using (lxv stxv butterfly rank1 update). improvement from 17 to 22-23gflops. dtrmm cases were added into dgemm itself	2019-03-29 15:49:40 +00:00
Martin Kroeker	5b95534afc	Make TARGET=GENERIC compatible with DYNAMIC_ARCH=1 for issue #2048	2019-03-09 11:21:16 +01:00
Martin Kroeker	885a3c4350	USE_TRMM on Z14 from patch provided by aarnez in #991	2019-01-31 21:18:09 +01:00
Martin Kroeker	f3fd44a731	Set USE_TRMM for all ZARCH variants to fix TRMM faults with zarch-generic fixes #1743	2018-08-28 21:34:07 +02:00
Arjan van de Ven	99c7bba8e4	Initial support for SkylakeX / AVX512 This patch adds the basic infrastructure for adding the SkylakeX (Intel Skylake server) target. The SkylakeX target will use the AVX512 (AVX512VL level) instruction set, which brings 2 basic things: 1) 512 bit wide SIMD (2x width of AVX2) 2) 32 SIMD registers (2x the number on AVX2) This initial patch only contains a trivial transformation of the Haswell SGEMM kernel to AVX512VL; more will follow later but this patch aims to get the infrastructure in place for this "later". Full performance tuning has not been done yet; with more registers and wider SIMD it's in theory possible to retune the kernels but even without that there's an interesting enough performance increase (30-40% range) with just this change.	2018-06-03 07:58:52 +00:00
Martin Kroeker	82012b960b	Revert " Switch mips32 target to USE_TRMM to fix complex TRMM" ... as it was just a silly workaround for the issue seen in #1563, caused by #1419	2018-05-17 20:30:03 +02:00
Martin Kroeker	018f2dad27	Switch mips32 target to USE_TRMM to fix complex TRMM	2018-05-02 20:25:32 +02:00
Martin Kroeker	9c5518319a	Revert "Fix 32bit HASWELL builds"	2018-04-22 20:20:04 +02:00
Martin Kroeker	0e2cf102e1	Fix 32bit HASWELL	2017-10-24 10:07:44 +02:00
Denis Steckelmacher	c9ff735da6	Add ZEN support (tested for auto-detected static backend)	2017-03-19 15:32:50 +01:00
Zhang Xianyi	b678471d65	Merge branch 'z13' into develop Conflicts: CONTRIBUTORS.md	2017-01-09 05:52:42 -05:00
Zhang Xianyi	864e202afd	Add USE_TRMM=1 for IBM z13 in kernel/Makefile.L3	2017-01-09 05:48:09 -05:00
Kaustubh Raste	c8a7860eb3	STRSM optimized Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>	2016-05-30 21:17:00 +05:30
Werner Saar	b752858d6c	added dgemm-, dtrmm-, zgemm- and ztrmm-kernel for power8	2016-03-01 07:33:56 +01:00
Zhang Xianyi	94b125255f	Merge branch 'develop' into cmake Conflicts: driver/others/memory.c	2015-10-13 04:46:08 +08:00
Martin Koehler	711ca33bc6	Improved Ximatcopy when lda==ldb. The Ximatcopy functions create a copy of the input matrix although they seem to work inplace. The new routines XIMATCOPY_K_YY perform the operations inplace if the leading dimension does not change.	2015-09-07 14:36:16 +02:00
Zhang Xianyi	f874465bb8	Use cmake to build OpenBLAS GENERIC Target on MSVC x86 64-bit. Disable CBLAS and LAPACK.	2015-08-10 14:10:44 -05:00
Werner Saar	9bd962f655	modified haswell parameter dgemm_unroll_n	2015-06-13 10:28:27 +02:00
Zhang Xianyi	ea7f9dacf4	Refs #509 . Fixed geadd building bug with DYNAMIC_ARCH=1.	2015-02-26 01:47:11 +08:00
Martin Koehler	39cc6b21d3	Add ATLAS-style ?geadd function	2015-02-16 13:46:20 +01:00
Zhang Xianyi	a85c2785ae	Refs #467 . Added generic kernel file for x86_64.	2014-11-24 15:34:48 +08:00
wernsaar	e80b144932	enabled compiling of *3M functions	2014-07-02 14:11:53 +02:00
wernsaar	be94db096c	disabled *3M functions for x86_64 platforms	2014-07-01 16:18:05 +02:00
Timothy Gu	6c2ead30f0	Remove all trailing whitespace except lapack-netlib Signed-off-by: Timothy Gu <timothygu99@gmail.com>	2014-06-27 12:05:18 -07:00
wernsaar	cee257f384	Ref #51 : added blas extensions zomatcopy and comatcopy	2014-06-10 10:34:54 +02:00
wernsaar	7bfb3011e8	Ref #51 : added blas extension somatcopy	2014-06-09 20:21:13 +02:00
wernsaar	8c8f596238	Ref #51 : added blas extension domatcopy as not optimized reference	2014-06-09 17:11:07 +02:00

55 Commits