OpenBLAS

Commit Graph

Author	SHA1	Message	Date
Martin Kroeker	ca31c32693	Rename "HALF" and "sh" to "BFLOAT16" and "sb"	2020-10-11 23:49:22 +02:00
Chen, Guobing	e740c4873d	Enable COOPERLAKE build target Enable new build target platform -- COOPERLAKE. This target platform supports all the SKYLAKEX supported ISAs + avx512bf16. So all the SKYLAKEX specific kernels/drivers and related code are now extended to be also active on COOPERLAKE. Besides, new BF16 related kernels are active under this target.	2020-08-13 06:18:00 +08:00
Marius Hillenbrand	e115c97e05	s390x/SGEMM: adjust default P and Q to multiples of M We recently changed the register blocking for SGEMM on s390x to 16x4. However, we did not adjust Q to a multiple of 16 and thus fell back to the 8x4 kernel at each block's margin, without need. Adjust P and Q to multiples of 16 to employ the faster 16x4 kernel for complete full-sized blocks. Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>	2020-08-11 12:56:46 +02:00
Ashwin Sekhar T K	4e1be0e481	ARM64: Add THUNDERX3T110 Target	2020-07-26 23:32:24 -07:00
Martin Kroeker	bd2498c886	Use POWER6 GEMM parameters on 32bit POWER8	2020-07-14 18:07:58 +02:00
Rajalakshmi Srinivasaraghavan	d23419accc	powerpc: Optimized SHGEMM kernel for POWER10 This patch introduces new optimized version of SHGEMM kernel using power10 Matrix-Multiply Assist (MMA) feature introduced in POWER ISA v3.1. This patch makes use of new POWER10 compute instructions for matrix multiplication operation. Tested on simulator and there are no new test failures.	2020-06-25 22:19:08 -05:00
Rajalakshmi Srinivasaraghavan	9fe930f205	powerpc: Add support for future processor This is the initial patch to support build infrastructure for POWER10 architecture.	2020-06-11 15:47:20 -05:00
Martin Kroeker	f16e39554d	Change PPCG4 CGEMM_M to match kernel change	2020-06-03 09:15:29 +02:00
张丹枫	ea5bdc3f72	split cortex-a53 param to match 8x8 kernel	2020-05-20 22:34:47 +08:00
Marius Hillenbrand	1b0b4349a1	s390x/Z14: Change register blocking for SGEMM to 16x4 Change register blocking for SGEMM (and STRMM) on z14 from 8x4 to 16x4 by adjusting SGEMM_DEFAULT_UNROLL_M and choosing the appropriate copy implementations. Actually make KERNEL.Z14 more flexible, so that the change in param.h suffices. As a result, performance for SGEMM improves by around 30% on z15. On z14, FP SIMD instructions can operate on float-sized scalars in vector registers, while z13 could do that for double-sized scalars only. Thus, we can double the amount of elements of C that are held in registers in an SGEMM kernel. Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>	2020-05-12 15:59:51 +02:00
Martin Kroeker	03ff213c51	Increase POWER8 ZGEMM_R and use same R values for POWER9 fixes lapack-test zger failures seen in #2299 after application of my PR #2551	2020-04-24 21:46:54 +02:00
Martin Kroeker	00172d440b	Typo fix in MIPS24K addition	2020-04-18 21:16:49 +02:00
Martin Kroeker	61bbae3ac1	Handle MIPS24K like P5600 and allow enforcing TARGET=1004K as well (omission from earlier 1004K merge and later introduction of TARGET check)	2020-04-18 21:09:32 +02:00
Martin Kroeker	a33d177430	Increase default BUFFER_SIZE on ARM, ZARCH and newer x86_64, add GEMM_R for POWER8/9 As shown in #2538, default buffer sizes on some platforms were smaller than required in memory.c and the requirement could never be fulfilled for a calculated GEMM_R on PPC given the formula used	2020-04-12 19:44:48 +02:00
Martin Kroeker	567d2760e6	Merge pull request #2520 from wjc404/develop Fix avx512 sgemm performance bug when ldc is a multiple of 1024	2020-03-30 20:15:59 +02:00
wjc404	64daad4365	Update param.h	2020-03-20 21:46:18 +00:00
Martin Kroeker	ea8eec5d17	Merge pull request #2422 from wjc404/develop Adjust SkylakeX GEMM3M parameters, add an AVX512 STRMM kernel and fix performance bugs in AVX2 s/c/z GEMM	2020-02-29 19:07:35 +01:00
Ali Saidi	c623a965f9	Add Neoverse-N1 core The implementation is a hybrid of the ARMV8 one with some of the improved TX2 routines along with specifying -march=v8.2-a	2020-02-29 03:22:04 +00:00
Martin Kroeker	8164fd1328	Always assume server-class cpu count for TSV110 and EMAG8180	2020-02-26 22:19:57 +01:00
Martin Kroeker	71e5669c3e	Add preliminary support for EMAG8180 ARMV8 processor	2020-02-19 18:57:26 +01:00
wjc404	b0558c11b9	Update param.h	2020-02-16 23:01:31 +08:00
wjc404	83b6be7976	Update param.h	2020-02-04 19:55:26 +08:00
wjc404	f3f969f681	Update param.h	2020-02-03 21:34:12 +08:00
Wang,Long	fbf4f48f4a	Fix a few performance drops for some matrix sizes per data type Signed-off-by: Wang,Long <long1.wang@intel.com>	2020-01-22 15:15:04 +00:00
wjc404	1c67567008	improve skylakex paralleled sgemm performance	2020-01-13 16:26:03 +08:00
wjc404	b7b408a120	optimize AVX2 SGEMM	2020-01-06 12:16:09 +08:00
wjc404	6362c34ee6	Update param.h	2019-12-30 16:08:19 +08:00
wjc404	64639f440f	Update param.h	2019-12-27 18:06:42 +08:00
wjc404	611445c7f8	Update param.h	2019-12-23 23:44:55 +08:00
wjc404	105e26e12a	Adjust Haswell ZGEMM blocking parameters	2019-12-21 14:38:51 +08:00
wjc404	e20709e976	Update param.h	2019-11-28 19:57:50 +08:00
Martin Kroeker	6082e556cd	Use "generic" S/CGEMM unroll M on big-endian PPC970 as the respective PPC970 "altivec" kernels give wrong results when compiled for big endian	2019-11-17 15:10:26 +01:00
Martin Kroeker	4c6a457358	Merge pull request #2300 from wjc404/develop Optimize SGEMM on SKYLAKEX CPUs	2019-11-06 07:27:33 +01:00
wjc404	ae43b75a6a	Add files via upload	2019-11-02 10:09:19 +08:00
wjc404	274ff5cdb8	update sgemm_q on skylakex cpus	2019-11-01 23:59:18 +08:00
Martin Kroeker	df857551c0	Remove special parameter set for obsolete IOS/ARMV8 workaround	2019-10-25 23:07:00 +02:00
wjc404	5da9484d93	Add files via upload	2019-10-16 02:01:13 +08:00
Martin Kroeker	6b83079368	Count cpu cores on ARMV8 and use that to pick the GEMM_PQ parameters (#2267) There is currently no simple way to query cache sizes on ARMV8, so this takes the number of cores as a trivial indication of whether the target is a server-class device with a big cache, or just a single-board toy or smartphone.	2019-09-25 23:13:24 +02:00
Martin Kroeker	6b6c9b1441	Merge pull request #2172 from quickwritereader/develop power9 cgemm/ctrmm. new sgemm 8x16	2019-07-01 21:06:02 +02:00
AbdelRauf	a97b301aaa	cgemm/ctrmm power9	2019-07-01 14:07:54 +00:00
pkubaj	7c7505a778	Fix build for PPC970 on FreeBSD pt.2 FreeBSD needs those macros too.	2019-06-28 10:31:45 +00:00
AbdelRauf	cdbfb891da	new sgemm 8x16	2019-06-17 15:33:38 +00:00
AbdelRauf	d0c3543c3f	power9 zgemm ztrmm optimized	2019-06-05 20:07:16 +00:00
AbdelRauf	a469b32cf4	sgemm pipeline improved, zgemm rewritten without inner packs, ABI lxvx v20 fixed with vs52	2019-06-04 07:11:30 +00:00
AbdelRauf	8fe794f059	improved zgemm power9 based on power8	2019-05-30 15:31:25 +00:00
AbdelRauf	628b335e83	Merge branch 'develop' of https://github.com/quickwritereader/OpenBLAS into develop	2019-04-29 08:57:44 +00:00
AbdelRauf	0f105dd8a5	sgemm/strmm	2019-04-29 08:49:50 +00:00
Martin Kroeker	7c51cc8527	Merge branch 'develop' into develop	2019-03-29 19:36:29 +01:00
AbdelRauf	853a18bc17	power9 makefile. dgemm based on power8 kernel with following changes : 32x unrolled 16x4 kernel and 8x4 kernel using (lxv stxv butterfly rank1 update). improvement from 17 to 22-23gflops. dtrmm cases were added into dgemm itself	2019-03-29 15:49:40 +00:00
Martin Kroeker	03d7110900	Merge pull request #2042 from maomao194313/develop add TARGET support for HiSilicon tsv110 CPUs	2019-03-12 22:57:39 +01:00


183 Commits