OpenBLAS

Commit Graph

Author	SHA1	Message	Date
Martin Kroeker	c6b1d8e7a3	fix improper function prototypes (empty parentheses)	2023-09-30 12:52:06 +02:00
Wangyang Guo	3dc6052c7e	initial support for Sapphire Rapids platform	2021-10-12 01:30:40 -07:00
Wangyang Guo	045ed5c91d	sbgemm: fix build error in BFLOAT16 disabled	2021-09-07 23:37:08 +08:00
Wangyang Guo	8356a604f0	sbgemm: cooperlake: tuning for block params	2021-09-07 21:30:46 +08:00
gxw	4b548857d6	Add msa support for loongson 1. Using core loongson3r3 and loongson3r4 for loongson 2. Add DYNAMIC_ARCH for loongson Change-Id: I1c6b54dbeca3a0cc31d1222af36a7e9bd6ab54c1	2020-12-09 10:28:46 +08:00
Martin Kroeker	85154c2e18	Change "HALF" and "sh" to "BFLOAT16" and "sb"	2020-10-12 00:05:05 +02:00
Chen, Guobing	e740c4873d	Enable COOPERLAKE build target Enable new build target platform -- COOPERLAKE. This target platform supports all the SKYLAKEX supported ISAs + avx512bf16. So all the SKYLAKEX specific kernels/drivers and related code are now extended to be also active on COOPERLAKE. Besides, new BF16 related kernels are active under this target.	2020-08-13 06:18:00 +08:00
Rajalakshmi Srinivasaraghavan	7eb55504b1	RFC : Add half precision gemm for bfloat16 in OpenBLAS This patch adds support for bfloat16 data type matrix multiplication kernel. For architectures that don't support bfloat16, it is defined as unsigned short (2 bytes). Default unroll sizes can be changed as per architecture as done for SGEMM and for now 8 and 4 are used for M and N. Size of ncopy/tcopy can be changed as per architecture requirement and for now, size 2 is used. Added shgemm in kernel/power/KERNEL.POWER9 and tested in powerpc64le and powerpc64. For reference, added a small test compare_sgemm_shgemm.c to compare sgemm and shgemm output. This patch does not cover OpenBLAS test, benchmark and lapack tests for shgemm. Complex type implementation can be discussed and added once this is approved.	2020-04-14 14:55:08 -05:00
Ashwin Sekhar T K	d50abc8903	ARM64: Move parameters from parameter.c to param.h Remove the runtime setting of P, Q, R parameters for targets ARMV8, THUNDERX2T99. Instead set them as constants in param.h at compile time.	2018-10-22 01:45:51 -07:00
Ashwin Sekhar T K	21f46a1cf2	ARM64: Use THUNDERX2T99 Neon Kernels for ARMV8 Currently the generic ARMV8 target uses C implementations for many routines. Replace these with the neon implementations written for THUNDERX2T99 target which are up to 6x faster for certain routines.	2018-10-17 10:44:37 -07:00
Arjan van de Ven	99c7bba8e4	Initial support for SkylakeX / AVX512 This patch adds the basic infrastructure for adding the SkylakeX (Intel Skylake server) target. The SkylakeX target will use the AVX512 (AVX512VL level) instruction set, which brings 2 basic things: 1) 512 bit wide SIMD (2x width of AVX2) 2) 32 SIMD registers (2x the number on AVX2) This initial patch only contains a trivial transformation of the Haswell SGEMM kernel to AVX512VL; more will follow later but this patch aims to get the infrastructure in place for this "later". Full performance tuning has not been done yet; with more registers and wider SIMD it's in theory possible to retune the kernels but even without that there's an interesting enough performance increase (30-40% range) with just this change.	2018-06-03 07:58:52 +00:00
Denis Steckelmacher	c9ff735da6	Add ZEN support (tested for auto-detected static backend)	2017-03-19 15:32:50 +01:00
Ashwin Sekhar T K	a86474c6f7	THUNDERX2T99: Performance fix for ZGEMM	2017-02-28 06:05:00 -08:00
Ashwin Sekhar T K	19ba133383	THUNDERX2T99: Add Optimized ZGEMM Implementation	2017-02-28 05:31:41 +00:00
Ashwin Sekhar T K	2757b49767	THUNDERX2T99: Add Optimized CGEMM Implementation	2017-01-30 17:44:26 +05:30
Ashwin Sekhar T K	f279ff4789	THUNDERX2T99: Add Optimized SGEMM Implementation	2017-01-16 21:44:33 +05:30
Zhang Xianyi	0863a0d4b4	Merge pull request #1061 from ashwinyes/develop_aarch64_vulcan_thunderx_patch Add new targets for ARM64	2017-01-16 13:20:10 +08:00
Werner Saar	c1c5a63d3c	prepared parameter.c for UNROLL values, that are not a power of two	2017-01-11 09:50:28 +01:00
Ashwin Sekhar T K	4b55fae337	ARM64: Add Cavium THUNDERX2T99 Target	2017-01-11 11:18:40 +05:30
Ashwin Sekhar T K	0b8e876d89	VULCAN: Add optimized DGEMM implementation	2017-01-10 15:01:37 +05:30
Ashwin Sekhar T K	4713e7c47f	ARM64: Add the VULCAN Target	2017-01-10 15:01:17 +05:30
Werner Saar	78b05f6476	bugfix for EXCAVATOR and DYNAMIC_ARCH	2016-04-25 10:13:30 +02:00
Zhang Xianyi	05196a8497	Refs #716 . Only call getenv at init function.	2016-03-09 12:50:07 -05:00
Werner Saar	4319769b79	added target processor STEAMROLLER	2014-12-28 20:16:46 +08:00
wernsaar	a64fe9bcc9	added optimized sgemv_n kernel for sandybridge	2014-09-06 08:41:53 +02:00
wernsaar	2021d0f9d6	experimentally removed expensive function calls	2014-09-05 15:05:53 +02:00
wernsaar	50e99a52ea	added definitions for PILEDRIVER and HASWELL	2014-07-06 12:08:27 +02:00
Zhang Xianyi	7a8949e0ce	Merge branch 'develop' of https://github.com/TimothyGu/OpenBLAS into TimothyGu-develop Conflicts: driver/others/memory.c	2014-06-28 20:51:31 +08:00
Timothy Gu	6c2ead30f0	Remove all trailing whitespace except lapack-netlib Signed-off-by: Timothy Gu <timothygu99@gmail.com>	2014-06-27 12:05:18 -07:00
Jameson Nash	f41f03ab83	fix #394 . this cleans up some handles after using them, and doesn't disable ALL process privileges upon success	2014-06-27 12:16:57 -04:00
Zhang Xianyi	bfaaa975e6	Added BULLDOZER target. So far it uses barcelona kernels.	2012-12-07 00:53:31 +08:00
Zhang Xianyi	d3b67d0bd8	Refs #113 . Fixed the typo BOBCATE -> BOBCAT	2012-05-31 22:40:15 +08:00
Zhang Xianyi	d6cab3f37e	Refs #113 . Support AMD Bobcate using Barcelona kernel codes. Replace 3DNow! with MMX.	2012-05-31 18:17:45 +08:00
Xianyi Zhang	19a48b82cf	Init Sandybridge codes based on Nehalem.	2012-03-30 20:01:03 +08:00
Wang Qian	8163ab7e55	Change the block size on Loongson 3B.	2011-11-23 18:41:49 +00:00
Xianyi Zhang	b95ad4cfaf	Support detecting ICT Loongson-3B CPU.	2011-11-09 19:29:50 +00:00
traz	831858b883	Modify aligned address of sa and sb to improve the performance of multi-threads.	2011-09-23 20:59:48 +00:00
Xianyi Zhang	16fc083322	Refs #47 . Fixed the setting parameter bug on Loongson 3A single thread version.	2011-09-08 16:39:34 +00:00
Xianyi Zhang	4727fe8abf	Refs #47 . On Loongson 3A, set DGEMM_R parameter depending on different number of threads. It would improve double precision BLAS3 on multi-threads.	2011-09-05 15:13:52 +00:00
Xianyi Zhang	342bbc3871	Import GotoBLAS2 1.13 BSD version codes.	2011-01-24 14:54:24 +00:00

40 Commits