OpenBLAS

Commit Graph

Author	SHA1	Message	Date
gxw	48698b2b1d	LoongArch64: Rename core Use microarchitecture name instead of meaningless strings to name the core, the legacy core is still retained. 1. Rename LOONGSONGENERIC to LA64_GENERIC 2. Rename LOONGSON3R5 to LA464 3. Rename LOONGSON2K1000 to LA264	2024-09-29 09:35:21 +08:00
Mark Ryan	3b715e6162	Add autodetection for riscv64 Implement DYNAMIC_ARCH support for riscv64. Three cpu types are supported, riscv64_generic, riscv64_zvl256b, riscv64_zvl128b. The two non-generic kernels require CPU support for RVV 1.0 to function correctly. Detecting that a riscv64 device supports RVV 1.0 is a little complicated as there are some boards on the market that advertise support for V via hwcap but only support RVV 0.7.1, which is not binary compatible with RVV 1.0. The approach taken is to first try hwprobe. If hwprobe is not available, we fall back to hwcap + an additional check to distinguish between RVV 1.0 and RVV 0.7.1. Tested on a VM with VLEN=256, a CanMV K230 with VLEN=128 (with only the big core enabled), a Lichee Pi with RVV 0.7.1 and a VF2 with no vector. A compiler with RVV 1.0 support must be used to build OpenBLAS for riscv64 when DYNAMIC_ARCH=1. Signed-off-by: Mark Ryan <markdryan@rivosinc.com>	2024-07-15 14:24:22 +00:00
Martin Kroeker	93d975d8fd	Merge pull request #4593 from XiWeiGu/loongarch_add_buffer_offset loongarch: Optimizing the performance of the GEMM on servers	2024-04-10 14:23:31 +02:00
gxw	d8c4ea8793	loongarch: Optimizing the performance of the GEMM on servers	2024-04-09 09:03:34 -04:00
Chen Yu	8e39c05efd	Get the l2 cache size via environment variable on confidential VM The CPUID(leaf:2 or leaf:0x80000006) is not supported on some confidential VMs. As a result the get_l2_size() returns the default 512M which brings performance issues. Introduce the environment variable OPENBLAS_L2_SIZE provided by the user to get the l2 cache size. Suggested-by: "Keshavamurthy, Anil S" <anil.s.keshavamurthy@intel.com> Signed-off-by: Chen Yu <yu.c.chen@intel.com>	2024-04-05 11:39:01 +08:00
Honglin Zhu	90f041e348	Invoke the syscall to allow the use of amx tiles	2023-05-19 10:48:18 +08:00
Martin Kroeker	437c0bf2b4	Merge pull request #3843 from Mousius/switch-ratio Propagate SWITCH_RATIO to DYNAMIC_ARCH builds	2023-04-19 11:51:54 +02:00
Chris Sidebottom	32f2fafde7	Propagate SWITCH_RATIO to DYNAMIC_ARCH builds Previously dynamic builds were either using the default SWITCH_RATIO or one from the higher level architecture; this patch ensures the dynamic builds can use this parameter as well.	2023-04-17 15:34:12 +01:00
Martin Kroeker	38d6fb4225	Fix dependencies in builds with specified subsets of precision types	2023-02-23 23:12:06 +01:00
Martin Kroeker	5481c328e8	fix DYNAMIC_ARCH builds that use only a subset of precisions	2023-02-22 00:28:25 +01:00
Martin Kroeker	c9d78dc3b2	Remove excess initializer (leftover from rework of PR 3793)	2022-10-31 16:57:03 +01:00
Honglin Zhu	4989e039a5	Define SBGEMM_ALIGN_K for DYNAMIC_ARCH build	2022-10-27 14:10:26 +08:00
Honglin Zhu	843e9fd0b9	Fix typo error	2022-10-26 17:06:33 +08:00
Honglin Zhu	b00d5b9746	New sbgemm implementation for Neoverse N2 1. Use UZP instructions but not gather load and scatter store instructions to get lower latency. 2. Padding k to a power of 4.	2022-10-26 15:09:41 +08:00
gxw	fbfe1daf6e	LoongArch64: Add DYNAMIC_ARCH support	2022-07-28 14:28:45 +08:00
Martin Kroeker	40302558ed	Remove extraneous (and wrong) definition of sbgemm_r on x86_64	2022-03-23 20:05:32 +01:00
Martin Kroeker	d9894f45d3	Define sbgemm_r to fix DYNAMIC_ARCH builds	2022-02-25 10:04:00 +01:00
Wangyang Guo	3dc6052c7e	initial support for Sapphire Rapids platform	2021-10-12 01:30:40 -07:00
Wangyang Guo	1d83ca4bca	Small Matrix: support BFLOAT16 data type	2021-08-30 17:40:20 +08:00
Wangyang Guo	478d1086c1	Small Matrix: support DYNAMIC_ARCH build	2021-08-04 03:12:41 +00:00
gxw	4b548857d6	Add msa support for loongson 1. Using core loongson3r3 and loongson3r4 for loongson 2. Add DYNAMIC_ARCH for loongson Change-Id: I1c6b54dbeca3a0cc31d1222af36a7e9bd6ab54c1	2020-12-09 10:28:46 +08:00
Chen, Guobing	a7b1f9b1bb	Implementation of BF16 based gemv 1. Add a new API -- sbgemv to support bfloat16 based gemv 2. Implement a generic kernel for sbgemv 3. Implement an avx512-bf16 based kernel for sbgemv Signed-off-by: Chen, Guobing <guobing.chen@intel.com>	2020-10-29 02:08:23 +08:00
Martin Kroeker	10379fc83b	Use ifdef instead of if	2020-10-15 19:05:37 +02:00
Martin Kroeker	3aecafad80	Change "HALF" and "sh" to "BFLOAT16" and "sb"	2020-10-12 00:00:55 +02:00
Martin Kroeker	6b6adf8a4a	Allow compiling only a subset of kernels for specific variable types	2020-10-11 14:52:09 +02:00
Martin Kroeker	dfbc62ef7e	Support building only a subset of types	2020-09-22 23:25:59 +02:00
Chen, Guobing	deaeb6c5b8	Add bfloat16 based dot and conversion with single/double 1. Added bfloat16 based dot as new API: shdot 2. Implemented generic kernel and cooperlake-specific (AVX512-BF16) kernel for shdot 3. Added 4 conversion APIs for bfloat16 data type <=> single/double: shstobf16 shdtobf16 sbf16tos dbf16tod shstobf16 -- convert single float array to bfloat16 array shdtobf16 -- convert double float array to bfloat16 array sbf16tos -- convert bfloat16 array to single float array dbf16tod -- convert bfloat16 array to double float array 4. Implemented generic kernels for all 4 conversion APIs, and cooperlake-specific kernel for shstobf16 and shdtobf16 5. Update level1 thread facilitate functions and macros to support multi-threading for these new APIs 6. Fix Cooperlake platform detection/specify issue when under dynamic-arch building 7. Change the typedef of bfloat16 from unsigned short to more strict uint16_t Signed-off-by: Chen, Guobing <guobing.chen@intel.com>	2020-09-04 02:31:25 +08:00
Martin Kroeker	9ee21a0a39	Merge pull request #2780 from Guobing-Chen/CPL_build_support Enable COOPERLAKE build target	2020-08-20 19:54:29 +02:00
Martin Kroeker	75eeb265d7	[WIP] Refactor the driver code for direct SGEMM (#2782 ) Move "direct SGEMM" functionality out of the SkylakeX SGEMM kernel and make it available (on x86_64 targets only for now) in DYNAMIC_ARCH builds * Add sgemm_direct targets in the kernel Makefile.L3 and CMakeLists.txt * Add direct_sgemm functions to the gotoblas struct in common_param.h * Move sgemm_direct_performant helper to separate file * Update gemm.c to macros for sgemm_direct to support dynamic_arch naming via common_s,h * (Conditionally) add sgemm_direct functions in setparam-ref.c	2020-08-19 14:51:09 +02:00
Chen, Guobing	e740c4873d	Enable COOPERLAKE build target Enable new build target platform -- COOPERLAKE. This target platform supports all the SKYLAKEX supported ISAs + avx512bf16. So all the SKYLAKEX specific kernels/drivers and related code are now extended to be also active on COOPERLAKE. Besides, new BF16 related kernels are active under this target.	2020-08-13 06:18:00 +08:00
Martin Kroeker	5dd14e3d48	Make building the bfloat16 functions conditional on option BUILD_HALF (#2590 ) * make building the bfloat16 BLAS functions conditional on BUILD_HALF * pass the BUILD_HALF option to gensymbol * Pass BUILD_HALF as a compiler define for dynamic_arch builds	2020-05-01 09:58:30 +02:00
Rajalakshmi Srinivasaraghavan	67cc4b9e16	Fix warnings in clang and export symbol	2020-04-15 19:15:23 -05:00
Rajalakshmi Srinivasaraghavan	a87793e03c	Fix DYNAMIC_ARCH compilation errors	2020-04-15 09:09:50 -05:00
Rajalakshmi Srinivasaraghavan	7eb55504b1	RFC : Add half precision gemm for bfloat16 in OpenBLAS This patch adds support for bfloat16 data type matrix multiplication kernel. For architectures that don't support bfloat16, it is defined as unsigned short (2 bytes). Default unroll sizes can be changed as per architecture as done for SGEMM and for now 8 and 4 are used for M and N. Size of ncopy/tcopy can be changed as per architecture requirement and for now, size 2 is used. Added shgemm in kernel/power/KERNEL.POWER9 and tested in powerpc64le and powerpc64. For reference, added a small test compare_sgemm_shgemm.c to compare sgemm and shgemm output. This patch does not cover OpenBLAS test, benchmark and lapack tests for shgemm. Complex type implementation can be discussed and added once this is approved.	2020-04-14 14:55:08 -05:00
int_13h	96ad579428	add in runtime cpu detection for zarch (#2349 ) add in runtime cpu detection for zarch	2019-12-31 18:03:27 +01:00
Martin Kroeker	ccfb7ead15	Merge pull request #2072 from martin-frbg/sum Add (C)BLAS extension ?sum	2019-04-23 20:11:36 +02:00
Rashmica Gupta	bcdf1d4917	Add in runtime CPU detection for POWER.	2019-04-09 14:20:16 +10:00
Martin Kroeker	b9f4943a14	Add ?sum	2019-03-30 22:01:13 +01:00
Ashwin Sekhar T K	d5aeff636f	ARM64: Enable DYNAMIC_ARCH Enable DYNAMIC_ARCH feature on ARM64. This patch uses the cpuid feature in linux kernel to detect the core type at runtime (https://www.kernel.org/doc/Documentation/arm64/cpu-feature-registers.txt). If this feature is missing in kernel, then the user should use the OPENBLAS_CORETYPE env variable to select the desired core type.	2018-10-22 01:49:35 -07:00
Ashwin Sekhar T K	e7b66cd36e	ARM64: Fix DYNAMIC_ARCH compilation for cores which dont use GEMM3M	2018-10-22 01:45:51 -07:00
Martin Kroeker	6f71c0fce4	Return a somewhat sane default value for L2 cache size if cpuid retur… (#1611 ) * Return a somewhat sane default value for L2 cache size if cpuid returned something unexpected Fixes #1610, the KVM hypervisor on Google Chromebooks returning zero for CPUID 0x80000006, causing DYNAMIC_ARCH builds of OpenBLAS to hang	2018-06-11 13:26:19 +02:00
Arjan van de Ven	99c7bba8e4	Initial support for SkylakeX / AVX512 This patch adds the basic infrastructure for adding the SkylakeX (Intel Skylake server) target. The SkylakeX target will use the AVX512 (AVX512VL level) instruction set, which brings 2 basic things: 1) 512 bit wide SIMD (2x width of AVX2) 2) 32 SIMD registers (2x the number on AVX2) This initial patch only contains a trivial transofrmation of the Haswell SGEMM kernel to AVX512VL; more will follow later but this patch aims to get the infrastructure in place for this "later". Full performance tuning has not been done yet; with more registers and wider SIMD it's in theory possible to retune the kernels but even without that there's an interesting enough performance increase (30-40% range) with just this change.	2018-06-03 07:58:52 +00:00
Martin Kroeker	acf3d34bc5	Silence an unused variable warning with a cast l2 cache size is not universally needed to assign default unrolling limits, but neither putting its declaration inside an ifdef nor cloning it into all ifdef sections that need it really makes sense here.	2017-11-14 23:23:44 +01:00
Denis Steckelmacher	c9ff735da6	Add ZEN support (tested for auto-detected static backend)	2017-03-19 15:32:50 +01:00
Werner Saar	28e2fab33e	prepared kernel/setparam-ref.c for UNROLL values, that are not a power of two	2017-01-11 11:56:50 +01:00
Werner Saar	78b05f6476	bugfix for EXCAVATOR and DYNAMIC_ARCH	2016-04-25 10:13:30 +02:00
Zhang Xianyi	69363622a8	Fix DYNAMIC_ARCH=1 bug.	2015-10-27 05:10:40 +08:00
Zhang Xianyi	ea7f9dacf4	Refs #509 . Fixed geadd building bug with DYNAMIC_ARCH=1.	2015-02-26 01:47:11 +08:00
Werner Saar	4319769b79	added target processor STEAMROLLER	2014-12-28 20:16:46 +08:00
wernsaar	7aae4a62e7	enabled use of GEMM3M functions	2014-09-20 14:27:10 +02:00

1 2

70 Commits