OpenBLAS

Author	SHA1	Message	Date
Martin Kroeker	52b71a1673	Filter out FFLAGS that flang-new from LLVM18 no longer supports (#4569)	2024-03-22 17:02:39 +01:00
Martin Kroeker	a0e3f77e0b	add FIXED_LIBNAME, PREFIX and SUFFIX	2024-02-15 12:17:38 +01:00
Martin Kroeker	49689fbef7	Add support for compiling SVE kernels with the NVIDIA HPC compiler	2023-08-25 17:11:04 +02:00
Martin Kroeker	ac698cedad	Add compiler options for ARM64 SVE targets in DYNAMIC_ARCH builds	2023-07-05 09:47:49 +02:00
Martin Kroeker	d2144b2981	Add NVHPC	2023-06-09 19:01:15 +02:00
Martin Kroeker	de937b3194	Add clang option to avoid running out of registers in AVX512 assembly	2023-03-17 21:22:37 +01:00
Martin Kroeker	e964ebd0d0	Add compiler option for AVX512-capable Ryzen(4)	2023-02-02 19:04:05 +01:00
Martin Kroeker	a0a4f7c447	Add -mfma to -mavx2 for clang, and add AVX2 declaration for Zen in DYNAMIC_ARCH builds	2022-09-13 22:47:00 +02:00
Martin Kroeker	85fd3c4279	Support compilation with the Cray C and Fortran compilers (#3712) * Add support for the Cray Fortran compiler	2022-08-04 20:42:18 +02:00
Martin Kroeker	18b19d135b	C_LAPACK: Fixes to make it compile with MSVC (#3605) * Fix f2c-like support functions to compile with MSVC, and re-enable C_LAPACK for MSVC in CMAKE * Add MSVC&flang build to Azure CI in order to check C_LAPACK correctness	2022-04-17 17:49:38 +02:00
Martin Kroeker	b7873605d4	Use f2c translations of LAPACK when no Fortran compiler is available (#3539) * Add C equivalents of the Fortran routines from Reference-LAPACK as fallbacks, and C_LAPACK variable to trigger their use	2022-04-09 22:38:58 +02:00
Rafael Cardoso Fernandes Sousa	d38110a5ce	Use CMake variables instead of as	2021-12-10 17:46:53 -06:00
Rafael Cardoso Fernandes Sousa	214fbcee15	Fix cmake for power	2021-12-09 08:28:17 -06:00
Markus Mützel	de2ed66596	cmake: Set SUFFIX64 also for NOFORTRAN	2021-11-15 08:53:52 +01:00
Wangyang Guo	3dc6052c7e	initial support for Sapphire Rapids platform	2021-10-12 01:30:40 -07:00
Martin Kroeker	e02df9fc55	Propagate BUILD_BFLOAT16 to CFLAGS	2021-09-14 16:12:27 +02:00
Wangyang Guo	76ea8db4da	Small Matrix: enable by default for x86_64 arch. If no customized GEMM_SMALL_M_PERMIT kernel is defined, it will just pass through to the normal path.	2021-08-05 02:59:36 +00:00
Wangyang Guo	fee5abd84b	Small Matrix: support cmake build	2021-08-04 08:50:15 +00:00
Martin Kroeker	30f23be0f9	Rework setting of -mfma to only apply it where necessary	2021-07-22 12:00:03 +02:00
User User-User	91e2b11d3c	add to cmake listings too	2021-06-20 15:32:42 +02:00
刘雨培	725432efaa	pass NO_AVX512 macro def	2021-04-07 00:10:41 +08:00
Martin Kroeker	33b5670122	Merge pull request #3096 from martin-frbg/fixclangcmake Fix Cooperlake/DYNAMIC_ARCH builds with clang on Windows	2021-02-02 13:33:15 +01:00
Martin Kroeker	95e19e2e23	fix case in compiler name check Co-authored-by: xoviat <49173759+xoviat@users.noreply.github.com>	2021-02-02 10:53:46 +01:00
Martin Kroeker	99ac042702	remove spurious lines (probably editor malfunction)	2021-02-01 21:02:53 +01:00
Martin Kroeker	774b9f8653	handle AppleClang in Cooperlake support condition	2021-02-01 20:18:53 +01:00
Martin Kroeker	eb1d2344f7	Fix compiler version check for Intel Cooperlake support (clang-cl does not accept -dumpversion)	2021-02-01 19:45:25 +01:00
xoviat	b60de4447a	add cortex-m platform	2021-01-19 08:57:44 -06:00
Martin Kroeker	438a8e5624	Fix placement of getarch call and spurious cpu property accumulation in DYNAMIC_ARCH builds	2020-11-07 20:26:12 +01:00
Martin Kroeker	0155cd53a3	Add -msse3 where needed for DYNAMIC_ARCH builds	2020-11-03 23:45:49 +01:00
Martin Kroeker	b9bc76aec4	Add files via upload	2020-11-02 22:43:50 +01:00
Martin Kroeker	f64243ff57	Add compiler options for sse/sse2/ssse3/sse4.1	2020-10-16 10:47:06 +02:00
Martin Kroeker	e3a29f6b58	Change "HALF" and "sh" to "BFLOAT16" and "sb"	2020-10-12 00:07:37 +02:00
Martin Kroeker	68e6823d36	Adapt for supporting only a subset of variable types	2020-10-11 15:01:32 +02:00
Martin Kroeker	e1b7123bbe	Merge pull request #2867 from Qiyu8/usimd-floatdot Optimize the performance of dot by using universal intrinsics in X86/ARM	2020-10-10 12:10:25 +02:00
Qiyu8	f32d34a015	add sse3 compiler flag	2020-10-10 10:36:15 +08:00
Martin Kroeker	a5feea6611	make BLAS3_MEM_ALLOC_THRESHOLD configurable on non-Windows	2020-10-04 23:01:06 +02:00
Martin Kroeker	c4aeeeb9f4	Activate all BUILD_ options if none was specified	2020-09-15 23:15:34 +02:00
Martin Kroeker	26792d2096	Copy BUILD_* directives to the compiler options to allow ifdef in tests	2020-09-13 21:47:55 +02:00
Martin Kroeker	68b1713c30	Merge pull request #2811 from martin-frbg/issue2806 Make NO_AVX512 option override the AVX512 compile test in CMAKE builds as well	2020-09-01 17:19:14 +02:00
Martin Kroeker	bd3207b4b4	Update system.cmake	2020-08-19 22:51:10 +02:00
Martin Kroeker	b8ebfc9335	Update system.cmake	2020-08-19 22:30:19 +02:00
Martin Kroeker	71d33c952d	Typo fix	2020-08-19 17:44:23 +02:00
Martin Kroeker	6a3c074786	-march=cooperlake requires gcc10	2020-08-19 17:22:12 +02:00
Chen, Guobing	e740c4873d	Enable COOPERLAKE build target Enable new build target platform -- COOPERLAKE. This target platform supports all the SKYLAKEX supported ISAs + avx512bf16. So all the SKYLAKEX specific kernels/drivers and related code are now extended to be also active on COOPERLAKE. Besides, new BF16 related kernels are active under this target.	2020-08-13 06:18:00 +08:00
Martin Kroeker	6876221cf3	Remove optimization level limit for flang again and add -fno-unroll-loops for AOCC flang 2.x instead	2020-06-14 17:40:24 +02:00
Martin Kroeker	3ce469a34f	Limit optimization level to O1 for flang and add -frecursive	2020-06-09 16:11:13 +02:00
Martin Kroeker	bb12c2c854	Limit MAX_STACK_ALLOC availability to non-Windows	2020-06-04 19:07:27 +02:00
Martin Kroeker	6e97df7b47	Add CMAKE support for MAX_STACK_ALLOC setting	2020-06-04 14:45:31 +02:00
Rajalakshmi Srinivasaraghavan	7eb55504b1	RFC : Add half precision gemm for bfloat16 in OpenBLAS This patch adds support for bfloat16 data type matrix multiplication kernel. For architectures that don't support bfloat16, it is defined as unsigned short (2 bytes). Default unroll sizes can be changed as per architecture as done for SGEMM and for now 8 and 4 are used for M and N. Size of ncopy/tcopy can be changed as per architecture requirement and for now, size 2 is used. Added shgemm in kernel/power/KERNEL.POWER9 and tested in powerpc64le and powerpc64. For reference, added a small test compare_sgemm_shgemm.c to compare sgemm and shgemm output. This patch does not cover OpenBLAS test, benchmark and lapack tests for shgemm. Complex type implementation can be discussed and added once this is approved.	2020-04-14 14:55:08 -05:00
Martin Kroeker	7f0d523b42	Make BUFFER_SIZE configurable	2020-02-09 23:32:57 +01:00

102 Commits