OpenBLAS

Commit Graph

Author	SHA1	Message	Date
Martin Kroeker	c4aeeeb9f4	Activate all BUILD_ options if none was specified	2020-09-15 23:15:34 +02:00
Martin Kroeker	91c84e1c01	Merge pull request #2796 from Guobing-Chen/BF16_dot_coversion_apis Add bfloat16 based dot and conversion with single/double	2020-09-14 15:00:19 +02:00
Martin Kroeker	26792d2096	Copy BUILD_* directives to the compiler options to allow ifdef in tests	2020-09-13 21:47:55 +02:00
Chen, Guobing	deaeb6c5b8	Add bfloat16 based dot and conversion with single/double 1. Added bfloat16 based dot as new API: shdot 2. Implemented generic kernel and cooperlake-specific (AVX512-BF16) kernel for shdot 3. Added 4 conversion APIs for bfloat16 data type <=> single/double: shstobf16 shdtobf16 sbf16tos dbf16tod shstobf16 -- convert single float array to bfloat16 array shdtobf16 -- convert double float array to bfloat16 array sbf16tos -- convert bfloat16 array to single float array dbf16tod -- convert bfloat16 array to double float array 4. Implemented generic kernels for all 4 conversion APIs, and cooperlake-specific kernel for shstobf16 and shdtobf16 5. Update level1 thread facilitate functions and macros to support multi-threading for these new APIs 6. Fix Cooperlake platform detection/specify issue when under dynamic-arch building 7. Change the typedef of bfloat16 from unsigned short to more strict uint16_t Signed-off-by: Chen, Guobing <guobing.chen@intel.com>	2020-09-04 02:31:25 +08:00
Martin Kroeker	68b1713c30	Merge pull request #2811 from martin-frbg/issue2806 Make NO_AVX512 option override the AVX512 compile test in CMAKE builds as well	2020-09-01 17:19:14 +02:00
Martin Kroeker	0a4c5c4c44	Merge pull request #2807 from martin-frbg/issue2804 Work around ARMV8 build-time cpu detection problems on non-Linux systems	2020-08-31 23:44:56 +02:00
Martin Kroeker	5feb087c05	Handle Apple labeling armv8 as arm64 rather than aarch64	2020-08-31 20:02:08 +02:00
Martin Kroeker	7c0977c267	Add OpenMP dependency to pkgconfig file if needed	2020-08-22 13:53:44 +02:00
Martin Kroeker	bd3207b4b4	Update system.cmake	2020-08-19 22:51:10 +02:00
Martin Kroeker	b8ebfc9335	Update system.cmake	2020-08-19 22:30:19 +02:00
Martin Kroeker	7c1986640b	fallback from cooperlake to skylake if gcc<10	2020-08-19 20:48:39 +02:00
Martin Kroeker	71d33c952d	Typo fix	2020-08-19 17:44:23 +02:00
Martin Kroeker	6a3c074786	-march=cooperlake requires gcc10	2020-08-19 17:22:12 +02:00
Martin Kroeker	430f741b30	-march=cooperlake requires gcc10	2020-08-19 17:17:53 +02:00
Chen, Guobing	e740c4873d	Enable COOPERLAKE build target Enable new build target platform -- COOPERLAKE. This target platform supports all the SKYLAKEX supported ISAs + avx512bf16. So all the SKYLAKEX specific kernels/drivers and related code are now extended to be also active on COOPERLAKE. Besides, new BF16 related kernels are active under this target.	2020-08-13 06:18:00 +08:00
Martin Kroeker	cb097beba2	Merge pull request #2741 from martin-frbg/issue2739 Adjust A53 SGEMM parameters to reflect recent switch to 8x8 kernel	2020-07-29 10:01:14 +02:00
Martin Kroeker	64e2e4aaf3	missing braces	2020-07-27 20:19:22 +00:00
Martin Kroeker	921ec4e9e2	Adjust A53 SGEMM parameters to reflect move to 8x8 kernel	2020-07-27 19:54:46 +00:00
Ashwin Sekhar T K	4e1be0e481	ARM64: Add THUNDERX3T110 Target	2020-07-26 23:32:24 -07:00
Martin Kroeker	9e21a100e3	Add trivial check for stdatomic.h	2020-07-20 22:52:09 +00:00
Martin Kroeker	9d000ecaa2	include CheckLanguage module	2020-07-16 22:36:35 +00:00
Martin Kroeker	a847d00366	handle missing lack of fortran compiler more gracefully	2020-07-16 22:17:39 +00:00
Martin Kroeker	6eaeb01263	Merge pull request #2658 from RajalakshmiSR/p10 powerpc: Add support for future processor	2020-06-23 00:02:37 +02:00
Martin Kroeker	6876221cf3	Remove optimization level limit for flang again and add -fno-unroll-loops for AOCC flang 2.x instead	2020-06-14 17:40:24 +02:00
Martin Kroeker	1dd712131e	Fix spelling of flang option -Mrecursive and add -Kieee	2020-06-14 00:09:31 +02:00
Rajalakshmi Srinivasaraghavan	9fe930f205	powerpc: Add support for future processor This is the initial patch to support build infrastructure for POWER10 architecture.	2020-06-11 15:47:20 -05:00
Martin Kroeker	3ce469a34f	Limit optimization level to O1 for flang and add -frecursive	2020-06-09 16:11:13 +02:00
Martin Kroeker	79cd69fea4	Merge pull request #2644 from martin-frbg/cmake-maxstack Add CMAKE support for MAX_STACK_ALLOC setting	2020-06-05 08:33:48 +02:00
Martin Kroeker	bb12c2c854	Limit MAX_STACK_ALLOC availability to non-Windows	2020-06-04 19:07:27 +02:00
Martin Kroeker	6e97df7b47	Add CMAKE support for MAX_STACK_ALLOC setting	2020-06-04 14:45:31 +02:00
Martin Kroeker	4db00121dc	Disable EXPRECISION and add -lm on OSX (same as the BSDs and Linux)	2020-05-31 12:39:36 +02:00
Martin Kroeker	cd10b35fe9	Handle trailing spaces and empty condition variables	2020-05-09 13:42:33 +02:00
Martin Kroeker	5dd14e3d48	Make building the bfloat16 functions conditional on option BUILD_HALF (#2590 ) * make building the bfloat16 BLAS functions conditional on BUILD_HALF * pass the BUILD_HALF option to gensymbol * Pass BUILD_HALF as a compiler define for dynamic_arch builds	2020-05-01 09:58:30 +02:00
Martin Kroeker	3bd56846bb	Silence a debug message	2020-04-27 16:27:09 +02:00
Martin Kroeker	e7bbdfdf84	Have CMAKE parse conditional lines in KERNEL files Supports ifeq and ifneq, but requires both to have an else branch	2020-04-27 15:20:03 +02:00
Martin Kroeker	70869d571f	Quote include paths for getarch to protect any embedded spaces	2020-04-24 10:30:44 +02:00
Martin Kroeker	4f70512b97	Update kernel.cmake	2020-04-19 08:10:26 +02:00
Martin Kroeker	d0737b0142	Update kernel.cmake	2020-04-18 21:36:28 +02:00
Martin Kroeker	a83a59b038	Use generic kernels for ishama,shasum,shdot,shrot	2020-04-18 15:53:51 +02:00
Martin Kroeker	0a19bd813c	Use generic codes for shamax and shcopy	2020-04-18 12:52:51 +02:00
Martin Kroeker	f361de30a3	Use generic axpy.c for SHAXPY as x86 lacks saxpy.c	2020-04-18 11:07:16 +02:00
Martin Kroeker	9f6d6f6cb6	use saxpy.c instead of axpy.S for SHAXPY	2020-04-17 22:27:58 +02:00
Rajalakshmi Srinivasaraghavan	22bb50fb81	cmake fixes	2020-04-17 13:35:17 -05:00
Rajalakshmi Srinivasaraghavan	7eb55504b1	RFC : Add half precision gemm for bfloat16 in OpenBLAS This patch adds support for bfloat16 data type matrix multiplication kernel. For architectures that don't support bfloat16, it is defined as unsigned short (2 bytes). Default unroll sizes can be changed as per architecture as done for SGEMM and for now 8 and 4 are used for M and N. Size of ncopy/tcopy can be changed as per architecture requirement and for now, size 2 is used. Added shgemm in kernel/power/KERNEL.POWER9 and tested in powerpc64le and powerpc64. For reference, added a small test compare_sgemm_shgemm.c to compare sgemm and shgemm output. This patch does not cover OpenBLAS test, benchmark and lapack tests for shgemm. Complex type implementation can be discussed and added once this is approved.	2020-04-14 14:55:08 -05:00
Martin Kroeker	a05243d0f2	ifort and pgfort need "recursive" for compiling LAPACK as well as shown in Reference-LAPACK issue 401 (their PR 403)	2020-04-01 15:38:07 +02:00
Martin Kroeker	8c7c1395da	Merge pull request #2521 from martin-frbg/cm-avx512 Use proper extension on the avx512 testcase filename	2020-03-22 01:03:42 +01:00
Martin Kroeker	1d9773b800	Use proper extension on the avx512 testcase filename The need to call it .tmp existed only when it was generated by a tmpfile call, and the "-x c" option to tell the compiler it is actually a C source is not universally supported (this broke the test with clang-cl at least)	2020-03-20 23:05:53 +01:00
Martin Kroeker	6d54c94760	Make ifort on Windows create lowercase symbols with appended underscore tentative fix for #2472	2020-03-20 01:08:10 +01:00
مهدي شينون (Mehdi Chinoune)	21f6c4b5a9	fixes #2480	2020-03-02 17:22:28 +01:00
Ali Saidi	c623a965f9	Add Neoverse-N1 core The implementation is a hybrid of the ARMV8 one with some of the improved TX2 routines along with specifying -march=v8.2-a	2020-02-29 03:22:04 +00:00

245 Commits