OpenBLAS

Commit Graph

Author	SHA1	Message	Date
Martin Kroeker	eddc65c7b7	Add POWER10 support flag (unconditionally for now)	2020-10-20 01:09:49 +02:00
Martin Kroeker	f5902ab0a1	Support cross-compiling for Apple Vortex	2020-10-18 19:10:58 +02:00
Martin Kroeker	f64243ff57	Add compiler options for sse/sse2/ssse3/sse4.1	2020-10-16 10:47:06 +02:00
Martin Kroeker	786c0a3ce8	Add sse options for use of intrinsics with older compilers	2020-10-16 10:41:53 +02:00
Martin Kroeker	756802df61	Merge pull request #2890 from martin-frbg/s-d-sum Revert special handling of Windows xNRM2 and enable C+intrinsics kern…	2020-10-14 09:02:03 +02:00
Martin Kroeker	75e3a92df6	Add express -mavx and -msse options (and fix a stray = for cooperlake)	2020-10-14 01:01:58 +02:00
Martin Kroeker	e3a29f6b58	Change "HALF" and "sh" to "BFLOAT16" and "sb"	2020-10-12 00:07:37 +02:00
Martin Kroeker	68e6823d36	Adapt for supporting only a subset of variable types	2020-10-11 15:01:32 +02:00
Martin Kroeker	88928650c4	Merge pull request #2883 from martin-frbg/issue2872 Minor CMAKE fixes	2020-10-11 10:30:33 +02:00
Martin Kroeker	82a497ec5d	restore PRESCOTT default for DYNAMIC_LIST	2020-10-11 00:43:09 +02:00
Martin Kroeker	de27e4f5fb	Stop DYNAMIC_ARCH build if the toplevel source contains a stray config_kernel.h from a gmake build This is unlikely to happen in practice, but if it does, the rogue file would get included instead of the dynamically generated version for each target_core, leading to very confusing errors like "invalid operands (undefined UND and ABS sections)" in compilation of the assembly kernels as macros like PREFETCH would remain undefined	2020-10-11 00:40:22 +02:00
Martin Kroeker	e1b7123bbe	Merge pull request #2867 from Qiyu8/usimd-floatdot Optimize the performance of dot by using universal intrinsics in X86/ARM	2020-10-10 12:10:25 +02:00
Qiyu8	f32d34a015	add sse3 compiler flag	2020-10-10 10:36:15 +08:00
Martin Kroeker	a5feea6611	make BLAS3_MEM_ALLOC_THRESHOLD configurable on non-Windows	2020-10-04 23:01:06 +02:00
Martin Kroeker	2367726578	Remove redundant status message	2020-09-30 23:28:49 +02:00
Martin Kroeker	c4aeeeb9f4	Activate all BUILD_ options if none was specified	2020-09-15 23:15:34 +02:00
Martin Kroeker	91c84e1c01	Merge pull request #2796 from Guobing-Chen/BF16_dot_coversion_apis Add bfloat16 based dot and conversion with single/double	2020-09-14 15:00:19 +02:00
Martin Kroeker	26792d2096	Copy BUILD_* directives to the compiler options to allow ifdef in tests	2020-09-13 21:47:55 +02:00
Chen, Guobing	deaeb6c5b8	Add bfloat16 based dot and conversion with single/double 1. Added bfloat16 based dot as new API: shdot 2. Implemented generic kernel and cooperlake-specific (AVX512-BF16) kernel for shdot 3. Added 4 conversion APIs for bfloat16 data type <=> single/double: shstobf16 shdtobf16 sbf16tos dbf16tod shstobf16 -- convert single float array to bfloat16 array shdtobf16 -- convert double float array to bfloat16 array sbf16tos -- convert bfloat16 array to single float array dbf16tod -- convert bfloat16 array to double float array 4. Implemented generic kernels for all 4 conversion APIs, and cooperlake-specific kernel for shstobf16 and shdtobf16 5. Update level1 thread facilitate functions and macros to support multi-threading for these new APIs 6. Fix Cooperlake platform detection/specify issue when under dynamic-arch building 7. Change the typedef of bfloat16 from unsigned short to more strict uint16_t Signed-off-by: Chen, Guobing <guobing.chen@intel.com>	2020-09-04 02:31:25 +08:00
Martin Kroeker	68b1713c30	Merge pull request #2811 from martin-frbg/issue2806 Make NO_AVX512 option override the AVX512 compile test in CMAKE builds as well	2020-09-01 17:19:14 +02:00
Martin Kroeker	0a4c5c4c44	Merge pull request #2807 from martin-frbg/issue2804 Work around ARMV8 build-time cpu detection problems on non-Linux systems	2020-08-31 23:44:56 +02:00
Martin Kroeker	5feb087c05	Handle Apple labeling armv8 as arm64 rather than aarch64	2020-08-31 20:02:08 +02:00
Martin Kroeker	7c0977c267	Add OpenMP dependency to pkgconfig file if needed	2020-08-22 13:53:44 +02:00
Martin Kroeker	bd3207b4b4	Update system.cmake	2020-08-19 22:51:10 +02:00
Martin Kroeker	b8ebfc9335	Update system.cmake	2020-08-19 22:30:19 +02:00
Martin Kroeker	7c1986640b	fallback from cooperlake to skylake if gcc<10	2020-08-19 20:48:39 +02:00
Martin Kroeker	71d33c952d	Typo fix	2020-08-19 17:44:23 +02:00
Martin Kroeker	6a3c074786	-march=cooperlake requires gcc10	2020-08-19 17:22:12 +02:00
Martin Kroeker	430f741b30	-march=cooperlake requires gcc10	2020-08-19 17:17:53 +02:00
Chen, Guobing	e740c4873d	Enable COOPERLAKE build target Enable new build target platform -- COOPERLAKE. This target platform supports all the SKYLAKEX supported ISAs + avx512bf16. So all the SKYLAKEX specific kernels/drivers and related code are now extended to be also active on COOPERLAKE. Besides, new BF16 related kernels are active under this target.	2020-08-13 06:18:00 +08:00
Martin Kroeker	cb097beba2	Merge pull request #2741 from martin-frbg/issue2739 Adjust A53 SGEMM parameters to reflect recent switch to 8x8 kernel	2020-07-29 10:01:14 +02:00
Martin Kroeker	64e2e4aaf3	missing braces	2020-07-27 20:19:22 +00:00
Martin Kroeker	921ec4e9e2	Adjust A53 SGEMM parameters to reflect move to 8x8 kernel	2020-07-27 19:54:46 +00:00
Ashwin Sekhar T K	4e1be0e481	ARM64: Add THUNDERX3T110 Target	2020-07-26 23:32:24 -07:00
Martin Kroeker	9e21a100e3	Add trivial check for stdatomic.h	2020-07-20 22:52:09 +00:00
Martin Kroeker	9d000ecaa2	include CheckLanguage module	2020-07-16 22:36:35 +00:00
Martin Kroeker	a847d00366	handle lack of fortran compiler more gracefully	2020-07-16 22:17:39 +00:00
Martin Kroeker	6eaeb01263	Merge pull request #2658 from RajalakshmiSR/p10 powerpc: Add support for future processor	2020-06-23 00:02:37 +02:00
Martin Kroeker	6876221cf3	Remove optimization level limit for flang again and add -fno-unroll-loops for AOCC flang 2.x instead	2020-06-14 17:40:24 +02:00
Martin Kroeker	1dd712131e	Fix spelling of flang option -Mrecursive and add -Kieee	2020-06-14 00:09:31 +02:00
Rajalakshmi Srinivasaraghavan	9fe930f205	powerpc: Add support for future processor This is the initial patch to support build infrastructure for POWER10 architecture.	2020-06-11 15:47:20 -05:00
Martin Kroeker	3ce469a34f	Limit optimization level to O1 for flang and add -frecursive	2020-06-09 16:11:13 +02:00
Martin Kroeker	79cd69fea4	Merge pull request #2644 from martin-frbg/cmake-maxstack Add CMAKE support for MAX_STACK_ALLOC setting	2020-06-05 08:33:48 +02:00
Martin Kroeker	bb12c2c854	Limit MAX_STACK_ALLOC availability to non-Windows	2020-06-04 19:07:27 +02:00
Martin Kroeker	6e97df7b47	Add CMAKE support for MAX_STACK_ALLOC setting	2020-06-04 14:45:31 +02:00
Martin Kroeker	4db00121dc	Disable EXPRECISION and add -lm on OSX (same as the BSDs and Linux)	2020-05-31 12:39:36 +02:00
Martin Kroeker	cd10b35fe9	Handle trailing spaces and empty condition variables	2020-05-09 13:42:33 +02:00
Martin Kroeker	5dd14e3d48	Make building the bfloat16 functions conditional on option BUILD_HALF (#2590) * make building the bfloat16 BLAS functions conditional on BUILD_HALF * pass the BUILD_HALF option to gensymbol * Pass BUILD_HALF as a compiler define for dynamic_arch builds	2020-05-01 09:58:30 +02:00
Martin Kroeker	3bd56846bb	Silence a debug message	2020-04-27 16:27:09 +02:00
Martin Kroeker	e7bbdfdf84	Have CMAKE parse conditional lines in KERNEL files Supports ifeq and ifneq, but requires both to have an else branch	2020-04-27 15:20:03 +02:00
Martin Kroeker	70869d571f	Quote include paths for getarch to protect any embedded spaces	2020-04-24 10:30:44 +02:00
Martin Kroeker	4f70512b97	Update kernel.cmake	2020-04-19 08:10:26 +02:00
Martin Kroeker	d0737b0142	Update kernel.cmake	2020-04-18 21:36:28 +02:00
Martin Kroeker	a83a59b038	Use generic kernels for ishama,shasum,shdot,shrot	2020-04-18 15:53:51 +02:00
Martin Kroeker	0a19bd813c	Use generic codes for shamax and shcopy	2020-04-18 12:52:51 +02:00
Martin Kroeker	f361de30a3	Use generic axpy.c for SHAXPY as x86 lacks saxpy.c	2020-04-18 11:07:16 +02:00
Martin Kroeker	9f6d6f6cb6	use saxpy.c instead of axpy.S for SHAXPY	2020-04-17 22:27:58 +02:00
Rajalakshmi Srinivasaraghavan	22bb50fb81	cmake fixes	2020-04-17 13:35:17 -05:00
Rajalakshmi Srinivasaraghavan	7eb55504b1	RFC : Add half precision gemm for bfloat16 in OpenBLAS This patch adds support for bfloat16 data type matrix multiplication kernel. For architectures that don't support bfloat16, it is defined as unsigned short (2 bytes). Default unroll sizes can be changed as per architecture as done for SGEMM and for now 8 and 4 are used for M and N. Size of ncopy/tcopy can be changed as per architecture requirement and for now, size 2 is used. Added shgemm in kernel/power/KERNEL.POWER9 and tested in powerpc64le and powerpc64. For reference, added a small test compare_sgemm_shgemm.c to compare sgemm and shgemm output. This patch does not cover OpenBLAS test, benchmark and lapack tests for shgemm. Complex type implementation can be discussed and added once this is approved.	2020-04-14 14:55:08 -05:00
Martin Kroeker	a05243d0f2	ifort and pgfort need "recursive" for compiling LAPACK as well as shown in Reference-LAPACK issue 401 (their PR 403)	2020-04-01 15:38:07 +02:00
Martin Kroeker	8c7c1395da	Merge pull request #2521 from martin-frbg/cm-avx512 Use proper extension on the avx512 testcase filename	2020-03-22 01:03:42 +01:00
Martin Kroeker	1d9773b800	Use proper extension on the avx512 testcase filename The need to call it .tmp existed only when it was generated by a tmpfile call, and the "-x c" option to tell the compiler it is actually a C source is not universally supported (this broke the test with clang-cl at least)	2020-03-20 23:05:53 +01:00
Martin Kroeker	6d54c94760	Make ifort on Windows create lowercase symbols with appended underscore tentative fix for #2472	2020-03-20 01:08:10 +01:00
مهدي شينون (Mehdi Chinoune)	21f6c4b5a9	fixes #2480	2020-03-02 17:22:28 +01:00
Ali Saidi	c623a965f9	Add Neoverse-N1 core The implementation is a hybrid of the ARMV8 one with some of the improved TX2 routines along with specifying -march=v8.2-a	2020-02-29 03:22:04 +00:00
Martin Kroeker	ca4f7dceff	Add parameters for EMAG8180 DYNAMIC_ARCH support with cmake	2020-02-24 20:23:18 +01:00
Martin Kroeker	1ddf9f1067	Add EMAG8180 to arm64 DYNAMIC_ARCH list for cmake	2020-02-24 20:16:18 +01:00
Martin Kroeker	7f0d523b42	Make BUFFER_SIZE configurable	2020-02-09 23:32:57 +01:00
Martin Kroeker	8dc9fd4dfe	Add -march option for AVX512	2020-01-30 12:41:18 +01:00
Martin Kroeker	375b1875c8	[WIP] Update LAPACK to 3.9.0 (#2353) * Update make.inc entries for LAPACK 3.9.0 Reference-LAPACK PR 347 changed some variable names and relative paths * Update LAPACK to 3.9.0 * Add new functions from LAPACK 3.9.0 * Add new functions from LAPACK 3.9.0 * Restore LOADER command as it makes it easier to specify pthread as needed * Restore LOADER * Restore EIG/LIN prefixes in cmdbase * add binary path to lapack_testing.py call * Restore OpenMP version check * Restore OpenMP version check * Restore fix for out-of-bounds array accesses from #2096	2020-01-01 13:18:53 +01:00
Martin Kroeker	a4896b5538	Update DYNAMIC_ARCH support for ARM64 and PPC (#2332) * Update DYNAMIC_ARCH list of ARM64 targets for gmake * Update arm64 cpu list for runtime detection * Update DYNAMIC_ARCH list of ARM64 targets for cmake and add POWERPC targets	2019-12-04 11:06:03 +01:00
k.dunikowski	8691825944	Fixed a minor cmake problem, occurring when DYNAMIC_CORE=ON and CMAKE_C_FLAGS was empty	2019-10-28 08:51:05 +01:00
Martin Kroeker	eb45eb6942	Fix C compiler handling and BINARY=32 mode in CMAKE builds (#2248) * Fix compiler identification and option setting * Handle BINARY=32 option on X86_64 * Add xGEMM3M unroll parameters for crossbuild-target CORE2 * Replace bogus mingw64/32bit CI job with actual 32bit build mingw64 is not multilib-capable, so using an x86_64-mingw with BINARY=32 in the CI was not going to work anyway (but build passed while BINARY=32 was ignored).	2019-09-10 08:27:06 +02:00
Martin Kroeker	fde8a8e6a0	Improve cmake build behaviour with non-host cpu targets (#2246) 1. Supply appropriate values for C/Z GEMM unroll when cross-compiling for CORE2 or ARMV7 2. Add the required xLOCAL_BUFFER_SIZE parameters for cross-compiling CORE2 3. Add -DFORCE_<target> option to getarch when building with -DTARGET=target for #2245	2019-09-03 22:41:17 +02:00
Martin Kroeker	1fec0570f6	Add cgemm and zgemm unroll factors for core2	2019-09-02 15:03:45 +02:00
Martin Kroeker	bf0d92a310	Add arch data for cross-compiling to CORE2 for #2235	2019-08-28 17:35:56 +02:00
Martin Kroeker	e3d846ab57	Do not use -march=native with the PGI compiler	2019-08-16 08:58:10 +02:00
Tyler Reddy	3f6ab1582a	MAINT: remove legacy CMake endif() * clean up a case where CMake endif() contained the conditional used in the if(), which is no longer needed / discouraged since our minimum required CMake version supports the modern syntax	2019-07-22 21:24:57 -06:00
Martin Kroeker	8fb76134bc	Mingw32 needs leading underscore on object names (also copy BUNDERSCORE settings for FORTRAN from the corresponding Makefile)	2019-07-06 15:07:15 +02:00
Martin Kroeker	04d671aae2	Make disabling DYNAMIC_ARCH on unsupported systems work needs to be unset in the cache for the change to have any effect	2019-07-06 15:05:04 +02:00
Martin Kroeker	f69a0be712	Add getarch flags to disable AVX on x86 (and other small fixes to match Makefile behaviour)	2019-07-06 15:02:39 +02:00
Martin Kroeker	ece0bfb881	Merge pull request #2158 from martin-frbg/issue2143 Remove any inadvertent use of -march=native from DYNAMIC_ARCH builds	2019-06-10 14:08:11 +02:00
Martin Kroeker	1f4b6a5d5d	Remove any inadvertent use of -march=native from DYNAMIC_ARCH builds from #2143, -march=native precludes use of more specific options like -march=skylake-avx512 in individual kernels, and defeats the purpose of dynamic arch anyway.	2019-06-10 09:50:13 +02:00
Martin Kroeker	be8f70d269	Merge pull request #2157 from martin-frbg/2154-2 Add gfortran workaround for potential ABI violation	2019-06-09 12:19:08 +02:00
Martin Kroeker	e674e1c735	Update fc.cmake	2019-06-09 09:31:13 +02:00
Martin Kroeker	6ca898b63b	Add gfortran workaround for potential ABI violation for #2154	2019-06-08 23:17:03 +02:00
Michael Lass	7a9a4dbc4f	Fix detection of AVX512 capable compilers in getarch `21eda8b5` introduced a check in getarch.c to test if the compiler is capable of AVX512. This check currently fails, since the used __AVX2__ macro is only defined if getarch itself was compiled with AVX2/AVX512 support. Make sure this is the case by building getarch with -march=native on x86_64. It is only supposed to run on the build host anyway.	2019-06-05 17:30:56 +02:00
Martin Kroeker	1e52572be3	Add option USE_LOCKING for single-threaded build with locking support	2019-05-15 23:19:30 +02:00
luz.paz	daf2fec12d	Misc. typo fixes Found via `codespell -q 3 -w -L ith,als,dum,nd,amin,nto,wis,ba -S ./relapack,./kernel,./lapack-netlib`	2019-04-29 17:03:56 -04:00
Martin Kroeker	ccfb7ead15	Merge pull request #2072 from martin-frbg/sum Add (C)BLAS extension ?sum	2019-04-23 20:11:36 +02:00
Martin Kroeker	e06b8438b4	Merge pull request #2080 from martin-frbg/issue2075 Add -lm and disable EXPRECISION support on *BSD	2019-04-02 21:40:58 +02:00
Martin Kroeker	9229d6859b	Add -lm and disable EXPRECISION support on *BSD fixes #2075	2019-04-02 09:38:18 +02:00
Martin Kroeker	d17da6c6a4	Add cmake defaults for ?sum kernels	2019-03-31 11:57:01 +02:00
Martin Kroeker	1679de5e59	Detect 32bit environment on 64bit ARM hardware for #2056, using same approach as #2058	2019-03-31 10:50:43 +02:00
Sacha	c3e30b2bc2	Change 64-bit detection as explained in #2056	2019-03-13 23:21:54 +10:00
Martin Kroeker	fd34820b99	Fix AVX512 test always returning false due to missing compiler option	2019-02-25 17:58:31 +01:00
Martin Kroeker	5952e586ce	Support DYNAMIC_LIST option in cmake e.g. cmake -DDYNAMIC_ARCH=1 -DDYNAMIC_LIST="NEHALEM;HASWELL;ZEN" .. original issue was #1639	2019-02-05 23:51:40 +01:00
Martin Kroeker	58dd7e4501	Change ARMV8 target to ARMV7 for BINARY=32	2019-01-26 17:52:33 +01:00
Martin Kroeker	802f0dbde1	More fixes for cross-compiling ARM64 targets Fixed core naming for DYNAMIC_ARCH. Corrected GEMM_DEFAULT entries and added SYMV_P. Replaced outdated VULCAN define for ThunderX2T99 with ARMV8 to get basic definitions back. For issue #1908	2019-01-03 22:17:31 +01:00
Martin Kroeker	20d1aad13f	Fix missing quotes around thunderx targets	2019-01-02 20:15:35 +01:00
Martin Kroeker	e1eab96502	Merge pull request #1931 from martin-frbg/pr1921 Add -mavx2 to TARGET=HASWELL builds	2018-12-23 23:15:54 +01:00
Martin Kroeker	76b4b8980f	Use -dumpversion with gcc only	2018-12-23 19:08:19 +01:00
Martin Kroeker	49e0f485da	Add -mavx2 for TARGET=HASWELL if compiler supports and requires it	2018-12-23 17:26:09 +01:00
Martin Kroeker	26a3402773	Reflect ARMV8 target definition changes from PR1876 and create config target directory for cross-compiles.	2018-12-23 12:26:01 +01:00
Martin Kroeker	133c278ee5	Add DYNAMIC_CORE list for ARM64 cf #1908	2018-12-07 17:42:23 +01:00
Martin Kroeker	dceff5542c	Handle Android environments that identify as Linux (#1898) * Handle Android environments that identify as Linux termux terminal emulator does this, causing build failures through missed defines in common.h	2018-12-01 20:56:11 +01:00
Martin Kroeker	081ceb3e02	Propagate version number for openblas_get_config	2018-11-29 00:12:04 +01:00
Andrew	40cce0e353	handle cmake too	2018-11-06 09:45:49 +00:00
Martin Kroeker	2263d3906c	Merge pull request #1812 from martin-frbg/issue1806-2 Use KERNEL_DEFINITIONS rather than COMMON_OPTS to pass -march=skylake…	2018-10-11 21:51:31 +02:00
Martin Kroeker	81c9985c3a	Use KERNEL_DEFINITIONS rather than COMMON_OPTS to pass -march=skylake-avx512	2018-10-11 11:03:27 +02:00
Martin Kroeker	56ebc7b53e	Merge pull request #1808 from martin-frbg/issue1806 Add -march=skylake-avx512 to CFLAGS when the target is Skylake	2018-10-11 07:48:08 +02:00
Martin Kroeker	8a11ec19d1	Syntax fix	2018-10-10 23:47:35 +02:00
Martin Kroeker	fa53b903db	Add -march=skylake-avx512 to CFLAGS when the target is Skylake Should fix #1806 and #1801	2018-10-10 19:22:01 +02:00
Martin Kroeker	84bcdf9c66	Revert "Add -march=skylake-avx512 when required"	2018-10-10 19:15:32 +02:00
Martin Kroeker	a9b51b8448	Merge pull request #1798 from martin-frbg/cmake-avx512 Add -march=skylake-avx512 when required	2018-10-08 21:15:17 +02:00
Martin Kroeker	eba394c711	Add -march=skylake-avx512 when required fixes #1797	2018-10-08 19:18:12 +02:00
Martin Kroeker	02ef20a1e4	Merge pull request #1786 from martin-frbg/immintrin Check for immintrin.h presence in the AVX512 compatibility test as well	2018-10-04 09:07:09 +02:00
Martin Kroeker	4c3643ed7f	Check availability of immintrin.h in the AVX512 compatibility test	2018-10-04 07:36:49 +02:00
Yuri	2349e15149	Allow to install the 'interface64' version concurrently with the regular version	2018-09-15 21:00:03 -07:00
Martin Kroeker	b1b743f434	Merge branch 'develop' into interim033	2018-08-25 19:45:19 +02:00
Martin Kroeker	2a589c4b28	Add USE_TLS option to switch between old and new memory.c	2018-08-25 19:36:12 +02:00
Martin Kroeker	25f2d25cfe	Merge pull request #1697 from martin-frbg/issue1696 Do not treat Windows UWP builds as cross-compiling	2018-07-25 19:55:29 +02:00
Martin Kroeker	73131fa30a	Do not treat Windows UWP builds as cross-compiling	2018-07-24 17:46:33 +02:00
Martin Kroeker	b74aef2816	Add -march=skylake-avx512 to AVX512 compile check and suppress its output	2018-07-03 14:41:44 +02:00
Martin Kroeker	26e1cfb653	Merge pull request #1607 from martin-frbg/dynarch Move some x86_64 DYNAMIC_ARCH targets to new DYNAMIC_OLDER option	2018-06-14 16:52:55 +02:00
Martin Kroeker	02634b549b	Add template for OpenBLASConfig.cmake	2018-06-10 09:25:46 +02:00
Martin Kroeker	1cbd8f3ae4	Move some DYNAMIC_ARCH targets to new DYNAMIC_OLDER option	2018-06-09 16:30:46 +02:00
Martin Kroeker	cf234a0561	Merge pull request #1589 from fenrus75/skylakex Initial support for SkylakeX / AVX512	2018-06-06 22:07:09 +02:00
Martin Kroeker	e4718b1fee	Better AVX512 test case	2018-06-06 16:51:30 +02:00
Martin Kroeker	7fb62aed7e	Check build system support for AVX512 instructions	2018-06-05 23:29:33 +02:00
Arjan van de Ven	99c7bba8e4	Initial support for SkylakeX / AVX512 This patch adds the basic infrastructure for adding the SkylakeX (Intel Skylake server) target. The SkylakeX target will use the AVX512 (AVX512VL level) instruction set, which brings 2 basic things: 1) 512 bit wide SIMD (2x width of AVX2) 2) 32 SIMD registers (2x the number on AVX2) This initial patch only contains a trivial transformation of the Haswell SGEMM kernel to AVX512VL; more will follow later but this patch aims to get the infrastructure in place for this "later". Full performance tuning has not been done yet; with more registers and wider SIMD it's in theory possible to retune the kernels but even without that there's an interesting enough performance increase (30-40% range) with just this change.	2018-06-03 07:58:52 +00:00
Martin Kroeker	6791294312	Merge pull request #1559 from martin-frbg/buildconf Add build-time configuration options to pkgconfig file	2018-05-14 18:49:53 +02:00
Martin Kroeker	7d7564568c	Add build-time configuration options to pkgconfig file	2018-05-14 00:09:35 +02:00
Zhiyong Dang	1b83341d19	Fix race condition in blas_server_omp.c Change-Id: Ic896276cd073d6b41930c7c5a29d66348cd1725d	2018-04-27 17:00:42 +08:00
Sacha	f81815e48a	Fix CMake cross-compiling Without specifying thread count, NUM_THREADS would not be defined and CMake would fail. This is because core count cannot be determined when cross-compiling.	2018-02-28 10:25:25 +10:00
xoviat	038bfbb86c	CMake: Remove unused wall option when FC=flang	2018-01-26 14:09:48 -06:00
Martin Kroeker	599de9e598	Restore LAPACKE files for Xgeqpf, Xggsvd and Xggsvp These were inadvertently dropped from the list in my PR #1095	2017-12-21 19:43:09 +01:00
Martin Kroeker	0dc291d3fa	Merge pull request #1377 from isuruf/threads Allow overriding NUM_THREADS in cmake	2017-12-01 16:22:35 +01:00
Isuru Fernando	e0ddd7d124	Allow overriding NUM_THREADS	2017-12-01 01:42:45 -06:00
martin	5056a044b2	fix location of lapacke_nancheck	2017-11-24 09:15:20 +01:00
martin	4054d32def	update cmake files	2017-11-24 08:15:40 +01:00
martin	2d52f0f4c3	update cmakefiles for lapack 3.8.0	2017-11-23 21:22:01 +01:00
Ian Henriksen	505dc08635	Update lapacke.cmake with routines added in LAPACK 3.7.0.	2017-11-06 14:43:33 -06:00
Ian Henriksen	61587b0670	Update lapack.cmake with additional routines from LAPACK version 3.7.0.	2017-11-06 14:41:02 -06:00
Ian Henriksen	632fc75d77	Allow using compilers other than gfortran in conjunction with MSVC or clang-cl.	2017-11-06 14:39:12 -06:00
Martin Kroeker	962b20a9bb	Optionally add ReLAPACK to LIB_COMPONENTS	2017-10-12 17:02:01 +02:00
Martin Kroeker	c7a8512d12	Cmake fixes for DYNAMIC_ARCH builds and whitespace in path names (#1323) * prebuild.cmake: Put quotes around path names that may contain whitespace (Copied from alexkaratakis' PR #1295) * kernel/CMakeLists.txt: Fix common_lapack header inclusion and DYNAMIC_ARCH generation of ?neg_tcopy and ?laswp_ncopy files * lapack/CMakeLists.txt: Use correct template for ?laswp_(plus,minus) functions	2017-10-09 23:34:18 +02:00
Sacha	7a867082d8	Fix open_blas.config which was never working out-of-source. Remove need for gen_config_h.exe. If OpenMP is requested, do not silently ignore when it isn't available.	2017-08-23 11:16:24 +10:00
Sacha Refshauge	47ebce4d1a	Clean up, fix old typos. Simplify arch usages. Move system arch check to earlier position.	2017-08-21 00:37:29 +10:00
Sacha Refshauge	69b560751c	Improvements to previous commit (cross-compile). Fix typos and bad if statements discovered in 0.2.20.	2017-08-20 22:50:31 +10:00

360 Commits