360 Commits

Author SHA1 Message Date
Martin Kroeker eddc65c7b7
Add POWER10 support flag (unconditionally for now) 2020-10-20 01:09:49 +02:00
Martin Kroeker f5902ab0a1
Support cross-compiling for Apple Vortex 2020-10-18 19:10:58 +02:00
Martin Kroeker f64243ff57
Add compiler options for sse/sse2/ssse3/sse4.1 2020-10-16 10:47:06 +02:00
Martin Kroeker 786c0a3ce8
Add sse options for use of intrinsics with older compilers 2020-10-16 10:41:53 +02:00
Martin Kroeker 756802df61
Merge pull request #2890 from martin-frbg/s-d-sum
Revert special handling of Windows xNRM2 and enable C+intrinsics kern…
2020-10-14 09:02:03 +02:00
Martin Kroeker 75e3a92df6
Add express -mavx and -msse options (and fix a stray = for cooperlake) 2020-10-14 01:01:58 +02:00
Martin Kroeker e3a29f6b58
Change "HALF" and "sh" to "BFLOAT16" and "sb" 2020-10-12 00:07:37 +02:00
Martin Kroeker 68e6823d36
Adapt for supporting only a subset of variable types 2020-10-11 15:01:32 +02:00
Martin Kroeker 88928650c4
Merge pull request #2883 from martin-frbg/issue2872
Minor CMAKE fixes
2020-10-11 10:30:33 +02:00
Martin Kroeker 82a497ec5d
restore PRESCOTT default for DYNAMIC_LIST 2020-10-11 00:43:09 +02:00
Martin Kroeker de27e4f5fb
Stop DYNAMIC_ARCH build if the toplevel source contains a stray config_kernel.h from a gmake build
This is unlikely to happen in practice, but if it does, the rogue file would get included instead of the dynamically generated version for each target_core. That leads to very confusing errors like "invalid operands (undefined UND and ABS sections)" when compiling the assembly kernels, because macros like PREFETCH would remain undefined.
2020-10-11 00:40:22 +02:00
Martin Kroeker e1b7123bbe
Merge pull request #2867 from Qiyu8/usimd-floatdot
Optimize the performance of dot by using universal intrinsics in X86/ARM
2020-10-10 12:10:25 +02:00
Qiyu8 f32d34a015 add sse3 compiler flag 2020-10-10 10:36:15 +08:00
Martin Kroeker a5feea6611
make BLAS3_MEM_ALLOC_THRESHOLD configurable on non-Windows 2020-10-04 23:01:06 +02:00
Martin Kroeker 2367726578
Remove redundant status message 2020-09-30 23:28:49 +02:00
Martin Kroeker c4aeeeb9f4
Activate all BUILD_ options if none was specified 2020-09-15 23:15:34 +02:00
Martin Kroeker 91c84e1c01
Merge pull request #2796 from Guobing-Chen/BF16_dot_coversion_apis
Add bfloat16 based dot and conversion with single/double
2020-09-14 15:00:19 +02:00
Martin Kroeker 26792d2096
Copy BUILD_* directives to the compiler options to allow ifdef in tests 2020-09-13 21:47:55 +02:00
Chen, Guobing deaeb6c5b8 Add bfloat16 based dot and conversion with single/double
1. Added bfloat16 based dot as new API: shdot
2. Implemented generic kernel and cooperlake-specific (AVX512-BF16) kernel for shdot
3. Added 4 conversion APIs for bfloat16 data type <=> single/double: shstobf16 shdtobf16 sbf16tos dbf16tod
     shstobf16 -- convert single float array to bfloat16 array
     shdtobf16 -- convert double float array to bfloat16 array
     sbf16tos  -- convert bfloat16 array to single float array
     dbf16tod  -- convert bfloat16 array to double float array
4. Implemented generic kernels for all 4 conversion APIs, and cooperlake-specific kernel for shstobf16 and shdtobf16
5. Update level1 threading helper functions and macros to support multi-threading for these new APIs
6. Fix Cooperlake platform detection/specification issue in dynamic-arch builds
7. Change the typedef of bfloat16 from unsigned short to more strict uint16_t

Signed-off-by: Chen, Guobing <guobing.chen@intel.com>
2020-09-04 02:31:25 +08:00
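Note on the conversion APIs listed in the commit above: the following is a minimal Python sketch of what a single-float to bfloat16 conversion and its inverse can look like at the bit level. It is not the OpenBLAS kernel code. The function names `float_to_bf16_bits` and `bf16_bits_to_float` are hypothetical, and round-to-nearest-even is an assumption, since the commit message does not state the rounding mode. The only fact taken from the log is that bfloat16 is stored as a 16-bit unsigned integer.

```python
import struct


def float_to_bf16_bits(x: float) -> int:
    """Return a 16-bit bfloat16 pattern for x (assumed round-to-nearest-even)."""
    # Reinterpret the value as an IEEE-754 single-precision bit pattern.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    is_nan = (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF) != 0
    if is_nan:
        # Force a mantissa bit so the truncated value is still a NaN.
        return ((bits >> 16) | 0x0040) & 0xFFFF
    # Add 0x7FFF plus the lowest kept bit, then drop the low 16 bits.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_to_float(h: int) -> float:
    """Widen a 16-bit bfloat16 pattern back to a float (exact, no rounding)."""
    return struct.unpack("<f", struct.pack("<I", (h & 0xFFFF) << 16))[0]


if __name__ == "__main__":
    for value in (1.0, -2.0, 0.1):
        h = float_to_bf16_bits(value)
        print(value, hex(h), bf16_bits_to_float(h))
```

bfloat16 keeps the sign bit, all 8 exponent bits, and the top 7 mantissa bits of a single-precision float, so widening is a plain shift and only the narrowing direction needs a rounding decision.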
Martin Kroeker 68b1713c30
Merge pull request #2811 from martin-frbg/issue2806
Make NO_AVX512 option override the AVX512 compile test in CMAKE builds as well
2020-09-01 17:19:14 +02:00
Martin Kroeker 0a4c5c4c44
Merge pull request #2807 from martin-frbg/issue2804
Work around ARMV8 build-time cpu detection problems on non-Linux systems
2020-08-31 23:44:56 +02:00
Martin Kroeker 5feb087c05
Handle Apple labeling armv8 as arm64 rather than aarch64 2020-08-31 20:02:08 +02:00
Martin Kroeker 7c0977c267
Add OpenMP dependency to pkgconfig file if needed 2020-08-22 13:53:44 +02:00
Martin Kroeker bd3207b4b4
Update system.cmake 2020-08-19 22:51:10 +02:00
Martin Kroeker b8ebfc9335
Update system.cmake 2020-08-19 22:30:19 +02:00
Martin Kroeker 7c1986640b
fallback from cooperlake to skylake if gcc<10 2020-08-19 20:48:39 +02:00
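Note on the fallback described in the commit above: a minimal Python sketch of the version check, assuming the fallback means choosing a Skylake-level `-march` flag when gcc is older than 10. The helper name `march_for_cooperlake` is hypothetical, and the real logic lives in the CMake build files, which are not shown in this log. `-march=cooperlake` and `-march=skylake-avx512` are existing gcc options.

```python
def march_for_cooperlake(gcc_version: str) -> str:
    """Pick a -march flag for a COOPERLAKE target from a gcc version string."""
    # Only the major version matters here: per the log, -march=cooperlake
    # requires gcc 10.
    major = int(gcc_version.split(".")[0])
    if major >= 10:
        return "-march=cooperlake"
    # Assumed fallback: build as Skylake with AVX512 on older compilers.
    return "-march=skylake-avx512"


if __name__ == "__main__":
    for version in ("9.3.0", "10.2.0"):
        print(version, march_for_cooperlake(version))
```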
Martin Kroeker 71d33c952d
Typo fix 2020-08-19 17:44:23 +02:00
Martin Kroeker 6a3c074786
-march=cooperlake requires gcc10 2020-08-19 17:22:12 +02:00
Martin Kroeker 430f741b30
-march=cooperlake requires gcc10 2020-08-19 17:17:53 +02:00
Chen, Guobing e740c4873d Enable COOPERLAKE build target
Enable new build target platform -- COOPERLAKE. This target platform
supports all the SKYLAKEX supported ISAs + avx512bf16. So all the
SKYLAKEX specific kernels/drivers and related code are now extended
to be also active on COOPERLAKE. Besides, new BF16 related kernels
are active under this target.
2020-08-13 06:18:00 +08:00
Martin Kroeker cb097beba2
Merge pull request #2741 from martin-frbg/issue2739
Adjust A53 SGEMM parameters to reflect recent switch to 8x8 kernel
2020-07-29 10:01:14 +02:00
Martin Kroeker 64e2e4aaf3
missing braces 2020-07-27 20:19:22 +00:00
Martin Kroeker 921ec4e9e2
Adjust A53 SGEMM parameters to reflect move to 8x8 kernel 2020-07-27 19:54:46 +00:00
Ashwin Sekhar T K 4e1be0e481 ARM64: Add THUNDERX3T110 Target 2020-07-26 23:32:24 -07:00
Martin Kroeker 9e21a100e3
Add trivial check for stdatomic.h 2020-07-20 22:52:09 +00:00
Martin Kroeker 9d000ecaa2
include CheckLanguage module 2020-07-16 22:36:35 +00:00
Martin Kroeker a847d00366
Handle lack of Fortran compiler more gracefully 2020-07-16 22:17:39 +00:00
Martin Kroeker 6eaeb01263
Merge pull request #2658 from RajalakshmiSR/p10
powerpc: Add support for future processor
2020-06-23 00:02:37 +02:00
Martin Kroeker 6876221cf3
Remove optimization level limit for flang again and add -fno-unroll-loops for AOCC flang 2.x instead 2020-06-14 17:40:24 +02:00
Martin Kroeker 1dd712131e
Fix spelling of flang option -Mrecursive and add -Kieee 2020-06-14 00:09:31 +02:00
Rajalakshmi Srinivasaraghavan 9fe930f205 powerpc: Add support for future processor
This is the initial patch to support build infrastructure
for POWER10 architecture.
2020-06-11 15:47:20 -05:00
Martin Kroeker 3ce469a34f
Limit optimization level to O1 for flang and add -frecursive 2020-06-09 16:11:13 +02:00
Martin Kroeker 79cd69fea4
Merge pull request #2644 from martin-frbg/cmake-maxstack
Add CMAKE support for MAX_STACK_ALLOC setting
2020-06-05 08:33:48 +02:00
Martin Kroeker bb12c2c854
Limit MAX_STACK_ALLOC availability to non-Windows 2020-06-04 19:07:27 +02:00
Martin Kroeker 6e97df7b47
Add CMAKE support for MAX_STACK_ALLOC setting 2020-06-04 14:45:31 +02:00
Martin Kroeker 4db00121dc
Disable EXPRECISION and add -lm on OSX (same as the BSDs and Linux) 2020-05-31 12:39:36 +02:00
Martin Kroeker cd10b35fe9
Handle trailing spaces and empty condition variables 2020-05-09 13:42:33 +02:00
Martin Kroeker 5dd14e3d48
Make building the bfloat16 functions conditional on option BUILD_HALF (#2590)
* make building the bfloat16 BLAS functions conditional on BUILD_HALF

* pass the BUILD_HALF option to gensymbol

* Pass BUILD_HALF as a compiler define for dynamic_arch builds
2020-05-01 09:58:30 +02:00
Martin Kroeker 3bd56846bb
Silence a debug message 2020-04-27 16:27:09 +02:00
Martin Kroeker e7bbdfdf84
Have CMAKE parse conditional lines in KERNEL files
Supports ifeq and ifneq, but requires both to have an else branch
2020-04-27 15:20:03 +02:00
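Note on the conditional parsing described in the commit above: a minimal Python sketch of how Makefile-style `ifeq`/`ifneq` blocks with an `else` branch can be evaluated while reading variable assignments. This is an illustration only, not the CMake code from the commit. The function name `parse_kernel_lines`, the sample file names, and the lack of nesting support are all assumptions.

```python
import re

# Matches lines such as: ifeq ($(DYNAMIC_ARCH), 1)
_COND = re.compile(r"^(ifeq|ifneq)\s*\(\s*([^,]*?)\s*,\s*(.*?)\s*\)\s*$")
_VAR = re.compile(r"\$\((\w+)\)")


def parse_kernel_lines(lines, variables):
    """Return {name: value} for assignments found on the taken branches."""

    def expand(text):
        # Replace $(NAME) with its value; unknown names expand to "".
        return _VAR.sub(lambda m: variables.get(m.group(1), ""), text)

    result = {}
    active = True      # True while reading a branch that should be applied
    in_block = False   # True between ifeq/ifneq and endif (no nesting)
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        match = _COND.match(line)
        if match:
            equal = expand(match.group(2)) == expand(match.group(3))
            active = equal if match.group(1) == "ifeq" else not equal
            in_block = True
        elif line == "else" and in_block:
            active = not active
        elif line == "endif":
            active, in_block = True, False
        elif active and "=" in line:
            name, value = line.split("=", 1)
            result[name.strip()] = value.strip()
    return result


if __name__ == "__main__":
    sample = [
        "SGEMMKERNEL = sgemm_generic.c",   # hypothetical file names
        "ifeq ($(DYNAMIC_ARCH), 1)",
        "SDOTKERNEL = dot_generic.c",
        "else",
        "SDOTKERNEL = dot_tuned.S",
        "endif",
    ]
    print(parse_kernel_lines(sample, {"DYNAMIC_ARCH": "1"}))
```

Flipping a single flag at `else` is enough when every conditional has exactly one `else` branch and conditionals are never nested, which is the simplification this sketch makes.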
Martin Kroeker 70869d571f
Quote include paths for getarch to protect any embedded spaces 2020-04-24 10:30:44 +02:00
Martin Kroeker 4f70512b97
Update kernel.cmake 2020-04-19 08:10:26 +02:00
Martin Kroeker d0737b0142
Update kernel.cmake 2020-04-18 21:36:28 +02:00
Martin Kroeker a83a59b038
Use generic kernels for ishama,shasum,shdot,shrot 2020-04-18 15:53:51 +02:00
Martin Kroeker 0a19bd813c
Use generic codes for shamax and shcopy 2020-04-18 12:52:51 +02:00
Martin Kroeker f361de30a3
Use generic axpy.c for SHAXPY as x86 lacks saxpy.c 2020-04-18 11:07:16 +02:00
Martin Kroeker 9f6d6f6cb6
use saxpy.c instead of axpy.S for SHAXPY 2020-04-17 22:27:58 +02:00
Rajalakshmi Srinivasaraghavan 22bb50fb81 cmake fixes 2020-04-17 13:35:17 -05:00
Rajalakshmi Srinivasaraghavan 7eb55504b1 RFC : Add half precision gemm for bfloat16 in OpenBLAS
This patch adds support for bfloat16 data type matrix multiplication kernel.
For architectures that don't support bfloat16, it is defined as unsigned short
(2 bytes).  Default unroll sizes can be changed as per architecture as done for
SGEMM and for now 8 and 4 are used for M and N.  Size of ncopy/tcopy can be
changed as per architecture requirement and for now, size 2 is used.

Added shgemm in kernel/power/KERNEL.POWER9 and tested in powerpc64le and
powerpc64.  For reference, added a small test compare_sgemm_shgemm.c to compare
sgemm and shgemm output.

This patch does not cover OpenBLAS test, benchmark and lapack tests for shgemm.
Complex type implementation can be discussed and added once this is approved.
2020-04-14 14:55:08 -05:00
Martin Kroeker a05243d0f2
ifort and pgfort need "recursive" for compiling LAPACK as well
as shown in Reference-LAPACK issue 401 (their PR 403)
2020-04-01 15:38:07 +02:00
Martin Kroeker 8c7c1395da
Merge pull request #2521 from martin-frbg/cm-avx512
Use proper extension on the avx512 testcase filename
2020-03-22 01:03:42 +01:00
Martin Kroeker 1d9773b800
Use proper extension on the avx512 testcase filename
The need to call it .tmp existed only when it was generated by a tmpfile call, and the "-x c" option to tell the compiler it is actually a C source is not universally supported (this broke the test with clang-cl at least)
2020-03-20 23:05:53 +01:00
Martin Kroeker 6d54c94760
Make ifort on Windows create lowercase symbols with appended underscore
tentative fix for #2472
2020-03-20 01:08:10 +01:00
مهدي شينون (Mehdi Chinoune) 21f6c4b5a9 fixes #2480 2020-03-02 17:22:28 +01:00
Ali Saidi c623a965f9 Add Neoverse-N1 core
The implementation is a hybrid of the ARMV8 one with some of the
improved TX2 routines along with specifying -march=v8.2-a
2020-02-29 03:22:04 +00:00
Martin Kroeker ca4f7dceff
Add parameters for EMAG8180 DYNAMIC_ARCH support with cmake 2020-02-24 20:23:18 +01:00
Martin Kroeker 1ddf9f1067
Add EMAG8180 to arm64 DYNAMIC_ARCH list for cmake 2020-02-24 20:16:18 +01:00
Martin Kroeker 7f0d523b42 Make BUFFER_SIZE configurable 2020-02-09 23:32:57 +01:00
Martin Kroeker 8dc9fd4dfe
Add -march option for AVX512 2020-01-30 12:41:18 +01:00
Martin Kroeker 375b1875c8
[WIP] Update LAPACK to 3.9.0 (#2353)
* Update make.inc entries for LAPACK 3.9.0

Reference-LAPACK PR 347 changed some variable names and relative paths

* Update LAPACK to 3.9.0

* Add new functions from LAPACK 3.9.0

* Add new functions from LAPACK 3.9.0

* Restore LOADER command 

as it makes it easier to specify pthread as needed

* Restore LOADER

* Restore EIG/LIN prefixes in cmdbase

* add binary path to lapack_testing.py call

* Restore OpenMP version check

* Restore OpenMP version check

* Restore fix for out-of-bounds array accesses

from #2096
2020-01-01 13:18:53 +01:00
Martin Kroeker a4896b5538
Update DYNAMIC_ARCH support for ARM64 and PPC (#2332)
* Update DYNAMIC_ARCH list of ARM64 targets for gmake
* Update arm64 cpu list for runtime detection
* Update DYNAMIC_ARCH list of ARM64 targets for cmake and add POWERPC targets
2019-12-04 11:06:03 +01:00
k.dunikowski 8691825944 Fixed a minor cmake problem, occurring when DYNAMIC_CORE=ON and CMAKE_C_FLAGS was empty 2019-10-28 08:51:05 +01:00
Martin Kroeker eb45eb6942
Fix C compiler handling and BINARY=32 mode in CMAKE builds (#2248)
* Fix compiler identification and option setting

* Handle BINARY=32 option on X86_64

* Add xGEMM3M unroll parameters for crossbuild-target CORE2

* Replace bogus mingw64/32bit CI job with actual 32bit build

mingw64 is not multilib-capable, so using an x86_64-mingw with BINARY=32 in the CI was not going to work anyway (but build passed while BINARY=32 was ignored).
2019-09-10 08:27:06 +02:00
Martin Kroeker fde8a8e6a0
Improve cmake build behaviour with non-host cpu targets (#2246)
1. Supply appropriate values for C/Z GEMM unroll when cross-compiling for CORE2 or ARMV7
2. Add the required xLOCAL_BUFFER_SIZE parameters for cross-compiling CORE2
3. Add -DFORCE_<target> option to getarch when building with -DTARGET=target
for #2245
2019-09-03 22:41:17 +02:00
Martin Kroeker 1fec0570f6
Add cgemm and zgemm unroll factors for core2 2019-09-02 15:03:45 +02:00
Martin Kroeker bf0d92a310
Add arch data for cross-compiling to CORE2
for #2235
2019-08-28 17:35:56 +02:00
Martin Kroeker e3d846ab57
Do not use -march=native with the PGI compiler 2019-08-16 08:58:10 +02:00
Tyler Reddy 3f6ab1582a MAINT: remove legacy CMake endif()
* clean up a case where CMake endif()
contained the conditional used in the
if(), which is no longer needed /
discouraged since our minimum required
CMake version supports the modern syntax
2019-07-22 21:24:57 -06:00
Martin Kroeker 8fb76134bc
Mingw32 needs leading underscore on object names
(also copy BUNDERSCORE settings for FORTRAN from the corresponding Makefile)
2019-07-06 15:07:15 +02:00
Martin Kroeker 04d671aae2
Make disabling DYNAMIC_ARCH on unsupported systems work
needs to be unset in the cache for the change to have any effect
2019-07-06 15:05:04 +02:00
Martin Kroeker f69a0be712
Add getarch flags to disable AVX on x86
(and other small fixes to match Makefile behaviour)
2019-07-06 15:02:39 +02:00
Martin Kroeker ece0bfb881
Merge pull request #2158 from martin-frbg/issue2143
Remove any inadvertent use of -march=native from DYNAMIC_ARCH builds
2019-06-10 14:08:11 +02:00
Martin Kroeker 1f4b6a5d5d
Remove any inadvertent use of -march=native from DYNAMIC_ARCH builds
from #2143, -march=native precludes use of more specific options like -march=skylake-avx512 in individual kernels, and defeats the purpose of dynamic arch anyway.
2019-06-10 09:50:13 +02:00
Martin Kroeker be8f70d269
Merge pull request #2157 from martin-frbg/2154-2
Add gfortran workaround for potential ABI violation
2019-06-09 12:19:08 +02:00
Martin Kroeker e674e1c735
Update fc.cmake 2019-06-09 09:31:13 +02:00
Martin Kroeker 6ca898b63b
Add gfortran workaround for potential ABI violation
for #2154
2019-06-08 23:17:03 +02:00
Michael Lass 7a9a4dbc4f Fix detection of AVX512 capable compilers in getarch
21eda8b5 introduced a check in getarch.c to test if the compiler is capable of
AVX512. This check currently fails, since the used __AVX2__ macro is only
defined if getarch itself was compiled with AVX2/AVX512 support. Make sure this
is the case by building getarch with -march=native on x86_64. It is only
supposed to run on the build host anyway.
2019-06-05 17:30:56 +02:00
Martin Kroeker 1e52572be3
Add option USE_LOCKING for single-threaded build with locking support 2019-05-15 23:19:30 +02:00
luz.paz daf2fec12d Misc. typo fixes
Found via `codespell -q 3 -w -L ith,als,dum,nd,amin,nto,wis,ba -S ./relapack,./kernel,./lapack-netlib`
2019-04-29 17:03:56 -04:00
Martin Kroeker ccfb7ead15
Merge pull request #2072 from martin-frbg/sum
Add (C)BLAS extension ?sum
2019-04-23 20:11:36 +02:00
Martin Kroeker e06b8438b4
Merge pull request #2080 from martin-frbg/issue2075
Add -lm and disable EXPRECISION support on *BSD
2019-04-02 21:40:58 +02:00
Martin Kroeker 9229d6859b
Add -lm and disable EXPRECISION support on *BSD
fixes #2075
2019-04-02 09:38:18 +02:00
Martin Kroeker d17da6c6a4
Add cmake defaults for ?sum kernels 2019-03-31 11:57:01 +02:00
Martin Kroeker 1679de5e59
Detect 32bit environment on 64bit ARM hardware
for #2056, using same approach as #2058
2019-03-31 10:50:43 +02:00
Sacha c3e30b2bc2
Change 64-bit detection as explained in #2056 2019-03-13 23:21:54 +10:00
Martin Kroeker fd34820b99
Fix AVX512 test always returning false due to missing compiler option 2019-02-25 17:58:31 +01:00
Martin Kroeker 5952e586ce
Support DYNAMIC_LIST option in cmake
e.g. cmake -DDYNAMIC_ARCH=1 -DDYNAMIC_LIST="NEHALEM;HASWELL;ZEN" ..
original issue was #1639
2019-02-05 23:51:40 +01:00
Martin Kroeker 58dd7e4501
Change ARMV8 target to ARMV7 for BINARY=32 2019-01-26 17:52:33 +01:00
Martin Kroeker 802f0dbde1
More fixes for cross-compiling ARM64 targets
Fixed core naming for DYNAMIC_ARCH. Corrected GEMM_DEFAULT entries and added SYMV_P. Replaced outdated VULCAN define for ThunderX2T99 with ARMV8 to get basic definitions back. For issue #1908
2019-01-03 22:17:31 +01:00
Martin Kroeker 20d1aad13f
Fix missing quotes around thunderx targets 2019-01-02 20:15:35 +01:00
Martin Kroeker e1eab96502
Merge pull request #1931 from martin-frbg/pr1921
Add -mavx2 to TARGET=HASWELL builds
2018-12-23 23:15:54 +01:00
Martin Kroeker 76b4b8980f
Use -dumpversion with gcc only 2018-12-23 19:08:19 +01:00
Martin Kroeker 49e0f485da
Add -mavx2 for TARGET=HASWELL if compiler supports and requires it 2018-12-23 17:26:09 +01:00
Martin Kroeker 26a3402773
Reflect ARMV8 target definition changes from PR1876
and create config target directory for cross-compiles.
2018-12-23 12:26:01 +01:00
Martin Kroeker 133c278ee5
Add DYNAMIC_CORE list for ARM64
cf #1908
2018-12-07 17:42:23 +01:00
Martin Kroeker dceff5542c
Handle Android environments that identify as Linux (#1898)
* Handle Android environments that identify as Linux

termux terminal emulator does this, causing build failures through missed defines in common.h
2018-12-01 20:56:11 +01:00
Martin Kroeker 081ceb3e02
Propagate version number for openblas_get_config 2018-11-29 00:12:04 +01:00
Andrew 40cce0e353 handle cmake too 2018-11-06 09:45:49 +00:00
Martin Kroeker 2263d3906c
Merge pull request #1812 from martin-frbg/issue1806-2
Use KERNEL_DEFINITIONS rather than COMMON_OPTS to pass -march=skylake…
2018-10-11 21:51:31 +02:00
Martin Kroeker 81c9985c3a
Use KERNEL_DEFINITIONS rather than COMMON_OPTS to pass -march=skylake-avx512 2018-10-11 11:03:27 +02:00
Martin Kroeker 56ebc7b53e
Merge pull request #1808 from martin-frbg/issue1806
Add -march=skylake-avx512 to CFLAGS when the target is Skylake
2018-10-11 07:48:08 +02:00
Martin Kroeker 8a11ec19d1
Syntax fix 2018-10-10 23:47:35 +02:00
Martin Kroeker fa53b903db
Add -march=skylake-avx512 to CFLAGS when the target is Skylake
Should fix #1806 and #1801
2018-10-10 19:22:01 +02:00
Martin Kroeker 84bcdf9c66
Revert "Add -march=skylake-avx512 when required" 2018-10-10 19:15:32 +02:00
Martin Kroeker a9b51b8448
Merge pull request #1798 from martin-frbg/cmake-avx512
Add -march=skylake-avx512 when required
2018-10-08 21:15:17 +02:00
Martin Kroeker eba394c711
Add -march=skylake-avx512 when required
fixes #1797
2018-10-08 19:18:12 +02:00
Martin Kroeker 02ef20a1e4
Merge pull request #1786 from martin-frbg/immintrin
Check for immintrin.h presence in the AVX512 compatibility test as well
2018-10-04 09:07:09 +02:00
Martin Kroeker 4c3643ed7f
Check availability of immintrin.h in the AVX512 compatibility test 2018-10-04 07:36:49 +02:00
Yuri 2349e15149
Allow to install the 'interface64' version concurrently with the regular version 2018-09-15 21:00:03 -07:00
Martin Kroeker b1b743f434
Merge branch 'develop' into interim033 2018-08-25 19:45:19 +02:00
Martin Kroeker 2a589c4b28
Add USE_TLS option to switch between old and new memory.c 2018-08-25 19:36:12 +02:00
Martin Kroeker 25f2d25cfe
Merge pull request #1697 from martin-frbg/issue1696
Do not treat Windows UWB builds as cross-compiling
2018-07-25 19:55:29 +02:00
Martin Kroeker 73131fa30a
Do not treat Windows UWB builds as cross-compiling 2018-07-24 17:46:33 +02:00
Martin Kroeker b74aef2816
Add -march=skylake-avx512 to AVX512 compile check and suppress its output 2018-07-03 14:41:44 +02:00
Martin Kroeker 26e1cfb653
Merge pull request #1607 from martin-frbg/dynarch
Move some x86_64 DYNAMIC_ARCH targets to new DYNAMIC_OLDER option
2018-06-14 16:52:55 +02:00
Martin Kroeker 02634b549b
Add template for OpenBLASConfig.cmake 2018-06-10 09:25:46 +02:00
Martin Kroeker 1cbd8f3ae4
Move some DYNAMIC_ARCH targets to new DYNAMIC_OLDER option 2018-06-09 16:30:46 +02:00
Martin Kroeker cf234a0561
Merge pull request #1589 from fenrus75/skylakex
Initial support for SkylakeX / AVX512
2018-06-06 22:07:09 +02:00
Martin Kroeker e4718b1fee
Better AVX512 test case 2018-06-06 16:51:30 +02:00
Martin Kroeker 7fb62aed7e
Check build system support for AVX512 instructions 2018-06-05 23:29:33 +02:00
Arjan van de Ven 99c7bba8e4 Initial support for SkylakeX / AVX512
This patch adds the basic infrastructure for adding the SkylakeX (Intel Skylake server)
target. The SkylakeX target will use the AVX512 (AVX512VL level) instruction set,
which brings 2 basic things:
1) 512 bit wide SIMD (2x width of AVX2)
2) 32 SIMD registers (2x the number on AVX2)

This initial patch only contains a trivial transformation of the Haswell SGEMM kernel
to AVX512VL; more will follow later but this patch aims to get the infrastructure
in place for this "later".

Full performance tuning has not been done yet; with more registers and wider SIMD
it's in theory possible to retune the kernels but even without that there's an
interesting enough performance increase (30-40% range) with just this change.
2018-06-03 07:58:52 +00:00
Martin Kroeker 6791294312
Merge pull request #1559 from martin-frbg/buildconf
Add build-time configuration options to pkgconfig file
2018-05-14 18:49:53 +02:00
Martin Kroeker 7d7564568c
Add build-time configuration options to pkgconfig file 2018-05-14 00:09:35 +02:00
Zhiyong Dang 1b83341d19 Fix race condition in blas_server_omp.c
Change-Id: Ic896276cd073d6b41930c7c5a29d66348cd1725d
2018-04-27 17:00:42 +08:00
Sacha f81815e48a
Fix CMake cross-compiling
Without specifying thread count, NUM_THREADS would not be defined and CMake would fail.
This is because core count cannot be determined when cross-compiling.
2018-02-28 10:25:25 +10:00
xoviat 038bfbb86c
CMake: Remove unused wall option when FC=flang 2018-01-26 14:09:48 -06:00
Martin Kroeker 599de9e598
Restore LAPACKE files for Xgeqpf, Xggsvd and Xggsvp
These were inadvertently dropped from the list in my PR #1095
2017-12-21 19:43:09 +01:00
Martin Kroeker 0dc291d3fa
Merge pull request #1377 from isuruf/threads
Allow overriding NUM_THREADS in cmake
2017-12-01 16:22:35 +01:00
Isuru Fernando e0ddd7d124 Allow overriding NUM_THREADS 2017-12-01 01:42:45 -06:00
martin 5056a044b2 fix location of lapacke_nancheck 2017-11-24 09:15:20 +01:00
martin 4054d32def update cmake files 2017-11-24 08:15:40 +01:00
martin 2d52f0f4c3 update cmakefiles for lapack 3.8.0 2017-11-23 21:22:01 +01:00
Ian Henriksen 505dc08635 Update lapacke.cmake with routines added in LAPACK 3.7.0. 2017-11-06 14:43:33 -06:00
Ian Henriksen 61587b0670 Update lapack.cmake with additional routines from LAPACK version 3.7.0. 2017-11-06 14:41:02 -06:00
Ian Henriksen 632fc75d77 Allow using compilers other than gfortran in conjunction with
MSVC or clang-cl.
2017-11-06 14:39:12 -06:00
Martin Kroeker 962b20a9bb Optionally add ReLAPACK to LIB_COMPONENTS 2017-10-12 17:02:01 +02:00
Martin Kroeker c7a8512d12 Cmake fixes for DYNAMIC_ARCH builds and whitespace in path names (#1323)
* prebuild.cmake: Put quotes around path names that may contain whitespace
(Copied from alexkaratakis' PR #1295)
* kernel/CMakeLists.txt: Fix common_lapack header inclusion and DYNAMIC_ARCH generation of ?neg_tcopy and ?laswp_ncopy files
* lapack/CMakeLists.txt: Use correct template for ?laswp_(plus,minus) functions
2017-10-09 23:34:18 +02:00
Sacha 7a867082d8 Fix open_blas.config which was never working out-of-source. Remove need for gen_config_h.exe. If OpenMP is requested, do not silently ignore when it isn't available. 2017-08-23 11:16:24 +10:00
Sacha Refshauge 47ebce4d1a Clean up, fix old typos. Simplify arch usages. Move system arch check to earlier position. 2017-08-21 00:37:29 +10:00
Sacha Refshauge 69b560751c Improvements to previous commit (cross-compile).
Fix typos and bad if statements discovered in 0.2.20.
2017-08-20 22:50:31 +10:00