Commit Graph

415 Commits

Author SHA1 Message Date
Martin Kroeker abbc65cff4
Cortex X1 is only Armv8.2 2022-03-28 17:40:27 +02:00
Martin Kroeker 57dd92a662
Add initial support for ARMV9 Cortex 510/710/X1/X2 2022-03-27 15:26:42 +02:00
Markus Mützel aeb561d234 Add support for Intel Fortran compilers.
Port changes from upstream Reference-LAPACK.
2022-03-25 13:37:15 +01:00
Markus Mützel 00f44bfff7 cmake: Check if Fortran compiler is usable before enabling it. 2022-01-21 13:27:17 +01:00
Martin Kroeker a9e297e476
Fix handling of ifdef/ifndef 2022-01-09 23:31:59 +01:00
Martin Kroeker b6b024232d
Merge pull request #3508 from snadampal/v1_n2
OpenBLAS: aarch64: Add neoverse-v1/n2 architecture specifics
2022-01-09 14:50:26 +01:00
Sunita Nadampalli 19c8f615dc OpenBLAS: aarch64: Add neoverse-v1/n2 architecture specifics 2022-01-07 00:28:17 +00:00
jgillis ea3db69faa
Fix cmake crosscompilation for core2 target
Missing HAVE_SSE* cmake variables cause cc.cmake to forget about `-msse*` flags
2021-12-29 22:50:20 +01:00
Rafael Cardoso Fernandes Sousa d38110a5ce Use CMake variables instead of as 2021-12-10 17:46:53 -06:00
Rafael Cardoso Fernandes Sousa 214fbcee15 Fix cmake for power 2021-12-09 08:28:17 -06:00
Martin Kroeker 454edd741c
Merge pull request #3425 from binebrank/arm_sve_dgemm
Add dgemm kernel for arm64 SVE
2021-11-26 16:14:55 +01:00
Martin Kroeker bcfbdc81b2
Merge pull request #3459 from rafaelcfsousa/fix_cmake
Fix issues when building OpenBLAS with cmake
2021-11-26 15:19:24 +01:00
Bine Brank 1af73ce38e Adapt CMake for SVE 2021-11-26 10:35:01 +01:00
Rafael Cardoso Fernandes Sousa d5c9353f1b Modify the order in which cmake sets the KERNEL variables (generic is now the fallback) 2021-11-24 20:08:35 -06:00
Rafael Cardoso Fernandes Sousa fb891f33da Fix the cmake parser to identify more patterns 2021-11-24 14:07:28 -06:00
Martin Kroeker a3cd36acff
Add CMAKE support for cross-compiling to MIPS32 2021-11-20 17:34:28 +01:00
Markus Mützel de2ed66596 cmake: Set SUFFIX64 also for NOFORTRAN 2021-11-15 08:53:52 +01:00
Martin Kroeker 02ea3db8e7
Merge pull request #3404 from guowangy/spr-build
Initial build support for Sapphire Rapids
2021-10-17 23:05:11 +02:00
مهدي شينون (Mehdi Chinoune) efd7ac241d Fix MinGW/Clang 64 bits detection.
CMAKE_COMPILER_IS_GNUCC is only valid for GCC.
2021-10-16 08:02:27 +01:00
Wangyang Guo 3dc6052c7e initial support for Sapphire Rapids platform 2021-10-12 01:30:40 -07:00
Martin Kroeker e02df9fc55
Propagate BUILD_BFLOAT16 to CFLAGS 2021-09-14 16:12:27 +02:00
Martin Kroeker 1c0a8a714a
Add defaults for SBGEMV kernels 2021-09-14 16:10:58 +02:00
Martin Kroeker af19cda65a
Add "recursive" option for IBM xlf compiler (#3359)
* Add correct "recursive" option for xlf (from reference-lapack issue 606)
2021-09-04 18:26:59 +02:00
Martin Kroeker bec9d9f63d
Merge pull request #3335 from guowangy/small-matrix-latest
Add GEMM optimization for small matrix and single/double kernel for skylakex
2021-08-29 22:33:33 +02:00
cianciosa 4c766cd11f Fix a small syntax error. A ( was accidentally deleted. 2021-08-11 12:08:34 -04:00
cianciosa c28560129f Check the total number of arguments passed instead of whether ARGV# is defined. This fixes a problem when compiling OpenBLAS as a subproject of another code. 2021-08-11 12:00:07 -04:00
Wangyang Guo 76ea8db4da Small Matrix: enable by default for x86_64 arch
If no customized GEMM_SMALL_M_PERMIT kernel is defined, it will just bypass to the normal path.
2021-08-05 02:59:36 +00:00
Wangyang Guo fee5abd84b Small Matrix: support cmake build 2021-08-04 08:50:15 +00:00
gxw 0b8f7c8c10 Add cmake support for LOONGARCH64 2021-08-02 10:00:41 +08:00
Martin Kroeker 47ba85f314
Fix regex to match kernels suffixed with cpuname too 2021-07-22 17:24:15 +02:00
Martin Kroeker 30f23be0f9
Rework setting of -mfma to only apply it where necessary 2021-07-22 12:00:03 +02:00
User User-User 91e2b11d3c add to cmake listings too 2021-06-20 15:32:42 +02:00
Martin Kroeker 13fa9f737d
Modify defines for CR and RC to work around name collision on Windows 2021-06-16 12:17:25 +02:00
Martin Kroeker db50b24a4a
Add entries for the new Householder Reconstruction functions from 3.9.1 2021-05-02 19:55:15 +02:00
Martin Kroeker 40000d1f64
Add entries for Householder reconstruction functions from 3.9.1 2021-05-02 19:21:59 +02:00
刘雨培 725432efaa pass NO_AVX512 macro def 2021-04-07 00:10:41 +08:00
Jake Arkinstall d7a77091a3 Addressed issue #3100, removing an unnecessary write to the include directory 2021-02-10 12:11:17 +00:00
Martin Kroeker 33b5670122
Merge pull request #3096 from martin-frbg/fixclangcmake
Fix Cooperlake/DYNAMIC_ARCH builds with clang on Windows
2021-02-02 13:33:15 +01:00
Martin Kroeker 95e19e2e23
fix case in compiler name check
Co-authored-by: xoviat <49173759+xoviat@users.noreply.github.com>
2021-02-02 10:53:46 +01:00
Martin Kroeker 99ac042702
remove spurious lines (probably editor malfunction) 2021-02-01 21:02:53 +01:00
Martin Kroeker 774b9f8653
handle AppleClang in Cooperlake support condition 2021-02-01 20:18:53 +01:00
Martin Kroeker eb1d2344f7
Fix compiler version check for Intel Cooperlake support (clang-cl does not accept -dumpversion) 2021-02-01 19:45:25 +01:00
Martin Kroeker 0cc36770f1
Merge pull request #3073 from xoviat/embedded
add embedded option
2021-01-31 18:02:41 +01:00
Martin Kroeker cb61d3b46b
Add DYNAMIC_LIST support for ARM64 2021-01-25 13:13:20 +01:00
xoviat b60de4447a add cortex-m platform 2021-01-19 08:57:44 -06:00
Martin Kroeker 89ae305e11
Workaround for cmake having its own C_COMPILER variable 2021-01-13 12:30:26 +01:00
Martin Kroeker ec4d77c47c
Add -mfma for HAVE_FMA3 in the non-DYNAMIC_ARCH case as well 2020-11-13 09:16:34 +01:00
Martin Kroeker a29338aaa6
Remove extraneous quotes that caused a cmake policy warning 2020-11-07 20:27:42 +01:00
Martin Kroeker 438a8e5624
Fix placement of getarch call and spurious cpu property accumulation in DYNAMIC_ARCH builds 2020-11-07 20:26:12 +01:00
Martin Kroeker 0155cd53a3
Add -msse3 where needed for DYNAMIC_ARCH builds 2020-11-03 23:45:49 +01:00
Martin Kroeker a9f9354296
Fix target test 2020-11-02 23:17:46 +01:00
Martin Kroeker b9bc76aec4
Add files via upload 2020-11-02 22:43:50 +01:00
Martin Kroeker e5f8c2bf8a
typo fix 2020-11-01 22:25:43 +01:00
Martin Kroeker 6baf8af658
Disable EXPRECISION for the combination of DYNAMIC_CORE and GENERIC target 2020-11-01 22:11:48 +01:00
Chen, Guobing a7b1f9b1bb Implementation of BF16 based gemv
1. Add a new API -- sbgemv to support bfloat16 based gemv
2. Implement a generic kernel for sbgemv
3. Implement an avx512-bf16 based kernel for sbgemv

Signed-off-by: Chen, Guobing <guobing.chen@intel.com>
2020-10-29 02:08:23 +08:00
Martin Kroeker eddc65c7b7
Add POWER10 support flag (unconditionally for now) 2020-10-20 01:09:49 +02:00
Martin Kroeker f5902ab0a1
Support cross-compiling for Apple Vortex 2020-10-18 19:10:58 +02:00
Martin Kroeker f64243ff57
Add compiler options for sse/sse2/ssse3/sse4.1 2020-10-16 10:47:06 +02:00
Martin Kroeker 786c0a3ce8
Add sse options for use of intrinsics with older compilers 2020-10-16 10:41:53 +02:00
Martin Kroeker 756802df61
Merge pull request #2890 from martin-frbg/s-d-sum
Revert special handling of Windows xNRM2 and enable C+intrinsics kern…
2020-10-14 09:02:03 +02:00
Martin Kroeker 75e3a92df6
Add express -mavx and -msse options (and fix a stray = for cooperlake) 2020-10-14 01:01:58 +02:00
Martin Kroeker e3a29f6b58
Change "HALF" and "sh" to "BFLOAT16" and "sb" 2020-10-12 00:07:37 +02:00
Martin Kroeker 68e6823d36
Adapt for supporting only a subset of variable types 2020-10-11 15:01:32 +02:00
Martin Kroeker 88928650c4
Merge pull request #2883 from martin-frbg/issue2872
Minor CMAKE fixes
2020-10-11 10:30:33 +02:00
Martin Kroeker 82a497ec5d
restore PRESCOTT default for DYNAMIC_LIST 2020-10-11 00:43:09 +02:00
Martin Kroeker de27e4f5fb
Stop DYNAMIC_ARCH build if the toplevel source contains a stray config_kernel.h from a gmake build
This is unlikely to happen in practice, but if it does, the rogue file would get included instead of the dynamically generated version for each target_core, leading to very confusing errors like "invalid operands (undefined UND and ABS sections)" in compilation of the assembly kernels as macros like PREFETCH would remain undefined
2020-10-11 00:40:22 +02:00
Martin Kroeker e1b7123bbe
Merge pull request #2867 from Qiyu8/usimd-floatdot
Optimize the performance of dot by using universal intrinsics in X86/ARM
2020-10-10 12:10:25 +02:00
Qiyu8 f32d34a015 add sse3 compiler flag 2020-10-10 10:36:15 +08:00
Martin Kroeker a5feea6611
make BLAS3_MEM_ALLOC_THRESHOLD configurable on non-Windows 2020-10-04 23:01:06 +02:00
Martin Kroeker 2367726578
Remove redundant status message 2020-09-30 23:28:49 +02:00
Martin Kroeker c4aeeeb9f4
Activate all BUILD_ options if none was specified 2020-09-15 23:15:34 +02:00
Martin Kroeker 91c84e1c01
Merge pull request #2796 from Guobing-Chen/BF16_dot_coversion_apis
Add bfloat16 based dot and conversion with single/double
2020-09-14 15:00:19 +02:00
Martin Kroeker 26792d2096
Copy BUILD_* directives to the compiler options to allow ifdef in tests 2020-09-13 21:47:55 +02:00
Chen, Guobing deaeb6c5b8 Add bfloat16 based dot and conversion with single/double
1. Added bfloat16 based dot as new API: shdot
2. Implemented generic kernel and cooperlake-specific (AVX512-BF16) kernel for shdot
3. Added 4 conversion APIs for bfloat16 data type <=> single/double: shstobf16 shdtobf16 sbf16tos dbf16tod
     shstobf16 -- convert single float array to bfloat16 array
     shdtobf16 -- convert double float array to bfloat16 array
     sbf16tos  -- convert bfloat16 array to single float array
     dbf16tod  -- convert bfloat16 array to double float array
4. Implemented generic kernels for all 4 conversion APIs, and cooperlake-specific kernel for shstobf16 and shdtobf16
5. Update level1 thread facilitate functions and macros to support multi-threading for these new APIs
6. Fix Cooperlake platform detection/specification issue in dynamic-arch builds
7. Change the typedef of bfloat16 from unsigned short to more strict uint16_t

Signed-off-by: Chen, Guobing <guobing.chen@intel.com>
2020-09-04 02:31:25 +08:00
Martin Kroeker 68b1713c30
Merge pull request #2811 from martin-frbg/issue2806
Make NO_AVX512 option override the AVX512 compile test in CMAKE builds as well
2020-09-01 17:19:14 +02:00
Martin Kroeker 0a4c5c4c44
Merge pull request #2807 from martin-frbg/issue2804
Work around ARMV8 build-time cpu detection problems on non-Linux systems
2020-08-31 23:44:56 +02:00
Martin Kroeker 5feb087c05
Handle Apple labeling armv8 as arm64 rather than aarch64 2020-08-31 20:02:08 +02:00
Martin Kroeker 7c0977c267
Add OpenMP dependency to pkgconfig file if needed 2020-08-22 13:53:44 +02:00
Martin Kroeker bd3207b4b4
Update system.cmake 2020-08-19 22:51:10 +02:00
Martin Kroeker b8ebfc9335
Update system.cmake 2020-08-19 22:30:19 +02:00
Martin Kroeker 7c1986640b
fallback from cooperlake to skylake if gcc<10 2020-08-19 20:48:39 +02:00
Martin Kroeker 71d33c952d
Typo fix 2020-08-19 17:44:23 +02:00
Martin Kroeker 6a3c074786
-march=cooperlake requires gcc10 2020-08-19 17:22:12 +02:00
Martin Kroeker 430f741b30
-march=cooperlake requires gcc10 2020-08-19 17:17:53 +02:00
Chen, Guobing e740c4873d Enable COOPERLAKE build target
Enable new build target platform -- COOPERLAKE. This target platform
supports all the SKYLAKEX supported ISAs + avx512bf16. So all the
SKYLAKEX specific kernels/drivers and related code are now extended
to be also active on COOPERLAKE. Besides, new BF16 related kernels
are active under this target.
2020-08-13 06:18:00 +08:00
Martin Kroeker cb097beba2
Merge pull request #2741 from martin-frbg/issue2739
Adjust A53 SGEMM parameters to reflect recent switch to 8x8 kernel
2020-07-29 10:01:14 +02:00
Martin Kroeker 64e2e4aaf3
missing braces 2020-07-27 20:19:22 +00:00
Martin Kroeker 921ec4e9e2
Adjust A53 SGEMM parameters to reflect move to 8x8 kernel 2020-07-27 19:54:46 +00:00
Ashwin Sekhar T K 4e1be0e481 ARM64: Add THUNDERX3T110 Target 2020-07-26 23:32:24 -07:00
Martin Kroeker 9e21a100e3
Add trivial check for stdatomic.h 2020-07-20 22:52:09 +00:00
Martin Kroeker 9d000ecaa2
include CheckLanguage module 2020-07-16 22:36:35 +00:00
Martin Kroeker a847d00366
handle lack of a Fortran compiler more gracefully 2020-07-16 22:17:39 +00:00
Martin Kroeker 6eaeb01263
Merge pull request #2658 from RajalakshmiSR/p10
powerpc: Add support for future processor
2020-06-23 00:02:37 +02:00
Martin Kroeker 6876221cf3
Remove optimization level limit for flang again and add -fno-unroll-loops for AOCC flang 2.x instead 2020-06-14 17:40:24 +02:00
Martin Kroeker 1dd712131e
Fix spelling of flang option -Mrecursive and add -Kieee 2020-06-14 00:09:31 +02:00
Rajalakshmi Srinivasaraghavan 9fe930f205 powerpc: Add support for future processor
This is the initial patch to support build infrastructure
for POWER10 architecture.
2020-06-11 15:47:20 -05:00
Martin Kroeker 3ce469a34f
Limit optimization level to O1 for flang and add -frecursive 2020-06-09 16:11:13 +02:00
Martin Kroeker 79cd69fea4
Merge pull request #2644 from martin-frbg/cmake-maxstack
Add CMAKE support for MAX_STACK_ALLOC setting
2020-06-05 08:33:48 +02:00
Martin Kroeker bb12c2c854
Limit MAX_STACK_ALLOC availability to non-Windows 2020-06-04 19:07:27 +02:00
Martin Kroeker 6e97df7b47
Add CMAKE support for MAX_STACK_ALLOC setting 2020-06-04 14:45:31 +02:00
Martin Kroeker 4db00121dc
Disable EXPRECISION and add -lm on OSX (same as the BSDs and Linux) 2020-05-31 12:39:36 +02:00
Martin Kroeker cd10b35fe9
Handle trailing spaces and empty condition variables 2020-05-09 13:42:33 +02:00
Martin Kroeker 5dd14e3d48
Make building the bfloat16 functions conditional on option BUILD_HALF (#2590)
* make building the bfloat16 BLAS functions conditional on BUILD_HALF

* pass the BUILD_HALF option to gensymbol

* Pass BUILD_HALF as a compiler define for dynamic_arch builds
2020-05-01 09:58:30 +02:00
Martin Kroeker 3bd56846bb
Silence a debug message 2020-04-27 16:27:09 +02:00
Martin Kroeker e7bbdfdf84
Have CMAKE parse conditional lines in KERNEL files
Supports ifeq and ifneq, but requires both to have an else branch
2020-04-27 15:20:03 +02:00
Martin Kroeker 70869d571f
Quote include paths for getarch to protect any embedded spaces 2020-04-24 10:30:44 +02:00
Martin Kroeker 4f70512b97
Update kernel.cmake 2020-04-19 08:10:26 +02:00
Martin Kroeker d0737b0142
Update kernel.cmake 2020-04-18 21:36:28 +02:00
Martin Kroeker a83a59b038
Use generic kernels for ishama,shasum,shdot,shrot 2020-04-18 15:53:51 +02:00
Martin Kroeker 0a19bd813c
Use generic codes for shamax and shcopy 2020-04-18 12:52:51 +02:00
Martin Kroeker f361de30a3
Use generic axpy.c for SHAXPY as x86 lacks saxpy.c 2020-04-18 11:07:16 +02:00
Martin Kroeker 9f6d6f6cb6
use saxpy.c instead of axpy.S for SHAXPY 2020-04-17 22:27:58 +02:00
Rajalakshmi Srinivasaraghavan 22bb50fb81 cmake fixes 2020-04-17 13:35:17 -05:00
Rajalakshmi Srinivasaraghavan 7eb55504b1 RFC : Add half precision gemm for bfloat16 in OpenBLAS
This patch adds support for bfloat16 data type matrix multiplication kernel.
For architectures that don't support bfloat16, it is defined as unsigned short
(2 bytes).  Default unroll sizes can be changed as per architecture as done for
SGEMM and for now 8 and 4 are used for M and N.  Size of ncopy/tcopy can be
changed as per architecture requirement and for now, size 2 is used.

Added shgemm in kernel/power/KERNEL.POWER9 and tested in powerpc64le and
powerpc64.  For reference, added a small test compare_sgemm_shgemm.c to compare
sgemm and shgemm output.

This patch does not cover OpenBLAS test, benchmark and lapack tests for shgemm.
Complex type implementation can be discussed and added once this is approved.
2020-04-14 14:55:08 -05:00
Martin Kroeker a05243d0f2
ifort and pgfort need "recursive" for compiling LAPACK as well
as shown in Reference-LAPACK issue 401 (their PR 403)
2020-04-01 15:38:07 +02:00
Martin Kroeker 8c7c1395da
Merge pull request #2521 from martin-frbg/cm-avx512
Use proper extension on the avx512 testcase filename
2020-03-22 01:03:42 +01:00
Martin Kroeker 1d9773b800
Use proper extension on the avx512 testcase filename
The need to call it .tmp existed only when it was generated by a tmpfile call, and the "-x c" option to tell the compiler it is actually a C source is not universally supported (this broke the test with clang-cl at least)
2020-03-20 23:05:53 +01:00
Martin Kroeker 6d54c94760
Make ifort on Windows create lowercase symbols with appended underscore
tentative fix for #2472
2020-03-20 01:08:10 +01:00
مهدي شينون (Mehdi Chinoune) 21f6c4b5a9 fixes #2480 2020-03-02 17:22:28 +01:00
Ali Saidi c623a965f9 Add Neoverse-N1 core
The implementation is a hybrid of the ARMV8 one with some of the
improved TX2 routines along with specifying -march=v8.2-a
2020-02-29 03:22:04 +00:00
Martin Kroeker ca4f7dceff
Add parameters for EMAG8180 DYNAMIC_ARCH support with cmake 2020-02-24 20:23:18 +01:00
Martin Kroeker 1ddf9f1067
Add EMAG8180 to arm64 DYNAMIC_ARCH list for cmake 2020-02-24 20:16:18 +01:00
Martin Kroeker 7f0d523b42 Make BUFFER_SIZE configurable 2020-02-09 23:32:57 +01:00
Martin Kroeker 8dc9fd4dfe
Add -march option for AVX512 2020-01-30 12:41:18 +01:00
Martin Kroeker 375b1875c8
[WIP] Update LAPACK to 3.9.0 (#2353)
* Update make.inc entries for LAPACK 3.9.0

Reference-LAPACK PR 347 changed some variable names and relative paths

* Update LAPACK to 3.9.0

* Add new functions from LAPACK 3.9.0

* Add new functions from LAPACK 3.9.0

* Restore LOADER command 

as it makes it easier to specify pthread as needed

* Restore LOADER

* Restore EIG/LIN prefixes in cmdbase

* add binary path to lapack_testing.py call

* Restore OpenMP version check

* Restore OpenMP version check

* Restore fix for out-of-bounds array accesses

from #2096
2020-01-01 13:18:53 +01:00
Martin Kroeker a4896b5538
Update DYNAMIC_ARCH support for ARM64 and PPC (#2332)
* Update DYNAMIC_ARCH list of ARM64 targets for gmake
* Update arm64 cpu list for runtime detection
* Update DYNAMIC_ARCH list of ARM64 targets for cmake and add POWERPC targets
2019-12-04 11:06:03 +01:00
k.dunikowski 8691825944 Fixed a minor cmake problem, occurring when DYNAMIC_CORE=ON and CMAKE_C_FLAGS was empty 2019-10-28 08:51:05 +01:00
Martin Kroeker eb45eb6942
Fix C compiler handling and BINARY=32 mode in CMAKE builds (#2248)
* Fix compiler identification and option setting

* Handle BINARY=32 option on X86_64

* Add xGEMM3M unroll parameters for crossbuild-target CORE2

* Replace bogus mingw64/32bit CI job with actual 32bit build

mingw64 is not multilib-capable, so using an x86_64-mingw with BINARY=32 in the CI was not going to work anyway (but build passed while BINARY=32 was ignored).
2019-09-10 08:27:06 +02:00
Martin Kroeker fde8a8e6a0
Improve cmake build behaviour with non-host cpu targets (#2246)
1. Supply appropriate values for C/Z GEMM unroll when cross-compiling for CORE2 or ARMV7
2. Add the required xLOCAL_BUFFER_SIZE parameters for cross-compiling CORE2
3. Add -DFORCE_<target> option to getarch when building with -DTARGET=target
for #2245
2019-09-03 22:41:17 +02:00
Martin Kroeker 1fec0570f6
Add cgemm and zgemm unroll factors for core2 2019-09-02 15:03:45 +02:00
Martin Kroeker bf0d92a310
Add arch data for cross-compiling to CORE2
for #2235
2019-08-28 17:35:56 +02:00
Martin Kroeker e3d846ab57
Do not use -march=native with the PGI compiler 2019-08-16 08:58:10 +02:00
Tyler Reddy 3f6ab1582a MAINT: remove legacy CMake endif()
* clean up a case where CMake endif()
contained the conditional used in the
if(), which is no longer needed /
discouraged since our minimum required
CMake version supports the modern syntax
2019-07-22 21:24:57 -06:00
Martin Kroeker 8fb76134bc
Mingw32 needs leading underscore on object names
(also copy BUNDERSCORE settings for FORTRAN from the corresponding Makefile)
2019-07-06 15:07:15 +02:00
Martin Kroeker 04d671aae2
Make disabling DYNAMIC_ARCH on unsupported systems work
needs to be unset in the cache for the change to have any effect
2019-07-06 15:05:04 +02:00
Martin Kroeker f69a0be712
Add getarch flags to disable AVX on x86
(and other small fixes to match Makefile behaviour)
2019-07-06 15:02:39 +02:00
Martin Kroeker ece0bfb881
Merge pull request #2158 from martin-frbg/issue2143
Remove any inadvertent use of -march=native from DYNAMIC_ARCH builds
2019-06-10 14:08:11 +02:00
Martin Kroeker 1f4b6a5d5d
Remove any inadvertent use of -march=native from DYNAMIC_ARCH builds
from #2143, -march=native precludes use of more specific options like -march=skylake-avx512 in individual kernels, and defeats the purpose of dynamic arch anyway.
2019-06-10 09:50:13 +02:00
Martin Kroeker be8f70d269
Merge pull request #2157 from martin-frbg/2154-2
Add gfortran workaround for potential ABI violation
2019-06-09 12:19:08 +02:00
Martin Kroeker e674e1c735
Update fc.cmake 2019-06-09 09:31:13 +02:00
Martin Kroeker 6ca898b63b
Add gfortran workaround for potential ABI violation
for #2154
2019-06-08 23:17:03 +02:00
Michael Lass 7a9a4dbc4f Fix detection of AVX512 capable compilers in getarch
21eda8b5 introduced a check in getarch.c to test if the compiler is capable of
AVX512. This check currently fails, since the used __AVX2__ macro is only
defined if getarch itself was compiled with AVX2/AVX512 support. Make sure this
is the case by building getarch with -march=native on x86_64. It is only
supposed to run on the build host anyway.
2019-06-05 17:30:56 +02:00
Martin Kroeker 1e52572be3
Add option USE_LOCKING for single-threaded build with locking support 2019-05-15 23:19:30 +02:00
luz.paz daf2fec12d Misc. typo fixes
Found via `codespell -q 3 -w -L ith,als,dum,nd,amin,nto,wis,ba -S ./relapack,./kernel,./lapack-netlib`
2019-04-29 17:03:56 -04:00
Martin Kroeker ccfb7ead15
Merge pull request #2072 from martin-frbg/sum
Add (C)BLAS extension ?sum
2019-04-23 20:11:36 +02:00
Martin Kroeker e06b8438b4
Merge pull request #2080 from martin-frbg/issue2075
Add -lm and disable EXPRECISION support on *BSD
2019-04-02 21:40:58 +02:00
Martin Kroeker 9229d6859b
Add -lm and disable EXPRECISION support on *BSD
fixes #2075
2019-04-02 09:38:18 +02:00
Martin Kroeker d17da6c6a4
Add cmake defaults for ?sum kernels 2019-03-31 11:57:01 +02:00
Martin Kroeker 1679de5e59
Detect 32bit environment on 64bit ARM hardware
for #2056, using same approach as #2058
2019-03-31 10:50:43 +02:00
Sacha c3e30b2bc2
Change 64-bit detection as explained in #2056 2019-03-13 23:21:54 +10:00