Commit Graph

360 Commits

Author SHA1 Message Date
Martin Kroeker b6b024232d
Merge pull request #3508 from snadampal/v1_n2
OpenBLAS: aarch64: Add neoverse-v1/n2 architecture specifics
2022-01-09 14:50:26 +01:00
Sunita Nadampalli 19c8f615dc OpenBLAS: aarch64: Add neoverse-v1/n2 architecture specifics 2022-01-07 00:28:17 +00:00
jgillis ea3db69faa
Fix cmake crosscompilation for core2 target
Missing HAVE_SSE* cmake variables cause cc.cmake to forget about `-msse*` flags
2021-12-29 22:50:20 +01:00
Rafael Cardoso Fernandes Sousa d38110a5ce Use CMake variables instead of as 2021-12-10 17:46:53 -06:00
Rafael Cardoso Fernandes Sousa 214fbcee15 Fix cmake for power 2021-12-09 08:28:17 -06:00
Martin Kroeker 454edd741c
Merge pull request #3425 from binebrank/arm_sve_dgemm
Add dgemm kernel for arm64 SVE
2021-11-26 16:14:55 +01:00
Martin Kroeker bcfbdc81b2
Merge pull request #3459 from rafaelcfsousa/fix_cmake
Fix issues when building OpenBLAS with cmake
2021-11-26 15:19:24 +01:00
Bine Brank 1af73ce38e Adapt CMake for SVE 2021-11-26 10:35:01 +01:00
Rafael Cardoso Fernandes Sousa d5c9353f1b Modify the order that cmake set the KERNEL variables (generic now is fallback) 2021-11-24 20:08:35 -06:00
Rafael Cardoso Fernandes Sousa fb891f33da Fix the cmake parser to identify more patterns 2021-11-24 14:07:28 -06:00
Martin Kroeker a3cd36acff
Add CMAKE support for cross-compiling to MIPS32 2021-11-20 17:34:28 +01:00
Markus Mützel de2ed66596 cmake: Set SUFFIX64 also for NOFORTRAN 2021-11-15 08:53:52 +01:00
Martin Kroeker 02ea3db8e7
Merge pull request #3404 from guowangy/spr-build
Initial build support for Sapphire Rapids
2021-10-17 23:05:11 +02:00
مهدي شينون (Mehdi Chinoune) efd7ac241d Fix MinGW/Clang 64 bits detection.
CMAKE_COMPILER_IS_GNUCC is only valid for GCC.
2021-10-16 08:02:27 +01:00
Wangyang Guo 3dc6052c7e initial support for Sapphire Rapids platform 2021-10-12 01:30:40 -07:00
Martin Kroeker e02df9fc55
Propagate BUILD_BFLOAT16 to CFLAGS 2021-09-14 16:12:27 +02:00
Martin Kroeker 1c0a8a714a
Add defaults for SBGEMV kernels 2021-09-14 16:10:58 +02:00
Martin Kroeker af19cda65a
Add "recursive" option for IBM xlf compiler (#3359)
* Add correct "recursive" option for xlf (from reference-lapack issue 606)
2021-09-04 18:26:59 +02:00
Martin Kroeker bec9d9f63d
Merge pull request #3335 from guowangy/small-matrix-latest
Add GEMM optimization for small matrix and single/double kernel for skylakex
2021-08-29 22:33:33 +02:00
cianciosa 4c766cd11f Fix a small syntax error. A ( was accidently deleted. 2021-08-11 12:08:34 -04:00
cianciosa c28560129f Check the total number of arguments passed insead of if the ARGV# is defined. This fixes a problem when compling openblas as a subproject of another code. 2021-08-11 12:00:07 -04:00
Wangyang Guo 76ea8db4da Small Matrix: enable by default for x86_64 arch
If no customized GEMM_SMALL_M_PERMIT kernel defined, it will just by pass to normal path.
2021-08-05 02:59:36 +00:00
Wangyang Guo fee5abd84b Small Matrix: support cmake build 2021-08-04 08:50:15 +00:00
gxw 0b8f7c8c10 Add cmake support for LOONGARCH64 2021-08-02 10:00:41 +08:00
Martin Kroeker 47ba85f314
Fix regex to match kernels suffixed with cpuname too 2021-07-22 17:24:15 +02:00
Martin Kroeker 30f23be0f9
Rework setting of -mfma to only apply it where necessary 2021-07-22 12:00:03 +02:00
User User-User 91e2b11d3c add to cmake listings too 2021-06-20 15:32:42 +02:00
Martin Kroeker 13fa9f737d
Modify defines for CR and RC to work around name collision on Windows 2021-06-16 12:17:25 +02:00
Martin Kroeker db50b24a4a
Add entries for the new Householder Reconstruction functions from 3.9.1 2021-05-02 19:55:15 +02:00
Martin Kroeker 40000d1f64
Add entries for Householder reconstruction functions from 3.9.1 2021-05-02 19:21:59 +02:00
刘雨培 725432efaa pass NO_AVX512 macro def 2021-04-07 00:10:41 +08:00
Jake Arkinstall d7a77091a3 Addressed issue #3100, removing an unnecessary write to the include directory 2021-02-10 12:11:17 +00:00
Martin Kroeker 33b5670122
Merge pull request #3096 from martin-frbg/fixclangcmake
Fix Cooperlake/DYNAMIC_ARCH builds with clang on Windows
2021-02-02 13:33:15 +01:00
Martin Kroeker 95e19e2e23
fix case in compiler name check
Co-authored-by: xoviat <49173759+xoviat@users.noreply.github.com>
2021-02-02 10:53:46 +01:00
Martin Kroeker 99ac042702
remove spurious lines (probably editor malfunction) 2021-02-01 21:02:53 +01:00
Martin Kroeker 774b9f8653
handle AppleClang in Cooperlake support condition 2021-02-01 20:18:53 +01:00
Martin Kroeker eb1d2344f7
Fix compiler version check for Intel Cooperlake support (clang-cl does not accept -dumpversion) 2021-02-01 19:45:25 +01:00
Martin Kroeker 0cc36770f1
Merge pull request #3073 from xoviat/embedded
add embedded option
2021-01-31 18:02:41 +01:00
Martin Kroeker cb61d3b46b
Add DYNAMIC_LIST support for ARM64 2021-01-25 13:13:20 +01:00
xoviat b60de4447a add cortex-m platform 2021-01-19 08:57:44 -06:00
Martin Kroeker 89ae305e11
Workaround for cmake having its own C_COMPILER variable 2021-01-13 12:30:26 +01:00
Martin Kroeker ec4d77c47c
Add -mfma for HAVE_FMA3 in the non-DYNAMIC_ARCH case as well 2020-11-13 09:16:34 +01:00
Martin Kroeker a29338aaa6
Remove extraneous quotes that caused a cmake policy warning 2020-11-07 20:27:42 +01:00
Martin Kroeker 438a8e5624
Fix placement of getarch call and spurious cpu property accumulation in DYNAMIC_ARCH builds 2020-11-07 20:26:12 +01:00
Martin Kroeker 0155cd53a3
Add -msse3 where needed for DYNAMIC_ARCH builds 2020-11-03 23:45:49 +01:00
Martin Kroeker a9f9354296
Fix target test 2020-11-02 23:17:46 +01:00
Martin Kroeker b9bc76aec4
Add files via upload 2020-11-02 22:43:50 +01:00
Martin Kroeker e5f8c2bf8a
typo fix 2020-11-01 22:25:43 +01:00
Martin Kroeker 6baf8af658
Disable EXPRECISION for the combination of DYNAMIC_CORE and GENERIC target 2020-11-01 22:11:48 +01:00
Chen, Guobing a7b1f9b1bb Implementation of BF16 based gemv
1. Add a new API -- sbgemv to support bfloat16 based gemv
2. Implement a generic kernel for sbgemv
3. Implement an avx512-bf16 based kernel for sbgemv

Signed-off-by: Chen, Guobing <guobing.chen@intel.com>
2020-10-29 02:08:23 +08:00
Martin Kroeker eddc65c7b7
Add POWER10 support flag (unconditionally for now) 2020-10-20 01:09:49 +02:00
Martin Kroeker f5902ab0a1
Support cross-compiling for Apple Vortex 2020-10-18 19:10:58 +02:00
Martin Kroeker f64243ff57
Add compiler options for sse/sse2/ssse3/sse4.1 2020-10-16 10:47:06 +02:00
Martin Kroeker 786c0a3ce8
Add sse options for use of intrinics with older compilers 2020-10-16 10:41:53 +02:00
Martin Kroeker 756802df61
Merge pull request #2890 from martin-frbg/s-d-sum
Revert special handling of Windows xNRM2 and enable C+intrinsics kern…
2020-10-14 09:02:03 +02:00
Martin Kroeker 75e3a92df6
Add express -mavx and -msse options (and fix a stray = for cooperlake) 2020-10-14 01:01:58 +02:00
Martin Kroeker e3a29f6b58
Change "HALF" and "sh" to "BFLOAT16" and "sb" 2020-10-12 00:07:37 +02:00
Martin Kroeker 68e6823d36
Adapt for supporting only a subset of variable types 2020-10-11 15:01:32 +02:00
Martin Kroeker 88928650c4
Merge pull request #2883 from martin-frbg/issue2872
Minor CMAKE fixes
2020-10-11 10:30:33 +02:00
Martin Kroeker 82a497ec5d
restore PRESCOTT default for DYNAMIC_LIST 2020-10-11 00:43:09 +02:00
Martin Kroeker de27e4f5fb
Stop DYNAMIC_ARCH build if the toplevel source contains a stray config_kernel.h from a gmake build
This is unlikely to happen in practice, but if it does, the rogue file would get included instead of the dynamically generated version for each target_core, leading to very confusing errors like "invalid operands (undefined UND and ABS sections)" in compilation of the assembly kernels as macros like PREFETCH would remain undefined
2020-10-11 00:40:22 +02:00
Martin Kroeker e1b7123bbe
Merge pull request #2867 from Qiyu8/usimd-floatdot
Optimize the performance of dot by using universal intrinsics in X86/ARM
2020-10-10 12:10:25 +02:00
Qiyu8 f32d34a015 add sse3 compiler flag 2020-10-10 10:36:15 +08:00
Martin Kroeker a5feea6611
make BLAS3_MEM_ALLOC_THRESHOLD configurable on non-Windows 2020-10-04 23:01:06 +02:00
Martin Kroeker 2367726578
Remove redundant status message 2020-09-30 23:28:49 +02:00
Martin Kroeker c4aeeeb9f4
Activate all BUILD_ options if none was specified 2020-09-15 23:15:34 +02:00
Martin Kroeker 91c84e1c01
Merge pull request #2796 from Guobing-Chen/BF16_dot_coversion_apis
Add bfloat16 based dot and conversion with single/double
2020-09-14 15:00:19 +02:00
Martin Kroeker 26792d2096
Copy BUILD_* directives to the compiler options to allow ifdef in tests 2020-09-13 21:47:55 +02:00
Chen, Guobing deaeb6c5b8 Add bfloat16 based dot and conversion with single/double
1. Added bfloat16 based dot as new API: shdot
2. Implemented generic kernel and cooperlake-specific (AVX512-BF16) kernel for shdot
3. Added 4 conversion APIs for bfloat16 data type <=> single/double: shstobf16 shdtobf16 sbf16tos dbf16tod
     shstobf16 -- convert single float array to bfloat16 array
     shdtobf16 -- convert double float array to bfloat16 array
     sbf16tos  -- convert bfloat16 array to single float array
     dbf16tod  -- convert bfloat16 array to double float array
4. Implemented generic kernels for all 4 conversion APIs, and cooperlake-specific kernel for shstobf16 and shdtobf16
5. Update level1 thread facilitate functions and macros to support multi-threading for these new APIs
6. Fix Cooperlake platform detection/specify issue when under dynamic-arch building
7. Change the typedef of bfloat16 from unsigned short to more strict uint16_t

Signed-off-by: Chen, Guobing <guobing.chen@intel.com>
2020-09-04 02:31:25 +08:00
Martin Kroeker 68b1713c30
Merge pull request #2811 from martin-frbg/issue2806
Make NO_AVX512 option override the AVX512 compile test in CMAKE builds as well
2020-09-01 17:19:14 +02:00
Martin Kroeker 0a4c5c4c44
Merge pull request #2807 from martin-frbg/issue2804
Work around ARMV8 build-time cpu detection problems on non-Linux systems
2020-08-31 23:44:56 +02:00
Martin Kroeker 5feb087c05
Handle Apple labeling armv8 as arm64 rather than aarch64 2020-08-31 20:02:08 +02:00
Martin Kroeker 7c0977c267
Add OpenMP dependency to pkgconfig file if needed 2020-08-22 13:53:44 +02:00
Martin Kroeker bd3207b4b4
Update system.cmake 2020-08-19 22:51:10 +02:00
Martin Kroeker b8ebfc9335
Update system.cmake 2020-08-19 22:30:19 +02:00
Martin Kroeker 7c1986640b
fallback from cooperlake to skylake if gcc<10 2020-08-19 20:48:39 +02:00
Martin Kroeker 71d33c952d
Typo fix 2020-08-19 17:44:23 +02:00
Martin Kroeker 6a3c074786
-march=cooperlake requires gcc10 2020-08-19 17:22:12 +02:00
Martin Kroeker 430f741b30
-march=cooperlake requires gcc10 2020-08-19 17:17:53 +02:00
Chen, Guobing e740c4873d Enable COOPERLAKE build target
Enable new build target platform -- COOPERLAKE. This target platform
supports all the SKYLAKEX supported ISAs + avx512bf16. So all the
SKYLAKEX specific kernels/drivers and related code are now extended
to be also active on COOPERLAKE. Besides, new BF16 related kernels
are active under this target.
2020-08-13 06:18:00 +08:00
Martin Kroeker cb097beba2
Merge pull request #2741 from martin-frbg/issue2739
Adjust A53 SGEMM parameters to reflect recent switch to 8x8 kernel
2020-07-29 10:01:14 +02:00
Martin Kroeker 64e2e4aaf3
missing braces 2020-07-27 20:19:22 +00:00
Martin Kroeker 921ec4e9e2
Adjust A53 SGEMM parameters to reflect move to 8x8 kernel 2020-07-27 19:54:46 +00:00
Ashwin Sekhar T K 4e1be0e481 ARM64: Add THUNDERX3T110 Target 2020-07-26 23:32:24 -07:00
Martin Kroeker 9e21a100e3
Add trivial check for stdatomic.h 2020-07-20 22:52:09 +00:00
Martin Kroeker 9d000ecaa2
include CheckLanguage module 2020-07-16 22:36:35 +00:00
Martin Kroeker a847d00366
handle missing lack of fortran compiler more gracefully 2020-07-16 22:17:39 +00:00
Martin Kroeker 6eaeb01263
Merge pull request #2658 from RajalakshmiSR/p10
powerpc: Add support for future processor
2020-06-23 00:02:37 +02:00
Martin Kroeker 6876221cf3
Remove optimization level limit for flang again and add -fno-unroll-loops for AOCC flang 2.x instead 2020-06-14 17:40:24 +02:00
Martin Kroeker 1dd712131e
Fix spelling of flang option -Mrecursive and add -Kieee 2020-06-14 00:09:31 +02:00
Rajalakshmi Srinivasaraghavan 9fe930f205 powerpc: Add support for future processor
This is the initial patch to support build infrastructure
for POWER10 architecture.
2020-06-11 15:47:20 -05:00
Martin Kroeker 3ce469a34f
Limit optimization level to O1 for flang and add -frecursive 2020-06-09 16:11:13 +02:00
Martin Kroeker 79cd69fea4
Merge pull request #2644 from martin-frbg/cmake-maxstack
Add CMAKE support for MAX_STACK_ALLOC setting
2020-06-05 08:33:48 +02:00
Martin Kroeker bb12c2c854
Limit MAX_STACK_ALLOC availability to non-Wndows 2020-06-04 19:07:27 +02:00
Martin Kroeker 6e97df7b47
Add CMAKE support for MAX_STACK_ALLOC setting 2020-06-04 14:45:31 +02:00
Martin Kroeker 4db00121dc
Disable EXPRECISION and add -lm on OSX (same as the BSDs and Linux) 2020-05-31 12:39:36 +02:00
Martin Kroeker cd10b35fe9
Handle trailing spaces and empty condition variables 2020-05-09 13:42:33 +02:00
Martin Kroeker 5dd14e3d48
Make building the bfloat16 functions conditional on option BUILD_HALF (#2590)
* make building the bfloat16 BLAS functions conditional on BUILD_HALF

* pass the BUILD_HALF option to gensymbol

* Pass BUILD_HALF as a compiler define for dynamic_arch builds
2020-05-01 09:58:30 +02:00
Martin Kroeker 3bd56846bb
Silence a debug message 2020-04-27 16:27:09 +02:00
Martin Kroeker e7bbdfdf84
Have CMAKE parse conditional lines in KERNEL files
Supports ifeq and ifneq, but requires both to have an else branch
2020-04-27 15:20:03 +02:00