Martin Kroeker
52b71a1673
Filter out FFLAGS that flang-new from LLVM18 no longer supports ( #4569 )
...
* Filter out FFLAGS that flang-new from LLVM18 no longer supports
2024-03-22 17:02:39 +01:00
Martin Kroeker
a0e3f77e0b
add FIXED_LIBNAME, PREFIX and SUFFIX
2024-02-15 12:17:38 +01:00
Martin Kroeker
49689fbef7
Add support for compiling SVE kernels with the NVIDIA HPC compiler
2023-08-25 17:11:04 +02:00
Martin Kroeker
ac698cedad
Add compiler options for ARM64 SVE targets in DYNAMIC_ARCH builds
2023-07-05 09:47:49 +02:00
Martin Kroeker
d2144b2981
Add NVHPC
2023-06-09 19:01:15 +02:00
Martin Kroeker
de937b3194
Add clang option to avoid running out of registers in AVX512 assembly
2023-03-17 21:22:37 +01:00
Martin Kroeker
e964ebd0d0
Add compiler option for AVX512-capable Ryzen(4)
2023-02-02 19:04:05 +01:00
Martin Kroeker
a0a4f7c447
Add -mfma to -mavx2 for clang, and add AVX2 declaration for Zen in DYNAMIC_ARCH builds
2022-09-13 22:47:00 +02:00
Martin Kroeker
85fd3c4279
Support compilation with the Cray C and Fortran compilers ( #3712 )
...
* Add support for the Cray Fortran compiler
2022-08-04 20:42:18 +02:00
Martin Kroeker
18b19d135b
C_LAPACK: Fixes to make it compile with MSVC ( #3605 )
...
* Fix f2c-like support functions to compile with MSVC, and
re-enable C_LAPACK for MSVC in CMAKE
* Add MSVC&flang build to Azure CI in order to check C_LAPACK correctness
2022-04-17 17:49:38 +02:00
Martin Kroeker
b7873605d4
Use f2c translations of LAPACK when no Fortran compiler is available ( #3539 )
...
* Add C equivalents of the Fortran routines from Reference-LAPACK as fallbacks, and C_LAPACK variable to trigger their use
2022-04-09 22:38:58 +02:00
Rafael Cardoso Fernandes Sousa
d38110a5ce
Use CMake variables instead of as
2021-12-10 17:46:53 -06:00
Rafael Cardoso Fernandes Sousa
214fbcee15
Fix cmake for power
2021-12-09 08:28:17 -06:00
Markus Mützel
de2ed66596
cmake: Set SUFFIX64 also for NOFORTRAN
2021-11-15 08:53:52 +01:00
Wangyang Guo
3dc6052c7e
initial support for Sapphire Rapids platform
2021-10-12 01:30:40 -07:00
Martin Kroeker
e02df9fc55
Propagate BUILD_BFLOAT16 to CFLAGS
2021-09-14 16:12:27 +02:00
Wangyang Guo
76ea8db4da
Small Matrix: enable by default for x86_64 arch
...
If no customized GEMM_SMALL_M_PERMIT kernel defined, it will just by pass to normal path.
2021-08-05 02:59:36 +00:00
Wangyang Guo
fee5abd84b
Small Matrix: support cmake build
2021-08-04 08:50:15 +00:00
Martin Kroeker
30f23be0f9
Rework setting of -mfma to only apply it where necessary
2021-07-22 12:00:03 +02:00
User User-User
91e2b11d3c
add to cmake listings too
2021-06-20 15:32:42 +02:00
刘雨培
725432efaa
pass NO_AVX512 macro def
2021-04-07 00:10:41 +08:00
Martin Kroeker
33b5670122
Merge pull request #3096 from martin-frbg/fixclangcmake
...
Fix Cooperlake/DYNAMIC_ARCH builds with clang on Windows
2021-02-02 13:33:15 +01:00
Martin Kroeker
95e19e2e23
fix case in compiler name check
...
Co-authored-by: xoviat <49173759+xoviat@users.noreply.github.com>
2021-02-02 10:53:46 +01:00
Martin Kroeker
99ac042702
remove spurious lines (probably editor malfunction)
2021-02-01 21:02:53 +01:00
Martin Kroeker
774b9f8653
handle AppleClang in Cooperlake support condition
2021-02-01 20:18:53 +01:00
Martin Kroeker
eb1d2344f7
Fix compiler version check for Intel Cooperlake support (clang-cl does not accept -dumpversion)
2021-02-01 19:45:25 +01:00
xoviat
b60de4447a
add cortex-m platform
2021-01-19 08:57:44 -06:00
Martin Kroeker
438a8e5624
Fix placement of getarch call and spurious cpu property accumulation in DYNAMIC_ARCH builds
2020-11-07 20:26:12 +01:00
Martin Kroeker
0155cd53a3
Add -msse3 where needed for DYNAMIC_ARCH builds
2020-11-03 23:45:49 +01:00
Martin Kroeker
b9bc76aec4
Add files via upload
2020-11-02 22:43:50 +01:00
Martin Kroeker
f64243ff57
Add compiler options for sse/sse2/ssse3/sse4.1
2020-10-16 10:47:06 +02:00
Martin Kroeker
e3a29f6b58
Change "HALF" and "sh" to "BFLOAT16" and "sb"
2020-10-12 00:07:37 +02:00
Martin Kroeker
68e6823d36
Adapt for supporting only a subset of variable types
2020-10-11 15:01:32 +02:00
Martin Kroeker
e1b7123bbe
Merge pull request #2867 from Qiyu8/usimd-floatdot
...
Optimize the performance of dot by using universal intrinsics in X86/ARM
2020-10-10 12:10:25 +02:00
Qiyu8
f32d34a015
add sse3 compiler flag
2020-10-10 10:36:15 +08:00
Martin Kroeker
a5feea6611
make BLAS3_MEM_ALLOC_THRESHOLD configurable on non-Windows
2020-10-04 23:01:06 +02:00
Martin Kroeker
c4aeeeb9f4
Activate all BUILD_ options if none was specified
2020-09-15 23:15:34 +02:00
Martin Kroeker
26792d2096
Copy BUILD_* directives to the compiler options to allow ifdef in tests
2020-09-13 21:47:55 +02:00
Martin Kroeker
68b1713c30
Merge pull request #2811 from martin-frbg/issue2806
...
Make NO_AVX512 option override the AVX512 compile test in CMAKE builds as well
2020-09-01 17:19:14 +02:00
Martin Kroeker
bd3207b4b4
Update system.cmake
2020-08-19 22:51:10 +02:00
Martin Kroeker
b8ebfc9335
Update system.cmake
2020-08-19 22:30:19 +02:00
Martin Kroeker
71d33c952d
Typo fix
2020-08-19 17:44:23 +02:00
Martin Kroeker
6a3c074786
-march=cooperlake requires gcc10
2020-08-19 17:22:12 +02:00
Chen, Guobing
e740c4873d
Enable COOPERLAKE build target
...
Enable new build target platform -- COOPERLAKE. This target platform
supports all the SKYLAKEX supported ISAs + avx512bf16. So all the
SKYLAKEX specific kernels/drivers and related code are now extended
to be also active on COOPERLAKE. Besides, new BF16 related kernels
are active under this target.
2020-08-13 06:18:00 +08:00
Martin Kroeker
6876221cf3
Remove optimization level limit for flang again and add -fno-unroll-loops for AOCC flang 2.x instead
2020-06-14 17:40:24 +02:00
Martin Kroeker
3ce469a34f
Limit optimization level to O1 for flang and add -frecursive
2020-06-09 16:11:13 +02:00
Martin Kroeker
bb12c2c854
Limit MAX_STACK_ALLOC availability to non-Wndows
2020-06-04 19:07:27 +02:00
Martin Kroeker
6e97df7b47
Add CMAKE support for MAX_STACK_ALLOC setting
2020-06-04 14:45:31 +02:00
Rajalakshmi Srinivasaraghavan
7eb55504b1
RFC : Add half precision gemm for bfloat16 in OpenBLAS
...
This patch adds support for bfloat16 data type matrix multiplication kernel.
For architectures that don't support bfloat16, it is defined as unsigned short
(2 bytes). Default unroll sizes can be changed as per architecture as done for
SGEMM and for now 8 and 4 are used for M and N. Size of ncopy/tcopy can be
changed as per architecture requirement and for now, size 2 is used.
Added shgemm in kernel/power/KERNEL.POWER9 and tested in powerpc64le and
powerpc64. For reference, added a small test compare_sgemm_shgemm.c to compare
sgemm and shgemm output.
This patch does not cover OpenBLAS test, benchmark and lapack tests for shgemm.
Complex type implementation can be discussed and added once this is approved.
2020-04-14 14:55:08 -05:00
Martin Kroeker
7f0d523b42
Make BUFFER_SIZE configurable
2020-02-09 23:32:57 +01:00