Commit Graph

86 Commits

Author SHA1 Message Date
Wangyang Guo 76ea8db4da Small Matrix: enable by default for x86_64 arch
If no customized GEMM_SMALL_M_PERMIT kernel defined, it will just by pass to normal path.
2021-08-05 02:59:36 +00:00
Wangyang Guo fee5abd84b Small Matrix: support cmake build 2021-08-04 08:50:15 +00:00
Martin Kroeker 30f23be0f9
Rework setting of -mfma to only apply it where necessary 2021-07-22 12:00:03 +02:00
User User-User 91e2b11d3c add to cmake listings too 2021-06-20 15:32:42 +02:00
刘雨培 725432efaa pass NO_AVX512 macro def 2021-04-07 00:10:41 +08:00
Martin Kroeker 33b5670122
Merge pull request #3096 from martin-frbg/fixclangcmake
Fix Cooperlake/DYNAMIC_ARCH builds with clang on Windows
2021-02-02 13:33:15 +01:00
Martin Kroeker 95e19e2e23
fix case in compiler name check
Co-authored-by: xoviat <49173759+xoviat@users.noreply.github.com>
2021-02-02 10:53:46 +01:00
Martin Kroeker 99ac042702
remove spurious lines (probably editor malfunction) 2021-02-01 21:02:53 +01:00
Martin Kroeker 774b9f8653
handle AppleClang in Cooperlake support condition 2021-02-01 20:18:53 +01:00
Martin Kroeker eb1d2344f7
Fix compiler version check for Intel Cooperlake support (clang-cl does not accept -dumpversion) 2021-02-01 19:45:25 +01:00
xoviat b60de4447a add cortex-m platform 2021-01-19 08:57:44 -06:00
Martin Kroeker 438a8e5624
Fix placement of getarch call and spurious cpu property accumulation in DYNAMIC_ARCH builds 2020-11-07 20:26:12 +01:00
Martin Kroeker 0155cd53a3
Add -msse3 where needed for DYNAMIC_ARCH builds 2020-11-03 23:45:49 +01:00
Martin Kroeker b9bc76aec4
Add files via upload 2020-11-02 22:43:50 +01:00
Martin Kroeker f64243ff57
Add compiler options for sse/sse2/ssse3/sse4.1 2020-10-16 10:47:06 +02:00
Martin Kroeker e3a29f6b58
Change "HALF" and "sh" to "BFLOAT16" and "sb" 2020-10-12 00:07:37 +02:00
Martin Kroeker 68e6823d36
Adapt for supporting only a subset of variable types 2020-10-11 15:01:32 +02:00
Martin Kroeker e1b7123bbe
Merge pull request #2867 from Qiyu8/usimd-floatdot
Optimize the performance of dot by using universal intrinsics in X86/ARM
2020-10-10 12:10:25 +02:00
Qiyu8 f32d34a015 add sse3 compiler flag 2020-10-10 10:36:15 +08:00
Martin Kroeker a5feea6611
make BLAS3_MEM_ALLOC_THRESHOLD configurable on non-Windows 2020-10-04 23:01:06 +02:00
Martin Kroeker c4aeeeb9f4
Activate all BUILD_ options if none was specified 2020-09-15 23:15:34 +02:00
Martin Kroeker 26792d2096
Copy BUILD_* directives to the compiler options to allow ifdef in tests 2020-09-13 21:47:55 +02:00
Martin Kroeker 68b1713c30
Merge pull request #2811 from martin-frbg/issue2806
Make NO_AVX512 option override the AVX512 compile test in CMAKE builds as well
2020-09-01 17:19:14 +02:00
Martin Kroeker bd3207b4b4
Update system.cmake 2020-08-19 22:51:10 +02:00
Martin Kroeker b8ebfc9335
Update system.cmake 2020-08-19 22:30:19 +02:00
Martin Kroeker 71d33c952d
Typo fix 2020-08-19 17:44:23 +02:00
Martin Kroeker 6a3c074786
-march=cooperlake requires gcc10 2020-08-19 17:22:12 +02:00
Chen, Guobing e740c4873d Enable COOPERLAKE build target
Enable new build target platform -- COOPERLAKE. This target platform
supports all the SKYLAKEX supported ISAs + avx512bf16. So all the
SKYLAKEX specific kernels/drivers and related code are now extended
to be also active on COOPERLAKE. Besides, new BF16 related kernels
are active under this target.
2020-08-13 06:18:00 +08:00
Martin Kroeker 6876221cf3
Remove optimization level limit for flang again and add -fno-unroll-loops for AOCC flang 2.x instead 2020-06-14 17:40:24 +02:00
Martin Kroeker 3ce469a34f
Limit optimization level to O1 for flang and add -frecursive 2020-06-09 16:11:13 +02:00
Martin Kroeker bb12c2c854
Limit MAX_STACK_ALLOC availability to non-Wndows 2020-06-04 19:07:27 +02:00
Martin Kroeker 6e97df7b47
Add CMAKE support for MAX_STACK_ALLOC setting 2020-06-04 14:45:31 +02:00
Rajalakshmi Srinivasaraghavan 7eb55504b1 RFC : Add half precision gemm for bfloat16 in OpenBLAS
This patch adds support for bfloat16 data type matrix multiplication kernel.
For architectures that don't support bfloat16, it is defined as unsigned short
(2 bytes).  Default unroll sizes can be changed as per architecture as done for
SGEMM and for now 8 and 4 are used for M and N.  Size of ncopy/tcopy can be
changed as per architecture requirement and for now, size 2 is used.

Added shgemm in kernel/power/KERNEL.POWER9 and tested in powerpc64le and
powerpc64.  For reference, added a small test compare_sgemm_shgemm.c to compare
sgemm and shgemm output.

This patch does not cover OpenBLAS test, benchmark and lapack tests for shgemm.
Complex type implementation can be discussed and added once this is approved.
2020-04-14 14:55:08 -05:00
Martin Kroeker 7f0d523b42 Make BUFFER_SIZE configurable 2020-02-09 23:32:57 +01:00
Martin Kroeker e3d846ab57
Do not use -march=native with the PGI compiler 2019-08-16 08:58:10 +02:00
Martin Kroeker f69a0be712
Add getarch flags to disable AVX on x86
(and other small fixes to match Makefile behaviour)
2019-07-06 15:02:39 +02:00
Michael Lass 7a9a4dbc4f Fix detection of AVX512 capable compilers in getarch
21eda8b5 introduced a check in getarch.c to test if the compiler is capable of
AVX512. This check currently fails, since the used __AVX2__ macro is only
defined if getarch itself was compiled with AVX2/AVX512 support. Make sure this
is the case by building getarch with -march=native on x86_64. It is only
supposed to run on the build host anyway.
2019-06-05 17:30:56 +02:00
Martin Kroeker 1e52572be3
Add option USE_LOCKING for single-threaded build with locking support 2019-05-15 23:19:30 +02:00
luz.paz daf2fec12d Misc. typo fixes
Found via `codespell -q 3 -w -L ith,als,dum,nd,amin,nto,wis,ba -S ./relapack,./kernel,./lapack-netlib`
2019-04-29 17:03:56 -04:00
Martin Kroeker 5952e586ce
Support DYNAMIC_LIST option in cmake
e.g. cmake -DDYNAMIC_ARCH=1 -DDYNAMIC_LIST="NEHALEM;HASWELL;ZEN" ..
original issue was #1639
2019-02-05 23:51:40 +01:00
Martin Kroeker 58dd7e4501
Change ARMV8 target to ARMV7 for BINARY=32 2019-01-26 17:52:33 +01:00
Martin Kroeker 76b4b8980f
Use -dumpversion with gcc only 2018-12-23 19:08:19 +01:00
Martin Kroeker 49e0f485da
Add -mavx2 for TARGET=HASWELL if compiler supports and requires it 2018-12-23 17:26:09 +01:00
Martin Kroeker 081ceb3e02
Propagate version number for openblas_get_config 2018-11-29 00:12:04 +01:00
Martin Kroeker 81c9985c3a
Use KERNEL_DEFINITIONS rather than COMMON_OPTS to pass -march=skylake-avx512 2018-10-11 11:03:27 +02:00
Martin Kroeker 8a11ec19d1
Syntax fix 2018-10-10 23:47:35 +02:00
Martin Kroeker fa53b903db
Add -march=skylake-avx512 to CFLAGS when the target is Skylake
Should fix 1806 and #1801
2018-10-10 19:22:01 +02:00
Martin Kroeker 2a589c4b28
Add USE_TLS option to switch between old and new memory.c 2018-08-25 19:36:12 +02:00
Martin Kroeker 1cbd8f3ae4
Move some DYNAMIC_ARCH targets to new DYNAMIC_OLDER option 2018-06-09 16:30:46 +02:00
Arjan van de Ven 99c7bba8e4 Initial support for SkylakeX / AVX512
This patch adds the basic infrastructure for adding the SkylakeX (Intel Skylake server)
target. The SkylakeX target will use the AVX512 (AVX512VL level) instruction set,
which brings 2 basic things:
1) 512 bit wide SIMD (2x width of AVX2)
2) 32 SIMD registers (2x the number on AVX2)

This initial patch only contains a trivial transofrmation of the Haswell SGEMM kernel
to AVX512VL; more will follow later but this patch aims to get the infrastructure
in place for this "later".

Full performance tuning has not been done yet; with more registers and wider SIMD
it's in theory possible to retune the kernels but even without that there's an
interesting enough performance increase (30-40% range) with just this change.
2018-06-03 07:58:52 +00:00