Commit Graph

54 Commits

Author SHA1 Message Date
Rajalakshmi Srinivasaraghavan 7eb55504b1 RFC : Add half precision gemm for bfloat16 in OpenBLAS
This patch adds support for bfloat16 data type matrix multiplication kernel.
For architectures that don't support bfloat16, it is defined as unsigned short
(2 bytes).  Default unroll sizes can be changed per architecture, as is done for
SGEMM; for now, 8 and 4 are used for M and N.  The ncopy/tcopy size can also be
changed per architecture; for now, size 2 is used.

Added shgemm in kernel/power/KERNEL.POWER9 and tested in powerpc64le and
powerpc64.  For reference, added a small test compare_sgemm_shgemm.c to compare
sgemm and shgemm output.

This patch does not cover the OpenBLAS test, benchmark, and LAPACK tests for shgemm.
Complex type implementation can be discussed and added once this is approved.
2020-04-14 14:55:08 -05:00
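As a rough illustration of the commit above, the sketch below stores bfloat16 values in a 2-byte unsigned integer and multiplies two small matrices with float accumulation. It is not the code from the patch: the names bf16_t, float_to_bf16, bf16_to_float and bf16_gemm_ref are hypothetical, the round-to-nearest-even conversion and the row-major layout are assumptions made for the example, and a 32-bit IEEE 754 float is assumed.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical type name: bfloat16 kept in a 16-bit unsigned integer,
   as the commit describes for targets without native bfloat16 support. */
typedef uint16_t bf16_t;

/* float -> bfloat16: keep the upper 16 bits, rounding to nearest even.
   Assumes float is a 32-bit IEEE 754 value. */
static bf16_t float_to_bf16(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    if ((bits & 0x7FFFFFFFu) > 0x7F800000u)       /* NaN: return a quiet NaN */
        return (bf16_t)((bits >> 16) | 0x0040u);
    bits += 0x7FFFu + ((bits >> 16) & 1u);        /* round to nearest even */
    return (bf16_t)(bits >> 16);
}

/* bfloat16 -> float: the 16 stored bits become the upper half of the float. */
static float bf16_to_float(bf16_t h)
{
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Reference C = A * B with bfloat16 inputs and float accumulation.
   Row-major storage (an assumption of this sketch):
   A is m x k, B is k x n, C is m x n. */
static void bf16_gemm_ref(int m, int n, int k,
                          const bf16_t *a, const bf16_t *b, float *c)
{
    for (int i = 0; i < m; i++) {
        for (int j = 0; j < n; j++) {
            float acc = 0.0f;
            for (int p = 0; p < k; p++)
                acc += bf16_to_float(a[i * k + p]) * bf16_to_float(b[p * n + j]);
            c[i * n + j] = acc;
        }
    }
}
```

A comparison in the spirit of compare_sgemm_shgemm.c could run the same inputs through a plain float loop and check that the two results agree within the precision that bfloat16 retains.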
Martin Kroeker 7f0d523b42 Make BUFFER_SIZE configurable 2020-02-09 23:32:57 +01:00
Martin Kroeker e3d846ab57
Do not use -march=native with the PGI compiler 2019-08-16 08:58:10 +02:00
Martin Kroeker f69a0be712
Add getarch flags to disable AVX on x86
(and other small fixes to match Makefile behaviour)
2019-07-06 15:02:39 +02:00
Michael Lass 7a9a4dbc4f Fix detection of AVX512 capable compilers in getarch
21eda8b5 introduced a check in getarch.c to test if the compiler is capable of
AVX512. This check currently fails, since the used __AVX2__ macro is only
defined if getarch itself was compiled with AVX2/AVX512 support. Make sure this
is the case by building getarch with -march=native on x86_64. It is only
supposed to run on the build host anyway.
2019-06-05 17:30:56 +02:00
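The behaviour described in the commit above comes from how compiler feature macros work: __AVX2__ and __AVX512F__ are predefined by GCC and Clang only when the translation unit itself is compiled with the matching instruction set enabled (for example through -march=native on a capable host). The sketch below is a hypothetical illustration of that effect, not the actual check in getarch.c; the function names are made up for the example.

```c
/* Hypothetical helpers, not taken from getarch.c.
   Each reports whether THIS file was compiled with the given
   instruction set enabled, not what the CPU supports at run time. */
static int built_with_avx2(void)
{
#if defined(__AVX2__)
    return 1;   /* e.g. compiled with -mavx2 or -march=native on an AVX2 host */
#else
    return 0;   /* macro is absent when the flag was not passed */
#endif
}

static int built_with_avx512f(void)
{
#if defined(__AVX512F__)
    return 1;
#else
    return 0;
#endif
}
```

Compiled without any -march or -mavx2 flag, a typical x86_64 GCC or Clang build would be expected to return 0 from both functions, which is why a macro-based check can fail unless the program is built with matching flags.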
Martin Kroeker 1e52572be3
Add option USE_LOCKING for single-threaded build with locking support 2019-05-15 23:19:30 +02:00
luz.paz daf2fec12d Misc. typo fixes
Found via `codespell -q 3 -w -L ith,als,dum,nd,amin,nto,wis,ba -S ./relapack,./kernel,./lapack-netlib`
2019-04-29 17:03:56 -04:00
Martin Kroeker 5952e586ce
Support DYNAMIC_LIST option in cmake
e.g. cmake -DDYNAMIC_ARCH=1 -DDYNAMIC_LIST="NEHALEM;HASWELL;ZEN" ..
original issue was #1639
2019-02-05 23:51:40 +01:00
Martin Kroeker 58dd7e4501
Change ARMV8 target to ARMV7 for BINARY=32 2019-01-26 17:52:33 +01:00
Martin Kroeker 76b4b8980f
Use -dumpversion with gcc only 2018-12-23 19:08:19 +01:00
Martin Kroeker 49e0f485da
Add -mavx2 for TARGET=HASWELL if compiler supports and requires it 2018-12-23 17:26:09 +01:00
Martin Kroeker 081ceb3e02
Propagate version number for openblas_get_config 2018-11-29 00:12:04 +01:00
Martin Kroeker 81c9985c3a
Use KERNEL_DEFINITIONS rather than COMMON_OPTS to pass -march=skylake-avx512 2018-10-11 11:03:27 +02:00
Martin Kroeker 8a11ec19d1
Syntax fix 2018-10-10 23:47:35 +02:00
Martin Kroeker fa53b903db
Add -march=skylake-avx512 to CFLAGS when the target is Skylake
Should fix #1806 and #1801
2018-10-10 19:22:01 +02:00
Martin Kroeker 2a589c4b28
Add USE_TLS option to switch between old and new memory.c 2018-08-25 19:36:12 +02:00
Martin Kroeker 1cbd8f3ae4
Move some DYNAMIC_ARCH targets to new DYNAMIC_OLDER option 2018-06-09 16:30:46 +02:00
Arjan van de Ven 99c7bba8e4 Initial support for SkylakeX / AVX512
This patch adds the basic infrastructure for adding the SkylakeX (Intel Skylake server)
target. The SkylakeX target will use the AVX512 (AVX512VL level) instruction set,
which brings 2 basic things:
1) 512 bit wide SIMD (2x width of AVX2)
2) 32 SIMD registers (2x the number on AVX2)

This initial patch only contains a trivial transformation of the Haswell SGEMM kernel
to AVX512VL; more will follow later but this patch aims to get the infrastructure
in place for this "later".

Full performance tuning has not been done yet; with more registers and wider SIMD
it's in theory possible to retune the kernels but even without that there's an
interesting enough performance increase (30-40% range) with just this change.
2018-06-03 07:58:52 +00:00
Zhiyong Dang 1b83341d19 Fix race condition in blas_server_omp.c
Change-Id: Ic896276cd073d6b41930c7c5a29d66348cd1725d
2018-04-27 17:00:42 +08:00
Sacha f81815e48a
Fix CMake cross-compiling
Without specifying thread count, NUM_THREADS would not be defined and CMake would fail.
This is because core count cannot be determined when cross-compiling.
2018-02-28 10:25:25 +10:00
Isuru Fernando e0ddd7d124 Allow overriding NUM_THREADS 2017-12-01 01:42:45 -06:00
Martin Kroeker 962b20a9bb Optionally add ReLAPACK to LIB_COMPONENTS 2017-10-12 17:02:01 +02:00
Sacha Refshauge 47ebce4d1a Clean up, fix old typos. Simplify arch usages. Move system arch check to earlier position. 2017-08-21 00:37:29 +10:00
Sacha Refshauge 69b560751c Improvements to previous commit (cross-compile).
Fix typos and bad if statements discovered in 0.2.20.
2017-08-20 22:50:31 +10:00
Sacha Refshauge 0a7a527a92 Add support for cross compiling.
Add support for not having host compiler as CMake cannot detect such a compiler.
Add support for not using getarch.
Successfully builds Android ARMV8. Any target can be added by supplying the TARGET_CORE config in prebuild.cmake.
2017-08-20 20:08:53 +10:00
Sacha Refshauge 6aac06587d Fix typos and use CMake OpenMP support. 2017-08-17 17:27:01 +10:00
7c1acc07f0 Fix bug that required fortran. Fix bug that needed CXX var. Remove redundant set vars. Fix threading detection. Do not attempt to run code if cross compiling. 2017-08-17 03:32:04 +10:00
38d273ea03 Drop some redundant vars and improve arch detection in CMake. 2017-08-17 02:04:36 +10:00
90a4dab501 Let CMake deal with build type. 2017-08-17 00:35:54 +10:00
Isuru Fernando 4260215adf Support DYNAMIC_ARCH with cmake 2017-08-01 22:25:52 +05:30
Neil Shipp 65e56cb29d Add 64bit support for Microsoft Visual Studio 2017-06-21 13:38:22 -07:00
Denis Steckelmacher c9ff735da6 Add ZEN support (tested for auto-detected static backend) 2017-03-19 15:32:50 +01:00
John Biddiscombe 053044ae4d Replace CMAKE_SOURCE_DIR/CMAKE_BINARY_DIR with PROJECT_SOURCE_DIR/PROJECT_BINARY_DIR
If OpenBLAS is built using add_subdirectory(OpenBlas) as part of another project
then the paths set by CMAKE_XXX_DIR are relative to the parent project
and not the OpenBLAS project.
2016-05-25 09:13:28 +02:00
Zhang Xianyi 53b6023a6c Fix cmake bug on MSVC 32-bit. 2015-10-26 14:52:13 -05:00
Zhang Xianyi 309875de3c Fix cmake bug on x86 32-bit.
e.g. Build 32-bit on 64-bit Linux.
cmake -DBINARY=32
2015-10-27 02:54:53 +08:00
Zhang Xianyi f874465bb8 Use cmake to build OpenBLAS GENERIC Target on MSVC x86 64-bit.
Disable CBLAS and LAPACK.
2015-08-10 14:10:44 -05:00
Hank Anderson 1d183dcda8 Added lapacke sources. 2015-02-25 16:51:08 -06:00
Hank Anderson 4662a0b13a Changed generate functions to iterate through a list of float types.
This will generate obj files for SINGLE/DOUBLE/COMPLEX/DOUBLE COMPLEX.
2015-02-15 17:44:37 -06:00
Hank Anderson d60b49e5c5 Turned off uninitialized variable warning when compiling lapack-netlib. 2015-02-10 14:36:43 -06:00
Hank Anderson 2828f6630c Added SMP sources to COMMONOBJS. 2015-02-04 14:01:36 -06:00
Hank Anderson a0aeda6187 Added function to set defines for the object names (e.g. -DNAME=dgemm). 2015-02-04 10:37:34 -06:00
Hank Anderson 84b3d760c4 Converted rest of Makefile.system to system.cmake. 2015-02-03 16:05:01 -06:00
Hank Anderson 0beea3a5a5 Converted LAPACK flags from Makefile.system. 2015-02-03 15:33:56 -06:00
Hank Anderson 0ccfa60a53 Changed fortran compiler name to be uppercase and stripped of path/ext. 2015-02-03 15:09:37 -06:00
Hank Anderson 30be551502 Corrected fortran compiler name variables.
Fixed some typos.

Updated c_check to set ARCH and BINARY64/32.

Added version variables.
2015-02-03 14:21:22 -06:00
Hank Anderson e818ace11a Ported more of Makefile.system to CMake. 2015-02-03 13:34:41 -06:00
Hank Anderson 2d5b442f5b Ported Fortran configuration code from Makefile.system to fc.cmake. 2015-02-03 12:32:23 -06:00
Hank Anderson af11aff309 Ported C compiler settings from Makefile.system into new cmake file. 2015-02-03 12:00:49 -06:00
Hank Anderson e66aa5f3b7 Ported arch dependent settings from Makefile.system to new cmake file. 2015-02-03 11:32:20 -06:00
Hank Anderson 31cf22cb4b Ported OS settings from Makefile.system into new cmake file. 2015-02-03 11:07:58 -06:00