diff --git a/docs/user_manual.md b/docs/user_manual.md
index 4d5fa9eaa..c403ab2b2 100644
--- a/docs/user_manual.md
+++ b/docs/user_manual.md
@@ -1,70 +1,174 @@
-## Compile the library
+
+This user manual covers compiling OpenBLAS itself, linking your code to OpenBLAS,
+example code to use the C (CBLAS) and Fortran (BLAS) APIs, and some troubleshooting
+tips. Compiling OpenBLAS is optional, since you may be able to install with a
+package manager.
+
+!!! Note BLAS API reference documentation
+
+    The OpenBLAS documentation does not contain API reference documentation for
+    BLAS or LAPACK, since these are standardized APIs, the documentation for
+    which can be found in other places. If you want to understand every BLAS
+    function and definition, we recommend reading the
+    [Intel MKL reference manual](https://software.intel.com/en-us/intel-mkl/documentation)
+    or the [Netlib BLAS documentation](http://netlib.org/blas/).
+
+    OpenBLAS does contain a limited number of functions that are non-standard;
+    these are documented at [OpenBLAS extension functions](extensions.md).
+
+
+## Compiling OpenBLAS
+
 ### Normal compile
- * type `make` to detect the CPU automatically.
- or
- * type `make TARGET=xxx` to set target CPU, e.g. `make TARGET=NEHALEM`. The full target list is in file TargetList.txt.
+
+The default way to build and install OpenBLAS from source is with Make:
+```
+make               # add `-j4` to compile in parallel with 4 processes
+make install
+```
+
+By default, the CPU architecture is detected automatically when invoking
+`make`, and the build is optimized for the detected CPU. To override the
+autodetection, use the `TARGET` flag:
+
+```
+# `make TARGET=xxx` sets target CPU: e.g. for an Intel Nehalem CPU:
+make TARGET=NEHALEM
+```
+The full list of known target CPU architectures can be found in
+`TargetList.txt` in the root of the repository.
 ### Cross compile
-Please set `CC` and `FC` with the cross toolchains. Then, set `HOSTCC` with your host C compiler. At last, set `TARGET` explicitly.
-Examples:
+For a basic cross-compilation with Make, three steps need to be taken:
 
-* On x86 box, compile the library for ARM Cortex-A9 linux.
+- Set the `CC` and `FC` environment variables to select the cross toolchains
+  for C and Fortran.
+- Set the `HOSTCC` environment variable to select the host C compiler (i.e. the
+  regular C compiler for the machine on which you are invoking the build).
+- Set `TARGET` explicitly to the CPU architecture on which the produced
+  OpenBLAS binaries will be used.
 
-Install only gnueabihf versions. Please check https://github.com/xianyi/OpenBLAS/issues/936#issuecomment-237596847
+#### Cross-compilation examples
 
-    make CC=arm-linux-gnueabihf-gcc FC=arm-linux-gnueabihf-gfortran HOSTCC=gcc TARGET=CORTEXA9
-
-* On X86 box, compile this library for loongson3a CPU.
+Compile the library for ARM Cortex-A9 Linux on an x86-64 machine
+_(note: install only `gnueabihf` versions of the cross toolchain - see
+[this issue comment](https://github.com/OpenMathLib/OpenBLAS/issues/936#issuecomment-237596847)
+for why)_:
+```
+make CC=arm-linux-gnueabihf-gcc FC=arm-linux-gnueabihf-gfortran HOSTCC=gcc TARGET=CORTEXA9
+```
+Compile OpenBLAS for a loongson3a CPU on an x86-64 machine:
 ```
 make BINARY=64 CC=mips64el-unknown-linux-gnu-gcc FC=mips64el-unknown-linux-gnu-gfortran HOSTCC=gcc TARGET=LOONGSON3A
 ```
-* On X86 box, compile this library for loongson3a CPU with loongcc (based on Open64) compiler.
-
+Compile OpenBLAS for loongson3a CPU with the `loongcc` (based on Open64) compiler on an x86-64 machine:
 ```
 make CC=loongcc FC=loongf95 HOSTCC=gcc TARGET=LOONGSON3A CROSS=1 CROSS_SUFFIX=mips64el-st-linux-gnu- NO_LAPACKE=1 NO_SHARED=1 BINARY=32
 ```
-### Debug version
+### Building a debug version
 
-    make DEBUG=1
+Add `DEBUG=1` to your build command, e.g.:
+```
+make DEBUG=1
+```
 
-### Install to the directory (optional)
+### Install to a specific directory
 
-Example:
+!!! note
 
-    make install PREFIX=your_installation_directory
+    Installing to a directory is optional; it is also possible to use the shared or static
+    libraries directly from the build directory.
 
-The default directory is /opt/OpenBLAS. Note that any flags passed to `make` during build should also be passed to `make install` to circumvent any install errors, i.e. some headers not being copied over correctly.
+Use `make install` with the `PREFIX` flag to install to a specific directory:
 
-For more information, please read [Installation Guide](install.md).
+```
+make install PREFIX=/path/to/installation/directory
+```
 
-## Link the library
+The default directory is `/opt/OpenBLAS`.
 
-* Link shared library
+!!! important
+    Note that any flags passed to `make` during build should also be passed to
+    `make install` to circumvent any install errors, e.g. some headers not
+    being copied over correctly.
+
+For more detailed information on building/installing from source, please read
+the [Installation Guide](install.md).
+
+
+## Linking to OpenBLAS
+
+OpenBLAS can be used as a shared or a static library.
+
+### Link a shared library
+
+The shared library is normally called `libopenblas.so`, but note that the name
+may be different as a result of build flags used or naming choices by a distro
+packager (see [distributing.md] for details). To link a shared library named
+`libopenblas.so`, the flag `-lopenblas` is needed. To find the OpenBLAS headers,
+a `-I/path/to/includedir` is needed. And unless the library is installed in a
+directory that the linker searches by default, also `-L` and `-Wl,-rpath` flags
+are needed. For a source file `test.c` (e.g., the example code under _Call
+CBLAS interface_ further down), the shared library can then be linked with:
 ```
 gcc -o test test.c -I/your_path/OpenBLAS/include/ -L/your_path/OpenBLAS/lib -Wl,-rpath,/your_path/OpenBLAS/lib -lopenblas
 ```
-The `-Wl,-rpath,/your_path/OpenBLAS/lib` option to linker can be omitted if you ran `ldconfig` to update linker cache, put `/your_path/OpenBLAS/lib` in `/etc/ld.so.conf` or a file in `/etc/ld.so.conf.d`, or installed OpenBLAS in a location that is part of the `ld.so` default search path (usually /lib,/usr/lib and /usr/local/lib). Alternatively, you can set the environment variable LD_LIBRARY_PATH to point to the folder that contains libopenblas.so. Otherwise, linking at runtime will fail with a message like `cannot open shared object file: no such file or directory`
+The `-Wl,-rpath,/your_path/OpenBLAS/lib` linker flag can be omitted if you
+ran `ldconfig` to update the linker cache, put `/your_path/OpenBLAS/lib` in
+`/etc/ld.so.conf` or a file in `/etc/ld.so.conf.d`, or installed OpenBLAS in a
+location that is part of the `ld.so` default search path (usually `/lib`,
+`/usr/lib` and `/usr/local/lib`). Alternatively, you can set the environment
+variable `LD_LIBRARY_PATH` to point to the folder that contains `libopenblas.so`.
+Otherwise, the build may succeed but at runtime loading the library will fail
+with a message like:
+```
+cannot open shared object file: no such file or directory
+```
 
-If the library is multithreaded, please add `-lpthread`. If the library contains LAPACK functions, please add `-lgfortran` or other Fortran libs, although if you only make calls to LAPACKE routines, i.e. your code has `#include "lapacke.h"` and makes calls to methods like `LAPACKE_dgeqrf`, `-lgfortran` is not needed.
+More flags may be needed, depending on how OpenBLAS was built:
 
-* Link static library
+- If `libopenblas` is multi-threaded, please add `-lpthread`.
+- If the library contains LAPACK functions (usually also true), please add
+  `-lgfortran` (other Fortran libraries may also be needed, e.g. `-lquadmath`).
+  Note that if you only make calls to LAPACKE routines, i.e. your code has
+  `#include "lapacke.h"` and makes calls to methods like `LAPACKE_dgeqrf`,
+  then `-lgfortran` is not needed.
+!!! tip Use pkg-config
+
+    Usually a pkg-config file (e.g., `openblas.pc`) is installed together
+    with a `libopenblas` shared library. pkg-config is a tool that will
+    tell you the exact flags needed for linking. For example:
+
+    ```
+    $ pkg-config --cflags openblas
+    -I/usr/local/include
+    $ pkg-config --libs openblas
+    -L/usr/local/lib -lopenblas
+    ```
+
+### Link a static library
+
+Linking a static library is simpler - add the path to the static OpenBLAS
+library to the compile command:
 ```
 gcc -o test test.c /your/path/libopenblas.a
 ```
-You can download `test.c` from https://gist.github.com/xianyi/5780018
 ## Code examples
 ### Call CBLAS interface
-This example shows calling cblas_dgemm in C. https://gist.github.com/xianyi/6930656
+
+This example shows calling `cblas_dgemm` in C:
+
+
 ```c
 #include
 #include
@@ -83,14 +187,17 @@ void main()
 }
 ```
+To compile this file, save it as `test_cblas_dgemm.c` and then run:
 ```
-gcc -o test_cblas_open test_cblas_dgemm.c -I /your_path/OpenBLAS/include/ -L/your_path/OpenBLAS/lib -lopenblas -lpthread -lgfortran
+gcc -o test_cblas_open test_cblas_dgemm.c -I/your_path/OpenBLAS/include/ -L/your_path/OpenBLAS/lib -lopenblas -lpthread -lgfortran
 ```
+This will result in a `test_cblas_open` executable.
 
 ### Call BLAS Fortran interface
-This example shows calling dgemm Fortran interface in C. https://gist.github.com/xianyi/5780018
+This example shows calling the `dgemm` Fortran interface in C:
+
 ```c
 #include "stdio.h"
 #include "stdlib.h"
@@ -158,22 +265,41 @@ int main(int argc, char* argv[])
 }
 ```
+To compile this file, save it as `time_dgemm.c` and then run:
 ```
 gcc -o time_dgemm time_dgemm.c /your/path/libopenblas.a -lpthread
-./time_dgemm
 ```
+You can then run it as `./time_dgemm`, passing `m`, `n`, and `k` as input
+parameters to the `time_dgemm` executable.
+
+!!! note
+
+    When calling the Fortran interface from C, you have to deal with symbol name
+    differences caused by compiler conventions. That is why the `dgemm_` function
+    call in the example above has a trailing underscore. This is what it looks like
+    when using `gcc`/`gfortran`; however, such details may change for different
+    compilers. Hence it requires extra support code. The CBLAS interface may be
+    more portable when writing C code.
+
+    When writing code that needs to be portable and work across different
+    platforms and compilers, the above code example is not recommended for
+    usage. Instead, we advise looking at how OpenBLAS (or BLAS in general, since
+    this problem isn't specific to OpenBLAS) functions are called in widely
+    used projects like Julia, SciPy, or R.
+
 ## Troubleshooting
-* Please read [Faq](faq.md) at first.
-* Please use gcc version 4.6 and above to compile Sandy Bridge AVX kernels on Linux/MingW/BSD.
-* Please use Clang version 3.1 and above to compile the library on Sandy Bridge microarchitecture. The Clang 3.0 will generate the wrong AVX binary code.
-* The number of CPUs/Cores should less than or equal to 256. On Linux x86_64(amd64), there is experimental support for up to 1024 CPUs/Cores and 128 numa nodes if you build the library with BIGNUMA=1.
-* OpenBLAS does not set processor affinity by default. On Linux, you can enable processor affinity by commenting the line NO_AFFINITY=1 in Makefile.rule. But this may cause [the conflict with R parallel](https://stat.ethz.ch/pipermail/r-sig-hpc/2012-April/001348.html).
-* On Loongson 3A. make test would be failed because of pthread_create error. The error code is EAGAIN. However, it will be OK when you run the same testcase on shell.
-
-## BLAS reference manual
-
-If you want to understand every BLAS function and definition, please read [Intel MKL reference manual](https://software.intel.com/en-us/intel-mkl/documentation) or [netlib.org](http://netlib.org/blas/)
-
-Here are [OpenBLAS extension functions](extensions.md)
+* Please read the [FAQ](faq.md) first; your problem may be described there.
+* Please ensure you are using a recent enough compiler that supports the
+  features your CPU provides (example: GCC versions before 4.6 were known to
+  not support AVX kernels, and before 6.1 AVX512CD kernels).
+* The number of CPU cores supported by default is <=256. On Linux x86-64, there
+  is experimental support for up to 1024 cores and 128 NUMA nodes if you build
+  the library with `BIGNUMA=1`.
+* OpenBLAS does not set processor affinity by default. On Linux, you can enable
+  processor affinity by commenting out the line `NO_AFFINITY=1` in
+  `Makefile.rule`.
+* On Loongson 3A, `make test` is known to fail with a `pthread_create` error
+  and an `EAGAIN` error code. However, it will be OK when you run the same
+  testcase in a shell.