docs: improvements to the User Manual
parent
237c2c4130
commit
c1b9bb8519
|
@ -1,70 +1,174 @@
|
||||||
## Compile the library
|
|
||||||
|
This user manual covers compiling OpenBLAS itself, linking your code to OpenBLAS,
|
||||||
|
example code to use the C (CBLAS) and Fortran (BLAS) APIs, and some troubleshooting
|
||||||
|
tips. Compiling OpenBLAS is optional, since you may be able to install with a
|
||||||
|
package manager.
|
||||||
|
|
||||||
|
!!! Note "BLAS API reference documentation"
|
||||||
|
|
||||||
|
The OpenBLAS documentation does not contain API reference documentation for
|
||||||
|
BLAS or LAPACK, since these are standardized APIs, the documentation for
|
||||||
|
which can be found in other places. If you want to understand every BLAS
|
||||||
|
function and definition, we recommend reading the
|
||||||
|
[Intel MKL reference manual](https://software.intel.com/en-us/intel-mkl/documentation)
|
||||||
|
or the [Netlib BLAS documentation](http://netlib.org/blas/).
|
||||||
|
|
||||||
|
OpenBLAS does contain a limited number of functions that are non-standard;
|
||||||
|
these are documented at [OpenBLAS extension functions](extensions.md).
|
||||||
|
|
||||||
|
|
||||||
|
## Compiling OpenBLAS
|
||||||
|
|
||||||
### Normal compile
|
### Normal compile
|
||||||
* type `make` to detect the CPU automatically.
|
|
||||||
or
|
The default way to build and install OpenBLAS from source is with Make:
|
||||||
* type `make TARGET=xxx` to set target CPU, e.g. `make TARGET=NEHALEM`. The full target list is in file TargetList.txt.
|
```
|
||||||
|
make # add `-j4` to compile in parallel with 4 processes
|
||||||
|
make install
|
||||||
|
```
|
||||||
|
|
||||||
|
By default, the CPU architecture is detected automatically when invoking
|
||||||
|
`make`, and the build is optimized for the detected CPU. To override the
|
||||||
|
autodetection, use the `TARGET` flag:
|
||||||
|
|
||||||
|
```
|
||||||
|
# `make TARGET=xxx` sets the target CPU, e.g. for an Intel Nehalem CPU:
|
||||||
|
make TARGET=NEHALEM
|
||||||
|
```
|
||||||
|
The full list of known target CPU architectures can be found in
|
||||||
|
`TargetList.txt` in the root of the repository.
|
||||||
|
|
||||||
### Cross compile
|
### Cross compile
|
||||||
Please set `CC` and `FC` with the cross toolchains. Then, set `HOSTCC` with your host C compiler. At last, set `TARGET` explicitly.
|
|
||||||
|
|
||||||
Examples:
|
For a basic cross-compilation with Make, three steps need to be taken:
|
||||||
|
|
||||||
* On x86 box, compile the library for ARM Cortex-A9 linux.
|
- Set the `CC` and `FC` environment variables to select the cross toolchains
|
||||||
|
for C and Fortran.
|
||||||
|
- Set the `HOSTCC` environment variable to select the host C compiler (i.e. the
|
||||||
|
regular C compiler for the machine on which you are invoking the build).
|
||||||
|
- Set `TARGET` explicitly to the CPU architecture on which the produced
|
||||||
|
OpenBLAS binaries will be used.
|
||||||
|
|
||||||
Install only gnueabihf versions. Please check https://github.com/xianyi/OpenBLAS/issues/936#issuecomment-237596847
|
#### Cross-compilation examples
|
||||||
|
|
||||||
make CC=arm-linux-gnueabihf-gcc FC=arm-linux-gnueabihf-gfortran HOSTCC=gcc TARGET=CORTEXA9
|
Compile the library for ARM Cortex-A9 Linux on an x86-64 machine
|
||||||
|
_(note: install only `gnueabihf` versions of the cross toolchain - see
|
||||||
* On X86 box, compile this library for loongson3a CPU.
|
[this issue comment](https://github.com/OpenMathLib/OpenBLAS/issues/936#issuecomment-237596847)
|
||||||
|
for why_):
|
||||||
|
```
|
||||||
|
make CC=arm-linux-gnueabihf-gcc FC=arm-linux-gnueabihf-gfortran HOSTCC=gcc TARGET=CORTEXA9
|
||||||
|
```
|
||||||
|
|
||||||
|
Compile OpenBLAS for a loongson3a CPU on an x86-64 machine:
|
||||||
```
|
```
|
||||||
make BINARY=64 CC=mips64el-unknown-linux-gnu-gcc FC=mips64el-unknown-linux-gnu-gfortran HOSTCC=gcc TARGET=LOONGSON3A
|
make BINARY=64 CC=mips64el-unknown-linux-gnu-gcc FC=mips64el-unknown-linux-gnu-gfortran HOSTCC=gcc TARGET=LOONGSON3A
|
||||||
```
|
```
|
||||||
|
|
||||||
* On X86 box, compile this library for loongson3a CPU with loongcc (based on Open64) compiler.
|
Compile OpenBLAS for a loongson3a CPU with the `loongcc` (based on Open64) compiler on an x86-64 machine:
|
||||||
|
|
||||||
```
|
```
|
||||||
make CC=loongcc FC=loongf95 HOSTCC=gcc TARGET=LOONGSON3A CROSS=1 CROSS_SUFFIX=mips64el-st-linux-gnu- NO_LAPACKE=1 NO_SHARED=1 BINARY=32
|
make CC=loongcc FC=loongf95 HOSTCC=gcc TARGET=LOONGSON3A CROSS=1 CROSS_SUFFIX=mips64el-st-linux-gnu- NO_LAPACKE=1 NO_SHARED=1 BINARY=32
|
||||||
```
|
```
|
||||||
|
|
||||||
### Debug version
|
### Building a debug version
|
||||||
|
|
||||||
make DEBUG=1
|
Add `DEBUG=1` to your build command, e.g.:
|
||||||
|
```
|
||||||
|
make DEBUG=1
|
||||||
|
```
|
||||||
|
|
||||||
### Install to the directory (optional)
|
### Install to a specific directory
|
||||||
|
|
||||||
Example:
|
!!! note
|
||||||
|
|
||||||
make install PREFIX=your_installation_directory
|
Installing to a directory is optional; it is also possible to use the shared or static
|
||||||
|
libraries directly from the build directory.
|
||||||
|
|
||||||
The default directory is /opt/OpenBLAS. Note that any flags passed to `make` during build should also be passed to `make install` to circumvent any install errors, i.e. some headers not being copied over correctly.
|
Use `make install` with the `PREFIX` flag to install to a specific directory:
|
||||||
|
|
||||||
For more information, please read [Installation Guide](install.md).
|
```
|
||||||
|
make install PREFIX=/path/to/installation/directory
|
||||||
|
```
|
||||||
|
|
||||||
## Link the library
|
The default directory is `/opt/OpenBLAS`.
|
||||||
|
|
||||||
* Link shared library
|
!!! important
|
||||||
|
|
||||||
|
Note that any flags passed to `make` during build should also be passed to
|
||||||
|
`make install` to avoid install errors, e.g. some headers not
|
||||||
|
being copied over correctly.
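One way to keep the build and install flags in sync is to store them in a single shell variable and reuse it. The following is a minimal sketch, not a prescribed workflow: the variable names `OPENBLAS_MAKE_FLAGS` and `PREFIX_DIR` and the chosen flag values are illustrative assumptions, and the script only invokes `make` when it finds `Makefile.rule` in the current directory (i.e. when it is run from the OpenBLAS source tree); otherwise it just prints the commands.

```shell
# Hypothetical flag set; replace with the flags you actually build with.
OPENBLAS_MAKE_FLAGS="TARGET=NEHALEM DEBUG=1"
# Placeholder install location, as in the example above.
PREFIX_DIR="/path/to/installation/directory"

if [ -f Makefile.rule ]; then
    # Run from an OpenBLAS source tree: build, then install with the same flags.
    # $OPENBLAS_MAKE_FLAGS is left unquoted on purpose so it splits into words.
    make $OPENBLAS_MAKE_FLAGS
    make $OPENBLAS_MAKE_FLAGS install PREFIX="$PREFIX_DIR"
else
    # Not in a source tree: only show what would be executed.
    echo "make $OPENBLAS_MAKE_FLAGS"
    echo "make $OPENBLAS_MAKE_FLAGS install PREFIX=$PREFIX_DIR"
fi
```

The unquoted expansion relies on POSIX `sh`/`bash` word splitting; other shells (e.g. `zsh`) may need an array instead.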
|
||||||
|
|
||||||
|
For more detailed information on building/installing from source, please read
|
||||||
|
the [Installation Guide](install.md).
|
||||||
|
|
||||||
|
|
||||||
|
## Linking to OpenBLAS
|
||||||
|
|
||||||
|
OpenBLAS can be used as a shared or a static library.
|
||||||
|
|
||||||
|
### Link a shared library
|
||||||
|
|
||||||
|
The shared library is normally called `libopenblas.so`, but note that the name
|
||||||
|
may be different as a result of build flags used or naming choices by a distro
|
||||||
|
packager (see [distributing.md] for details). To link a shared library named
|
||||||
|
`libopenblas.so`, the flag `-lopenblas` is needed. To find the OpenBLAS headers,
|
||||||
|
a `-I/path/to/includedir` is needed. And unless the library is installed in a
|
||||||
|
directory that the linker searches by default, also `-L` and `-Wl,-rpath` flags
|
||||||
|
are needed. For a source file `test.c` (e.g., the example code under _Call
|
||||||
|
CBLAS interface_ further down), the shared library can then be linked with:
|
||||||
```
|
```
|
||||||
gcc -o test test.c -I/your_path/OpenBLAS/include/ -L/your_path/OpenBLAS/lib -Wl,-rpath,/your_path/OpenBLAS/lib -lopenblas
|
gcc -o test test.c -I/your_path/OpenBLAS/include/ -L/your_path/OpenBLAS/lib -Wl,-rpath,/your_path/OpenBLAS/lib -lopenblas
|
||||||
```
|
```
|
||||||
|
|
||||||
The `-Wl,-rpath,/your_path/OpenBLAS/lib` option to linker can be omitted if you ran `ldconfig` to update linker cache, put `/your_path/OpenBLAS/lib` in `/etc/ld.so.conf` or a file in `/etc/ld.so.conf.d`, or installed OpenBLAS in a location that is part of the `ld.so` default search path (usually /lib,/usr/lib and /usr/local/lib). Alternatively, you can set the environment variable LD_LIBRARY_PATH to point to the folder that contains libopenblas.so. Otherwise, linking at runtime will fail with a message like `cannot open shared object file: no such file or directory`
|
The `-Wl,-rpath,/your_path/OpenBLAS/lib` linker flag can be omitted if you
|
||||||
|
ran `ldconfig` to update the linker cache, put `/your_path/OpenBLAS/lib` in
|
||||||
|
`/etc/ld.so.conf` or a file in `/etc/ld.so.conf.d`, or installed OpenBLAS in a
|
||||||
|
location that is part of the `ld.so` default search path (usually `/lib`,
|
||||||
|
`/usr/lib` and `/usr/local/lib`). Alternatively, you can set the environment
|
||||||
|
variable `LD_LIBRARY_PATH` to point to the folder that contains `libopenblas.so`.
|
||||||
|
Otherwise, the build may succeed but at runtime loading the library will fail
|
||||||
|
with a message like:
|
||||||
|
```
|
||||||
|
cannot open shared object file: No such file or directory
|
||||||
|
```
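If you hit this error, it can help to check which libraries the dynamic loader resolves for your executable. The sketch below assumes a Linux system with `ldd` available; `EXE` and `OPENBLAS_LIBDIR` are placeholder names and paths that you need to adapt.

```shell
# Hypothetical paths: adjust to your executable and OpenBLAS library directory.
EXE=./test
OPENBLAS_LIBDIR=/your_path/OpenBLAS/lib

if [ -x "$EXE" ]; then
    # Show how the loader resolves libopenblas; "not found" marks the problem.
    ldd "$EXE" | grep -i openblas || echo "no OpenBLAS dependency listed"
    # Retry with the library directory added for this one invocation only.
    LD_LIBRARY_PATH="$OPENBLAS_LIBDIR${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" "$EXE"
else
    echo "build $EXE first, e.g. with the gcc command shown earlier"
fi
```

Setting `LD_LIBRARY_PATH` this way affects a single run; for a permanent fix, prefer the `-Wl,-rpath` flag or the `ldconfig` configuration described above.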
|
||||||
|
|
||||||
If the library is multithreaded, please add `-lpthread`. If the library contains LAPACK functions, please add `-lgfortran` or other Fortran libs, although if you only make calls to LAPACKE routines, i.e. your code has `#include "lapacke.h"` and makes calls to methods like `LAPACKE_dgeqrf`, `-lgfortran` is not needed.
|
More flags may be needed, depending on how OpenBLAS was built:
|
||||||
|
|
||||||
* Link static library
|
- If `libopenblas` is multi-threaded, please add `-lpthread`.
|
||||||
|
- If the library contains LAPACK functions (usually also true), please add
|
||||||
|
`-lgfortran` (other Fortran libraries may also be needed, e.g. `-lquadmath`).
|
||||||
|
Note that if you only make calls to LAPACKE routines, i.e. your code has
|
||||||
|
`#include "lapacke.h"` and makes calls to methods like `LAPACKE_dgeqrf`,
|
||||||
|
then `-lgfortran` is not needed.
|
||||||
|
|
||||||
|
!!! tip "Use pkg-config"
|
||||||
|
|
||||||
|
Usually a pkg-config file (e.g., `openblas.pc`) is installed together
|
||||||
|
with a `libopenblas` shared library. pkg-config is a tool that will
|
||||||
|
tell you the exact flags needed for linking. For example:
|
||||||
|
|
||||||
|
```
|
||||||
|
$ pkg-config --cflags openblas
|
||||||
|
-I/usr/local/include
|
||||||
|
$ pkg-config --libs openblas
|
||||||
|
-L/usr/local/lib -lopenblas
|
||||||
|
```
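The pkg-config output can be captured in shell variables and passed straight to the compiler. This is a minimal sketch under a few assumptions: the variable names `OPENBLAS_CFLAGS` and `OPENBLAS_LIBS` are made up for illustration, and the fallback paths are the same placeholders used in the `gcc` example above. It prints the resulting compile command rather than running it.

```shell
# Use pkg-config when both the tool and an openblas.pc file are available.
if command -v pkg-config >/dev/null 2>&1 && pkg-config --exists openblas; then
    OPENBLAS_CFLAGS=$(pkg-config --cflags openblas)
    OPENBLAS_LIBS=$(pkg-config --libs openblas)
else
    # Fallback: placeholder paths that must be edited by hand.
    OPENBLAS_CFLAGS="-I/your_path/OpenBLAS/include"
    OPENBLAS_LIBS="-L/your_path/OpenBLAS/lib -lopenblas"
fi

# Print the resulting compile command instead of running it.
echo "gcc -o test test.c $OPENBLAS_CFLAGS $OPENBLAS_LIBS"
```

If OpenBLAS is installed under a non-default prefix, pkg-config may only find `openblas.pc` after the directory containing it is added to the `PKG_CONFIG_PATH` environment variable.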
|
||||||
|
|
||||||
|
### Link a static library
|
||||||
|
|
||||||
|
Linking a static library is simpler - add the path to the static OpenBLAS
|
||||||
|
library to the compile command:
|
||||||
```
|
```
|
||||||
gcc -o test test.c /your/path/libopenblas.a
|
gcc -o test test.c /your/path/libopenblas.a
|
||||||
```
|
```
|
||||||
|
|
||||||
You can download `test.c` from https://gist.github.com/xianyi/5780018
|
|
||||||
|
|
||||||
## Code examples
|
## Code examples
|
||||||
|
|
||||||
### Call CBLAS interface
|
### Call CBLAS interface
|
||||||
This example shows calling cblas_dgemm in C. https://gist.github.com/xianyi/6930656
|
|
||||||
|
This example shows calling `cblas_dgemm` in C:
|
||||||
|
|
||||||
|
<!-- Source: https://gist.github.com/xianyi/6930656 -->
|
||||||
```c
|
```c
|
||||||
#include <cblas.h>
|
#include <cblas.h>
|
||||||
#include <stdio.h>
|
#include <stdio.h>
|
||||||
|
@ -83,14 +187,17 @@ void main()
|
||||||
}
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
|
To compile this file, save it as `test_cblas_dgemm.c` and then run:
|
||||||
```
|
```
|
||||||
gcc -o test_cblas_open test_cblas_dgemm.c -I /your_path/OpenBLAS/include/ -L/your_path/OpenBLAS/lib -lopenblas -lpthread -lgfortran
|
gcc -o test_cblas_open test_cblas_dgemm.c -I/your_path/OpenBLAS/include/ -L/your_path/OpenBLAS/lib -lopenblas -lpthread -lgfortran
|
||||||
```
|
```
|
||||||
|
This will result in a `test_cblas_open` executable.
|
||||||
|
|
||||||
### Call BLAS Fortran interface
|
### Call BLAS Fortran interface
|
||||||
|
|
||||||
This example shows calling dgemm Fortran interface in C. https://gist.github.com/xianyi/5780018
|
This example shows calling the `dgemm` Fortran interface in C:
|
||||||
|
|
||||||
|
<!-- Source: https://gist.github.com/xianyi/5780018 -->
|
||||||
```c
|
```c
|
||||||
#include "stdio.h"
|
#include "stdio.h"
|
||||||
#include "stdlib.h"
|
#include "stdlib.h"
|
||||||
|
@ -158,22 +265,41 @@ int main(int argc, char* argv[])
|
||||||
}
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
|
To compile this file, save it as `time_dgemm.c` and then run:
|
||||||
```
|
```
|
||||||
gcc -o time_dgemm time_dgemm.c /your/path/libopenblas.a -lpthread
|
gcc -o time_dgemm time_dgemm.c /your/path/libopenblas.a -lpthread
|
||||||
./time_dgemm <m> <n> <k>
|
|
||||||
```
|
```
|
||||||
|
You can then run it as: `./time_dgemm <m> <n> <k>`, with `m`, `n`, and `k` input
|
||||||
|
parameters to the `time_dgemm` executable.
|
||||||
|
|
||||||
|
!!! note
|
||||||
|
|
||||||
|
When calling the Fortran interface from C, you have to deal with symbol name
|
||||||
|
differences caused by compiler conventions. That is why the `dgemm_` function
|
||||||
|
call in the example above has a trailing underscore. This is what it looks like
|
||||||
|
when using `gcc`/`gfortran`; however, such details may change for different
|
||||||
|
compilers. Hence it requires extra support code. The CBLAS interface may be
|
||||||
|
more portable when writing C code.
|
||||||
|
|
||||||
|
When writing code that needs to be portable and work across different
|
||||||
|
platforms and compilers, the above code example is not recommended for
|
||||||
|
usage. Instead, we advise looking at how OpenBLAS (or BLAS in general, since
|
||||||
|
this problem isn't specific to OpenBLAS) functions are called in widely
|
||||||
|
used projects like Julia, SciPy, or R.
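To see which symbol names your own build exports, you can inspect the shared library with `nm`. The sketch below assumes GNU binutils and a placeholder library path (`LIB` is a hypothetical variable). With a default `gcc`/`gfortran` build you would expect both `dgemm_` (Fortran interface) and `cblas_dgemm` (CBLAS interface) to be listed; builds that use symbol prefixes or suffixes will show different names.

```shell
# Hypothetical path; point this at your own libopenblas.so.
LIB=/your_path/OpenBLAS/lib/libopenblas.so

if [ -f "$LIB" ]; then
    # -D lists dynamic symbols; --defined-only hides undefined references.
    nm -D --defined-only "$LIB" | grep -E ' (dgemm_|cblas_dgemm)$' \
        || echo "dgemm_ / cblas_dgemm not found under these exact names"
else
    echo "no library at $LIB"
fi
```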
|
||||||
|
|
||||||
|
|
||||||
## Troubleshooting
|
## Troubleshooting
|
||||||
|
|
||||||
* Please read [Faq](faq.md) at first.
|
* Please read the [FAQ](faq.md) first; your problem may be described there.
|
||||||
* Please use gcc version 4.6 and above to compile Sandy Bridge AVX kernels on Linux/MingW/BSD.
|
* Please ensure you are using a recent enough compiler that supports the
|
||||||
* Please use Clang version 3.1 and above to compile the library on Sandy Bridge microarchitecture. The Clang 3.0 will generate the wrong AVX binary code.
|
features your CPU provides (example: GCC versions before 4.6 were known to
|
||||||
* The number of CPUs/Cores should less than or equal to 256. On Linux x86_64(amd64), there is experimental support for up to 1024 CPUs/Cores and 128 numa nodes if you build the library with BIGNUMA=1.
|
not support AVX kernels, and before 6.1 AVX512CD kernels).
|
||||||
* OpenBLAS does not set processor affinity by default. On Linux, you can enable processor affinity by commenting the line NO_AFFINITY=1 in Makefile.rule. But this may cause [the conflict with R parallel](https://stat.ethz.ch/pipermail/r-sig-hpc/2012-April/001348.html).
|
* The number of CPU cores supported by default is <=256. On Linux x86-64, there
|
||||||
* On Loongson 3A. make test would be failed because of pthread_create error. The error code is EAGAIN. However, it will be OK when you run the same testcase on shell.
|
is experimental support for up to 1024 cores and 128 NUMA nodes if you build
|
||||||
|
the library with `BIGNUMA=1`.
|
||||||
## BLAS reference manual
|
* OpenBLAS does not set processor affinity by default. On Linux, you can enable
|
||||||
|
processor affinity by commenting out the line `NO_AFFINITY=1` in
|
||||||
If you want to understand every BLAS function and definition, please read [Intel MKL reference manual](https://software.intel.com/en-us/intel-mkl/documentation) or [netlib.org](http://netlib.org/blas/)
|
`Makefile.rule`.
|
||||||
|
* On Loongson 3A, `make test` is known to fail with a `pthread_create` error
|
||||||
Here are [OpenBLAS extension functions](extensions.md)
|
and an `EAGAIN` error code. However, it will be OK when you run the same
|
||||||
|
testcase in a shell.
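Two of the checks above (core count and compiler version) can be gathered quickly before filing a report. This is a rough sketch that assumes a POSIX shell; `cores` is an illustrative variable name, `nproc` comes from GNU coreutils, and `getconf _NPROCESSORS_ONLN` is used as a fallback where `nproc` is missing.

```shell
# Number of online CPU cores.
cores=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)
echo "CPU cores: $cores"
if [ "$cores" -gt 256 ]; then
    echo "More than 256 cores: consider building with BIGNUMA=1 (Linux x86-64 only)"
fi

# Compiler version, to compare against the minimum versions mentioned above.
if command -v gcc >/dev/null 2>&1; then
    echo "GCC version: $(gcc -dumpversion)"
else
    echo "gcc not found in PATH"
fi
```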
|
||||||
|
|