OpenBLAS/docs/user_manual.md

306 lines
10 KiB
Markdown

This user manual covers compiling OpenBLAS itself, linking your code to OpenBLAS,
example code to use the C (CBLAS) and Fortran (BLAS) APIs, and some troubleshooting
tips. Compiling OpenBLAS is optional, since you may be able to install with a
package manager.
!!! Note BLAS API reference documentation
The OpenBLAS documentation does not contain API reference documentation for
BLAS or LAPACK, since these are standardized APIs, the documentation for
which can be found in other places. If you want to understand every BLAS
and LAPACK function and definition, we recommend reading the
[Netlib BLAS ](http://netlib.org/blas/) and [Netlib LAPACK](http://netlib.org/lapack/)
documentation.
OpenBLAS does contain a limited number of functions that are non-standard,
these are documented at [OpenBLAS extension functions](extensions.md).
## Compiling OpenBLAS
### Normal compile
The default way to build and install OpenBLAS from source is with Make:
```
make # add `-j4` to compile in parallel with 4 processes
make install
```
By default, the CPU architecture is detected automatically when invoking
`make`, and the build is optimized for the detected CPU. To override the
autodetection, use the `TARGET` flag:
```
# `make TARGET=xxx` sets target CPU: e.g. for an Intel Nehalem CPU:
make TARGET=NEHALEM
```
The full list of known target CPU architectures can be found in
`TargetList.txt` in the root of the repository.
### Cross compile
For a basic cross-compilation with Make, three steps need to be taken:
- Set the `CC` and `FC` environment variables to select the cross toolchains
for C and Fortran.
- Set the `HOSTCC` environment variable to select the host C compiler (i.e. the
regular C compiler for the machine on which you are invoking the build).
- Set `TARGET` explicitly to the CPU architecture on which the produced
OpenBLAS binaries will be used.
#### Cross-compilation examples
Compile the library for ARM Cortex-A9 linux on an x86-64 machine
_(note: install only `gnueabihf` versions of the cross toolchain - see
[this issue comment](https://github.com/OpenMathLib/OpenBLAS/issues/936#issuecomment-237596847)
for why_):
```
make CC=arm-linux-gnueabihf-gcc FC=arm-linux-gnueabihf-gfortran HOSTCC=gcc TARGET=CORTEXA9
```
Compile OpenBLAS for a loongson3a CPU on an x86-64 machine:
```
make BINARY=64 CC=mips64el-unknown-linux-gnu-gcc FC=mips64el-unknown-linux-gnu-gfortran HOSTCC=gcc TARGET=LOONGSON3A
```
Compile OpenBLAS for loongson3a CPU with the `loongcc` (based on Open64) compiler on an x86-64 machine:
```
make CC=loongcc FC=loongf95 HOSTCC=gcc TARGET=LOONGSON3A CROSS=1 CROSS_SUFFIX=mips64el-st-linux-gnu- NO_LAPACKE=1 NO_SHARED=1 BINARY=32
```
### Building a debug version
Add `DEBUG=1` to your build command, e.g.:
```
make DEBUG=1
```
### Install to a specific directory
!!! note
Installing to a directory is optional; it is also possible to use the shared or static
libraries directly from the build directory.
Use `make install` with the `PREFIX` flag to install to a specific directory:
```
make install PREFIX=/path/to/installation/directory
```
The default directory is `/opt/OpenBLAS`.
!!! important
Note that any flags passed to `make` during build should also be passed to
`make install` to circumvent any install errors, i.e. some headers not
being copied over correctly.
For more detailed information on building/installing from source, please read
the [Installation Guide](install.md).
## Linking to OpenBLAS
OpenBLAS can be used as a shared or a static library.
### Link a shared library
The shared library is normally called `libopenblas.so`, but not that the name
may be different as a result of build flags used or naming choices by a distro
packager (see [distributing.md] for details). To link a shared library named
`libopenblas.so`, the flag `-lopenblas` is needed. To find the OpenBLAS headers,
a `-I/path/to/includedir` is needed. And unless the library is installed in a
directory that the linker searches by default, also `-L` and `-Wl,-rpath` flags
are needed. For a source file `test.c` (e.g., the example code under _Call
CBLAS interface_ further down), the shared library can then be linked with:
```
gcc -o test test.c -I/your_path/OpenBLAS/include/ -L/your_path/OpenBLAS/lib -Wl,-rpath,/your_path/OpenBLAS/lib -lopenblas
```
The `-Wl,-rpath,/your_path/OpenBLAS/lib` linker flag can be omitted if you
ran `ldconfig` to update linker cache, put `/your_path/OpenBLAS/lib` in
`/etc/ld.so.conf` or a file in `/etc/ld.so.conf.d`, or installed OpenBLAS in a
location that is part of the `ld.so` default search path (usually `/lib`,
`/usr/lib` and `/usr/local/lib`). Alternatively, you can set the environment
variable `LD_LIBRARY_PATH` to point to the folder that contains `libopenblas.so`.
Otherwise, the build may succeed but at runtime loading the library will fail
with a message like:
```
cannot open shared object file: no such file or directory
```
More flags may be needed, depending on how OpenBLAS was built:
- If `libopenblas` is multi-threaded, please add `-lpthread`.
- If the library contains LAPACK functions (usually also true), please add
`-lgfortran` (other Fortran libraries may also be needed, e.g. `-lquadmath`).
Note that if you only make calls to LAPACKE routines, i.e. your code has
`#include "lapacke.h"` and makes calls to methods like `LAPACKE_dgeqrf`,
then `-lgfortran` is not needed.
!!! tip Use pkg-config
Usually a pkg-config file (e.g., `openblas.pc`) is installed together
with a `libopenblas` shared library. pkg-config is a tool that will
tell you the exact flags needed for linking. For example:
```
$ pkg-config --cflags openblas
-I/usr/local/include
$ pkg-config --libs openblas
-L/usr/local/lib -lopenblas
```
### Link a static library
Linking a static library is simpler - add the path to the static OpenBLAS
library to the compile command:
```
gcc -o test test.c /your/path/libopenblas.a
```
## Code examples
### Call CBLAS interface
This example shows calling `cblas_dgemm` in C:
<!-- Source: https://gist.github.com/xianyi/6930656 -->
```c
#include <cblas.h>
#include <stdio.h>
void main()
{
int i=0;
double A[6] = {1.0,2.0,1.0,-3.0,4.0,-1.0};
double B[6] = {1.0,2.0,1.0,-3.0,4.0,-1.0};
double C[9] = {.5,.5,.5,.5,.5,.5,.5,.5,.5};
cblas_dgemm(CblasColMajor, CblasNoTrans, CblasTrans,3,3,2,1,A, 3, B, 3,2,C,3);
for(i=0; i<9; i++)
printf("%lf ", C[i]);
printf("\n");
}
```
To compile this file, save it as `test_cblas_dgemm.c` and then run:
```
gcc -o test_cblas_open test_cblas_dgemm.c -I/your_path/OpenBLAS/include/ -L/your_path/OpenBLAS/lib -lopenblas -lpthread -lgfortran
```
will result in a `test_cblas_open` executable.
### Call BLAS Fortran interface
This example shows calling the `dgemm` Fortran interface in C:
<!-- Source: https://gist.github.com/xianyi/5780018 -->
```c
#include "stdio.h"
#include "stdlib.h"
#include "sys/time.h"
#include "time.h"
extern void dgemm_(char*, char*, int*, int*,int*, double*, double*, int*, double*, int*, double*, double*, int*);
int main(int argc, char* argv[])
{
int i;
printf("test!\n");
if(argc<4){
printf("Input Error\n");
return 1;
}
int m = atoi(argv[1]);
int n = atoi(argv[2]);
int k = atoi(argv[3]);
int sizeofa = m * k;
int sizeofb = k * n;
int sizeofc = m * n;
char ta = 'N';
char tb = 'N';
double alpha = 1.2;
double beta = 0.001;
struct timeval start,finish;
double duration;
double* A = (double*)malloc(sizeof(double) * sizeofa);
double* B = (double*)malloc(sizeof(double) * sizeofb);
double* C = (double*)malloc(sizeof(double) * sizeofc);
srand((unsigned)time(NULL));
for (i=0; i<sizeofa; i++)
A[i] = i%3+1;//(rand()%100)/10.0;
for (i=0; i<sizeofb; i++)
B[i] = i%3+1;//(rand()%100)/10.0;
for (i=0; i<sizeofc; i++)
C[i] = i%3+1;//(rand()%100)/10.0;
//#if 0
printf("m=%d,n=%d,k=%d,alpha=%lf,beta=%lf,sizeofc=%d\n",m,n,k,alpha,beta,sizeofc);
gettimeofday(&start, NULL);
dgemm_(&ta, &tb, &m, &n, &k, &alpha, A, &m, B, &k, &beta, C, &m);
gettimeofday(&finish, NULL);
duration = ((double)(finish.tv_sec-start.tv_sec)*1000000 + (double)(finish.tv_usec-start.tv_usec)) / 1000000;
double gflops = 2.0 * m *n*k;
gflops = gflops/duration*1.0e-6;
FILE *fp;
fp = fopen("timeDGEMM.txt", "a");
fprintf(fp, "%dx%dx%d\t%lf s\t%lf MFLOPS\n", m, n, k, duration, gflops);
fclose(fp);
free(A);
free(B);
free(C);
return 0;
}
```
To compile this file, save it as `time_dgemm.c` and then run:
```
gcc -o time_dgemm time_dgemm.c /your/path/libopenblas.a -lpthread
```
You can then run it as: `./time_dgemm <m> <n> <k>`, with `m`, `n`, and `k` input
parameters to the `time_dgemm` executable.
!!! note
When calling the Fortran interface from C, you have to deal with symbol name
differences caused by compiler conventions. That is why the `dgemm_` function
call in the example above has a trailing underscore. This is what it looks like
when using `gcc`/`gfortran`, however such details may change for different
compilers. Hence it requires extra support code. The CBLAS interface may be
more portable when writing C code.
When writing code that needs to be portable and work across different
platforms and compilers, the above code example is not recommended for
usage. Instead, we advise looking at how OpenBLAS (or BLAS in general, since
this problem isn't specific to OpenBLAS) functions are called in widely
used projects like Julia, SciPy, or R.
## Troubleshooting
* Please read the [FAQ](faq.md) first, your problem may be described there.
* Please ensure you are using a recent enough compiler, that supports the
features your CPU provides (example: GCC versions before 4.6 were known to
not support AVX kernels, and before 6.1 AVX512CD kernels).
* The number of CPU cores supported by default is <=256. On Linux x86-64, there
is experimental support for up to 1024 cores and 128 NUMA nodes if you build
the library with `BIGNUMA=1`.
* OpenBLAS does not set processor affinity by default. On Linux, you can enable
processor affinity by commenting out the line `NO_AFFINITY=1` in
`Makefile.rule`.
* On Loongson 3A, `make test` is known to fail with a `pthread_create` error
and an `EAGAIN` error code. However, it will be OK when you run the same
testcase in a shell.