This user manual covers compiling OpenBLAS itself, linking your code to OpenBLAS,
example code to use the C (CBLAS) and Fortran (BLAS) APIs, and some troubleshooting
tips. Compiling OpenBLAS is optional, since you may be able to install it with a
package manager.

!!! note "BLAS API reference documentation"

    The OpenBLAS documentation does not contain API reference documentation for
    BLAS or LAPACK, since these are standardized APIs whose documentation can be
    found elsewhere. If you want to understand every BLAS and LAPACK function and
    definition, we recommend reading the [Netlib BLAS](http://netlib.org/blas/)
    and [Netlib LAPACK](http://netlib.org/lapack/) documentation.

    OpenBLAS does contain a limited number of non-standard functions; these are
    documented at [OpenBLAS extension functions](extensions.md).

## Compiling OpenBLAS

### Normal compile

The default way to build and install OpenBLAS from source is with Make:

```
make  # add `-j4` to compile in parallel with 4 processes
make install
```

By default, the CPU architecture is detected automatically when invoking
`make`, and the build is optimized for the detected CPU. To override the
autodetection, use the `TARGET` flag:

```
# `make TARGET=xxx` sets target CPU: e.g. for an Intel Nehalem CPU:
make TARGET=NEHALEM
```

The full list of known target CPU architectures can be found in
`TargetList.txt` in the root of the repository.

### Cross compile

For a basic cross-compilation with Make, three steps need to be taken:

- Set the `CC` and `FC` environment variables to select the cross toolchains
  for C and Fortran.
- Set the `HOSTCC` environment variable to select the host C compiler (i.e. the
  regular C compiler for the machine on which you are invoking the build).
- Set `TARGET` explicitly to the CPU architecture on which the produced
  OpenBLAS binaries will be used.

#### Cross-compilation examples

Compile the library for ARM Cortex-A9 Linux on an x86-64 machine
_(note: install only `gnueabihf` versions of the cross toolchain; see
[this issue comment](https://github.com/OpenMathLib/OpenBLAS/issues/936#issuecomment-237596847)
for why)_:

```
make CC=arm-linux-gnueabihf-gcc FC=arm-linux-gnueabihf-gfortran HOSTCC=gcc TARGET=CORTEXA9
```

Compile OpenBLAS for a Loongson 3A CPU on an x86-64 machine:

```
make BINARY=64 CC=mips64el-unknown-linux-gnu-gcc FC=mips64el-unknown-linux-gnu-gfortran HOSTCC=gcc TARGET=LOONGSON3A
```

Compile OpenBLAS for a Loongson 3A CPU with the `loongcc` (based on Open64) compiler on an x86-64 machine:

```
make CC=loongcc FC=loongf95 HOSTCC=gcc TARGET=LOONGSON3A CROSS=1 CROSS_SUFFIX=mips64el-st-linux-gnu- NO_LAPACKE=1 NO_SHARED=1 BINARY=32
```

### Building a debug version

Add `DEBUG=1` to your build command, e.g.:

```
make DEBUG=1
```

### Install to a specific directory

!!! note

    Installing to a directory is optional; it is also possible to use the shared or static
    libraries directly from the build directory.

Use `make install` with the `PREFIX` flag to install to a specific directory:

```
make install PREFIX=/path/to/installation/directory
```

The default directory is `/opt/OpenBLAS`.

!!! important

    Any flags passed to `make` during the build should also be passed to
    `make install` to avoid install errors, e.g. some headers not being copied
    over correctly.

For more detailed information on building/installing from source, please read
the [Installation Guide](install.md).

## Linking to OpenBLAS

OpenBLAS can be used as a shared or a static library.

### Link a shared library

The shared library is normally called `libopenblas.so`, but note that the name
may be different as a result of build flags used or naming choices by a distro
packager (see [distributing.md](distributing.md) for details). To link a shared
library named `libopenblas.so`, the flag `-lopenblas` is needed. To find the
OpenBLAS headers, a `-I/path/to/includedir` flag is needed. Unless the library
is installed in a directory that the linker searches by default, you also need
`-L` (so the linker finds the library at build time) and `-Wl,-rpath` (so the
dynamic loader finds it at run time). For a source file `test.c` (e.g., the
example code under _Call CBLAS interface_ further down), the shared library can
then be linked with:

```
gcc -o test test.c -I/your_path/OpenBLAS/include/ -L/your_path/OpenBLAS/lib -Wl,-rpath,/your_path/OpenBLAS/lib -lopenblas
```

The `-Wl,-rpath,/your_path/OpenBLAS/lib` linker flag can be omitted if you
added `/your_path/OpenBLAS/lib` to `/etc/ld.so.conf` (or a file in
`/etc/ld.so.conf.d`) and ran `ldconfig` to update the linker cache, or if you
installed OpenBLAS in a location that is part of the `ld.so` default search path
(usually `/lib`, `/usr/lib` and `/usr/local/lib`). Alternatively, you can set the
environment variable `LD_LIBRARY_PATH` to point to the folder that contains
`libopenblas.so`. Otherwise, linking may succeed, but loading the library at
runtime will fail with a message like:

```
cannot open shared object file: No such file or directory
```

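
To check whether the dynamic loader can find the library without building a full BLAS program, a small probe based on POSIX `dlopen` may help. This is a minimal sketch, not an OpenBLAS tool, and `can_load_library` is a hypothetical helper name. For a bare library name without a slash, glibc's `dlopen` searches roughly the same places described above: the calling program's rpath/runpath, `LD_LIBRARY_PATH`, the `ldconfig` cache, and `/lib` and `/usr/lib`. A failure here therefore usually points at the same configuration problem.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/* Try to load a shared library by name, as a program would at runtime.
 * Returns 1 on success, 0 on failure (after printing the loader's message). */
int can_load_library(const char *name)
{
    assert(name != NULL);

    /* RTLD_NOW resolves all symbols immediately, so missing dependencies
     * of the library are reported here as well. */
    void *handle = dlopen(name, RTLD_NOW);
    if (handle == NULL) {
        const char *err = dlerror();
        fprintf(stderr, "cannot load %s: %s\n", name, err ? err : "unknown error");
        return 0;
    }

    dlclose(handle);
    return 1;
}
```

For example, `can_load_library("libopenblas.so")` returns 1 if the loader can locate the library. Depending on the installation, you may need the versioned name (such as `libopenblas.so.0`) instead, since the unversioned `.so` link is often shipped only in development packages. On glibc versions older than 2.34, link the probe with `-ldl`.
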
More flags may be needed, depending on how OpenBLAS was built:

- If `libopenblas` is multi-threaded, please add `-lpthread`.
- If the library contains LAPACK functions (usually also true), please add
  `-lgfortran` (other Fortran libraries may also be needed, e.g. `-lquadmath`).
  Note that if you only make calls to LAPACKE routines, i.e. your code has
  `#include "lapacke.h"` and makes calls to functions like `LAPACKE_dgeqrf`,
  then `-lgfortran` is not needed.

!!! tip "Use pkg-config"

    Usually a pkg-config file (e.g., `openblas.pc`) is installed together
    with a `libopenblas` shared library. pkg-config is a tool that will
    tell you the exact flags needed for linking. For example:

    ```
    $ pkg-config --cflags openblas
    -I/usr/local/include
    $ pkg-config --libs openblas
    -L/usr/local/lib -lopenblas
    ```

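### Link a static library

Linking a static library is simpler: add the path to the static OpenBLAS
library to the compile command:

```
gcc -o test test.c /your/path/libopenblas.a
```

Because a static archive does not record its own dependencies, the extra
libraries listed above (such as `-lpthread` or `-lgfortran`) may still have to
be added after `libopenblas.a`. The `-I` flag is also still needed if the OpenBLAS
headers are not in a default include path.
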
## Code examples

### Call CBLAS interface

This example shows calling `cblas_dgemm` in C:

<!-- Source: https://gist.github.com/xianyi/6930656 -->
```c
#include <cblas.h>
#include <stdio.h>

int main(void)
{
    int i = 0;
    double A[6] = {1.0, 2.0, 1.0, -3.0, 4.0, -1.0};
    double B[6] = {1.0, 2.0, 1.0, -3.0, 4.0, -1.0};
    double C[9] = {.5, .5, .5, .5, .5, .5, .5, .5, .5};

    /* C := 1.0 * A * B^T + 2.0 * C, where A and B are 3x2 column-major
     * matrices (leading dimension 3) and C is 3x3 (leading dimension 3). */
    cblas_dgemm(CblasColMajor, CblasNoTrans, CblasTrans, 3, 3, 2, 1, A, 3, B, 3, 2, C, 3);

    /* Print C in column-major (storage) order. */
    for (i = 0; i < 9; i++)
        printf("%lf ", C[i]);
    printf("\n");

    return 0;
}
```

To compile this file, save it as `test_cblas_dgemm.c` and then run:

```
gcc -o test_cblas_open test_cblas_dgemm.c -I/your_path/OpenBLAS/include/ -L/your_path/OpenBLAS/lib -lopenblas -lpthread -lgfortran
```

This will produce a `test_cblas_open` executable.

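
To check what the `cblas_dgemm` call above computes, you can compare it against a naive implementation. The sketch below is a plain triple loop, not how OpenBLAS implements `dgemm`. The names `ref_dgemm_colmajor` and `COLMAJ` are hypothetical. It follows the BLAS definition `C := alpha * op(A) * op(B) + beta * C` for column-major storage, where element `(i, j)` of a matrix with leading dimension `ld` is stored at index `j * ld + i`.

```c
#include <assert.h>
#include <stddef.h>

/* Element (i, j) of a column-major matrix X with leading dimension ld. */
#define COLMAJ(X, ld, i, j) ((X)[(size_t)(j) * (size_t)(ld) + (size_t)(i)])

/* Plain triple-loop reference for column-major dgemm:
 *   C := alpha * op(A) * op(B) + beta * C
 * trans_a / trans_b: 0 means op(X) = X, nonzero means op(X) = X^T.
 * op(A) is m x k, op(B) is k x n and C is m x n. */
void ref_dgemm_colmajor(int trans_a, int trans_b, int m, int n, int k,
                        double alpha, const double *A, int lda,
                        const double *B, int ldb,
                        double beta, double *C, int ldc)
{
    /* A leading dimension must be at least the number of stored rows. */
    assert(lda >= (trans_a ? k : m));
    assert(ldb >= (trans_b ? n : k));
    assert(ldc >= m);

    for (int j = 0; j < n; j++) {
        for (int i = 0; i < m; i++) {
            double sum = 0.0;
            for (int p = 0; p < k; p++) {
                /* op(A)(i, p) and op(B)(p, j), reading transposed storage if requested. */
                double a = trans_a ? COLMAJ(A, lda, p, i) : COLMAJ(A, lda, i, p);
                double b = trans_b ? COLMAJ(B, ldb, j, p) : COLMAJ(B, ldb, p, j);
                sum += a * b;
            }
            /* As in the reference BLAS, C is not read when beta is zero. */
            double c_old = (beta == 0.0) ? 0.0 : beta * COLMAJ(C, ldc, i, j);
            COLMAJ(C, ldc, i, j) = alpha * sum + c_old;
        }
    }
}
```

Call it as `ref_dgemm_colmajor(0, 1, 3, 3, 2, 1.0, A, 3, B, 3, 2.0, C, 3)` with the arrays from the CBLAS example, which means no transpose for `A` and transpose for `B`. It then leaves `C` holding `11 -9 5 -9 21 -1 5 -1 3` in column-major order. That is the result the CBLAS example should print, as `11.000000 -9.000000 ...`.
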
### Call BLAS Fortran interface

This example shows calling the `dgemm` Fortran interface from C:

<!-- Source: https://gist.github.com/xianyi/5780018 -->
```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

/* Fortran BLAS dgemm: every argument is passed by pointer. The trailing
 * underscore follows the gcc/gfortran symbol naming convention. */
extern void dgemm_(char*, char*, int*, int*, int*, double*, double*, int*, double*, int*, double*, double*, int*);

int main(int argc, char* argv[])
{
    int i;
    if (argc < 4) {
        fprintf(stderr, "Usage: %s <m> <n> <k>\n", argv[0]);
        return 1;
    }

    int m = atoi(argv[1]);
    int n = atoi(argv[2]);
    int k = atoi(argv[3]);
    int sizeofa = m * k;
    int sizeofb = k * n;
    int sizeofc = m * n;
    char ta = 'N';
    char tb = 'N';
    double alpha = 1.2;
    double beta = 0.001;

    struct timeval start, finish;
    double duration;

    double* A = (double*)malloc(sizeof(double) * sizeofa);
    double* B = (double*)malloc(sizeof(double) * sizeofb);
    double* C = (double*)malloc(sizeof(double) * sizeofc);
    if (A == NULL || B == NULL || C == NULL) {
        fprintf(stderr, "Memory allocation failed\n");
        free(A);
        free(B);
        free(C);
        return 1;
    }

    /* Fill the matrices with a simple deterministic pattern. */
    for (i = 0; i < sizeofa; i++)
        A[i] = i % 3 + 1;
    for (i = 0; i < sizeofb; i++)
        B[i] = i % 3 + 1;
    for (i = 0; i < sizeofc; i++)
        C[i] = i % 3 + 1;

    printf("m=%d,n=%d,k=%d,alpha=%lf,beta=%lf,sizeofc=%d\n", m, n, k, alpha, beta, sizeofc);

    /* C := alpha * A * B + beta * C, column-major, leading dimensions m, k, m. */
    gettimeofday(&start, NULL);
    dgemm_(&ta, &tb, &m, &n, &k, &alpha, A, &m, B, &k, &beta, C, &m);
    gettimeofday(&finish, NULL);

    duration = ((double)(finish.tv_sec - start.tv_sec) * 1000000 + (double)(finish.tv_usec - start.tv_usec)) / 1000000;
    /* A matrix multiply performs about 2*m*n*k floating-point operations. */
    double mflops = 2.0 * m * n * k;
    mflops = mflops / duration * 1.0e-6;

    /* Append the result to a log file. */
    FILE *fp = fopen("timeDGEMM.txt", "a");
    if (fp == NULL) {
        perror("fopen");
    } else {
        fprintf(fp, "%dx%dx%d\t%lf s\t%lf MFLOPS\n", m, n, k, duration, mflops);
        fclose(fp);
    }

    free(A);
    free(B);
    free(C);
    return 0;
}
```

To compile this file, save it as `time_dgemm.c` and then run:

```
gcc -o time_dgemm time_dgemm.c /your/path/libopenblas.a -lpthread
```

You can then run it as `./time_dgemm <m> <n> <k>`, where `m`, `n` and `k` are
the matrix dimensions. The timing result is appended to `timeDGEMM.txt`.

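
The example times the call with `gettimeofday`, which follows the wall clock and can jump if the system time is adjusted. A common alternative is POSIX `clock_gettime` with `CLOCK_MONOTONIC`; it is swapped in here and is not part of the original example. The sketch below moves the timing and the MFLOPS arithmetic into helpers, whose names `monotonic_now`, `elapsed_seconds` and `dgemm_mflops` are hypothetical. The flop count is the same `2*m*n*k` estimate the example uses.

```c
/* Feature-test macro so clock_gettime is declared under strict C modes. */
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Current time from a monotonic clock, which is not affected by changes
 * to the system wall-clock time (unlike gettimeofday). */
struct timespec monotonic_now(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts;
}

/* Seconds elapsed between two timestamps. */
double elapsed_seconds(struct timespec start, struct timespec end)
{
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* dgemm does about 2*m*n*k floating-point operations (one multiply and
 * one add per term of each inner product); the result is in MFLOPS. */
double dgemm_mflops(int m, int n, int k, double seconds)
{
    assert(seconds > 0.0);
    return 2.0 * m * n * k / seconds / 1e6;
}
```

To adapt the example program, declare `start` and `finish` as `struct timespec` and set them with `start = monotonic_now();` and `finish = monotonic_now();` around the `dgemm_` call. The rate is then `dgemm_mflops(m, n, k, elapsed_seconds(start, finish))`. For example, a 100x100x100 multiply that takes 1 second corresponds to 2 MFLOPS.
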
!!! note

    When calling the Fortran interface from C, you have to deal with symbol name
    differences caused by compiler conventions. That is why the `dgemm_` function
    call in the example above has a trailing underscore. This is what it looks like
    when using `gcc`/`gfortran`; however, such details may differ for other
    compilers, so handling them requires extra support code. The CBLAS interface
    may be more portable when writing C code.

    When writing code that needs to be portable and work across different
    platforms and compilers, the above code example is not recommended.
    Instead, we advise looking at how OpenBLAS (or BLAS in general, since
    this problem isn't specific to OpenBLAS) functions are called in widely
    used projects like Julia, SciPy, or R.

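
One way to write the extra support code mentioned in the note is a macro that maps a routine name to the symbol the Fortran compiler produces. The sketch below is hypothetical: `FORTRAN_NAME`, `FORTRAN_NO_UNDERSCORE` and `FORTRAN_UPPERCASE` are names invented here, and it covers only three common naming conventions. Build systems can detect the right convention automatically, for example CMake's `FortranCInterface` module.

```c
/* Hypothetical configuration macros: define FORTRAN_NO_UNDERSCORE or
 * FORTRAN_UPPERCASE to match the compiler that built your BLAS library.
 * Without either, the gcc/gfortran convention (lowercase + "_") is used. */
#if defined(FORTRAN_NO_UNDERSCORE)
#define FORTRAN_NAME(lower, UPPER) lower
#elif defined(FORTRAN_UPPERCASE)
#define FORTRAN_NAME(lower, UPPER) UPPER
#else
#define FORTRAN_NAME(lower, UPPER) lower##_
#endif

/* Declare dgemm through the macro instead of hard-coding "dgemm_". */
extern void FORTRAN_NAME(dgemm, DGEMM)(char *transa, char *transb,
                                       int *m, int *n, int *k,
                                       double *alpha, double *a, int *lda,
                                       double *b, int *ldb,
                                       double *beta, double *c, int *ldc);

/* Two-level stringification so FORTRAN_NAME is expanded before '#' applies. */
#define STRINGIFY_(x) #x
#define STRINGIFY(x) STRINGIFY_(x)

/* Returns the symbol name chosen for dgemm, e.g. "dgemm_" by default. */
const char *dgemm_symbol_name(void)
{
    return STRINGIFY(FORTRAN_NAME(dgemm, DGEMM));
}
```

With no configuration macro defined, `dgemm_symbol_name()` returns `"dgemm_"`, the gcc/gfortran name used in the example. Name mangling is only one of the differences. The handling of `CHARACTER` arguments (gfortran passes hidden length arguments) and the integer width of the BLAS library (e.g. 64-bit integer builds) can also vary. This is why CBLAS is usually the more portable choice.
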
## Troubleshooting

* Please read the [FAQ](faq.md) first; your problem may be described there.
* Please ensure you are using a compiler recent enough to support the
  features your CPU provides (for example, GCC versions before 4.6 were known
  not to support AVX kernels, and versions before 6.1 did not support AVX512CD
  kernels).
* By default, up to 256 CPU cores are supported. On Linux x86-64, there
  is experimental support for up to 1024 cores and 128 NUMA nodes if you build
  the library with `BIGNUMA=1`.
* OpenBLAS does not set processor affinity by default. On Linux, you can enable
  processor affinity by commenting out the line `NO_AFFINITY=1` in
  `Makefile.rule`.
* On Loongson 3A, `make test` is known to fail with a `pthread_create` error
  and an `EAGAIN` error code. However, the same test case runs fine when
  executed directly in a shell.

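
To see whether a machine exceeds the default limit of 256 cores mentioned above, you can query the operating system directly. This is a minimal sketch using `sysconf(_SC_NPROCESSORS_ONLN)`, which is widely available (Linux, macOS, the BSDs) but not required by POSIX. `DEFAULT_MAX_CORES` and the helper names are hypothetical and not part of OpenBLAS.

```c
#include <unistd.h>

/* Default core limit stated in the troubleshooting notes; the name
 * DEFAULT_MAX_CORES is illustrative, not an OpenBLAS symbol. */
#define DEFAULT_MAX_CORES 256

/* Number of online CPU cores as reported by the OS, or -1 on error. */
long online_cpu_count(void)
{
    return sysconf(_SC_NPROCESSORS_ONLN);
}

/* Returns 1 if `cores` exceeds the default limit, 0 otherwise
 * (an error value of -1 is treated as "not exceeding"). */
int exceeds_default_core_limit(long cores)
{
    return cores > DEFAULT_MAX_CORES;
}
```

If `exceeds_default_core_limit(online_cpu_count())` returns 1 on a Linux x86-64 machine, building with `BIGNUMA=1` may be worth considering.
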