ReLAPACK Configuration
======================

ReLAPACK has two configuration files: `make.inc`, which is included by the
Makefile, and `config.h`, which is included in the source files.


Build and Testing Environment
-----------------------------
The build environment (compiler and flags) and the test configuration (linker
flags for BLAS and LAPACK) are specified in `make.inc`. The test matrix size
and error bounds are defined in `test/config.h`.

The library `librelapack.a` is built by invoking `make`. The tests are run
with either `make test` or by calling `make` in the `test` folder.


BLAS/LAPACK complex function interfaces
---------------------------------------
For BLAS and LAPACK functions that return a complex number, there exist two
conflicting (FORTRAN compiler dependent) calling conventions: either the result
is returned as a `struct` of two floating point numbers or an additional first
argument with a pointer to such a `struct` is used. By default ReLAPACK uses
the former (which is what gfortran uses), but it can switch to the latter by
setting `COMPLEX_FUNCTIONS_AS_ROUTINES` (or explicitly the BLAS and LAPACK
specific counterparts) to `1` in `config.h`.

**For MKL, `COMPLEX_FUNCTIONS_AS_ROUTINES` must be set to `1`.**

(Using the wrong convention will break `ctrsyl` and `ztrsyl` and the test cases
will segfault or return errors on the order of 1 or larger.)


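The two calling conventions described in this section can be illustrated with
a small C sketch. The type `cfloat` and the functions `dotu_as_function` and
`dotu_as_routine` are hypothetical stand-ins written for this example; they
are not BLAS or ReLAPACK symbols. Both compute the same complex dot product
and differ only in how the result reaches the caller, which is why mixing up
the conventions corrupts the call.

```c
/* Hypothetical complex type: a struct of two floating point numbers. */
typedef struct { float re, im; } cfloat;

/* Convention 1: the complex result is returned by value as a struct. */
static cfloat dotu_as_function(int n, const cfloat *x, const cfloat *y) {
    cfloat s = {0.0f, 0.0f};
    for (int i = 0; i < n; i++) {
        /* (a + bi)(c + di) = (ac - bd) + (ad + bc)i */
        s.re += x[i].re * y[i].re - x[i].im * y[i].im;
        s.im += x[i].re * y[i].im + x[i].im * y[i].re;
    }
    return s;
}

/* Convention 2: the result is written through an additional first
 * argument that points to such a struct; nothing is returned. */
static void dotu_as_routine(cfloat *result, int n,
                            const cfloat *x, const cfloat *y) {
    *result = dotu_as_function(n, x, y);
}
```
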
BLAS extension `xgemmt`
-----------------------
The LDL decompositions require `xgemmt`, a general matrix-matrix product that
updates only one triangular part of the result matrix. If the BLAS
implementation linked against provides such a routine, set the flag
`HAVE_XGEMMT` to `1` in `config.h`; otherwise, ReLAPACK uses its own recursive
implementation of these kernels.

`xgemmt` is provided by MKL.


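To make the semantics of such a triangular update concrete, here is a naive
reference sketch in C. The function name `gemmt_lower_ref` and its argument
list are assumptions made for this example; it is neither the MKL interface
nor ReLAPACK's recursive kernel. It computes `C := alpha * A * B + beta * C`
for column-major matrices but writes only the lower triangle of `C`.

```c
#include <stddef.h>

/* Naive reference: update only the lower triangle (including the diagonal)
 * of the n-by-n matrix C. A is n-by-k, B is k-by-n, all column-major. */
static void gemmt_lower_ref(int n, int k, double alpha,
                            const double *A, int ldA,
                            const double *B, int ldB,
                            double beta, double *C, int ldC) {
    for (int j = 0; j < n; j++) {
        for (int i = j; i < n; i++) {  /* i >= j: strictly upper part is skipped */
            double sum = 0.0;
            for (int l = 0; l < k; l++)
                sum += A[i + (size_t)l * ldA] * B[l + (size_t)j * ldB];
            C[i + (size_t)j * ldC] = alpha * sum + beta * C[i + (size_t)j * ldC];
        }
    }
}
```
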
Routine Selection
-----------------
ReLAPACK's routines are named `RELAPACK_X` (e.g., `RELAPACK_dgetrf`). If the
corresponding `INCLUDE_X` flag in `config.h` (e.g., `INCLUDE_DGETRF`) is set to
`1`, ReLAPACK additionally provides a wrapper under the LAPACK name (e.g.,
`dgetrf_`). By default, wrappers for all routines are enabled.


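The following sketch shows how a flag of this kind could gate a wrapper at
compile time. It is only an illustration of the pattern, not ReLAPACK's
source: `demo_RELAPACK_dgetrf`, `demo_dgetrf_`, and `demo_calls` are
hypothetical names, and the stand-in routine does no factorization. Only the
flag name `INCLUDE_DGETRF` is taken from the text above.

```c
#define INCLUDE_DGETRF 1  /* flag name from the text; set to 0 to drop the wrapper */

static int demo_calls = 0;  /* counts calls, for demonstration only */

/* Hypothetical stand-in for the routine that always exists under the
 * RELAPACK_ prefix. It performs no factorization. */
static void demo_RELAPACK_dgetrf(const int *m, const int *n, double *A,
                                 const int *ldA, int *ipiv, int *info) {
    (void)m; (void)n; (void)A; (void)ldA; (void)ipiv;
    demo_calls++;
    *info = 0;
}

#if INCLUDE_DGETRF
/* Wrapper under the LAPACK-style name, compiled only when the flag is 1.
 * It forwards all arguments unchanged. */
static void demo_dgetrf_(const int *m, const int *n, double *A,
                         const int *ldA, int *ipiv, int *info) {
    demo_RELAPACK_dgetrf(m, n, A, ldA, ipiv, info);
}
#endif
```
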
Crossover Size
--------------
The crossover size is the matrix size below which ReLAPACK's recursive
algorithms switch to LAPACK's unblocked routines, which avoids calling BLAS
Level 3 routines on tiny matrices. The crossover size is set in `config.h` and
can be chosen globally for the entire library, by operation, or individually
by routine.


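As an illustration of how a crossover size steers a recursive algorithm, the
sketch below solves a lower triangular system `L x = b` by recursive
splitting and falls back to an unblocked loop for small problems. The value
`CROSSOVER 4`, the function names, and the choice of switching at or below
the threshold are assumptions for this example; the operation itself is
chosen for brevity and is not one of ReLAPACK's routines.

```c
#define CROSSOVER 4  /* hypothetical value, for illustration only */

/* Unblocked forward substitution: solves L x = b in place (b becomes x).
 * L is n-by-n lower triangular, column-major with leading dimension ldL. */
static void trsv_unblocked(int n, const double *L, int ldL, double *b) {
    for (int j = 0; j < n; j++) {
        b[j] /= L[j + j * ldL];
        for (int i = j + 1; i < n; i++)
            b[i] -= L[i + j * ldL] * b[j];
    }
}

/* Recursive variant: split L into [L11 0; L21 L22] and recurse on the
 * diagonal blocks until the problem size reaches the crossover. */
static void trsv_recursive(int n, const double *L, int ldL, double *b) {
    if (n <= CROSSOVER) {  /* small problem: use the unblocked routine */
        trsv_unblocked(n, L, ldL, b);
        return;
    }
    int n1 = n / 2, n2 = n - n1;
    const double *L21 = L + n1;
    const double *L22 = L + n1 + n1 * ldL;

    trsv_recursive(n1, L, ldL, b);        /* x1 = inv(L11) * b1 */
    for (int j = 0; j < n1; j++)          /* b2 = b2 - L21 * x1 */
        for (int i = 0; i < n2; i++)
            b[n1 + i] -= L21[i + j * ldL] * b[j];
    trsv_recursive(n2, L22, ldL, b + n1); /* x2 = inv(L22) * b2 */
}
```
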
Allowing Temporary Buffers
--------------------------
Two of ReLAPACK's routines make use of temporary buffers, which are allocated
and freed within ReLAPACK. Setting `ALLOW_MALLOC` (or one of the routine
specific counterparts) to `0` in `config.h` will disable these buffers. The
affected routines are:

* `xsytrf`: The LDL decomposition requires a buffer of size n^2 / 2. As in
  LAPACK, this size can be queried by setting `lWork = -1`, and the passed
  buffer will be used if it is large enough; a local buffer is allocated only
  if it is not.

  The advantage of this mechanism is that ReLAPACK will seamlessly work even
  with codes that statically provide too little memory instead of breaking
  them.

* `xsygst`: The reduction of a real symmetric-definite generalized eigenproblem
  to standard form can use an auxiliary buffer of size n^2 / 2 to avoid
  redundant computations. It thereby performs about 30% fewer FLOPs than
  LAPACK.


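The buffer selection described for `xsytrf` (use the caller's buffer if it is
large enough, otherwise allocate locally when allowed) can be sketched as
follows. The helper `choose_work` and its signature are hypothetical and not
part of ReLAPACK; only the flag name `ALLOW_MALLOC` comes from the text. What
a routine does when no buffer is available is left open here, so the sketch
simply returns `NULL` in that case.

```c
#include <stdlib.h>

#define ALLOW_MALLOC 1  /* flag name from the text; 0 disables local buffers */

/* Hypothetical helper: returns the workspace to use for `needed` doubles.
 * *allocated is set to 1 if the buffer was malloc'ed (the caller must then
 * free it) and to 0 otherwise. Returns NULL if no buffer is available. */
static double *choose_work(double *Work, int lWork, int needed,
                           int *allocated) {
    *allocated = 0;
    if (lWork >= needed)
        return Work;  /* the caller's buffer is large enough */
#if ALLOW_MALLOC
    double *local = malloc((size_t)needed * sizeof *local);
    if (local)
        *allocated = 1;
    return local;     /* NULL if the allocation failed */
#else
    return NULL;      /* local buffers are disabled */
#endif
}
```
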
FORTRAN symbol names
--------------------
ReLAPACK is commonly linked to BLAS and LAPACK with standard FORTRAN interfaces.
Since these libraries usually append an underscore to their symbol names,
ReLAPACK has configuration switches in `config.h` to adjust the corresponding
routine names.
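
A switch of this kind is typically implemented with preprocessor token
pasting. The sketch below shows the general technique only; the names
`DEMO_UNDERSCORE`, `DEMO_FORTRAN_NAME`, `DEMO_STR`, and `demo_dgemm_symbol`
are hypothetical and are not the switches defined in ReLAPACK's `config.h`.

```c
/* Hypothetical switch: 1 appends an underscore to routine names. */
#define DEMO_UNDERSCORE 1

#if DEMO_UNDERSCORE
#define DEMO_FORTRAN_NAME(routine) routine ## _
#else
#define DEMO_FORTRAN_NAME(routine) routine
#endif

/* Two-level stringification so the name macro is expanded before quoting. */
#define DEMO_STR_(x) #x
#define DEMO_STR(x) DEMO_STR_(x)

/* Returns the symbol name that DEMO_FORTRAN_NAME(dgemm) expands to. */
static const char *demo_dgemm_symbol(void) {
    return DEMO_STR(DEMO_FORTRAN_NAME(dgemm));
}
```
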