diff --git a/docs/extensions.md b/docs/extensions.md
index 3502f6487..483b00928 100644
--- a/docs/extensions.md
+++ b/docs/extensions.md
@@ -1,4 +1,9 @@
-* BLAS-like extensions
+For the most part, OpenBLAS contains implementations of the reference (Netlib)
+BLAS, CBLAS, LAPACK and LAPACKE interfaces. However, a few OpenBLAS-specific
+functions are also provided; these can mostly be seen as "BLAS extensions".
+This page documents those non-standard APIs and includes some usage sketches.
+
+## BLAS-like extensions
 
 | Routine | Data Types | Description |
 | ------------- |:------------- | :---------------|
@@ -9,20 +14,26 @@
 | ?geadd | s,d,c,z | matrix add |
 | ?gemmt | s,d,c,z | gemm but only a triangular part updated|
 
-* BLAS-like and Conversion functions for bfloat16 (available when OpenBLAS was compiled with BUILD_BFLOAT16=1)
-  * `void cblas_sbstobf16` converts a float array to an array of bfloat16 values by rounding
-  * `void cblas_sbdtobf16` converts a double array to an array of bfloat16 values by rounding
-  * `void cblas_sbf16tos` converts a bfloat16 array to an array of floats
-  * `void cblas_dbf16tod` converts a bfloat16 array to an array of doubles
-  * `float cblas_sbdot` computes the dot product of two bfloat16 arrays
-  * `void cblas_sbgemv` performs the matrix-vector operations of GEMV with the input matrix and X vector as bfloat16
-  * `void cblas_sbgemm` performs the matrix-matrix operations of GEMM with both input arrays containing bfloat16
-* Utility functions
-  * openblas_get_num_threads
-  * openblas_set_num_threads
-  * `int openblas_get_num_procs(void)` returns the number of processors available on the system (may include "hyperthreading cores")
-  * `int openblas_get_parallel(void)` returns 0 for sequential use, 1 for platform-based threading and 2 for OpenMP-based threading
-  * `char * openblas_get_config()` returns the options OpenBLAS was built with, something like `NO_LAPACKE DYNAMIC_ARCH NO_AFFINITY Haswell`
-  * `int openblas_set_affinity(int thread_index, size_t cpusetsize, cpu_set_t *cpuset)` sets the cpu affinity mask of the given thread to the provided cpuset. (Only available under Linux, with semantics identical to pthread_setaffinity_np)
+## bfloat16 functionality
+
+BLAS-like and conversion functions for `bfloat16` (available when OpenBLAS was compiled with `BUILD_BFLOAT16=1`):
+
+* `void cblas_sbstobf16` converts a float array to an array of bfloat16 values by rounding
+* `void cblas_sbdtobf16` converts a double array to an array of bfloat16 values by rounding
+* `void cblas_sbf16tos` converts a bfloat16 array to an array of floats
+* `void cblas_dbf16tod` converts a bfloat16 array to an array of doubles
+* `float cblas_sbdot` computes the dot product of two bfloat16 arrays
+* `void cblas_sbgemv` performs the matrix-vector operations of GEMV with the input matrix and X vector as bfloat16
+* `void cblas_sbgemm` performs the matrix-matrix operations of GEMM with both input arrays containing bfloat16
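+
+To illustrate how the conversion and compute routines above fit together, here is a
+minimal sketch that rounds two float vectors to bfloat16 and computes their dot product.
+It assumes a build with `BUILD_BFLOAT16=1`, the `bfloat16` storage type exposed by the
+OpenBLAS headers, and a (length, source, source stride, destination, destination stride)
+argument order for the conversion routines; check the prototypes in your `cblas.h` if in
+doubt.
+
+```c
+#include <stdio.h>
+#include <cblas.h>   /* OpenBLAS header; provides the bfloat16 typedef when built with BUILD_BFLOAT16=1 */
+
+int main(void) {
+    float x[4] = {1.0f, 2.0f, 3.0f, 4.0f};
+    float y[4] = {0.5f, 0.5f, 0.5f, 0.5f};
+    bfloat16 xb[4], yb[4];
+
+    /* Round the float inputs to bfloat16 (unit stride on both sides). */
+    cblas_sbstobf16(4, x, 1, xb, 1);
+    cblas_sbstobf16(4, y, 1, yb, 1);
+
+    /* Dot product of the two bfloat16 vectors, accumulated and returned as float. */
+    float d = cblas_sbdot(4, xb, 1, yb, 1);
+    printf("dot = %f\n", d);   /* expect 5.0; all inputs are exactly representable in bfloat16 */
+    return 0;
+}
+```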
+
+## Utility functions
+
+* `int openblas_get_num_threads(void)` returns the number of threads OpenBLAS will use for parallel calls
+* `void openblas_set_num_threads(int num_threads)` sets the number of threads OpenBLAS will use for parallel calls
+* `int openblas_get_num_procs(void)` returns the number of processors available on the system (may include "hyperthreading cores")
+* `int openblas_get_parallel(void)` returns 0 for sequential use, 1 for platform-based threading, and 2 for OpenMP-based threading
+* `char * openblas_get_config()` returns the options OpenBLAS was built with, something like `NO_LAPACKE DYNAMIC_ARCH NO_AFFINITY Haswell`
+* `int openblas_set_affinity(int thread_index, size_t cpusetsize, cpu_set_t *cpuset)` sets the CPU affinity mask of the given thread
+  to the provided cpuset. Only available on Linux, with semantics identical to `pthread_setaffinity_np`.
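+
+A minimal sketch of querying and adjusting this runtime information, using the
+prototypes listed above as declared in OpenBLAS's `cblas.h`:
+
+```c
+#include <stdio.h>
+#include <cblas.h>   /* OpenBLAS header declaring the openblas_* utility functions */
+
+int main(void) {
+    printf("build options : %s\n", openblas_get_config());
+    printf("processors    : %d\n", openblas_get_num_procs());
+    printf("parallel mode : %d\n", openblas_get_parallel());  /* 0 = sequential, 1 = platform threads, 2 = OpenMP */
+
+    openblas_set_num_threads(4);   /* request 4 threads for subsequent BLAS calls */
+    printf("threads       : %d\n", openblas_get_num_threads());
+    return 0;
+}
+```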
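+
+For the `?geadd` routine from the table of BLAS-like extensions above, the sketch below
+adds a scaled matrix to another in single precision. The argument order is assumed to be
+`cblas_sgeadd(order, rows, cols, alpha, a, lda, beta, c, ldc)`, computing
+`C := alpha*A + beta*C`; verify this against the declaration in your `cblas.h`.
+
+```c
+#include <stdio.h>
+#include <cblas.h>
+
+int main(void) {
+    /* 2x2 matrices in row-major storage. */
+    float a[4] = {1.0f, 2.0f,
+                  3.0f, 4.0f};
+    float c[4] = {10.0f, 20.0f,
+                  30.0f, 40.0f};
+
+    /* C := 2*A + 1*C, using the assumed argument order described above. */
+    cblas_sgeadd(CblasRowMajor, 2, 2, 2.0f, a, 2, 1.0f, c, 2);
+
+    printf("%g %g\n%g %g\n", c[0], c[1], c[2], c[3]);   /* expect: 12 24 / 36 48 */
+    return 0;
+}
+```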