diff --git a/docs/install/index.html b/docs/install/index.html
index c1a77389a..84c75ae12 100644
--- a/docs/install/index.html
+++ b/docs/install/index.html
@@ -317,31 +317,13 @@
OpenBLAS can be installed through package managers or from source. If you only
+want to use OpenBLAS rather than make changes to it, we recommend installing a
+pre-built binary package with your package manager of choice.
+This page contains an overview of installing with package managers as well as
+from source. For the latter, see further down on this page.
+Note
-Lists of precompiled packages are not comprehensive, is not meant to validate nor endorse a particular third-party build over others, and may not always lead to the newest version
+Almost every package manager provides OpenBLAS packages; the list on this
+page is not comprehensive. If your package manager of choice isn't shown
+here, please search its package database for openblas or libopenblas.
Precompiled packages have recently become available for a number of platforms through their normal installation procedures, so for users of desktop devices at least, the instructions below are mostly relevant when you want to try the most recent development snapshot from git. See your platform's relevant "Precompiled packages" section.
-The Conda-Forge project maintains packages for the conda package manager at https://github.com/conda-forge/openblas-feedstock.
-Download the latest stable version from release page.
-Just type make
to compile the library.
Notes:
-make FC=gfortran
.-cpu host
or a supported model name at invocation)OpenBLAS package is available in default repositories and can act as default BLAS in system
-Example installation commands: -
$ sudo apt update
-$ apt search openblas
+On Linux, OpenBLAS can be installed with the system package manager, or with a
+package manager like Conda
+(or alternative package managers for the conda-forge ecosystem, like
+Mamba,
+Micromamba,
+or Pixi),
+Spack, or Nix. For the latter set of
+tools, the package name in all cases is openblas
. Since package management in
+quite a few of these tools is declarative (i.e., managed by adding openblas
+to a metadata file describing the dependencies for your project or
+environment), we won't attempt to give detailed instructions for these tools here.
+Linux distributions typically split OpenBLAS up into two packages: one containing
+the library itself (typically named openblas
or libopenblas
), and one containing headers,
+pkg-config and CMake files (typically named the same as the package for the
+library with -dev
or -devel
appended; e.g., openblas-devel
). Please keep
+in mind that if you want to install OpenBLAS in order to use it directly in
+your own project, you will need to install both of those packages.
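With both packages installed, a project can consume OpenBLAS through the CMake
config files shipped by the -dev/-devel package. A minimal sketch (the project
and target names here are hypothetical, and the imported target name can differ
between OpenBLAS versions and distros):

```cmake
# Minimal CMakeLists.txt consuming a distro-installed OpenBLAS.
cmake_minimum_required(VERSION 3.11)
project(myapp C)

# OpenBLASConfig.cmake is provided by the -dev/-devel package.
find_package(OpenBLAS CONFIG REQUIRED)

add_executable(myapp main.c)
# Recent upstream config files export OpenBLAS::OpenBLAS; older ones may
# only set OpenBLAS_LIBRARIES / OpenBLAS_INCLUDE_DIRS instead.
target_link_libraries(myapp PRIVATE OpenBLAS::OpenBLAS)
```

Alternatively, the same information can be queried through the installed
pkg-config file with `pkg-config --cflags --libs openblas`.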
+Distro-specific installation commands:
+
+
+
+$ sudo apt update
$ sudo apt install libopenblas-dev
-$ sudo update-alternatives --config libblas.so.3
- Alternatively, if distributor's package proves unsatisfactory, you may try latest version of OpenBLAS, Following guide in OpenBLAS FAQ
-openSuSE/SLE
-Recent OpenSUSE versions include OpenBLAS in default repositories and also permit OpenBLAS to act as replacement of system-wide BLAS.
-Example installation commands:
-
$ sudo zypper ref
-$ zypper se openblas
-$ sudo zypper in openblas-devel
-$ sudo update-alternatives --config libblas.so.3
+OpenBLAS can be configured as the default BLAS through the update-alternatives
mechanism:
+$ sudo update-alternatives --config libblas.so.3
-Should you be using older OpenSUSE or SLE that provides no OpenBLAS, you can attach optional or experimental openSUSE repository as a new package source to acquire recent build of OpenBLAS following instructions on openSUSE software site
-Fedora/CentOS/RHEL
-Fedora provides OpenBLAS in default installation repositories.
-To install it try following:
-
$ dnf search openblas
+
+
+$ sudo zypper refresh
+$ sudo zypper install openblas-devel
+
+OpenBLAS can be configured as the default BLAS through the update-alternatives
mechanism:
+
$ sudo update-alternatives --config libblas.so.3
+
+
+
+$ dnf check-update
$ dnf install openblas-devel
-For CentOS/RHEL/Scientific Linux packages are provided via Fedora EPEL repository
-After adding repository and repository keys installation is pretty straightforward:
-
$ yum search openblas
-$ yum install openblas-devel
+
+Warning
+Fedora does not ship the pkg-config files for OpenBLAS. Instead, it wants you to
+link against FlexiBLAS (which
+uses OpenBLAS by default as its backend on Fedora), which you can install with:
+$ dnf install flexiblas-devel
-No alternatives mechanism is provided for BLAS, and packages in system repositories are linked against NetLib BLAS or ATLAS BLAS libraries. You may wish to re-package RPMs to use OpenBLAS instead as described here
-Mageia
-Mageia offers ATLAS and NetLIB LAPACK in base repositories.
-You can build your own OpenBLAS replacement, and once installed in /opt
-TODO: populate /usr/lib64 /usr/include accurately to replicate netlib with update-alternatives
-Arch/Manjaro/Antergos
+
+For CentOS and RHEL, OpenBLAS packages are provided via the Fedora EPEL repository.
+After adding that repository and its repository keys, you can install
+openblas-devel
with either dnf
or yum
.
+
+
$ sudo pacman -S openblas
+
+
+
Windows
-The precompiled binaries available with each release (in https://github.com/xianyi/OpenBLAS/releases) are
-created with MinGW using an option list of
-"NUM_THREADS=64 TARGET=GENERIC DYNAMIC_ARCH=1 DYNAMIC_OLDER=1 CONSISTENT_FPCSR=1" - they should work on
-any x86 or x86_64 computer. The zip archive contains the include files, static and dll libraries as well
-as configuration files for getting them found via CMAKE or pkgconfig - just create a suitable folder for
-your OpenBLAS installation and unzip it there. (Note that you will need to edit the provided openblas.pc
-and OpenBLASConfig.cmake to reflect the installation path on your computer, as distributed they have "win"
-or "win64" reflecting the local paths on the system they were built on). Some programs will expect the DLL
-name to be lapack.dll, blas.dll, or (in the case of the statistics package "R") even Rblas.dll to act as a
-direct replacement for whatever other implementation of BLAS and LAPACK they use by default. Just copy the
-openblas.dll to the desired name(s).
-Note that the provided binaries are built with INTERFACE64=0, meaning they use standard 32bit integers for
-array indexing and the like (as is the default for most if not all BLAS and LAPACK implementations). If the
-documentation of whatever program you are using with OpenBLAS mentions 64bit integers (INTERFACE64=1) for
-addressing huge matrix sizes, you will need to build OpenBLAS from source (or open an issue ticket to make
-the demand for such a precompiled build known).
-Precompiled packages
-
-Visual Studio
-As of OpenBLAS v0.2.15, we support MinGW and Visual Studio (using CMake to generate visual studio solution files – note that you will need at least version 3.11 of CMake for linking to work correctly) to build OpenBLAS on Windows.
-Note that you need a Fortran compiler if you plan to build and use the LAPACK functions included with OpenBLAS. The sections below describe using either flang
as an add-on to clang/LLVM or gfortran
as part of MinGW for this purpose. If you want to use the Intel Fortran compiler ifort
for this, be sure to also use the Intel C compiler icc
for building the C parts, as the ABI imposed by ifort
is incompatible with msvc
.
-1. Native (MSVC) ABI
-A fully-optimized OpenBLAS that can be statically or dynamically linked to your application can currently be built for the 64-bit architecture with the LLVM compiler infrastructure. We're going to use Miniconda3 to grab all of the tools we need, since some of them are in an experimental status. Before you begin, you'll need to have Microsoft Visual Studio 2015 or newer installed.
+
+
+
+OpenBLAS can be installed with conda
(or mamba
, micromamba
, or
+pixi
) from conda-forge:
+
conda install openblas
+
+Conda-forge provides a method for switching the default BLAS implementation
+used by all packages. To use that for OpenBLAS, install libblas=*=*openblas
+(see the docs on this mechanism
+for more details).
+
+
+OpenBLAS can be installed with vcpkg:
+
# In classic mode:
+vcpkg install openblas
+
+# Or in manifest mode:
+vcpkg add port openblas
+
+
+
+Windows is the only platform for which binaries are made available by the
+OpenBLAS project itself. They can be downloaded from the [GitHub
+Releases](https://github.com/OpenMathLib/OpenBLAS/releases) page. These
+binaries are built with MinGW, using the following build options:
+
NUM_THREADS=64 TARGET=GENERIC DYNAMIC_ARCH=1 DYNAMIC_OLDER=1 CONSISTENT_FPCSR=1 INTERFACE64=0
+
+There are separate packages for x86-64 and x86. The zip archive contains
+the include files, static and shared libraries, as well as configuration
+files for getting them found via CMake or pkg-config. To use these
+binaries, create a suitable folder for your OpenBLAS installation and unzip
+the .zip
bundle there (note that you will need to edit the provided
+openblas.pc
and OpenBLASConfig.cmake
to reflect the installation path
+on your computer, as distributed they have "win" or "win64" reflecting the
+local paths on the system they were built on).
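For illustration, after unzipping to a hypothetical C:\opt\OpenBLAS, the edited
openblas.pc might look like the following (all values below are assumptions for
the sketch, not the shipped contents; the Version field is a placeholder):

```
# openblas.pc, with prefix edited from the build machine's "win64" path
prefix=c:/opt/OpenBLAS
libdir=${prefix}/lib
includedir=${prefix}/include

Name: openblas
Description: OpenBLAS
Version: 0.3.0
Libs: -L${libdir} -lopenblas
Cflags: -I${includedir}
```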
+Note that the same binaries can be downloaded
+from SourceForge; this is
+mostly of historical interest.
+
+
+
+macOS
+To install OpenBLAS with a package manager on macOS, run:
+
+
+
+% brew install openblas
+
+
+
+% sudo port install OpenBLAS-devel
+
+
+
+% conda install openblas
+
+Conda-forge provides a method for switching the default BLAS implementation
+used by all packages. To use that for OpenBLAS, install libblas=*=*openblas
+(see the docs on this mechanism
+for more details).
+
+
+
+FreeBSD
+You can install OpenBLAS from the FreeBSD Ports collection:
+
pkg install openblas
+
+Building from source
+We recommend downloading the latest stable version
+from the GitHub Releases page, or checking it out from a git tag, rather than a
+dev version from the develop
branch.
+
+Tip
+The User manual contains a section with detailed information on compiling OpenBLAS,
+including how to customize builds and how to cross-compile. Please read
+that documentation first. This page contains only platform-specific build
+information, and assumes you already understand the general build system
+invocations to build OpenBLAS, including the specific build options that
+control multi-threading and other non-platform-specific behavior.
+
+Linux and macOS
+Ensure you have C and Fortran compilers installed, then simply type make
to compile the library.
+There are no other build dependencies, nor unusual platform-specific
+environment variables to set or other system setup to do.
+
+Note
+When building in an emulator (KVM, QEMU, etc.), please make sure that the combination of CPU features exposed to
+the virtual environment matches that of an existing CPU to allow detection of the CPU model to succeed.
+(With qemu
, this can be done by passing -cpu host
or a supported model name at invocation).
+
+Windows
+We support building OpenBLAS with either MinGW or Visual Studio on Windows.
+Using MSVC will yield an OpenBLAS build with the Windows platform-native ABI.
+Using MinGW will yield the GNU ABI instead. We'll describe both methods in detail
+in this section, since the process for each is quite different.
+Visual Studio & native Windows ABI
+For Visual Studio, you can use CMake to generate Visual Studio solution files;
+note that you will need at least CMake 3.11 for linking to work correctly.
+Note that you need a Fortran compiler if you plan to build and use the LAPACK
+functions included with OpenBLAS. The sections below describe using either
+flang
as an add-on to clang/LLVM or gfortran
as part of MinGW for this
+purpose. If you want to use the Intel Fortran compiler (ifort
or ifx
) for
+this, be sure to also use the Intel C compiler (icc
or icx
) for building
+the C parts, as the ABI imposed by ifort is incompatible with MSVC.
+A fully-optimized OpenBLAS that can be statically or dynamically linked to your
+application can currently be built for the 64-bit architecture with the LLVM
+compiler infrastructure. We're going to use Miniconda3
+to grab all of the tools we need, since some of them are in an experimental
+status. Before you begin, you'll need to have Microsoft Visual Studio 2015 or
+newer installed.
-- Install Miniconda3 for 64 bits using
winget install --id Anaconda.Miniconda3
or easily download from conda.io.
-- Open the "Anaconda Command Prompt," now available in the Start Menu, or at
%USERPROFILE%\miniconda3\shell\condabin\conda-hook.ps1
.
-- In that command prompt window, use
cd
to change to the directory where you want to build OpenBLAS
-- Now install all of the tools we need:
-
-conda update -n base conda
+- Install Miniconda3 for 64-bit Windows using
winget install --id Anaconda.Miniconda3
,
+ or easily download from conda.io.
+- Open the "Anaconda Command Prompt" now available in the Start Menu, or at
%USERPROFILE%\miniconda3\shell\condabin\conda-hook.ps1
.
+- In that command prompt window, use
cd
to change to the directory where you want to build OpenBLAS.
+- Now install all of the tools we need:
+
conda update -n base conda
conda config --add channels conda-forge
conda install -y cmake flang clangdev perl libflang ninja
-
-
-- Still in the Anaconda Command Prompt window, activate the MSVC environment for 64 bits with
vcvarsall x64
. On Windows 11 with Visual Studio 2022, this would be done by invoking:
-
+
+-
+
Still in the Anaconda Command Prompt window, activate the 64-bit MSVC environment with vcvarsall x64
.
+ On Windows 11 with Visual Studio 2022, this would be done by invoking:
"c:\Program Files\Microsoft Visual Studio\2022\Preview\vc\Auxiliary\Build\vcvars64.bat"
-With VS2019, the command should be the same – except for the year number, obviously. For other/older versions of MSVC,
- the VS documentation or a quick search on the web should turn up the exact wording you need.
-Confirm that the environment is active by typing link
– this should return a long list of possible options for the link
command. If it just
- returns "command not found" or similar, review and retype the call to vcvars64.bat.
- NOTE: if you are working from a Visual Studio Command prompt window instead (so that you do not have to do the vcvars call), you need to invoke
- conda activate
so that CONDA_PREFIX etc. get set up correctly before proceeding to step 6. Failing to do so will lead to link errors like
- libflangmain.lib not getting found later in the build.
-
-- Now configure the project with CMake. Starting in the project directory, execute the following:
-
-set "LIB=%CONDA_PREFIX%\Library\lib;%LIB%"
+With VS2019, the command should be the same (except for the year number of course).
+For other versions of MSVC, please check the Visual Studio documentation for
+exactly how to invoke the vcvars64.bat
script.
+Confirm that the environment is active by typing link
. This should return
+a long list of possible options for the link
command. If it just returns
+"command not found" or similar, review and retype the call to vcvars64.bat
.
+
+Note
+If you are working from a Visual Studio command prompt window instead
+(so that you do not have to do the vcvars
call), you need to invoke
+conda activate
so that CONDA_PREFIX
etc. get set up correctly before
+proceeding to step 6. Failing to do so will lead to link errors like
+libflangmain.lib
not getting found later in the build.
+
+
+-
+
Now configure the project with CMake. Starting in the project directory, execute the following:
+
set "LIB=%CONDA_PREFIX%\Library\lib;%LIB%"
set "CPATH=%CONDA_PREFIX%\Library\include;%CPATH%"
mkdir build
cd build
cmake .. -G "Ninja" -DCMAKE_CXX_COMPILER=clang-cl -DCMAKE_C_COMPILER=clang-cl -DCMAKE_Fortran_COMPILER=flang -DCMAKE_MT=mt -DBUILD_WITHOUT_LAPACK=no -DNOFORTRAN=0 -DDYNAMIC_ARCH=ON -DCMAKE_BUILD_TYPE=Release
-
-You may want to add further options in the cmake
command here – for instance, the default only produces a static .lib version of the library. If you would rather have a DLL, add -DBUILD_SHARED_LIBS=ON above. Note that this step only creates some command files and directories, the actual build happens next.
-
-- Build the project:
-
+
+You may want to add further options in the cmake
command here. For
+instance, the default only produces a static .lib
version of the library.
+If you would rather have a DLL, add -DBUILD_SHARED_LIBS=ON
above. Note that
+this step only creates some command files and directories, the actual build
+happens next.
+
+-
+
Build the project:
cmake --build . --config Release
- This step will create the OpenBLAS library in the "lib" directory, and various build-time tests in the test
, ctest
and openblas_utest
directories. However it will not separate the header files you might need for building your own programs from those used internally. To put all relevant files in a more convenient arrangement, run the next step.
-
-- Install all relevant files created by the build
-
+This step will create the OpenBLAS library in the lib
directory, and
+various build-time tests in the test
, ctest
and openblas_utest
+directories. However, it will not separate the header files you might need
+for building your own programs from those used internally. To put all
+relevant files in a more convenient arrangement, run the next step.
+
+-
+
Install all relevant files created by the build:
cmake --install . --prefix c:\opt -v
- This will copy all files that are needed for building and running your own programs with OpenBLAS to the given location, creating appropriate subdirectories for the individual kinds of files. In the case of "C:\opt" as given above, this would be C:\opt\include\openblas for the header files,
- C:\opt\bin for the libopenblas.dll and C:\opt\lib for the static library. C:\opt\share holds various support files that enable other cmake-based build scripts to find OpenBLAS automatically.
-Visual studio 2017+ (C++2017 standard)
-In newer visual studio versions, Microsoft has changed how it handles complex types. Even when using a precompiled version of OpenBLAS, you might need to define LAPACK_COMPLEX_CUSTOM
in order to define complex types properly for MSVC. For example, some variant of the following might help:
-#if defined(_MSC_VER)
- #include <complex.h>
- #define LAPACK_COMPLEX_CUSTOM
- #define lapack_complex_float _Fcomplex
- #define lapack_complex_double _Dcomplex
-#endif
+This will copy all files that are needed for building and running your own
+programs with OpenBLAS to the given location, creating appropriate
+subdirectories for the individual kinds of files. In the case of C:\opt
as
+given above, this would be:
+
+C:\opt\include\openblas
for the header files,
+C:\opt\bin
for the libopenblas.dll
shared library,
+C:\opt\lib
for the static library, and
+C:\opt\share
holds various support files that enable other cmake-based
+ build scripts to find OpenBLAS automatically.
+
+
+
+
+Change in complex types for Visual Studio 2017 and up
+In newer Visual Studio versions, Microsoft has changed
+how it handles complex types.
+Even when using a precompiled version of OpenBLAS, you might need to define
+LAPACK_COMPLEX_CUSTOM
in order to define complex types properly for MSVC.
+For example, some variant of the following might help:
+#if defined(_MSC_VER)
+ #include <complex.h>
+ #define LAPACK_COMPLEX_CUSTOM
+ #define lapack_complex_float _Fcomplex
+ #define lapack_complex_double _Dcomplex
+#endif
-For reference, see https://github.com/xianyi/OpenBLAS/issues/3661, https://github.com/Reference-LAPACK/lapack/issues/683, and https://stackoverflow.com/questions/47520244/using-openblas-lapacke-in-visual-studio.
-CMake and Visual Studio
-To build OpenBLAS for the 32-bit architecture, you'll need to use the builtin Visual Studio compilers.
-
-Note
-This method may produce binaries which demonstrate significantly lower performance than those built with the other methods. (The Visual Studio compiler does not support the dialect of assembly used in the cpu-specific optimized files, so only the "generic" TARGET which is
-written in pure C will get built. For the same reason it is not possible (and not necessary) to use -DDYNAMIC_ARCH=ON in a Visual Studio build) You may consider building for the 32-bit architecture using the GNU (MinGW) ABI.
+For reference, see
+openblas#3661,
+lapack#683, and
+this Stack Overflow question.
-# 1. Install CMake at Windows
-# 2. Use CMake to generate Visual Studio solution files
+
+Building 32-bit binaries with MSVC
+This method may produce binaries which demonstrate significantly lower
+performance than those built with the other methods. The Visual Studio
+compiler does not support the dialect of assembly used in the cpu-specific
+optimized files, so only the "generic" TARGET
which is written in pure C
+will get built. For the same reason it is not possible (and not necessary)
+to use -DDYNAMIC_ARCH=ON
in a Visual Studio build. You may consider
+building for the 32-bit architecture using the GNU (MinGW) ABI instead.
+
+CMake & Visual Studio integration
+To generate Visual Studio solution files, ensure CMake is installed and then run:
# Do this from PowerShell so CMake can find Visual Studio
cmake -G "Visual Studio 14 Win64" -DCMAKE_BUILD_TYPE=Release .
-
-Build the solution at Visual Studio
-Note that this step depends on perl, so you'll need to install perl for windows, and put perl on your path so VS can start perl (http://stackoverflow.com/questions/3051049/active-perl-installation-on-windows-operating-system).
-Step 2 will build the OpenBLAS solution, open it in VS, and build the projects. Note that the dependencies do not seem to be automatically configured: if you try to build libopenblas directly, it will fail with a message saying that some .obj files aren't found, but if you build the projects libopenblas depends on before building libopenblas, the build will succeed.
+
+To then build OpenBLAS using those solution files from within Visual Studio, we
+also need Perl. Please install it and ensure it's on the PATH
(see, e.g.,
+this Stack Overflow question for how).
+If you build from within Visual Studio, the dependencies may not be
+automatically configured: if you try to build libopenblas
directly, it may
+fail with a message saying that some .obj
files aren't found. If this
+happens, you can work around the problem by building the projects that
+libopenblas
depends on before building libopenblas
itself.
Build OpenBLAS for Universal Windows Platform
-OpenBLAS can be built for use on the Universal Windows Platform using a two step process since commit c66b842.
-# 1. Follow steps 1 and 2 above to build the Visual Studio solution files for Windows. This builds the helper executables which are required when building the OpenBLAS Visual Studio solution files for UWP in step 2.
-# 2. Remove the generated CMakeCache.txt and CMakeFiles directory from the OpenBLAS source directory and re-run CMake with the following options:
-# do this to build UWP compatible solution files
+OpenBLAS can be built targeting Universal Windows Platform
+(UWP) like this:
+
+- Follow the steps above to build the Visual Studio solution files for
+ Windows. This builds the helper executables which are required when building
+ the OpenBLAS Visual Studio solution files for UWP in step 2.
+-
+
Remove the generated CMakeCache.txt
and the CMakeFiles
directory from
+ the OpenBLAS source directory, then re-run CMake with the following options:
+# do this to build UWP compatible solution files
cmake -G "Visual Studio 14 Win64" -DCMAKE_SYSTEM_NAME=WindowsStore -DCMAKE_SYSTEM_VERSION="10.0" -DCMAKE_SYSTEM_PROCESSOR=AMD64 -DVS_WINRT_COMPONENT=TRUE -DCMAKE_BUILD_TYPE=Release .
-# Build the solution with Visual Studio
-This will build the OpenBLAS binaries with the required settings for use with UWP.
-2. GNU (MinGW) ABI
-The resulting library can be used in Visual Studio, but it can only be linked dynamically. This configuration has not been thoroughly tested and should be considered experimental.
-Incompatible x86 calling conventions
-Due to incompatibilities between the calling conventions of MinGW and Visual Studio you will need to make the following modifications ( 32-bit only ):
-
-- Use the newer GCC 4.7.0. The older GCC (<4.7.0) has an ABI incompatibility for returning aggregate structures larger than 8 bytes with MSVC.
-
-Build OpenBLAS on Windows OS
-
-- Install the MinGW (GCC) compiler suite, either 32-bit (http://www.mingw.org/) or 64-bit (http://mingw-w64.sourceforge.net/). Be sure to install its gfortran package as well (unless you really want to build the BLAS part of OpenBLAS only) and check that gcc and gfortran are the same version – mixing compilers from different sources or release versions can lead to strange error messages in the linking stage. In addition, please install MSYS with MinGW.
-- Build OpenBLAS in the MSYS shell. Usually, you can just type "make". OpenBLAS will detect the compiler and CPU automatically.
-- After the build is complete, OpenBLAS will generate the static library "libopenblas.a" and the shared dll library "libopenblas.dll" in the folder. You can type "make PREFIX=/your/installation/path install" to install the library to a certain location.
+- Now build the solution with Visual Studio.
+
+MinGW & GNU ABI
Note
-We suggest using official MinGW or MinGW-w64 compilers. A user reported that s/he met Unhandled exception
by other compiler suite. https://groups.google.com/forum/#!topic/openblas-users/me2S4LkE55w
+The resulting library from building with MinGW as described below can be
+used in Visual Studio, but it can only be linked dynamically. This
+configuration has not been thoroughly tested and should be considered
+experimental.
-Note also that older versions of the alternative builds of mingw-w64 available through http://www.msys2.org may contain a defect that leads to a compilation failure accompanied by the error message
-
<command-line>:0:4: error: expected identifier or '(' before numeric constant
-
-If you encounter this, please upgrade your msys2 setup or see https://github.com/xianyi/OpenBLAS/issues/1503 for a workaround.
-Generate import library (before 0.2.10 version)
+To build OpenBLAS on Windows with MinGW:
-- First, you will need to have the
lib.exe
tool in the Visual Studio command prompt.
-- Open the command prompt and type
cd OPENBLAS_TOP_DIR/exports
, where OPENBLAS_TOP_DIR is the main folder of your OpenBLAS installation.
-- For a 32-bit library, type
lib /machine:i386 /def:libopenblas.def
. For 64-bit, type lib /machine:X64 /def:libopenblas.def
.
-- This will generate the import library "libopenblas.lib" and the export library "libopenblas.exp" in OPENBLAS_TOP_DIR/exports. Although these two files have the same name, they are totally different.
+- Install the MinGW (GCC) compiler suite, either the 32-bit
+ [MinGW]((http://www.mingw.org/) or the 64-bit
+ MinGW-w64 toolchain. Be sure to install
+ its
gfortran
package as well (unless you really want to build the BLAS part
+ of OpenBLAS only) and check that gcc
and gfortran
are the same version.
+ In addition, please install MSYS2 with MinGW.
+- Build OpenBLAS in the MSYS2 shell. Usually, you can just type
make
.
+ OpenBLAS will detect the compiler and CPU automatically.
+- After the build is complete, OpenBLAS will generate the static library
+
libopenblas.a
and the shared library libopenblas.dll
in the folder. You
+ can type make PREFIX=/your/installation/path install
to install the
+ library to a certain location.
-Generate import library (0.2.10 and after version)
+Note that OpenBLAS will generate the import library libopenblas.dll.a
for
+libopenblas.dll
by default.
+If you want to generate Windows-native PDB files from a MinGW build, you can
+use the cv2pdb tool to do so.
+To then use the built OpenBLAS shared library in Visual Studio:
-- OpenBLAS already generated the import library "libopenblas.dll.a" for "libopenblas.dll".
+- Copy the import library (
OPENBLAS_TOP_DIR/libopenblas.dll.a
) and the
+ shared library (libopenblas.dll
) into the same folder (the
+ folder of your project that is going to use the BLAS library). You may need
+ to add libopenblas.dll.a to the linker input list
+ (Properties->Linker->Input).
+- Please follow the Visual Studio documentation about using third-party .dll
+ libraries, and make sure to link against a library for the correct
+ architecture.1
+- If you need CBLAS, you should include
cblas.h
in
+ /your/installation/path/include
in Visual Studio. Please see
+ openblas#95 for more details.
-generate windows native PDB files from gcc/gfortran build
-Tool to do so is available at https://github.com/rainers/cv2pdb
-Use OpenBLAS .dll library in Visual Studio
-
-- Copy the import library (before 0.2.10: "OPENBLAS_TOP_DIR/exports/libopenblas.lib", 0.2.10 and after: "OPENBLAS_TOP_DIR/libopenblas.dll.a") and .dll library "libopenblas.dll" into the same folder(The folder of your project that is going to use the BLAS library. You may need to add the libopenblas.dll.a to the linker input list: properties->Linker->Input).
-- Please follow the documentation about using third-party .dll libraries in MS Visual Studio 2008 or 2010. Make sure to link against a library for the correct architecture. For example, you may receive an error such as "The application was unable to start correctly (0xc000007b)" which typically indicates a mismatch between 32/64-bit libraries.
-
-
-Note
-If you need CBLAS, you should include cblas.h in /your/installation/path/include in Visual Studio. Please read this page.
-
-Limitations
+
+Limitations of using the MinGW build within Visual Studio
-- Both static and dynamic linking are supported with MinGW. With Visual Studio, however, only dynamic linking is supported and so you should use the import library.
-- Debugging from Visual Studio does not work because MinGW and Visual Studio have incompatible formats for debug information (PDB vs. DWARF/STABS). You should either debug with GDB on the command-line or with a visual frontend, for instance Eclipse or Qt Creator.
+- Both static and dynamic linking are supported with MinGW. With Visual
+ Studio, however, only dynamic linking is supported and so you should use
+ the import library.
+- Debugging from Visual Studio does not work because MinGW and Visual
+ Studio have incompatible formats for debug information (PDB vs.
+ DWARF/STABS). You should either debug with GDB on the command line or
+ with a visual frontend, for instance Eclipse or
+ Qt Creator.
+
Windows on Arm
-Prerequisites
-Following tools needs to be installed
-1. Download and install clang for windows on arm
-Find the latest LLVM build for WoA from LLVM release page
-E.g: LLVM 12 build for WoA64 can be found here
-Run the LLVM installer and ensure that LLVM is added to environment PATH.
-2. Download and install classic flang for windows on arm
-Classic flang is the only available FORTRAN compiler for windows on arm for now and a pre-release build can be found here
-There is no installer for classic flang and the zip package can be extracted and the path needs to be added to environment PATH.
-E.g: on PowerShell
-$env:Path += ";C:\flang_woa\bin"
-
-Build
-The following steps describe how to build the static library for OpenBLAS with and without LAPACK
-1. Build OpenBLAS static library with BLAS and LAPACK routines with Make
-Following command can be used to build OpenBLAS static library with BLAS and LAPACK routines
+The following tools need to be installed to build for Windows on Arm (WoA):
+
+- Download and install Clang for Windows on Arm.
+ Find the latest LLVM build for WoA from LLVM release page.
+ E.g., the LLVM 12 build for WoA64 can be found here.
+ Run the LLVM installer and ensure that LLVM is added to environment PATH.
+- Download and install classic Flang for Windows on Arm.
+ Classic Flang is the only available Fortran compiler for Windows on Arm for now.
+ A pre-release build can be found here
+ There is no installer for classic flang and the zip package can be
+ extracted and the path needs to be added to environment
PATH
.
+ E.g., in PowerShell:
+ $env:Path += ";C:\flang_woa\bin"
+
+
+The following steps describe how to build the static library for OpenBLAS with and without LAPACK:
+
+- Build the OpenBLAS static library with BLAS and LAPACK routines, with Make:
$ make CC="clang-cl" HOSTCC="clang-cl" AR="llvm-ar" BUILD_WITHOUT_LAPACK=0 NOFORTRAN=0 DYNAMIC_ARCH=0 TARGET=ARMV8 ARCH=arm64 BINARY=64 USE_OPENMP=0 PARALLEL=1 RANLIB="llvm-ranlib" MAKE=make F_COMPILER=FLANG FC=FLANG FFLAGS_NOOPT="-march=armv8-a -cpp" FFLAGS="-march=armv8-a -cpp" NEED_PIC=0 HOSTARCH=arm64 libs netlib
-2. Build static library with BLAS routines using CMake
-Classic flang has compatibility issues with cmake hence only BLAS routines can be compiled with CMake
+
+- Build the static library with BLAS routines only, using CMake.
+  Classic Flang has compatibility issues with CMake, hence only the BLAS
+  routines can be compiled with CMake:
$ mkdir build
$ cd build
$ cmake .. -G Ninja -DCMAKE_C_COMPILER=clang -DBUILD_WITHOUT_LAPACK=1 -DNOFORTRAN=1 -DDYNAMIC_ARCH=0 -DTARGET=ARMV8 -DARCH=arm64 -DBINARY=64 -DUSE_OPENMP=0 -DCMAKE_SYSTEM_PROCESSOR=ARM64 -DCMAKE_CROSSCOMPILING=1 -DCMAKE_SYSTEM_NAME=Windows
$ cmake --build . --config Release
-getarch.exe
execution error
-If you notice that platform-specific headers by getarch.exe
are not generated correctly, It could be due to a known debug runtime DLL issue for arm64 platforms. Please check out link for the workaround.
-MinGW import library
-Microsoft Windows has this thing called "import libraries". You don't need it in MinGW because the ld
linker from GNU Binutils is smart, but you may still want it for whatever reason.
-Make the .def
-Import libraries are compiled from a list of what symbols to use, .def
. This should be already in your exports
directory: cd OPENBLAS_TOP_DIR/exports
.
-Making a MinGW import library
-MinGW import libraries have the suffix .a
, same as static libraries. (It's actually more common to do .dll.a
...)
-You need to first prepend libopenblas.def
with a line LIBRARY libopenblas.dll
:
-cat <(echo "LIBRARY libopenblas.dll") libopenblas.def > libopenblas.def.1
+
+
+
+getarch.exe execution error
+If you notice that the platform-specific headers produced by getarch.exe are
+not generated correctly, this could be due to a known debug runtime DLL issue
+on arm64 platforms. Please check out this page for a workaround.
+
+Generating an import library
+Microsoft Windows has the concept of "import libraries". You need one for
+MSVC; you don't need one for MinGW, because its ld linker is smart enough.
+However, you may still want one for other reasons, so we'll describe the
+process for both MSVC and MinGW.
+Import libraries are compiled from a list of the symbols to export, contained
+in a .def file. A .def file should already be present in the exports
+directory under the top-level OpenBLAS directory after you've run a build.
+In your shell, move to this directory: cd exports.
+
+
+
+Unlike MinGW, MSVC absolutely requires an import library. The C ABI of MSVC
+and MinGW is identical, so linking works fine (any incompatibility in the
+C ABI would be a bug).
+MSVC import libraries have the suffix .lib. They are generated from a .def
+file using MSVC's lib.exe. See the MSVC instructions.
+
+MinGW import libraries have the suffix .a, just like static libraries.
+Our goal is to produce the file libopenblas.dll.a.
+You first need to insert the line LIBRARY libopenblas.dll at the top of
+libopenblas.def:
+
cat <(echo "LIBRARY libopenblas.dll") libopenblas.def > libopenblas.def.1
mv libopenblas.def.1 libopenblas.def
-
-Now it probably looks like:
-LIBRARY libopenblas.dll
+
+Now the .def file probably looks like:
+
LIBRARY libopenblas.dll
EXPORTS
caxpy=caxpy_ @1
caxpy_=caxpy_ @2
...
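The cat <(echo ...) command above relies on Bash process substitution; in a plain POSIX shell, the same prepend can be done with cat - instead. A sketch (run it inside the exports directory; the stand-in .def creation is only so the snippet is self-contained):

```shell
# Prepend the LIBRARY line without process substitution (plain POSIX sh).
# For illustration only: create a tiny stand-in .def if none is present.
[ -f libopenblas.def ] || printf 'EXPORTS\ncaxpy=caxpy_ @1\n' > libopenblas.def

printf 'LIBRARY libopenblas.dll\n' | cat - libopenblas.def > libopenblas.def.new
mv libopenblas.def.new libopenblas.def
```

This avoids the Bash-only <(...) syntax, so it also works under /bin/sh.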
-
-Then, generate the import library: dlltool -d libopenblas.def -l libopenblas.a
-Again, there is basically no point in making an import library for use in MinGW. It actually slows down linking.
-Making a MSVC import library
-Unlike MinGW, MSVC absolutely requires an import library. Now the C ABI of MSVC and MinGW are actually identical, so linking is actually okay. (Any incompatibility in the C ABI would be a bug.)
-The import libraries of MSVC have the suffix .lib
. They are generated from a .def
file using MSVC's lib.exe
. See the MSVC instructions.
-Notes
-
-- Always remember that MinGW is not the same as MSYS2 or Cygwin. MSYS2 and Cygwin are full POSIX environments with a lot of magic such as
fork()
and its own malloc()
. MinGW, which builds on the normal Microsoft C Runtime, has none of that. Be clear about which one you are building for.
-
-Mac OSX
-If your CPU is Sandy Bridge, please use Clang version 3.1 and above. The Clang 3.0 will generate the wrong AVX binary code of OpenBLAS.
-Precompiled packages
-https://www.macports.org/ports.php?by=name&substr=openblas
-brew install openblas
-or using the conda package manager from
-https://github.com/conda-forge/miniforge#download
-(which also has packages for the new M1 cpu)
-conda install openblas
-Build on Apple M1
-On newer versions of Xcode and on arm64, you might need to compile with a newer macOS target (11.0) than the default (10.8) with MACOSX_DEPLOYMENT_TARGET=11.0
, or switch your command-line tools to use an older SDK (e.g., 13.1).
-
-- without Fortran compiler (cannot build LAPACK)
-
$ make CC=cc NOFORTRAN=1
-
-- with Fortran compiler (you could
brew install gfortran
) https://github.com/xianyi/OpenBLAS/issues/3032
- $ export MACOSX_DEPLOYMENT_TARGET=11.0
-$ make CC=cc FC=gfortran
-
-
+
+Then, generate the import library: dlltool -d libopenblas.def -l libopenblas.dll.a
+Again, there is basically no point in making an import library for use in MinGW. It actually slows down linking.
+
+
+
Android
-Prerequisites
-In addition to the Android NDK, you will need both Perl and a C compiler on the build host as these are currently
-required by the OpenBLAS build environment.
-Building with android NDK using clang compiler
-Around version 11 Android NDKs stopped supporting gcc, so you would need to use clang to compile OpenBLAS. clang is supported from OpenBLAS 0.2.20 version onwards. See below sections on how to build with clang for ARMV7 and ARMV8 targets. The same basic principles as described below for ARMV8 should also apply to building an x86 or x86_64 version (substitute something like NEHALEM for the target instead of ARMV8 and replace all the aarch64 in the toolchain paths obviously)
-"Historic" notes:
-Since version 19 the default toolchain is provided as a standalone toolchain, so building one yourself following building a standalone toolchain should no longer be necessary.
-If you want to use static linking with an old NDK version older than about r17, you need to choose an API level below 23 currently due to NDK bug 272 (https://github.com/android-ndk/ndk/issues/272 , the libc.a lacks a definition of stderr) that will probably be fixed in r17 of the NDK.
-Build ARMV7 with clang
-## Set path to ndk-bundle
-export NDK_BUNDLE_DIR=/path/to/ndk-bundle
+To build OpenBLAS for Android, you will need the following tools installed on your machine:
+
+- The Android NDK
+- Perl
+- A C compiler, e.g. Clang, on the build host
+
+The next two sections describe how to build with Clang for ARMV7 and ARMV8
+targets, respectively. The same basic principles should also apply to
+building an x86 or x86-64 version (substitute something like NEHALEM for the
+target instead of ARMV8, and replace all the aarch64 in the toolchain paths
+with x86 or x86_64 as appropriate).
+
+Historic note
+Since NDK version 19, the default toolchain is provided as a standalone
+toolchain, so building one yourself following
+building a standalone toolchain
+should no longer be necessary.
+
+Building for ARMV7
+# Set path to ndk-bundle
+export NDK_BUNDLE_DIR=/path/to/ndk-bundle
-## Set the PATH to contain paths to clang and arm-linux-androideabi-* utilities
-export PATH=${NDK_BUNDLE_DIR}/toolchains/arm-linux-androideabi-4.9/prebuilt/linux-x86_64/bin:${NDK_BUNDLE_DIR}/toolchains/llvm/prebuilt/linux-x86_64/bin:$PATH
+# Set the PATH to contain paths to clang and arm-linux-androideabi-* utilities
+export PATH=${NDK_BUNDLE_DIR}/toolchains/arm-linux-androideabi-4.9/prebuilt/linux-x86_64/bin:${NDK_BUNDLE_DIR}/toolchains/llvm/prebuilt/linux-x86_64/bin:$PATH
-## Set LDFLAGS so that the linker finds the appropriate libgcc
-export LDFLAGS="-L${NDK_BUNDLE_DIR}/toolchains/arm-linux-androideabi-4.9/prebuilt/linux-x86_64/lib/gcc/arm-linux-androideabi/4.9.x"
+# Set LDFLAGS so that the linker finds the appropriate libgcc
+export LDFLAGS="-L${NDK_BUNDLE_DIR}/toolchains/arm-linux-androideabi-4.9/prebuilt/linux-x86_64/lib/gcc/arm-linux-androideabi/4.9.x"
-## Set the clang cross compile flags
-export CLANG_FLAGS="-target arm-linux-androideabi -marm -mfpu=vfp -mfloat-abi=softfp --sysroot ${NDK_BUNDLE_DIR}/platforms/android-23/arch-arm -gcc-toolchain ${NDK_BUNDLE_DIR}/toolchains/arm-linux-androideabi-4.9/prebuilt/linux-x86_64/"
+# Set the clang cross compile flags
+export CLANG_FLAGS="-target arm-linux-androideabi -marm -mfpu=vfp -mfloat-abi=softfp --sysroot ${NDK_BUNDLE_DIR}/platforms/android-23/arch-arm -gcc-toolchain ${NDK_BUNDLE_DIR}/toolchains/arm-linux-androideabi-4.9/prebuilt/linux-x86_64/"
-#OpenBLAS Compile
-make TARGET=ARMV7 ONLY_CBLAS=1 AR=ar CC="clang ${CLANG_FLAGS}" HOSTCC=gcc ARM_SOFTFP_ABI=1 -j4
+#OpenBLAS Compile
+make TARGET=ARMV7 ONLY_CBLAS=1 AR=ar CC="clang ${CLANG_FLAGS}" HOSTCC=gcc ARM_SOFTFP_ABI=1 -j4
-On a Mac, it may also be necessary to give the complete path to the ar
utility in the make command above, like so:
-AR=${NDK_BUNDLE_DIR}/toolchains/arm-linux-androideabi-4.9/prebuilt/darwin-x86_64/bin/arm-linux-androideabi-gcc-ar
+On macOS, it may also be necessary to give the complete path to the ar
+utility in the make command above, like so:
+
AR=${NDK_BUNDLE_DIR}/toolchains/arm-linux-androideabi-4.9/prebuilt/darwin-x86_64/bin/arm-linux-androideabi-gcc-ar
-otherwise you may get a linker error complaining about a "malformed archive header name at 8" when the native OSX ar command was invoked instead.
-Build ARMV8 with clang
-## Set path to ndk-bundle
-export NDK_BUNDLE_DIR=/path/to/ndk-bundle/
+Otherwise you may get a linker error complaining about "malformed archive
+header name at 8", which indicates that the native macOS ar command was
+invoked instead.
+Building for ARMV8
+# Set path to ndk-bundle
+export NDK_BUNDLE_DIR=/path/to/ndk-bundle/
-## Export PATH to contain directories of clang and aarch64-linux-android-* utilities
-export PATH=${NDK_BUNDLE_DIR}/toolchains/aarch64-linux-android-4.9/prebuilt/linux-x86_64/bin/:${NDK_BUNDLE_DIR}/toolchains/llvm/prebuilt/linux-x86_64/bin:$PATH
+# Export PATH to contain directories of clang and aarch64-linux-android-* utilities
+export PATH=${NDK_BUNDLE_DIR}/toolchains/aarch64-linux-android-4.9/prebuilt/linux-x86_64/bin/:${NDK_BUNDLE_DIR}/toolchains/llvm/prebuilt/linux-x86_64/bin:$PATH
-## Setup LDFLAGS so that loader can find libgcc and pass -lm for sqrt
-export LDFLAGS="-L${NDK_BUNDLE_DIR}/toolchains/aarch64-linux-android-4.9/prebuilt/linux-x86_64/lib/gcc/aarch64-linux-android/4.9.x -lm"
+# Setup LDFLAGS so that loader can find libgcc and pass -lm for sqrt
+export LDFLAGS="-L${NDK_BUNDLE_DIR}/toolchains/aarch64-linux-android-4.9/prebuilt/linux-x86_64/lib/gcc/aarch64-linux-android/4.9.x -lm"
-## Setup the clang cross compile options
-export CLANG_FLAGS="-target aarch64-linux-android --sysroot ${NDK_BUNDLE_DIR}/platforms/android-23/arch-arm64 -gcc-toolchain ${NDK_BUNDLE_DIR}/toolchains/aarch64-linux-android-4.9/prebuilt/linux-x86_64/"
+# Setup the clang cross compile options
+export CLANG_FLAGS="-target aarch64-linux-android --sysroot ${NDK_BUNDLE_DIR}/platforms/android-23/arch-arm64 -gcc-toolchain ${NDK_BUNDLE_DIR}/toolchains/aarch64-linux-android-4.9/prebuilt/linux-x86_64/"
-## Compile
-make TARGET=ARMV8 ONLY_CBLAS=1 AR=ar CC="clang ${CLANG_FLAGS}" HOSTCC=gcc -j4
+# Compile
+make TARGET=ARMV8 ONLY_CBLAS=1 AR=ar CC="clang ${CLANG_FLAGS}" HOSTCC=gcc -j4
-Note: Using TARGET=CORTEXA57 in place of ARMV8 will pick up better optimized routines. Implementations for CORTEXA57 target is compatible with all other armv8 targets.
-Note: For NDK 23b, something as simple as
-
export PATH=/opt/android-ndk-r23b/toolchains/llvm/prebuilt/linux-x86_64/bin/:$PATH
-make HOSTCC=gcc CC=/opt/android-ndk-r23b/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android31-clang ONLY_CBLAS=1 TARGET=ARMV8
+Note: using TARGET=CORTEXA57 in place of ARMV8 will pick up better-optimized
+routines. Implementations for the CORTEXA57 target are compatible with all
+other ARMV8 targets.
+Note: for NDK 23b, something as simple as:
+
export PATH=/opt/android-ndk-r23b/toolchains/llvm/prebuilt/linux-x86_64/bin/:$PATH
+make HOSTCC=gcc CC=/opt/android-ndk-r23b/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android31-clang ONLY_CBLAS=1 TARGET=ARMV8
appears to be sufficient on Linux.
-Alternative script which was tested on OSX with NDK(21.3.6528147)
-This script will build openblas for 3 architecture (ARMV7,ARMV8,X86) and put them with sudo make install
to /opt/OpenBLAS/lib
-
export NDK=YOUR_PATH_TO_SDK/Android/sdk/ndk/21.3.6528147
-export TOOLCHAIN=$NDK/toolchains/llvm/prebuilt/darwin-x86_64
+
+Alternative build script for three architectures
+This script will build OpenBLAS for three architectures (ARMV7, ARMV8, X86)
+and install them to /opt/OpenBLAS/lib.
+It was tested on macOS with NDK version 21.3.6528147.
+export NDK=YOUR_PATH_TO_SDK/Android/sdk/ndk/21.3.6528147
+export TOOLCHAIN=$NDK/toolchains/llvm/prebuilt/darwin-x86_64
-make clean
-make \
- TARGET=ARMV7 \
- ONLY_CBLAS=1 \
- CC="$TOOLCHAIN"/bin/armv7a-linux-androideabi21-clang \
- AR="$TOOLCHAIN"/bin/arm-linux-androideabi-ar \
- HOSTCC=gcc \
- ARM_SOFTFP_ABI=1 \
- -j4
-sudo make install
+make clean
+make \
+ TARGET=ARMV7 \
+ ONLY_CBLAS=1 \
+ CC="$TOOLCHAIN"/bin/armv7a-linux-androideabi21-clang \
+ AR="$TOOLCHAIN"/bin/arm-linux-androideabi-ar \
+ HOSTCC=gcc \
+ ARM_SOFTFP_ABI=1 \
+ -j4
+sudo make install
-make clean
-make \
- TARGET=CORTEXA57 \
- ONLY_CBLAS=1 \
- CC=$TOOLCHAIN/bin/aarch64-linux-android21-clang \
- AR=$TOOLCHAIN/bin/aarch64-linux-android-ar \
- HOSTCC=gcc \
- -j4
-sudo make install
+make clean
+make \
+ TARGET=CORTEXA57 \
+ ONLY_CBLAS=1 \
+ CC=$TOOLCHAIN/bin/aarch64-linux-android21-clang \
+ AR=$TOOLCHAIN/bin/aarch64-linux-android-ar \
+ HOSTCC=gcc \
+    -j4
+sudo make install
-make clean
-make \
- TARGET=ATOM \
- ONLY_CBLAS=1 \
- CC="$TOOLCHAIN"/bin/i686-linux-android21-clang \
- AR="$TOOLCHAIN"/bin/i686-linux-android-ar \
- HOSTCC=gcc \
- ARM_SOFTFP_ABI=1 \
- -j4
-sudo make install
+make clean
+make \
+ TARGET=ATOM \
+ ONLY_CBLAS=1 \
+    CC="$TOOLCHAIN"/bin/i686-linux-android21-clang \
+    AR="$TOOLCHAIN"/bin/i686-linux-android-ar \
+    HOSTCC=gcc \
+    ARM_SOFTFP_ABI=1 \
+    -j4
+sudo make install
-## This will build for x86_64
-make clean
-make \
- TARGET=ATOM BINARY=64\
- ONLY_CBLAS=1 \
- CC="$TOOLCHAIN"/bin/x86_64-linux-android21-clang \
- AR="$TOOLCHAIN"/bin/x86_64-linux-android-ar \
- HOSTCC=gcc \
- ARM_SOFTFP_ABI=1 \
- -j4
-sudo make install
+# This will build for x86_64
+make clean
+make \
+    TARGET=ATOM BINARY=64 \
+    ONLY_CBLAS=1 \
+    CC="$TOOLCHAIN"/bin/x86_64-linux-android21-clang \
+    AR="$TOOLCHAIN"/bin/x86_64-linux-android-ar \
+    HOSTCC=gcc \
+    ARM_SOFTFP_ABI=1 \
+    -j4
+sudo make install
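The blocks above differ only in TARGET and compiler name, so they can also be driven from a loop. A sketch (the target:compiler pairs are illustrative and must match your NDK's toolchain names; the echo only prints each command, so remove it and add the AR/ARM_SOFTFP_ABI options as needed to run the real builds):

```shell
# Sketch: derive TARGET and compiler from a list of pairs instead of
# repeating near-identical make blocks. 'echo' prints the command only.
for pair in ARMV7:armv7a-linux-androideabi21-clang \
            CORTEXA57:aarch64-linux-android21-clang \
            ATOM:i686-linux-android21-clang; do
    target=${pair%%:*}     # text before the first ':'
    cc=${pair#*:}          # text after the first ':'
    echo make TARGET="$target" ONLY_CBLAS=1 CC="$TOOLCHAIN/bin/$cc" HOSTCC=gcc -j4
done
```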
-Also you can find full list of target architectures in TargetsList.txt
-
-anything below this line should be irrelevant nowadays unless you need to perform software archeology
-
-Building OpenBLAS with very old gcc-based versions of the NDK, without Fortran
-The prebuilt Android NDK toolchains do not include Fortran, hence parts like LAPACK cannot be built. You can still build OpenBLAS without it. For instructions on how to build OpenBLAS with Fortran, see the next section.
-To use easily the prebuilt toolchains, follow building a standalone toolchain for your desired architecture.
-This would be arm-linux-androideabi-gcc-4.9
for ARMV7 and aarch64-linux-android-gcc-4.9
for ARMV8.
-You can build OpenBLAS (0.2.19 and earlier) with:
-
## Add the toolchain to your path
-export PATH=/path/to/standalone-toolchain/bin:$PATH
-
-## Build without Fortran for ARMV7
-make TARGET=ARMV7 HOSTCC=gcc CC=arm-linux-androideabi-gcc NOFORTRAN=1 libs
-## Build without Fortran for ARMV8
-make TARGET=ARMV8 BINARY=64 HOSTCC=gcc CC=aarch64-linux-android-gcc NOFORTRAN=1 libs
-
-Since we are cross-compiling, we make the libs
recipe, not all
. Otherwise you will get errors when trying to link/run tests as versions up to and including 0.2.19 cannot build a shared library for Android.
-From 0.2.20 on, you should leave off the "libs" to get a full build, and you may want to use the softfp ABI instead of the deprecated hardfp one on ARMV7 so you would use
-
## Add the toolchain to your path
-export PATH=/path/to/standalone-toolchain/bin:$PATH
-
-## Build without Fortran for ARMV7
-make TARGET=ARMV7 ARM_SOFTFP_ABI=1 HOSTCC=gcc CC=arm-linux-androideabi-gcc NOFORTRAN=1
-## Build without Fortran for ARMV8
-make TARGET=ARMV8 BINARY=64 HOSTCC=gcc CC=aarch64-linux-android-gcc NOFORTRAN=1
-
-If you get an error about stdio.h not being found, you need to specify your sysroot in the CFLAGS argument to make
like
-CFLAGS=--sysroot=$NDK/platforms/android-16/arch-arm
-When you are done, install OpenBLAS into the desired directory. Be sure to also use all command line options
-here that you specified for building, otherwise errors may occur as it tries to install things you did not build:
-
make PREFIX=/path/to/install-dir TARGET=... install
-
-Building OpenBLAS with Fortran
-Instructions on how to build the GNU toolchains with Fortran can be found here. The Releases section provides prebuilt versions, use the standalone one.
-You can build OpenBLAS with:
-
## Add the toolchain to your path
-export PATH=/path/to/standalone-toolchain-with-fortran/bin:$PATH
-
-## Build with Fortran for ARMV7
-make TARGET=ARMV7 HOSTCC=gcc CC=arm-linux-androideabi-gcc FC=arm-linux-androideabi-gfortran libs
-## Build with LAPACK for ARMV8
-make TARGET=ARMV8 BINARY=64 HOSTCC=gcc CC=aarch64-linux-android-gcc FC=aarch64-linux-android-gfortran libs
-
-As mentioned above you can leave off the libs
argument here when building 0.2.20 and later, and you may want to add ARM_SOFTFP_ABI=1 when building for ARMV7.
-Linking OpenBLAS (0.2.19 and earlier) for ARMV7
-If you are using ndk-build
, you need to set the ABI to hard floating points in your Application.mk:
-
APP_ABI := armeabi-v7a-hard
-
-This will set the appropriate flags for you. If you are not using ndk-build
, you will want to add the following flags:
-
TARGET_CFLAGS += -mhard-float -D_NDK_MATH_NO_SOFTFP=1
-TARGET_LDFLAGS += -Wl,--no-warn-mismatch -lm_hard
-
-From 0.2.20 on, it is also possible to build for the softfp ABI by specifying ARM_SOFTFP_ABI=1 during the build.
-In that case, also make sure that all your dependencies are compiled with -mfloat-abi=softfp as well, as mixing
-"hard" and "soft" floating point ABIs in a program will make it crash.
+You can find the full list of target architectures in TargetList.txt.
+
iPhone/iOS
-As none of the current developers uses iOS, the following instructions are what was found to work in our Azure CI setup, but as far as we know this builds a fully working OpenBLAS for this platform.
+As none of the current developers uses iOS, the following instructions are what
+was found to work in our Azure CI setup, but as far as we know this builds a
+fully working OpenBLAS for this platform.
Go to the directory where you unpacked OpenBLAS, and enter the following commands:
-
CC=/Applications/Xcode_12.4.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang
+CC=/Applications/Xcode_12.4.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang
-CFLAGS= -O2 -Wno-macro-redefined -isysroot /Applications/Xcode_12.4.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS14.4.sdk -arch arm64 -miphoneos-version-min=10.0
+CFLAGS= -O2 -Wno-macro-redefined -isysroot /Applications/Xcode_12.4.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS14.4.sdk -arch arm64 -miphoneos-version-min=10.0
-make TARGET=ARMV8 DYNAMIC_ARCH=1 NUM_THREADS=32 HOSTCC=clang NOFORTRAN=1
+make TARGET=ARMV8 DYNAMIC_ARCH=1 NUM_THREADS=32 HOSTCC=clang NOFORTRAN=1
-Adjust MIN_IOS_VERSION as necessary for your installation, e.g. change the version number
+Adjust MIN_IOS_VERSION as necessary for your installation, i.e. change the
+version number to the minimum iOS version you want to target, then run the
+make command above to build the library.
MIPS
-For mips targets you will need latest toolchains
-P5600 - MTI GNU/Linux Toolchain
-I6400, P6600 - IMG GNU/Linux Toolchain
-The download link is below
-(http://codescape-mips-sdk.imgtec.com/components/toolchain/2016.05-03/downloads.html)
-You can use following commandlines for builds
-IMG_TOOLCHAIN_DIR={full IMG GNU/Linux Toolchain path including "bin" directory -- for example, /opt/linux_toolchain/bin}
-IMG_GCC_PREFIX=mips-img-linux-gnu
-IMG_TOOLCHAIN=${IMG_TOOLCHAIN_DIR}/${IMG_GCC_PREFIX}
+For MIPS targets you will need the latest toolchains:
+
+- P5600 - MTI GNU/Linux Toolchain
+- I6400, P6600 - IMG GNU/Linux Toolchain
+
+You can use the following command lines for builds:
+IMG_TOOLCHAIN_DIR={full IMG GNU/Linux Toolchain path including "bin" directory -- for example, /opt/linux_toolchain/bin}
+IMG_GCC_PREFIX=mips-img-linux-gnu
+IMG_TOOLCHAIN=${IMG_TOOLCHAIN_DIR}/${IMG_GCC_PREFIX}
-I6400 Build (n32):
-make BINARY=32 BINARY32=1 CC=$IMG_TOOLCHAIN-gcc AR=$IMG_TOOLCHAIN-ar FC="$IMG_TOOLCHAIN-gfortran -EL -mabi=n32" RANLIB=$IMG_TOOLCHAIN-ranlib HOSTCC=gcc CFLAGS="-EL" FFLAGS=$CFLAGS LDFLAGS=$CFLAGS TARGET=I6400
+# I6400 Build (n32):
+make BINARY=32 BINARY32=1 CC=$IMG_TOOLCHAIN-gcc AR=$IMG_TOOLCHAIN-ar FC="$IMG_TOOLCHAIN-gfortran -EL -mabi=n32" RANLIB=$IMG_TOOLCHAIN-ranlib HOSTCC=gcc CFLAGS="-EL" FFLAGS=$CFLAGS LDFLAGS=$CFLAGS TARGET=I6400
-I6400 Build (n64):
-make BINARY=64 BINARY64=1 CC=$IMG_TOOLCHAIN-gcc AR=$IMG_TOOLCHAIN-ar FC="$IMG_TOOLCHAIN-gfortran -EL" RANLIB=$IMG_TOOLCHAIN-ranlib HOSTCC=gcc CFLAGS="-EL" FFLAGS=$CFLAGS LDFLAGS=$CFLAGS TARGET=I6400
+# I6400 Build (n64):
+make BINARY=64 BINARY64=1 CC=$IMG_TOOLCHAIN-gcc AR=$IMG_TOOLCHAIN-ar FC="$IMG_TOOLCHAIN-gfortran -EL" RANLIB=$IMG_TOOLCHAIN-ranlib HOSTCC=gcc CFLAGS="-EL" FFLAGS=$CFLAGS LDFLAGS=$CFLAGS TARGET=I6400
-P6600 Build (n32):
-make BINARY=32 BINARY32=1 CC=$IMG_TOOLCHAIN-gcc AR=$IMG_TOOLCHAIN-ar FC="$IMG_TOOLCHAIN-gfortran -EL -mabi=n32" RANLIB=$IMG_TOOLCHAIN-ranlib HOSTCC=gcc CFLAGS="-EL" FFLAGS=$CFLAGS LDFLAGS=$CFLAGS TARGET=P6600
+# P6600 Build (n32):
+make BINARY=32 BINARY32=1 CC=$IMG_TOOLCHAIN-gcc AR=$IMG_TOOLCHAIN-ar FC="$IMG_TOOLCHAIN-gfortran -EL -mabi=n32" RANLIB=$IMG_TOOLCHAIN-ranlib HOSTCC=gcc CFLAGS="-EL" FFLAGS=$CFLAGS LDFLAGS=$CFLAGS TARGET=P6600
-P6600 Build (n64):
-make BINARY=64 BINARY64=1 CC=$IMG_TOOLCHAIN-gcc AR=$IMG_TOOLCHAIN-ar FC="$IMG_TOOLCHAIN-gfortran -EL" RANLIB=$IMG_TOOLCHAIN-ranlib HOSTCC=gcc CFLAGS="-EL" FFLAGS="$CFLAGS" LDFLAGS="$CFLAGS" TARGET=P6600
+# P6600 Build (n64):
+make BINARY=64 BINARY64=1 CC=$IMG_TOOLCHAIN-gcc AR=$IMG_TOOLCHAIN-ar FC="$IMG_TOOLCHAIN-gfortran -EL" RANLIB=$IMG_TOOLCHAIN-ranlib HOSTCC=gcc CFLAGS="-EL" FFLAGS="$CFLAGS" LDFLAGS="$CFLAGS" TARGET=P6600
-MTI_TOOLCHAIN_DIR={full MTI GNU/Linux Toolchain path including "bin" directory -- for example, /opt/linux_toolchain/bin}
-MTI_GCC_PREFIX=mips-mti-linux-gnu
-MTI_TOOLCHAIN=${IMG_TOOLCHAIN_DIR}/${IMG_GCC_PREFIX}
+MTI_TOOLCHAIN_DIR={full MTI GNU/Linux Toolchain path including "bin" directory -- for example, /opt/linux_toolchain/bin}
+MTI_GCC_PREFIX=mips-mti-linux-gnu
+MTI_TOOLCHAIN=${MTI_TOOLCHAIN_DIR}/${MTI_GCC_PREFIX}
-P5600 Build:
+# P5600 Build:
-make BINARY=32 BINARY32=1 CC=$MTI_TOOLCHAIN-gcc AR=$MTI_TOOLCHAIN-ar FC="$MTI_TOOLCHAIN-gfortran -EL" RANLIB=$MTI_TOOLCHAIN-ranlib HOSTCC=gcc CFLAGS="-EL" FFLAGS=$CFLAGS LDFLAGS=$CFLAGS TARGET=P5600
-
-FreeBSD
-You will need to install the following tools from the FreeBSD ports tree:
-* lang/gcc [1]
-* lang/perl5.12
-* ftp/curl
-* devel/gmake
-* devel/patch
-To compile run the command:
-$ gmake CC=gcc46 FC=gfortran46
-
-Note that you need to build with GNU make and manually specify the compiler, otherwhise gcc 4.2 from the base system would be used.
-[1]: Removal of Fortran from the FreeBSD base system
-pkg install openblas
+make BINARY=32 BINARY32=1 CC=$MTI_TOOLCHAIN-gcc AR=$MTI_TOOLCHAIN-ar FC="$MTI_TOOLCHAIN-gfortran -EL" RANLIB=$MTI_TOOLCHAIN-ranlib HOSTCC=gcc CFLAGS="-EL" FFLAGS=$CFLAGS LDFLAGS=$CFLAGS TARGET=P5600
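As a quick sanity check, you can verify how these variables compose into the cross-tool names that make will invoke (the install path here is illustrative):

```shell
# Sanity check: the toolchain variables expand to the prefixed cross tools.
MTI_TOOLCHAIN_DIR=/opt/linux_toolchain/bin    # illustrative install path
MTI_GCC_PREFIX=mips-mti-linux-gnu
MTI_TOOLCHAIN=${MTI_TOOLCHAIN_DIR}/${MTI_GCC_PREFIX}
echo "${MTI_TOOLCHAIN}-gcc"   # the CC value that the make command receives
```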
-see https://www.freebsd.org/ports/index.html
+FreeBSD
+You will need to install the following tools from the FreeBSD ports tree:
+
+- lang/gcc
+- lang/perl5.12
+- ftp/curl
+- devel/gmake
+- devel/patch
+
+To compile, run the command:
+
$ gmake CC=gcc FC=gfortran
+
Cortex-M
-Cortex-M is a widely used microcontroller that is present in a variety of industrial and consumer electronics.
-A common variant of the Cortex-M is the STM32F4xx series. Here, we will give instructions for building for
-the STM32F4xx.
-First, install the embedded arm gcc compiler from the arm website. Then, create the following toolchain file and build as follows.
-# cmake .. -G Ninja -DCMAKE_C_COMPILER=arm-none-eabi-gcc -DCMAKE_TOOLCHAIN_FILE:PATH="toolchain.cmake" -DNOFORTRAN=1 -DTARGET=ARMV5 -DEMBEDDED=1
-
-set(CMAKE_SYSTEM_NAME Generic)
+Cortex-M is a widely used family of microcontroller cores, present in a
+variety of industrial and consumer electronics. A common Cortex-M-based
+variant is the STM32F4xx series; here, we will give instructions for
+building for that series.
+First, install the embedded Arm GCC compiler from the Arm website. Then,
+create the following toolchain.cmake file:
+set(CMAKE_SYSTEM_NAME Generic)
set(CMAKE_SYSTEM_PROCESSOR arm)
set(CMAKE_C_COMPILER "arm-none-eabi-gcc.exe")
@@ -1575,13 +1555,29 @@ the STM32F4xx.
set(CMAKE_FIND_ROOT_PATH_MODE_INCLUDE ONLY)
set(CMAKE_FIND_ROOT_PATH_MODE_PACKAGE ONLY)
-In your embedded application, the following functions need to be provided for OpenBLAS to work correctly:
+Then build OpenBLAS with:
+
$ cmake .. -G Ninja -DCMAKE_C_COMPILER=arm-none-eabi-gcc -DCMAKE_TOOLCHAIN_FILE:PATH="toolchain.cmake" -DNOFORTRAN=1 -DTARGET=ARMV5 -DEMBEDDED=1
+
+In your embedded application, the following functions need to be provided for OpenBLAS to work correctly:
void free(void* ptr);
void* malloc(size_t size);
-
+
Note
-If you are developing for an embedded platform, it is your responsibility to make sure that the device has sufficient memory for malloc calls. Libmemory provides one implementation of malloc for embedded platforms.
+If you are developing for an embedded platform, it is your responsibility
+to make sure that the device has sufficient memory for malloc
calls.
+Libmemory
+provides one implementation of malloc
for embedded platforms.
+
+
+
+
+- If the OpenBLAS DLLs are not linked correctly, you may see an error like
+ "The application was unable to start correctly (0xc000007b)", which typically
+ indicates a mismatch between 32-bit and 64-bit libraries. ↩
+
+
@@ -1605,7 +1601,7 @@ the STM32F4xx.
- March 12, 2024
+ July 4, 2024
diff --git a/docs/search/search_index.json b/docs/search/search_index.json
index c91045b7f..639afd268 100644
--- a/docs/search/search_index.json
+++ b/docs/search/search_index.json
@@ -1 +1 @@
-{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"Home","text":""},{"location":"#introduction","title":"Introduction","text":"OpenBLAS is an optimized Basic Linear Algebra Subprograms (BLAS) library based on GotoBLAS2 1.13 BSD version.
OpenBLAS implements low-level routines for performing linear algebra operations such as vector addition, scalar multiplication, dot products, linear combinations, and matrix multiplication. OpenBLAS makes these routines available on multiple platforms, covering server, desktop and mobile operating systems, as well as different architectures including x86, ARM, MIPS, PPC, RISC-V, and zarch.
The old GotoBLAS documentation can be found on GitHub.
"},{"location":"#license","title":"License","text":"OpenBLAS is licensed under the 3-clause BSD license. The full license can be found on GitHub.
"},{"location":"about/","title":"About","text":""},{"location":"about/#mailing-list","title":"Mailing list","text":"We have a GitHub discussions forum to discuss usage and development of OpenBLAS. We also have a Google group for users and a Google group for development of OpenBLAS.
"},{"location":"about/#acknowledgements","title":"Acknowledgements","text":"This work was or is partially supported by the following grants, contracts and institutions:
- Research and Development of Compiler System and Toolchain for Domestic CPU, National S&T Major Projects: Core Electronic Devices, High-end General Chips and Fundamental Software (No.2009ZX01036-001-002)
- National High-tech R&D Program of China (Grant No.2012AA010903)
- PerfXLab
- Chan Zuckerberg Initiative's Essential Open Source Software for Science program:
- Cycle 1 grant: Strengthening NumPy's foundations - growing beyond code (2019-2020)
- Cycle 3 grant: Improving usability and sustainability for NumPy and OpenBLAS (2020-2021)
- Sovereign Tech Fund funding: Keeping high performance linear algebra computation accessible and open for all (2023-2024)
Over the course of OpenBLAS development, a number of donations were received. You can read OpenBLAS's statement of receipts and disbursement and cash balance in this Google doc (covers 2013-2016). A list of backers is available in BACKERS.md in the main repo.
"},{"location":"about/#donations","title":"Donations","text":"We welcome hardware donations, including the latest CPUs and motherboards.
"},{"location":"about/#open-source-users-of-openblas","title":"Open source users of OpenBLAS","text":"Prominent open source users of OpenBLAS include:
- Julia - a high-level, high-performance dynamic programming language for technical computing
- NumPy - the fundamental package for scientific computing with Python
- SciPy - fundamental algorithms for scientific computing in Python
- R - a free software environment for statistical computing and graphics
- OpenCV - the world's biggest computer vision library
OpenBLAS is packaged in most major Linux distros, as well as general and numerical computing-focused packaging ecosystems like Nix, Homebrew, Spack and conda-forge.
OpenBLAS is used directly by libraries written in C, C++ and Fortran (and probably other languages), and directly by end users in those languages.
"},{"location":"about/#publications","title":"Publications","text":""},{"location":"about/#2013","title":"2013","text":" - Wang Qian, Zhang Xianyi, Zhang Yunquan, Qing Yi, AUGEM: Automatically Generate High Performance Dense Linear Algebra Kernels on x86 CPUs, In the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'13), Denver CO, November 2013. [pdf]
"},{"location":"about/#2012","title":"2012","text":" - Zhang Xianyi, Wang Qian, Zhang Yunquan, Model-driven Level 3 BLAS Performance Optimization on Loongson 3A Processor, 2012 IEEE 18th International Conference on Parallel and Distributed Systems (ICPADS), 17-19 Dec. 2012.
"},{"location":"build_system/","title":"Build system","text":"This page describes the Make-based build, which is the default/authoritative build method. Note that the OpenBLAS repository also supports building with CMake (not described here) - that generally works and is tested, however there may be small differences between the Make and CMake builds.
Warning
This page is made by someone who is not the developer and should not be considered as an official documentation of the build system. For getting the full picture, it is best to read the Makefiles and understand them yourself.
"},{"location":"build_system/#makefile-dep-graph","title":"Makefile dep graph","text":"Makefile \n| \n|----- Makefile.system # !!! this is included by many of the Makefiles in the subdirectories !!!\n| |\n| |===== Makefile.prebuild # This is triggered (not included) once by Makefile.system \n| | | # and runs before any of the actual library code is built.\n| | | # (builds and runs the \"getarch\" tool for cpu identification,\n| | | # runs the compiler detection scripts c_check and f_check) \n| | |\n| | ----- (Makefile.conf) [ either this or Makefile_kernel.conf is generated ] \n| | | { Makefile.system#L243 }\n| | ----- (Makefile_kernel.conf) [ temporary Makefile.conf during DYNAMIC_ARCH builds ]\n| |\n| |----- Makefile.rule # defaults for build options that can be given on the make command line\n| |\n| |----- Makefile.$(ARCH) # architecture-specific compiler options and OpenBLAS buffer size values\n|\n|~~~~~ exports/\n|\n|~~~~~ test/\n|\n|~~~~~ utest/ \n|\n|~~~~~ ctest/\n|\n|~~~~~ cpp_thread_test/\n|\n|~~~~~ kernel/\n|\n|~~~~~ ${SUBDIRS}\n|\n|~~~~~ ${BLASDIRS}\n|\n|~~~~~ ${NETLIB_LAPACK_DIR}{,/timing,/testing/{EIG,LIN}}\n|\n|~~~~~ relapack/\n
"},{"location":"build_system/#important-variables","title":"Important Variables","text":"Most of the tunable variables are found in Makefile.rule, along with their detailed descriptions. Most of the variables are detected automatically in Makefile.prebuild, if they are not set in the environment.
"},{"location":"build_system/#cpu-related","title":"CPU related","text":"ARCH - Target architecture (eg. x86_64)\nTARGET - Target CPU architecture, in case of DYNAMIC_ARCH=1 means library will not be usable on less capable CPUs\nTARGET_CORE - TARGET_CORE will override TARGET internally during each cpu-specific cycle of the build for DYNAMIC_ARCH\nDYNAMIC_ARCH - For building library for multiple TARGETs (does not lose any optimizations, but increases library size)\nDYNAMIC_LIST - optional user-provided subset of the DYNAMIC_CORE list in Makefile.system\n
"},{"location":"build_system/#toolchain-related","title":"Toolchain related","text":"CC - TARGET C compiler used for compilation (can be cross-toolchains)\nFC - TARGET Fortran compiler used for compilation (can be cross-toolchains, set NOFORTRAN=1 if used cross-toolchain has no fortran compiler)\nAR, AS, LD, RANLIB - TARGET toolchain helpers used for compilation (can be cross-toolchains)\n\nHOSTCC - compiler of build machine, needed to create proper config files for target architecture\nHOST_CFLAGS - flags for build machine compiler\n
"},{"location":"build_system/#library-related","title":"Library related","text":"BINARY - 32/64 bit library\n\nBUILD_SHARED - Create shared library\nBUILD_STATIC - Create static library\n\nQUAD_PRECISION - enable support for IEEE quad precision [ largely unimplemented leftover from GotoBLAS, do not use ]\nEXPRECISION - Obsolete option to use float80 of SSE on BSD-like systems\nINTERFACE64 - Build with 64bit integer representations to support large array index values [ incompatible with standard API ]\n\nBUILD_SINGLE - build the single-precision real functions of BLAS [and optionally LAPACK] \nBUILD_DOUBLE - build the double-precision real functions\nBUILD_COMPLEX - build the single-precision complex functions\nBUILD_COMPLEX16 - build the double-precision complex functions\n(all four types are included in the build by default when none was specifically selected)\n\nBUILD_BFLOAT16 - build the \"half precision brainfloat\" real functions \n\nUSE_THREAD - Use a multithreading backend (default to pthread)\nUSE_LOCKING - implement locking for thread safety even when USE_THREAD is not set (so that the singlethreaded library can\n safely be called from multithreaded programs)\nUSE_OPENMP - Use OpenMP as multithreading backend\nNUM_THREADS - define this to the maximum number of parallel threads you expect to need (defaults to the number of cores in the build cpu)\nNUM_PARALLEL - define this to the number of OpenMP instances that your code may use for parallel calls into OpenBLAS (default 1,see below)\n
OpenBLAS uses a fixed set of memory buffers internally, used for communicating and combining partial results from individual threads. For efficiency, the management array structure for these buffers is sized at build time - this makes it necessary to know in advance how many threads need to be supported on the target system(s).
With OpenMP, there is an additional level of complexity, as calls may originate from a parallel region in the calling program. If OpenBLAS is called from a single parallel region, it automatically runs single-threaded to avoid overloading the system by fanning out its own set of threads. However, if an OpenMP program makes multiple calls from independent regions or instances in parallel, this default serialization is not sufficient: the additional callers compete for the set of buffers already in use by the first call, so only one of them can make progress while the rest spin-wait for the one available buffer set. Setting NUM_PARALLEL
to the upper bound on the number of OpenMP runtimes that can call into OpenBLAS in one process ensures that a sufficient number of buffer sets is available.
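As a concrete illustration of the options above (the values shown are examples only, not recommendations for any particular system), a build intended to serve a program that may call OpenBLAS from two independent OpenMP runtimes might look like:

```shell
# Example values only: an OpenMP-threaded build supporting up to 64 threads
# and two concurrent OpenMP runtimes calling into OpenBLAS at the same time.
make USE_OPENMP=1 NUM_THREADS=64 NUM_PARALLEL=2
```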
"},{"location":"ci/","title":"CI jobs","text":"Arch Target CPU OS Build system XComp to C Compiler Fortran Compiler threading DYN_ARCH INT64 Libraries CI Provider CPU count x86_64 Intel 32bit Windows CMAKE/VS2015 - mingw6.3 - pthreads - - static Appveyor x86_64 Intel Windows CMAKE/VS2015 - mingw5.3 - pthreads - - static Appveyor x86_64 Intel Centos5 gmake - gcc 4.8 gfortran pthreads + - both Azure x86_64 SDE (SkylakeX) Ubuntu CMAKE - gcc gfortran pthreads - - both Azure x86_64 Haswell/ SkylakeX Windows CMAKE/VS2017 - VS2017 - - - static Azure x86_64 \" Windows mingw32-make - gcc gfortran list - both Azure x86_64 \" Windows CMAKE/Ninja - LLVM - - - static Azure x86_64 \" Windows CMAKE/Ninja - LLVM flang - - static Azure x86_64 \" Windows CMAKE/Ninja - VS2022 flang* - - static Azure x86_64 \" macOS11 gmake - gcc-10 gfortran OpenMP + - both Azure x86_64 \" macOS11 gmake - gcc-10 gfortran none - - both Azure x86_64 \" macOS12 gmake - gcc-12 gfortran pthreads - - both Azure x86_64 \" macOS11 gmake - llvm - OpenMP + - both Azure x86_64 \" macOS11 CMAKE - llvm - OpenMP no_avx512 - static Azure x86_64 \" macOS11 CMAKE - gcc-10 gfortran pthreads list - shared Azure x86_64 \" macOS11 gmake - llvm ifort pthreads - - both Azure x86_64 \" macOS11 gmake arm AndroidNDK-llvm - - - both Azure x86_64 \" macOS11 gmake arm64 XCode 12.4 - + - both Azure x86_64 \" macOS11 gmake arm XCode 12.4 - + - both Azure x86_64 \" Alpine Linux(musl) gmake - gcc gfortran pthreads + - both Azure arm64 Apple M1 OSX CMAKE/XCode - LLVM - OpenMP - - static Cirrus arm64 Apple M1 OSX CMAKE/Xcode - LLVM - OpenMP - + static Cirrus arm64 Apple M1 OSX CMAKE/XCode x86_64 LLVM - - + - static Cirrus arm64 Neoverse N1 Linux gmake - gcc10.2 - pthreads - - both Cirrus arm64 Neoverse N1 Linux gmake - gcc10.2 - pthreads - + both Cirrus arm64 Neoverse N1 Linux gmake - gcc10.2 - OpenMP - - both Cirrus 8 x86_64 Ryzen FreeBSD gmake - gcc12.2 gfortran pthreads - - both Cirrus x86_64 Ryzen FreeBSD gmake gcc12.2 gfortran 
pthreads - + both Cirrus x86_64 GENERIC QEMU gmake mips64 gcc gfortran pthreads - - static Github x86_64 SICORTEX QEMU gmake mips64 gcc gfortran pthreads - - static Github x86_64 I6400 QEMU gmake mips64 gcc gfortran pthreads - - static Github x86_64 P6600 QEMU gmake mips64 gcc gfortran pthreads - - static Github x86_64 I6500 QEMU gmake mips64 gcc gfortran pthreads - - static Github x86_64 Intel Ubuntu CMAKE - gcc-11.3 gfortran pthreads + - static Github x86_64 Intel Ubuntu gmake - gcc-11.3 gfortran pthreads + - both Github x86_64 Intel Ubuntu CMAKE - gcc-11.3 flang-classic pthreads + - static Github x86_64 Intel Ubuntu gmake - gcc-11.3 flang-classic pthreads + - both Github x86_64 Intel macOS12 CMAKE - AppleClang 14 gfortran pthreads + - static Github x86_64 Intel macOS12 gmake - AppleClang 14 gfortran pthreads + - both Github x86_64 Intel Windows2022 CMAKE/Ninja - mingw gcc 13 gfortran + - static Github x86_64 Intel Windows2022 CMAKE/Ninja - mingw gcc 13 gfortran + + static Github x86_64 Intel 32bit Windows2022 CMAKE/Ninja - mingw gcc 13 gfortran + - static Github x86_64 Intel Windows2022 CMAKE/Ninja - LLVM 16 - + - static Github x86_64 Intel Windows2022 CMAKE/Ninja - LLVM 16 - + + static Github x86_64 Intel Windows2022 CMAKE/Ninja - gcc 13 - + - static Github x86_64 Intel Ubuntu gmake mips64 gcc gfortran pthreads + - both Github x86_64 generic Ubuntu gmake riscv64 gcc gfortran pthreads - - both Github x86_64 Intel Ubuntu gmake mips32 gcc gfortran pthreads - - both Github x86_64 Intel Ubuntu gmake ia64 gcc gfortran pthreads - - both Github x86_64 C910V QEmu gmake riscv64 gcc gfortran pthreads - - both Github power pwr9 Ubuntu gmake - gcc gfortran OpenMP - - both OSUOSL zarch z14 Ubuntu gmake - gcc gfortran OpenMP - - both OSUOSL"},{"location":"developers/","title":"Developer manual","text":""},{"location":"developers/#source-code-layout","title":"Source code layout","text":"OpenBLAS/ \n\u251c\u2500\u2500 benchmark Benchmark codes for BLAS\n\u251c\u2500\u2500 cmake 
CMakefiles\n\u251c\u2500\u2500 ctest Test codes for CBLAS interfaces\n\u251c\u2500\u2500 driver Implemented in C\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 level2\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 level3\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 mapper\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 others Memory management, threading, etc\n\u251c\u2500\u2500 exports Generate shared library\n\u251c\u2500\u2500 interface Implement BLAS and CBLAS interfaces (calling driver or kernel)\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 lapack\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 netlib\n\u251c\u2500\u2500 kernel Optimized assembly kernels for CPU architectures\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 alpha Original GotoBLAS kernels for DEC Alpha\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 arm ARMV5,V6,V7 kernels (including generic C codes used by other architectures)\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 arm64 ARMV8\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 generic General kernel codes written in plain C, parts used by many architectures.\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 ia64 Original GotoBLAS kernels for Intel Itanium\n\u2502 \u251c\u2500\u2500 mips\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 mips64\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 power\n| \u251c\u2500\u2500 riscv64\n| \u251c\u2500\u2500 simd Common code for Universal Intrinsics, used by some x86_64 and arm64 kernels\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 sparc\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 x86\n\u2502 \u251c\u2500\u2500 x86_64\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 zarch \n\u251c\u2500\u2500 lapack Optimized LAPACK codes (replacing those in regular LAPACK)\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 getf2\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 getrf\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 getrs\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 laswp\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 lauu2\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 lauum\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 potf2\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 potrf\n\u2502\u00a0\u00a0 
\u251c\u2500\u2500 trti2\n\u2502 \u251c\u2500\u2500 trtri\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 trtrs\n\u251c\u2500\u2500 lapack-netlib LAPACK codes from netlib reference implementation\n\u251c\u2500\u2500 reference BLAS Fortran reference implementation (unused)\n\u251c\u2500\u2500 relapack Elmar Peise's recursive LAPACK (implemented on top of regular LAPACK)\n\u251c\u2500\u2500 test Test codes for BLAS\n\u2514\u2500\u2500 utest Regression test\n
A call tree for dgemm
looks as follows:
interface/gemm.c\n \u2502\ndriver/level3/level3.c\n \u2502\ngemm assembly kernels at kernel/\n
To find the kernel currently used for a particular supported CPU, please check the corresponding kernel/$(ARCH)/KERNEL.$(CPU)
file.
Here is an example for kernel/x86_64/KERNEL.HASWELL
:
...\nDTRMMKERNEL = dtrmm_kernel_4x8_haswell.c\nDGEMMKERNEL = dgemm_kernel_4x8_haswell.S\n...\n
According to the above KERNEL.HASWELL
, OpenBLAS Haswell dgemm kernel file is dgemm_kernel_4x8_haswell.S
."},{"location":"developers/#optimizing-gemm-for-a-given-hardware","title":"Optimizing GEMM for a given hardware","text":"Read the Goto paper to understand the algorithm
Goto, Kazushige; van de Geijn, Robert A. (2008). \"Anatomy of High-Performance Matrix Multiplication\". ACM Transactions on Mathematical Software 34 (3): Article 12
(The above link is available only to ACM members, but this and many related papers are also available on the pages of van de Geijn's FLAME project)
The driver/level3/level3.c
is the implementation of Goto's algorithm. As a starting point, you can look at kernel/generic/gemmkernel_2x2.c
, which is a naive 2x2
register blocking gemm
kernel in C. Then:
- Write optimized assembly kernels. Consider instruction pipeline, available registers, memory/cache access.
- Tune cache block sizes (
Mc
, Kc
, and Nc
)
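As a hedged illustration of the register-blocking idea (this is a Python sketch, not the actual C kernel, and the function name is invented), each inner-loop iteration computes a 2x2 tile of C, keeping the four partial sums in local variables - which, in the C/assembly kernels such as kernel/generic/gemmkernel_2x2.c, live in CPU registers:

```python
def gemm_2x2(A, B, m, n, k):
    """Compute C = A @ B for row-major lists of lists, with 2x2 register
    blocking. For simplicity, m and n are assumed to be even here."""
    C = [[0.0] * n for _ in range(m)]
    for i in range(0, m, 2):
        for j in range(0, n, 2):
            # four accumulators for the 2x2 tile of C (registers in asm)
            c00 = c01 = c10 = c11 = 0.0
            for p in range(k):
                a0, a1 = A[i][p], A[i + 1][p]
                b0, b1 = B[p][j], B[p][j + 1]
                c00 += a0 * b0
                c01 += a0 * b1
                c10 += a1 * b0
                c11 += a1 * b1
            C[i][j], C[i][j + 1] = c00, c01
            C[i + 1][j], C[i + 1][j + 1] = c10, c11
    return C
```

The payoff of this blocking is that each pair of loaded A and B values is used in four multiply-adds, improving the ratio of arithmetic to memory traffic.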
Note that not all of the CPU-specific parameters in param.h
are actively used in algorithms. DNUMOPT
only appears as a scale factor in profiling output of the level3 syrk
interface code, while its counterpart SNUMOPT
(aliased as NUMOPT
in common.h
) is not used anywhere at all.
SYMV_P
is only used in the generic kernels for the symv
and chemv
/zhemv
functions - at least some of those are usually overridden by CPU-specific implementations, so if you start by cloning the existing implementation for a related CPU you need to check its KERNEL
file to see if tuning SYMV_P
would have any effect at all.
GEMV_UNROLL
is only used by some older x86-64 kernels, so not all sections in param.h
define it. Similarly, not all of the CPU parameters like L2 or L3 cache sizes are necessarily used in current kernels for a given model - by all indications the CPU identification code was imported from some other project originally.
"},{"location":"developers/#running-openblas-tests","title":"Running OpenBLAS tests","text":"We use tests for Netlib BLAS, CBLAS, and LAPACK. In addition, we use OpenBLAS-specific regression tests. They can be run with Make:
make -C test for BLAS tests
make -C ctest for CBLAS tests
make -C utest for OpenBLAS regression tests
make lapack-test for LAPACK tests
We also use the BLAS-Tester tests for regression testing. It is basically the ATLAS test suite adapted for building with OpenBLAS.
The project makes use of several Continuous Integration (CI) services conveniently interfaced with GitHub to automatically run tests on a number of platforms and build configurations.
Also note that the test suites included with \"numerically heavy\" projects like Julia, NumPy, SciPy, Octave or QuantumEspresso can be used for regression testing, when those projects are built such that they use OpenBLAS.
"},{"location":"developers/#benchmarking","title":"Benchmarking","text":"A number of benchmarking methods are used by OpenBLAS:
- Several simple C benchmarks for performance testing individual BLAS functions are available in the
benchmark
folder. They can be run locally through the Makefile
in that directory. And the benchmark/scripts
subdirectory contains similar benchmarks that use OpenBLAS via NumPy, SciPy, Octave and R. - On pull requests, a representative set of functions is tested for performance regressions with Codspeed; results can be viewed at https://codspeed.io/OpenMathLib/OpenBLAS.
- The OpenMathLib/BLAS-Benchmarks repository contains an Airspeed Velocity-based benchmark suite which is run on several CPU architectures in cron jobs. Results are published to a dashboard: http://www.openmathlib.org/BLAS-Benchmarks/.
Benchmarking code for BLAS libraries, and specific performance analysis results, can be found in a number of places. For example:
- MatlabJuliaMatrixOperationsBenchmark (various matrix operations in Julia and Matlab)
- mmperf/mmperf (single-core matrix multiplication)
"},{"location":"developers/#adding-autodetection-support-for-a-new-revision-or-variant-of-a-supported-cpu","title":"Adding autodetection support for a new revision or variant of a supported CPU","text":"Especially relevant for x86-64, a new CPU model may be a \"refresh\" (die shrink and/or different number of cores) within an existing model family without significant changes to its instruction set (e.g., Intel Skylake and Kaby Lake still are fundamentally the same architecture as Haswell, low end Goldmont etc. are Nehalem). In this case, compilation with the appropriate older TARGET
will already lead to a satisfactory build.
To achieve autodetection of the new model, its CPUID (or an equivalent identifier) needs to be added in the cpuid_<architecture>.c
relevant for its general architecture, with the returned name for the new type set appropriately. For x86, which has the most complex cpuid
file, there are two functions that need to be edited: get_cpuname()
to return, e.g., CPUTYPE_HASWELL
and get_corename()
for the (broader) core family returning, e.g., CORE_HASWELL
.1
For architectures where DYNAMIC_ARCH
builds are supported, a similar but simpler code section for the corresponding runtime detection of the CPU exists in driver/others/dynamic.c
(for x86), and driver/others/dynamic_<arch>.c
for other architectures. Note that for x86 the CPUID is compared after splitting it into its family, extended family, model and extended model parts, so the single decimal number returned by Linux in /proc/cpuinfo
for the model has to be converted back to hexadecimal before splitting into its constituent digits. For example, 142 == 8E
translates to extended model 8, model 14.
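The conversion described above can be sketched as follows (the helper name is invented for illustration; the actual comparison lives in driver/others/dynamic.c):

```python
def split_x86_model(model_decimal):
    """Split the decimal 'model' value reported by Linux in /proc/cpuinfo
    into the (extended model, model) hex digits that the runtime CPU
    detection code compares against."""
    return (model_decimal >> 4) & 0xF, model_decimal & 0xF

# 142 == 0x8E -> extended model 8, model 14
```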
"},{"location":"developers/#adding-dedicated-support-for-a-new-cpu-model","title":"Adding dedicated support for a new CPU model","text":"Usually it will be possible to start from an existing model, clone its KERNEL
configuration file to the new name to use for this TARGET
and eventually replace individual kernels with versions better suited for peculiarities of the new CPU model. In addition, it is necessary to add (or clone at first) the corresponding section of GEMM_UNROLL
parameters in the top-level param.h
, and possibly to add definitions such as USE_TRMM
(governing whether TRMM
functions use the respective GEMM
kernel or a separate source file) to the Makefile
s (and CMakeLists.txt
) in the kernel directory. The new CPU name needs to be added to TargetList.txt
, and the CPU auto-detection code used by the getarch
helper program - contained in the cpuid_<architecture>.c
file amended to include the CPUID (or equivalent) information processing required (see preceding section).
"},{"location":"developers/#adding-support-for-an-entirely-new-architecture","title":"Adding support for an entirely new architecture","text":"This endeavour is best started by cloning the entire support structure for 32-bit ARM, and within that the ARMv5 CPU in particular, as this is implemented through plain C kernels only. An example providing a convenient \"shopping list\" can be seen in pull request #1526.
-
This information ends up in the Makefile.conf
and config.h
files generated by getarch
. Failure to set either will typically lead to a missing definition of the GEMM_UNROLL
parameters later in the build, as getarch_2nd
will be unable to find a matching parameter section in param.h
.\u00a0\u21a9
"},{"location":"distributing/","title":"Redistributing OpenBLAS","text":"Note
This document contains recommendations only - packagers and other redistributors are in charge of how OpenBLAS is built and distributed in their systems, and may have good reasons to deviate from the guidance given on this page. These recommendations are aimed at general packaging systems whose user base is typically large, open source (or at least freely available), and neither behaves uniformly nor is directly connected with the packager.
OpenBLAS has a large number of build-time options which can be used to change how it behaves at runtime, how artifacts or symbols are named, etc. Variation in build configuration can be necessary to achieve a given end goal within a distribution or as an end user. However, such variation can also make it more difficult to build on top of OpenBLAS and ship code or other packages in a way that works across many different distros. Here we provide guidance about the most important build options, what effects they may have when changed, and which ones to default to.
The Make and CMake build systems provide equivalent options and yield more or less the same artifacts, but not exactly (the CMake builds are still experimental). You can choose either one and the options will function in the same way, however the CMake outputs may require some renaming. To review available build options, see Makefile.rule
or CMakeLists.txt
in the root of the repository.
Build options typically fall into two categories: (a) options that affect the user interface, such as library and symbol names or APIs that are made available, and (b) options that affect performance and runtime behavior, such as threading behavior or CPU architecture-specific code paths. The user interface options are more important to keep aligned between distributions, while for the performance-related options there are typically more reasons to make choices that deviate from the defaults.
Here are recommendations for user interface related packaging choices where it is not likely to be a good idea to deviate (typically these are the default settings):
- Include CBLAS. The CBLAS interface is widely used and it doesn't affect binary size much, so don't turn it off.
- Include LAPACK and LAPACKE. The LAPACK interface is also widely used, and while it does make up a significant part of the binary size of the installed library, that does not outweigh the regression in usability when deviating from the default here.1
- Always distribute the pkg-config (
.pc
) and CMake (.cmake
) dependency detection files. These files are used by build systems when users want to link against OpenBLAS, and there is no benefit of leaving them out. - Provide the LP64 interface by default, and if in addition to that you choose to provide an ILP64 interface build as well, use a symbol suffix to avoid symbol name clashes (see the next section).
"},{"location":"distributing/#ilp64-interface-builds","title":"ILP64 interface builds","text":"The LP64 (32-bit integer) interface is the default build, and has well-established C and Fortran APIs as determined by the reference (Netlib) BLAS and LAPACK libraries. The ILP64 (64-bit integer) interface however does not have a standard API: symbol names and shared/static library names can be produced in multiple ways, and this tends to make it difficult to use. As of today there is an agreed-upon way of choosing names for OpenBLAS between a number of key users/redistributors, which is the closest thing to a standard that there is now. However, there is an ongoing standardization effort in the reference BLAS and LAPACK libraries, which differs from the current OpenBLAS agreed-upon convention. In this section we'll aim to explain both.
Those two methods are fairly similar, and have a key thing in common: using a symbol suffix. This is good practice; it is recommended that if you distribute an ILP64 build, to have it use a symbol suffix containing 64
in the name. This avoids potential symbol clashes when different packages which depend on OpenBLAS load both an LP64 and an ILP64 library into memory at the same time.
"},{"location":"distributing/#the-current-openblas-agreed-upon-ilp64-convention","title":"The current OpenBLAS agreed-upon ILP64 convention","text":"This convention comprises the shared library name and the symbol suffix in the shared library. The symbol suffix to use is 64_
, implying that the library name will be libopenblas64_.so
and the symbols in that library end in 64_
. The central issue where this was discussed is openblas#646, and adopters include Fedora, Julia, NumPy and SciPy - SuiteSparse already used it as well.
To build shared and static libraries with the currently recommended ILP64 conventions with Make:
$ make INTERFACE64=1 SYMBOLSUFFIX=64_\n
This will produce libraries named libopenblas64_.so|a
, a pkg-config file named openblas64.pc
, and CMake and header files.
Installing locally and inspecting the output will show a few more details:
$ make install PREFIX=$PWD/../openblas/make64 INTERFACE64=1 SYMBOLSUFFIX=64_\n$ tree . # output slightly edited down\n.\n\u251c\u2500\u2500 include\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 cblas.h\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 f77blas.h\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 lapacke_config.h\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 lapacke.h\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 lapacke_mangling.h\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 lapacke_utils.h\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 lapack.h\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 openblas_config.h\n\u2514\u2500\u2500 lib\n \u251c\u2500\u2500 cmake\n \u2502\u00a0\u00a0 \u2514\u2500\u2500 openblas\n \u2502\u00a0\u00a0 \u251c\u2500\u2500 OpenBLASConfig.cmake\n \u2502\u00a0\u00a0 \u2514\u2500\u2500 OpenBLASConfigVersion.cmake\n \u251c\u2500\u2500 libopenblas64_.a\n \u251c\u2500\u2500 libopenblas64_.so\n \u2514\u2500\u2500 pkgconfig\n \u2514\u2500\u2500 openblas64.pc\n
A key point is the symbol names. These will equal the LP64 symbol names, then (for Fortran only) the compiler mangling, and then the 64_
symbol suffix. Hence to obtain the final symbol names, we need to take into account which Fortran compiler we are using. For the most common cases (e.g., gfortran, Intel Fortran, or Flang), that means appending a single underscore. In that case, the result is:
base API name binary symbol name call from Fortran code call from C code dgemm
dgemm_64_
dgemm_64(...)
dgemm_64_(...)
cblas_dgemm
cblas_dgemm64_
n/a cblas_dgemm64_(...)
It is quite useful to have these symbol names be as uniform as possible across different packaging systems.
The equivalent build options with CMake are:
$ mkdir build && cd build\n$ cmake .. -DINTERFACE64=1 -DSYMBOLSUFFIX=64_ -DBUILD_SHARED_LIBS=ON -DBUILD_STATIC_LIBS=ON\n$ cmake --build . -j\n
Note that the result is not 100% identical to the Make result. For example, the library name ends in _64
rather than 64_
- it is recommended to rename them to match the Make library names (also update the libsuffix
entry in openblas64.pc
to match that rename).
$ cmake --install . --prefix $PWD/../../openblas/cmake64\n$ tree .\n.\n\u251c\u2500\u2500 include\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 openblas64\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 cblas.h\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 f77blas.h\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 lapacke_config.h\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 lapacke_example_aux.h\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 lapacke.h\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 lapacke_mangling.h\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 lapacke_utils.h\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 lapack.h\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 openblas64\n\u2502\u00a0\u00a0 \u2502\u00a0\u00a0 \u2514\u2500\u2500 lapacke_mangling.h\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 openblas_config.h\n\u2514\u2500\u2500 lib\n \u251c\u2500\u2500 cmake\n \u2502\u00a0\u00a0 \u2514\u2500\u2500 OpenBLAS64\n \u2502\u00a0\u00a0 \u251c\u2500\u2500 OpenBLAS64Config.cmake\n \u2502\u00a0\u00a0 \u251c\u2500\u2500 OpenBLAS64ConfigVersion.cmake\n \u2502\u00a0\u00a0 \u251c\u2500\u2500 OpenBLAS64Targets.cmake\n \u2502\u00a0\u00a0 \u2514\u2500\u2500 OpenBLAS64Targets-noconfig.cmake\n \u251c\u2500\u2500 libopenblas_64.a\n \u251c\u2500\u2500 libopenblas_64.so -> libopenblas_64.so.0\n \u2514\u2500\u2500 pkgconfig\n \u2514\u2500\u2500 openblas64.pc\n
"},{"location":"distributing/#the-upcoming-standardized-ilp64-convention","title":"The upcoming standardized ILP64 convention","text":"While the 64_
convention above got some adoption, it's slightly hacky and is implemented through the use of objcopy
. An effort is ongoing for a more broadly adopted convention in the reference BLAS and LAPACK libraries, using (a) the _64
suffix, and (b) applying that suffix before rather than after Fortran compiler mangling. The central issue for this is lapack#666.
For the most common cases of compiler mangling (a single _
appended), the end result will be:
base API name binary symbol name call from Fortran code call from C code dgemm
dgemm_64_
dgemm_64(...)
dgemm_64_(...)
cblas_dgemm
cblas_dgemm_64
n/a cblas_dgemm_64(...)
For other compiler mangling schemes, replace the trailing _
by the scheme in use.
The shared library name for this _64
convention should be libopenblas_64.so
.
Note: it is not yet possible to produce an OpenBLAS build which employs this convention! Once reference BLAS and LAPACK with support for _64
have been released, a future OpenBLAS release will support it. For now, please use the older 64_
scheme and avoid using the name libopenblas_64.so
; it should be considered reserved for future use of the _64
standard as prescribed by reference BLAS/LAPACK.
"},{"location":"distributing/#performance-and-runtime-behavior-related-build-options","title":"Performance and runtime behavior related build options","text":"For these options there are multiple reasonable or common choices.
"},{"location":"distributing/#threading-related-options","title":"Threading related options","text":"OpenBLAS can be built as a multi-threaded or single-threaded library, with the default being multi-threaded. It's expected that the default libopenblas
library is multi-threaded; if you'd like to also distribute single-threaded builds, consider naming them libopenblas_sequential
.
OpenBLAS can be built with pthreads or OpenMP as the threading model, with the default being pthreads. Both options are commonly used, and the choice here should not influence the shared library name. The choice will be captured by the .pc
file. E.g.,:
$ pkg-config --libs openblas\n-fopenmp -lopenblas\n\n$ cat openblas.pc\n...\nopenblas_config= ... USE_OPENMP=0 MAX_THREADS=24\n
The maximum number of threads users will be able to use is determined at build time by the NUM_THREADS
build option. It defaults to 24, and there's a wide range of values that are reasonable to use (up to 256). 64 is a typical choice here; there is a memory footprint penalty that is linear in NUM_THREADS
. Please see Makefile.rule
for more details.
"},{"location":"distributing/#cpu-architecture-related-options","title":"CPU architecture related options","text":"OpenBLAS contains a lot of CPU architecture-specific optimizations, hence when distributing to a user base with a variety of hardware, it is recommended to enable CPU architecture runtime detection. This will dynamically select optimized kernels for individual APIs. To do this, use the DYNAMIC_ARCH=1
build option. This is usually done on all common CPU families, except when there are known issues.
In case the CPU architecture is known (e.g. you're building binaries for macOS M1 users), it is possible to specify the target architecture directly with the TARGET=
build option.
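For instance (values illustrative only; TARGET names are listed in TargetList.txt in the repository):

```shell
# Runtime CPU detection, suitable for binaries shipped to varied hardware:
make DYNAMIC_ARCH=1

# Alternatively, target one known CPU model directly:
make TARGET=HASWELL
```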
DYNAMIC_ARCH
and TARGET
are covered in more detail in the main README.md
in this repository.
"},{"location":"distributing/#real-world-examples","title":"Real-world examples","text":"OpenBLAS is likely to be distributed in one of these distribution models:
- As a standalone package, or multiple packages, in a packaging ecosystem like a Linux distro, Homebrew, conda-forge or MSYS2.
- Vendored as part of a larger package, e.g. in Julia, NumPy, SciPy, or R.
- Locally, e.g. making available as a build on a single HPC cluster.
The guidance on this page is most important for models (1) and (2). These links to build recipes for a representative selection of packaging systems may be helpful as a reference:
- Fedora
- Debian
- Homebrew
- MSYS2
- conda-forge
- NumPy/SciPy
- Nixpkgs
-
All major distributions do include LAPACK as of mid 2023 as far as we know. Older versions of Arch Linux did not, and that was known to cause problems.\u00a0\u21a9
"},{"location":"extensions/","title":"Extensions","text":"OpenBLAS for the most part contains implementations of the reference (Netlib) BLAS, CBLAS, LAPACK and LAPACKE interfaces. A few OpenBLAS-specific functions are also provided however, which mostly can be seen as \"BLAS extensions\". This page documents those non-standard APIs.
"},{"location":"extensions/#blas-like-extensions","title":"BLAS-like extensions","text":"Routine Data Types Description ?axpby s,d,c,z like axpy with a multiplier for y ?gemm3m c,z gemm3m ?imatcopy s,d,c,z in-place transpositon/copying ?omatcopy s,d,c,z out-of-place transpositon/copying ?geadd s,d,c,z matrix add ?gemmt s,d,c,z gemm but only a triangular part updated"},{"location":"extensions/#bfloat16-functionality","title":"bfloat16 functionality","text":"BLAS-like and conversion functions for bfloat16
(available when OpenBLAS was compiled with BUILD_BFLOAT16=1
):
void cblas_sbstobf16
converts a float array to an array of bfloat16 values by rounding void cblas_sbdtobf16
converts a double array to an array of bfloat16 values by rounding void cblas_sbf16tos
converts a bfloat16 array to an array of floats void cblas_dbf16tod
converts a bfloat16 array to an array of doubles float cblas_sbdot
computes the dot product of two bfloat16 arrays void cblas_sbgemv
performs the matrix-vector operations of GEMV with the input matrix and X vector as bfloat16 void cblas_sbgemm
performs the matrix-matrix operations of GEMM with both input arrays containing bfloat16
"},{"location":"extensions/#utility-functions","title":"Utility functions","text":" openblas_get_num_threads
openblas_set_num_threads
int openblas_get_num_procs(void)
returns the number of processors available on the system (may include \"hyperthreading cores\") int openblas_get_parallel(void)
returns 0 for sequential use, 1 for platform-based threading and 2 for OpenMP-based threading char * openblas_get_config()
returns the options OpenBLAS was built with, something like NO_LAPACKE DYNAMIC_ARCH NO_AFFINITY Haswell
int openblas_set_affinity(int thread_index, size_t cpusetsize, cpu_set_t *cpuset)
sets the CPU affinity mask of the given thread to the provided cpuset. Only available on Linux, with semantics identical to pthread_setaffinity_np
.
"},{"location":"faq/","title":"FAQ","text":" - General questions
- What is BLAS? Why is it important?
- What functions are there and how can I call them from my C code?
- What is OpenBLAS? Why did you create this project?
- What's the difference between OpenBLAS and GotoBLAS?
- Where do parameters GEMM_P, GEMM_Q, GEMM_R come from?
- How can I report a bug?
- How to reference OpenBLAS.
- How can I use OpenBLAS in multi-threaded applications?
- Does OpenBLAS support sparse matrices and/or vectors ?
- What support is there for recent PC hardware ? What about GPU ?
- How about the level 3 BLAS performance on Intel Sandy Bridge?
- OS and Compiler
- How can I call an OpenBLAS function in Microsoft Visual Studio?
- How can I use CBLAS and LAPACKE without C99 complex number support (e.g. in Visual Studio)?
- I get a SEGFAULT with multi-threading on Linux. What's wrong?
- When I make the library, there is no such instruction: `xgetbv' error. What's wrong?
- My build fails due to the linker error \"multiple definition of `dlamc3_'\". What is the problem?
- My build worked fine and passed all tests, but running make lapack-test ends with segfaults
- How could I disable OpenBLAS threading affinity on runtime?
- How to solve undefined reference errors when statically linking against libopenblas.a
- Building OpenBLAS for Haswell or Dynamic Arch on RHEL-6, CentOS-6, Rocks-6.1, Scientific Linux 6
- Building OpenBLAS in QEMU/KVM/XEN
- Building OpenBLAS on POWER fails with IBM XL
- Replacing system BLAS/updating APT OpenBLAS in Mint/Ubuntu/Debian
- I built OpenBLAS for use with some other software, but that software cannot find it
- I included cblas.h in my program, but the compiler complains about a missing common.h or functions from it
- Compiling OpenBLAS with gcc's -fbounds-check actually triggers aborts in programs
- Build fails with lots of errors about undefined ?GEMM_UNROLL_M
- CMAKE/OSX: Build fails with 'argument list too long'
- Likely problems with AVX2 support in Docker Desktop for OSX
- Usage
- Program is Terminated. Because you tried to allocate too many memory regions
- How to choose TARGET manually at runtime when compiled with DYNAMIC_ARCH
- After updating the installed OpenBLAS, a program complains about \"undefined symbol gotoblas\"
- How can I find out at runtime what options the library was built with ?
- After making OpenBLAS, I find that the static library is multithreaded, but the dynamic one is not ?
- I want to use OpenBLAS with CUDA in the HPL 2.3 benchmark code but it keeps looking for Intel MKL
- Multithreaded OpenBLAS runs no faster or is even slower than singlethreaded on my ARMV7 board
- Speed varies wildly between individual runs on a typical ARMV8 smartphone processor
- I cannot get OpenBLAS to use more than a small subset of available cores on a big system
- Getting \"ELF load command address/offset not properly aligned\" when loading libopenblas.so
- Using OpenBLAS with OpenMP
"},{"location":"faq/#general-questions","title":"General questions","text":""},{"location":"faq/#what-is-blas-why-is-it-important","title":"What is BLAS? Why is it important?","text":"BLAS stands for Basic Linear Algebra Subprograms. BLAS provides standard interfaces for linear algebra, including BLAS1 (vector-vector operations), BLAS2 (matrix-vector operations), and BLAS3 (matrix-matrix operations). In general, BLAS is the computational kernel (\"the bottom of the food chain\") in linear algebra or scientific applications. Thus, if BLAS implementation is highly optimized, the whole application can get substantial benefit.
"},{"location":"faq/#what-functions-are-there-and-how-can-i-call-them-from-my-c-code","title":"What functions are there and how can I call them from my C code?","text":"As BLAS is a standardized interface, you can refer to the documentation of its reference implementation at netlib.org. Calls from C go through its CBLAS interface, so your code will need to include the provided cblas.h in addition to linking with -lopenblas. A single-precision matrix multiplication will look like
#include <cblas.h>\n...\ncblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, M, N, K, 1.0, A, K, B, N, 0.0, result, N);\n
where M,N,K are the dimensions of your data - see https://petewarden.files.wordpress.com/2015/04/gemm_corrected.png (This image is part of an article on GEMM in the context of deep learning that is well worth reading in full - https://petewarden.com/2015/04/20/why-gemm-is-at-the-heart-of-deep-learning/)"},{"location":"faq/#what-is-openblas-why-did-you-create-this-project","title":"What is OpenBLAS? Why did you create this project?","text":"OpenBLAS is an open source BLAS library forked from the GotoBLAS2-1.13 BSD version. Since Mr. Kazushige Goto left TACC, GotoBLAS is no longer being maintained. Thus, we created this project to continue developing OpenBLAS/GotoBLAS.
"},{"location":"faq/#whats-the-difference-between-openblas-and-gotoblas","title":"What's the difference between OpenBLAS and GotoBLAS?","text":"In OpenBLAS 0.2.0, we optimized level 3 BLAS on the Intel Sandy Bridge 64-bit OS. We obtained a performance comparable with that Intel MKL.
We optimized level 3 BLAS performance on the ICT Loongson-3A CPU. It outperformed GotoBLAS by 135% in a single thread and 120% in 4 threads.
We fixed some GotoBLAS bugs including a SEGFAULT bug on the new Linux kernel, MingW32/64 bugs, and a ztrmm computing error bug on Intel Nehalem.
We also added some minor features, e.g. supporting \"make install\", compiling without LAPACK and upgrading the LAPACK version to 3.4.2.
You can find the full list of modifications in Changelog.txt.
"},{"location":"faq/#where-do-parameters-gemm_p-gemm_q-gemm_r-come-from","title":"Where do parameters GEMM_P, GEMM_Q, GEMM_R come from?","text":"The detailed explanation is probably in the original publication authored by Kazushige Goto - Goto, Kazushige; van de Geijn, Robert A; Anatomy of high-performance matrix multiplication. ACM Transactions on Mathematical Software (TOMS). Volume 34 Issue 3, May 2008 While this article is paywalled and too old for preprints to be available on arxiv.org, more recent publications like https://arxiv.org/pdf/1609.00076 contain at least a brief description of the algorithm. In practice, the values are derived by experimentation to yield the block sizes that give the highest performance. A general rule of thumb for selecting a starting point seems to be that PxQ is about half the size of L2 cache.
"},{"location":"faq/#how-can-i-report-a-bug","title":"How can I report a bug?","text":"Please file an issue at this issue page or send mail to the OpenBLAS mailing list.
Please provide the following information: CPU, OS, compiler, and OpenBLAS compiling flags (Makefile.rule). In addition, please describe how to reproduce this bug.
"},{"location":"faq/#how-to-reference-openblas","title":"How to reference OpenBLAS.","text":"You can reference our papers in this page. Alternatively, you can cite the OpenBLAS homepage http://www.openblas.net.
"},{"location":"faq/#how-can-i-use-openblas-in-multi-threaded-applications","title":"How can I use OpenBLAS in multi-threaded applications?","text":"If your application is already multi-threaded, it will conflict with OpenBLAS multi-threading. Thus, you must set OpenBLAS to use single thread as following.
- export OPENBLAS_NUM_THREADS=1 in the environment variables. Or
- Call openblas_set_num_threads(1) in the application on runtime. Or
- Build the single-threaded version of OpenBLAS, e.g. make USE_THREAD=0 USE_LOCKING=1 (see the comment below)
If the application is parallelized by OpenMP, please build OpenBLAS with USE_OPENMP=1
With the increased availability of fast multicore hardware it has unfortunately become clear that the thread management provided by OpenMP is not sufficient to prevent race conditions when OpenBLAS was built single-threaded by USE_THREAD=0 and there are concurrent calls from multiple threads to OpenBLAS functions. In this case, it is vital to also specify USE_LOCKING=1 (introduced with OpenBLAS 0.3.7).
"},{"location":"faq/#does-openblas-support-sparse-matrices-andor-vectors","title":"Does OpenBLAS support sparse matrices and/or vectors ?","text":"OpenBLAS implements only the standard (dense) BLAS and LAPACK functions with a select few extensions popularized by Intel's MKL. Some cases can probably be made to work using e.g. GEMV or AXPBY, in general using a dedicated package like SuiteSparse (which can make use of OpenBLAS or equivalent for standard operations) is recommended.
"},{"location":"faq/#what-support-is-there-for-recent-pc-hardware-what-about-gpu","title":"What support is there for recent PC hardware ? What about GPU ?","text":"As OpenBLAS is a volunteer project, it can take some time for the combination of a capable developer, free time, and particular hardware to come along, even for relatively common processors. Starting from 0.3.1, support is being added for AVX 512 (TARGET=SKYLAKEX), requiring a compiler that is capable of handling avx512 intrinsics. While AMD Zen processors should be autodetected by the build system, as of 0.3.2 they are still handled exactly like Intel Haswell. There once was an effort to build an OpenCL implementation that one can still find at https://github.com/xianyi/clOpenBLAS , but work on this stopped in 2015.
"},{"location":"faq/#how-about-the-level-3-blas-performance-on-intel-sandy-bridge","title":"How about the level 3 BLAS performance on Intel Sandy Bridge?","text":"We obtained a performance comparable with Intel MKL that actually outperformed Intel MKL in some cases. Here is the result of the DGEMM subroutine's performance on Intel Core i5-2500K Windows 7 SP1 64-bit:
"},{"location":"faq/#os-and-compiler","title":"OS and Compiler","text":""},{"location":"faq/#how-can-i-call-an-openblas-function-in-microsoft-visual-studio","title":"How can I call an OpenBLAS function in Microsoft Visual Studio?","text":"Please read this page.
"},{"location":"faq/#how-can-i-use-cblas-and-lapacke-without-c99-complex-number-support-eg-in-visual-studio","title":"How can I use CBLAS and LAPACKE without C99 complex number support (e.g. in Visual Studio)?","text":"Zaheer has fixed this bug. You can now use the structure instead of C99 complex numbers. Please read this issue page for details.
This issue is for using LAPACKE in Visual Studio.
"},{"location":"faq/#i-get-a-segfault-with-multi-threading-on-linux-whats-wrong","title":"I get a SEGFAULT with multi-threading on Linux. What's wrong?","text":"This may be related to a bug in the Linux kernel 2.6.32 (?). Try applying the patch segaults.patch to disable mbind using
patch < segfaults.patch\n
and see if the crashes persist. Note that this patch will lead to many compiler warnings.
"},{"location":"faq/#when-i-make-the-library-there-is-no-such-instruction-xgetbv-error-whats-wrong","title":"When I make the library, there is no such instruction: `xgetbv' error. What's wrong?","text":"Please use GCC 4.4 and later version. This version supports xgetbv instruction. If you use the library for Sandy Bridge with AVX instructions, you should use GCC 4.6 and later version.
On Mac OS X, please use Clang 3.1 and later version. For example, make CC=clang
For the compatibility with old compilers (GCC < 4.4), you can enable NO_AVX flag. For example, make NO_AVX=1
"},{"location":"faq/#my-build-fails-due-to-the-linker-error-multiple-definition-of-dlamc3_-what-is-the-problem","title":"My build fails due to the linker error \"multiple definition of `dlamc3_'\". What is the problem?","text":"This linker error occurs if GNU patch is missing or if our patch for LAPACK fails to apply.
Background: OpenBLAS implements optimized versions of some LAPACK functions, so we need to disable the reference versions. If this process fails, we end up with duplicated implementations of the same function.
"},{"location":"faq/#my-build-worked-fine-and-passed-all-tests-but-running-make-lapack-test-ends-with-segfaults","title":"My build worked fine and passed all tests, but running make lapack-test
ends with segfaults","text":"Some of the LAPACK tests, notably in xeigtstz, try to allocate around 10MB on the stack. You may need to use ulimit -s
to change the default limits on your system to allow this.
"},{"location":"faq/#how-could-i-disable-openblas-threading-affinity-on-runtime","title":"How could I disable OpenBLAS threading affinity on runtime?","text":"You can define the OPENBLAS_MAIN_FREE or GOTOBLAS_MAIN_FREE environment variable to disable threading affinity on runtime. For example, before the running,
export OPENBLAS_MAIN_FREE=1\n
Alternatively, you can disable the affinity feature by enabling NO_AFFINITY=1 in Makefile.rule.
"},{"location":"faq/#how-to-solve-undefined-reference-errors-when-statically-linking-against-libopenblasa","title":"How to solve undefined reference errors when statically linking against libopenblas.a","text":"On Linux, if OpenBLAS was compiled with threading support (USE_THREAD=1
by default), custom programs statically linked against libopenblas.a
should also link to the pthread library e.g.:
gcc -static -I/opt/OpenBLAS/include -L/opt/OpenBLAS/lib -o my_program my_program.c -lopenblas -lpthread\n
Failing to add the -lpthread
flag will cause errors such as:
/opt/OpenBLAS/libopenblas.a(memory.o): In function `_touch_memory':\nmemory.c:(.text+0x15): undefined reference to `pthread_mutex_lock'\nmemory.c:(.text+0x41): undefined reference to `pthread_mutex_unlock'\n/opt/OpenBLAS/libopenblas.a(memory.o): In function `openblas_fork_handler':\nmemory.c:(.text+0x440): undefined reference to `pthread_atfork'\n/opt/OpenBLAS/libopenblas.a(memory.o): In function `blas_memory_alloc':\nmemory.c:(.text+0x7a5): undefined reference to `pthread_mutex_lock'\nmemory.c:(.text+0x825): undefined reference to `pthread_mutex_unlock'\n/opt/OpenBLAS/libopenblas.a(memory.o): In function `blas_shutdown':\nmemory.c:(.text+0x9e1): undefined reference to `pthread_mutex_lock'\nmemory.c:(.text+0xa6e): undefined reference to `pthread_mutex_unlock'\n/opt/OpenBLAS/libopenblas.a(blas_server.o): In function `blas_thread_server':\nblas_server.c:(.text+0x273): undefined reference to `pthread_mutex_lock'\nblas_server.c:(.text+0x287): undefined reference to `pthread_mutex_unlock'\nblas_server.c:(.text+0x33f): undefined reference to `pthread_cond_wait'\n/opt/OpenBLAS/libopenblas.a(blas_server.o): In function `blas_thread_init':\nblas_server.c:(.text+0x416): undefined reference to `pthread_mutex_lock'\nblas_server.c:(.text+0x4be): undefined reference to `pthread_mutex_init'\nblas_server.c:(.text+0x4ca): undefined reference to `pthread_cond_init'\nblas_server.c:(.text+0x4e0): undefined reference to `pthread_create'\nblas_server.c:(.text+0x50f): undefined reference to `pthread_mutex_unlock'\n...\n
The -lpthread
is not required when linking dynamically against libopenblas.so.0
.
"},{"location":"faq/#building-openblas-for-haswell-or-dynamic-arch-on-rhel-6-centos-6-rocks-61scientific-linux-6","title":"Building OpenBLAS for Haswell or Dynamic Arch on RHEL-6, CentOS-6, Rocks-6.1,Scientific Linux 6","text":"Minimum requirement to actually run AVX2-enabled software like OpenBLAS is kernel-2.6.32-358, shipped with EL6U4 in 2013
The binutils
package from RHEL6 does not know the instruction vpermpd
or any other AVX2 instruction. You can download a newer binutils
package from Enterprise Linux software collections, following the instructions here: https://www.softwarecollections.org/en/scls/rhscl/devtoolset-3/ After configuring the repository, you need to install devtoolset-?-binutils to get a usable newer binutils package:
$ yum search devtoolset-\\?-binutils\n$ sudo yum install devtoolset-3-binutils\n
Once the packages are installed, check the correct name of the SCL redirection set that enables the new version: $ scl --list\ndevtoolset-3\nrh-python35\n
Now just prefix your build commands with the respective redirection: $ scl enable devtoolset-3 -- make DYNAMIC_ARCH=1\n
AVX-512 (SKYLAKEX) support requires devtoolset-8-gcc-gfortran (which exceeds the formal requirement for AVX-512 because of packaging issues in earlier packages), which installs the required binutils and gcc as dependencies, and kernel 2.6.32-696 (aka 6U9) or 3.10.0-327 (aka 7U2) or later to run. In the absence of this toolset, OpenBLAS will fall back to AVX2 instructions in place of AVX-512, sacrificing some performance on the Skylake-X platform."},{"location":"faq/#building-openblas-in-qemukvmxen","title":"Building OpenBLAS in QEMU/KVM/XEN","text":"By default, QEMU reports the CPU as \"QEMU Virtual CPU version 2.2.0\", which shares its CPUID with an existing 32-bit CPU even in a 64-bit virtual machine, and OpenBLAS recognizes it as PENTIUM2. Depending on the exact combination of CPU features the hypervisor chooses to expose, this may not correspond to any CPU that exists, and OpenBLAS will error out when trying to build. To fix this, pass -cpu host or -cpu passthrough to QEMU, or another supported CPU model. Similarly, the XEN hypervisor may not pass through all features of the host cpu while reporting the cpu type itself correctly, which can lead to compiler error messages about an \"ABI change\" when compiling AVX512 code. Again, changing the Xen configuration by running e.g. \"xen-cmdline --set-xen cpuid=avx512\" should get around this (as would building OpenBLAS for an older cpu lacking that particular feature, e.g. TARGET=HASWELL)
"},{"location":"faq/#building-openblas-on-power-fails-with-ibm-xl","title":"Building OpenBLAS on POWER fails with IBM XL","text":"Trying to compile OpenBLAS with IBM XL ends with error messages about unknown register names\n
like \"vs32\". Working around these by using known alternate names for the vector registers only leads to another assembler error about unsupported constraints. This is a known deficiency in the IBM compiler at least up to and including 16.1.0 (and in the POWER version of clang, from which it is derived) - use gcc instead. (See issues #1078 and #1699 for related discussions)
"},{"location":"faq/#replacing-system-blasupdating-apt-openblas-in-mintubuntudebian","title":"Replacing system BLAS/updating APT OpenBLAS in Mint/Ubuntu/Debian","text":"Debian and Ubuntu LTS versions provide OpenBLAS package which is not updated after initial release, and under circumstances one might want to use more recent version of OpenBLAS e.g. to get support for newer CPUs
Ubuntu and Debian provide an 'alternatives' mechanism to conveniently replace the BLAS and LAPACK libraries system-wide.
After a successful build of OpenBLAS (with DYNAMIC_ARCH set to 1):
$ make clean\n$ make DYNAMIC_ARCH=1\n$ sudo make DYNAMIC_ARCH=1 install\n
One can redirect the BLAS and LAPACK alternatives to point to the source-built OpenBLAS. First, you have to install the NetLib LAPACK reference implementation (to have alternatives to replace): $ sudo apt install libblas-dev liblapack-dev\n
Then we can set alternative to our freshly-built library: $ sudo update-alternatives --install /usr/lib/libblas.so.3 libblas.so.3 /opt/OpenBLAS/lib/libopenblas.so.0 41 \\\n --slave /usr/lib/liblapack.so.3 liblapack.so.3 /opt/OpenBLAS/lib/libopenblas.so.0\n
Or remove the redirection and switch back to the APT-provided BLAS implementation: $ sudo update-alternatives --remove libblas.so.3 /opt/OpenBLAS/lib/libopenblas.so.0\n
In recent versions of the distributions, the installation path for the libraries has been changed to include the name of the host architecture, like /usr/lib/x86_64-linux-gnu/blas/libblas.so.3 or libblas.so.3.x86_64-linux-gnu. Use $ update-alternatives --display libblas.so.3
to find out what layout your system has."},{"location":"faq/#i-built-openblas-for-use-with-some-other-software-but-that-software-cannot-find-it","title":"I built OpenBLAS for use with some other software, but that software cannot find it","text":"OpenBLAS installs as a single library named libopenblas.so, while some programs may search for separate libblas.so and liblapack.so files, so you may need to create appropriate symbolic links (ln -s libopenblas.so libblas.so; ln -s libopenblas.so liblapack.so
) or copies. Also make sure that the installation location (usually /opt/OpenBLAS/lib or /usr/local/lib) is among the library search paths of your system.
"},{"location":"faq/#i-included-cblash-in-my-program-but-the-compiler-complains-about-a-missing-commonh-or-functions-from-it","title":"I included cblas.h in my program, but the compiler complains about a missing common.h or functions from it","text":"You probably tried to include a cblas.h that you simply copied from the OpenBLAS source, instead you need to run make install
after building OpenBLAS and then use the modified cblas.h that this step builds in the installation path (usually either /usr/local/include, /opt/OpenBLAS/include or whatever you specified as PREFIX= on the make install
)
"},{"location":"faq/#compiling-openblas-with-gccs-fbounds-check-actually-triggers-aborts-in-programs","title":"Compiling OpenBLAS with gcc's -fbounds-check actually triggers aborts in programs","text":"This is due to different interpretations of the (informal) standard for passing characters as arguments between C and FORTRAN functions. As the method for storing text differs in the two languages, when C calls Fortran the text length is passed as an \"invisible\" additional parameter. Historically, this has not been required when the text is just a single character, so older code like the Reference-LAPACK bundled with OpenBLAS does not do it. Recently gcc's checking has changed to require it, but there is no consensus yet if and how the existing LAPACK (and many other codebases) should adapt. (And for actual compilation, gcc has mostly backtracked and provided compatibility options - hence the default build settings in the OpenBLAS Makefiles add -fno-optimize-sibling-calls to the gfortran options to prevent miscompilation with \"affected\" versions. See ticket 2154 in the issue tracker for more details and links)
"},{"location":"faq/#build-fails-with-lots-of-errors-about-undefined-gemm_unroll_m","title":"Build fails with lots of errors about undefined ?GEMM_UNROLL_M","text":"Your cpu is apparently too new to be recognized by the build scripts, so they failed to assign appropriate parameters for the block algorithm. Do a make clean
and try again with TARGET set to one of the cpu models listed in TargetList.txt
- for x86_64 this will usually be HASWELL.
"},{"location":"faq/#cmakeosx-build-fails-with-argument-list-too-long","title":"CMAKE/OSX: Build fails with 'argument list too long'","text":"This is a limitation in the maximum length of a command on OSX, coupled with how CMAKE works. You should be able to work around this by adding the option -DCMAKE_Fortran_USE_RESPONSE_FILE_FOR_OBJECTS=1
to your CMAKE arguments.
"},{"location":"faq/#likely-problems-with-avx2-support-in-docker-desktop-for-osx","title":"Likely problems with AVX2 support in Docker Desktop for OSX","text":"There have been a few reports of wrong calculation results and build-time test failures when building in a container environment managed by the OSX version of Docker Desktop, which uses the xhyve virtualizer underneath. Judging from these reports, AVX2 support in xhyve appears to be subtly broken but a corresponding ticket in the xhyve issue tracker has not drawn any reaction or comment since 2019. Therefore it is strongly recommended to build OpenBLAS with the NO_AVX2=1 option when inside a container under (or for later use with) the Docker Desktop environment on Intel-based Apple hardware.
"},{"location":"faq/#usage","title":"Usage","text":""},{"location":"faq/#program-is-terminated-because-you-tried-to-allocate-too-many-memory-regions","title":"Program is Terminated. Because you tried to allocate too many memory regions","text":"In OpenBLAS, we mange a pool of memory buffers and allocate the number of buffers as the following.
#define NUM_BUFFERS (MAX_CPU_NUMBER * 2)\n
This error indicates that the program exceeded the number of preallocated buffers. Please build OpenBLAS with a larger NUM_THREADS
. For example, make NUM_THREADS=32
or make NUM_THREADS=64
. In Makefile.system
, we will set MAX_CPU_NUMBER=NUM_THREADS
.
"},{"location":"faq/#how-to-choose-target-manually-at-runtime-when-compiled-with-dynamic_arch","title":"How to choose TARGET manually at runtime when compiled with DYNAMIC_ARCH","text":"The environment variable which control the kernel selection is OPENBLAS_CORETYPE
(see driver/others/dynamic.c
) e.g. export OPENBLAS_CORETYPE=Haswell
. And the function char* openblas_get_corename()
returns the used target.
"},{"location":"faq/#after-updating-the-installed-openblas-a-program-complains-about-undefined-symbol-gotoblas","title":"After updating the installed OpenBLAS, a program complains about \"undefined symbol gotoblas\"","text":"This symbol gets defined only when OpenBLAS is built with \"make DYNAMIC_ARCH=1\" (which is what distributors will choose to ensure support for more than just one CPU type).
"},{"location":"faq/#how-can-i-find-out-at-runtime-what-options-the-library-was-built-with","title":"How can I find out at runtime what options the library was built with ?","text":"OpenBLAS has two utility functions that may come in here:
openblas_get_parallel() will return 0 for a single-threaded library, 1 if multithreading without OpenMP, 2 if built with USE_OPENMP=1
openblas_get_config() will return a string containing settings such as USE64BITINT or DYNAMIC_ARCH that were active at build time, as well as the target cpu (or in case of a dynamic_arch build, the currently detected one).
"},{"location":"faq/#after-making-openblas-i-find-that-the-static-library-is-multithreaded-but-the-dynamic-one-is-not","title":"After making OpenBLAS, I find that the static library is multithreaded, but the dynamic one is not ?","text":"The shared OpenBLAS library you built is probably working fine as well, but your program may be picking up a different (probably single-threaded) version from one of the standard system paths like /usr/lib on startup. Running ldd /path/to/your/program
will tell you which library the linkage loader will actually use.
Specifying the \"correct\" library location with the -L
flag (like -L /opt/OpenBLAS/lib
) when linking your program only defines which library will be used to see if all symbols can be resolved, you will need to add an rpath entry to the binary (using -Wl,rpath=/opt/OpenBLAS/lib
) to make it request searching that location. Alternatively, remove the \"wrong old\" library (if you can), or set LD_LIBRARY_PATH to the desired location before running your program.
"},{"location":"faq/#i-want-to-use-openblas-with-cuda-in-the-hpl-23-benchmark-code-but-it-keeps-looking-for-intel-mkl","title":"I want to use OpenBLAS with CUDA in the HPL 2.3 benchmark code but it keeps looking for Intel MKL","text":"You need to edit file src/cuda/cuda_dgemm.c in the NVIDIA version of HPL, change the \"handle2\" and \"handle\" dlopen calls to use libopenblas.so instead of libmkl_intel_lp64.so, and add an trailing underscore in the dlsym lines for dgemm_mkl and dtrsm_mkl (like dgemm_mkl = (void(*)())dlsym(handle, \u201cdgemm_\u201d);
)
"},{"location":"faq/#multithreaded-openblas-runs-no-faster-or-is-even-slower-than-singlethreaded-on-my-armv7-board","title":"Multithreaded OpenBLAS runs no faster or is even slower than singlethreaded on my ARMV7 board","text":"The power saving mechanisms of your board may have shut down some cores, making them invisible to OpenBLAS in its startup phase. Try bringing them online before starting your calculation.
"},{"location":"faq/#speed-varies-wildly-between-individual-runs-on-a-typical-armv8-smartphone-processor","title":"Speed varies wildly between individual runs on a typical ARMV8 smartphone processor","text":"Check the technical specifications, it could be that the SoC combines fast and slow cpus and threads can end up on either. In that case, binding the process to specific cores e.g. by setting OMP_PLACES=cores
may help. (You may need to experiment with OpenMP options, it has been reported that using OMP_NUM_THREADS=2 OMP_PLACES=cores
caused a huge drop in performance on a 4+4 core chip while OMP_NUM_THREADS=2 OMP_PLACES=cores(2)
worked as intended - as did OMP_PLACES=cores with 4 threads)
"},{"location":"faq/#i-cannot-get-openblas-to-use-more-than-a-small-subset-of-available-cores-on-a-big-system","title":"I cannot get OpenBLAS to use more than a small subset of available cores on a big system","text":"Multithreading support in OpenBLAS requires the use of internal buffers for sharing partial results, the number and size of which is defined at compile time. Unless you specify NUM_THREADS in your make or cmake command, the build scripts try to autodetect the number of cores available in your build host to size the library to match. This unfortunately means that if you move the resulting binary from a small \"front-end node\" to a larger \"compute node\" later, it will still be limited to the hardware capabilities of the original system. The solution is to set NUM_THREADS to a number big enough to encompass the biggest systems you expect to run the binary on - at runtime, it will scale down the maximum number of threads it uses to match the number of cores physically available.
"},{"location":"faq/#getting-elf-load-command-addressoffset-not-properly-aligned-when-loading-libopenblasso","title":"Getting \"ELF load command address/offset not properly aligned\" when loading libopenblas.so","text":"If you get a message \"error while loading shared libraries: libopenblas.so.0: ELF load command address/offset not properly aligned\" when starting a program that is (dynamically) linked to OpenBLAS, this is very likely due to a bug in the GNU linker (ld) that is part of the GNU binutils package. This error was specifically observed on older versions of Ubuntu Linux updated with the (at the time) most recent binutils version 2.38, but an internet search turned up sporadic reports involving various other libraries dating back several years. A bugfix was created by the binutils developers and should be available in later versions of binutils.(See issue 3708 for details)
"},{"location":"faq/#using-openblas-with-openmp","title":"Using OpenBLAS with OpenMP","text":"OpenMP provides its own locking mechanisms, so when your code makes BLAS/LAPACK calls from inside OpenMP parallel regions it is imperative that you use an OpenBLAS that is built with USE_OPENMP=1, as otherwise deadlocks might occur. Furthermore, OpenBLAS will automatically restrict itself to using only a single thread when called from an OpenMP parallel region. When it is certain that calls will only occur from the main thread of your program (i.e. outside of omp parallel constructs), a standard pthreads build of OpenBLAS can be used as well. In that case it may be useful to tune the linger behaviour of idle threads in both your OpenMP program (e.g. set OMP_WAIT_POLICY=passive) and OpenBLAS (by redefining the THREAD_TIMEOUT variable at build time, or setting the environment variable OPENBLAS_THREAD_TIMEOUT smaller than the default 26) so that the two alternating thread pools do not unnecessarily hog the cpu during the handover.
"},{"location":"install/","title":"Install OpenBLAS","text":"Note
The lists of precompiled packages below are not comprehensive, are not meant to validate or endorse a particular third-party build over others, and may not always point to the newest version.
"},{"location":"install/#quick-install","title":"Quick install","text":"Precompiled packages have recently become available for a number of platforms through their normal installation procedures, so for users of desktop devices at least, the instructions below are mostly relevant when you want to try the most recent development snapshot from git. See your platform's relevant \"Precompiled packages\" section.
The Conda-Forge project maintains packages for the conda package manager at https://github.com/conda-forge/openblas-feedstock.
"},{"location":"install/#source","title":"Source","text":"Download the latest stable version from release page.
"},{"location":"install/#platforms","title":"Platforms","text":""},{"location":"install/#linux","title":"Linux","text":"Just type make
to compile the library.
Notes:
- OpenBLAS doesn't support g77. Please use gfortran or another Fortran compiler, e.g.
make FC=gfortran
. - When building in an emulator (KVM, QEMU, etc.), make sure that the combination of CPU features exposed to the virtual environment matches that of an existing CPU, so that detection of the CPU model can succeed. (With QEMU, this can be done by passing
-cpu host
or a supported model name at invocation)
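For instance, with QEMU the advice above might translate to something like the following (illustrative invocations; disk.img and the memory size are placeholders):

```shell
# Pass the host CPU through so OpenBLAS can detect a known model
qemu-system-x86_64 -enable-kvm -cpu host -m 4G disk.img

# ...or name a specific CPU model that QEMU supports
qemu-system-x86_64 -cpu Skylake-Client -m 4G disk.img
```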
"},{"location":"install/#precompiled-packages","title":"Precompiled packages","text":""},{"location":"install/#debianubuntumintkali","title":"Debian/Ubuntu/Mint/Kali","text":"OpenBLAS package is available in default repositories and can act as default BLAS in system
Example installation commands:
$ sudo apt update\n$ apt search openblas\n$ sudo apt install libopenblas-dev\n$ sudo update-alternatives --config libblas.so.3\n
Alternatively, if the distribution's package proves unsatisfactory, you may try the latest version of OpenBLAS, following the guide in the OpenBLAS FAQ"},{"location":"install/#opensusesle","title":"openSuSE/SLE","text":"Recent openSUSE versions include OpenBLAS in the default repositories and also permit OpenBLAS to act as a replacement for the system-wide BLAS.
Example installation commands:
$ sudo zypper ref\n$ zypper se openblas\n$ sudo zypper in openblas-devel\n$ sudo update-alternatives --config libblas.so.3\n
If you are using an older openSUSE or SLE release that provides no OpenBLAS, you can attach an optional or experimental openSUSE repository as a new package source to acquire a recent build of OpenBLAS, following the instructions on the openSUSE software site"},{"location":"install/#fedoracentosrhel","title":"Fedora/CentOS/RHEL","text":"Fedora provides OpenBLAS in its default installation repositories.
To install it, try the following:
$ dnf search openblas\n$ dnf install openblas-devel\n
For CentOS/RHEL/Scientific Linux, packages are provided via the Fedora EPEL repository. After adding the repository and its keys, installation is straightforward:
$ yum search openblas\n$ yum install openblas-devel\n
No alternatives mechanism is provided for BLAS, and the packages in the system repositories are linked against the NetLib BLAS or ATLAS BLAS libraries. You may wish to re-package the RPMs to use OpenBLAS instead, as described here"},{"location":"install/#mageia","title":"Mageia","text":"Mageia offers ATLAS and NetLib LAPACK in its base repositories. You can build your own OpenBLAS replacement and install it in /opt. TODO: populate /usr/lib64 and /usr/include accurately to replicate NetLib with update-alternatives.
"},{"location":"install/#archmanjaroantergos","title":"Arch/Manjaro/Antergos","text":"$ sudo pacman -S openblas\n
"},{"location":"install/#windows","title":"Windows","text":"The precompiled binaries available with each release (in https://github.com/xianyi/OpenBLAS/releases) are created with MinGW using an option list of \"NUM_THREADS=64 TARGET=GENERIC DYNAMIC_ARCH=1 DYNAMIC_OLDER=1 CONSISTENT_FPCSR=1\" - they should work on any x86 or x86_64 computer. The zip archive contains the include files, static and dll libraries as well as configuration files for getting them found via CMAKE or pkgconfig - just create a suitable folder for your OpenBLAS installation and unzip it there. (Note that you will need to edit the provided openblas.pc and OpenBLASConfig.cmake to reflect the installation path on your computer, as distributed they have \"win\" or \"win64\" reflecting the local paths on the system they were built on). Some programs will expect the DLL name to be lapack.dll, blas.dll, or (in the case of the statistics package \"R\") even Rblas.dll to act as a direct replacement for whatever other implementation of BLAS and LAPACK they use by default. Just copy the openblas.dll to the desired name(s). Note that the provided binaries are built with INTERFACE64=0, meaning they use standard 32bit integers for array indexing and the like (as is the default for most if not all BLAS and LAPACK implementations). If the documentation of whatever program you are using with OpenBLAS mentions 64bit integers (INTERFACE64=1) for addressing huge matrix sizes, you will need to build OpenBLAS from source (or open an issue ticket to make the demand for such a precompiled build known).
"},{"location":"install/#precompiled-packages_1","title":"Precompiled packages","text":" - http://sourceforge.net/projects/openblas/files
- https://www.nuget.org/packages?q=openblas
"},{"location":"install/#visual-studio","title":"Visual Studio","text":"As of OpenBLAS v0.2.15, we support MinGW and Visual Studio (using CMake to generate visual studio solution files \u2013 note that you will need at least version 3.11 of CMake for linking to work correctly) to build OpenBLAS on Windows.
Note that you need a Fortran compiler if you plan to build and use the LAPACK functions included with OpenBLAS. The sections below describe using either flang
as an add-on to clang/LLVM or gfortran
as part of MinGW for this purpose. If you want to use the Intel Fortran compiler ifort
for this, be sure to also use the Intel C compiler icc
for building the C parts, as the ABI imposed by ifort
is incompatible with msvc
.
"},{"location":"install/#1-native-msvc-abi","title":"1. Native (MSVC) ABI","text":"A fully-optimized OpenBLAS that can be statically or dynamically linked to your application can currently be built for the 64-bit architecture with the LLVM compiler infrastructure. We're going to use Miniconda3 to grab all of the tools we need, since some of them are in an experimental status. Before you begin, you'll need to have Microsoft Visual Studio 2015 or newer installed.
- Install Miniconda3 for 64 bits using
winget install --id Anaconda.Miniconda3
or easily download from conda.io. - Open the \"Anaconda Command Prompt,\" now available in the Start Menu, or at
%USERPROFILE%\\miniconda3\\shell\\condabin\\conda-hook.ps1
. - In that command prompt window, use
cd
to change to the directory where you want to build OpenBLAS. - Now install all of the tools we need:
conda update -n base conda\nconda config --add channels conda-forge\nconda install -y cmake flang clangdev perl libflang ninja\n
- Still in the Anaconda Command Prompt window, activate the MSVC environment for 64 bits with
vcvarsall x64
. On Windows 11 with Visual Studio 2022, this would be done by invoking:
\"c:\\Program Files\\Microsoft Visual Studio\\2022\\Preview\\vc\\Auxiliary\\Build\\vcvars64.bat\"\n
With VS2019, the command should be the same \u2013 except for the year number, obviously. For other/older versions of MSVC, the VS documentation or a quick search on the web should turn up the exact wording you need.
Confirm that the environment is active by typing link
\u2013 this should return a long list of possible options for the link
command. If it just returns \"command not found\" or similar, review and retype the call to vcvars64.bat. NOTE: if you are working from a Visual Studio Command prompt window instead (so that you do not have to do the vcvars call), you need to invoke conda activate
so that CONDA_PREFIX etc. get set up correctly before proceeding to step 6. Failing to do so will lead to link errors like libflangmain.lib not getting found later in the build.
- Now configure the project with CMake. Starting in the project directory, execute the following:
set \"LIB=%CONDA_PREFIX%\\Library\\lib;%LIB%\"\nset \"CPATH=%CONDA_PREFIX%\\Library\\include;%CPATH%\"\nmkdir build\ncd build\ncmake .. -G \"Ninja\" -DCMAKE_CXX_COMPILER=clang-cl -DCMAKE_C_COMPILER=clang-cl -DCMAKE_Fortran_COMPILER=flang -DCMAKE_MT=mt -DBUILD_WITHOUT_LAPACK=no -DNOFORTRAN=0 -DDYNAMIC_ARCH=ON -DCMAKE_BUILD_TYPE=Release\n
You may want to add further options in the cmake
command here \u2013 for instance, the default only produces a static .lib version of the library. If you would rather have a DLL, add -DBUILD_SHARED_LIBS=ON above. Note that this step only creates some command files and directories; the actual build happens next.
- Build the project:
cmake --build . --config Release\n
This step will create the OpenBLAS library in the \"lib\" directory, and various build-time tests in the test
, ctest
and openblas_utest
directories. However, it will not separate the header files you might need for building your own programs from those used internally. To put all relevant files in a more convenient arrangement, run the next step. - Install all relevant files created by the build:
cmake --install . --prefix c:\\opt -v\n
This will copy all files that are needed for building and running your own programs with OpenBLAS to the given location, creating appropriate subdirectories for the individual kinds of files. In the case of \"C:\\opt\" as given above, this would be C:\\opt\\include\\openblas for the header files, C:\\opt\\bin for the libopenblas.dll and C:\\opt\\lib for the static library. C:\\opt\\share holds various support files that enable other cmake-based build scripts to find OpenBLAS automatically."},{"location":"install/#visual-studio-2017-c2017-standard","title":"Visual Studio 2017+ (C++2017 standard)","text":"In newer Visual Studio versions, Microsoft has changed how it handles complex types. Even when using a precompiled version of OpenBLAS, you might need to define LAPACK_COMPLEX_CUSTOM
in order to define complex types properly for MSVC. For example, some variant of the following might help:
#if defined(_MSC_VER)\n #include <complex.h>\n #define LAPACK_COMPLEX_CUSTOM\n #define lapack_complex_float _Fcomplex\n #define lapack_complex_double _Dcomplex\n#endif\n
For reference, see https://github.com/xianyi/OpenBLAS/issues/3661, https://github.com/Reference-LAPACK/lapack/issues/683, and https://stackoverflow.com/questions/47520244/using-openblas-lapacke-in-visual-studio.
"},{"location":"install/#cmake-and-visual-studio","title":"CMake and Visual Studio","text":"To build OpenBLAS for the 32-bit architecture, you'll need to use the builtin Visual Studio compilers.
Note
This method may produce binaries which demonstrate significantly lower performance than those built with the other methods. (The Visual Studio compiler does not support the dialect of assembly used in the cpu-specific optimized files, so only the \"generic\" TARGET which is written in pure C will get built. For the same reason it is not possible (and not necessary) to use -DDYNAMIC_ARCH=ON in a Visual Studio build) You may consider building for the 32-bit architecture using the GNU (MinGW) ABI.
"},{"location":"install/#1-install-cmake-at-windows","title":"# 1. Install CMake at Windows","text":""},{"location":"install/#2-use-cmake-to-generate-visual-studio-solution-files","title":"# 2. Use CMake to generate Visual Studio solution files","text":"# Do this from Powershell so cmake can find visual studio\ncmake -G \"Visual Studio 14 Win64\" -DCMAKE_BUILD_TYPE=Release .\n
"},{"location":"install/#build-the-solution-at-visual-studio","title":"Build the solution at Visual Studio","text":"Note that this step depends on perl, so you'll need to install perl for windows, and put perl on your path so VS can start perl (http://stackoverflow.com/questions/3051049/active-perl-installation-on-windows-operating-system).
Step 2 will generate the OpenBLAS solution files; open the solution in VS and build the projects. Note that the dependencies do not seem to be automatically configured: if you try to build libopenblas directly, it will fail with a message saying that some .obj files aren't found, but if you build the projects libopenblas depends on before building libopenblas, the build will succeed.
"},{"location":"install/#build-openblas-for-universal-windows-platform","title":"Build OpenBLAS for Universal Windows Platform","text":"OpenBLAS can be built for use on the Universal Windows Platform using a two step process since commit c66b842.
"},{"location":"install/#1-follow-steps-1-and-2-above-to-build-the-visual-studio-solution-files-for-windows-this-builds-the-helper-executables-which-are-required-when-building-the-openblas-visual-studio-solution-files-for-uwp-in-step-2","title":"# 1. Follow steps 1 and 2 above to build the Visual Studio solution files for Windows. This builds the helper executables which are required when building the OpenBLAS Visual Studio solution files for UWP in step 2.","text":""},{"location":"install/#2-remove-the-generated-cmakecachetxt-and-cmakefiles-directory-from-the-openblas-source-directory-and-re-run-cmake-with-the-following-options","title":"# 2. Remove the generated CMakeCache.txt and CMakeFiles directory from the OpenBLAS source directory and re-run CMake with the following options:","text":"# do this to build UWP compatible solution files\ncmake -G \"Visual Studio 14 Win64\" -DCMAKE_SYSTEM_NAME=WindowsStore -DCMAKE_SYSTEM_VERSION=\"10.0\" -DCMAKE_SYSTEM_PROCESSOR=AMD64 -DVS_WINRT_COMPONENT=TRUE -DCMAKE_BUILD_TYPE=Release .\n
"},{"location":"install/#build-the-solution-with-visual-studio","title":"# Build the solution with Visual Studio","text":"This will build the OpenBLAS binaries with the required settings for use with UWP.
"},{"location":"install/#2-gnu-mingw-abi","title":"2. GNU (MinGW) ABI","text":"The resulting library can be used in Visual Studio, but it can only be linked dynamically. This configuration has not been thoroughly tested and should be considered experimental.
"},{"location":"install/#incompatible-x86-calling-conventions","title":"Incompatible x86 calling conventions","text":"Due to incompatibilities between the calling conventions of MinGW and Visual Studio you will need to make the following modifications ( 32-bit only ):
- Use GCC 4.7.0 or newer. Older GCC versions (&lt;4.7.0) have an ABI incompatibility with MSVC when returning aggregate structures larger than 8 bytes.
"},{"location":"install/#build-openblas-on-windows-os","title":"Build OpenBLAS on Windows OS","text":" - Install the MinGW (GCC) compiler suite, either 32-bit (http://www.mingw.org/) or 64-bit (http://mingw-w64.sourceforge.net/). Be sure to install its gfortran package as well (unless you really want to build the BLAS part of OpenBLAS only) and check that gcc and gfortran are the same version \u2013 mixing compilers from different sources or release versions can lead to strange error messages in the linking stage. In addition, please install MSYS with MinGW.
- Build OpenBLAS in the MSYS shell. Usually, you can just type \"make\". OpenBLAS will detect the compiler and CPU automatically.
- After the build is complete, OpenBLAS will generate the static library \"libopenblas.a\" and the shared dll library \"libopenblas.dll\" in the folder. You can type \"make PREFIX=/your/installation/path install\" to install the library to a certain location.
Note
We suggest using the official MinGW or MinGW-w64 compilers. A user reported encountering an Unhandled exception
error with another compiler suite. https://groups.google.com/forum/#!topic/openblas-users/me2S4LkE55w
Note also that older versions of the alternative builds of mingw-w64 available through http://www.msys2.org may contain a defect that leads to a compilation failure accompanied by the error message
<command-line>:0:4: error: expected identifier or '(' before numeric constant\n
If you encounter this, please upgrade your msys2 setup or see https://github.com/xianyi/OpenBLAS/issues/1503 for a workaround."},{"location":"install/#generate-import-library-before-0210-version","title":"Generate import library (before 0.2.10 version)","text":" - First, you will need to have the
lib.exe
tool in the Visual Studio command prompt. - Open the command prompt and type
cd OPENBLAS_TOP_DIR/exports
, where OPENBLAS_TOP_DIR is the main folder of your OpenBLAS installation. - For a 32-bit library, type
lib /machine:i386 /def:libopenblas.def
. For 64-bit, type lib /machine:X64 /def:libopenblas.def
. - This will generate the import library \"libopenblas.lib\" and the export library \"libopenblas.exp\" in OPENBLAS_TOP_DIR/exports. Although these two files have the same name, they are totally different.
"},{"location":"install/#generate-import-library-0210-and-after-version","title":"Generate import library (0.2.10 and after version)","text":" - OpenBLAS already generated the import library \"libopenblas.dll.a\" for \"libopenblas.dll\".
"},{"location":"install/#generate-windows-native-pdb-files-from-gccgfortran-build","title":"generate windows native PDB files from gcc/gfortran build","text":"Tool to do so is available at https://github.com/rainers/cv2pdb
"},{"location":"install/#use-openblas-dll-library-in-visual-studio","title":"Use OpenBLAS .dll library in Visual Studio","text":" - Copy the import library (before 0.2.10: \"OPENBLAS_TOP_DIR/exports/libopenblas.lib\", 0.2.10 and after: \"OPENBLAS_TOP_DIR/libopenblas.dll.a\") and .dll library \"libopenblas.dll\" into the same folder(The folder of your project that is going to use the BLAS library. You may need to add the libopenblas.dll.a to the linker input list: properties->Linker->Input).
- Please follow the documentation about using third-party .dll libraries in MS Visual Studio 2008 or 2010. Make sure to link against a library for the correct architecture. For example, you may receive an error such as \"The application was unable to start correctly (0xc000007b)\" which typically indicates a mismatch between 32/64-bit libraries.
Note
If you need CBLAS, you should include cblas.h in /your/installation/path/include in Visual Studio. Please read this page.
"},{"location":"install/#limitations","title":"Limitations","text":" - Both static and dynamic linking are supported with MinGW. With Visual Studio, however, only dynamic linking is supported and so you should use the import library.
- Debugging from Visual Studio does not work because MinGW and Visual Studio have incompatible formats for debug information (PDB vs. DWARF/STABS). You should either debug with GDB on the command-line or with a visual frontend, for instance Eclipse or Qt Creator.
"},{"location":"install/#windows-on-arm","title":"Windows on Arm","text":""},{"location":"install/#prerequisites","title":"Prerequisites","text":"Following tools needs to be installed
"},{"location":"install/#1-download-and-install-clang-for-windows-on-arm","title":"1. Download and install clang for windows on arm","text":"Find the latest LLVM build for WoA from LLVM release page
E.g: LLVM 12 build for WoA64 can be found here
Run the LLVM installer and ensure that LLVM is added to environment PATH.
"},{"location":"install/#2-download-and-install-classic-flang-for-windows-on-arm","title":"2. Download and install classic flang for windows on arm","text":"Classic flang is the only available FORTRAN compiler for windows on arm for now and a pre-release build can be found here
There is no installer for classic flang and the zip package can be extracted and the path needs to be added to environment PATH.
E.g: on PowerShell
$env:Path += \";C:\\flang_woa\\bin\"\n
"},{"location":"install/#build","title":"Build","text":"The following steps describe how to build the static library for OpenBLAS with and without LAPACK
"},{"location":"install/#1-build-openblas-static-library-with-blas-and-lapack-routines-with-make","title":"1. Build OpenBLAS static library with BLAS and LAPACK routines with Make","text":"Following command can be used to build OpenBLAS static library with BLAS and LAPACK routines
$ make CC=\"clang-cl\" HOSTCC=\"clang-cl\" AR=\"llvm-ar\" BUILD_WITHOUT_LAPACK=0 NOFORTRAN=0 DYNAMIC_ARCH=0 TARGET=ARMV8 ARCH=arm64 BINARY=64 USE_OPENMP=0 PARALLEL=1 RANLIB=\"llvm-ranlib\" MAKE=make F_COMPILER=FLANG FC=FLANG FFLAGS_NOOPT=\"-march=armv8-a -cpp\" FFLAGS=\"-march=armv8-a -cpp\" NEED_PIC=0 HOSTARCH=arm64 libs netlib\n
"},{"location":"install/#2-build-static-library-with-blas-routines-using-cmake","title":"2. Build static library with BLAS routines using CMake","text":"Classic flang has compatibility issues with cmake hence only BLAS routines can be compiled with CMake
$ mkdir build\n$ cd build\n$ cmake .. -G Ninja -DCMAKE_C_COMPILER=clang -DBUILD_WITHOUT_LAPACK=1 -DNOFORTRAN=1 -DDYNAMIC_ARCH=0 -DTARGET=ARMV8 -DARCH=arm64 -DBINARY=64 -DUSE_OPENMP=0 -DCMAKE_SYSTEM_PROCESSOR=ARM64 -DCMAKE_CROSSCOMPILING=1 -DCMAKE_SYSTEM_NAME=Windows\n$ cmake --build . --config Release\n
"},{"location":"install/#getarchexe-execution-error","title":"getarch.exe
execution error","text":"If you notice that platform-specific headers by getarch.exe
are not generated correctly, it could be due to a known debug runtime DLL issue for arm64 platforms. Please check out the link for the workaround.
"},{"location":"install/#mingw-import-library","title":"MinGW import library","text":"Microsoft Windows has this thing called \"import libraries\". You don't need it in MinGW because the ld
linker from GNU Binutils is smart, but you may still want it for whatever reason.
"},{"location":"install/#make-the-def","title":"Make the .def
","text":"Import libraries are compiled from a list of what symbols to use, .def
. This file should already be in your exports
directory: cd OPENBLAS_TOP_DIR/exports
.
"},{"location":"install/#making-a-mingw-import-library","title":"Making a MinGW import library","text":"MinGW import libraries have the suffix .a
, same as static libraries. (It's actually more common to do .dll.a
...)
You need to first prepend libopenblas.def
with a line LIBRARY libopenblas.dll
:
cat <(echo \"LIBRARY libopenblas.dll\") libopenblas.def > libopenblas.def.1\nmv libopenblas.def.1 libopenblas.def\n
Now it probably looks like:
LIBRARY libopenblas.dll\nEXPORTS\n caxpy=caxpy_ @1\n caxpy_=caxpy_ @2\n ...\n
Then, generate the import library: dlltool -d libopenblas.def -l libopenblas.a
Again, there is basically no point in making an import library for use in MinGW. It actually slows down linking.
"},{"location":"install/#making-a-msvc-import-library","title":"Making a MSVC import library","text":"Unlike MinGW, MSVC absolutely requires an import library. Now the C ABI of MSVC and MinGW are actually identical, so linking is actually okay. (Any incompatibility in the C ABI would be a bug.)
The import libraries of MSVC have the suffix .lib
. They are generated from a .def
file using MSVC's lib.exe
. See the MSVC instructions.
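Concretely, this is the same lib.exe invocation shown in the MSVC instructions earlier on this page, run from a Visual Studio command prompt in the exports directory (the /out name is optional; by default it is derived from the .def file name):

```shell
lib /machine:X64 /def:libopenblas.def /out:libopenblas.lib
```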
"},{"location":"install/#notes","title":"Notes","text":" - Always remember that MinGW is not the same as MSYS2 or Cygwin. MSYS2 and Cygwin are full POSIX environments with a lot of magic such as
fork()
and its own malloc()
. MinGW, which builds on the normal Microsoft C Runtime, has none of that. Be clear about which one you are building for.
"},{"location":"install/#mac-osx","title":"Mac OSX","text":"If your CPU is Sandy Bridge, please use Clang version 3.1 and above. The Clang 3.0 will generate the wrong AVX binary code of OpenBLAS.
"},{"location":"install/#precompiled-packages_2","title":"Precompiled packages","text":"https://www.macports.org/ports.php?by=name&substr=openblas
brew install openblas
or using the conda package manager from https://github.com/conda-forge/miniforge#download (which also has packages for the new M1 CPU)
conda install openblas
"},{"location":"install/#build-on-apple-m1","title":"Build on Apple M1","text":"On newer versions of Xcode and on arm64, you might need to compile with a newer macOS target (11.0) than the default (10.8) with MACOSX_DEPLOYMENT_TARGET=11.0
, or switch your command-line tools to use an older SDK (e.g., 13.1).
- Without a Fortran compiler (cannot build LAPACK):
$ make CC=cc NOFORTRAN=1\n
- With a Fortran compiler (you could
brew install gfortran
) https://github.com/xianyi/OpenBLAS/issues/3032 $ export MACOSX_DEPLOYMENT_TARGET=11.0\n$ make CC=cc FC=gfortran\n
"},{"location":"install/#android","title":"Android","text":""},{"location":"install/#prerequisites_1","title":"Prerequisites","text":"In addition to the Android NDK, you will need both Perl and a C compiler on the build host as these are currently required by the OpenBLAS build environment.
"},{"location":"install/#building-with-android-ndk-using-clang-compiler","title":"Building with android NDK using clang compiler","text":"Around version 11 Android NDKs stopped supporting gcc, so you would need to use clang to compile OpenBLAS. clang is supported from OpenBLAS 0.2.20 version onwards. See below sections on how to build with clang for ARMV7 and ARMV8 targets. The same basic principles as described below for ARMV8 should also apply to building an x86 or x86_64 version (substitute something like NEHALEM for the target instead of ARMV8 and replace all the aarch64 in the toolchain paths obviously) \"Historic\" notes: Since version 19 the default toolchain is provided as a standalone toolchain, so building one yourself following building a standalone toolchain should no longer be necessary. If you want to use static linking with an old NDK version older than about r17, you need to choose an API level below 23 currently due to NDK bug 272 (https://github.com/android-ndk/ndk/issues/272 , the libc.a lacks a definition of stderr) that will probably be fixed in r17 of the NDK.
"},{"location":"install/#build-armv7-with-clang","title":"Build ARMV7 with clang","text":"## Set path to ndk-bundle\nexport NDK_BUNDLE_DIR=/path/to/ndk-bundle\n\n## Set the PATH to contain paths to clang and arm-linux-androideabi-* utilities\nexport PATH=${NDK_BUNDLE_DIR}/toolchains/arm-linux-androideabi-4.9/prebuilt/linux-x86_64/bin:${NDK_BUNDLE_DIR}/toolchains/llvm/prebuilt/linux-x86_64/bin:$PATH\n\n## Set LDFLAGS so that the linker finds the appropriate libgcc\nexport LDFLAGS=\"-L${NDK_BUNDLE_DIR}/toolchains/arm-linux-androideabi-4.9/prebuilt/linux-x86_64/lib/gcc/arm-linux-androideabi/4.9.x\"\n\n## Set the clang cross compile flags\nexport CLANG_FLAGS=\"-target arm-linux-androideabi -marm -mfpu=vfp -mfloat-abi=softfp --sysroot ${NDK_BUNDLE_DIR}/platforms/android-23/arch-arm -gcc-toolchain ${NDK_BUNDLE_DIR}/toolchains/arm-linux-androideabi-4.9/prebuilt/linux-x86_64/\"\n\n#OpenBLAS Compile\nmake TARGET=ARMV7 ONLY_CBLAS=1 AR=ar CC=\"clang ${CLANG_FLAGS}\" HOSTCC=gcc ARM_SOFTFP_ABI=1 -j4\n
On a Mac, it may also be necessary to give the complete path to the ar
utility in the make command above, like so: AR=${NDK_BUNDLE_DIR}/toolchains/arm-linux-androideabi-4.9/prebuilt/darwin-x86_64/bin/arm-linux-androideabi-gcc-ar\n
otherwise you may get a linker error complaining about a \"malformed archive header name at 8\" when the native OSX ar command was invoked instead."},{"location":"install/#build-armv8-with-clang","title":"Build ARMV8 with clang","text":"## Set path to ndk-bundle\nexport NDK_BUNDLE_DIR=/path/to/ndk-bundle/\n\n## Export PATH to contain directories of clang and aarch64-linux-android-* utilities\nexport PATH=${NDK_BUNDLE_DIR}/toolchains/aarch64-linux-android-4.9/prebuilt/linux-x86_64/bin/:${NDK_BUNDLE_DIR}/toolchains/llvm/prebuilt/linux-x86_64/bin:$PATH\n\n## Setup LDFLAGS so that loader can find libgcc and pass -lm for sqrt\nexport LDFLAGS=\"-L${NDK_BUNDLE_DIR}/toolchains/aarch64-linux-android-4.9/prebuilt/linux-x86_64/lib/gcc/aarch64-linux-android/4.9.x -lm\"\n\n## Setup the clang cross compile options\nexport CLANG_FLAGS=\"-target aarch64-linux-android --sysroot ${NDK_BUNDLE_DIR}/platforms/android-23/arch-arm64 -gcc-toolchain ${NDK_BUNDLE_DIR}/toolchains/aarch64-linux-android-4.9/prebuilt/linux-x86_64/\"\n\n## Compile\nmake TARGET=ARMV8 ONLY_CBLAS=1 AR=ar CC=\"clang ${CLANG_FLAGS}\" HOSTCC=gcc -j4\n
Note: Using TARGET=CORTEXA57 in place of ARMV8 will pick up better-optimized routines. Implementations for the CORTEXA57 target are compatible with all other armv8 targets. Note: for NDK 23b, something as simple as
export PATH=/opt/android-ndk-r23b/toolchains/llvm/prebuilt/linux-x86_64/bin/:$PATH\nmake HOSTCC=gcc CC=/opt/android-ndk-r23b/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android31-clang ONLY_CBLAS=1 TARGET=ARMV8\n
appears to be sufficient on Linux."},{"location":"install/#alternative-script-which-was-tested-on-osx-with-ndk2136528147","title":"Alternative script which was tested on OSX with NDK(21.3.6528147)","text":"This script will build OpenBLAS for 3 architectures (ARMV7, ARMV8, X86) and install them with sudo make install
to /opt/OpenBLAS/lib
export NDK=YOUR_PATH_TO_SDK/Android/sdk/ndk/21.3.6528147\nexport TOOLCHAIN=$NDK/toolchains/llvm/prebuilt/darwin-x86_64\n\nmake clean\nmake \\\n TARGET=ARMV7 \\\n ONLY_CBLAS=1 \\\n CC=\"$TOOLCHAIN\"/bin/armv7a-linux-androideabi21-clang \\\n AR=\"$TOOLCHAIN\"/bin/arm-linux-androideabi-ar \\\n HOSTCC=gcc \\\n ARM_SOFTFP_ABI=1 \\\n -j4\nsudo make install\n\nmake clean\nmake \\\n TARGET=CORTEXA57 \\\n ONLY_CBLAS=1 \\\n CC=$TOOLCHAIN/bin/aarch64-linux-android21-clang \\\n AR=$TOOLCHAIN/bin/aarch64-linux-android-ar \\\n HOSTCC=gcc \\\n -j4\nsudo make install\n\nmake clean\nmake \\\n TARGET=ATOM \\\n ONLY_CBLAS=1 \\\n CC=\"$TOOLCHAIN\"/bin/i686-linux-android21-clang \\\n AR=\"$TOOLCHAIN\"/bin/i686-linux-android-ar \\\n HOSTCC=gcc \\\n ARM_SOFTFP_ABI=1 \\\n -j4\nsudo make install\n\n## This will build for x86_64 \nmake clean\nmake \\\n TARGET=ATOM BINARY=64\\\n ONLY_CBLAS=1 \\\n CC=\"$TOOLCHAIN\"/bin/x86_64-linux-android21-clang \\\n AR=\"$TOOLCHAIN\"/bin/x86_64-linux-android-ar \\\n HOSTCC=gcc \\\n ARM_SOFTFP_ABI=1 \\\n -j4\nsudo make install\n
Also, you can find the full list of target architectures in TargetsList.txt. Anything below this line should be irrelevant nowadays, unless you need to perform software archeology.
"},{"location":"install/#building-openblas-with-very-old-gcc-based-versions-of-the-ndk-without-fortran","title":"Building OpenBLAS with very old gcc-based versions of the NDK, without Fortran","text":"The prebuilt Android NDK toolchains do not include Fortran, hence parts like LAPACK cannot be built. You can still build OpenBLAS without it. For instructions on how to build OpenBLAS with Fortran, see the next section.
To easily use the prebuilt toolchains, follow \"building a standalone toolchain\" for your desired architecture. This would be arm-linux-androideabi-gcc-4.9
for ARMV7 and aarch64-linux-android-gcc-4.9
for ARMV8.
You can build OpenBLAS (0.2.19 and earlier) with:
## Add the toolchain to your path\nexport PATH=/path/to/standalone-toolchain/bin:$PATH\n\n## Build without Fortran for ARMV7\nmake TARGET=ARMV7 HOSTCC=gcc CC=arm-linux-androideabi-gcc NOFORTRAN=1 libs\n## Build without Fortran for ARMV8\nmake TARGET=ARMV8 BINARY=64 HOSTCC=gcc CC=aarch64-linux-android-gcc NOFORTRAN=1 libs\n
Since we are cross-compiling, we make the libs
recipe, not all
. Otherwise you will get errors when trying to link or run tests, as versions up to and including 0.2.19 cannot build a shared library for Android.
From 0.2.20 on, you should leave off \"libs\" to get a full build. On ARMV7 you may also want to use the softfp ABI instead of the deprecated hardfp one, in which case you would use:
## Add the toolchain to your path\nexport PATH=/path/to/standalone-toolchain/bin:$PATH\n\n## Build without Fortran for ARMV7\nmake TARGET=ARMV7 ARM_SOFTFP_ABI=1 HOSTCC=gcc CC=arm-linux-androideabi-gcc NOFORTRAN=1\n## Build without Fortran for ARMV8\nmake TARGET=ARMV8 BINARY=64 HOSTCC=gcc CC=aarch64-linux-android-gcc NOFORTRAN=1\n
If you get an error about stdio.h not being found, you need to specify your sysroot in the CFLAGS argument to make
, e.g. CFLAGS=--sysroot=$NDK/platforms/android-16/arch-arm.
When you are done, install OpenBLAS into the desired directory. Be sure to also use all command line options here that you specified for building, otherwise errors may occur as it tries to install things you did not build:
make PREFIX=/path/to/install-dir TARGET=... install\n
"},{"location":"install/#building-openblas-with-fortran","title":"Building OpenBLAS with Fortran","text":"Instructions on how to build the GNU toolchains with Fortran can be found here. The Releases section provides prebuilt versions; use the standalone one.
You can build OpenBLAS with:
## Add the toolchain to your path\nexport PATH=/path/to/standalone-toolchain-with-fortran/bin:$PATH\n\n## Build with Fortran for ARMV7\nmake TARGET=ARMV7 HOSTCC=gcc CC=arm-linux-androideabi-gcc FC=arm-linux-androideabi-gfortran libs\n## Build with LAPACK for ARMV8\nmake TARGET=ARMV8 BINARY=64 HOSTCC=gcc CC=aarch64-linux-android-gcc FC=aarch64-linux-android-gfortran libs\n
As mentioned above, you can leave off the libs
argument here when building 0.2.20 and later, and you may want to add ARM_SOFTFP_ABI=1 when building for ARMV7.
"},{"location":"install/#linking-openblas-0219-and-earlier-for-armv7","title":"Linking OpenBLAS (0.2.19 and earlier) for ARMV7","text":"If you are using ndk-build
, you need to set the ABI to hard floating points in your Application.mk:
APP_ABI := armeabi-v7a-hard\n
This will set the appropriate flags for you. If you are not using ndk-build
, you will want to add the following flags:
TARGET_CFLAGS += -mhard-float -D_NDK_MATH_NO_SOFTFP=1\nTARGET_LDFLAGS += -Wl,--no-warn-mismatch -lm_hard\n
From 0.2.20 on, it is also possible to build for the softfp ABI by specifying ARM_SOFTFP_ABI=1 during the build. In that case, make sure that all your dependencies are compiled with -mfloat-abi=softfp as well, as mixing \"hard\" and \"soft\" floating point ABIs in a program will make it crash.
"},{"location":"install/#iphoneios","title":"iPhone/iOS","text":"As none of the current developers use iOS, the following instructions are what was found to work in our Azure CI setup; as far as we know, this builds a fully working OpenBLAS for this platform.
Go to the directory where you unpacked OpenBLAS, and enter the following commands:
export CC=/Applications/Xcode_12.4.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang\n\nexport CFLAGS=\"-O2 -Wno-macro-redefined -isysroot /Applications/Xcode_12.4.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS14.4.sdk -arch arm64 -miphoneos-version-min=10.0\"\n\nmake TARGET=ARMV8 DYNAMIC_ARCH=1 NUM_THREADS=32 HOSTCC=clang NOFORTRAN=1\n
Adjust the -miphoneos-version-min argument as necessary for your installation, i.e. set it to the minimum iOS version you want to target, then run these commands to build the library."},{"location":"install/#mips","title":"MIPS","text":"For MIPS targets you will need recent toolchains: the MTI GNU/Linux toolchain for P5600, and the IMG GNU/Linux toolchain for I6400 and P6600.
These can be downloaded from http://codescape-mips-sdk.imgtec.com/components/toolchain/2016.05-03/downloads.html
You can use the following command lines for builds:
IMG_TOOLCHAIN_DIR={full IMG GNU/Linux Toolchain path including \"bin\" directory -- for example, /opt/linux_toolchain/bin}\nIMG_GCC_PREFIX=mips-img-linux-gnu\nIMG_TOOLCHAIN=${IMG_TOOLCHAIN_DIR}/${IMG_GCC_PREFIX}\n\nI6400 Build (n32):\nmake BINARY=32 BINARY32=1 CC=$IMG_TOOLCHAIN-gcc AR=$IMG_TOOLCHAIN-ar FC=\"$IMG_TOOLCHAIN-gfortran -EL -mabi=n32\" RANLIB=$IMG_TOOLCHAIN-ranlib HOSTCC=gcc CFLAGS=\"-EL\" FFLAGS=$CFLAGS LDFLAGS=$CFLAGS TARGET=I6400\n\nI6400 Build (n64):\nmake BINARY=64 BINARY64=1 CC=$IMG_TOOLCHAIN-gcc AR=$IMG_TOOLCHAIN-ar FC=\"$IMG_TOOLCHAIN-gfortran -EL\" RANLIB=$IMG_TOOLCHAIN-ranlib HOSTCC=gcc CFLAGS=\"-EL\" FFLAGS=$CFLAGS LDFLAGS=$CFLAGS TARGET=I6400\n\nP6600 Build (n32):\nmake BINARY=32 BINARY32=1 CC=$IMG_TOOLCHAIN-gcc AR=$IMG_TOOLCHAIN-ar FC=\"$IMG_TOOLCHAIN-gfortran -EL -mabi=n32\" RANLIB=$IMG_TOOLCHAIN-ranlib HOSTCC=gcc CFLAGS=\"-EL\" FFLAGS=$CFLAGS LDFLAGS=$CFLAGS TARGET=P6600\n\nP6600 Build (n64):\nmake BINARY=64 BINARY64=1 CC=$IMG_TOOLCHAIN-gcc AR=$IMG_TOOLCHAIN-ar FC=\"$IMG_TOOLCHAIN-gfortran -EL\" RANLIB=$IMG_TOOLCHAIN-ranlib HOSTCC=gcc CFLAGS=\"-EL\" FFLAGS=\"$CFLAGS\" LDFLAGS=\"$CFLAGS\" TARGET=P6600\n\nMTI_TOOLCHAIN_DIR={full MTI GNU/Linux Toolchain path including \"bin\" directory -- for example, /opt/linux_toolchain/bin}\nMTI_GCC_PREFIX=mips-mti-linux-gnu\nMTI_TOOLCHAIN=${MTI_TOOLCHAIN_DIR}/${MTI_GCC_PREFIX}\n\nP5600 Build:\n\nmake BINARY=32 BINARY32=1 CC=$MTI_TOOLCHAIN-gcc AR=$MTI_TOOLCHAIN-ar FC=\"$MTI_TOOLCHAIN-gfortran -EL\" RANLIB=$MTI_TOOLCHAIN-ranlib HOSTCC=gcc CFLAGS=\"-EL\" FFLAGS=$CFLAGS LDFLAGS=$CFLAGS TARGET=P5600\n
"},{"location":"install/#freebsd","title":"FreeBSD","text":"You will need to install the following tools from the FreeBSD ports tree: * lang/gcc [1] * lang/perl5.12 * ftp/curl * devel/gmake * devel/patch
To compile run the command:
$ gmake CC=gcc46 FC=gfortran46\n
Note that you need to build with GNU make and manually specify the compiler, otherwise gcc 4.2 from the base system would be used.
[1]: Removal of Fortran from the FreeBSD base system
pkg install openblas\n
see https://www.freebsd.org/ports/index.html
"},{"location":"install/#cortex-m","title":"Cortex-M","text":"Cortex-M is a widely used family of microcontroller cores present in a variety of industrial and consumer electronics devices. A common variant of the Cortex-M is the STM32F4xx series. Here, we will give instructions for building for the STM32F4xx.
First, install the embedded Arm GCC toolchain (arm-none-eabi-gcc) from the Arm website. Then, create the following toolchain file and build as follows.
# cmake .. -G Ninja -DCMAKE_C_COMPILER=arm-none-eabi-gcc -DCMAKE_TOOLCHAIN_FILE:PATH=\"toolchain.cmake\" -DNOFORTRAN=1 -DTARGET=ARMV5 -DEMBEDDED=1\n\nset(CMAKE_SYSTEM_NAME Generic)\nset(CMAKE_SYSTEM_PROCESSOR arm)\n\nset(CMAKE_C_COMPILER \"arm-none-eabi-gcc.exe\")\nset(CMAKE_CXX_COMPILER \"arm-none-eabi-g++.exe\")\n\nset(CMAKE_EXE_LINKER_FLAGS \"--specs=nosys.specs\" CACHE INTERNAL \"\")\n\nset(CMAKE_FIND_ROOT_PATH_MODE_PROGRAM NEVER)\nset(CMAKE_FIND_ROOT_PATH_MODE_LIBRARY ONLY)\nset(CMAKE_FIND_ROOT_PATH_MODE_INCLUDE ONLY)\nset(CMAKE_FIND_ROOT_PATH_MODE_PACKAGE ONLY)\n
In your embedded application, the following functions need to be provided for OpenBLAS to work correctly:
void free(void* ptr);\nvoid* malloc(size_t size);\n
Note
If you are developing for an embedded platform, it is your responsibility to make sure that the device has sufficient memory for malloc calls. Libmemory provides one implementation of malloc for embedded platforms.
"},{"location":"user_manual/","title":"User manual","text":"This user manual covers compiling OpenBLAS itself, linking your code to OpenBLAS, example code to use the C (CBLAS) and Fortran (BLAS) APIs, and some troubleshooting tips. Compiling OpenBLAS is optional, since you may be able to install it with a package manager.
Note
The OpenBLAS documentation does not contain API reference documentation for BLAS or LAPACK, since these are standardized APIs, the documentation for which can be found in other places. If you want to understand every BLAS and LAPACK function and definition, we recommend reading the Netlib BLAS and Netlib LAPACK documentation.
OpenBLAS does contain a limited number of functions that are non-standard; these are documented at OpenBLAS extension functions.
"},{"location":"user_manual/#compiling-openblas","title":"Compiling OpenBLAS","text":""},{"location":"user_manual/#normal-compile","title":"Normal compile","text":"The default way to build and install OpenBLAS from source is with Make:
make # add `-j4` to compile in parallel with 4 processes\nmake install\n
By default, the CPU architecture is detected automatically when invoking make
, and the build is optimized for the detected CPU. To override the autodetection, use the TARGET
flag:
# `make TARGET=xxx` sets target CPU: e.g. for an Intel Nehalem CPU:\nmake TARGET=NEHALEM\n
The full list of known target CPU architectures can be found in TargetList.txt
in the root of the repository."},{"location":"user_manual/#cross-compile","title":"Cross compile","text":"For a basic cross-compilation with Make, three steps need to be taken:
- Set the
CC
and FC
environment variables to select the cross toolchains for C and Fortran. - Set the
HOSTCC
environment variable to select the host C compiler (i.e. the regular C compiler for the machine on which you are invoking the build). - Set
TARGET
explicitly to the CPU architecture on which the produced OpenBLAS binaries will be used.
"},{"location":"user_manual/#cross-compilation-examples","title":"Cross-compilation examples","text":"Compile the library for ARM Cortex-A9 linux on an x86-64 machine (note: install only gnueabihf
versions of the cross toolchain - see this issue comment for why):
make CC=arm-linux-gnueabihf-gcc FC=arm-linux-gnueabihf-gfortran HOSTCC=gcc TARGET=CORTEXA9\n
Compile OpenBLAS for a loongson3a CPU on an x86-64 machine:
make BINARY=64 CC=mips64el-unknown-linux-gnu-gcc FC=mips64el-unknown-linux-gnu-gfortran HOSTCC=gcc TARGET=LOONGSON3A\n
Compile OpenBLAS for loongson3a CPU with the loongcc
(based on Open64) compiler on an x86-64 machine:
make CC=loongcc FC=loongf95 HOSTCC=gcc TARGET=LOONGSON3A CROSS=1 CROSS_SUFFIX=mips64el-st-linux-gnu- NO_LAPACKE=1 NO_SHARED=1 BINARY=32\n
"},{"location":"user_manual/#building-a-debug-version","title":"Building a debug version","text":"Add DEBUG=1
to your build command, e.g.:
make DEBUG=1\n
"},{"location":"user_manual/#install-to-a-specific-directory","title":"Install to a specific directory","text":"Note
Installing to a directory is optional; it is also possible to use the shared or static libraries directly from the build directory.
Use make install
with the PREFIX
flag to install to a specific directory:
make install PREFIX=/path/to/installation/directory\n
The default directory is /opt/OpenBLAS
.
Important
Note that any flags passed to make
during build should also be passed to make install
to circumvent any install errors, e.g. some headers not being copied over correctly.
For more detailed information on building/installing from source, please read the Installation Guide.
"},{"location":"user_manual/#linking-to-openblas","title":"Linking to OpenBLAS","text":"OpenBLAS can be used as a shared or a static library.
"},{"location":"user_manual/#link-a-shared-library","title":"Link a shared library","text":"The shared library is normally called libopenblas.so
, but note that the name may be different as a result of build flags used or naming choices by a distro packager (see distributing.md for details). To link a shared library named libopenblas.so
, the flag -lopenblas
is needed. To find the OpenBLAS headers, a -I/path/to/includedir
is needed. And unless the library is installed in a directory that the linker searches by default, also -L
and -Wl,-rpath
flags are needed. For a source file test.c
(e.g., the example code under Call CBLAS interface further down), the shared library can then be linked with:
gcc -o test test.c -I/your_path/OpenBLAS/include/ -L/your_path/OpenBLAS/lib -Wl,-rpath,/your_path/OpenBLAS/lib -lopenblas\n
The -Wl,-rpath,/your_path/OpenBLAS/lib
linker flag can be omitted if you ran ldconfig
to update the linker cache, put /your_path/OpenBLAS/lib
in /etc/ld.so.conf
or a file in /etc/ld.so.conf.d
, or installed OpenBLAS in a location that is part of the ld.so
default search path (usually /lib
, /usr/lib
and /usr/local/lib
). Alternatively, you can set the environment variable LD_LIBRARY_PATH
to point to the folder that contains libopenblas.so
. Otherwise, the build may succeed but at runtime loading the library will fail with a message like:
cannot open shared object file: no such file or directory\n
More flags may be needed, depending on how OpenBLAS was built:
- If
libopenblas
is multi-threaded, please add -lpthread
. - If the library contains LAPACK functions (usually also true), please add
-lgfortran
(other Fortran libraries may also be needed, e.g. -lquadmath
). Note that if you only make calls to LAPACKE routines, i.e. your code has #include \"lapacke.h\"
and makes calls to methods like LAPACKE_dgeqrf
, then -lgfortran
is not needed.
Tip
Usually a pkg-config file (e.g., openblas.pc
) is installed together with a libopenblas
shared library. pkg-config is a tool that will tell you the exact flags needed for linking. For example:
$ pkg-config --cflags openblas\n-I/usr/local/include\n$ pkg-config --libs openblas\n-L/usr/local/lib -lopenblas\n
"},{"location":"user_manual/#link-a-static-library","title":"Link a static library","text":"Linking a static library is simpler - add the path to the static OpenBLAS library to the compile command:
gcc -o test test.c /your/path/libopenblas.a\n
"},{"location":"user_manual/#code-examples","title":"Code examples","text":""},{"location":"user_manual/#call-cblas-interface","title":"Call CBLAS interface","text":"This example shows calling cblas_dgemm
in C:
#include <cblas.h>\n#include <stdio.h>\n\nint main(void)\n{\n    int i=0;\n    double A[6] = {1.0,2.0,1.0,-3.0,4.0,-1.0}; \n    double B[6] = {1.0,2.0,1.0,-3.0,4.0,-1.0}; \n    double C[9] = {.5,.5,.5,.5,.5,.5,.5,.5,.5}; \n    cblas_dgemm(CblasColMajor, CblasNoTrans, CblasTrans,3,3,2,1,A, 3, B, 3,2,C,3);\n\n    for(i=0; i<9; i++)\n        printf(\"%lf \", C[i]);\n    printf(\"\\n\");\n    return 0;\n}\n
To compile this file, save it as test_cblas_dgemm.c
and then run:
gcc -o test_cblas_open test_cblas_dgemm.c -I/your_path/OpenBLAS/include/ -L/your_path/OpenBLAS/lib -lopenblas -lpthread -lgfortran\n
will result in a test_cblas_open
executable."},{"location":"user_manual/#call-blas-fortran-interface","title":"Call BLAS Fortran interface","text":"This example shows calling the dgemm
Fortran interface in C:
#include \"stdio.h\"\n#include \"stdlib.h\"\n#include \"sys/time.h\"\n#include \"time.h\"\n\nextern void dgemm_(char*, char*, int*, int*,int*, double*, double*, int*, double*, int*, double*, double*, int*);\n\nint main(int argc, char* argv[])\n{\n int i;\n printf(\"test!\\n\");\n if(argc<4){\n printf(\"Input Error\\n\");\n return 1;\n }\n\n int m = atoi(argv[1]);\n int n = atoi(argv[2]);\n int k = atoi(argv[3]);\n int sizeofa = m * k;\n int sizeofb = k * n;\n int sizeofc = m * n;\n char ta = 'N';\n char tb = 'N';\n double alpha = 1.2;\n double beta = 0.001;\n\n struct timeval start,finish;\n double duration;\n\n double* A = (double*)malloc(sizeof(double) * sizeofa);\n double* B = (double*)malloc(sizeof(double) * sizeofb);\n double* C = (double*)malloc(sizeof(double) * sizeofc);\n\n srand((unsigned)time(NULL));\n\n for (i=0; i<sizeofa; i++)\n A[i] = i%3+1;//(rand()%100)/10.0;\n\n for (i=0; i<sizeofb; i++)\n B[i] = i%3+1;//(rand()%100)/10.0;\n\n for (i=0; i<sizeofc; i++)\n C[i] = i%3+1;//(rand()%100)/10.0;\n //#if 0\n printf(\"m=%d,n=%d,k=%d,alpha=%lf,beta=%lf,sizeofc=%d\\n\",m,n,k,alpha,beta,sizeofc);\n gettimeofday(&start, NULL);\n dgemm_(&ta, &tb, &m, &n, &k, &alpha, A, &m, B, &k, &beta, C, &m);\n gettimeofday(&finish, NULL);\n\n duration = ((double)(finish.tv_sec-start.tv_sec)*1000000 + (double)(finish.tv_usec-start.tv_usec)) / 1000000;\n double gflops = 2.0 * m *n*k;\n gflops = gflops/duration*1.0e-6;\n\n FILE *fp;\n fp = fopen(\"timeDGEMM.txt\", \"a\");\n fprintf(fp, \"%dx%dx%d\\t%lf s\\t%lf MFLOPS\\n\", m, n, k, duration, gflops);\n fclose(fp);\n\n free(A);\n free(B);\n free(C);\n return 0;\n}\n
To compile this file, save it as time_dgemm.c
and then run:
gcc -o time_dgemm time_dgemm.c /your/path/libopenblas.a -lpthread\n
You can then run it as: ./time_dgemm <m> <n> <k>
, with m
, n
, and k
input parameters to the time_dgemm
executable. Note
When calling the Fortran interface from C, you have to deal with symbol name differences caused by compiler conventions. That is why the dgemm_
function call in the example above has a trailing underscore. This is what it looks like when using gcc
/gfortran
, but such details may differ for other compilers, so calling the Fortran interface this way requires extra support code. The CBLAS interface may be more portable when writing C code.
When writing code that needs to be portable across different platforms and compilers, the above code example is not recommended. Instead, we advise looking at how OpenBLAS (or BLAS in general, since this problem isn't specific to OpenBLAS) functions are called in widely used projects like Julia, SciPy, or R.
"},{"location":"user_manual/#troubleshooting","title":"Troubleshooting","text":" - Please read the FAQ first; your problem may be described there.
- Please ensure you are using a recent enough compiler that supports the features your CPU provides (for example, GCC versions before 4.6 were known to not support AVX kernels, and versions before 6.1 AVX512CD kernels).
- The number of CPU cores supported by default is <=256. On Linux x86-64, there is experimental support for up to 1024 cores and 128 NUMA nodes if you build the library with
BIGNUMA=1
. - OpenBLAS does not set processor affinity by default. On Linux, you can enable processor affinity by commenting out the line
NO_AFFINITY=1
in Makefile.rule
. - On Loongson 3A,
make test
is known to fail with a pthread_create
error and an EAGAIN
error code. However, it will be OK when you run the same testcase in a shell.
"}]}
\ No newline at end of file
+{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"Home","text":""},{"location":"#introduction","title":"Introduction","text":"OpenBLAS is an optimized Basic Linear Algebra Subprograms (BLAS) library based on GotoBLAS2 1.13 BSD version.
OpenBLAS implements low-level routines for performing linear algebra operations such as vector addition, scalar multiplication, dot products, linear combinations, and matrix multiplication. OpenBLAS makes these routines available on multiple platforms, covering server, desktop and mobile operating systems, as well as different architectures including x86, ARM, MIPS, PPC, RISC-V, and zarch.
The old GotoBLAS documentation can be found on GitHub.
"},{"location":"#license","title":"License","text":"OpenBLAS is licensed under the 3-clause BSD license. The full license can be found on GitHub.
"},{"location":"about/","title":"About","text":""},{"location":"about/#mailing-list","title":"Mailing list","text":"We have a GitHub discussions forum to discuss usage and development of OpenBLAS. We also have a Google group for users and a Google group for development of OpenBLAS.
"},{"location":"about/#acknowledgements","title":"Acknowledgements","text":"This work was or is partially supported by the following grants, contracts and institutions:
- Research and Development of Compiler System and Toolchain for Domestic CPU, National S&T Major Projects: Core Electronic Devices, High-end General Chips and Fundamental Software (No.2009ZX01036-001-002)
- National High-tech R&D Program of China (Grant No.2012AA010903)
- PerfXLab
- Chan Zuckerberg Initiative's Essential Open Source Software for Science program:
- Cycle 1 grant: Strengthening NumPy's foundations - growing beyond code (2019-2020)
- Cycle 3 grant: Improving usability and sustainability for NumPy and OpenBLAS (2020-2021)
- Sovereign Tech Fund funding: Keeping high performance linear algebra computation accessible and open for all (2023-2024)
Over the course of OpenBLAS development, a number of donations were received. You can read OpenBLAS's statement of receipts and disbursement and cash balance in this Google doc (covers 2013-2016). A list of backers is available in BACKERS.md in the main repo.
"},{"location":"about/#donations","title":"Donations","text":"We welcome hardware donations, including the latest CPUs and motherboards.
"},{"location":"about/#open-source-users-of-openblas","title":"Open source users of OpenBLAS","text":"Prominent open source users of OpenBLAS include:
- Julia - a high-level, high-performance dynamic programming language for technical computing
- NumPy - the fundamental package for scientific computing with Python
- SciPy - fundamental algorithms for scientific computing in Python
- R - a free software environment for statistical computing and graphics
- OpenCV - the world's biggest computer vision library
OpenBLAS is packaged in most major Linux distros, as well as general and numerical computing-focused packaging ecosystems like Nix, Homebrew, Spack and conda-forge.
OpenBLAS is used directly by libraries written in C, C++ and Fortran (and probably other languages), and directly by end users in those languages.
"},{"location":"about/#publications","title":"Publications","text":""},{"location":"about/#2013","title":"2013","text":" - Wang Qian, Zhang Xianyi, Zhang Yunquan, Qing Yi, AUGEM: Automatically Generate High Performance Dense Linear Algebra Kernels on x86 CPUs, In the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'13), Denver CO, November 2013. [pdf]
"},{"location":"about/#2012","title":"2012","text":" - Zhang Xianyi, Wang Qian, Zhang Yunquan, Model-driven Level 3 BLAS Performance Optimization on Loongson 3A Processor, 2012 IEEE 18th International Conference on Parallel and Distributed Systems (ICPADS), 17-19 Dec. 2012.
"},{"location":"build_system/","title":"Build system","text":"This page describes the Make-based build, which is the default/authoritative build method. Note that the OpenBLAS repository also supports building with CMake (not described here) - that generally works and is tested, however there may be small differences between the Make and CMake builds.
Warning
This page is made by someone who is not the developer and should not be considered as an official documentation of the build system. For getting the full picture, it is best to read the Makefiles and understand them yourself.
"},{"location":"build_system/#makefile-dep-graph","title":"Makefile dep graph","text":"Makefile \n| \n|----- Makefile.system # !!! this is included by many of the Makefiles in the subdirectories !!!\n| |\n| |===== Makefile.prebuild # This is triggered (not included) once by Makefile.system \n| | | # and runs before any of the actual library code is built.\n| | | # (builds and runs the \"getarch\" tool for cpu identification,\n| | | # runs the compiler detection scripts c_check and f_check) \n| | |\n| | ----- (Makefile.conf) [ either this or Makefile_kernel.conf is generated ] \n| | | { Makefile.system#L243 }\n| | ----- (Makefile_kernel.conf) [ temporary Makefile.conf during DYNAMIC_ARCH builds ]\n| |\n| |----- Makefile.rule # defaults for build options that can be given on the make command line\n| |\n| |----- Makefile.$(ARCH) # architecture-specific compiler options and OpenBLAS buffer size values\n|\n|~~~~~ exports/\n|\n|~~~~~ test/\n|\n|~~~~~ utest/ \n|\n|~~~~~ ctest/\n|\n|~~~~~ cpp_thread_test/\n|\n|~~~~~ kernel/\n|\n|~~~~~ ${SUBDIRS}\n|\n|~~~~~ ${BLASDIRS}\n|\n|~~~~~ ${NETLIB_LAPACK_DIR}{,/timing,/testing/{EIG,LIN}}\n|\n|~~~~~ relapack/\n
"},{"location":"build_system/#important-variables","title":"Important Variables","text":"Most of the tunable variables are found in Makefile.rule, along with their detailed descriptions. Most of the variables are detected automatically in Makefile.prebuild, if they are not set in the environment.
"},{"location":"build_system/#cpu-related","title":"CPU related","text":"ARCH - Target architecture (eg. x86_64)\nTARGET - Target CPU architecture, in case of DYNAMIC_ARCH=1 means library will not be usable on less capable CPUs\nTARGET_CORE - TARGET_CORE will override TARGET internally during each cpu-specific cycle of the build for DYNAMIC_ARCH\nDYNAMIC_ARCH - For building library for multiple TARGETs (does not lose any optimizations, but increases library size)\nDYNAMIC_LIST - optional user-provided subset of the DYNAMIC_CORE list in Makefile.system\n
"},{"location":"build_system/#toolchain-related","title":"Toolchain related","text":"CC - TARGET C compiler used for compilation (can be cross-toolchains)\nFC - TARGET Fortran compiler used for compilation (can be cross-toolchains, set NOFORTRAN=1 if used cross-toolchain has no fortran compiler)\nAR, AS, LD, RANLIB - TARGET toolchain helpers used for compilation (can be cross-toolchains)\n\nHOSTCC - compiler of build machine, needed to create proper config files for target architecture\nHOST_CFLAGS - flags for build machine compiler\n
"},{"location":"build_system/#library-related","title":"Library related","text":"BINARY - 32/64 bit library\n\nBUILD_SHARED - Create shared library\nBUILD_STATIC - Create static library\n\nQUAD_PRECISION - enable support for IEEE quad precision [ largely unimplemented leftover from GotoBLAS, do not use ]\nEXPRECISION - Obsolete option to use float80 of SSE on BSD-like systems\nINTERFACE64 - Build with 64bit integer representations to support large array index values [ incompatible with standard API ]\n\nBUILD_SINGLE - build the single-precision real functions of BLAS [and optionally LAPACK] \nBUILD_DOUBLE - build the double-precision real functions\nBUILD_COMPLEX - build the single-precision complex functions\nBUILD_COMPLEX16 - build the double-precision complex functions\n(all four types are included in the build by default when none was specifically selected)\n\nBUILD_BFLOAT16 - build the \"half precision brainfloat\" real functions \n\nUSE_THREAD - Use a multithreading backend (default to pthread)\nUSE_LOCKING - implement locking for thread safety even when USE_THREAD is not set (so that the singlethreaded library can\n safely be called from multithreaded programs)\nUSE_OPENMP - Use OpenMP as multithreading backend\nNUM_THREADS - define this to the maximum number of parallel threads you expect to need (defaults to the number of cores in the build cpu)\nNUM_PARALLEL - define this to the number of OpenMP instances that your code may use for parallel calls into OpenBLAS (default 1,see below)\n
OpenBLAS uses a fixed set of memory buffers internally, used for communicating and compiling partial results from individual threads. For efficiency, the management array structure for these buffers is sized at build time - this makes it necessary to know in advance how many threads need to be supported on the target system(s).
With OpenMP, there is an additional level of complexity as there may be calls originating from a parallel region in the calling program. If OpenBLAS gets called from a single parallel region, it runs single-threaded automatically to avoid overloading the system by fanning out its own set of threads. In the case that an OpenMP program makes multiple calls from independent regions or instances in parallel, this default serialization is not sufficient as the additional caller(s) would compete for the original set of buffers already in use by the first call. So if multiple OpenMP runtimes call into OpenBLAS at the same time, then only one of them will be able to make progress while all the rest of them spin-wait for the one available buffer. Setting NUM_PARALLEL
to the upper bound on the number of OpenMP runtimes that you can have in a process ensures that there are a sufficient number of buffer sets available.
"},{"location":"ci/","title":"CI jobs","text":"Arch Target CPU OS Build system XComp to C Compiler Fortran Compiler threading DYN_ARCH INT64 Libraries CI Provider CPU count x86_64 Intel 32bit Windows CMAKE/VS2015 - mingw6.3 - pthreads - - static Appveyor x86_64 Intel Windows CMAKE/VS2015 - mingw5.3 - pthreads - - static Appveyor x86_64 Intel Centos5 gmake - gcc 4.8 gfortran pthreads + - both Azure x86_64 SDE (SkylakeX) Ubuntu CMAKE - gcc gfortran pthreads - - both Azure x86_64 Haswell/ SkylakeX Windows CMAKE/VS2017 - VS2017 - - - static Azure x86_64 \" Windows mingw32-make - gcc gfortran list - both Azure x86_64 \" Windows CMAKE/Ninja - LLVM - - - static Azure x86_64 \" Windows CMAKE/Ninja - LLVM flang - - static Azure x86_64 \" Windows CMAKE/Ninja - VS2022 flang* - - static Azure x86_64 \" macOS11 gmake - gcc-10 gfortran OpenMP + - both Azure x86_64 \" macOS11 gmake - gcc-10 gfortran none - - both Azure x86_64 \" macOS12 gmake - gcc-12 gfortran pthreads - - both Azure x86_64 \" macOS11 gmake - llvm - OpenMP + - both Azure x86_64 \" macOS11 CMAKE - llvm - OpenMP no_avx512 - static Azure x86_64 \" macOS11 CMAKE - gcc-10 gfortran pthreads list - shared Azure x86_64 \" macOS11 gmake - llvm ifort pthreads - - both Azure x86_64 \" macOS11 gmake arm AndroidNDK-llvm - - - both Azure x86_64 \" macOS11 gmake arm64 XCode 12.4 - + - both Azure x86_64 \" macOS11 gmake arm XCode 12.4 - + - both Azure x86_64 \" Alpine Linux(musl) gmake - gcc gfortran pthreads + - both Azure arm64 Apple M1 OSX CMAKE/XCode - LLVM - OpenMP - - static Cirrus arm64 Apple M1 OSX CMAKE/Xcode - LLVM - OpenMP - + static Cirrus arm64 Apple M1 OSX CMAKE/XCode x86_64 LLVM - - + - static Cirrus arm64 Neoverse N1 Linux gmake - gcc10.2 - pthreads - - both Cirrus arm64 Neoverse N1 Linux gmake - gcc10.2 - pthreads - + both Cirrus arm64 Neoverse N1 Linux gmake - gcc10.2 - OpenMP - - both Cirrus 8 x86_64 Ryzen FreeBSD gmake - gcc12.2 gfortran pthreads - - both Cirrus x86_64 Ryzen FreeBSD gmake gcc12.2 gfortran 
pthreads - + both Cirrus x86_64 GENERIC QEMU gmake mips64 gcc gfortran pthreads - - static Github x86_64 SICORTEX QEMU gmake mips64 gcc gfortran pthreads - - static Github x86_64 I6400 QEMU gmake mips64 gcc gfortran pthreads - - static Github x86_64 P6600 QEMU gmake mips64 gcc gfortran pthreads - - static Github x86_64 I6500 QEMU gmake mips64 gcc gfortran pthreads - - static Github x86_64 Intel Ubuntu CMAKE - gcc-11.3 gfortran pthreads + - static Github x86_64 Intel Ubuntu gmake - gcc-11.3 gfortran pthreads + - both Github x86_64 Intel Ubuntu CMAKE - gcc-11.3 flang-classic pthreads + - static Github x86_64 Intel Ubuntu gmake - gcc-11.3 flang-classic pthreads + - both Github x86_64 Intel macOS12 CMAKE - AppleClang 14 gfortran pthreads + - static Github x86_64 Intel macOS12 gmake - AppleClang 14 gfortran pthreads + - both Github x86_64 Intel Windows2022 CMAKE/Ninja - mingw gcc 13 gfortran + - static Github x86_64 Intel Windows2022 CMAKE/Ninja - mingw gcc 13 gfortran + + static Github x86_64 Intel 32bit Windows2022 CMAKE/Ninja - mingw gcc 13 gfortran + - static Github x86_64 Intel Windows2022 CMAKE/Ninja - LLVM 16 - + - static Github x86_64 Intel Windows2022 CMAKE/Ninja - LLVM 16 - + + static Github x86_64 Intel Windows2022 CMAKE/Ninja - gcc 13 - + - static Github x86_64 Intel Ubuntu gmake mips64 gcc gfortran pthreads + - both Github x86_64 generic Ubuntu gmake riscv64 gcc gfortran pthreads - - both Github x86_64 Intel Ubuntu gmake mips32 gcc gfortran pthreads - - both Github x86_64 Intel Ubuntu gmake ia64 gcc gfortran pthreads - - both Github x86_64 C910V QEmu gmake riscv64 gcc gfortran pthreads - - both Github power pwr9 Ubuntu gmake - gcc gfortran OpenMP - - both OSUOSL zarch z14 Ubuntu gmake - gcc gfortran OpenMP - - both OSUOSL"},{"location":"developers/","title":"Developer manual","text":""},{"location":"developers/#source-code-layout","title":"Source code layout","text":"OpenBLAS/ \n\u251c\u2500\u2500 benchmark Benchmark codes for BLAS\n\u251c\u2500\u2500 cmake 
CMakefiles\n\u251c\u2500\u2500 ctest Test codes for CBLAS interfaces\n\u251c\u2500\u2500 driver Implemented in C\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 level2\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 level3\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 mapper\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 others Memory management, threading, etc\n\u251c\u2500\u2500 exports Generate shared library\n\u251c\u2500\u2500 interface Implement BLAS and CBLAS interfaces (calling driver or kernel)\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 lapack\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 netlib\n\u251c\u2500\u2500 kernel Optimized assembly kernels for CPU architectures\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 alpha Original GotoBLAS kernels for DEC Alpha\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 arm ARMV5,V6,V7 kernels (including generic C codes used by other architectures)\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 arm64 ARMV8\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 generic General kernel codes written in plain C, parts used by many architectures.\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 ia64 Original GotoBLAS kernels for Intel Itanium\n\u2502 \u251c\u2500\u2500 mips\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 mips64\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 power\n| \u251c\u2500\u2500 riscv64\n| \u251c\u2500\u2500 simd Common code for Universal Intrinsics, used by some x86_64 and arm64 kernels\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 sparc\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 x86\n\u2502 \u251c\u2500\u2500 x86_64\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 zarch \n\u251c\u2500\u2500 lapack Optimized LAPACK codes (replacing those in regular LAPACK)\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 getf2\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 getrf\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 getrs\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 laswp\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 lauu2\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 lauum\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 potf2\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 potrf\n\u2502\u00a0\u00a0 
\u251c\u2500\u2500 trti2\n\u2502 \u251c\u2500\u2500 trtri\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 trtrs\n\u251c\u2500\u2500 lapack-netlib LAPACK codes from netlib reference implementation\n\u251c\u2500\u2500 reference BLAS Fortran reference implementation (unused)\n\u251c\u2500\u2500 relapack Elmar Peise's recursive LAPACK (implemented on top of regular LAPACK)\n\u251c\u2500\u2500 test Test codes for BLAS\n\u2514\u2500\u2500 utest Regression test\n
A call tree for dgemm
looks as follows:
interface/gemm.c\n \u2502\ndriver/level3/level3.c\n \u2502\ngemm assembly kernels at kernel/\n
To find the kernel currently used for a particular supported CPU, please check the corresponding kernel/$(ARCH)/KERNEL.$(CPU)
file.
Here is an example for kernel/x86_64/KERNEL.HASWELL
:
...\nDTRMMKERNEL = dtrmm_kernel_4x8_haswell.c\nDGEMMKERNEL = dgemm_kernel_4x8_haswell.S\n...\n
According to the above KERNEL.HASWELL
, the OpenBLAS dgemm kernel file for Haswell is dgemm_kernel_4x8_haswell.S
."},{"location":"developers/#optimizing-gemm-for-a-given-hardware","title":"Optimizing GEMM for a given hardware","text":"Read the Goto paper to understand the algorithm:
Goto, Kazushige; van de Geijn, Robert A. (2008). \"Anatomy of High-Performance Matrix Multiplication\". ACM Transactions on Mathematical Software 34 (3): Article 12
(The above link is available only to ACM members, but this and many related papers are also available on the pages of van de Geijn's FLAME project.)
The file driver/level3/level3.c
contains the implementation of Goto's algorithm. As a starting point, you can look at kernel/generic/gemmkernel_2x2.c
, which is a naive 2x2
register blocking gemm
kernel in C. Then:
- Write optimized assembly kernels, taking into account the instruction pipeline, the available registers, and memory/cache access patterns.
- Tune cache block sizes (
Mc
, Kc
, and Nc
)
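To see the register-blocking idea in plain C, here is a minimal sketch in the spirit of kernel/generic/gemmkernel_2x2.c - an illustration, not the actual OpenBLAS source, and the packing layout shown is an assumed convention:

```c
#include <stddef.h>

/* Illustrative 2x2 register-blocked inner kernel: C[0:2,0:2] += A_panel * B_panel.
   Here a holds a packed panel of 2 rows x k columns (stored column by column),
   b holds k rows x 2 columns (stored row by row), and c is column-major with
   leading dimension ldc. The four accumulators are local scalars so the
   compiler can keep them in registers for the whole k-loop. */
void gemm_2x2(size_t k, const double *a, const double *b, double *c, size_t ldc)
{
    double c00 = 0.0, c01 = 0.0, c10 = 0.0, c11 = 0.0;
    for (size_t p = 0; p < k; p++) {
        double a0 = a[2 * p], a1 = a[2 * p + 1]; /* column p of the A panel */
        double b0 = b[2 * p], b1 = b[2 * p + 1]; /* row p of the B panel */
        c00 += a0 * b0; c01 += a0 * b1;
        c10 += a1 * b0; c11 += a1 * b1;
    }
    c[0] += c00; c[1] += c10;           /* first column of the C tile */
    c[ldc] += c01; c[ldc + 1] += c11;   /* second column of the C tile */
}
```

An optimized assembly kernel follows the same accumulation pattern with a larger tile (e.g. 4x8 on Haswell), vector registers, and prefetching.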
Note that not all of the CPU-specific parameters in param.h
are actively used in algorithms. DNUMOPT
only appears as a scale factor in profiling output of the level3 syrk
interface code, while its counterpart SNUMOPT
(aliased as NUMOPT
in common.h
) is not used anywhere at all.
SYMV_P
is only used in the generic kernels for the symv
and chemv
/zhemv
functions - at least some of those are usually overridden by CPU-specific implementations, so if you start by cloning the existing implementation for a related CPU you need to check its KERNEL
file to see if tuning SYMV_P
would have any effect at all.
GEMV_UNROLL
is only used by some older x86-64 kernels, so not all sections in param.h
define it. Similarly, not all of the CPU parameters like L2 or L3 cache sizes are necessarily used in current kernels for a given model - by all indications the CPU identification code was imported from some other project originally.
"},{"location":"developers/#running-openblas-tests","title":"Running OpenBLAS tests","text":"We use tests for Netlib BLAS, CBLAS, and LAPACK. In addition, we use OpenBLAS-specific regression tests. They can be run with Make:
- make -C test for BLAS tests
- make -C ctest for CBLAS tests
- make -C utest for OpenBLAS regression tests
- make lapack-test for LAPACK tests
We also use the BLAS-Tester tests for regression testing. It is basically the ATLAS test suite adapted for building with OpenBLAS.
The project makes use of several Continuous Integration (CI) services conveniently interfaced with GitHub to automatically run tests on a number of platforms and build configurations.
Also note that the test suites included with \"numerically heavy\" projects like Julia, NumPy, SciPy, Octave or QuantumEspresso can be used for regression testing, when those projects are built such that they use OpenBLAS.
"},{"location":"developers/#benchmarking","title":"Benchmarking","text":"A number of benchmarking methods are used by OpenBLAS:
- Several simple C benchmarks for performance testing individual BLAS functions are available in the
benchmark
folder. They can be run locally through the Makefile
in that directory. The benchmark/scripts
subdirectory contains similar benchmarks that use OpenBLAS via NumPy, SciPy, Octave and R. - On pull requests, a representative set of functions is tested for performance regressions with Codspeed; results can be viewed at https://codspeed.io/OpenMathLib/OpenBLAS.
- The OpenMathLib/BLAS-Benchmarks repository contains an Airspeed Velocity-based benchmark suite which is run on several CPU architectures in cron jobs. Results are published to a dashboard: http://www.openmathlib.org/BLAS-Benchmarks/.
Benchmarking code for BLAS libraries, and specific performance analysis results, can be found in a number of places. For example:
- MatlabJuliaMatrixOperationsBenchmark (various matrix operations in Julia and Matlab)
- mmperf/mmperf (single-core matrix multiplication)
"},{"location":"developers/#adding-autodetection-support-for-a-new-revision-or-variant-of-a-supported-cpu","title":"Adding autodetection support for a new revision or variant of a supported CPU","text":"Especially relevant for x86-64, a new CPU model may be a \"refresh\" (die shrink and/or different number of cores) within an existing model family without significant changes to its instruction set (e.g., Intel Skylake and Kaby Lake are still fundamentally the same architecture as Haswell, and low-end Goldmont and similar are treated as Nehalem). In this case, compilation with the appropriate older TARGET
will already lead to a satisfactory build.
To achieve autodetection of the new model, its CPUID (or an equivalent identifier) needs to be added in the cpuid_<architecture>.c
relevant for its general architecture, with the returned name for the new type set appropriately. For x86, which has the most complex cpuid
file, there are two functions that need to be edited: get_cpuname()
to return, e.g., CPUTYPE_HASWELL
and get_corename()
for the (broader) core family returning, e.g., CORE_HASWELL
.1
For architectures where DYNAMIC_ARCH
builds are supported, a similar but simpler code section for the corresponding runtime detection of the CPU exists in driver/others/dynamic.c
(for x86), and driver/others/dynamic_<arch>.c
for other architectures. Note that for x86 the CPUID is compared after splitting it into its family, extended family, model and extended model parts, so the single decimal number returned by Linux in /proc/cpuinfo
for the model has to be converted back to hexadecimal before splitting into its constituent digits. For example, 142 == 8E
translates to extended model 8, model 14.
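The decimal-to-hexadecimal split can be illustrated with a couple of hypothetical helper functions (OpenBLAS itself reads these fields directly from CPUID):

```c
/* Illustrative helpers, not part of OpenBLAS: split the decimal model number
   reported by /proc/cpuinfo into the hexadecimal model and extended-model
   digits that the cpuid-based detection code compares against. */
static int cpuid_model(int proc_cpuinfo_model)
{
    return proc_cpuinfo_model & 0xF;          /* low hex digit */
}

static int cpuid_extended_model(int proc_cpuinfo_model)
{
    return (proc_cpuinfo_model >> 4) & 0xF;   /* next hex digit */
}
```

For the example above, 142 (0x8E) yields extended model 8 and model 14.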
"},{"location":"developers/#adding-dedicated-support-for-a-new-cpu-model","title":"Adding dedicated support for a new CPU model","text":"Usually it will be possible to start from an existing model, clone its KERNEL
configuration file to the new name to use for this TARGET
and eventually replace individual kernels with versions better suited for peculiarities of the new CPU model. In addition, it is necessary to add (or clone at first) the corresponding section of GEMM_UNROLL
parameters in the top-level param.h
, and possibly to add definitions such as USE_TRMM
(governing whether TRMM
functions use the respective GEMM
kernel or a separate source file) to the Makefile
s (and CMakeLists.txt
) in the kernel directory. The new CPU name needs to be added to TargetList.txt
, and the CPU auto-detection code used by the getarch
helper program - contained in the cpuid_<architecture>.c
file - amended to include the CPUID (or equivalent) information processing required (see the preceding section).
"},{"location":"developers/#adding-support-for-an-entirely-new-architecture","title":"Adding support for an entirely new architecture","text":"This endeavour is best started by cloning the entire support structure for 32-bit ARM, and within that the ARMv5 CPU in particular, as this is implemented through plain C kernels only. An example providing a convenient \"shopping list\" can be seen in pull request #1526.
-
This information ends up in the Makefile.conf
and config.h
files generated by getarch
. Failure to set either will typically lead to a missing definition of the GEMM_UNROLL
parameters later in the build, as getarch_2nd
will be unable to find a matching parameter section in param.h
.\u00a0\u21a9
"},{"location":"distributing/","title":"Redistributing OpenBLAS","text":"Note
This document contains recommendations only - packagers and other redistributors are in charge of how OpenBLAS is built and distributed in their systems, and may have good reasons to deviate from the guidance given on this page. These recommendations are aimed at general packaging systems - ones with a user base that is typically large, that is open source (or at least freely available), that doesn't behave uniformly, and that the packager isn't directly connected with.
OpenBLAS has a large number of build-time options which can be used to change how it behaves at runtime, how artifacts or symbols are named, etc. Variation in build configuration can be necessary to achieve a given end goal within a distribution or as an end user. However, such variation can also make it more difficult to build on top of OpenBLAS and ship code or other packages in a way that works across many different distros. Here we provide guidance about the most important build options, what effects they may have when changed, and which ones to default to.
The Make and CMake build systems provide equivalent options and yield more or less the same artifacts, but not exactly (the CMake builds are still experimental). You can choose either one and the options will function in the same way; however, the CMake outputs may require some renaming. To review available build options, see Makefile.rule
or CMakeLists.txt
in the root of the repository.
Build options typically fall into two categories: (a) options that affect the user interface, such as library and symbol names or APIs that are made available, and (b) options that affect performance and runtime behavior, such as threading behavior or CPU architecture-specific code paths. The user interface options are more important to keep aligned between distributions, while for the performance-related options there are typically more reasons to make choices that deviate from the defaults.
Here are recommendations for user interface related packaging choices where it is not likely to be a good idea to deviate (typically these are the default settings):
- Include CBLAS. The CBLAS interface is widely used and it doesn't affect binary size much, so don't turn it off.
- Include LAPACK and LAPACKE. The LAPACK interface is also widely used, and while it does make up a significant part of the binary size of the installed library, that does not outweigh the regression in usability when deviating from the default here.1
- Always distribute the pkg-config (
.pc
) and CMake (.cmake
) dependency detection files. These files are used by build systems when users want to link against OpenBLAS, and there is no benefit to leaving them out. - Provide the LP64 interface by default, and if you choose to provide an ILP64 interface build in addition, use a symbol suffix to avoid symbol name clashes (see the next section).
"},{"location":"distributing/#ilp64-interface-builds","title":"ILP64 interface builds","text":"The LP64 (32-bit integer) interface is the default build, and has well-established C and Fortran APIs as determined by the reference (Netlib) BLAS and LAPACK libraries. The ILP64 (64-bit integer) interface however does not have a standard API: symbol names and shared/static library names can be produced in multiple ways, and this tends to make it difficult to use. As of today there is an agreed-upon way of choosing names for OpenBLAS between a number of key users/redistributors, which is the closest thing to a standard that there is now. However, there is an ongoing standardization effort in the reference BLAS and LAPACK libraries, which differs from the current OpenBLAS agreed-upon convention. In this section we'll aim to explain both.
Those two methods are fairly similar, and have a key thing in common: using a symbol suffix. This is good practice; if you distribute an ILP64 build, it is recommended to have it use a symbol suffix containing 64
in the name. This avoids potential symbol clashes when different packages which depend on OpenBLAS load both an LP64 and an ILP64 library into memory at the same time.
"},{"location":"distributing/#the-current-openblas-agreed-upon-ilp64-convention","title":"The current OpenBLAS agreed-upon ILP64 convention","text":"This convention comprises the shared library name and the symbol suffix in the shared library. The symbol suffix to use is 64_
, implying that the library name will be libopenblas64_.so
and the symbols in that library end in 64_
. The central issue where this was discussed is openblas#646, and adopters include Fedora, Julia, NumPy and SciPy - SuiteSparse already used it as well.
To build shared and static libraries with the currently recommended ILP64 conventions with Make:
$ make INTERFACE64=1 SYMBOLSUFFIX=64_\n
This will produce libraries named libopenblas64_.so|a
, a pkg-config file named openblas64.pc
, and CMake and header files.
Installing locally and inspecting the output will show a few more details:
$ make install PREFIX=$PWD/../openblas/make64 INTERFACE64=1 SYMBOLSUFFIX=64_\n$ tree . # output slightly edited down\n.\n\u251c\u2500\u2500 include\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 cblas.h\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 f77blas.h\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 lapacke_config.h\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 lapacke.h\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 lapacke_mangling.h\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 lapacke_utils.h\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 lapack.h\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 openblas_config.h\n\u2514\u2500\u2500 lib\n \u251c\u2500\u2500 cmake\n \u2502\u00a0\u00a0 \u2514\u2500\u2500 openblas\n \u2502\u00a0\u00a0 \u251c\u2500\u2500 OpenBLASConfig.cmake\n \u2502\u00a0\u00a0 \u2514\u2500\u2500 OpenBLASConfigVersion.cmake\n \u251c\u2500\u2500 libopenblas64_.a\n \u251c\u2500\u2500 libopenblas64_.so\n \u2514\u2500\u2500 pkgconfig\n \u2514\u2500\u2500 openblas64.pc\n
A key point is the symbol names. Each is composed of the LP64 symbol name, then (for Fortran only) the compiler mangling, and then the 64_
symbol suffix. Hence to obtain the final symbol names, we need to take into account which Fortran compiler we are using. For the most common cases (e.g., gfortran, Intel Fortran, or Flang), that means appending a single underscore. In that case, the result is:
base API name binary symbol name call from Fortran code call from C code dgemm
dgemm_64_
dgemm_64(...)
dgemm_64_(...)
cblas_dgemm
cblas_dgemm64_
n/a cblas_dgemm64_(...)
It is quite useful to have these symbol names be as uniform as possible across different packaging systems.
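The composition rule behind the table above can be sketched in C. This helper is purely illustrative (OpenBLAS applies the suffix with objcopy at build time, not with code like this): Fortran symbols receive the compiler mangling first (a trailing underscore for gfortran, Intel Fortran, or Flang), then the 64_ suffix, while C-level CBLAS symbols get the suffix appended directly.

```c
#include <stdio.h>

/* Illustrative helper: compose the binary symbol name under the current
   OpenBLAS ILP64 convention ("64_" suffix applied after Fortran mangling). */
static void ilp64_symbol(const char *base, int is_fortran, char *out, size_t outlen)
{
    /* Fortran mangling (one trailing underscore) first, then the suffix */
    snprintf(out, outlen, "%s%s64_", base, is_fortran ? "_" : "");
}
```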
The equivalent build options with CMake are:
$ mkdir build && cd build\n$ cmake .. -DINTERFACE64=1 -DSYMBOLSUFFIX=64_ -DBUILD_SHARED_LIBS=ON -DBUILD_STATIC_LIBS=ON\n$ cmake --build . -j\n
Note that the result is not 100% identical to the Make result. For example, the library name ends in _64
rather than 64_
- it is recommended to rename them to match the Make library names (also update the libsuffix
entry in openblas64.pc
to match that rename).
$ cmake --install . --prefix $PWD/../../openblas/cmake64\n$ tree .\n.\n\u251c\u2500\u2500 include\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 openblas64\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 cblas.h\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 f77blas.h\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 lapacke_config.h\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 lapacke_example_aux.h\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 lapacke.h\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 lapacke_mangling.h\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 lapacke_utils.h\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 lapack.h\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 openblas64\n\u2502\u00a0\u00a0 \u2502\u00a0\u00a0 \u2514\u2500\u2500 lapacke_mangling.h\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 openblas_config.h\n\u2514\u2500\u2500 lib\n \u251c\u2500\u2500 cmake\n \u2502\u00a0\u00a0 \u2514\u2500\u2500 OpenBLAS64\n \u2502\u00a0\u00a0 \u251c\u2500\u2500 OpenBLAS64Config.cmake\n \u2502\u00a0\u00a0 \u251c\u2500\u2500 OpenBLAS64ConfigVersion.cmake\n \u2502\u00a0\u00a0 \u251c\u2500\u2500 OpenBLAS64Targets.cmake\n \u2502\u00a0\u00a0 \u2514\u2500\u2500 OpenBLAS64Targets-noconfig.cmake\n \u251c\u2500\u2500 libopenblas_64.a\n \u251c\u2500\u2500 libopenblas_64.so -> libopenblas_64.so.0\n \u2514\u2500\u2500 pkgconfig\n \u2514\u2500\u2500 openblas64.pc\n
"},{"location":"distributing/#the-upcoming-standardized-ilp64-convention","title":"The upcoming standardized ILP64 convention","text":"While the 64_
convention above got some adoption, it's slightly hacky and is implemented through the use of objcopy
. An effort is ongoing for a more broadly adopted convention in the reference BLAS and LAPACK libraries, using (a) the _64
suffix, and (b) applying that suffix before rather than after Fortran compiler mangling. The central issue for this is lapack#666.
For the most common cases of compiler mangling (a single _
appended), the end result will be:
base API name binary symbol name call from Fortran code call from C code dgemm
dgemm_64_
dgemm_64(...)
dgemm_64_(...)
cblas_dgemm
cblas_dgemm_64
n/a cblas_dgemm_64(...)
For other compiler mangling schemes, replace the trailing _
by the scheme in use.
The shared library name for this _64
convention should be libopenblas_64.so
.
Note: it is not yet possible to produce an OpenBLAS build which employs this convention! Once reference BLAS and LAPACK with support for _64
have been released, a future OpenBLAS release will support it. For now, please use the older 64_
scheme and avoid using the name libopenblas_64.so
; it should be considered reserved for future use of the _64
standard as prescribed by reference BLAS/LAPACK.
"},{"location":"distributing/#performance-and-runtime-behavior-related-build-options","title":"Performance and runtime behavior related build options","text":"For these options there are multiple reasonable or common choices.
"},{"location":"distributing/#threading-related-options","title":"Threading related options","text":"OpenBLAS can be built as a multi-threaded or single-threaded library, with the default being multi-threaded. It's expected that the default libopenblas
library is multi-threaded; if you'd like to also distribute single-threaded builds, consider naming them libopenblas_sequential
.
OpenBLAS can be built with pthreads or OpenMP as the threading model, with the default being pthreads. Both options are commonly used, and the choice here should not influence the shared library name. The choice will be captured by the .pc
file. E.g.,:
$ pkg-config --libs openblas\n-fopenmp -lopenblas\n\n$ cat openblas.pc\n...\nopenblas_config= ... USE_OPENMP=0 MAX_THREADS=24\n
The maximum number of threads users will be able to use is determined at build time by the NUM_THREADS
build option. It defaults to 24, and there's a wide range of values that are reasonable to use (up to 256). 64 is a typical choice here; there is a memory footprint penalty that is linear in NUM_THREADS
. Please see Makefile.rule
for more details.
"},{"location":"distributing/#cpu-architecture-related-options","title":"CPU architecture related options","text":"OpenBLAS contains a lot of CPU architecture-specific optimizations, hence when distributing to a user base with a variety of hardware, it is recommended to enable CPU architecture runtime detection. This will dynamically select optimized kernels for individual APIs. To do this, use the DYNAMIC_ARCH=1
build option. This is usually done on all common CPU families, except when there are known issues.
In case the CPU architecture is known (e.g. you're building binaries for macOS M1 users), it is possible to specify the target architecture directly with the TARGET=
build option.
DYNAMIC_ARCH
and TARGET
are covered in more detail in the main README.md
in this repository.
"},{"location":"distributing/#real-world-examples","title":"Real-world examples","text":"OpenBLAS is likely to be distributed in one of these distribution models:
- As a standalone package, or multiple packages, in a packaging ecosystem like a Linux distro, Homebrew, conda-forge or MSYS2.
- Vendored as part of a larger package, e.g. in Julia, NumPy, SciPy, or R.
- Locally, e.g. making available as a build on a single HPC cluster.
The guidance on this page is most important for models (1) and (2). These links to build recipes for a representative selection of packaging systems may be helpful as a reference:
- Fedora
- Debian
- Homebrew
- MSYS2
- conda-forge
- NumPy/SciPy
- Nixpkgs
-
All major distributions do include LAPACK as of mid 2023 as far as we know. Older versions of Arch Linux did not, and that was known to cause problems.\u00a0\u21a9
"},{"location":"extensions/","title":"Extensions","text":"OpenBLAS for the most part contains implementations of the reference (Netlib) BLAS, CBLAS, LAPACK and LAPACKE interfaces. A few OpenBLAS-specific functions are also provided however, which mostly can be seen as \"BLAS extensions\". This page documents those non-standard APIs.
"},{"location":"extensions/#blas-like-extensions","title":"BLAS-like extensions","text":"Routine Data Types Description ?axpby s,d,c,z like axpy with a multiplier for y ?gemm3m c,z gemm3m ?imatcopy s,d,c,z in-place transposition/copying ?omatcopy s,d,c,z out-of-place transposition/copying ?geadd s,d,c,z matrix add ?gemmt s,d,c,z gemm but only a triangular part updated"},{"location":"extensions/#bfloat16-functionality","title":"bfloat16 functionality","text":"BLAS-like and conversion functions for bfloat16
(available when OpenBLAS was compiled with BUILD_BFLOAT16=1
):
void cblas_sbstobf16
converts a float array to an array of bfloat16 values by rounding void cblas_sbdtobf16
converts a double array to an array of bfloat16 values by rounding void cblas_sbf16tos
converts a bfloat16 array to an array of floats void cblas_dbf16tod
converts a bfloat16 array to an array of doubles float cblas_sbdot
computes the dot product of two bfloat16 arrays void cblas_sbgemv
performs the matrix-vector operations of GEMV with the input matrix and X vector as bfloat16 void cblas_sbgemm
performs the matrix-matrix operations of GEMM with both input arrays containing bfloat16
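To make "by rounding" concrete, here is a plain-C sketch of the float/bfloat16 conversion concept (round-to-nearest-even on the discarded low 16 bits); this illustrates the format, it is not the OpenBLAS cblas_sbstobf16 source:

```c
#include <stdint.h>
#include <string.h>

/* Illustrative conversions between float and bfloat16 (stored as uint16_t).
   bfloat16 keeps the sign and full 8-bit exponent of an IEEE single but only
   the top 7 mantissa bits, so converting drops the low 16 bits. */
static uint16_t f32_to_bf16(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    /* round to nearest, ties to even, on the discarded low 16 bits */
    bits += 0x7FFFu + ((bits >> 16) & 1u);
    return (uint16_t)(bits >> 16);
}

static float bf16_to_f32(uint16_t h)
{
    uint32_t bits = (uint32_t)h << 16; /* low mantissa bits become zero */
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```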
"},{"location":"extensions/#utility-functions","title":"Utility functions","text":" openblas_get_num_threads
openblas_set_num_threads
int openblas_get_num_procs(void)
returns the number of processors available on the system (may include \"hyperthreading cores\") int openblas_get_parallel(void)
returns 0 for sequential use, 1 for platform-based threading and 2 for OpenMP-based threading char * openblas_get_config()
returns the options OpenBLAS was built with, something like NO_LAPACKE DYNAMIC_ARCH NO_AFFINITY Haswell
int openblas_set_affinity(int thread_index, size_t cpusetsize, cpu_set_t *cpuset)
sets the CPU affinity mask of the given thread to the provided cpuset. Only available on Linux, with semantics identical to pthread_setaffinity_np
.
"},{"location":"faq/","title":"FAQ","text":" - General questions
- What is BLAS? Why is it important?
- What functions are there and how can I call them from my C code?
- What is OpenBLAS? Why did you create this project?
- What's the difference between OpenBLAS and GotoBLAS?
- Where do parameters GEMM_P, GEMM_Q, GEMM_R come from?
- How can I report a bug?
- How to reference OpenBLAS.
- How can I use OpenBLAS in multi-threaded applications?
- Does OpenBLAS support sparse matrices and/or vectors ?
- What support is there for recent PC hardware ? What about GPU ?
- How about the level 3 BLAS performance on Intel Sandy Bridge?
- OS and Compiler
- How can I call an OpenBLAS function in Microsoft Visual Studio?
- How can I use CBLAS and LAPACKE without C99 complex number support (e.g. in Visual Studio)?
- I get a SEGFAULT with multi-threading on Linux. What's wrong?
- When I make the library, there is no such instruction: `xgetbv' error. What's wrong?
- My build fails due to the linker error \"multiple definition of `dlamc3_'\". What is the problem?
- My build worked fine and passed all tests, but running make lapack-test ends with segfaults
- How could I disable OpenBLAS threading affinity on runtime?
- How to solve undefined reference errors when statically linking against libopenblas.a
- Building OpenBLAS for Haswell or Dynamic Arch on RHEL-6, CentOS-6, Rocks-6.1,Scientific Linux 6
- Building OpenBLAS in QEMU/KVM/XEN
- Building OpenBLAS on POWER fails with IBM XL
- Replacing system BLAS/updating APT OpenBLAS in Mint/Ubuntu/Debian
- I built OpenBLAS for use with some other software, but that software cannot find it
- I included cblas.h in my program, but the compiler complains about a missing common.h or functions from it
- Compiling OpenBLAS with gcc's -fbounds-check actually triggers aborts in programs
- Build fails with lots of errors about undefined ?GEMM_UNROLL_M
- CMAKE/OSX: Build fails with 'argument list too long'
- Likely problems with AVX2 support in Docker Desktop for OSX
- Usage
- Program is Terminated. Because you tried to allocate too many memory regions
- How to choose TARGET manually at runtime when compiled with DYNAMIC_ARCH
- After updating the installed OpenBLAS, a program complains about \"undefined symbol gotoblas\"
- How can I find out at runtime what options the library was built with ?
- After making OpenBLAS, I find that the static library is multithreaded, but the dynamic one is not ?
- I want to use OpenBLAS with CUDA in the HPL 2.3 benchmark code but it keeps looking for Intel MKL
- Multithreaded OpenBLAS runs no faster or is even slower than singlethreaded on my ARMV7 board
- Speed varies wildly between individual runs on a typical ARMV8 smartphone processor
- I cannot get OpenBLAS to use more than a small subset of available cores on a big system
- Getting \"ELF load command address/offset not properly aligned\" when loading libopenblas.so
- Using OpenBLAS with OpenMP
"},{"location":"faq/#general-questions","title":"General questions","text":""},{"location":"faq/#what-is-blas-why-is-it-important","title":"What is BLAS? Why is it important?","text":"BLAS stands for Basic Linear Algebra Subprograms. BLAS provides standard interfaces for linear algebra, including BLAS1 (vector-vector operations), BLAS2 (matrix-vector operations), and BLAS3 (matrix-matrix operations). In general, BLAS is the computational kernel (\"the bottom of the food chain\") in linear algebra or scientific applications. Thus, if BLAS implementation is highly optimized, the whole application can get substantial benefit.
"},{"location":"faq/#what-functions-are-there-and-how-can-i-call-them-from-my-c-code","title":"What functions are there and how can I call them from my C code?","text":"As BLAS is a standardized interface, you can refer to the documentation of its reference implementation at netlib.org. Calls from C go through its CBLAS interface, so your code will need to include the provided cblas.h in addition to linking with -lopenblas. A single-precision matrix multiplication will look like
#include <cblas.h>\n...\ncblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, M, N, K, 1.0, A, K, B, N, 0.0, result, N);\n
where M,N,K are the dimensions of your data - see https://petewarden.files.wordpress.com/2015/04/gemm_corrected.png (This image is part of an article on GEMM in the context of deep learning that is well worth reading in full - https://petewarden.com/2015/04/20/why-gemm-is-at-the-heart-of-deep-learning/)"},{"location":"faq/#what-is-openblas-why-did-you-create-this-project","title":"What is OpenBLAS? Why did you create this project?","text":"OpenBLAS is an open source BLAS library forked from the GotoBLAS2-1.13 BSD version. Since Mr. Kazushige Goto left TACC, GotoBLAS is no longer being maintained. Thus, we created this project to continue developing OpenBLAS/GotoBLAS.
"},{"location":"faq/#whats-the-difference-between-openblas-and-gotoblas","title":"What's the difference between OpenBLAS and GotoBLAS?","text":"In OpenBLAS 0.2.0, we optimized level 3 BLAS on the Intel Sandy Bridge 64-bit OS. We obtained a performance comparable with that Intel MKL.
We optimized level 3 BLAS performance on the ICT Loongson-3A CPU. It outperformed GotoBLAS by 135% in a single thread and 120% in 4 threads.
We fixed some GotoBLAS bugs including a SEGFAULT bug on the new Linux kernel, MingW32/64 bugs, and a ztrmm computing error bug on Intel Nehalem.
We also added some minor features, e.g. supporting \"make install\", compiling without LAPACK and upgrading the LAPACK version to 3.4.2.
You can find the full list of modifications in Changelog.txt.
"},{"location":"faq/#where-do-parameters-gemm_p-gemm_q-gemm_r-come-from","title":"Where do parameters GEMM_P, GEMM_Q, GEMM_R come from?","text":"The detailed explanation is probably in the original publication authored by Kazushige Goto - Goto, Kazushige; van de Geijn, Robert A; Anatomy of high-performance matrix multiplication. ACM Transactions on Mathematical Software (TOMS). Volume 34 Issue 3, May 2008 While this article is paywalled and too old for preprints to be available on arxiv.org, more recent publications like https://arxiv.org/pdf/1609.00076 contain at least a brief description of the algorithm. In practice, the values are derived by experimentation to yield the block sizes that give the highest performance. A general rule of thumb for selecting a starting point seems to be that PxQ is about half the size of L2 cache.
"},{"location":"faq/#how-can-i-report-a-bug","title":"How can I report a bug?","text":"Please file an issue at this issue page or send mail to the OpenBLAS mailing list.
Please provide the following information: CPU, OS, compiler, and OpenBLAS compiling flags (Makefile.rule). In addition, please describe how to reproduce this bug.
"},{"location":"faq/#how-to-reference-openblas","title":"How to reference OpenBLAS.","text":"You can reference our papers in this page. Alternatively, you can cite the OpenBLAS homepage http://www.openblas.net.
"},{"location":"faq/#how-can-i-use-openblas-in-multi-threaded-applications","title":"How can I use OpenBLAS in multi-threaded applications?","text":"If your application is already multi-threaded, it will conflict with OpenBLAS multi-threading. Thus, you must set OpenBLAS to use a single thread, as follows:
- export OPENBLAS_NUM_THREADS=1 in the environment variables. Or
- Call openblas_set_num_threads(1) in the application at runtime. Or
- Build the single-threaded version of OpenBLAS, e.g. make USE_THREAD=0 USE_LOCKING=1 (see comment below)
If the application is parallelized by OpenMP, please build OpenBLAS with USE_OPENMP=1
With the increased availability of fast multicore hardware, it has unfortunately become clear that the thread management provided by OpenMP is not sufficient to prevent race conditions when OpenBLAS was built single-threaded with USE_THREAD=0 and there are concurrent calls from multiple threads to OpenBLAS functions. In that case, it is vital to also specify USE_LOCKING=1 (introduced with OpenBLAS 0.3.7).
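As a concrete illustration of the first option above, the environment variable can also be set for a single invocation only (here `./my_application` is a placeholder for your own multi-threaded program):

```shell
# Pin OpenBLAS to a single thread for this run only;
# ./my_application is a placeholder for your own multi-threaded program.
OPENBLAS_NUM_THREADS=1 ./my_application
```

This avoids changing the setting for the whole shell session.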
"},{"location":"faq/#does-openblas-support-sparse-matrices-andor-vectors","title":"Does OpenBLAS support sparse matrices and/or vectors ?","text":"OpenBLAS implements only the standard (dense) BLAS and LAPACK functions, with a select few extensions popularized by Intel's MKL. Some cases can probably be made to work using e.g. GEMV or AXPBY; in general, using a dedicated package like SuiteSparse (which can make use of OpenBLAS or an equivalent for standard operations) is recommended.
"},{"location":"faq/#what-support-is-there-for-recent-pc-hardware-what-about-gpu","title":"What support is there for recent PC hardware ? What about GPU ?","text":"As OpenBLAS is a volunteer project, it can take some time for the combination of a capable developer, free time, and particular hardware to come along, even for relatively common processors. Starting from 0.3.1, support is being added for AVX-512 (TARGET=SKYLAKEX), requiring a compiler that is capable of handling AVX-512 intrinsics. While AMD Zen processors should be autodetected by the build system, as of 0.3.2 they are still handled exactly like Intel Haswell. There once was an effort to build an OpenCL implementation that one can still find at https://github.com/xianyi/clOpenBLAS, but work on this stopped in 2015.
"},{"location":"faq/#how-about-the-level-3-blas-performance-on-intel-sandy-bridge","title":"How about the level 3 BLAS performance on Intel Sandy Bridge?","text":"We obtained performance comparable with Intel MKL, and actually outperformed it in some cases. Here is the result of the DGEMM subroutine's performance on Intel Core i5-2500K, Windows 7 SP1 64-bit:
"},{"location":"faq/#os-and-compiler","title":"OS and Compiler","text":""},{"location":"faq/#how-can-i-call-an-openblas-function-in-microsoft-visual-studio","title":"How can I call an OpenBLAS function in Microsoft Visual Studio?","text":"Please read this page.
"},{"location":"faq/#how-can-i-use-cblas-and-lapacke-without-c99-complex-number-support-eg-in-visual-studio","title":"How can I use CBLAS and LAPACKE without C99 complex number support (e.g. in Visual Studio)?","text":"Zaheer has fixed this bug. You can now use the structure instead of C99 complex numbers. Please read this issue page for details.
This issue is for using LAPACKE in Visual Studio.
"},{"location":"faq/#i-get-a-segfault-with-multi-threading-on-linux-whats-wrong","title":"I get a SEGFAULT with multi-threading on Linux. What's wrong?","text":"This may be related to a bug in the Linux kernel 2.6.32. Try applying the patch segfaults.patch to disable mbind, using
patch < segfaults.patch\n
and see if the crashes persist. Note that this patch will lead to many compiler warnings.
"},{"location":"faq/#when-i-make-the-library-there-is-no-such-instruction-xgetbv-error-whats-wrong","title":"When I make the library, there is no such instruction: `xgetbv' error. What's wrong?","text":"Please use GCC 4.4 or later, which supports the xgetbv instruction. If you build the library for Sandy Bridge with AVX instructions, you should use GCC 4.6 or later.
On Mac OS X, please use Clang 3.1 or later; for example, make CC=clang
For compatibility with old compilers (GCC < 4.4), you can enable the NO_AVX flag; for example, make NO_AVX=1
"},{"location":"faq/#my-build-fails-due-to-the-linker-error-multiple-definition-of-dlamc3_-what-is-the-problem","title":"My build fails due to the linker error \"multiple definition of `dlamc3_'\". What is the problem?","text":"This linker error occurs if GNU patch is missing or if our patch for LAPACK fails to apply.
Background: OpenBLAS implements optimized versions of some LAPACK functions, so we need to disable the reference versions. If this process fails, we end up with duplicated implementations of the same function.
"},{"location":"faq/#my-build-worked-fine-and-passed-all-tests-but-running-make-lapack-test-ends-with-segfaults","title":"My build worked fine and passed all tests, but running make lapack-test
ends with segfaults","text":"Some of the LAPACK tests, notably in xeigtstz, try to allocate around 10MB on the stack. You may need to use ulimit -s
to change the default limits on your system to allow this.
"},{"location":"faq/#how-could-i-disable-openblas-threading-affinity-on-runtime","title":"How can I disable OpenBLAS threading affinity at runtime?","text":"You can define the OPENBLAS_MAIN_FREE or GOTOBLAS_MAIN_FREE environment variable to disable threading affinity at runtime. For example, before running:
export OPENBLAS_MAIN_FREE=1\n
Alternatively, you can disable the affinity feature by enabling NO_AFFINITY=1 in Makefile.rule.
"},{"location":"faq/#how-to-solve-undefined-reference-errors-when-statically-linking-against-libopenblasa","title":"How to solve undefined reference errors when statically linking against libopenblas.a","text":"On Linux, if OpenBLAS was compiled with threading support (USE_THREAD=1
by default), custom programs statically linked against libopenblas.a
should also link to the pthread library e.g.:
gcc -static -I/opt/OpenBLAS/include -L/opt/OpenBLAS/lib -o my_program my_program.c -lopenblas -lpthread\n
Failing to add the -lpthread
flag will cause errors such as:
/opt/OpenBLAS/libopenblas.a(memory.o): In function `_touch_memory':\nmemory.c:(.text+0x15): undefined reference to `pthread_mutex_lock'\nmemory.c:(.text+0x41): undefined reference to `pthread_mutex_unlock'\n/opt/OpenBLAS/libopenblas.a(memory.o): In function `openblas_fork_handler':\nmemory.c:(.text+0x440): undefined reference to `pthread_atfork'\n/opt/OpenBLAS/libopenblas.a(memory.o): In function `blas_memory_alloc':\nmemory.c:(.text+0x7a5): undefined reference to `pthread_mutex_lock'\nmemory.c:(.text+0x825): undefined reference to `pthread_mutex_unlock'\n/opt/OpenBLAS/libopenblas.a(memory.o): In function `blas_shutdown':\nmemory.c:(.text+0x9e1): undefined reference to `pthread_mutex_lock'\nmemory.c:(.text+0xa6e): undefined reference to `pthread_mutex_unlock'\n/opt/OpenBLAS/libopenblas.a(blas_server.o): In function `blas_thread_server':\nblas_server.c:(.text+0x273): undefined reference to `pthread_mutex_lock'\nblas_server.c:(.text+0x287): undefined reference to `pthread_mutex_unlock'\nblas_server.c:(.text+0x33f): undefined reference to `pthread_cond_wait'\n/opt/OpenBLAS/libopenblas.a(blas_server.o): In function `blas_thread_init':\nblas_server.c:(.text+0x416): undefined reference to `pthread_mutex_lock'\nblas_server.c:(.text+0x4be): undefined reference to `pthread_mutex_init'\nblas_server.c:(.text+0x4ca): undefined reference to `pthread_cond_init'\nblas_server.c:(.text+0x4e0): undefined reference to `pthread_create'\nblas_server.c:(.text+0x50f): undefined reference to `pthread_mutex_unlock'\n...\n
The -lpthread
is not required when linking dynamically against libopenblas.so.0
.
"},{"location":"faq/#building-openblas-for-haswell-or-dynamic-arch-on-rhel-6-centos-6-rocks-61scientific-linux-6","title":"Building OpenBLAS for Haswell or Dynamic Arch on RHEL-6, CentOS-6, Rocks-6.1, Scientific Linux 6","text":"The minimum requirement for actually running AVX2-enabled software like OpenBLAS is kernel 2.6.32-358, shipped with EL6U4 in 2013.
The binutils
package from RHEL6 does not know the instruction vpermpd
or any other AVX2 instruction. You can download a newer binutils
package from the Enterprise Linux Software Collections, following the instructions here: https://www.softwarecollections.org/en/scls/rhscl/devtoolset-3/ After configuring the repository, you need to install devtoolset-?-binutils to get a usable newer binutils package:
$ yum search devtoolset-\\?-binutils\n$ sudo yum install devtoolset-3-binutils\n
Once the packages are installed, check the correct name for the SCL redirection set that enables the new version: $ scl --list\ndevtoolset-3\nrh-python35\n
Now just prefix your build commands with the respective redirection: $ scl enable devtoolset-3 -- make DYNAMIC_ARCH=1\n
AVX-512 (SKYLAKEX) support requires devtoolset-8-gcc-gfortran (which exceeds the formal requirement for AVX-512 because of packaging issues in earlier packages), which dependency-installs the respective binutils and gcc, and kernel 2.6.32-696 (aka 6U9) or 3.10.0-327 (aka 7U2) or later to run. In the absence of the above-mentioned toolset, OpenBLAS will fall back to AVX2 instructions in place of AVX-512, sacrificing some performance on the SKYLAKE-X platform."},{"location":"faq/#building-openblas-in-qemukvmxen","title":"Building OpenBLAS in QEMU/KVM/XEN","text":"By default, QEMU reports the CPU as \"QEMU Virtual CPU version 2.2.0\", which shares its CPUID with an existing 32-bit CPU even in a 64-bit virtual machine, and OpenBLAS recognizes it as PENTIUM2. Depending on the exact combination of CPU features the hypervisor chooses to expose, this may not correspond to any CPU that exists, and OpenBLAS will error out when trying to build. To fix this, pass -cpu host
or -cpu passthrough
to QEMU, or another supported CPU model. Similarly, the XEN hypervisor may not pass through all features of the host CPU while reporting the CPU type itself correctly, which can lead to compiler error messages about an \"ABI change\" when compiling AVX-512 code. Again, changing the Xen configuration by running e.g. \"xen-cmdline --set-xen cpuid=avx512\" should get around this (as would building OpenBLAS for an older CPU lacking that particular feature, e.g. TARGET=HASWELL).
"},{"location":"faq/#building-openblas-on-power-fails-with-ibm-xl","title":"Building OpenBLAS on POWER fails with IBM XL","text":"Trying to compile OpenBLAS with IBM XL ends with error messages about unknown register names\n
like \"vs32\". Working around these by using known alternate names for the vector registers only leads to another assembler error about unsupported constraints. This is a known deficiency in the IBM compiler at least up to and including 16.1.0 (and in the POWER version of clang, from which it is derived) - use gcc instead. (See issues #1078 and #1699 for related discussions)
"},{"location":"faq/#replacing-system-blasupdating-apt-openblas-in-mintubuntudebian","title":"Replacing system BLAS/updating APT OpenBLAS in Mint/Ubuntu/Debian","text":"Debian and Ubuntu LTS versions provide an OpenBLAS package that is not updated after its initial release; under some circumstances one might want to use a more recent version of OpenBLAS, e.g. to get support for newer CPUs.
Ubuntu and Debian provide an 'alternatives' mechanism to comfortably replace BLAS and LAPACK libraries system-wide.
After a successful build of OpenBLAS (with DYNAMIC_ARCH set to 1):
$ make clean\n$ make DYNAMIC_ARCH=1\n$ sudo make DYNAMIC_ARCH=1 install\n
One can redirect the BLAS and LAPACK alternatives to point to the source-built OpenBLAS. First, you have to install the NetLib LAPACK reference implementation (to have alternatives to replace): $ sudo apt install libblas-dev liblapack-dev\n
Then we can set the alternatives to our freshly built library: $ sudo update-alternatives --install /usr/lib/libblas.so.3 libblas.so.3 /opt/OpenBLAS/lib/libopenblas.so.0 41 \\\n --slave /usr/lib/liblapack.so.3 liblapack.so.3 /opt/OpenBLAS/lib/libopenblas.so.0\n
Or remove the redirection and switch back to the APT-provided BLAS implementation: $ sudo update-alternatives --remove libblas.so.3 /opt/OpenBLAS/lib/libopenblas.so.0\n
In recent versions of the distributions, the installation path for the libraries has been changed to include the name of the host architecture, like /usr/lib/x86_64-linux-gnu/blas/libblas.so.3 or libblas.so.3.x86_64-linux-gnu. Use $ update-alternatives --display libblas.so.3
to find out what layout your system has."},{"location":"faq/#i-built-openblas-for-use-with-some-other-software-but-that-software-cannot-find-it","title":"I built OpenBLAS for use with some other software, but that software cannot find it","text":"OpenBLAS installs as a single library named libopenblas.so, while some programs may be searching for a separate libblas.so and liblapack.so, so you may need to create appropriate symbolic links (ln -s libopenblas.so libblas.so; ln -s libopenblas.so liblapack.so
) or copies. Also make sure that the installation location (usually /opt/OpenBLAS/lib or /usr/local/lib) is among the library search paths of your system.
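The symbolic-link workaround mentioned above can be sketched as follows, assuming the default /opt/OpenBLAS install prefix:

```shell
# Create BLAS/LAPACK-named links next to the installed OpenBLAS library
# (assumes the default /opt/OpenBLAS prefix; adjust the path as needed).
cd /opt/OpenBLAS/lib
ln -s libopenblas.so libblas.so
ln -s libopenblas.so liblapack.so
```

Afterwards, software searching for libblas.so or liblapack.so in that directory will resolve to OpenBLAS.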
"},{"location":"faq/#i-included-cblash-in-my-program-but-the-compiler-complains-about-a-missing-commonh-or-functions-from-it","title":"I included cblas.h in my program, but the compiler complains about a missing common.h or functions from it","text":"You probably tried to include a cblas.h that you simply copied from the OpenBLAS source; instead, you need to run make install
after building OpenBLAS and then use the modified cblas.h that this step builds in the installation path (usually either /usr/local/include, /opt/OpenBLAS/include or whatever you specified as PREFIX= on the make install
)
"},{"location":"faq/#compiling-openblas-with-gccs-fbounds-check-actually-triggers-aborts-in-programs","title":"Compiling OpenBLAS with gcc's -fbounds-check actually triggers aborts in programs","text":"This is due to different interpretations of the (informal) standard for passing characters as arguments between C and FORTRAN functions. As the method for storing text differs in the two languages, when C calls Fortran the text length is passed as an \"invisible\" additional parameter. Historically, this has not been required when the text is just a single character, so older code like the Reference-LAPACK bundled with OpenBLAS does not do it. Recently gcc's checking has changed to require it, but there is no consensus yet if and how the existing LAPACK (and many other codebases) should adapt. (And for actual compilation, gcc has mostly backtracked and provided compatibility options - hence the default build settings in the OpenBLAS Makefiles add -fno-optimize-sibling-calls to the gfortran options to prevent miscompilation with \"affected\" versions. See ticket 2154 in the issue tracker for more details and links)
"},{"location":"faq/#build-fails-with-lots-of-errors-about-undefined-gemm_unroll_m","title":"Build fails with lots of errors about undefined ?GEMM_UNROLL_M","text":"Your cpu is apparently too new to be recognized by the build scripts, so they failed to assign appropriate parameters for the block algorithm. Do a make clean
and try again with TARGET set to one of the cpu models listed in TargetList.txt
- for x86_64 this will usually be HASWELL.
"},{"location":"faq/#cmakeosx-build-fails-with-argument-list-too-long","title":"CMAKE/OSX: Build fails with 'argument list too long'","text":"This is a limitation in the maximum length of a command on OSX, coupled with how CMAKE works. You should be able to work around this by adding the option -DCMAKE_Fortran_USE_RESPONSE_FILE_FOR_OBJECTS=1
to your CMAKE arguments.
"},{"location":"faq/#likely-problems-with-avx2-support-in-docker-desktop-for-osx","title":"Likely problems with AVX2 support in Docker Desktop for OSX","text":"There have been a few reports of wrong calculation results and build-time test failures when building in a container environment managed by the OSX version of Docker Desktop, which uses the xhyve virtualizer underneath. Judging from these reports, AVX2 support in xhyve appears to be subtly broken but a corresponding ticket in the xhyve issue tracker has not drawn any reaction or comment since 2019. Therefore it is strongly recommended to build OpenBLAS with the NO_AVX2=1 option when inside a container under (or for later use with) the Docker Desktop environment on Intel-based Apple hardware.
"},{"location":"faq/#usage","title":"Usage","text":""},{"location":"faq/#program-is-terminated-because-you-tried-to-allocate-too-many-memory-regions","title":"Program is Terminated. Because you tried to allocate too many memory regions","text":"In OpenBLAS, we manage a pool of memory buffers and allocate the number of buffers as follows:
#define NUM_BUFFERS (MAX_CPU_NUMBER * 2)\n
This error indicates that the program exceeded the number of buffers. Please build OpenBLAS with larger NUM_THREADS
. For example, make NUM_THREADS=32
or make NUM_THREADS=64
. In Makefile.system
, we will set MAX_CPU_NUMBER=NUM_THREADS
.
"},{"location":"faq/#how-to-choose-target-manually-at-runtime-when-compiled-with-dynamic_arch","title":"How to choose TARGET manually at runtime when compiled with DYNAMIC_ARCH","text":"The environment variable which controls the kernel selection is OPENBLAS_CORETYPE
(see driver/others/dynamic.c
) e.g. export OPENBLAS_CORETYPE=Haswell
. And the function char* openblas_get_corename()
returns the used target.
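For example, a DYNAMIC_ARCH build can be forced onto a specific kernel set for a single invocation (here `./my_application` is a placeholder for your own program):

```shell
# Select the Haswell kernels at runtime for a DYNAMIC_ARCH build;
# ./my_application is a placeholder for your own program.
OPENBLAS_CORETYPE=Haswell ./my_application
```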
"},{"location":"faq/#after-updating-the-installed-openblas-a-program-complains-about-undefined-symbol-gotoblas","title":"After updating the installed OpenBLAS, a program complains about \"undefined symbol gotoblas\"","text":"This symbol gets defined only when OpenBLAS is built with \"make DYNAMIC_ARCH=1\" (which is what distributors will choose to ensure support for more than just one CPU type).
"},{"location":"faq/#how-can-i-find-out-at-runtime-what-options-the-library-was-built-with","title":"How can I find out at runtime what options the library was built with ?","text":"OpenBLAS has two utility functions that may come in here:
openblas_get_parallel() will return 0 for a single-threaded library, 1 if multithreading without OpenMP, 2 if built with USE_OPENMP=1
openblas_get_config() will return a string containing settings such as USE64BITINT or DYNAMIC_ARCH that were active at build time, as well as the target cpu (or in case of a dynamic_arch build, the currently detected one).
"},{"location":"faq/#after-making-openblas-i-find-that-the-static-library-is-multithreaded-but-the-dynamic-one-is-not","title":"After making OpenBLAS, I find that the static library is multithreaded, but the dynamic one is not ?","text":"The shared OpenBLAS library you built is probably working fine as well, but your program may be picking up a different (probably single-threaded) version from one of the standard system paths like /usr/lib on startup. Running ldd /path/to/your/program
will tell you which library the linkage loader will actually use.
Specifying the \"correct\" library location with the -L
flag (like -L /opt/OpenBLAS/lib
) when linking your program only defines which library will be used to see if all symbols can be resolved; you will need to add an rpath entry to the binary (using -Wl,-rpath=/opt/OpenBLAS/lib
) to make it request searching that location. Alternatively, remove the \"wrong old\" library (if you can), or set LD_LIBRARY_PATH to the desired location before running your program.
"},{"location":"faq/#i-want-to-use-openblas-with-cuda-in-the-hpl-23-benchmark-code-but-it-keeps-looking-for-intel-mkl","title":"I want to use OpenBLAS with CUDA in the HPL 2.3 benchmark code but it keeps looking for Intel MKL","text":"You need to edit the file src/cuda/cuda_dgemm.c in the NVIDIA version of HPL, change the \"handle2\" and \"handle\" dlopen calls to use libopenblas.so instead of libmkl_intel_lp64.so, and add a trailing underscore in the dlsym lines for dgemm_mkl and dtrsm_mkl (like dgemm_mkl = (void(*)())dlsym(handle, \u201cdgemm_\u201d);
)
"},{"location":"faq/#multithreaded-openblas-runs-no-faster-or-is-even-slower-than-singlethreaded-on-my-armv7-board","title":"Multithreaded OpenBLAS runs no faster or is even slower than singlethreaded on my ARMV7 board","text":"The power saving mechanisms of your board may have shut down some cores, making them invisible to OpenBLAS in its startup phase. Try bringing them online before starting your calculation.
"},{"location":"faq/#speed-varies-wildly-between-individual-runs-on-a-typical-armv8-smartphone-processor","title":"Speed varies wildly between individual runs on a typical ARMV8 smartphone processor","text":"Check the technical specifications, it could be that the SoC combines fast and slow cpus and threads can end up on either. In that case, binding the process to specific cores e.g. by setting OMP_PLACES=cores
may help. (You may need to experiment with OpenMP options, it has been reported that using OMP_NUM_THREADS=2 OMP_PLACES=cores
caused a huge drop in performance on a 4+4 core chip while OMP_NUM_THREADS=2 OMP_PLACES=cores(2)
worked as intended - as did OMP_PLACES=cores with 4 threads)
"},{"location":"faq/#i-cannot-get-openblas-to-use-more-than-a-small-subset-of-available-cores-on-a-big-system","title":"I cannot get OpenBLAS to use more than a small subset of available cores on a big system","text":"Multithreading support in OpenBLAS requires the use of internal buffers for sharing partial results, the number and size of which is defined at compile time. Unless you specify NUM_THREADS in your make or cmake command, the build scripts try to autodetect the number of cores available in your build host to size the library to match. This unfortunately means that if you move the resulting binary from a small \"front-end node\" to a larger \"compute node\" later, it will still be limited to the hardware capabilities of the original system. The solution is to set NUM_THREADS to a number big enough to encompass the biggest systems you expect to run the binary on - at runtime, it will scale down the maximum number of threads it uses to match the number of cores physically available.
"},{"location":"faq/#getting-elf-load-command-addressoffset-not-properly-aligned-when-loading-libopenblasso","title":"Getting \"ELF load command address/offset not properly aligned\" when loading libopenblas.so","text":"If you get a message \"error while loading shared libraries: libopenblas.so.0: ELF load command address/offset not properly aligned\" when starting a program that is (dynamically) linked to OpenBLAS, this is very likely due to a bug in the GNU linker (ld) that is part of the GNU binutils package. This error was specifically observed on older versions of Ubuntu Linux updated with the (at the time) most recent binutils version 2.38, but an internet search turned up sporadic reports involving various other libraries dating back several years. A bugfix was created by the binutils developers and should be available in later versions of binutils.(See issue 3708 for details)
"},{"location":"faq/#using-openblas-with-openmp","title":"Using OpenBLAS with OpenMP","text":"OpenMP provides its own locking mechanisms, so when your code makes BLAS/LAPACK calls from inside OpenMP parallel regions it is imperative that you use an OpenBLAS that is built with USE_OPENMP=1, as otherwise deadlocks might occur. Furthermore, OpenBLAS will automatically restrict itself to using only a single thread when called from an OpenMP parallel region. When it is certain that calls will only occur from the main thread of your program (i.e. outside of omp parallel constructs), a standard pthreads build of OpenBLAS can be used as well. In that case it may be useful to tune the linger behaviour of idle threads in both your OpenMP program (e.g. set OMP_WAIT_POLICY=passive) and OpenBLAS (by redefining the THREAD_TIMEOUT variable at build time, or setting the environment variable OPENBLAS_THREAD_TIMEOUT smaller than the default 26) so that the two alternating thread pools do not unnecessarily hog the cpu during the handover.
"},{"location":"install/","title":"Install OpenBLAS","text":"OpenBLAS can be installed through package managers or from source. If you only want to use OpenBLAS rather than make changes to it, we recommend installing a pre-built binary package with your package manager of choice.
This page contains an overview of installing with package managers as well as from source. For the latter, see further down on this page.
"},{"location":"install/#installing-with-a-package-manager","title":"Installing with a package manager","text":"Note
Almost every package manager provides OpenBLAS packages; the list on this page is not comprehensive. If your package manager of choice isn't shown here, please search its package database for openblas
or libopenblas
.
"},{"location":"install/#linux","title":"Linux","text":"On Linux, OpenBLAS can be installed with the system package manager, or with a package manager like Conda (or alternative package managers for the conda-forge ecosystem, like Mamba, Micromamba, or Pixi), Spack, or Nix. For the latter set of tools, the package name in all cases is openblas
. Since package management in quite a few of these tools is declarative (i.e., managed by adding openblas
to a metadata file describing the dependencies for your project or environment), we won't attempt to give detailed instructions for these tools here.
Linux distributions typically split OpenBLAS up in two packages: one containing the library itself (typically named openblas
or libopenblas
), and one containing headers, pkg-config and CMake files (typically named the same as the package for the library with -dev
or -devel
appended; e.g., openblas-devel
). Please keep in mind that if you want to install OpenBLAS in order to use it directly in your own project, you will need to install both of those packages.
Distro-specific installation commands:
Debian/Ubuntu/Mint/KaliopenSUSE/SLEFedora/CentOS/RHELArch/Manjaro/Antergos $ sudo apt update\n$ sudo apt install libopenblas-dev\n
OpenBLAS can be configured as the default BLAS through the update-alternatives
mechanism: $ sudo update-alternatives --config libblas.so.3\n
$ sudo zypper refresh\n$ sudo zypper install openblas-devel\n
OpenBLAS can be configured as the default BLAS through the update-alternatives
mechanism:
$ sudo update-alternatives --config libblas.so.3\n
$ dnf check-update\n$ dnf install openblas-devel\n
Warning
Fedora does not ship the pkg-config files for OpenBLAS. Instead, it wants you to link against FlexiBLAS (which uses OpenBLAS by default as its backend on Fedora), which you can install with:
$ dnf install flexiblas-devel\n
For CentOS and RHEL, OpenBLAS packages are provided via the Fedora EPEL repository. After adding that repository and its repository keys, you can install openblas-devel
with either dnf
or yum
.
$ sudo pacman -S openblas\n
"},{"location":"install/#windows","title":"Windows","text":"Conda-forgevcpkgOpenBLAS releases OpenBLAS can be installed with conda
(or mamba
, micromamba
, or pixi
) from conda-forge:
conda install openblas\n
Conda-forge provides a method for switching the default BLAS implementation used by all packages. To use that for OpenBLAS, install libblas=*=*openblas
(see the docs on this mechanism for more details).
OpenBLAS can be installed with vcpkg:
# In classic mode:\nvcpkg install openblas\n\n# Or in manifest mode:\nvcpkg add port openblas\n
Windows is the only platform for which binaries are made available by the OpenBLAS project itself. They can be downloaded from the GitHub Releases](https://github.com/OpenMathLib/OpenBLAS/releases) page. These binaries are built with MinGW, using the following build options:
NUM_THREADS=64 TARGET=GENERIC DYNAMIC_ARCH=1 DYNAMIC_OLDER=1 CONSISTENT_FPCSR=1 INTERFACE=0\n
There are separate packages for x86-64 and x86. The zip archive contains the include files, static and shared libraries, as well as configuration files for getting them found via CMake or pkg-config. To use these binaries, create a suitable folder for your OpenBLAS installation and unzip the .zip
bundle there (note that you will need to edit the provided openblas.pc
and OpenBLASConfig.cmake
to reflect the installation path on your computer, as distributed they have \"win\" or \"win64\" reflecting the local paths on the system they were built on). Note that the same binaries can be downloaded from SourceForge; this is mostly of historical interest.
"},{"location":"install/#macos","title":"macOS","text":"To install OpenBLAS with a package manager on macOS, run:
HomebrewMacPortsConda-forge % brew install openblas\n
% sudo port install OpenBLAS-devel\n
% conda install openblas\n
Conda-forge provides a method for switching the default BLAS implementation used by all packages. To use that for OpenBLAS, install libblas=*=*openblas
(see the docs on this mechanism for more details).
"},{"location":"install/#freebsd","title":"FreeBSD","text":"You can install OpenBLAS from the FreeBSD Ports collection:
pkg install openblas\n
"},{"location":"install/#building-from-source","title":"Building from source","text":"We recommend download the latest stable version from the GitHub Releases page, or checking it out from a git tag, rather than a dev version from the develop
branch.
Tip
The User manual contains a section with detailed information on compiling OpenBLAS, including how to customize builds and how to cross-compile. Please read that documentation first. This page contains only platform-specific build information, and assumes you already understand the general build system invocations to build OpenBLAS, with the specific build options you want to control multi-threading and other non-platform-specific behavior).
"},{"location":"install/#linux-and-macos","title":"Linux and macOS","text":"Ensure you have C and Fortran compilers installed, then simply type make
to compile the library. There are no other build dependencies, nor unusual platform-specific environment variables to set or other system setup to do.
Note
When building in an emulator (KVM, QEMU, etc.), please make sure that the combination of CPU features exposed to the virtual environment matches that of an existing CPU to allow detection of the CPU model to succeed. (With qemu
, this can be done by passing -cpu host
or a supported model name at invocation).
"},{"location":"install/#windows_1","title":"Windows","text":"We support building OpenBLAS with either MinGW or Visual Studio on Windows. Using MSVC will yield an OpenBLAS build with the Windows platform-native ABI. Using MinGW will yield a different ABI. We'll describe both methods in detail in this section, since the process for each is quite different.
"},{"location":"install/#visual-studio-native-windows-abi","title":"Visual Studio & native Windows ABI","text":"For Visual Studio, you can use CMake to generate Visual Studio solution files; note that you will need at least CMake 3.11 for linking to work correctly).
Note that you need a Fortran compiler if you plan to build and use the LAPACK functions included with OpenBLAS. The sections below describe using either flang
as an add-on to clang/LLVM or gfortran
as part of MinGW for this purpose. If you want to use the Intel Fortran compiler (ifort
or ifx
) for this, be sure to also use the Intel C compiler (icc
or icx
) for building the C parts, as the ABI imposed by ifort
is incompatible with MSVC
A fully-optimized OpenBLAS that can be statically or dynamically linked to your application can currently be built for the 64-bit architecture with the LLVM compiler infrastructure. We're going to use Miniconda3 to grab all of the tools we need, since some of them are in an experimental status. Before you begin, you'll need to have Microsoft Visual Studio 2015 or newer installed.
- Install Miniconda3 for 64-bit Windows using
winget install --id Anaconda.Miniconda3
, or easily download from conda.io. - Open the \"Anaconda Command Prompt\" now available in the Start Menu, or at
%USERPROFILE%\\miniconda3\\shell\\condabin\\conda-hook.ps1
. - In that command prompt window, use
cd
to change to the directory where you want to build OpenBLAS. - Now install all of the tools we need:
conda update -n base conda\nconda config --add channels conda-forge\nconda install -y cmake flang clangdev perl libflang ninja\n
-
Still in the Anaconda Command Prompt window, activate the 64-bit MSVC environment with vcvarsall x64
. On Windows 11 with Visual Studio 2022, this would be done by invoking:
\"c:\\Program Files\\Microsoft Visual Studio\\2022\\Preview\\vc\\Auxiliary\\Build\\vcvars64.bat\"\n
With VS2019, the command should be the same (except for the year number of course). For other versions of MSVC, please check the Visual Studio documentation for exactly how to invoke the vcvars64.bat
script.
Confirm that the environment is active by typing link
. This should return a long list of possible options for the link
command. If it just returns \"command not found\" or similar, review and retype the call to vcvars64.bat
.
Note
If you are working from a Visual Studio command prompt window instead (so that you do not have to do the vcvars
call), you need to invoke conda activate
so that CONDA_PREFIX
etc. get set up correctly before proceeding to step 6. Failing to do so will lead to link errors like libflangmain.lib
not getting found later in the build.
-
Now configure the project with CMake. Starting in the project directory, execute the following:
set \"LIB=%CONDA_PREFIX%\\Library\\lib;%LIB%\"\nset \"CPATH=%CONDA_PREFIX%\\Library\\include;%CPATH%\"\nmkdir build\ncd build\ncmake .. -G \"Ninja\" -DCMAKE_CXX_COMPILER=clang-cl -DCMAKE_C_COMPILER=clang-cl -DCMAKE_Fortran_COMPILER=flang -DCMAKE_MT=mt -DBUILD_WITHOUT_LAPACK=no -DNOFORTRAN=0 -DDYNAMIC_ARCH=ON -DCMAKE_BUILD_TYPE=Release\n
You may want to add further options in the cmake
command here. For instance, the default only produces a static .lib
version of the library. If you would rather have a DLL, add -DBUILD_SHARED_LIBS=ON
above. Note that this step only creates some command files and directories; the actual build happens next.
-
Build the project:
cmake --build . --config Release\n
This step will create the OpenBLAS library in the lib
directory, and various build-time tests in the test
, ctest
and openblas_utest
directories. However, it will not separate the header files you might need for building your own programs from those used internally. To put all relevant files in a more convenient arrangement, run the next step. -
Install all relevant files created by the build:
cmake --install . --prefix c:\\opt -v\n
This will copy all files that are needed for building and running your own programs with OpenBLAS to the given location, creating appropriate subdirectories for the individual kinds of files. In the case of C:\\opt
as given above, this would be: C:\\opt\\include\\openblas
for the header files, C:\\opt\\bin
for the libopenblas.dll
shared library, C:\\opt\\lib
for the static library, and C:\\opt\\share
for various support files that enable other CMake-based build scripts to find OpenBLAS automatically.
Change in complex types for Visual Studio 2017 and up
In newer Visual Studio versions, Microsoft has changed how it handles complex types. Even when using a precompiled version of OpenBLAS, you might need to define LAPACK_COMPLEX_CUSTOM
in order to define complex types properly for MSVC. For example, some variant of the following might help:
#if defined(_MSC_VER)\n #include <complex.h>\n #define LAPACK_COMPLEX_CUSTOM\n #define lapack_complex_float _Fcomplex\n #define lapack_complex_double _Dcomplex\n#endif\n
For reference, see openblas#3661, lapack#683, and this Stack Overflow question.
Building 32-bit binaries with MSVC
This method may produce binaries which demonstrate significantly lower performance than those built with the other methods. The Visual Studio compiler does not support the dialect of assembly used in the cpu-specific optimized files, so only the \"generic\" TARGET
which is written in pure C will get built. For the same reason it is not possible (and not necessary) to use -DDYNAMIC_ARCH=ON
in a Visual Studio build. You may consider building for the 32-bit architecture using the GNU (MinGW) ABI instead.
"},{"location":"install/#cmake-visual-studio-integration","title":"CMake & Visual Studio integration","text":"To generate Visual Studio solution files, ensure CMake is installed and then run:
# Do this from Powershell so cmake can find visual studio\ncmake -G \"Visual Studio 14 Win64\" -DCMAKE_BUILD_TYPE=Release .\n
To then build OpenBLAS using those solution files from within Visual Studio, we also need Perl. Please install it and ensure it's on the PATH
(see, e.g., this Stack Overflow question for how).
If you build from within Visual Studio, the dependencies may not be automatically configured: if you try to build libopenblas
directly, it may fail with a message saying that some .obj
files aren't found. If this happens, you can work around the problem by building the projects that libopenblas
depends on before building libopenblas
itself.
"},{"location":"install/#build-openblas-for-universal-windows-platform","title":"Build OpenBLAS for Universal Windows Platform","text":"OpenBLAS can be built targeting Universal Windows Platform (UWP) like this:
- Follow the steps above to build the Visual Studio solution files for Windows. This builds the helper executables which are required when building the OpenBLAS Visual Studio solution files for UWP in step 2.
-
Remove the generated CMakeCache.txt
and the CMakeFiles
directory from the OpenBLAS source directory, then re-run CMake with the following options:
# do this to build UWP compatible solution files\ncmake -G \"Visual Studio 14 Win64\" -DCMAKE_SYSTEM_NAME=WindowsStore -DCMAKE_SYSTEM_VERSION=\"10.0\" -DCMAKE_SYSTEM_PROCESSOR=AMD64 -DVS_WINRT_COMPONENT=TRUE -DCMAKE_BUILD_TYPE=Release .\n
3. Now build the solution with Visual Studio.
"},{"location":"install/#mingw-gnu-abi","title":"MinGW & GNU ABI","text":"Note
The resulting library from building with MinGW as described below can be used in Visual Studio, but it can only be linked dynamically. This configuration has not been thoroughly tested and should be considered experimental.
To build OpenBLAS on Windows with MinGW:
- Install the MinGW (GCC) compiler suite, either the 32-bit MinGW (http://www.mingw.org/) or the 64-bit MinGW-w64 toolchain. Be sure to install its
gfortran
package as well (unless you really want to build the BLAS part of OpenBLAS only) and check that gcc
and gfortran
are the same version. In addition, please install MSYS2 with MinGW. - Build OpenBLAS in the MSYS2 shell. Usually, you can just type
make
. OpenBLAS will detect the compiler and CPU automatically. - After the build is complete, OpenBLAS will generate the static library
libopenblas.a
and the shared library libopenblas.dll
in the folder. You can type make PREFIX=/your/installation/path install
to install the library to a certain location.
Note that OpenBLAS will generate the import library libopenblas.dll.a
for libopenblas.dll
by default.
If you want to generate Windows-native PDB files from a MinGW build, you can use the cv2pdb tool to do so.
To then use the built OpenBLAS shared library in Visual Studio:
- Copy the import library (
OPENBLAS_TOP_DIR/libopenblas.dll.a
) and the shared library (libopenblas.dll
) into the same folder (this must be the folder of your project that is going to use the BLAS library. You may need to add libopenblas.dll.a
to the linker input list: properties->Linker->Input
). - Please follow the Visual Studio documentation about using third-party .dll libraries, and make sure to link against a library for the correct architecture.1
- If you need CBLAS, you should include
cblas.h
in /your/installation/path/include
in Visual Studio. Please see openblas#95 for more details.
Limitations of using the MinGW build within Visual Studio
- Both static and dynamic linking are supported with MinGW. With Visual Studio, however, only dynamic linking is supported and so you should use the import library.
- Debugging from Visual Studio does not work because MinGW and Visual Studio have incompatible formats for debug information (PDB vs. DWARF/STABS). You should either debug with GDB on the command line or with a visual frontend, for instance Eclipse or Qt Creator.
"},{"location":"install/#windows-on-arm","title":"Windows on Arm","text":"The following tools needs to be installed to build for Windows on Arm (WoA):
- Clang for Windows on Arm. Find the latest LLVM build for WoA on the LLVM release page; e.g., the LLVM 12 build for WoA64 can be found here. Run the LLVM installer and ensure that LLVM is added to the environment PATH.
- Download and install classic Flang for Windows on Arm. Classic Flang is currently the only available Fortran compiler for Windows on Arm. A pre-release build can be found here. There is no installer for classic Flang; extract the zip package and add its path to the environment
PATH
. E.g., in PowerShell: $env:Path += \";C:\\flang_woa\\bin\"\n
The following steps describe how to build the static library for OpenBLAS with and without LAPACK:
-
Build OpenBLAS static library with BLAS and LAPACK routines with Make:
$ make CC=\"clang-cl\" HOSTCC=\"clang-cl\" AR=\"llvm-ar\" BUILD_WITHOUT_LAPACK=0 NOFORTRAN=0 DYNAMIC_ARCH=0 TARGET=ARMV8 ARCH=arm64 BINARY=64 USE_OPENMP=0 PARALLEL=1 RANLIB=\"llvm-ranlib\" MAKE=make F_COMPILER=FLANG FC=FLANG FFLAGS_NOOPT=\"-march=armv8-a -cpp\" FFLAGS=\"-march=armv8-a -cpp\" NEED_PIC=0 HOSTARCH=arm64 libs netlib\n
-
Build static library with BLAS routines using CMake:
Classic Flang has compatibility issues with CMake, hence only BLAS routines can be compiled with CMake:
$ mkdir build\n$ cd build\n$ cmake .. -G Ninja -DCMAKE_C_COMPILER=clang -DBUILD_WITHOUT_LAPACK=1 -DNOFORTRAN=1 -DDYNAMIC_ARCH=0 -DTARGET=ARMV8 -DARCH=arm64 -DBINARY=64 -DUSE_OPENMP=0 -DCMAKE_SYSTEM_PROCESSOR=ARM64 -DCMAKE_CROSSCOMPILING=1 -DCMAKE_SYSTEM_NAME=Windows\n$ cmake --build . --config Release\n
getarch.exe
execution error
If you notice that platform-specific headers by getarch.exe
are not generated correctly, this could be due to a known debug runtime DLL issue for arm64 platforms. Please check out this page for a workaround.
"},{"location":"install/#generating-an-import-library","title":"Generating an import library","text":"Microsoft Windows has this thing called \"import libraries\". You need it for MSVC; you don't need it for MinGW because the ld
linker is smart enough - however, you may still want it for some reason, so we'll describe the process for both MSVC and MinGW.
Import libraries are compiled from a list of what symbols to use, which are contained in a .def
file. A .def
file should already be present in the exports
directory under the top-level OpenBLAS directory after you've run a build. In your shell, move to this directory: cd exports
.
Unlike MinGW, MSVC absolutely requires an import library. The C ABIs of MSVC and MinGW are actually identical, so linking works fine (any incompatibility in the C ABI would be a bug).
The import libraries of MSVC have the suffix .lib
. They are generated from a .def
file using MSVC's lib.exe
. See the MSVC instructions.
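For instance, a 64-bit import library could be generated like this (a sketch, run from a Developer Command Prompt in the exports directory; the output name is an illustrative choice):

```shell
lib /def:libopenblas.def /machine:x64 /out:libopenblas.lib
```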
MinGW import libraries have the suffix .a
, just like static libraries. Our goal is to produce the file libopenblas.dll.a
.
You need to first insert a line LIBRARY libopenblas.dll
in libopenblas.def
:
cat <(echo \"LIBRARY libopenblas.dll\") libopenblas.def > libopenblas.def.1\nmv libopenblas.def.1 libopenblas.def\n
Now the .def
file probably looks like:
LIBRARY libopenblas.dll\nEXPORTS\n caxpy=caxpy_ @1\n caxpy_=caxpy_ @2\n ...\n
Then, generate the import library: dlltool -d libopenblas.def -l libopenblas.dll.a
Again, there is basically no point in making an import library for use in MinGW. It actually slows down linking.
"},{"location":"install/#android","title":"Android","text":"To build OpenBLAS for Android, you will need the following tools installed on your machine:
- The Android NDK
- Perl
- Clang compiler on the build machine
The next two sections below describe how to build with Clang for ARMV7 and ARMV8 targets, respectively. The same basic principles as described below for ARMV8 should also apply to building an x86 or x86-64 version (substitute something like NEHALEM
for the target instead of ARMV8
, and replace all the aarch64
in the toolchain paths with x86
or x96_64
as appropriate).
Historic note
Since NDK version 19, the default toolchain is provided as a standalone toolchain, so building one yourself by following the \"building a standalone toolchain\" guide should no longer be necessary.
"},{"location":"install/#building-for-armv7","title":"Building for ARMV7","text":"# Set path to ndk-bundle\nexport NDK_BUNDLE_DIR=/path/to/ndk-bundle\n\n# Set the PATH to contain paths to clang and arm-linux-androideabi-* utilities\nexport PATH=${NDK_BUNDLE_DIR}/toolchains/arm-linux-androideabi-4.9/prebuilt/linux-x86_64/bin:${NDK_BUNDLE_DIR}/toolchains/llvm/prebuilt/linux-x86_64/bin:$PATH\n\n# Set LDFLAGS so that the linker finds the appropriate libgcc\nexport LDFLAGS=\"-L${NDK_BUNDLE_DIR}/toolchains/arm-linux-androideabi-4.9/prebuilt/linux-x86_64/lib/gcc/arm-linux-androideabi/4.9.x\"\n\n# Set the clang cross compile flags\nexport CLANG_FLAGS=\"-target arm-linux-androideabi -marm -mfpu=vfp -mfloat-abi=softfp --sysroot ${NDK_BUNDLE_DIR}/platforms/android-23/arch-arm -gcc-toolchain ${NDK_BUNDLE_DIR}/toolchains/arm-linux-androideabi-4.9/prebuilt/linux-x86_64/\"\n\n#OpenBLAS Compile\nmake TARGET=ARMV7 ONLY_CBLAS=1 AR=ar CC=\"clang ${CLANG_FLAGS}\" HOSTCC=gcc ARM_SOFTFP_ABI=1 -j4\n
On macOS, it may also be necessary to give the complete path to the ar
utility in the make command above, like so:
AR=${NDK_BUNDLE_DIR}/toolchains/arm-linux-androideabi-4.9/prebuilt/darwin-x86_64/bin/arm-linux-androideabi-gcc-ar\n
otherwise you may get a linker error complaining like malformed archive header name at 8
when the native macOS ar
command was invoked instead."},{"location":"install/#building-for-armv8","title":"Building for ARMV8","text":"# Set path to ndk-bundle\nexport NDK_BUNDLE_DIR=/path/to/ndk-bundle/\n\n# Export PATH to contain directories of clang and aarch64-linux-android-* utilities\nexport PATH=${NDK_BUNDLE_DIR}/toolchains/aarch64-linux-android-4.9/prebuilt/linux-x86_64/bin/:${NDK_BUNDLE_DIR}/toolchains/llvm/prebuilt/linux-x86_64/bin:$PATH\n\n# Setup LDFLAGS so that loader can find libgcc and pass -lm for sqrt\nexport LDFLAGS=\"-L${NDK_BUNDLE_DIR}/toolchains/aarch64-linux-android-4.9/prebuilt/linux-x86_64/lib/gcc/aarch64-linux-android/4.9.x -lm\"\n\n# Setup the clang cross compile options\nexport CLANG_FLAGS=\"-target aarch64-linux-android --sysroot ${NDK_BUNDLE_DIR}/platforms/android-23/arch-arm64 -gcc-toolchain ${NDK_BUNDLE_DIR}/toolchains/aarch64-linux-android-4.9/prebuilt/linux-x86_64/\"\n\n# Compile\nmake TARGET=ARMV8 ONLY_CBLAS=1 AR=ar CC=\"clang ${CLANG_FLAGS}\" HOSTCC=gcc -j4\n
Note: using TARGET=CORTEXA57
in place of ARMV8
will pick up better optimized routines. Implementations for the CORTEXA57
target are compatible with all other ARMV8
targets. Note: for NDK 23b, something as simple as:
export PATH=/opt/android-ndk-r23b/toolchains/llvm/prebuilt/linux-x86_64/bin/:$PATH\nmake HOSTCC=gcc CC=/opt/android-ndk-r23b/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android31-clang ONLY_CBLAS=1 TARGET=ARMV8\n
appears to be sufficient on Linux. Alternative build script for 3 architectures: this script will build OpenBLAS for three architectures (ARMV7
, ARMV8
, X86
) and install them to /opt/OpenBLAS/lib
. It was tested on macOS with NDK version 21.3.6528147.
export NDK=YOUR_PATH_TO_SDK/Android/sdk/ndk/21.3.6528147\nexport TOOLCHAIN=$NDK/toolchains/llvm/prebuilt/darwin-x86_64\n\nmake clean\nmake \\\n TARGET=ARMV7 \\\n ONLY_CBLAS=1 \\\n CC=\"$TOOLCHAIN\"/bin/armv7a-linux-androideabi21-clang \\\n AR=\"$TOOLCHAIN\"/bin/arm-linux-androideabi-ar \\\n HOSTCC=gcc \\\n ARM_SOFTFP_ABI=1 \\\n -j4\nsudo make install\n\nmake clean\nmake \\\n TARGET=CORTEXA57 \\\n ONLY_CBLAS=1 \\\n CC=$TOOLCHAIN/bin/aarch64-linux-android21-clang \\\n AR=$TOOLCHAIN/bin/aarch64-linux-android-ar \\\n HOSTCC=gcc \\\n-j4\nsudo make install\n\nmake clean\nmake \\\n TARGET=ATOM \\\n ONLY_CBLAS=1 \\\nCC=\"$TOOLCHAIN\"/bin/i686-linux-android21-clang \\\nAR=\"$TOOLCHAIN\"/bin/i686-linux-android-ar \\\nHOSTCC=gcc \\\nARM_SOFTFP_ABI=1 \\\n-j4\nsudo make install\n\n## This will build for x86_64 \nmake clean\nmake \\\n TARGET=ATOM BINARY=64\\\nONLY_CBLAS=1 \\\nCC=\"$TOOLCHAIN\"/bin/x86_64-linux-android21-clang \\\nAR=\"$TOOLCHAIN\"/bin/x86_64-linux-android-ar \\\nHOSTCC=gcc \\\nARM_SOFTFP_ABI=1 \\\n-j4\nsudo make install\n
You can find full list of target architectures in TargetList.txt"},{"location":"install/#iphoneios","title":"iPhone/iOS","text":"As none of the current developers uses iOS, the following instructions are what was found to work in our Azure CI setup, but as far as we know this builds a fully working OpenBLAS for this platform.
Go to the directory where you unpacked OpenBLAS, and enter the following commands:
CC=/Applications/Xcode_12.4.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang\n\nCFLAGS= -O2 -Wno-macro-redefined -isysroot /Applications/Xcode_12.4.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS14.4.sdk -arch arm64 -miphoneos-version-min=10.0\n\nmake TARGET=ARMV8 DYNAMIC_ARCH=1 NUM_THREADS=32 HOSTCC=clang NOFORTRAN=1\n
Adjust MIN_IOS_VERSION
as necessary for your installation. E.g., change the version number to the minimum iOS version you want to target and execute this file to build the library."},{"location":"install/#mips","title":"MIPS","text":"For MIPS targets you will need the latest toolchains:
- P5600 - MTI GNU/Linux Toolchain
- I6400, P6600 - IMG GNU/Linux Toolchain
You can use the following command lines for builds:
IMG_TOOLCHAIN_DIR={full IMG GNU/Linux Toolchain path including \"bin\" directory -- for example, /opt/linux_toolchain/bin}\nIMG_GCC_PREFIX=mips-img-linux-gnu\nIMG_TOOLCHAIN=${IMG_TOOLCHAIN_DIR}/${IMG_GCC_PREFIX}\n\n# I6400 Build (n32):\nmake BINARY=32 BINARY32=1 CC=$IMG_TOOLCHAIN-gcc AR=$IMG_TOOLCHAIN-ar FC=\"$IMG_TOOLCHAIN-gfortran -EL -mabi=n32\" RANLIB=$IMG_TOOLCHAIN-ranlib HOSTCC=gcc CFLAGS=\"-EL\" FFLAGS=$CFLAGS LDFLAGS=$CFLAGS TARGET=I6400\n\n# I6400 Build (n64):\nmake BINARY=64 BINARY64=1 CC=$IMG_TOOLCHAIN-gcc AR=$IMG_TOOLCHAIN-ar FC=\"$IMG_TOOLCHAIN-gfortran -EL\" RANLIB=$IMG_TOOLCHAIN-ranlib HOSTCC=gcc CFLAGS=\"-EL\" FFLAGS=$CFLAGS LDFLAGS=$CFLAGS TARGET=I6400\n\n# P6600 Build (n32):\nmake BINARY=32 BINARY32=1 CC=$IMG_TOOLCHAIN-gcc AR=$IMG_TOOLCHAIN-ar FC=\"$IMG_TOOLCHAIN-gfortran -EL -mabi=n32\" RANLIB=$IMG_TOOLCHAIN-ranlib HOSTCC=gcc CFLAGS=\"-EL\" FFLAGS=$CFLAGS LDFLAGS=$CFLAGS TARGET=P6600\n\n# P6600 Build (n64):\nmake BINARY=64 BINARY64=1 CC=$IMG_TOOLCHAIN-gcc AR=$IMG_TOOLCHAIN-ar FC=\"$IMG_TOOLCHAIN-gfortran -EL\" RANLIB=$IMG_TOOLCHAIN-ranlib HOSTCC=gcc CFLAGS=\"-EL\" FFLAGS=\"$CFLAGS\" LDFLAGS=\"$CFLAGS\" TARGET=P6600\n\nMTI_TOOLCHAIN_DIR={full MTI GNU/Linux Toolchain path including \"bin\" directory -- for example, /opt/linux_toolchain/bin}\nMTI_GCC_PREFIX=mips-mti-linux-gnu\nMTI_TOOLCHAIN=${IMG_TOOLCHAIN_DIR}/${IMG_GCC_PREFIX}\n\n# P5600 Build:\n\nmake BINARY=32 BINARY32=1 CC=$MTI_TOOLCHAIN-gcc AR=$MTI_TOOLCHAIN-ar FC=\"$MTI_TOOLCHAIN-gfortran -EL\" RANLIB=$MTI_TOOLCHAIN-ranlib HOSTCC=gcc CFLAGS=\"-EL\" FFLAGS=$CFLAGS LDFLAGS=$CFLAGS TARGET=P5600\n
"},{"location":"install/#freebsd_1","title":"FreeBSD","text":"You will need to install the following tools from the FreeBSD ports tree:
- lang/gcc
- lang/perl5.12
- ftp/curl
- devel/gmake
- devel/patch
To compile run the command:
$ gmake CC=gcc FC=gfortran\n
"},{"location":"install/#cortex-m","title":"Cortex-M","text":"Cortex-M is a widely used microcontroller that is present in a variety of industrial and consumer electronics. A common variant of the Cortex-M is the STM32F4xx
series. Here, we will give instructions for building for that series.
First, install the embedded Arm GCC compiler from the Arm website. Then, create the following toolchain.cmake
file:
set(CMAKE_SYSTEM_NAME Generic)\nset(CMAKE_SYSTEM_PROCESSOR arm)\n\nset(CMAKE_C_COMPILER \"arm-none-eabi-gcc.exe\")\nset(CMAKE_CXX_COMPILER \"arm-none-eabi-g++.exe\")\n\nset(CMAKE_EXE_LINKER_FLAGS \"--specs=nosys.specs\" CACHE INTERNAL \"\")\n\nset(CMAKE_FIND_ROOT_PATH_MODE_PROGRAM NEVER)\nset(CMAKE_FIND_ROOT_PATH_MODE_LIBRARY ONLY)\nset(CMAKE_FIND_ROOT_PATH_MODE_INCLUDE ONLY)\nset(CMAKE_FIND_ROOT_PATH_MODE_PACKAGE ONLY)\n
Then build OpenBLAS with:
$ cmake .. -G Ninja -DCMAKE_C_COMPILER=arm-none-eabi-gcc -DCMAKE_TOOLCHAIN_FILE:PATH=\"toolchain.cmake\" -DNOFORTRAN=1 -DTARGET=ARMV5 -DEMBEDDED=1\n
In your embedded application, the following functions need to be provided for OpenBLAS to work correctly:
void free(void* ptr);\nvoid* malloc(size_t size);\n
Note
If you are developing for an embedded platform, it is your responsibility to make sure that the device has sufficient memory for malloc
calls. Libmemory provides one implementation of malloc
for embedded platforms.
-
If the OpenBLAS DLLs are not linked correctly, you may see an error like \"The application was unable to start correctly (0xc000007b)\", which typically indicates a mismatch between 32-bit and 64-bit libraries.\u00a0\u21a9
"},{"location":"user_manual/","title":"User manual","text":"This user manual covers compiling OpenBLAS itself, linking your code to OpenBLAS, example code to use the C (CBLAS) and Fortran (BLAS) APIs, and some troubleshooting tips. Compiling OpenBLAS is optional, since you may be able to install with a package manager.
Note
The OpenBLAS documentation does not contain API reference documentation for BLAS or LAPACK, since these are standardized APIs, the documentation for which can be found in other places. If you want to understand every BLAS and LAPACK function and definition, we recommend reading the Netlib BLAS and Netlib LAPACK documentation.
OpenBLAS does contain a limited number of functions that are non-standard; these are documented at OpenBLAS extension functions.
"},{"location":"user_manual/#compiling-openblas","title":"Compiling OpenBLAS","text":""},{"location":"user_manual/#normal-compile","title":"Normal compile","text":"The default way to build and install OpenBLAS from source is with Make:
make # add `-j4` to compile in parallel with 4 processes\nmake install\n
By default, the CPU architecture is detected automatically when invoking make
, and the build is optimized for the detected CPU. To override the autodetection, use the TARGET
flag:
# `make TARGET=xxx` sets target CPU: e.g. for an Intel Nehalem CPU:\nmake TARGET=NEHALEM\n
The full list of known target CPU architectures can be found in TargetList.txt
in the root of the repository."},{"location":"user_manual/#cross-compile","title":"Cross compile","text":"For a basic cross-compilation with Make, three steps need to be taken:
- Set the
CC
and FC
environment variables to select the cross toolchains for C and Fortran. - Set the
HOSTCC
environment variable to select the host C compiler (i.e. the regular C compiler for the machine on which you are invoking the build). - Set
TARGET
explicitly to the CPU architecture on which the produced OpenBLAS binaries will be used.
"},{"location":"user_manual/#cross-compilation-examples","title":"Cross-compilation examples","text":"Compile the library for ARM Cortex-A9 linux on an x86-64 machine (note: install only gnueabihf
versions of the cross toolchain - see this issue comment for why):
make CC=arm-linux-gnueabihf-gcc FC=arm-linux-gnueabihf-gfortran HOSTCC=gcc TARGET=CORTEXA9\n
Compile OpenBLAS for a loongson3a CPU on an x86-64 machine:
make BINARY=64 CC=mips64el-unknown-linux-gnu-gcc FC=mips64el-unknown-linux-gnu-gfortran HOSTCC=gcc TARGET=LOONGSON3A\n
Compile OpenBLAS for loongson3a CPU with the loongcc
(based on Open64) compiler on an x86-64 machine:
make CC=loongcc FC=loongf95 HOSTCC=gcc TARGET=LOONGSON3A CROSS=1 CROSS_SUFFIX=mips64el-st-linux-gnu- NO_LAPACKE=1 NO_SHARED=1 BINARY=32\n
"},{"location":"user_manual/#building-a-debug-version","title":"Building a debug version","text":"Add DEBUG=1
to your build command, e.g.:
make DEBUG=1\n
"},{"location":"user_manual/#install-to-a-specific-directory","title":"Install to a specific directory","text":"Note
Installing to a directory is optional; it is also possible to use the shared or static libraries directly from the build directory.
Use make install
with the PREFIX
flag to install to a specific directory:
make install PREFIX=/path/to/installation/directory\n
The default directory is /opt/OpenBLAS
.
Important
Note that any flags passed to make
during build should also be passed to make install
to circumvent any install errors, i.e. some headers not being copied over correctly.
For more detailed information on building/installing from source, please read the Installation Guide.
"},{"location":"user_manual/#linking-to-openblas","title":"Linking to OpenBLAS","text":"OpenBLAS can be used as a shared or a static library.
"},{"location":"user_manual/#link-a-shared-library","title":"Link a shared library","text":"The shared library is normally called libopenblas.so
, but note that the name may be different as a result of build flags used or naming choices by a distro packager (see [distributing.md] for details). To link a shared library named libopenblas.so
, the flag -lopenblas
is needed. To find the OpenBLAS headers, a -I/path/to/includedir
is needed. And unless the library is installed in a directory that the linker searches by default, also -L
and -Wl,-rpath
flags are needed. For a source file test.c
(e.g., the example code under Call CBLAS interface further down), the shared library can then be linked with:
gcc -o test test.c -I/your_path/OpenBLAS/include/ -L/your_path/OpenBLAS/lib -Wl,-rpath,/your_path/OpenBLAS/lib -lopenblas\n
The -Wl,-rpath,/your_path/OpenBLAS/lib
linker flag can be omitted if you ran ldconfig
to update the linker cache, put /your_path/OpenBLAS/lib
in /etc/ld.so.conf
or a file in /etc/ld.so.conf.d
, or installed OpenBLAS in a location that is part of the ld.so
default search path (usually /lib
, /usr/lib
and /usr/local/lib
). Alternatively, you can set the environment variable LD_LIBRARY_PATH
to point to the folder that contains libopenblas.so
. Otherwise, the build may succeed but at runtime loading the library will fail with a message like:
cannot open shared object file: no such file or directory\n
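To diagnose this, you can inspect how the dynamic loader resolves the library (a sketch; the executable name and install path are illustrative):

```shell
# Show how the dynamic loader resolves libopenblas at runtime;
# "not found" in the output means the loader can't locate libopenblas.so
ldd ./test | grep openblas

# Quick fix for the current shell session only:
export LD_LIBRARY_PATH=/your_path/OpenBLAS/lib:$LD_LIBRARY_PATH
./test
```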
More flags may be needed, depending on how OpenBLAS was built:
- If
libopenblas
is multi-threaded, please add -lpthread
. - If the library contains LAPACK functions (usually also true), please add
-lgfortran
(other Fortran libraries may also be needed, e.g. -lquadmath
). Note that if you only make calls to LAPACKE routines, i.e. your code has #include \"lapacke.h\"
and makes calls to methods like LAPACKE_dgeqrf
, then -lgfortran
is not needed.
Tip
Usually a pkg-config file (e.g., openblas.pc
) is installed together with a libopenblas
shared library. pkg-config is a tool that will tell you the exact flags needed for linking. For example:
$ pkg-config --cflags openblas\n-I/usr/local/include\n$ pkg-config --libs openblas\n-L/usr/local/lib -lopenblas\n
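These queries can be used directly in a compile command so the paths never need to be hard-coded (a sketch, assuming openblas.pc is on pkg-config's search path):

```shell
# Compile and link against OpenBLAS using the flags pkg-config reports
gcc -o test test.c $(pkg-config --cflags --libs openblas)

# If the .pc file lives in a non-default location, point pkg-config at it:
PKG_CONFIG_PATH=/your_path/OpenBLAS/lib/pkgconfig pkg-config --libs openblas
```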
"},{"location":"user_manual/#link-a-static-library","title":"Link a static library","text":"Linking a static library is simpler - add the path to the static OpenBLAS library to the compile command:
gcc -o test test.c /your/path/libopenblas.a\n
"},{"location":"user_manual/#code-examples","title":"Code examples","text":""},{"location":"user_manual/#call-cblas-interface","title":"Call CBLAS interface","text":"This example shows calling cblas_dgemm
in C:
#include <cblas.h>\n#include <stdio.h>\n\nint main(void)\n{\n int i=0;\n double A[6] = {1.0,2.0,1.0,-3.0,4.0,-1.0}; \n double B[6] = {1.0,2.0,1.0,-3.0,4.0,-1.0}; \n double C[9] = {.5,.5,.5,.5,.5,.5,.5,.5,.5}; \n cblas_dgemm(CblasColMajor, CblasNoTrans, CblasTrans,3,3,2,1,A, 3, B, 3,2,C,3);\n\n for(i=0; i<9; i++)\n printf(\"%lf \", C[i]);\n printf(\"\\n\");\n return 0;\n}\n
To compile this file, save it as test_cblas_dgemm.c
and then run:
gcc -o test_cblas_open test_cblas_dgemm.c -I/your_path/OpenBLAS/include/ -L/your_path/OpenBLAS/lib -lopenblas -lpthread -lgfortran\n
will result in a test_cblas_open
executable."},{"location":"user_manual/#call-blas-fortran-interface","title":"Call BLAS Fortran interface","text":"This example shows calling the dgemm
Fortran interface in C:
#include \"stdio.h\"\n#include \"stdlib.h\"\n#include \"sys/time.h\"\n#include \"time.h\"\n\nextern void dgemm_(char*, char*, int*, int*,int*, double*, double*, int*, double*, int*, double*, double*, int*);\n\nint main(int argc, char* argv[])\n{\n int i;\n printf(\"test!\\n\");\n if(argc<4){\n printf(\"Input Error\\n\");\n return 1;\n }\n\n int m = atoi(argv[1]);\n int n = atoi(argv[2]);\n int k = atoi(argv[3]);\n int sizeofa = m * k;\n int sizeofb = k * n;\n int sizeofc = m * n;\n char ta = 'N';\n char tb = 'N';\n double alpha = 1.2;\n double beta = 0.001;\n\n struct timeval start,finish;\n double duration;\n\n double* A = (double*)malloc(sizeof(double) * sizeofa);\n double* B = (double*)malloc(sizeof(double) * sizeofb);\n double* C = (double*)malloc(sizeof(double) * sizeofc);\n\n srand((unsigned)time(NULL));\n\n for (i=0; i<sizeofa; i++)\n A[i] = i%3+1;//(rand()%100)/10.0;\n\n for (i=0; i<sizeofb; i++)\n B[i] = i%3+1;//(rand()%100)/10.0;\n\n for (i=0; i<sizeofc; i++)\n C[i] = i%3+1;//(rand()%100)/10.0;\n //#if 0\n printf(\"m=%d,n=%d,k=%d,alpha=%lf,beta=%lf,sizeofc=%d\\n\",m,n,k,alpha,beta,sizeofc);\n gettimeofday(&start, NULL);\n dgemm_(&ta, &tb, &m, &n, &k, &alpha, A, &m, B, &k, &beta, C, &m);\n gettimeofday(&finish, NULL);\n\n duration = ((double)(finish.tv_sec-start.tv_sec)*1000000 + (double)(finish.tv_usec-start.tv_usec)) / 1000000;\n double gflops = 2.0 * m *n*k;\n gflops = gflops/duration*1.0e-6;\n\n FILE *fp;\n fp = fopen(\"timeDGEMM.txt\", \"a\");\n fprintf(fp, \"%dx%dx%d\\t%lf s\\t%lf MFLOPS\\n\", m, n, k, duration, gflops);\n fclose(fp);\n\n free(A);\n free(B);\n free(C);\n return 0;\n}\n
To compile this file, save it as time_dgemm.c
and then run:
gcc -o time_dgemm time_dgemm.c /your/path/libopenblas.a -lpthread\n
You can then run it as: ./time_dgemm <m> <n> <k>
, with m
, n
, and k
input parameters to the time_dgemm
executable. Note
When calling the Fortran interface from C, you have to deal with symbol name differences caused by compiler conventions. That is why the dgemm_
function call in the example above has a trailing underscore. This is what it looks like when using gcc
/gfortran
, but such details may differ between compilers, which is why portable code needs extra support. The CBLAS interface may be more portable when writing C code.
When writing code that needs to be portable across platforms and compilers, the above example is not recommended. Instead, we advise looking at how OpenBLAS (or BLAS in general, since this problem isn't specific to OpenBLAS) functions are called in widely used projects like Julia, SciPy, or R.
"},{"location":"user_manual/#troubleshooting","title":"Troubleshooting","text":" - Please read the FAQ first, your problem may be described there.
- Please ensure you are using a recent enough compiler, that supports the features your CPU provides (example: GCC versions before 4.6 were known to not support AVX kernels, and before 6.1 AVX512CD kernels).
- The number of CPU cores supported by default is <=256. On Linux x86-64, there is experimental support for up to 1024 cores and 128 NUMA nodes if you build the library with
BIGNUMA=1
. - OpenBLAS does not set processor affinity by default. On Linux, you can enable processor affinity by commenting out the line
NO_AFFINITY=1
in Makefile.rule
. - On Loongson 3A,
make test
is known to fail with a pthread_create
error and an EAGAIN
error code. However, it will be OK when you run the same testcase in a shell.
"}]}