Squash commit of GitHub wiki

* Created Installation Guide (markdown)
* Updated quick installation (markdown)
* Updated Home (markdown)
* Updated Document (markdown)
* Updated Document (markdown)
* Updated Document (markdown)
* Created Installation Guide (markdown)
* Created Home (markdown)
* Init version
* Updated OpenBLAS Wiki (markdown)
* Updated OpenBLAS Wiki (markdown)
* Updated OpenBLAS Wiki (markdown)
* Updated Document (markdown)
* Updated Installation Guide (markdown)
* Updated Installation Guide (markdown)
* Created Download (markdown)
* Created Faq (markdown)
* Updated Faq (markdown)
* Updated FAQ
* Created How to use OpenBLAS in Microsoft Visual Studio (markdown)
* Updated Document (markdown)
* Updated Faq (markdown)
* Updated Faq (markdown)
* Updated How to use OpenBLAS in Microsoft Visual Studio (markdown)
* Updated Faq (markdown)
* Updated OpenBLAS Wiki (markdown)
* Updated Home (markdown)
* Updated How to use OpenBLAS in Microsoft Visual Studio (markdown)
* Created How to generate import library for MingW (markdown)
* Updated Document (markdown)
* Updated Faq (markdown)
* Updated How to use OpenBLAS in Microsoft Visual Studio (markdown)
* Updated How to use OpenBLAS in Microsoft Visual Studio (markdown)
* Build instructions for FreeBSD
* Updated How to use OpenBLAS in Microsoft Visual Studio (markdown)
* Updated How to use OpenBLAS in Microsoft Visual Studio (markdown)
* Updated Installation Guide (markdown)
* Updated Faq (markdown)
* Updated How to use OpenBLAS in Microsoft Visual Studio (markdown)
* Updated How to use OpenBLAS in Microsoft Visual Studio (markdown)
* minor edits
* Updated How to use OpenBLAS in Microsoft Visual Studio (markdown)
* Updated How to use OpenBLAS in Microsoft Visual Studio (markdown)
* Updated How to use OpenBLAS in Microsoft Visual Studio (markdown)
* Updated Faq (markdown)
* Installation instructions for Windows
* Updated Faq (markdown)
* G77 conventions no longer needed with GCC 4.7+
* Updated Home (markdown)
* Document why issue 168 occurred.
* Updated Home (markdown)
* Created Publications (markdown)
* Updated Home (markdown)
* Updated Document (markdown)
* Updated Faq (markdown)
* Updated Download (markdown)
* Updated Publications (markdown)
* Updated Faq (markdown)
* Updated Document (markdown)
* Revert 7580d38ffad37e6613e6304707aaaa681f3d78c2 ... b1bd4ff37d2106bbd5c4730a08dbb789cc44e7d4
* Created Mailing List (markdown)
* Updated Mailing List (markdown)
* Updated Mailing List (markdown)
* Updated Home (markdown)
* Updated Document (markdown)
* Updated Publications (markdown)
* Updated Download (markdown)
* Updated Faq (markdown)
* Updated Home (markdown)
* Updated Faq (markdown)
* Updated Home (markdown)
* Revert b69f1417cdf8820be046cc27a2b96b42a25bc3a3 ... 90a227c317c3572ced943461ac3a252c40790f44 on Home
* Updated Home (markdown)
* Updated Publications (markdown)
* Updated Faq (markdown)
* Updated How to use OpenBLAS in Microsoft Visual Studio (markdown)
* We already ensure the stack alignment in Makefile.system for Win32.
* Updated Faq (markdown)
* Updated Faq (markdown)
* Updated Publications (markdown)
* Created Donation (markdown)
* Updated Home (markdown)
* Updated Document (markdown)
* Updated Faq (markdown)
* Updated Publications (markdown)
* Updated Download (markdown)
* Updated Mailing List (markdown)
* Updated Donation (markdown)
* Updated Download (markdown)
* Updated Donation (markdown)
* Updated Donation (markdown)
* Updated Donation (markdown)
* Updated Donation (markdown)
* Updated Home (markdown)
* Updated Faq (markdown)
* Updated Download (markdown)
* Updated Home (markdown)
* Updated Home (markdown)
* Add new entry for static linking and pthread.
* Fix named anchors (see http://stackoverflow.com/questions/5319754/cross-reference-named-anchor-in-markdown/7335259#7335259)
* Created Related packages that use OpenBLAS (markdown)
* Updated Related packages that use OpenBLAS (markdown)
* Updated Related packages that use OpenBLAS (markdown)
* Updated How to use OpenBLAS in Microsoft Visual Studio (markdown)
* Updated Document (markdown)
* Created To-do List (markdown)
* Updated To do List (markdown)
* Updated Fixed optimized kernels To do List (markdown)
* Fix English idiom
* Remove trailing whitespace
* Updated Fixed optimized kernels To do List (markdown)
* Updated Fixed optimized kernels To do List (markdown)
* Updated Fixed optimized kernels To do List (markdown)
* Updated Fixed optimized kernels To do List (markdown)
* Updated Faq (markdown)
* Updated How to use OpenBLAS in Microsoft Visual Studio (markdown)
* Updated How to use OpenBLAS in Microsoft Visual Studio (markdown)
* Updated Related packages that use OpenBLAS (markdown)
* Updated Related packages that use OpenBLAS (markdown)
* Created Machine List (markdown)
* Updated Document (markdown)
* Updated Installation Guide (markdown)
* Created User Manual (markdown)
* Updated User Manual (markdown)
* Updated Document (markdown)
* Updated User Manual (markdown)
* Updated User Manual (markdown)
* Updated User Manual (markdown)
* Updated User Manual (markdown)
* Updated Related packages that use OpenBLAS (markdown)
* Updated Faq (markdown)
* Updated Related packages that use OpenBLAS (markdown)
* Updated Machine List (markdown)
* Updated Related packages that use OpenBLAS (markdown)
* Updated Related packages that use OpenBLAS (markdown)
* Add a note about building in QEMU
* Updated Home (markdown)
* Updated Faq (markdown)
* update for allocating too much memory error.
* Updated Faq (markdown)
* Updated Faq (markdown)
* Updated Installation Guide (markdown)
* Updated Faq (markdown)
* Init function doc
* Updated Document (markdown)
* Updated User Manual (markdown)
* Updated User Manual (markdown)
* Created How to build OpenBLAS for Android (markdown)
* Updated How to build OpenBLAS for Android (markdown)
* Updated Home (markdown)
* Part of the description is really no clear, I add some more information, so it would be easier for VS user to fix the problems facing them.
* Created Developer manual (markdown)
* Updated Document (markdown)
* Updated How to use OpenBLAS in Microsoft Visual Studio (markdown)
* a typo, download ** frome -> download from
* Updated Faq (markdown)
* English (minor edit)
* Updated Developer manual (markdown)
* Updated Developer manual (markdown)
* Updated Developer manual (markdown)
* Updated Machine List (markdown)
* Updated Developer manual (markdown)
* Updated Developer manual (markdown)
* Updated How to build OpenBLAS for Android (markdown)
* Updated How to build OpenBLAS for Android (markdown)
* Updated How to use OpenBLAS in Microsoft Visual Studio (markdown)
* Updated How to use OpenBLAS in Microsoft Visual Studio (markdown)
* issue 842
* Updated How to build OpenBLAS for Android (markdown)
* Updated How to build OpenBLAS for Android (markdown)
* Updated How to build OpenBLAS for Android (markdown)
* Updated How to build OpenBLAS for Android (markdown)
* Added FC for building with Fortran
* Change link for the Intel MKL documentation
* Updated User Manual (markdown)
* Updated User Manual (markdown)
* Added MIPS build instructions from issue 949
* use TARGET_CFLAGS and TARGET_LDFLAGS instead of CFLAGS and LDFLAGS for linking OpenBLAS on ARMv7
* Add Windows updates (msys2,mingw/w64 merger), Android/MIPS pointers, qemu hint
* Building libs & netlib targets to prevent errors in tests
* Recipes not targets (for make)
* Making only libs, not netlib (which also contains link/run tests...)
* Copied from instructions by Ivan Ushakov, originally posted in issue 569
* Updated How to build OpenBLAS for iPhone iOS (markdown)
* Updated Faq (markdown)
* Created How to build OpenBLAS for iPhone iOS (markdown)
* error code (0xc000007b) was missing a character
* Updated How to build OpenBLAS for iPhone iOS (ARMv8) (markdown)
* Updated How to build OpenBLAS for iPhone iOS (ARMv8) (markdown)
* Revert 7e9dd0ebf079e002e3aa831fa671fde3e8cfad81...8d105c7be8cd447482f61e0295c0c146f5314eb5 on How to build OpenBLAS for iPhone iOS
* Add guide on how to reversibly supplant Ubuntu LTS libblas.so.3
* typo
* Updated How to use OpenBLAS in Microsoft Visual Studio (markdown)
* Updated User Manual (markdown)
* Updated Faq (markdown)
* Updated Download (markdown)
* Add perl to pacman package list
* Fixed formatting on general questions
* Copied from issue 1136
* Added instructions for building for Windows UWP.
* To clear confusions vs super-fat-binaries that dont exist.
* Updated How to use OpenBLAS in Microsoft Visual Studio (markdown)
* Update for 0.2.20 (full builds, ARMv7 softfp support, newer NDKs using CLANG)
* Updated How to build OpenBLAS for Android (markdown)
* Fix some formatting issues
* Updated How to build OpenBLAS for Android (markdown)
* Updated How to use OpenBLAS in Microsoft Visual Studio (markdown)
* Updated How to use OpenBLAS in Microsoft Visual Studio (markdown)
* Updated How to build OpenBLAS for Android (markdown)
* Created Precompiled installation packages (markdown)
* Updated Precompiled installation packages (markdown)
* Example - debian?
* Mention (and link to) distribution-specific packages
* Updated Installation Guide (markdown)
* OpenSuSE (13.2, SLE included)
* Updated Precompiled installation packages (markdown)
* Updated Precompiled installation packages (markdown)
* Make it look consistent.
* Fedora+EPEL // maybe rpmbuild is too heavy
* Updated Precompiled installation packages (markdown)
* Updated Precompiled installation packages (markdown)
* Updated How to use OpenBLAS in Microsoft Visual Studio (markdown)
* Updated Precompiled installation packages (markdown)
* fix toolchain argument in armv8 clang build as per issue 1337
* add note about stdio.h not found error
* Add flang instructions
* Use the SVG Travis badge
* homebrew option for OSX
* Promote native MSVC builds with LLVM
* Updated How to use OpenBLAS in Microsoft Visual Studio (markdown)
* Updated How to use OpenBLAS in Microsoft Visual Studio (markdown)
* Updated How to use OpenBLAS in Microsoft Visual Studio (markdown)
* Updated How to use OpenBLAS in Microsoft Visual Studio (markdown)
* Direct people to the appropriate instructions
* Add link to the Goto paper
* Add CMAKE_BUILD_TYPE
* Add note about having to specify AR on a Mac, from issue 1435
* Mention requirement to build a standalone toolchain in the clang section as well
* added 'perl' to conda install command
* homebrew/science was deprecated. This tap is now empty as all its formulae were migrated.
* Added hint for "expected identifier" error message to mingw section following
issue 1503
* Revert 9161c3b54281131e892dec739d888f35e6c59cf3...03f879be0c9e6a55705bc7efd5ee193299e04029 on How to use OpenBLAS in Microsoft Visual Studio
* Revert to recommending mingw-w64 from sf.net and add note about issue 1503
* Updated How to use OpenBLAS in Microsoft Visual Studio (markdown)
* Update MSVC installation procedure with info from issue 1521
* Add downgrade option for msys2 mingw compiler issue as suggested by econwang
in issue 1503
* Add note about static linking bug with NDK 16 and API>22
* Updated Precompiled installation packages (markdown)
* Updated Precompiled installation packages (markdown)
* Updated Faq (markdown)
* OBS is renamed and deep link format changed. Apparently recent SLE includes rpm by default too.
* Add links to Conda-Forge and to staticfloat's builds for Julia
* Mention _64 suffix appended to Julia builds with INTERFACE64 (issue 1617)
* Fix unwanted markdown italicization
* Add instruction to change to the generic sgemmkernel implementation from issue 1531
* Added hint about stack size requirements for running lapack-test from PR 1645; fixed markup of section headings
* Add link to RvdG's publications page as a non-paywalled source of the "Goto paper"
* Add section about non-suitability of the IBM XL compiler on POWER8
* Mention cmake version requirement in view or recent issues with link failures in utest etc.
* Replace outdated entry for Sandybridge support with more general section on AVX512, Ryzen and GPU
* Mention Apple Accelerate here as iOS build issue tickets usually die as soon as someone points out this option to the questioner.
* Add section about unexpectedly using an older pre-installed version of the shared library (issue 1822)
* fix markup of new entry
* Mention perl and C compiler as prerequisites on the build host
* Save WIP page
* Updated Notes on parallelism and OpenBLAS (markdown)
* Updated Notes on parallelism and OpenBLAS (markdown)
* Updated Notes on parallelism and OpenBLAS (markdown)
* Updated [WIP] Notes on parallelism and OpenBLAS (markdown)
* Updated [WIP] Notes on parallelism and OpenBLAS (markdown)
* Updated [WIP] Notes on parallelism and OpenBLAS (markdown)
* Destroyed [WIP] Notes on parallelism and OpenBLAS (markdown)
* Updated Faq (markdown)
* Add small note on AVX512 for CentOS/RHEL section.
* document the extension functions
* formatting
* Updated How to use OpenBLAS in Microsoft Visual Studio (markdown)
* Updated Download (markdown)
* Add brief general usage information from issue 1925
* Add link to Pete Warden blog article on GEMM rather than just deep-linking to a diagram from it
* Document some of the less useful parameters from param.h
* Updated Installation Guide (markdown)
* Done with issue 2089
* Add note about changed library names for update-alternatives on Debian/Ubuntu
* Updated Home (markdown)
* Add note about using OpenBLAS with CUDA_HPL 2.3 from issue 909
* Fix typos in previous commit
* Add pdb instructions for cross-builds
* Add note about generic QEMU CPUID clashing with existing P2(MMX)
* typo
* typo
* C code syntax highlight
* Updated multithreading section to introduce option USE_LOCKING (issue 2164)
* Updated How to build OpenBLAS for iPhone iOS (ARMv8) (markdown)
* Updated How to build OpenBLAS for iPhone iOS (ARMv8) (markdown)
* Clarify Miniconda/cmake install instructions and redact outdated note about msys2
* Document cmake install step
* Updated How to build OpenBLAS for Android (markdown)
* Add solution for programs that look for libblas.so/liblapack.so
* Add entry for powersaving modes on ARM boards (from issue 2540)
* Add suggestion for speed problems on big.little systems from issue 2589
* Convert the ARMV8 big.little tidbit to a separate topic and update it with more details from the issue ticket
* Add entry about problems caused by using the raw cblas.h (issue 2593)
* complete quote symbol around CPATH environment variable
* Updated How to use OpenBLAS in Microsoft Visual Studio (markdown)
* Updated How to use OpenBLAS in Microsoft Visual Studio (markdown)
* Add note about running conda activate when working in a VS window (from issue 2637)
* Add note about (not) compiling with -fbounds-check (ticket 2657)
* Add entry about compile-time NUM_THREADS setting (issue 2678)
* Added some sketchy description of adding cpuids for autodetection, adding targets and architectures
* Markup and typo fixes
* Add openblas_set_affinity from PR 2547
* Created _Footer (markdown)
* Destroyed _Footer (markdown)
* Add LAPACK-like SHGEMM to document the "official" status of the SH prefix
* fix formatting of latest addition
* Move outdated instructions for gcc-based NDK versions to the bottom, add hint about x86 builds
* Add help for cpuid recognition failure
* Update source tree layout & mention extraneous cpu parameters
* Updated How to use OpenBLAS in Microsoft Visual Studio (markdown)
* Explain why pure VS builds are slower, and highlight that they do not support DYNAMIC_ARCH
* Updated How to use OpenBLAS in Microsoft Visual Studio (markdown)
* Mention fortran requirement and incompatibility of ifort with msvc
* preliminary page for understanding the build system, needs a lot more work and input from more knowledgeable people than me
* Updated Build system overview (markdown)
* Updated WIP   Build system overview (community made) (markdown)
* add information for HOSTCC, HOST_CFLAGS
* Added alternative script which was tested on OSX with latest NDK
* added link to targets list
* Updated WIP   Build system overview (community made) (markdown)
* Updated WIP   Build system overview (community made) (markdown)
* Updated WIP   Build system overview (community made) (markdown)
* Updated WIP   Build system overview (community made) (markdown)
* Updated WIP   Build system overview (community made) (markdown)
* Updated WIP   Build system overview (community made) (markdown)
* added script for x86_64 architecture
* Updated WIP   Build system overview (community made) (markdown)
* Updated WIP   Build system overview (community made) (markdown)
* updated link to FLAME publications list
* Created How to use OpenBLAS on Cortex-M (markdown)
* Updated How to use OpenBLAS on Cortex M (markdown)
* Updated Precompiled installation packages (markdown)
* Updated How to use OpenBLAS on Cortex M (markdown)
* Updated How to use OpenBLAS on Cortex M (markdown)
* Updated How to use OpenBLAS on Cortex M (markdown)
* Update source layout graph and start a short section on benchmarking to collect various pointers from the issue tracker
* Add workaround for building with CMAKE on OSX
* Use actual small headings to fix... weird bullet indent shit
* Oops
* Updated Faq (markdown)
* Updated Faq (markdown)
* Updated How to generate import library for MingW (markdown)
* Updated How to generate import library for MinGW (markdown)
* Updated How to generate import library for MinGW (markdown)
* Updated How to generate import library for MinGW (markdown)
* Updated How to generate import library for MinGW (markdown)
* Updated How to generate import library for MinGW (markdown)
* Updated How to generate import library for MinGW (markdown)
* Updated How to generate import library for MinGW (markdown)
* Updated How to generate import library for MinGW (markdown)
* explicitly set CMAKE_MT to replace the new cmake default llvm-mt (failing)
* Add -Wl,-rpath,/your_path/OpenBLAS/lib option to gcc linker line in "Link shared library" section + explanation for why it is needed/can be omitted. Also make note that -lgfortran not needed if only making LAPACKE calls.
* Add note explaining that build flags passed to make should also be passed to make install
* give example of install error
* Describe how to build openblas library for win/arm64 targets
* Add Xen to the existing entry for QEMU/KVM based on issue 3445
* Updated Download (markdown)
* Updated Installation Guide (markdown)
* Updated Installation Guide (markdown)
* Revert b8da0e8523b898a2206d1e2fe99dbfb4ebb0ffa8...bc55aade759d2f925689b000828da249e1fc6a1a on Installation Guide
* Revert b0c9a2ee060b8dd0b46b4c58375ef2a743c0363a...cecf8cf67963bd77a0bb97086e3a457a4cee11ff on Download
* Revert bc55aade759d2f925689b000828da249e1fc6a1a...134894a0f09a0e92eef1b9a5c9e63f459d2db55e on Installation Guide
* Add NDK23B example
* Makes iOS build more robust
* Double -isysroot
* Bump up required devtoolset version for AVX-512 intrinsics.
* Updated Installation Guide (markdown)
* Updated How to build OpenBLAS for Windows on ARM64 (markdown)
* Revert b8da0e8523b898a2206d1e2fe99dbfb4ebb0ffa8...75bba70832f8765faee693931c4a9e3eb6c84d98 on Installation Guide
* Revert 75bba70832f8765faee693931c4a9e3eb6c84d98...d171e711a5cd8026b2eb507b249b5e51fa28b2a2 on Installation Guide
* restore Windows link after malicious edit
* Revert 1bcb03dcef85c675aace7f0a755d5aa36ec46eca...f732906434146b1a1ee82abe944a6d51d8f43b81 on Installation Guide
* restore Windows link after malicious edit
* Updated Installation Guide (markdown)
* Bump up AVX-512 devtoolset because of identified packaging issues
* Updated How to use OpenBLAS in Microsoft Visual Studio (markdown)
* n-dash html entity instead of -
* Updated How to use OpenBLAS in Microsoft Visual Studio (markdown)
* Updated How to use OpenBLAS in Microsoft Visual Studio (markdown)
* Add the bfloat16 functions
* mention AXPBY
* Update building for Apple M1
* Updated How to build OpenBLAS for Windows on ARM64 (markdown)
* Created How to build OpenBLAS for macOS M1 / arm64 (markdown)
* Updated How to use OpenBLAS in Microsoft Visual Studio (markdown)
* Updated How to use OpenBLAS in Microsoft Visual Studio (markdown)
* Updated How to use OpenBLAS in Microsoft Visual Studio (markdown)
* Updated How to use OpenBLAS in Microsoft Visual Studio (markdown)
* Updated How to use OpenBLAS in Microsoft Visual Studio (markdown)
* Add NO_AVX2 build hint for OSX Docker Desktop/xhyve (issues 2194 and 2244)
* Mention the ELF offset/address bug from binutils 2.38 ld
* moved issue 665 (sparse matrix/vector support) to a faq entry
* Update and simplify based on CI experience and 3741
* Updated Download (markdown)
* Updated How to build OpenBLAS for Windows on ARM64 (markdown)
* Revert 0dcee87d486028fbd88c603853cdcae810e025c6...bf3d15e74d42b0b01618b4beb7b9d658fb905118 on Download
* Revert a02f9e470f8e26eda1b8d8601ad2486557721ccf...c862aeb3492c29b487858d43c93676855b60a1f2 on How to build OpenBLAS for Windows on ARM64
* Updated How to use OpenBLAS in Microsoft Visual Studio (markdown)
* Revert 9db97d11d88c801e8c5e9b8d6cc85fb44e5bca61...d2eb48810f3ecc1680900581473005f79c394ca4 on How to use OpenBLAS in Microsoft Visual Studio
* start with the smallest configs, Appveyor and Cirrus
* Updated CI jobs overview (markdown)
* Add Azure CI
* Add github workflows
* Add the crossbuild parts of the dynamic_arch workflow
* remove trailing separator
* Add FreeBSD/Cirrus
* Add ILP64 jobs on Cirrus
* Add C910V and the OSUOSL Jenkins jobs (currently configured for my fork)
* Updated Installation Guide (markdown)
* Expand section on precompiled windows binaries to mention INTERFACE64=0 option
* Remove reference to buildbot (domain reregistered to someone else, issue 4148)
* Add OpenMP hints for mixed threads mode from issue 3186
* document NUM_PARALLEL (paraphrased from issue 1735) and expand other entries a bit
* Mention use of llvm-ar rather than gcc-ar in recent NDKs and remove perl requirement
* Add ?gemmt from -1.3.23/0.3.24
* note that LLVM is an optional install with VS2022
* clarify that all tools for the xbuild come with VS2022
* add instructions for cross-compiling from Windows/x86 (copied from issue #4459)

Co-authored-by: Martin Kroeker <martin@ruby.chemie.uni-freiburg.de>
Co-authored-by: xianyi <traits.zhang@gmail.com>
Co-authored-by: Zhang Xianyi <traits.zhang@gmail.com>
Co-authored-by: Andrew <bradatajs@yahoo.com>
Co-authored-by: Elethom <elethomhunter@gmail.com>
Co-authored-by: Isuru Fernando <isuruf@gmail.com>
Co-authored-by: A. Tammy <epsilon-0@users.noreply.github.com>
Co-authored-by: Andrew <16061801+brada4@users.noreply.github.com>
Co-authored-by: Paul MUSTIÈRE <paul.mustiere@gmail.com>
Co-authored-by: TiborGY <gyori.tibor@stud.u-szeged.hu>
Co-authored-by: xoviat <xoviat@users.noreply.github.com>
Co-authored-by: zchothia <zaheer.chothia@gmail.com>
Co-authored-by: Eric Larson <larson.eric.d@gmail.com>
Co-authored-by: xoviat <49173759+xoviat@users.noreply.github.com>
Co-authored-by: Kevin Yang <kevin@cobaltspeech.com>
Co-authored-by: Mavaddat Javid <javid@mavaddat.ca>
Co-authored-by: Derek Huang <37860662+phetdam@users.noreply.github.com>
Co-authored-by: Iblis Lin <iblis@hs.ntnu.edu.tw>
Co-authored-by: Niyas Sait <niyas.sait@linaro.org>
Co-authored-by: Roman Nazarevych <lemberg.rn@gmail.com>
Co-authored-by: rumiv <100173053+rumiv@users.noreply.github.com>
Co-authored-by: Felix Yan <felixonmars@archlinux.org>
Co-authored-by: Matti Picus <matti.picus@gmail.com>
Co-authored-by: Timothy Gu <timothygu99@gmail.com>
Co-authored-by: Yubin Wang <wangyubin19890515@163.com>
Co-authored-by: dtidmarsh <tidmarsh.david@gmail.com>
Co-authored-by: hninhninhtun <30315263+hninhninhtun@users.noreply.github.com>
Co-authored-by: masel0 <96305063+masel0@users.noreply.github.com>
Co-authored-by: meow464 <70211708+meow464@users.noreply.github.com>
Co-authored-by: Ankush Chauhan <ankush.26.11@gmail.com>
Co-authored-by: Ashwin Sekhar T K <ashwinyes@users.noreply.github.com>
Co-authored-by: Chunde <xmuhcd@msn.com>
Co-authored-by: Corey Richardson <corey@octayn.net>
Co-authored-by: CristianAndrade94 <117796497+CristianAndrade94@users.noreply.github.com>
Co-authored-by: Dave Liu <dliu@rivierapartners.com>
Co-authored-by: David Hagen <david@appliedbiomath.com>
Co-authored-by: Gökçen Eraslan <gokcen.eraslan@gmail.com>
Co-authored-by: Hong <hong@topbug.net>
Co-authored-by: Iarsv <96173089+Iarsv@users.noreply.github.com>
Co-authored-by: Isuru Fernando <isuru.11@cse.mrt.ac.lk>
Co-authored-by: Jellby <jellby@yahoo.com>
Co-authored-by: Joachim Wagner <jwagner@computing.dcu.ie>
Co-authored-by: Joseph Shen <joseph.smeng@gmail.com>
Co-authored-by: Kevin Ji <kevin.ji@outlook.com>
Co-authored-by: Liming Wang <lmwang@gmail.com>
Co-authored-by: Marco Pompili <marcs.pompili@gmail.com>
Co-authored-by: Marcus Ottosson <konstruktion@gmail.com>
Co-authored-by: Musen <yuan.gan@fandm.edu>
Co-authored-by: Neil Shipp <neilsh@microsoft.com>
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Randall Bohn <rsbohn@familysearch.org>
Co-authored-by: Tiger <51085070+1149859096@users.noreply.github.com>
Co-authored-by: Tillsten <mail.till@gmx.de>
Co-authored-by: Tommy Carozzani <tommy.carozzani@nanolive.ch>
Co-authored-by: Tyler <TT--@users.noreply.github.com>
Co-authored-by: Xingyu Na <asr.naxingyu@gmail.com>
Co-authored-by: Zhuo Zhang <zchrissirhcz@gmail.com>
Co-authored-by: brada4 <bradatajs@yahoo.com>
Co-authored-by: ccy022364 <ccy022364@163.com>
Co-authored-by: davmaz <davmaz@users.noreply.github.com>
Co-authored-by: dkapelyan <30596321+dkapelyan@users.noreply.github.com>
Co-authored-by: eolianoe <eolianoe@users.noreply.github.com>
Co-authored-by: fommil <sam.halliday@Gmail.com>
Co-authored-by: magras <magras@users.noreply.github.com>
Co-authored-by: neitann <96481461+neitann@users.noreply.github.com>
Co-authored-by: raffamaiden <raffamaiden@gmail.com>
Co-authored-by: sogf <93816959+sogf@users.noreply.github.com>
Co-authored-by: wernsaar <wernsaar@googlemail.com>
This commit is contained in:
Matthew Barber 2023-08-04 08:21:01 +01:00
parent dc0338af47
commit 0bb973976b
25 changed files with 1827 additions and 0 deletions

3
.md Normal file

@@ -0,0 +1,3 @@
# Installation Guide
test

54
CI-jobs-overview.md Normal file

@@ -0,0 +1,54 @@
| Arch|Target CPU|OS|Build system|Cross-compiled to|C Compiler|Fortran Compiler|Threading|DYNAMIC_ARCH|INTERFACE64|Libraries| CI Provider| CPU count|
| ------------|---|---|-----------|-------------|----------|----------------|------|------------|----------|-----------|----------|-------|
| x86_64 |Intel 32bit|Windows|CMAKE/VS2015| -|mingw6.3| - | pthreads | - | - | static | Appveyor| |
| x86_64 |Intel |Windows|CMAKE/VS2015| -|mingw5.3| - | pthreads | - | - | static | Appveyor| |
| x86_64 |Intel |Centos5|gmake | -|gcc 4.8 |gfortran| pthreads | + | - | both | Azure | |
| x86_64 |SDE (SkylakeX)|Ubuntu| CMAKE| - | gcc | gfortran | pthreads | - | - | both | Azure | |
| x86_64 |Haswell/ SkylakeX|Windows|CMAKE/VS2017| - | VS2017| - | | - | - | static | Azure | |
| x86_64 | " | Windows|mingw32-make| - |gcc | gfortran | | list | - | both | Azure | |
| x86_64 | " |Windows|CMAKE/Ninja| - |LLVM | - | | - | - | static | Azure | |
| x86_64 | " |Windows|CMAKE/Ninja| - |LLVM | flang | | - | - | static | Azure | |
| x86_64 | " |Windows|CMAKE/Ninja| - |VS2022| flang* | | - | - | static | Azure | |
| x86_64 | " |macOS11|gmake | - | gcc-10|gfortran| OpenMP | + | - | both | Azure | |
| x86_64 | " |macOS11|gmake | - | gcc-10|gfortran| none | - | - | both | Azure | |
| x86_64 | " |macOS12|gmake | - | gcc-12|gfortran|pthreads| - | - | both | Azure | |
| x86_64 | " |macOS11|gmake | - | llvm | - | OpenMP | + | - | both | Azure | |
| x86_64 | " |macOS11|CMAKE | - | llvm | - | OpenMP | no_avx512 | - | static | Azure | |
| x86_64 | " |macOS11|CMAKE | - | gcc-10| gfortran| pthreads | list | - | shared | Azure | |
| x86_64 | " |macOS11|gmake | - | llvm | ifort | pthreads | - | - | both | Azure | |
| x86_64 | " |macOS11|gmake |arm| AndroidNDK-llvm | - | | - | - | both | Azure | |
| x86_64 | " |macOS11|gmake |arm64| XCode 12.4 | - | | + | - | both | Azure | |
| x86_64 | " |macOS11|gmake |arm | XCode 12.4 | - | | + | - | both | Azure | |
| x86_64 | " |Alpine Linux(musl)|gmake| - | gcc | gfortran | pthreads | + | - | both | Azure | |
| arm64 |Apple M1 |OSX |CMAKE/XCode| - | LLVM | - | OpenMP | - | - | static | Cirrus | |
| arm64 |Apple M1 |OSX |CMAKE/XCode| - | LLVM | - | OpenMP | - | + | static | Cirrus | |
| arm64 |Apple M1 |OSX |CMAKE/XCode|x86_64| LLVM| - | - | + | - | static | Cirrus | |
| arm64 |Neoverse N1|Linux |gmake | - |gcc10.2| -| pthreads| - | - | both | Cirrus | |
| arm64 |Neoverse N1|Linux |gmake | - |gcc10.2| -| pthreads| - | + | both | Cirrus | |
| arm64 |Neoverse N1|Linux |gmake |- |gcc10.2| -| OpenMP | - | - | both |Cirrus | 8 |
| x86_64 | Ryzen| FreeBSD |gmake | - | gcc12.2|gfortran| pthreads| - | - | both | Cirrus | |
| x86_64 | Ryzen| FreeBSD |gmake | - | gcc12.2|gfortran| pthreads| - | + | both | Cirrus | |
| x86_64 |GENERIC |QEMU |gmake| mips64 | gcc | gfortran | pthreads | - | - | static | Github | |
| x86_64 |SICORTEX |QEMU |gmake| mips64 | gcc | gfortran | pthreads | - | - | static | Github | |
| x86_64 |I6400 |QEMU |gmake| mips64 | gcc | gfortran | pthreads | - | - | static | Github | |
| x86_64 |P6600 |QEMU |gmake| mips64 | gcc | gfortran | pthreads | - | - | static | Github | |
| x86_64 |I6500 |QEMU |gmake| mips64 | gcc | gfortran | pthreads | - | - | static | Github | |
| x86_64 |Intel |Ubuntu |CMAKE| - | gcc-11.3 | gfortran | pthreads | + | - | static | Github | |
| x86_64 |Intel |Ubuntu |gmake| - | gcc-11.3 | gfortran | pthreads | + | - | both | Github | |
| x86_64 |Intel |Ubuntu |CMAKE| - | gcc-11.3 | flang-classic | pthreads | + | - | static | Github | |
| x86_64 |Intel |Ubuntu |gmake| - | gcc-11.3 | flang-classic | pthreads | + | - | both | Github | |
| x86_64 |Intel |macOS12 | CMAKE| - | AppleClang 14 | gfortran | pthreads | + | - | static | Github | |
| x86_64 |Intel |macOS12 | gmake| - | AppleClang 14 | gfortran | pthreads | + | - | both | Github | |
| x86_64 |Intel |Windows2022 | CMAKE/Ninja| - | mingw gcc 13 | gfortran | | + | - | static | Github | |
| x86_64 |Intel |Windows2022 | CMAKE/Ninja| - | mingw gcc 13 | gfortran | | + | + | static | Github | |
| x86_64 |Intel 32bit|Windows2022 | CMAKE/Ninja| - | mingw gcc 13 | gfortran | | + | - | static | Github | |
| x86_64 |Intel |Windows2022 | CMAKE/Ninja| - | LLVM 16 | - | | + | - | static | Github | |
| x86_64 |Intel | Windows2022 |CMAKE/Ninja| - | LLVM 16 | - | | + | + | static | Github | |
| x86_64 |Intel | Windows2022 |CMAKE/Ninja| - | gcc 13| - | | + | - | static | Github | |
| x86_64 |Intel| Ubuntu |gmake |mips64|gcc|gfortran|pthreads|+|-|both|Github| |
| x86_64 |generic|Ubuntu |gmake |riscv64|gcc|gfortran|pthreads|-|-|both|Github| |
| x86_64 |Intel|Ubuntu |gmake |mips32|gcc|gfortran|pthreads|-|-|both|Github | |
| x86_64 |Intel|Ubuntu |gmake |ia64|gcc|gfortran|pthreads|-|-|both|Github| |
| x86_64 |C910V|QEMU |gmake |riscv64|gcc|gfortran|pthreads|-|-|both|Github| |
|power |pwr9| Ubuntu |gmake | - |gcc|gfortran|OpenMP|-|-|both|OSUOSL| |
|zarch |z14 | Ubuntu |gmake | - |gcc|gfortran|OpenMP|-|-|both|OSUOSL| |

141
Developer-manual.md Normal file

@@ -0,0 +1,141 @@
## Source code layout
```
OpenBLAS/
├── benchmark Benchmark codes for BLAS
├── cmake CMakefiles
├── ctest Test codes for CBLAS interfaces
├── driver Implemented in C
│   ├── level2
│   ├── level3
│   ├── mapper
│   └── others Memory management, threading, etc
├── exports Generate shared library
├── interface Implement BLAS and CBLAS interfaces (calling driver or kernel)
│   ├── lapack
│   └── netlib
├── kernel Optimized assembly kernels for CPU architectures
│   ├── alpha Original GotoBLAS kernels for DEC Alpha
│   ├── arm ARMV5,V6,V7 kernels (including generic C codes used by other architectures)
│   ├── arm64 ARMV8
│   ├── generic General kernel codes written in plain C, parts used by many architectures.
│   ├── ia64 Original GotoBLAS kernels for Intel Itanium
│   ├── mips
│   ├── mips64
│   ├── power
│   ├── riscv64
│   ├── simd Common code for Universal Intrinsics, used by some x86_64 and arm64 kernels
│   ├── sparc
│   ├── x86
│   ├── x86_64
│   └── zarch
├── lapack Optimized LAPACK codes (replacing those in regular LAPACK)
│   ├── getf2
│   ├── getrf
│   ├── getrs
│   ├── laswp
│   ├── lauu2
│   ├── lauum
│   ├── potf2
│   ├── potrf
│   ├── trti2
│   ├── trtri
│   └── trtrs
├── lapack-netlib LAPACK codes from netlib reference implementation
├── reference BLAS Fortran reference implementation (unused)
├── relapack Elmar Peise's recursive LAPACK (implemented on top of regular LAPACK)
├── test Test codes for BLAS
└── utest Regression test
```
The call tree for `dgemm` is as follows:
```
interface/gemm.c
driver/level3/level3.c
gemm assembly kernels at kernel/
```
To find the kernel currently used for a particular supported cpu, please check the corresponding `kernel/$(ARCH)/KERNEL.$(CPU)` file.
Here is an example from `kernel/x86_64/KERNEL.HASWELL`:
```
...
DTRMMKERNEL = dtrmm_kernel_4x8_haswell.c
DGEMMKERNEL = dgemm_kernel_4x8_haswell.S
...
```
According to the `KERNEL.HASWELL` excerpt above, the OpenBLAS dgemm kernel for Haswell is `dgemm_kernel_4x8_haswell.S`.
## Optimizing GEMM for a given hardware
Read the Goto paper to understand the algorithm.
Goto, Kazushige; van de Geijn, Robert A. (2008). ["Anatomy of High-Performance Matrix Multiplication"](http://delivery.acm.org/10.1145/1360000/1356053/a12-goto.pdf?ip=155.68.162.54&id=1356053&acc=ACTIVE%20SERVICE&key=A79D83B43E50B5B8%2EF070BBE7E45C3F17%2E4D4702B0C3E38B35%2E4D4702B0C3E38B35&__acm__=1517932837_edfe766f1e295d9a7830812371e1d173). ACM Transactions on Mathematical Software 34 (3): Article 12
(The above link is available only to ACM members, but this and many related papers are also available on the pages
of van de Geijn's FLAME project, http://www.cs.utexas.edu/~flame/web/FLAMEPublications.html )
`driver/level3/level3.c` implements Goto's algorithm. As a simple starting point, you can also look at `kernel/generic/gemmkernel_2x2.c`, a naive gemm kernel in C that uses `2x2` register blocking.
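To illustrate what `2x2` register blocking means, here is a minimal sketch in plain C. It is not the code from `kernel/generic/gemmkernel_2x2.c`: the function name `gemm_2x2_sketch`, the row-major layout and the restriction to even `M` and `N` are assumptions made for brevity, and the real kernels operate on packed buffers.

```c
#include <stddef.h>

/* Illustrative 2x2 register-blocked kernel (hypothetical, not OpenBLAS code):
 * C[M x N] += A[M x K] * B[K x N], all matrices row-major and densely stored.
 * For brevity, M and N are assumed to be even. */
void gemm_2x2_sketch(size_t M, size_t N, size_t K,
                     const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < M; i += 2) {
        for (size_t j = 0; j < N; j += 2) {
            /* Four accumulators that can stay in registers for the whole k loop. */
            double c00 = 0.0, c01 = 0.0, c10 = 0.0, c11 = 0.0;
            for (size_t k = 0; k < K; k++) {
                double a0 = A[i * K + k];        /* row i   of A */
                double a1 = A[(i + 1) * K + k];  /* row i+1 of A */
                double b0 = B[k * N + j];        /* column j   of B */
                double b1 = B[k * N + j + 1];    /* column j+1 of B */
                c00 += a0 * b0;
                c01 += a0 * b1;
                c10 += a1 * b0;
                c11 += a1 * b1;
            }
            /* Write the 2x2 tile back to C once, after the k loop. */
            C[i * N + j]           += c00;
            C[i * N + j + 1]       += c01;
            C[(i + 1) * N + j]     += c10;
            C[(i + 1) * N + j + 1] += c11;
        }
    }
}
```

Each pass through the `k` loop loads two values of A and two of B and performs four multiply-adds, so every loaded value is reused twice while the accumulators never leave registers.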
Then,
* Write optimized assembly kernels, taking into account the instruction pipeline, the available registers, and memory/cache access patterns.
* Tune the cache block sizes `Mc`, `Kc`, and `Nc`.
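The role of the block sizes can be sketched as a loop nest, shown here as a simplified illustration. The function name `gemm_blocked_sketch` and the row-major layout are assumptions, and the packing of blocks into contiguous buffers that the real driver performs is omitted.

```c
#include <stddef.h>

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* Hypothetical cache-blocked loop nest (not OpenBLAS code):
 * C[M x N] += A[M x K] * B[K x N], all matrices row-major.
 * Mc, Kc and Nc are the block sizes along M, K and N. */
void gemm_blocked_sketch(size_t M, size_t N, size_t K,
                         const double *A, const double *B, double *C,
                         size_t Mc, size_t Kc, size_t Nc)
{
    if (Mc == 0 || Kc == 0 || Nc == 0)
        return; /* a zero block size would never advance the loops */

    for (size_t jc = 0; jc < N; jc += Nc) {
        size_t nb = min_sz(Nc, N - jc);          /* columns in this block */
        for (size_t pc = 0; pc < K; pc += Kc) {
            size_t kb = min_sz(Kc, K - pc);      /* depth of this block */
            for (size_t ic = 0; ic < M; ic += Mc) {
                size_t mb = min_sz(Mc, M - ic);  /* rows in this block */
                /* Stand-in for the optimized kernel: update an mb x nb
                 * block of C from an mb x kb block of A and a kb x nb
                 * block of B. */
                for (size_t i = 0; i < mb; i++) {
                    for (size_t p = 0; p < kb; p++) {
                        double a = A[(ic + i) * K + pc + p];
                        for (size_t j = 0; j < nb; j++)
                            C[(ic + i) * N + jc + j] +=
                                a * B[(pc + p) * N + jc + j];
                    }
                }
            }
        }
    }
}
```

In an actual implementation the innermost three loops would be replaced by an optimized kernel, and `Mc`, `Kc` and `Nc` would be tuned so that the blocks fit the cache hierarchy.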
Note that not all of the cpu-specific parameters in param.h are actively used in algorithms. DNUMOPT only appears as a scale factor in profiling output of the level3 syrk interface code, while its counterpart SNUMOPT (aliased as NUMOPT in common.h) is not used anywhere at all.
SYMV_P is only used in the generic kernels for the symv and chemv/zhemv functions - at least some of those are usually overridden by cpu-specific implementations, so if you start by cloning the existing implementation for a related cpu you need to check its KERNEL file to see if tuning SYMV_P would have any effect at all.
GEMV_UNROLL is only used by some older x86_64 kernels, so not all sections in param.h define it.
Similarly, not all of the cpu parameters like L2 or L3 cache sizes are necessarily used in current kernels for a given model - by all indications the cpu identification code was imported from some other project originally.
## Run OpenBLAS Test
We use the netlib BLAS tests, the CBLAS tests, and the LAPACK tests. In addition, we use [BLAS-Tester](https://github.com/xianyi/BLAS-Tester), a modified test tool from ATLAS.
* Run `test` and `ctest` in the OpenBLAS source tree, e.g. `make test` or `make ctest`.
* Run the regression test `utest` in the OpenBLAS source tree.
* Run the LAPACK tests, e.g. `make lapack-test`.
* Clone [BLAS-Tester](https://github.com/xianyi/BLAS-Tester), which can compare the OpenBLAS results with the netlib reference BLAS.
The project makes use of several Continuous Integration (CI) services conveniently interfaced with GitHub to automatically check compilability on a number of platforms.
Lastly, the test suites included with "numerically heavy" projects like Julia, NumPy, Octave or QuantumEspresso can be used for regression testing.
## Benchmarking
Several simple C benchmarks for performance testing individual BLAS functions are available in the `benchmark` folder, and its `scripts` subdirectory contains corresponding versions for Python, Octave and R.
Other options include
* https://github.com/RoyiAvital/MatlabJuliaMatrixOperationsBenchmark (various matrix operations in Julia and Matlab)
* https://github.com/mmperf/mmperf/ (single-core matrix multiplication)
## Adding autodetection support for a new revision or variant of a supported cpu
Especially relevant for x86_64, a new cpu model may be a "refresh" (die shrink and/or different number of cores) within an existing
model family without significant changes to its instruction set. (e.g. Intel Skylake, Kaby Lake etc. still are fundamentally Haswell,
low end Goldmont etc. are Nehalem). In this case, compilation with the appropriate older TARGET will already lead to a satisfactory build.
To achieve autodetection of the new model, its CPUID (or an equivalent identifier) needs to be added in the `cpuid_<architecture>.c`
relevant for its general architecture, with the returned name for the new type set appropriately. For x86 which has the most complex
cpuid file, there are two functions that need to be edited - get_cpuname() to return e.g. CPUTYPE_HASWELL and get_corename() for the (broader)
core family returning e.g. CORE_HASWELL. (This information ends up in the Makefile.conf and config.h files generated by `getarch`. Failure to
set either will typically lead to a missing definition of the GEMM_UNROLL parameters later in the build, as `getarch_2nd` will be unable to
find a matching parameter section in param.h.)
For architectures where "DYNAMIC_ARCH" builds are supported, a similar but simpler code section for the corresponding runtime detection of the cpu exists in `driver/others/dynamic.c` (for x86) and `driver/others/dynamic_<arch>.c` for other architectures.
Note that for x86 the CPUID is compared after splitting it into its family, extended family, model and extended model parts, so the single decimal
number returned by Linux in /proc/cpuinfo for the model has to be converted back to hexadecimal before splitting into its constituent
digits, e.g. 142 = 0x8E translates to extended model 8, model 14 (0xE).
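The conversion can be sketched in C as follows. The struct and function names are hypothetical and do not appear in the OpenBLAS sources; the sketch only covers the model field, not the family.

```c
/* Hypothetical helper: split the decimal "model" value shown in
 * /proc/cpuinfo into the two hexadecimal digits used by CPUID. */
struct x86_model_parts {
    unsigned int extended_model; /* high hex digit */
    unsigned int model;          /* low hex digit */
};

struct x86_model_parts split_cpuinfo_model(unsigned int cpuinfo_model)
{
    struct x86_model_parts parts;
    parts.extended_model = (cpuinfo_model >> 4) & 0xF;
    parts.model = cpuinfo_model & 0xF;
    return parts;
}
```

For the value 142 (0x8E) from the example above, this yields extended model 8 and model 14.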
## Adding dedicated support for a new cpu model
Usually it will be possible to start from an existing model, clone its KERNEL configuration file to the new name to use for this TARGET and eventually replace individual kernels with versions better suited for peculiarities of the new cpu model. In addition, it is necessary to add
(or clone at first) the corresponding section of GEMM_UNROLL parameters in the toplevel param.h, and possibly to add definitions such as USE_TRMM
(governing whether TRMM functions use the respective GEMM kernel or a separate source file) to the Makefiles (and CMakeLists.txt) in the kernel
directory. The new cpu name needs to be added to TargetList.txt, and the cpu autodetection code used by the `getarch` helper program (contained in
the `cpuid_<architecture>.c` file) needs to be amended to include the CPUID (or equivalent) information processing required (see preceding section).
## Adding support for an entirely new architecture
This endeavour is best started by cloning the entire support structure for 32bit ARM, and within that the ARMV5 cpu in particular as this is implemented through plain C kernels only. An example providing a convenient "shopping list" can be seen in pull request #1526.

27
Document.md Normal file

@ -0,0 +1,27 @@
<hr noshade="noshade">
<center>
<a href="Home"> [Home]</a>
<a href="Document"> [Document]</a>
<a href="faq"> [FAQ]</a>
<a href="publications"> [Publications]</a>
<a href="download"> [Download]</a>
<a href="Mailing-List">[Mailing List]</a>
<a href="Donation">[Donation]</a>
</center>
<hr noshade="noshade">
[Installation Guide](Installation-Guide)
[User Manual](User-Manual)
[Developer Manual](Developer-manual)
[Use OpenBLAS in MS Visual Studio](How-to-use-OpenBLAS-in-Microsoft-Visual-Studio)
[Generate import library for MingW](How-to-generate-import-library-for-MingW)
[OpenBLAS Extensions](OpenBLAS-Extensions)
[Related packages that use OpenBLAS](Related-packages-that-use-OpenBLAS)
[Machine List](Machine-List)

31
Donation.md Normal file

@ -0,0 +1,31 @@
<hr noshade="noshade">
<center>
<a href="Home"> [Home]</a>
<a href="Document"> [Document]</a>
<a href="faq"> [FAQ]</a>
<a href="publications"> [Publications]</a>
<a href="download"> [Download]</a>
<a href="Mailing-List">[Mailing List]</a>
<a href="Donation">[Donation]</a>
</center>
<hr noshade="noshade">
Thank you for the support.
You can read the OpenBLAS statement of receipts, disbursements and cash balance in this [Google doc](https://docs.google.com/spreadsheet/ccc?key=0AghkTjXe2lDndE1UZml0dGpaUzJmZGhvenBZd1F2R1E&usp=sharing).
## Fundraiser
* [2013.8] [Testbed for OpenBLAS project](https://www.bountysource.com/fundraisers/443-testbed-for-openblas-project)
* completed.
Here is [Backer list](https://github.com/xianyi/OpenBLAS/blob/develop/BACKERS.md).
## For Personal Donation
You can create a bounty for an issue on [bountysource.com](https://www.bountysource.com/trackers/69691-openblas).
## For Hardware Vendors
We welcome hardware donations, including the latest CPUs and boards.

23
Download.md Normal file

@ -0,0 +1,23 @@
<hr noshade="noshade">
<center>
<a href="Home"> [Home]</a>
<a href="Document"> [Document]</a>
<a href="faq"> [FAQ]</a>
<a href="publications"> [Publications]</a>
<a href="download"> [Download]</a>
<a href="Mailing-List">[Mailing List]</a>
<a href="Donation">[Donation]</a>
</center>
<hr noshade="noshade">
## Binary Packages
We provide binary packages for the following platforms:
* Windows x86/x86_64
* ARM
You can download them from [file hosting on sourceforge.net](http://sourceforge.net/projects/openblas/files/).
## Source
Download the latest [stable version](https://github.com/xianyi/OpenBLAS/releases) from the release page.

432
Faq.md Normal file

@ -0,0 +1,432 @@
<hr noshade="noshade">
<center>
<a href="Home"> [Home]</a>
<a href="Document"> [Document]</a>
<a href="faq"> [FAQ]</a>
<a href="publications"> [Publications]</a>
<a href="download"> [Download]</a>
<a href="Mailing-List">[Mailing List]</a>
<a href="Donation">[Donation]</a>
</center>
<hr noshade="noshade">
### General questions
+ **[What is BLAS? Why is it important?](#whatblas)**
+ **[What functions are there and how can I call them from my C code?](#whatsinblas)**
+ **[What is OpenBLAS? Why did you create this project?](#what)**
+ **[What's the difference between OpenBLAS and GotoBLAS?](#gotoblas)**
+ **[Where do parameters GEMM_P, GEMM_Q, GEMM_R come from?](#gemmpqr)**
+ **[How can I report a bug?](#reportbug)**
+ **[How to reference OpenBLAS.](#publication)**
+ **[How can I use OpenBLAS in multi-threaded applications?](#multi-threaded)**
+ **[Does OpenBLAS support sparse matrices and/or vectors ?](#sparse)**
+ **[What support is there for recent PC hardware ? What about GPU ?](#recent_hardware)**
+ **[How about the level 3 BLAS performance on Intel Sandy Bridge?](#sandybridge_perf)**
### OS and Compiler
+ **[How can I call an OpenBLAS function in Microsoft Visual Studio?](#MSVC)**
+ **[How can I use CBLAS and LAPACKE without C99 complex number support (e.g. in Visual Studio)?](#C99_complex_number)**
+ **[I get a SEGFAULT with multi-threading on Linux. What's wrong?](#Linux_SEGFAULT)**
+ **[When I make the library, there is no such instruction: `xgetbv' error. What's wrong?](#xgetbv)**
+ **[My build fails due to the linker error "multiple definition of `dlamc3_'". What is the problem?](#patch_missing)**
+ **[My build worked fine and passed all tests, but running `make lapack-test` ends with segfaults](#xeigtst)**
+ **[How could I disable OpenBLAS threading affinity on runtime?](#no_affinity)**
+ **[How to solve undefined reference errors when statically linking against libopenblas.a](#static_link)**
+ **[Building OpenBLAS for Haswell or Dynamic Arch on RHEL-6, CentOS-6, Rocks-6.1](#binutils)**
+ **[Building OpenBLAS in QEMU/KVM/XEN](#qemu)**
+ **[Building OpenBLAS for MIPS](#mips)**
+ **[Building OpenBLAS on POWER fails with the IBM XL compiler](#ppcxl)**
+ **[Replacing system BLAS in Ubuntu/Debian](#debianlts)**
+ **[I built OpenBLAS for use with some other software, but that software cannot find it](#findblas)**
+ **[I included cblas.h in my program, but the compiler complains about a missing common.h or functions from it](#installincludes)**
+ **[Compiling OpenBLAS with gcc's -fbounds-check actually triggers aborts in programs](#boundscheck)**
+ **[Build fails with lots of errors about undefined ?GEMM_UNROLL_M](#newcpu)**
+ **[CMAKE/OSX: Build fails with 'argument list too long'](#cmakeosx)**
+ **[Likely problems with AVX2 support in Docker Desktop for OSX](#xhyve)**
### Usage
+ **[Program is Terminated. Because you tried to allocate too many memory regions](#allocmorebuffers)**
+ **[How to choose TARGET manually at runtime when compiled with DYNAMIC_ARCH](#choose_target_dynamic)**
+ **[After updating the installed OpenBLAS, a program complains about "undefined symbol gotoblas"](#missgoto)**
+ **[How can I find out at runtime what options the library was built with ?](#buildoptions)**
+ **[After making OpenBLAS, I find that the static library is multithreaded, but the dynamic one is not ?](#wronglibrary)**
+ **[I want to use OpenBLAS with CUDA in the HPL 2.3 benchmark code but it keeps looking for Intel MKL](#cudahpl)**
+ **[Multithreaded OpenBLAS runs no faster or is even slower than singlethreaded on my ARMV7 board](#cpusoffline)**
+ **[Speed varies wildly between individual runs on a typical ARMV8 smartphone processor](#biglittle)**
+ **[I cannot get OpenBLAS to use more than a small subset of available cores on a big system](#numthreads)**
+ **[Getting "ELF load command address/offset not properly aligned" when loading libopenblas.so](#ELFoffset)**
+ **[Using OpenBLAS with OpenMP](#OpenMP)**
<hr noshade="noshade">
### General questions
#### <a name="whatblas"></a>What is BLAS? Why is it important?
[BLAS](https://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms) stands for Basic Linear Algebra Subprograms. BLAS provides standard interfaces for [linear algebra](https://en.wikipedia.org/wiki/Linear_algebra), including BLAS1 (vector-vector operations), BLAS2 (matrix-vector operations), and BLAS3 (matrix-matrix operations). In general, BLAS is the computational kernel ("the bottom of the food chain") in linear algebra or scientific applications. Thus, if the BLAS implementation is highly optimized, the whole application can benefit substantially.
#### <a name="whatsinblas"></a>What functions are there and how can I call them from my C code?
As BLAS is a standardized interface, you can refer to the documentation of its reference implementation at [netlib.org](http://netlib.org/blas/index.html#_blas_routines). Calls from C go through its CBLAS interface,
so your code will need to include the provided cblas.h in addition to linking with -lopenblas.
A single-precision matrix multiplication will look like
```
#include <cblas.h>
...
cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, M, N, K, 1.0, A, K, B, N, 0.0, result, N);
```
where M,N,K are the dimensions of your data - see https://petewarden.files.wordpress.com/2015/04/gemm_corrected.png
(This image is part of an article on GEMM in the context of deep learning that is well worth reading in full -
https://petewarden.com/2015/04/20/why-gemm-is-at-the-heart-of-deep-learning/)
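To make the roles of `M`, `N`, `K` and the leading dimensions concrete, the following unoptimized reference routine computes what the call above computes for row-major, non-transposed operands. The name `ref_sgemm_rowmajor` is made up for this illustration and is not part of OpenBLAS or CBLAS.

```c
#include <stddef.h>

/* Hypothetical reference routine (not an OpenBLAS function):
 * C = alpha * A * B + beta * C for row-major, non-transposed operands.
 * A is M x K with row stride lda, B is K x N with row stride ldb,
 * C is M x N with row stride ldc. */
void ref_sgemm_rowmajor(size_t M, size_t N, size_t K, float alpha,
                        const float *A, size_t lda,
                        const float *B, size_t ldb,
                        float beta, float *C, size_t ldc)
{
    for (size_t i = 0; i < M; i++) {
        for (size_t j = 0; j < N; j++) {
            float acc = 0.0f;
            for (size_t k = 0; k < K; k++)
                acc += A[i * lda + k] * B[k * ldb + j];
            /* With beta == 0 the old contents of C are ignored entirely. */
            if (beta == 0.0f)
                C[i * ldc + j] = alpha * acc;
            else
                C[i * ldc + j] = alpha * acc + beta * C[i * ldc + j];
        }
    }
}
```

With `lda = K`, `ldb = N` and `ldc = N` this corresponds to the `cblas_sgemm` call shown above: A is `M x K`, B is `K x N` and the result is `M x N`.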
#### <a name="what"></a>What is OpenBLAS? Why did you create this project?
OpenBLAS is an open source BLAS library forked from the GotoBLAS2-1.13 BSD version. Since Mr. Kazushige Goto left TACC, GotoBLAS is no longer being maintained. Thus, we created this project to continue developing OpenBLAS/GotoBLAS.
#### <a name="gotoblas"></a>What's the difference between OpenBLAS and GotoBLAS?
In OpenBLAS 0.2.0, we optimized level 3 BLAS for Intel Sandy Bridge on 64-bit operating systems. We obtained a performance comparable with that of Intel MKL.
We optimized level 3 BLAS performance on the [ICT Loongson-3A](http://en.wikipedia.org/wiki/Loongson) CPU. It outperformed GotoBLAS by 135% in a single thread and 120% in 4 threads.
We fixed some GotoBLAS bugs including a SEGFAULT bug on the new Linux kernel, MingW32/64 bugs, and a ztrmm computing error bug on Intel Nehalem.
We also added some minor features, e.g. supporting "make install", compiling without LAPACK and upgrading the LAPACK version to 3.4.2.
You can find the full list of modifications in Changelog.txt.
#### <a name="gemmpqr"></a>Where do parameters GEMM_P, GEMM_Q, GEMM_R come from?
The detailed explanation is probably in the original publication authored by Kazushige Goto - Goto, Kazushige; van de Geijn, Robert A; Anatomy of high-performance matrix multiplication. ACM Transactions on Mathematical Software (TOMS). Volume 34 Issue 3, May 2008
While this article is paywalled and too old for preprints to be available on arxiv.org, more recent
publications like https://arxiv.org/pdf/1609.00076 contain at least a brief description of the algorithm.
In practice, the values are derived by experimentation to yield the block sizes that give the highest performance. A general rule of thumb for selecting a starting point seems to be that PxQ is about half the size of L2 cache.
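As a rough illustration of this rule of thumb, the sketch below derives a starting value for GEMM_P from a given GEMM_Q. It assumes that "PxQ" counts matrix elements, so that the block occupies `P * Q * element size` bytes; the function name and this interpretation are assumptions, and the result is only a starting point for experiments.

```c
#include <stddef.h>

/* Hypothetical helper (not part of OpenBLAS): pick a starting GEMM_P so
 * that a P x Q block of elements occupies about half of the L2 cache. */
size_t gemm_p_starting_guess(size_t l2_bytes, size_t gemm_q, size_t elem_bytes)
{
    if (gemm_q == 0 || elem_bytes == 0)
        return 0; /* no meaningful guess without a block depth and element size */
    return (l2_bytes / 2) / (gemm_q * elem_bytes);
}
```

For example, with a 256 KiB L2 cache, double precision (8 bytes per element) and Q = 256, this gives P = 64.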
#### <a name="reportbug"></a>How can I report a bug?
Please file an issue at this [issue page](https://github.com/xianyi/OpenBLAS/issues) or send mail to the [OpenBLAS mailing list](https://groups.google.com/forum/#!forum/openblas-users).
Please provide the following information: CPU, OS, compiler, and OpenBLAS compiling flags (Makefile.rule). In addition, please describe how to reproduce this bug.
#### <a name="publication"></a>How to reference OpenBLAS.
You can reference our papers in [this page](Publications). Alternatively, you can cite the OpenBLAS homepage http://www.openblas.net.
#### <a name="multi-threaded"></a>How can I use OpenBLAS in multi-threaded applications?
If your application is already multi-threaded, it will conflict with OpenBLAS multi-threading. Thus, you must set OpenBLAS to use a single thread in one of the following ways:
* Set `export OPENBLAS_NUM_THREADS=1` in the environment.
Or
* Call `openblas_set_num_threads(1)` in the application at runtime.
Or
* Build a single-threaded version of OpenBLAS, e.g. `make USE_THREAD=0 USE_LOCKING=1` (see comment below)
If the application is parallelized by OpenMP, please build OpenBLAS with `USE_OPENMP=1`.
With the increased availability of fast multicore hardware it has unfortunately become clear that the thread management provided by OpenMP is not sufficient to prevent race conditions when OpenBLAS was built single-threaded by USE_THREAD=0 and there are concurrent calls from multiple threads to OpenBLAS functions. In this case,
it is vital to also specify USE_LOCKING=1 (introduced with OpenBLAS 0.3.7).
#### <a name="sparse"></a>Does OpenBLAS support sparse matrices and/or vectors ?
OpenBLAS implements only the standard (dense) BLAS and LAPACK functions with a select few extensions popularized by Intel's MKL. Some
cases can probably be made to work using e.g. GEMV or AXPBY, but in general using a dedicated package like SuiteSparse (which can make use of OpenBLAS or equivalent for standard operations) is recommended.
#### <a name="recent_hardware"></a>What support is there for recent PC hardware ? What about GPU ?
As OpenBLAS is a volunteer project, it can take some time for the combination of a capable developer,
free time, and particular hardware to come along, even for relatively common processors. Starting from 0.3.1, support
is being added for AVX 512 (TARGET=SKYLAKEX), requiring a compiler that is capable of handling avx512 intrinsics.
While AMD Zen processors should be autodetected by the build system, as of 0.3.2 they are still handled exactly
like Intel Haswell. There once was an effort to build an OpenCL implementation that one can still find at https://github.com/xianyi/clOpenBLAS , but work on this stopped in 2015.
#### <a name="sandybridge_perf"></a>How about the level 3 BLAS performance on Intel Sandy Bridge?
We obtained performance comparable with Intel MKL, and in some cases OpenBLAS actually outperformed it.
Here is the DGEMM performance on an Intel Core i5-2500K running Windows 7 SP1 64-bit:
![Single Thread DGEMM Performance on Intel Desktop Sandy Bridge](http://xianyi.github.com/OpenBLAS/dgemm_snb_1thread.png)
<hr noshade="noshade">
### OS and Compiler
#### <a name="MSVC"></a>How can I call an OpenBLAS function in Microsoft Visual Studio?
Please read [this page](How-to-use-OpenBLAS-in-Microsoft-Visual-Studio).
#### <a name="C99_complex_number"></a>How can I use CBLAS and LAPACKE without C99 complex number support (e.g. in Visual Studio)?
Zaheer has fixed this bug. You can now use the structure instead of C99 complex numbers. Please read [this issue page](http://github.com/xianyi/OpenBLAS/issues/95) for details.
[This issue](https://github.com/xianyi/OpenBLAS/issues/305) is for using LAPACKE in Visual Studio.
#### <a name="Linux_SEGFAULT"></a>I get a SEGFAULT with multi-threading on Linux. What's wrong?
This may be related to a bug in the Linux kernel 2.6.32 (?). Try applying the patch `segfaults.patch` to disable mbind using
`patch < segfaults.patch`
and see if the crashes persist. Note that this patch will lead to many compiler warnings.
#### <a name="xgetbv"></a>When I make the library, there is no such instruction: `xgetbv' error. What's wrong?
Please use GCC 4.4 or later, which supports the xgetbv instruction. If you use the library for Sandy Bridge with AVX instructions, you should use GCC 4.6 or later.
On Mac OS X, please use Clang 3.1 or later, for example `make CC=clang`.
For compatibility with old compilers (GCC < 4.4), you can enable the NO_AVX flag, for example `make NO_AVX=1`.
#### <a name="patch_missing"></a>My build fails due to the linker error "multiple definition of `dlamc3_'". What is the problem?
This linker error occurs if GNU patch is missing or if our patch for LAPACK fails to apply.
Background: OpenBLAS implements optimized versions of some LAPACK functions, so we need to disable the reference versions. If this process fails we end with duplicated implementations of the same function.
#### <a name="xeigtst"></a>My build worked fine and passed all tests, but running `make lapack-test` ends with segfaults
Some of the LAPACK tests, notably in xeigtstz, try to allocate around 10MB on the stack. You may need to use
`ulimit -s` to change the default limits on your system to allow this.
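A program can also check its own stack limit before running such code. The following sketch uses the POSIX `getrlimit` call; the function name `stack_limit_at_least` is an assumption for this example, and the code will not build on systems without `<sys/resource.h>`.

```c
#include <sys/resource.h>

/* Hypothetical helper: returns 1 if the soft stack limit of the current
 * process is at least `bytes`, 0 if it is smaller, and -1 on error. */
int stack_limit_at_least(unsigned long long bytes)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_STACK, &lim) != 0)
        return -1;
    if (lim.rlim_cur == RLIM_INFINITY)
        return 1; /* unlimited stack */
    return (unsigned long long)lim.rlim_cur >= bytes ? 1 : 0;
}
```

Calling `stack_limit_at_least(10ULL * 1024 * 1024)` then tells you whether roughly 10MB of stack is available under the current soft limit.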
#### <a name="no_affinity"></a>How could I disable OpenBLAS threading affinity on runtime?
You can define the OPENBLAS_MAIN_FREE or GOTOBLAS_MAIN_FREE environment variable to disable threading affinity at runtime. For example, before running your program:
```
export OPENBLAS_MAIN_FREE=1
```
Alternatively, you can disable the affinity feature by setting NO_AFFINITY=1 in Makefile.rule.
#### <a name="static_link"></a>How to solve undefined reference errors when statically linking against libopenblas.a
On Linux, if OpenBLAS was compiled with threading support (`USE_THREAD=1` by default), custom programs statically linked against `libopenblas.a` should also link to the pthread library e.g.:
```
gcc -static -I/opt/OpenBLAS/include -L/opt/OpenBLAS/lib -o my_program my_program.c -lopenblas -lpthread
```
Failing to add the `-lpthread` flag will cause errors such as:
```
/opt/OpenBLAS/libopenblas.a(memory.o): In function `_touch_memory':
memory.c:(.text+0x15): undefined reference to `pthread_mutex_lock'
memory.c:(.text+0x41): undefined reference to `pthread_mutex_unlock'
/opt/OpenBLAS/libopenblas.a(memory.o): In function `openblas_fork_handler':
memory.c:(.text+0x440): undefined reference to `pthread_atfork'
/opt/OpenBLAS/libopenblas.a(memory.o): In function `blas_memory_alloc':
memory.c:(.text+0x7a5): undefined reference to `pthread_mutex_lock'
memory.c:(.text+0x825): undefined reference to `pthread_mutex_unlock'
/opt/OpenBLAS/libopenblas.a(memory.o): In function `blas_shutdown':
memory.c:(.text+0x9e1): undefined reference to `pthread_mutex_lock'
memory.c:(.text+0xa6e): undefined reference to `pthread_mutex_unlock'
/opt/OpenBLAS/libopenblas.a(blas_server.o): In function `blas_thread_server':
blas_server.c:(.text+0x273): undefined reference to `pthread_mutex_lock'
blas_server.c:(.text+0x287): undefined reference to `pthread_mutex_unlock'
blas_server.c:(.text+0x33f): undefined reference to `pthread_cond_wait'
/opt/OpenBLAS/libopenblas.a(blas_server.o): In function `blas_thread_init':
blas_server.c:(.text+0x416): undefined reference to `pthread_mutex_lock'
blas_server.c:(.text+0x4be): undefined reference to `pthread_mutex_init'
blas_server.c:(.text+0x4ca): undefined reference to `pthread_cond_init'
blas_server.c:(.text+0x4e0): undefined reference to `pthread_create'
blas_server.c:(.text+0x50f): undefined reference to `pthread_mutex_unlock'
...
```
`-lpthread` is not required when linking dynamically against `libopenblas.so.0`.
#### <a name="binutils"></a>Building OpenBLAS for Haswell or Dynamic Arch on RHEL-6, CentOS-6, Rocks-6.1,Scientific Linux 6
The minimum requirement to actually run AVX2-enabled software like OpenBLAS is kernel-2.6.32-358, shipped with EL6U4 in 2013.
The `binutils` package from RHEL6 does not know the instruction `vpermpd` or any other AVX2 instruction. You can download a newer `binutils` package from Enterprise Linux software collections, following the instructions here: <br>
https://www.softwarecollections.org/en/scls/rhscl/devtoolset-3/ <br>
After configuring the repository, you need to install devtoolset-?-binutils to get a newer, usable binutils package:
```
$ yum search devtoolset-\?-binutils
$ sudo yum install devtoolset-3-binutils
```
Once the packages are installed, check the correct name of the SCL collection to enable the new version:
```
$ scl --list
devtoolset-3
rh-python35
```
Now just prefix your build commands with the respective redirection:
```
$ scl enable devtoolset-3 -- make DYNAMIC_ARCH=1
```
AVX-512 (SKYLAKEX) support requires devtoolset-8-gcc-gfortran (which exceeds the formal requirement for AVX-512 because of packaging issues in earlier packages), which installs the respective binutils and gcc as dependencies. To run, it needs kernel 2.6.32-696 (aka 6U9) or 3.10.0-327 (aka 7U2) or later. In the absence of the abovementioned toolset, OpenBLAS will fall back to AVX2 instructions in place of AVX512, sacrificing some performance on the SKYLAKE-X platform.
#### <a name="qemu"></a>Building OpenBLAS in QEMU/KVM/XEN
By default, QEMU reports the CPU as "QEMU Virtual CPU version 2.2.0", which shares its CPUID with an existing 32bit CPU even in a 64bit virtual machine, and OpenBLAS recognizes it as PENTIUM2. Depending on the exact combination of CPU features the hypervisor chooses to expose, this may not correspond to any CPU that exists, and the OpenBLAS build will fail with an error. To fix this, pass `-cpu host` or `-cpu passthough` to QEMU, or another CPU model.
Similarly, the XEN hypervisor may not pass through all features of the host cpu while reporting the cpu type itself correctly, which can
lead to compiler error messages about an "ABI change" when compiling AVX512 code. Again changing the Xen configuration by running e.g.
"xen-cmdline --set-xen cpuid=avx512" should get around this (as would building OpenBLAS for an older cpu lacking that particular feature, e.g. TARGET=HASWELL)
#### <a name="mips"></a>Building OpenBLAS for MIPS
For MIPS targets you will need the latest toolchains:
* P5600 - MTI GNU/Linux Toolchain
* I6400, P6600 - IMG GNU/Linux Toolchain
The download link is below
(http://codescape-mips-sdk.imgtec.com/components/toolchain/2016.05-03/downloads.html)
You can use the following command lines for builds:
IMG_TOOLCHAIN_DIR={full IMG GNU/Linux Toolchain path including "bin" directory -- for example, /opt/linux_toolchain/bin}
IMG_GCC_PREFIX=mips-img-linux-gnu
IMG_TOOLCHAIN=${IMG_TOOLCHAIN_DIR}/${IMG_GCC_PREFIX}
I6400 Build (n32):
make BINARY=32 BINARY32=1 CC=$IMG_TOOLCHAIN-gcc AR=$IMG_TOOLCHAIN-ar FC="$IMG_TOOLCHAIN-gfortran -EL -mabi=n32" RANLIB=$IMG_TOOLCHAIN-ranlib HOSTCC=gcc CFLAGS="-EL" FFLAGS=$CFLAGS LDFLAGS=$CFLAGS TARGET=I6400
I6400 Build (n64):
make BINARY=64 BINARY64=1 CC=$IMG_TOOLCHAIN-gcc AR=$IMG_TOOLCHAIN-ar FC="$IMG_TOOLCHAIN-gfortran -EL" RANLIB=$IMG_TOOLCHAIN-ranlib HOSTCC=gcc CFLAGS="-EL" FFLAGS=$CFLAGS LDFLAGS=$CFLAGS TARGET=I6400
P6600 Build (n32):
make BINARY=32 BINARY32=1 CC=$IMG_TOOLCHAIN-gcc AR=$IMG_TOOLCHAIN-ar FC="$IMG_TOOLCHAIN-gfortran -EL -mabi=n32" RANLIB=$IMG_TOOLCHAIN-ranlib HOSTCC=gcc CFLAGS="-EL" FFLAGS=$CFLAGS LDFLAGS=$CFLAGS TARGET=P6600
P6600 Build (n64):
make BINARY=64 BINARY64=1 CC=$IMG_TOOLCHAIN-gcc AR=$IMG_TOOLCHAIN-ar FC="$IMG_TOOLCHAIN-gfortran -EL" RANLIB=$IMG_TOOLCHAIN-ranlib HOSTCC=gcc CFLAGS="-EL" FFLAGS="$CFLAGS" LDFLAGS="$CFLAGS" TARGET=P6600
MTI_TOOLCHAIN_DIR={full MTI GNU/Linux Toolchain path including "bin" directory -- for example, /opt/linux_toolchain/bin}
MTI_GCC_PREFIX=mips-mti-linux-gnu
MTI_TOOLCHAIN=${MTI_TOOLCHAIN_DIR}/${MTI_GCC_PREFIX}
P5600 Build:
make BINARY=32 BINARY32=1 CC=$MTI_TOOLCHAIN-gcc AR=$MTI_TOOLCHAIN-ar FC="$MTI_TOOLCHAIN-gfortran -EL" RANLIB=$MTI_TOOLCHAIN-ranlib HOSTCC=gcc CFLAGS="-EL" FFLAGS=$CFLAGS LDFLAGS=$CFLAGS TARGET=P5600
#### <a name="ppcxl"></a>Building OpenBLAS on POWER fails with IBM XL
Trying to compile OpenBLAS with IBM XL ends with error messages about unknown register names
like "vs32". Working around these by using known alternate names for the vector registers only leads to another assembler error about unsupported constraints. This is a known deficiency in the IBM compiler at least up to and including 16.1.0 (and in the POWER version of clang, from which it is derived) - use gcc instead. (See issues #1078
and #1699 for related discussions)
#### <a name="debianlts"></a>Replacing system BLAS/updating APT OpenBLAS in Mint/Ubuntu/Debian
Debian and Ubuntu LTS versions provide an OpenBLAS package which is not updated after the initial release, and under some circumstances one might want to use a more recent version of OpenBLAS, e.g. to get support for newer CPUs.
Ubuntu and Debian provide the 'alternatives' mechanism to comfortably replace BLAS and LAPACK libraries systemwide.
After a successful build of OpenBLAS (with DYNAMIC_ARCH set to 1)
```
$ make clean
$ make DYNAMIC_ARCH=1
$ sudo make DYNAMIC_ARCH=1 install
```
One can redirect the BLAS and LAPACK alternatives to point to the source-built OpenBLAS.
First you have to install the NetLib LAPACK reference implementation (to have alternatives to replace):
```
$ sudo apt install libblas-dev liblapack-dev
```
Then we can set the alternative to our freshly-built library:
```
$ sudo update-alternatives --install /usr/lib/libblas.so.3 libblas.so.3 /opt/OpenBLAS/lib/libopenblas.so.0 41 \
--slave /usr/lib/liblapack.so.3 liblapack.so.3 /opt/OpenBLAS/lib/libopenblas.so.0
```
Or remove the redirection and switch back to the APT-provided BLAS implementation:
```
$ sudo update-alternatives --remove libblas.so.3 /opt/OpenBLAS/lib/libopenblas.so.0
```
In recent versions of the distributions, the installation path for the libraries has been changed to include the name of the host architecture, like /usr/lib/x86_64-linux-gnu/blas/libblas.so.3 or libblas.so.3.x86_64-linux-gnu. Use ```$ update-alternatives --display libblas.so.3```
to find out what layout your system has.
#### <a name="findblas"></a>I built OpenBLAS for use with some other software, but that software cannot find it
OpenBLAS installs as a single library named libopenblas.so, while some programs may be searching for a separate libblas.so and liblapack.so, so you may need to create appropriate symbolic links (`ln -s libopenblas.so libblas.so;
ln -s libopenblas.so liblapack.so`) or copies. Also make sure that the installation location (usually /opt/OpenBLAS/lib or /usr/local/lib) is among the library search paths of your system.
#### <a name="installincludes"></a>I included cblas.h in my program, but the compiler complains about a missing common.h or functions from it
You probably tried to include a cblas.h that you simply copied from the OpenBLAS source. Instead, you need to run
`make install` after building OpenBLAS and then use the modified cblas.h that this step creates in the installation
path (usually /usr/local/include, /opt/OpenBLAS/include, or whatever you specified as PREFIX= on the `make install`).
#### <a name="boundscheck"></a>Compiling OpenBLAS with gcc's -fbounds-check actually triggers aborts in programs
This is due to different interpretations of the (informal) standard for passing characters as arguments between C and FORTRAN functions. As the method for storing text differs in the two languages, when C calls Fortran the text length is passed as an "invisible" additional parameter.
Historically, this has not been required when the text is just a single character, so older code like the Reference-LAPACK bundled with OpenBLAS
does not do it. Recently gcc's checking has changed to require it, but there is no consensus yet on whether and how the existing LAPACK (and many other codebases) should adapt. (And for actual compilation, gcc has mostly backtracked and provided compatibility options - hence the default build settings in the OpenBLAS Makefiles add -fno-optimize-sibling-calls to the gfortran options to prevent miscompilation with "affected" versions. See ticket 2154 in the issue tracker for more details and links.)
<hr noshade="noshade">
#### <a name="newcpu"></a>Build fails with lots of errors about undefined ?GEMM_UNROLL_M
Your cpu is apparently too new to be recognized by the build scripts, so they failed to assign appropriate parameters for the block algorithm.
Do a `make clean` and try again with TARGET set to one of the cpu models listed in `TargetList.txt` - for x86_64 this will usually be HASWELL.
#### <a name="cmakeosx"></a>CMAKE/OSX: Build fails with 'argument list too long'
This is a limitation in the maximum length of a command on OSX, coupled with how CMAKE works. You should be able to work around this
by adding the option `-DCMAKE_Fortran_USE_RESPONSE_FILE_FOR_OBJECTS=1` to your CMAKE arguments.
#### <a name="xhyve"></a>Likely problems with AVX2 support in Docker Desktop for OSX
There have been a few reports of wrong calculation results and build-time test failures when building in a container environment managed by the OSX version of Docker Desktop, which uses the xhyve virtualizer underneath. Judging from these reports, AVX2 support in xhyve appears to be subtly broken but a corresponding ticket in the xhyve issue tracker has not drawn any reaction or comment since 2019. Therefore it is strongly recommended to build OpenBLAS with the NO_AVX2=1 option when inside a container under (or for later use with) the Docker Desktop environment on Intel-based Apple hardware.
### Usage
#### <a name="allocmorebuffers"></a>Program is Terminated. Because you tried to allocate too many memory regions
In OpenBLAS, we manage a pool of memory buffers and set the number of buffers as follows.
```
#define NUM_BUFFERS (MAX_CPU_NUMBER * 2)
```
This error indicates that the program exceeded the number of buffers.
Please build OpenBLAS with a larger `NUM_THREADS`. For example, `make NUM_THREADS=32` or `make NUM_THREADS=64`.
In `Makefile.system`, `MAX_CPU_NUMBER` is set to `NUM_THREADS`.
#### <a name="choose_target_dynamic"></a>How to choose TARGET manually at runtime when compiled with DYNAMIC_ARCH
The environment variable that controls the kernel selection is `OPENBLAS_CORETYPE` (see `driver/others/dynamic.c`),
e.g. `export OPENBLAS_CORETYPE=Haswell`. The function `char* openblas_get_corename()` returns the target actually in use.
#### <a name="missgoto"></a>After updating the installed OpenBLAS, a program complains about "undefined symbol gotoblas"
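This symbol gets defined only when OpenBLAS is built with "make DYNAMIC_ARCH=1" (which is what distributors will choose to ensure support for more than just one CPU type). A program that was linked against such a build will therefore fail to start once the library is replaced by one built without this option; rebuilding OpenBLAS with DYNAMIC_ARCH=1 (or relinking the program against the new library) should resolve this.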
#### <a name="buildoptions"></a>How can I find out at runtime what options the library was built with ?
OpenBLAS has two utility functions that may come in handy here:
* `openblas_get_parallel()` returns 0 for a single-threaded library, 1 if multithreading without OpenMP, 2 if built with USE_OPENMP=1
* `openblas_get_config()` returns a string containing settings such as USE64BITINT or DYNAMIC_ARCH that were active at build time, as well as the target cpu (or in case of a DYNAMIC_ARCH build, the currently detected one).
#### <a name="wronglibrary"></a>After making OpenBLAS, I find that the static library is multithreaded, but the dynamic one is not ?
The shared OpenBLAS library you built is probably working fine as well, but your program may be picking up a different (probably single-threaded) version from one of the standard system paths like /usr/lib on startup.
Running `ldd /path/to/your/program` will tell you which library the linkage loader will actually use.
Specifying the "correct" library location with the `-L` flag (like `-L /opt/OpenBLAS/lib`) when linking your program only defines which library will be used to see if all symbols _can_ be resolved. You will need to add an rpath entry to the binary (using `-Wl,-rpath,/opt/OpenBLAS/lib`) to make it request searching that location. Alternatively, remove the "wrong old" library (if you can), or set LD_LIBRARY_PATH to the desired location before running your program.
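As a rough sketch of the link-time versus run-time distinction, the following shell function prints a set of flags that point both the linker (`-L`) and the run-time loader (rpath) at the same directory. The function name `openblas_ldflags` is hypothetical, and the default path is just the usual installation location mentioned above.

```shell
# Hypothetical helper (not part of OpenBLAS): print linker flags that
# resolve symbols at link time (-L) and also record the same directory
# as a run-time search path (rpath) in the resulting binary.
openblas_ldflags() {
    libdir="${1:-/opt/OpenBLAS/lib}"
    printf '%s\n' "-L$libdir -Wl,-rpath,$libdir -lopenblas"
}
```

A link step could then look like `gcc -o myprog myprog.c $(openblas_ldflags /opt/OpenBLAS/lib)`, followed by `ldd ./myprog` to confirm which libopenblas is picked up. The unquoted command substitution relies on word splitting, so this simple form does not work for paths containing spaces.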
#### <a name="cudahpl"></a>I want to use OpenBLAS with CUDA in the HPL 2.3 benchmark code but it keeps looking for Intel MKL
You need to edit the file src/cuda/cuda_dgemm.c in the NVIDIA version of HPL, change the "handle2" and "handle" dlopen calls to use libopenblas.so instead of libmkl_intel_lp64.so, and add a trailing underscore in the dlsym lines for dgemm_mkl and dtrsm_mkl (like `dgemm_mkl = (void(*)())dlsym(handle, "dgemm_");`)
#### <a name="cpusoffline"></a>Multithreaded OpenBLAS runs no faster or is even slower than singlethreaded on my ARMV7 board
The power saving mechanisms of your board may have shut down some cores, making them invisible to OpenBLAS in its startup phase. Try bringing them online before starting your calculation.
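On Linux, the online state of each core is exposed through sysfs, so a short shell function can list the cores that are currently offline. This is a sketch only: the function name `list_offline_cpus` is made up, and the sysfs root is a parameter purely so that the function can be tried on a copy of the directory tree.

```shell
# Hypothetical helper: list CPU cores that Linux currently reports as offline.
# Cores without an "online" file (often cpu0) cannot be switched off and
# are skipped.
list_offline_cpus() {
    root="${1:-/sys/devices/system/cpu}"
    for f in "$root"/cpu[0-9]*/online; do
        [ -r "$f" ] || continue
        if [ "$(cat "$f")" = "0" ]; then
            basename "$(dirname "$f")"
        fi
    done
}
```

A core reported by this function can usually be brought back with something like `echo 1 | sudo tee /sys/devices/system/cpu/cpu3/online`, although the power management of your board may switch it off again.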
#### <a name="biglittle"></a>Speed varies wildly between individual runs on a typical ARMV8 smartphone processor
Check the technical specifications: it could be that the SoC combines fast and slow cpus and threads can end up on either. In that case, binding the process to specific cores, e.g. by setting `OMP_PLACES=cores`, may help. (You may need to experiment with OpenMP options. It has been reported that using `OMP_NUM_THREADS=2 OMP_PLACES=cores` caused
a huge drop in performance on a 4+4 core chip while `OMP_NUM_THREADS=2 OMP_PLACES=cores(2)` worked as intended, as did `OMP_PLACES=cores` with 4 threads.)
#### <a name="numthreads"></a>I cannot get OpenBLAS to use more than a small subset of available cores on a big system
Multithreading support in OpenBLAS requires the use of internal buffers for sharing partial results, the number and size of which is defined at compile time. Unless you specify NUM_THREADS in your make or cmake command, the build scripts try to autodetect the number of cores available in your build host to size the library to match. This unfortunately means that if you move the resulting binary from a small "front-end node" to a larger "compute node" later, it will still be limited to the hardware capabilities of the original system. The solution is to set NUM_THREADS to a number big enough to encompass the biggest systems you expect to run the binary on - at runtime, it will scale down the maximum number of threads it uses to match the number of cores physically available.
#### <a name="ELFoffset"></a>Getting "ELF load command address/offset not properly aligned" when loading libopenblas.so
If you get a message "error while loading shared libraries: libopenblas.so.0: ELF load command address/offset not properly aligned" when starting a program that is (dynamically) linked to OpenBLAS, this is very likely due to a bug in the GNU linker (ld) that is part of the
GNU binutils package. This error was specifically observed on older versions of Ubuntu Linux updated with the (at the time) most recent binutils version 2.38, but an internet search turned up sporadic reports involving various other libraries dating back several years. A bugfix was created by the binutils developers and should be available in later versions of binutils. (See issue 3708 for details.)
#### <a name="OpenMP"></a>Using OpenBLAS with OpenMP
OpenMP provides its own locking mechanisms, so when your code makes BLAS/LAPACK calls from inside OpenMP parallel regions it is imperative
that you use an OpenBLAS that is built with USE_OPENMP=1, as otherwise deadlocks might occur. Furthermore, OpenBLAS will automatically restrict itself to using only a single thread when called from an OpenMP parallel region. When it is certain that calls will only occur
from the main thread of your program (i.e. outside of omp parallel constructs), a standard pthreads build of OpenBLAS can be used as well. In that case it may be useful to tune the linger behaviour of idle threads in both your OpenMP program (e.g. set OMP_WAIT_POLICY=passive) and OpenBLAS (by redefining the THREAD_TIMEOUT variable at build time, or setting the environment variable OPENBLAS_THREAD_TIMEOUT smaller than the default 26) so that the two alternating thread pools do not unnecessarily hog the cpu during the handover.


@ -0,0 +1,14 @@
Werner found a lot of bugs through LAPACK testing. In version 0.2.9 we used workarounds, e.g. falling back to the old kernel or replacing the optimized kernel with the reference kernel. We must fix them in a future release.
* ~~Nehalem dgemm kernel. Fallback to core2 kernel.~~
* ~~Sandy bridge sgemm kernel. Fallback to Nehalem kernels.~~
* ~~Sandy bridge cgemm kernel. Fallback to Nehalem kernels.~~
* sbmv, zsbmv, smp bug (interface/sbmv.c zsbmv.c). Now, it is only sequential.
* zhbmv, smp bug.
* ~~scal, zscal x86/x86_64 kernels. Fallback to C implementation.~~
* ~~potri, zpotri. Fallback to LAPACK reference implementation.~~
* ~~lauu2, zlauu2. Fallback to LAPACK reference implementation.~~
* ~~trtri, ztrtri. Fallback to LAPACK reference implementation.~~
* ~~lauum, zlauum. Fallback to LAPACK reference implementation.~~
* ~~trti2, ztrti2. Fallback to LAPACK reference implementation.~~
* Disable SMP in ger.c and zger.c

Home.md

@ -0,0 +1,56 @@
# OpenBLAS Wiki
<hr noshade="noshade">
<center>
<a href="wiki/Home"> [Home]</a>
<a href="wiki/Document"> [Document]</a>
<a href="wiki/faq"> [FAQ]</a>
<a href="wiki/publications"> [Publications]</a>
<a href="wiki/download"> [Download]</a>
<a href="wiki/Mailing-List">[Mailing List]</a>
<a href="wiki/Donation">[Donation]</a>
</center>
<hr noshade="noshade">
[![Build Status](https://travis-ci.org/xianyi/OpenBLAS.svg?branch=develop)](https://travis-ci.org/xianyi/OpenBLAS)
### Introduction
OpenBLAS is an optimized Basic Linear Algebra Subprograms (BLAS) library based on [GotoBLAS2](https://www.tacc.utexas.edu/research-development/tacc-software/gotoblas2) 1.13 BSD version.
### License
OpenBLAS is licensed under the 3-clause BSD license. Full license text follows:
Copyright (c) 2011-2015, The OpenBLAS Project
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
3. Neither the name of the OpenBLAS project nor the names of
its contributors may be used to endorse or promote products
derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE OPENBLAS PROJECT OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE
USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
### Acknowledgements
This work is partially supported by
* Research and Development of Compiler System and Toolchain for Domestic CPU, National S&T Major Projects: Core Electronic Devices, High-end General Chips and Fundamental Software (No.2009ZX01036-001-002)
* National High-tech R&D Program of China (Grant No.2012AA010903)


@ -0,0 +1,190 @@
## Prerequisites
In addition to the Android NDK, you will need a C compiler on the build host as this is currently
required by the OpenBLAS build environment.
# Building with android NDK using clang compiler
Around version 11, the Android NDK stopped supporting gcc, so you need to use clang to compile OpenBLAS. clang is supported from OpenBLAS version 0.2.20 onwards. See the sections below on how to build with clang for ARMV7 and ARMV8 targets. The same basic principles as described below for ARMV8 should also apply to building an x86 or x86_64 version (substitute something like NEHALEM for the target instead of ARMV8 and replace all occurrences of aarch64 in the toolchain paths accordingly).
"Historic" notes:
Since version 19 the default toolchain is provided as a standalone toolchain, so building one yourself following [building a standalone toolchain](http://developer.android.com/ndk/guides/standalone_toolchain.html) should no longer be necessary.
If you want to use static linking with an NDK version older than about r17, you currently need to choose an API level below 23 due to NDK bug 272 (https://github.com/android-ndk/ndk/issues/272 , the libc.a lacks a definition of stderr) that will probably be fixed in r17 of the NDK.
## Build ARMV7 with clang
```
# Set path to ndk-bundle
export NDK_BUNDLE_DIR=/path/to/ndk-bundle
# Set the PATH to contain paths to clang and arm-linux-androideabi-* utilities
export PATH=${NDK_BUNDLE_DIR}/toolchains/arm-linux-androideabi-4.9/prebuilt/linux-x86_64/bin:${NDK_BUNDLE_DIR}/toolchains/llvm/prebuilt/linux-x86_64/bin:$PATH
# Set LDFLAGS so that the linker finds the appropriate libgcc
export LDFLAGS="-L${NDK_BUNDLE_DIR}/toolchains/arm-linux-androideabi-4.9/prebuilt/linux-x86_64/lib/gcc/arm-linux-androideabi/4.9.x"
# Set the clang cross compile flags
export CLANG_FLAGS="-target arm-linux-androideabi -marm -mfpu=vfp -mfloat-abi=softfp --sysroot ${NDK_BUNDLE_DIR}/platforms/android-23/arch-arm -gcc-toolchain ${NDK_BUNDLE_DIR}/toolchains/arm-linux-androideabi-4.9/prebuilt/linux-x86_64/"
#OpenBLAS Compile
make TARGET=ARMV7 ONLY_CBLAS=1 AR=ar CC="clang ${CLANG_FLAGS}" HOSTCC=gcc ARM_SOFTFP_ABI=1 -j4
```
On a Mac, it may also be necessary to give the complete path to the `ar` utility in the make command above, like so:
```
AR=${NDK_BUNDLE_DIR}/toolchains/arm-linux-androideabi-4.9/prebuilt/darwin-x86_64/bin/arm-linux-androideabi-gcc-ar
```
otherwise you may get a linker error complaining about a "malformed archive header name at 8" when the native OSX ar command was invoked instead. Note that with recent NDK versions, the AR tool may be named `llvm-ar` rather than what is assumed above.
## Build ARMV8 with clang
```
# Set path to ndk-bundle
export NDK_BUNDLE_DIR=/path/to/ndk-bundle/
# Export PATH to contain directories of clang and aarch64-linux-android-* utilities
export PATH=${NDK_BUNDLE_DIR}/toolchains/aarch64-linux-android-4.9/prebuilt/linux-x86_64/bin/:${NDK_BUNDLE_DIR}/toolchains/llvm/prebuilt/linux-x86_64/bin:$PATH
# Setup LDFLAGS so that loader can find libgcc and pass -lm for sqrt
export LDFLAGS="-L${NDK_BUNDLE_DIR}/toolchains/aarch64-linux-android-4.9/prebuilt/linux-x86_64/lib/gcc/aarch64-linux-android/4.9.x -lm"
# Setup the clang cross compile options
export CLANG_FLAGS="-target aarch64-linux-android --sysroot ${NDK_BUNDLE_DIR}/platforms/android-23/arch-arm64 -gcc-toolchain ${NDK_BUNDLE_DIR}/toolchains/aarch64-linux-android-4.9/prebuilt/linux-x86_64/"
# Compile
make TARGET=ARMV8 ONLY_CBLAS=1 AR=ar CC="clang ${CLANG_FLAGS}" HOSTCC=gcc -j4
```
Note: Using TARGET=CORTEXA57 in place of ARMV8 will pick up better optimized routines. The implementations for the CORTEXA57 target are compatible with all other armv8 targets.
Note: For NDK 23b, something as simple as
```
export PATH=/opt/android-ndk-r23b/toolchains/llvm/prebuilt/linux-x86_64/bin/:$PATH
make HOSTCC=gcc CC=/opt/android-ndk-r23b/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android31-clang ONLY_CBLAS=1 TARGET=ARMV8
```
appears to be sufficient on Linux. On OSX, setting AR to the ar provided in the "bin" path of the NDK (probably llvm-ar) is also necessary.
## Alternative script which was tested on OSX with NDK(21.3.6528147)
This script will build OpenBLAS for four architectures (ARMV7, ARMV8, x86 and x86_64) and install each of them with `sudo make install` to `/opt/OpenBLAS/lib`.
Of course you can also copy only the section that is of interest to you. Also notice that the AR= line may need adapting to the
name of the ar tool provided in your $TOOLCHAIN/bin, for example llvm-ar in some recent NDK versions.
```
export NDK=YOUR_PATH_TO_SDK/Android/sdk/ndk/21.3.6528147
export TOOLCHAIN=$NDK/toolchains/llvm/prebuilt/darwin-x86_64
make clean
make \
TARGET=ARMV7 \
ONLY_CBLAS=1 \
CC="$TOOLCHAIN"/bin/armv7a-linux-androideabi21-clang \
AR="$TOOLCHAIN"/bin/arm-linux-androideabi-ar \
HOSTCC=gcc \
ARM_SOFTFP_ABI=1 \
-j4
sudo make install
make clean
make \
TARGET=CORTEXA57 \
ONLY_CBLAS=1 \
CC=$TOOLCHAIN/bin/aarch64-linux-android21-clang \
AR=$TOOLCHAIN/bin/aarch64-linux-android-ar \
HOSTCC=gcc \
-j4
sudo make install
make clean
make \
TARGET=ATOM \
ONLY_CBLAS=1 \
CC="$TOOLCHAIN"/bin/i686-linux-android21-clang \
AR="$TOOLCHAIN"/bin/i686-linux-android-ar \
HOSTCC=gcc \
ARM_SOFTFP_ABI=1 \
-j4
sudo make install
# This will build for x86_64
make clean
make \
TARGET=ATOM BINARY=64 \
ONLY_CBLAS=1 \
CC="$TOOLCHAIN"/bin/x86_64-linux-android21-clang \
AR="$TOOLCHAIN"/bin/x86_64-linux-android-ar \
HOSTCC=gcc \
ARM_SOFTFP_ABI=1 \
-j4
sudo make install
```
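Since the name of the ar tool differs between NDK versions, the AR= value can be picked at build time rather than hard-coded. The following is a minimal sketch: the function name `pick_ndk_ar` is made up, and preferring `llvm-ar` over the triple-prefixed tool is an arbitrary choice of this example.

```shell
# Hypothetical helper: print the first ar tool found in an NDK toolchain
# bin directory, trying llvm-ar before the triple-prefixed name.
pick_ndk_ar() {
    bindir="$1"
    triple="$2"
    for candidate in "$bindir/llvm-ar" "$bindir/$triple-ar"; do
        if [ -x "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    echo "no ar tool found in $bindir" >&2
    return 1
}
```

In the script above it could be used as `AR=$(pick_ndk_ar "$TOOLCHAIN/bin" aarch64-linux-android)`, with the triple adjusted for each target.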
You can find the full list of target architectures in [TargetList.txt](https://github.com/xianyi/OpenBLAS/blob/develop/TargetList.txt)
***
anything below this line should be irrelevant nowadays unless you need to perform software archeology
***
## Building OpenBLAS with very old gcc-based versions of the NDK, without Fortran
The prebuilt Android NDK toolchains do not include Fortran, hence parts like LAPACK cannot be built. You can still build OpenBLAS without it. For instructions on how to build OpenBLAS with Fortran, see the [next section](#building-openblas-with-fortran).
To use the prebuilt toolchains easily, follow [building a standalone toolchain](http://developer.android.com/ndk/guides/standalone_toolchain.html) for your desired architecture.
This would be `arm-linux-androideabi-gcc-4.9` for ARMV7 and `aarch64-linux-android-gcc-4.9` for ARMV8.
You can build OpenBLAS (0.2.19 and earlier) with:
```
# Add the toolchain to your path
export PATH=/path/to/standalone-toolchain/bin:$PATH
# Build without Fortran for ARMV7
make TARGET=ARMV7 HOSTCC=gcc CC=arm-linux-androideabi-gcc NOFORTRAN=1 libs
# Build without Fortran for ARMV8
make TARGET=ARMV8 BINARY=64 HOSTCC=gcc CC=aarch64-linux-android-gcc NOFORTRAN=1 libs
```
Since we are cross-compiling, we make the `libs` recipe, not `all`. Otherwise you will get errors when trying to link/run tests as versions up to and including 0.2.19 cannot build a shared library for Android.
From 0.2.20 on, you should leave off the "libs" to get a full build, and you may want to use the softfp ABI instead of the deprecated hardfp one on ARMV7 so you would use
```
# Add the toolchain to your path
export PATH=/path/to/standalone-toolchain/bin:$PATH
# Build without Fortran for ARMV7
make TARGET=ARMV7 ARM_SOFTFP_ABI=1 HOSTCC=gcc CC=arm-linux-androideabi-gcc NOFORTRAN=1
# Build without Fortran for ARMV8
make TARGET=ARMV8 BINARY=64 HOSTCC=gcc CC=aarch64-linux-android-gcc NOFORTRAN=1
```
If you get an error about stdio.h not being found, you need to specify your sysroot in the CFLAGS argument to `make` like
```CFLAGS=--sysroot=$NDK/platforms/android-16/arch-arm```
When you are done, install OpenBLAS into the desired directory. Be sure to also use all command line options
here that you specified for building, otherwise errors may occur as it tries to install things you did not build:
```
make PREFIX=/path/to/install-dir TARGET=... install
```
## Building OpenBLAS with Fortran
Instructions on how to build the GNU toolchains with Fortran can be found [here](https://github.com/buffer51/android-gfortran). The [Releases section](https://github.com/buffer51/android-gfortran/releases) provides prebuilt versions, use the standalone one.
You can build OpenBLAS with:
```
# Add the toolchain to your path
export PATH=/path/to/standalone-toolchain-with-fortran/bin:$PATH
# Build with Fortran for ARMV7
make TARGET=ARMV7 HOSTCC=gcc CC=arm-linux-androideabi-gcc FC=arm-linux-androideabi-gfortran libs
# Build with LAPACK for ARMV8
make TARGET=ARMV8 BINARY=64 HOSTCC=gcc CC=aarch64-linux-android-gcc FC=aarch64-linux-android-gfortran libs
```
As mentioned above you can leave off the `libs` argument here when building 0.2.20 and later, and you may want to add ARM_SOFTFP_ABI=1 when building for ARMV7.
## Linking OpenBLAS (0.2.19 and earlier) for ARMV7
If you are using `ndk-build`, you need to set the ABI to hard floating point in your Application.mk:
```
APP_ABI := armeabi-v7a-hard
```
This will set the appropriate flags for you. If you are not using `ndk-build`, you will want to add the following flags:
```
TARGET_CFLAGS += -mhard-float -D_NDK_MATH_NO_SOFTFP=1
TARGET_LDFLAGS += -Wl,--no-warn-mismatch -lm_hard
```
From 0.2.20 on, it is also possible to build for the softfp ABI by specifying ARM_SOFTFP_ABI=1 during the build.
In that case, also make sure that all your dependencies are compiled with -mfloat-abi=softfp as well, as mixing
"hard" and "soft" floating point ABIs in a program will make it crash.


@ -0,0 +1,77 @@
This page describes how to build the OpenBLAS library natively for Windows on ARM64 (WoA) targets.
See [below](#xcomp) for how to cross-compile OpenBLAS for WoA on an x86_64 Windows host.
_We hope that the procedure can be simplified for/by LLVM 17+ and its flang-new compiler for the FORTRAN parts (LAPACK) of the code but we have no means of verifying that as of January 2024._
# Prerequisite
The following tools need to be installed:
## 1. Download and install clang for windows on arm
Find the latest LLVM build for WoA on the [LLVM release page](https://releases.llvm.org/).
E.g. the LLVM 12 build for WoA64 can be found [here](https://github.com/llvm/llvm-project/releases/download/llvmorg-12.0.0/LLVM-12.0.0-woa64.exe).
Run the LLVM installer and ensure that LLVM is added to the PATH environment variable.
## 2. Download and install classic flang for windows on arm
Classic flang is currently the only FORTRAN compiler available for Windows on ARM, and a pre-release build can be found [here](https://github.com/kaadam/flang/releases/tag/v0.1).
There is no installer for classic flang; extract the zip package and add its path to the PATH environment variable.
E.g. in PowerShell:
`$env:Path += ";C:\flang_woa\bin"`
# Build
The following steps describe how to build the OpenBLAS static library with and without LAPACK.
## 1. Build OpenBLAS static library with BLAS and LAPACK routines with Make
The following command can be used to build the OpenBLAS static library with BLAS and LAPACK routines:
`make CC="clang-cl" HOSTCC="clang-cl" AR="llvm-ar" BUILD_WITHOUT_LAPACK=0 NOFORTRAN=0 DYNAMIC_ARCH=0 TARGET=ARMV8 ARCH=arm64 BINARY=64 USE_OPENMP=0 PARALLEL=1 RANLIB="llvm-ranlib" MAKE=make F_COMPILER=FLANG FC=FLANG FFLAGS_NOOPT="-march=armv8-a -cpp" FFLAGS="-march=armv8-a -cpp" NEED_PIC=0 HOSTARCH=arm64 libs netlib`
## 2. Build static library with BLAS routines using CMake
Classic flang has compatibility issues with CMake, hence only the BLAS routines can be compiled with CMake:
`mkdir build`
`cd build`
`cmake .. -G Ninja -DCMAKE_C_COMPILER=clang -DBUILD_WITHOUT_LAPACK=1 -DNOFORTRAN=1 -DDYNAMIC_ARCH=0 -DTARGET=ARMV8 -DARCH=arm64 -DBINARY=64 -DUSE_OPENMP=0 -DCMAKE_SYSTEM_PROCESSOR=ARM64 -DCMAKE_CROSSCOMPILING=1 -DCMAKE_SYSTEM_NAME=Windows`
`cmake --build . --config Release`
# Known issue
## `getarch.exe` execution error
If you notice that the platform-specific headers generated by `getarch.exe` are not created correctly, it could be due to a known debug runtime DLL issue on arm64 platforms. Please see the [Linaro wiki page on the debug run-time DLL issue](https://linaro.atlassian.net/wiki/spaces/WOAR/pages/28677636097/Debug+run-time+DLL+issue#Workaround) for the workaround.
# <a id="xcomp"></a>Alternative: cross-compiling on Windows x86
## Prerequisites:
A working installation of LLVM and Ninja (the versions included in the latest Visual Studio 2022 will do, if you install its optional Llvm.Clang component).
## Build
1. Open the appropriate cross-compiling developer shell (in this case PowerShell; for cmd.exe it would be vcvarsamd64_arm64.bat):
`& 'C:\Program Files\Microsoft Visual Studio\2022\Professional\Common7\Tools\Launch-VsDevShell.ps1' -Arch arm64 -HostArch amd64`
2. Change to the folder where you unpacked the OpenBLAS sources if you haven't already, and create a subfolder for building.
Then change into that folder.
3. Invoke the following CMake command
```
cmake .. -G Ninja -DCMAKE_BUILD_TYPE=Release -DTARGET=ARMV8 -DCMAKE_CROSSCOMPILING=ON -DCMAKE_SYSTEM_NAME="Windows" -DARCH=arm -DBINARY=64 -DCMAKE_SYSTEM_PROCESSOR=ARM64 -DCMAKE_C_COMPILER=clang-cl -DCMAKE_C_COMPILER_TARGET=arm64-pc-windows-msvc -DCMAKE_ASM_COMPILER_TARGET=arm64-pc-windows-msvc
```
(Note the use of clang-cl instead of clang, and adding in the _COMPILER_TARGET entries. Without C_COMPILER_TARGET, clang-cl will be trying to compile x64 machine code and not make it past the cmake generation step. Without ASM_COMPILER_TARGET the assembly files in openblas will fail to assemble because the assembler is trying to interpret them as x64 assembly.)
4. Invoke Ninja for the actual compilation


@ -0,0 +1,14 @@
# How to build OpenBLAS for iPhone/iOS
As none of the current developers uses iOS, the following instructions are what was found to work in our Azure CI setup, but as far as we know this builds a fully working OpenBLAS for this platform.
Go to the directory where you unpacked OpenBLAS, and enter the following commands:
```
# Export the compiler and its flags so that make picks them up from the environment
export CC=/Applications/Xcode_12.4.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang
export CFLAGS="-O2 -Wno-macro-redefined -isysroot /Applications/Xcode_12.4.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS14.4.sdk -arch arm64 -miphoneos-version-min=10.0"
make TARGET=ARMV8 DYNAMIC_ARCH=1 NUM_THREADS=32 HOSTCC=clang NOFORTRAN=1
```
Adjust the value of `-miphoneos-version-min` as necessary for your installation, i.e. change the version number
to the minimum iOS version you want to target. The Xcode and SDK paths shown above will probably need to be adapted to the versions installed on your system as well.


@ -0,0 +1 @@
On newer versions of Xcode and on arm64, you might need to compile with a newer macOS target (11.0) than the default (10.8) with `MACOSX_DEPLOYMENT_TARGET=11.0`, or switch your command-line tools to use an older SDK (e.g., [13.1](https://developer.apple.com/download/all/?q=Xcode%2013)).


@ -0,0 +1,36 @@
Microsoft Windows has this thing called "import libraries". You don't need it in MinGW because the `ld` linker from GNU Binutils is smart, but you may still want it for whatever reason.
## Make the `.def`
Import libraries are compiled from a `.def` file, which lists the symbols to export. This file should already be in your `exports` directory: `cd OPENBLAS_TOP_DIR/exports`.
## Making a MinGW import library
MinGW import libraries have the suffix `.a`, same as static libraries. (It's actually more common to do `.dll.a`...)
You need to first prepend `libopenblas.def` with a line `LIBRARY libopenblas.dll`:

    cat <(echo "LIBRARY libopenblas.dll") libopenblas.def > libopenblas.def.1
    mv libopenblas.def.1 libopenblas.def

Now it probably looks like:

    LIBRARY libopenblas.dll
    EXPORTS
    caxpy=caxpy_ @1
    caxpy_=caxpy_ @2
    ...

Then, generate the import library: `dlltool -d libopenblas.def -l libopenblas.a`
Again, there is basically **no point** in making an import library for use in MinGW. It actually slows down linking.
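If you need to repeat the prepend step, for example in a build script, it helps to make it idempotent so that the `LIBRARY` line is not added twice. The following is a minimal POSIX shell sketch; the function name `prepend_library_line` is made up for illustration.

```shell
# Hypothetical helper: prepend "LIBRARY <dll>" to a .def file unless it is
# already the first line. The DLL name defaults to libopenblas.dll.
prepend_library_line() {
    def="$1"
    dll="${2:-libopenblas.dll}"
    if [ "$(head -n 1 "$def")" = "LIBRARY $dll" ]; then
        return 0
    fi
    new_def="$def.new.$$"
    # Write the new first line followed by the old content, then swap files
    { printf 'LIBRARY %s\n' "$dll"; cat "$def"; } > "$new_def" && mv "$new_def" "$def"
}
```

Calling `prepend_library_line libopenblas.def` in the `exports` directory would then replace the `cat` and `mv` commands shown above, and running it a second time leaves the file unchanged.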
## Making a MSVC import library
Unlike MinGW, MSVC absolutely requires an import library. The C ABIs of MSVC and MinGW are identical, so linking is okay. (Any incompatibility in the C ABI would be a bug.)
The import libraries of MSVC have the suffix `.lib`. They are generated from a `.def` file using MSVC's `lib.exe`. See [the MSVC instructions](https://github.com/xianyi/OpenBLAS/wiki/How-to-use-OpenBLAS-in-Microsoft-Visual-Studio#generate-import-library-before-0210-version).
## Notes
* Always remember that MinGW is **not the same** as MSYS2 or Cygwin. MSYS2 and Cygwin are full POSIX environments with a lot of magic such as `fork()` and its own `malloc()`. MinGW, which builds on the normal Microsoft C Runtime, has none of that. Be clear about which one you are building for.


@ -0,0 +1,162 @@
As of OpenBLAS v0.2.15, we support MinGW and Visual Studio (using CMake to generate visual studio solution files &ndash; note that you will need at least version 3.11 of CMake for linking to work correctly) to build OpenBLAS on Windows.
Note that you need a Fortran compiler if you plan to build and use the LAPACK functions included with OpenBLAS. The sections below describe using either `flang` as an add-on to clang/LLVM or `gfortran` as part of MinGW for this purpose. If you want to use the Intel Fortran compiler `ifort` for this, be sure to also use the Intel C compiler `icc` for building the C parts, as the ABI imposed by `ifort` is incompatible with `msvc`.
## 1. Native (MSVC) ABI
A fully-optimized OpenBLAS that can be statically or dynamically linked to your application can currently be built for the 64-bit architecture with the LLVM compiler infrastructure. We're going to use Miniconda3 to grab all of the tools we need, since some of them are in an experimental status. Before you begin, you'll need to have Microsoft Visual Studio 2015 or newer installed.
1. Install Miniconda3 for 64 bits using `winget install --id Anaconda.Miniconda3` or download it from [conda.io](https://docs.conda.io/en/latest/miniconda.html).
2. Open the "Anaconda Command Prompt," now available in the Start Menu, or at `%USERPROFILE%\miniconda3\shell\condabin\conda-hook.ps1`.
3. In that command prompt window, use `cd` to change to the directory where you want to build OpenBLAS
4. Now install all of the tools we need:
```
conda update -n base conda
conda config --add channels conda-forge
conda install -y cmake flang clangdev perl libflang ninja
```
5. Still in the Anaconda Command Prompt window, activate the MSVC environment for 64 bits with `vcvarsall x64`. On Windows 11 with Visual Studio 2022, this would be done by invoking:
```shell
"c:\Program Files\Microsoft Visual Studio\2022\Preview\vc\Auxiliary\Build\vcvars64.bat"
```
With VS2019, the command should be the same &ndash; except for the year number, obviously. For other/older versions of MSVC,
the VS documentation or a quick search on the web should turn up the exact wording you need.
Confirm that the environment is active by typing `link` &ndash; this should return a long list of possible options for the `link` command. If it just
returns "command not found" or similar, review and retype the call to vcvars64.bat.
**NOTE:** if you are working from a Visual Studio Command prompt window instead (so that you do not have to do the vcvars call), you need to invoke
`conda activate` so that CONDA_PREFIX etc. get set up correctly before proceeding to step 6. Failing to do so will lead to link errors like
libflangmain.lib not getting found later in the build.
6. Now configure the project with CMake. Starting in the project directory, execute the following:
```
set "LIB=%CONDA_PREFIX%\Library\lib;%LIB%"
set "CPATH=%CONDA_PREFIX%\Library\include;%CPATH%"
mkdir build
cd build
cmake .. -G "Ninja" -DCMAKE_CXX_COMPILER=clang-cl -DCMAKE_C_COMPILER=clang-cl -DCMAKE_Fortran_COMPILER=flang -DCMAKE_MT=mt -DBUILD_WITHOUT_LAPACK=no -DNOFORTRAN=0 -DDYNAMIC_ARCH=ON -DCMAKE_BUILD_TYPE=Release
```
You may want to add further options in the `cmake` command here &ndash; for instance, the default only produces a static .lib version of the library. If you would rather have a DLL, add `-DBUILD_SHARED_LIBS=ON` above. Note that this step only creates some command files and directories; the actual build happens next.
7. Build the project:
```
cmake --build . --config Release
```
This step will create the OpenBLAS library in the "lib" directory, and various build-time tests in the `test`, `ctest` and `openblas_utest` directories. However it will not separate the header files you might need for building your own programs from those used internally. To put all relevant files in a more convenient arrangement, run the next step.
8. Install all relevant files created by the build
```
cmake --install . --prefix c:\opt -v
```
This will copy all files that are needed for building and running your own programs with OpenBLAS to the given location, creating appropriate subdirectories for the individual kinds of files. In the case of "C:\opt" as given above, this would be C:\opt\include\openblas for the header files,
C:\opt\bin for the libopenblas.dll and C:\opt\lib for the static library. C:\opt\share holds various support files that enable other cmake-based build scripts to find OpenBLAS automatically.
### Visual Studio 2017+ (C++17 standard)
In newer Visual Studio versions, Microsoft has changed [how it handles complex types](https://docs.microsoft.com/en-us/cpp/c-runtime-library/complex-math-support?view=msvc-170#types-used-in-complex-math). Even when using a precompiled version of OpenBLAS, you might need to define `LAPACK_COMPLEX_CUSTOM` in order to define complex types properly for MSVC. For example, some variant of the following might help:
```
#if defined(_MSC_VER)
#include <complex.h>
#define LAPACK_COMPLEX_CUSTOM
#define lapack_complex_float _Fcomplex
#define lapack_complex_double _Dcomplex
#endif
```
For reference, see https://github.com/xianyi/OpenBLAS/issues/3661, https://github.com/Reference-LAPACK/lapack/issues/683, and https://stackoverflow.com/questions/47520244/using-openblas-lapacke-in-visual-studio.
### CMake and Visual Studio
To build OpenBLAS for the 32-bit architecture, you'll need to use the builtin Visual Studio compilers.
**[Notice]** This method may produce binaries which demonstrate significantly lower performance than those built with the other methods. (The Visual Studio compiler does not support the dialect of assembly used in the cpu-specific optimized files, so only the "generic" TARGET, which is
written in pure C, will get built. For the same reason it is not possible (and not necessary) to use -DDYNAMIC_ARCH=ON in a Visual Studio build.) You may consider building for the 32-bit architecture using the GNU (MinGW) ABI.
#### 1. Install CMake on Windows
#### 2. Use CMake to generate Visual Studio solution files
```
# Do this from Powershell so cmake can find visual studio
cmake -G "Visual Studio 14 Win64" -DCMAKE_BUILD_TYPE=Release .
```
#### 3. Build the solution in Visual Studio
Note that this step depends on perl, so you'll need to install perl for Windows and put perl on your path so VS can start perl (http://stackoverflow.com/questions/3051049/active-perl-installation-on-windows-operating-system).
Step 2 generates the OpenBLAS solution files; open the solution in VS and build the projects. Note that the dependencies do not seem to be configured automatically: if you try to build libopenblas directly, it will fail with a message saying that some .obj files cannot be found. If you first build the projects that libopenblas depends on, the build will succeed.
### Build OpenBLAS for Universal Windows Platform
OpenBLAS can be built for use on the [Universal Windows Platform](https://en.wikipedia.org/wiki/Universal_Windows_Platform) using a two step process since commit [c66b842](https://github.com/xianyi/OpenBLAS/commit/c66b842d66c5516e52804bf5a0544d18b1da1b44).
#### 1. Follow steps 1 and 2 above to build the Visual Studio solution files for Windows. This builds the helper executables which are required when building the OpenBLAS Visual Studio solution files for UWP in step 2.
#### 2. Remove the generated CMakeCache.txt and CMakeFiles directory from the OpenBLAS source directory and re-run CMake with the following options:
```
# do this to build UWP compatible solution files
cmake -G "Visual Studio 14 Win64" -DCMAKE_SYSTEM_NAME=WindowsStore -DCMAKE_SYSTEM_VERSION="10.0" -DCMAKE_SYSTEM_PROCESSOR=AMD64 -DVS_WINRT_COMPONENT=TRUE -DCMAKE_BUILD_TYPE=Release .
```
#### Build the solution with Visual Studio
This will build the OpenBLAS binaries with the required settings for use with UWP.
## 2. GNU (MinGW) ABI
The resulting library can be used in Visual Studio, but it can only be linked dynamically. This configuration has not been thoroughly tested and should be considered experimental.
### Incompatible x86 calling conventions
Due to incompatibilities between the calling conventions of MinGW and Visual Studio you will need to make the following modifications ( **32-bit only** ):
1. Use GCC 4.7.0 or newer. Older GCC versions (<4.7.0) have an ABI incompatibility with MSVC when returning aggregate structures larger than 8 bytes.
### Build OpenBLAS on Windows OS
1. Install the MinGW (GCC) compiler suite, either 32-bit (http://www.mingw.org/) or 64-bit (http://mingw-w64.sourceforge.net/). Be sure to install its gfortran package as well (unless you really want to build the BLAS part of OpenBLAS only) and check that gcc and gfortran are the same version &ndash; mixing compilers from different sources or release versions can lead to strange error messages in the linking stage. In addition, please install MSYS with MinGW.
1. Build OpenBLAS in the MSYS shell. Usually, you can just type "make". OpenBLAS will detect the compiler and CPU automatically.
1. After the build is complete, OpenBLAS will generate the static library "libopenblas.a" and the shared dll library "libopenblas.dll" in the folder. You can type "make PREFIX=/your/installation/path install" to install the library to a certain location.
**[Notice]** We suggest using the official MinGW or MinGW-w64 compilers. A user reported an `Unhandled exception` error when using a different compiler suite: https://groups.google.com/forum/#!topic/openblas-users/me2S4LkE55w
Note also that older versions of the alternative builds of mingw-w64 available through http://www.msys2.org may contain a defect that leads to a compilation failure accompanied by the error message
```
<command-line>:0:4: error: expected identifier or '(' before numeric constant
```
If you encounter this, please upgrade your msys2 setup or see https://github.com/xianyi/OpenBLAS/issues/1503 for a workaround.
### Generate import library (versions before 0.2.10)
1. First, you will need to have the `lib.exe` tool in the Visual Studio command prompt.
1. Open the command prompt and type `cd OPENBLAS_TOP_DIR/exports`, where OPENBLAS_TOP_DIR is the main folder of your OpenBLAS installation.
1. For a 32-bit library, type `lib /machine:i386 /def:libopenblas.def`. For 64-bit, type `lib /machine:X64 /def:libopenblas.def`.
1. This will generate the import library "libopenblas.lib" and the export library "libopenblas.exp" in OPENBLAS_TOP_DIR/exports. Although these two files share the same base name, they are entirely different files.
### Generate import library (version 0.2.10 and later)
1. OpenBLAS already generates the import library "libopenblas.dll.a" for "libopenblas.dll" during the build.
### Generate Windows native PDB files from a gcc/gfortran build
A tool to do so is available at https://github.com/rainers/cv2pdb
### Use OpenBLAS .dll library in Visual Studio
1. Copy the import library (before 0.2.10: "OPENBLAS_TOP_DIR/exports/libopenblas.lib", 0.2.10 and later: "OPENBLAS_TOP_DIR/libopenblas.dll.a") and the .dll library "libopenblas.dll" into the same folder, namely the folder of the project that is going to use the BLAS library. You may need to add libopenblas.dll.a to the linker input list: properties->Linker->Input.
1. Please follow the documentation about using third-party .dll libraries in MS Visual Studio 2008 or 2010. Make sure to link against a library for the correct architecture. For example, you may receive an error such as "The application was unable to start correctly (0xc000007b)" which typically indicates a mismatch between 32/64-bit libraries.
**[Notice]** If you need CBLAS, you should include cblas.h in /your/installation/path/include in Visual Studio. Please read [this page](http://github.com/xianyi/OpenBLAS/issues/95).
### Limitations
* Both static and dynamic linking are supported with MinGW. With Visual Studio, however, only dynamic linking is supported and so you should use the import library.
* Debugging from Visual Studio does not work because MinGW and Visual Studio have incompatible formats for debug information (PDB vs. DWARF/STABS). You should either debug with GDB on the command-line or with a visual frontend, for instance [Eclipse](http://www.eclipse.org/cdt/) or [Qt Creator](http://qt.nokia.com/products/developer-tools/).

@ -0,0 +1,37 @@
Cortex-M is a widely used family of microcontroller cores, present in a variety of industrial and consumer electronics.
A common Cortex-M based microcontroller series is the STM32F4xx. Here, we give instructions for building for
the STM32F4xx.
First, install the embedded Arm GCC compiler (`arm-none-eabi-gcc`) from the Arm website. Then, create the following toolchain file and build as follows.
```cmake
# cmake .. -G Ninja -DCMAKE_C_COMPILER=arm-none-eabi-gcc -DCMAKE_TOOLCHAIN_FILE:PATH="toolchain.cmake" -DNOFORTRAN=1 -DTARGET=ARMV5 -DEMBEDDED=1
set(CMAKE_SYSTEM_NAME Generic)
set(CMAKE_SYSTEM_PROCESSOR arm)
set(CMAKE_C_COMPILER "arm-none-eabi-gcc.exe")
set(CMAKE_CXX_COMPILER "arm-none-eabi-g++.exe")
set(CMAKE_EXE_LINKER_FLAGS "--specs=nosys.specs" CACHE INTERNAL "")
set(CMAKE_FIND_ROOT_PATH_MODE_PROGRAM NEVER)
set(CMAKE_FIND_ROOT_PATH_MODE_LIBRARY ONLY)
set(CMAKE_FIND_ROOT_PATH_MODE_INCLUDE ONLY)
set(CMAKE_FIND_ROOT_PATH_MODE_PACKAGE ONLY)
```
In your embedded application, the following functions need to be provided for OpenBLAS to work correctly:
```C
void free(void* ptr);
void* malloc(size_t size);
```
Note: if you are developing for an embedded platform, it is your responsibility to make sure that the device
has sufficient memory for malloc calls. [Libmemory][1] provides one implementation of malloc for embedded
platforms.
[1]: https://github.com/embeddedartistry/libmemory
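To illustrate what such an allocator might look like, here is a minimal sketch of a fixed-pool "bump" allocator. It is only one possible approach and rests on several assumptions: the names `pool_malloc`, `pool_free` and `POOL_BYTES` are hypothetical (in a real firmware image you would expose the functions as `malloc` and `free`), the pool size is an arbitrary placeholder, and memory is never reclaimed, which is only acceptable if the total amount requested over the lifetime of the program fits in the pool.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pool size; choose it to fit the RAM of your device. */
#define POOL_BYTES (16u * 1024u)
#define POOL_ALIGN 8u

static _Alignas(8) uint8_t pool[POOL_BYTES];
static size_t pool_used = 0;

/* Hand out the next 8-byte aligned chunk of the pool,
   or NULL when the request does not fit any more. */
void *pool_malloc(size_t size)
{
    size_t rounded = (size + (POOL_ALIGN - 1u)) & ~(size_t)(POOL_ALIGN - 1u);
    if (rounded < size || rounded > POOL_BYTES - pool_used)
        return NULL; /* overflow while rounding, or pool exhausted */
    void *ptr = &pool[pool_used];
    pool_used += rounded;
    return ptr;
}

/* A bump allocator cannot reclaim individual blocks, so this is a no-op. */
void pool_free(void *ptr)
{
    (void)ptr;
}
```

A production system would more likely use a real heap implementation such as the Libmemory project linked above; the sketch is only meant to show the two entry points and the failure case to plan for.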

Installation-Guide.md Normal file

@ -0,0 +1,71 @@
## Quick Installation
[Precompiled packages](https://github.com/xianyi/OpenBLAS/wiki/Precompiled-installation-packages) have recently become available for a number of platforms through their normal installation procedures, so for users of desktop devices at least, the instructions below are mostly relevant when you want to try the most recent development snapshot from git.
### Linux
Just type `make` to compile the library.
Notes
* OpenBLAS doesn't support g77. Please use gfortran or other Fortran compilers. e.g. `make FC=gfortran`.
* When building in an emulator (KVM,QEMU etc.) make sure that the combination of CPU features exposed to
the virtual environment matches that of an existing CPU to allow detection of the cpu model to succeed.
(With qemu, this can be done by passing `-cpu host` or a supported model name at invocation)
### Windows
See [[How-to-use-OpenBLAS-in-Microsoft-Visual-Studio]].
The precompiled binaries available with each release (in https://github.com/xianyi/OpenBLAS/releases) are
created with MinGW (as described in Section 2 of the wiki page mentioned above) using an option list of
"NUM_THREADS=64 TARGET=GENERIC DYNAMIC_ARCH=1 DYNAMIC_OLDER=1 CONSISTENT_FPCSR=1" - they should work on
any x86 or x86_64 computer. The zip archive contains the include files, static and dll libraries as well
as configuration files for getting them found via CMAKE or pkgconfig - just create a suitable folder for
your OpenBLAS installation and unzip it there. (Note that you will need to edit the provided openblas.pc
and OpenBLASConfig.cmake to reflect the installation path on your computer, as distributed they have "win"
or "win64" reflecting the local paths on the system they were built on). Some programs will expect the DLL
name to be lapack.dll, blas.dll, or (in the case of the statistics package "R") even Rblas.dll to act as a
direct replacement for whatever other implementation of BLAS and LAPACK they use by default. Just copy the
openblas.dll to the desired name(s).
Note that the provided binaries are built with INTERFACE64=0, meaning they use standard 32bit integers for
array indexing and the like (as is the default for most if not all BLAS and LAPACK implementations). If the
documentation of whatever program you are using with OpenBLAS mentions 64bit integers (INTERFACE64=1) for
addressing huge matrix sizes, you will need to build OpenBLAS from source (or open an issue ticket to make
the demand for such a precompiled build known).
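As a rough way to judge whether the 32-bit integer interface is sufficient for your problem, you can check the dimension arguments you intend to pass against `INT32_MAX`. The helper below is a hypothetical illustration, not part of the OpenBLAS API, and it assumes that the only concern is whether each integer argument fits into 32 bits; the documentation of the calling program remains the authority on which interface it expects.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true when all three dimension arguments fit into the 32-bit
   integers used by an INTERFACE64=0 build. If it returns false, a build
   with INTERFACE64=1 would be needed to pass these values.
   Hypothetical helper, not an OpenBLAS function. */
bool fits_32bit_interface(int64_t rows, int64_t cols, int64_t leading_dim)
{
    return rows <= INT32_MAX && cols <= INT32_MAX && leading_dim <= INT32_MAX;
}
```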
### Mac OSX
If your CPU is Sandy Bridge, please use Clang version 3.1 or later. Clang 3.0 generates incorrect AVX binary code for OpenBLAS.
#### Build on Apple M1
* without Fortran compiler (cannot build LAPACK)
```
$ make CC=cc NOFORTRAN=1
```
* with Fortran compiler (you could `brew install gfortran`) https://github.com/xianyi/OpenBLAS/issues/3032
```
$ export MACOSX_DEPLOYMENT_TARGET=11.0
$ make CC=cc FC=gfortran
```
### FreeBSD
You will need to install the following tools from the FreeBSD ports tree:
* lang/gcc [1]
* lang/perl5.12
* ftp/curl
* devel/gmake
* devel/patch
To compile run the command:
$ gmake CC=gcc46 FC=gfortran46
Note that you need to build with GNU make and manually specify the compiler; otherwise gcc 4.2 from the base system would be used.
[1]: [Removal of Fortran from the FreeBSD base system](http://www.bsdunix.ch/serendipity/index.php?/archives/345-Removal-of-Fortran-from-the-FreeBSD-base-system.html)
### Android
See [this page](https://github.com/xianyi/OpenBLAS/wiki/How-to-build-OpenBLAS-for-Android)
### MIPS
See [this page](https://github.com/xianyi/OpenBLAS/wiki/Faq#mips)

Machine-List.md Normal file

@ -0,0 +1,9 @@
We deployed the following machines to support OpenBLAS development. Interested developers can apply for remote access by sending mail to the [OpenBLAS-dev mailing list](https://groups.google.com/forum/#!forum/openblas-dev).
* 2-way Intel Xeon E5-2680 (2.7GHz, Intel Sandy Bridge), 32GB memory, 1TB hard disk, Ubuntu 14.04 64-bit.
* 1-way Intel Core i7-4770 (3.4GHz, Intel Haswell), 8GB memory, 512GB hard disk, Ubuntu 14.04 64-bit.
* 1-way AMD FX-8320 (3.5GHz, AMD Piledriver), 8GB memory, 512GB hard disk, Ubuntu 14.04 64-bit.
* [Odroid-U2 board](http://www.hardkernel.com/main/products/prdt_info.php?g_code=G135341370451). 1-way Exynos4412 Prime 1.7Ghz ARM Cortex-A9 Quad Cores, 2GB memory, 16GB SD card, Ubuntu 14.04 32-bit.
* AMD A10-7850K APU (Kaveri, Steamroller+GCN GPU), 8GB memory, 1TB hard disk, Ubuntu 14.04.1 64-bit
* NVIDIA TK1 (ARM Cortex-A15), ARM 32-bit, Ubuntu
* NVIDIA TX1 (ARM Cortex-A57), AArch64, Ubuntu

Mailing-List.md Normal file

@ -0,0 +1,15 @@
<hr noshade="noshade">
<center>
<a href="Home"> [Home]</a>
<a href="Document"> [Document]</a>
<a href="faq"> [FAQ]</a>
<a href="publications"> [Publications]</a>
<a href="download"> [Download]</a>
<a href="Mailing-List">[Mailing List]</a>
<a href="Donation">[Donation]</a>
</center>
<hr noshade="noshade">
OpenBLAS users: https://groups.google.com/forum/#!forum/openblas-users
OpenBLAS developers: https://groups.google.com/forum/#!forum/openblas-dev

OpenBLAS-Extensions.md Normal file

@ -0,0 +1,28 @@
* BLAS-like extensions
| Routine | Data Types | Description |
| ------------- |:------------- | :---------------|
| ?axpby | s,d,c,z | like axpy with a multiplier for y |
| ?gemm3m | c,z | gemm3m |
| ?imatcopy | s,d,c,z | in-place transposition/copying |
| ?omatcopy | s,d,c,z | out-of-place transposition/copying |
| ?geadd | s,d,c,z | matrix add |
| ?gemmt | s,d,c,z | gemm but only a triangular part updated|
* BLAS-like and Conversion functions for bfloat16 (available when OpenBLAS was compiled with BUILD_BFLOAT16=1)
* `void cblas_sbstobf16` converts a float array to an array of bfloat16 values by rounding
* `void cblas_sbdtobf16` converts a double array to an array of bfloat16 values by rounding
* `void cblas_sbf16tos` converts a bfloat16 array to an array of floats
* `void cblas_dbf16tod` converts a bfloat16 array to an array of doubles
* `float cblas_sbdot` computes the dot product of two bfloat16 arrays
* `void cblas_sbgemv` performs the matrix-vector operations of GEMV with the input matrix and X vector as bfloat16
* `void cblas_sbgemm` performs the matrix-matrix operations of GEMM with both input arrays containing bfloat16
* Utility functions
* openblas_get_num_threads
* openblas_set_num_threads
* `int openblas_get_num_procs(void)` returns the number of processors available on the system (may include "hyperthreading cores")
* `int openblas_get_parallel(void)` returns 0 for sequential use, 1 for platform-based threading and 2 for OpenMP-based threading
* `char * openblas_get_config()` returns the options OpenBLAS was built with, something like `NO_LAPACKE DYNAMIC_ARCH NO_AFFINITY Haswell`
* `int openblas_set_affinity(int thread_index, size_t cpusetsize, cpu_set_t *cpuset)` sets the cpu affinity mask of the given thread to the provided cpuset. (Only available under Linux, with semantics identical to pthread_setaffinity_np)
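To make the "axpy with a multiplier for y" entry in the table above concrete, the following plain-C sketch spells out the operation for the double precision case, y := alpha*x + beta*y. It is a reference illustration only: `ref_daxpby` is a hypothetical name, the loop handles positive strides only, and it is not how OpenBLAS implements the routine.

```c
#include <stddef.h>

/* Reference semantics of ?axpby for doubles: y := alpha*x + beta*y.
   incx and incy are the (positive) strides between consecutive elements. */
void ref_daxpby(size_t n, double alpha, const double *x, size_t incx,
                double beta, double *y, size_t incy)
{
    for (size_t i = 0; i < n; i++)
        y[i * incy] = alpha * x[i * incx] + beta * y[i * incy];
}
```

With `beta` set to 1 this reduces to the standard axpy operation.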

@ -0,0 +1,84 @@
# Precompiled installation packages
_note this list is not comprehensive, is not meant to validate nor endorse a particular third-party build over others and may not always lead to the newest version_
## Multiple operating systems
Binaries designed to be compatible with [Julia](https://github.com/JuliaLang/julia) are regularly provided in
https://github.com/staticfloat/OpenBLASBuilder/releases . Note that their 64bit builds with INTERFACE64=1 have
`_64` appended to the function symbol names, so your code needs to refer to e.g. `gemm_64_` rather than `gemm_`.
The [Conda-Forge](https://github.com/conda-forge) project maintains packages for the conda package manager at
https://github.com/conda-forge/openblas-feedstock
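For code that has to link against either the plain or the `_64`-suffixed symbols mentioned above, one possible approach is to route every BLAS name through a small macro. The sketch below assumes macro names of our own choosing (`USE_ILP64_SUFFIX`, `BLAS_FUNC`, `BLAS_FUNC_NAME`); none of them are defined by OpenBLAS.

```c
/* Hypothetical switch: set to 1 when linking against a build whose
   INTERFACE64=1 symbols carry the _64 suffix, to 0 otherwise. */
#define USE_ILP64_SUFFIX 1

#if USE_ILP64_SUFFIX
#define BLAS_FUNC(name) name##_64_
#else
#define BLAS_FUNC(name) name##_
#endif

/* Helpers that turn the resulting symbol into a string for inspection. */
#define BLAS_STR_(x) #x
#define BLAS_STR(x) BLAS_STR_(x)
#define BLAS_FUNC_NAME(name) BLAS_STR(BLAS_FUNC(name))
```

A declaration would then be written as `extern void BLAS_FUNC(dgemm)(...)`, and `BLAS_FUNC_NAME(dgemm)` expands to the string `"dgemm_64_"` with the setting shown. Remember that the integer arguments of such a build are 64 bits wide as well.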
## FreeBSD
> pkg install openblas
see https://www.freebsd.org/ports/index.html
## Linux
### Debian/Ubuntu/Mint/Kali
The OpenBLAS package is available in the default repositories and can act as the default BLAS on the system.
Example installation commands:
```
$ sudo apt update
$ apt search openblas
$ sudo apt install libopenblas-dev
$ sudo update-alternatives --config libblas.so.3
```
Alternatively, if the distributor's package proves unsatisfactory, you may try the latest version of OpenBLAS by [following the guide in the OpenBLAS FAQ](https://github.com/xianyi/OpenBLAS/wiki/faq#debianlts).
### openSuSE/SLE:
Recent OpenSUSE versions include OpenBLAS in the default repositories and also permit OpenBLAS to act as a replacement for the system-wide BLAS.
Example installation commands:
```
$ sudo zypper ref
$ zypper se openblas
$ sudo zypper in openblas-devel
$ sudo update-alternatives --config libblas.so.3
```
Should you be using an older OpenSUSE or SLE release that provides no OpenBLAS, you can attach an optional or experimental openSUSE repository as a new package source to acquire a recent build of OpenBLAS, following the [instructions on the openSUSE software site](https://software.opensuse.org/package/openblas).
### Fedora/CentOS/RHEL
Fedora provides OpenBLAS in default installation repositories.
To install it, try the following:
```
$ dnf search openblas
$ dnf install openblas-devel
```
For CentOS/RHEL/Scientific Linux, packages are provided via the [Fedora EPEL repository](https://fedoraproject.org/wiki/EPEL).
After adding the repository and its keys, installation is straightforward:
```
$ yum search openblas
$ yum install openblas-devel
```
No alternatives mechanism is provided for BLAS, and packages in the system repositories are linked against the NetLib BLAS or ATLAS BLAS libraries. You may wish to re-package RPMs to use OpenBLAS instead, [as described here](https://fedoraproject.org/wiki/How_to_create_an_RPM_package).
### Mageia
Mageia offers ATLAS and NetLIB LAPACK in base repositories.
You can build your own OpenBLAS replacement, and once installed in /opt
TODO: populate /usr/lib64 /usr/include accurately to replicate netlib with update-alternatives
### Arch/Manjaro/Antergos
```
$ sudo pacman -S openblas
```
## Solaris (via pkgsrc)
## OSX
https://www.macports.org/ports.php?by=name&substr=openblas
`brew install openblas`
or using the conda package manager from
https://github.com/conda-forge/miniforge#download
(which also has packages for the new M1 cpu)
`conda install openblas`
## Windows
http://sourceforge.net/projects/openblas/files
https://www.nuget.org/packages?q=openblas

Publications.md Normal file

@ -0,0 +1,19 @@
### 2013
* Wang Qian, Zhang Xianyi, Zhang Yunquan, Qing Yi, **AUGEM: Automatically Generate High Performance Dense Linear Algebra Kernels on x86 CPUs**, In the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'13), Denver CO, November 2013. [[pdf](http://xianyi.github.io/paper/augem_SC13.pdf)]
### 2012
* Zhang Xianyi, Wang Qian, Zhang Yunquan, **Model-driven Level 3 BLAS Performance Optimization on Loongson 3A Processor**, 2012 IEEE 18th International Conference on Parallel and Distributed Systems (ICPADS), 17-19 Dec. 2012.

@ -0,0 +1,13 @@
## Software
* <a href='http://julialang.org/'>Julia - a high-level, high-performance dynamic programming language for technical computing</a><br />
* Ceemple v1.0.3 (C++ technical computing environment), including OpenBLAS, Qt, Boost, OpenCV and others. The only solution with immediate-recompilation of C++ code. Available from <a href='http://www.ceemple.com'>Ceemple C++ Technical Computing</a>.
* [netlib-java](https://github.com/fommil/netlib-java) and various upstream libraries, allowing OpenBLAS to be used from languages on the Java Virtual Machine.
## Academia Users
## Industry Users
## HPC Centers Deployed OpenBLAS
## Others

User-Manual.md Normal file

@ -0,0 +1,187 @@
##### Table of Contents
[Compile the library](#compile-the-library)
[Link the library](#link-the-library)
[Code examples](#code-examples)
[Troubleshooting](#troubleshooting)
[BLAS reference manual](#blas-reference-manual)
## Compile the library
### Normal compile
* type `make` to detect the CPU automatically.
or
* type `make TARGET=xxx` to set target CPU, e.g. `make TARGET=NEHALEM`. The full target list is in file TargetList.txt.
### Cross compile
Please set `CC` and `FC` to the cross toolchain compilers. Then, set `HOSTCC` to your host C compiler. Finally, set `TARGET` explicitly.
Examples:
* On an x86 box, compile the library for ARM Cortex-A9 Linux.
Install only the gnueabihf versions of the toolchain. Please check https://github.com/xianyi/OpenBLAS/issues/936#issuecomment-237596847
make CC=arm-linux-gnueabihf-gcc FC=arm-linux-gnueabihf-gfortran HOSTCC=gcc TARGET=CORTEXA9
* On an x86 box, compile the library for the Loongson 3A CPU.
```
make BINARY=64 CC=mips64el-unknown-linux-gnu-gcc FC=mips64el-unknown-linux-gnu-gfortran HOSTCC=gcc TARGET=LOONGSON3A
```
* On an x86 box, compile the library for the Loongson 3A CPU with the loongcc (based on Open64) compiler.
```
make CC=loongcc FC=loongf95 HOSTCC=gcc TARGET=LOONGSON3A CROSS=1 CROSS_SUFFIX=mips64el-st-linux-gnu- NO_LAPACKE=1 NO_SHARED=1 BINARY=32
```
### Debug version
make DEBUG=1
### Install to the directory (optional)
Example:
make install PREFIX=your_installation_directory
The default directory is /opt/OpenBLAS. Note that any flags passed to `make` during the build should also be passed to `make install` to avoid install errors, such as some headers not being copied over correctly.
For more information, please read [Installation Guide](Installation-Guide).
## Link the library
* Link shared library
```
gcc -o test test.c -I/your_path/OpenBLAS/include/ -L/your_path/OpenBLAS/lib -Wl,-rpath,/your_path/OpenBLAS/lib -lopenblas
```
The `-Wl,-rpath,/your_path/OpenBLAS/lib` option to linker can be omitted if you ran `ldconfig` to update linker cache, put `/your_path/OpenBLAS/lib` in `/etc/ld.so.conf` or a file in `/etc/ld.so.conf.d`, or installed OpenBLAS in a location part of `ld.so` default search path. Otherwise, linking at runtime will fail.
If the library is multithreaded, please add `-lpthread`. If the library contains LAPACK functions, please add `-lgfortran` or other Fortran libs, although if you only make calls to LAPACKE routines, i.e. your code has `#include "lapacke.h"` and makes calls to methods like `LAPACKE_dgeqrf`, `-lgfortran` is not needed.
* Link static library
```
gcc -o test test.c /your/path/libopenblas.a
```
You can download `test.c` from https://gist.github.com/xianyi/5780018
## Code examples
### Call CBLAS interface
This example shows calling cblas_dgemm in C. https://gist.github.com/xianyi/6930656
```c
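#include <cblas.h>
#include <stdio.h>
int main(void)
{
    int i = 0;
    /* A and B are 3x2 matrices stored in column-major order */
    double A[6] = {1.0, 2.0, 1.0, -3.0, 4.0, -1.0};
    double B[6] = {1.0, 2.0, 1.0, -3.0, 4.0, -1.0};
    double C[9] = {.5, .5, .5, .5, .5, .5, .5, .5, .5};
    /* C := 1.0 * A * B^T + 2.0 * C, with m = 3, n = 3, k = 2 */
    cblas_dgemm(CblasColMajor, CblasNoTrans, CblasTrans, 3, 3, 2, 1, A, 3, B, 3, 2, C, 3);
    for (i = 0; i < 9; i++)
        printf("%lf ", C[i]);
    printf("\n");
    return 0;
}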
```
```
gcc -o test_cblas_open test_cblas_dgemm.c -I /your_path/OpenBLAS/include/ -L/your_path/OpenBLAS/lib -lopenblas -lpthread -lgfortran
```
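To check the result of the call above without linking any BLAS library, the same operation can be written as a plain triple loop. This is a reference sketch under stated assumptions: `ref_dgemm_nt` is a hypothetical name, it covers only the "A not transposed, B transposed, column-major" case used in the example, and it makes no attempt at performance.

```c
#include <stddef.h>

/* Plain-C reference for C := alpha*A*B^T + beta*C with column-major storage.
   A is m x k (leading dimension lda), B is n x k (leading dimension ldb),
   C is m x n (leading dimension ldc). */
void ref_dgemm_nt(size_t m, size_t n, size_t k, double alpha,
                  const double *A, size_t lda, const double *B, size_t ldb,
                  double beta, double *C, size_t ldc)
{
    for (size_t j = 0; j < n; j++) {
        for (size_t i = 0; i < m; i++) {
            double sum = 0.0;
            for (size_t p = 0; p < k; p++)
                sum += A[i + p * lda] * B[j + p * ldb]; /* A(i,p) * B(j,p) */
            C[i + j * ldc] = alpha * sum + beta * C[i + j * ldc];
        }
    }
}
```

Called with the arrays from the example (m = n = 3, k = 2, alpha = 1, beta = 2, all leading dimensions 3), it leaves `11 -9 5 -9 21 -1 5 -1 3` in `C`, which is what the `cblas_dgemm` call should print as well.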
### Call BLAS Fortran interface
This example shows calling dgemm Fortran interface in C. https://gist.github.com/xianyi/5780018
```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <time.h>
/* Fortran BLAS symbol: every argument is passed by reference. */
extern void dgemm_(char*, char*, int*, int*, int*, double*, double*, int*, double*, int*, double*, double*, int*);
int main(int argc, char* argv[])
{
    int i;
    printf("test!\n");
    if (argc < 4) {
        printf("Input Error\n");
        return 1;
    }
    int m = atoi(argv[1]);
    int n = atoi(argv[2]);
    int k = atoi(argv[3]);
    int sizeofa = m * k;
    int sizeofb = k * n;
    int sizeofc = m * n;
    char ta = 'N';
    char tb = 'N';
    double alpha = 1.2;
    double beta = 0.001;
    struct timeval start, finish;
    double duration;
    double* A = (double*)malloc(sizeof(double) * sizeofa);
    double* B = (double*)malloc(sizeof(double) * sizeofb);
    double* C = (double*)malloc(sizeof(double) * sizeofc);
    if (A == NULL || B == NULL || C == NULL) {
        printf("Out of memory\n");
        return 1;
    }
    srand((unsigned)time(NULL));
    for (i = 0; i < sizeofa; i++)
        A[i] = i % 3 + 1; //(rand()%100)/10.0;
    for (i = 0; i < sizeofb; i++)
        B[i] = i % 3 + 1; //(rand()%100)/10.0;
    for (i = 0; i < sizeofc; i++)
        C[i] = i % 3 + 1; //(rand()%100)/10.0;
    printf("m=%d,n=%d,k=%d,alpha=%lf,beta=%lf,sizeofc=%d\n", m, n, k, alpha, beta, sizeofc);
    /* Time a single call: C := alpha*A*B + beta*C, column-major */
    gettimeofday(&start, NULL);
    dgemm_(&ta, &tb, &m, &n, &k, &alpha, A, &m, B, &k, &beta, C, &m);
    gettimeofday(&finish, NULL);
    duration = ((double)(finish.tv_sec - start.tv_sec) * 1000000 + (double)(finish.tv_usec - start.tv_usec)) / 1000000;
    /* 2*m*n*k floating-point operations, reported in MFLOPS */
    double mflops = 2.0 * m * n * k;
    mflops = mflops / duration * 1.0e-6;
    /* Append the timing to a log file */
    FILE *fp = fopen("timeDGEMM.txt", "a");
    if (fp != NULL) {
        fprintf(fp, "%dx%dx%d\t%lf s\t%lf MFLOPS\n", m, n, k, duration, mflops);
        fclose(fp);
    }
    free(A);
    free(B);
    free(C);
    return 0;
}
```
```
gcc -o time_dgemm time_dgemm.c /your/path/libopenblas.a
./time_dgemm <m> <n> <k>
```
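The performance figure written to `timeDGEMM.txt` follows the usual convention of counting 2·m·n·k floating-point operations for a matrix multiplication. Pulled out into a small helper (the name `gemm_mflops` is our own, not an OpenBLAS function), the calculation looks like this:

```c
/* MFLOPS for an m x n x k matrix multiplication that took `seconds`,
   counting one multiply and one add per inner-loop step (2*m*n*k). */
double gemm_mflops(int m, int n, int k, double seconds)
{
    return 2.0 * m * n * k / seconds * 1.0e-6;
}
```

For instance, a 1000x1000x1000 multiplication that takes 2 seconds corresponds to roughly 1000 MFLOPS. Very short runs give unreliable numbers because of the limited timer resolution, so larger sizes or repeated calls are advisable for benchmarking.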
## Troubleshooting
* Please read the [Faq](https://github.com/xianyi/OpenBLAS/wiki/Faq) first.
* Please use gcc version 4.6 or later to compile Sandy Bridge AVX kernels on Linux/MinGW/BSD.
* Please use Clang version 3.1 or later to compile the library on the Sandy Bridge microarchitecture. Clang 3.0 generates incorrect AVX binary code.
* The number of CPUs/cores should be less than or equal to 256. On Linux x86_64 (amd64), there is experimental support for up to 1024 CPUs/cores and 128 NUMA nodes if you build the library with BIGNUMA=1.
* OpenBLAS does not set processor affinity by default. On Linux, you can enable processor affinity by commenting out the line NO_AFFINITY=1 in Makefile.rule. Note that this may cause [a conflict with R parallel](https://stat.ethz.ch/pipermail/r-sig-hpc/2012-April/001348.html).
* On Loongson 3A, `make test` may fail because of a pthread_create error (error code EAGAIN). However, the same test case runs fine when started directly from the shell.
## BLAS reference manual
If you want to understand every BLAS function and definition, please read [Intel MKL reference manual](https://software.intel.com/en-us/intel-mkl/documentation) or [netlib.org](http://netlib.org/blas/)
Here are [OpenBLAS extension functions](OpenBLAS-Extensions)

@ -0,0 +1,103 @@
**This page was written by someone who is not a developer of OpenBLAS and should not be considered official documentation of the build system. To get the full picture, it is best to read the Makefiles and understand them yourself.**
### Makefile dep graph
```
Makefile
|
|----- Makefile.system # !!! this is included by many of the Makefiles in the subdirectories !!!
| |
| |===== Makefile.prebuild # This is triggered (not included) once by Makefile.system
| | | # and runs before any of the actual library code is built.
| | | # (builds and runs the "getarch" tool for cpu identification,
| | | # runs the compiler detection scripts c_check and f_check)
| | |
| | ----- (Makefile.conf) [ either this or Makefile_kernel.conf is generated ]
| | | { Makefile.system#L243 }
| | ----- (Makefile_kernel.conf) [ temporary Makefile.conf during DYNAMIC_ARCH builds ]
| |
| |----- Makefile.rule # defaults for build options that can be given on the make command line
| |
| |----- Makefile.$(ARCH) # architecture-specific compiler options and OpenBLAS buffer size values
|
|~~~~~ exports/
|
|~~~~~ test/
|
|~~~~~ utest/
|
|~~~~~ ctest/
|
|~~~~~ cpp_thread_test/
|
|~~~~~ kernel/
|
|~~~~~ ${SUBDIRS}
|
|~~~~~ ${BLASDIRS}
|
|~~~~~ ${NETLIB_LAPACK_DIR}{,/timing,/testing/{EIG,LIN}}
|
|~~~~~ relapack/
```
### Important Variables
Most of the tunable variables are found in [Makefile.rule](https://github.com/xianyi/OpenBLAS/blob/develop/Makefile.rule), along with their detailed descriptions.<br/>
Most of the variables are detected automatically in [Makefile.prebuild](https://github.com/xianyi/OpenBLAS/blob/develop/Makefile.prebuild), if they are not set in the environment.
#### CPU related
```
ARCH - Target architecture (eg. x86_64)
TARGET - Target CPU architecture, in case of DYNAMIC_ARCH=1 means library will not be usable on less capable CPUs
TARGET_CORE - TARGET_CORE will override TARGET internally during each cpu-specific cycle of the build for DYNAMIC_ARCH
DYNAMIC_ARCH - For building library for multiple TARGETs (does not lose any optimizations, but increases library size)
DYNAMIC_LIST - optional user-provided subset of the DYNAMIC_CORE list in Makefile.system
```
#### Toolchain related
```
CC - TARGET C compiler used for compilation (can be a cross-compiler)
FC - TARGET Fortran compiler used for compilation (can be a cross-compiler; set NOFORTRAN=1 if the cross-toolchain has no Fortran compiler)
AR, AS, LD, RANLIB - TARGET toolchain helpers used for compilation (can be from a cross-toolchain)
HOSTCC - C compiler of the build machine, needed to create proper config files for the target architecture
HOST_CFLAGS - Flags for the build machine compiler
```
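A cross-compilation could be set up roughly as follows. This is only a sketch: the `aarch64-linux-gnu-` toolchain prefix and the `CROSS_PREFIX` helper variable are assumptions made for the example, and TARGET=ARMV8 stands in for whichever target matches your hardware. As before, the Makefile.rule check only keeps the snippet harmless outside the source tree.

```shell
# Hypothetical cross-toolchain prefix; adjust to the toolchain actually installed.
CROSS_PREFIX=aarch64-linux-gnu-

# HOSTCC is the native compiler of the build machine, CC and FC target the ARM64 system.
# If the cross-toolchain has no Fortran compiler, replace the FC entry with NOFORTRAN=1.
set -- TARGET=ARMV8 BINARY=64 HOSTCC=gcc \
       "CC=${CROSS_PREFIX}gcc" "FC=${CROSS_PREFIX}gfortran"

if [ -f Makefile.rule ]; then
    # Top of the OpenBLAS source tree: start the build.
    make "$@"
else
    # Anywhere else: only show the command that would be run.
    printf 'would run: make'
    printf ' "%s"' "$@"
    printf '\n'
fi
```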
#### Library related
```
BINARY - 32/64 bit library
BUILD_SHARED - Create shared library
BUILD_STATIC - Create static library
QUAD_PRECISION - enable support for IEEE quad precision [ largely unimplemented leftover from GotoBLAS, do not use ]
EXPRECISION - Obsolete option to use float80 of SSE on BSD-like systems
INTERFACE64 - Build with 64-bit integer representations to support large array index values [ incompatible with standard API ]
BUILD_SINGLE - build the single-precision real functions of BLAS [and optionally LAPACK]
BUILD_DOUBLE - build the double-precision real functions
BUILD_COMPLEX - build the single-precision complex functions
BUILD_COMPLEX16 - build the double-precision complex functions
(all four types are included in the build by default when none was specifically selected)
BUILD_BFLOAT16 - build the "half precision brainfloat" real functions
USE_THREAD - Use a multithreading backend (defaults to pthreads)
USE_LOCKING - Implement locking for thread safety even when USE_THREAD is not set (so that the single-threaded library can
              safely be called from multithreaded programs)
USE_OPENMP - Use OpenMP as the multithreading backend
NUM_THREADS - Define this to the maximum number of parallel threads you expect to need (defaults to the number of cores in the build CPU)
NUM_PARALLEL - Define this to the number of OpenMP instances that your code may use for parallel calls into OpenBLAS (default 1, see below)
```
OpenBLAS uses a fixed set of memory buffers internally, used for communicating and collecting partial results from individual threads.
For efficiency, the management array structure for these buffers is sized at build time, which makes it necessary to know in advance how
many threads need to be supported on the target system(s).

With OpenMP, there is an additional level of complexity, as calls may originate from a parallel region in the calling program. If OpenBLAS is called from a single parallel region, it automatically runs single-threaded to avoid overloading the system by fanning out its own set of threads.
If an OpenMP program makes multiple calls from independent regions or instances in parallel, this default serialization is not
sufficient, as the additional caller(s) would compete for the original set of buffers already in use by the first call.
So if multiple OpenMP runtimes call into OpenBLAS at the same time, only one of them can make progress while the rest spin-wait for the one available buffer. Setting NUM_PARALLEL to the upper bound on the number of OpenMP runtimes that you can have in a process ensures that there are a sufficient number of buffer sets available.
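
The buffer sizing described above translates into build options along these lines. The numbers are purely illustrative and assume a workload with at most 4 OpenMP runtimes calling into OpenBLAS concurrently and at most 16 threads; the Makefile.rule check is a guard added for this example so that it only prints the command when run outside the source tree.

```shell
# Illustrative values: OpenMP backend, up to 16 threads,
# up to 4 OpenMP instances making parallel calls into OpenBLAS.
set -- USE_OPENMP=1 NUM_THREADS=16 NUM_PARALLEL=4

if [ -f Makefile.rule ]; then
    # Top of the OpenBLAS source tree: start the build.
    make "$@"
else
    # Anywhere else: only show the command that would be run.
    printf 'would run: make'
    printf ' "%s"' "$@"
    printf '\n'
fi
```

Because both values are fixed at build time, it is safer to pick upper bounds for the target system than the values of the build machine.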