Martin Kroeker 8617d75548
Revert "Avoid taking root of negative number in symv_thread.c"
2019-10-01 23:50:41 +02:00
benchmark sgemm pipeline improved, zgemm rewritten without inner packs, ABI lxvx v20 fixed with vs52 2019-06-04 07:11:30 +00:00
cmake Fix C compiler handling and BINARY=32 mode in CMAKE builds (#2248) 2019-09-10 08:27:06 +02:00
cpp_thread_test upload thread safety test folder 2019-06-01 21:30:06 +02:00
ctest Unset special make variables in ctest Makefile as well 2019-07-24 15:26:09 +02:00
driver Revert "Avoid taking root of negative number in symv_thread.c" 2019-10-01 23:50:41 +02:00
exports Change install_name on osx to match linux 2019-07-08 17:14:35 -05:00
interface more bugfix 2019-09-10 17:30:57 -04:00
kernel Merge pull request #2271 from quickwritereader/strmm_fix 2019-09-29 13:53:45 +02:00
lapack Merge pull request #2252 from thrasibule/trtrs 2019-09-12 21:45:47 +02:00
lapack-netlib turn on optimized code 2019-09-08 11:14:49 -04:00
reference Revert reference/ fixes 2019-05-04 15:01:29 -04:00
relapack Merge pull request #2099 from martin-frbg/rela-gbtrf 2019-04-29 09:25:19 +02:00
test Misc. typo fixes 2019-04-29 17:03:56 -04:00
utest Make the new DGEMM regression test properly depend on CBLAS and LAPACKE 2019-08-13 22:29:48 +02:00
.drone.yml Fix typo 2019-05-12 15:26:18 -05:00
.gitignore Don't change timestamps 2017-08-01 13:43:59 +05:30
.travis.yml Restore ppc64 CI job and remove the travis_wait that caused the problem with it 2019-09-20 10:29:35 +02:00
BACKERS.md Added backers. 2013-09-05 15:39:45 +08:00
CMakeLists.txt Increment version to 0.3.8.dev 2019-08-11 23:28:13 +02:00
CONTRIBUTORS.md sgemm/strmm 2019-04-29 08:49:50 +00:00
Changelog.txt Update with changes from 0.3.7 2019-08-11 23:31:36 +02:00
GotoBLAS_00License.txt rename documents in GotoBLAS. 2011-01-24 15:57:23 +00:00
GotoBLAS_01Readme.txt Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
GotoBLAS_02QuickInstall.txt Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
GotoBLAS_03FAQ.txt Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
GotoBLAS_04FAQ.txt rename documents in GotoBLAS. 2011-01-24 15:57:23 +00:00
GotoBLAS_05LargePage.txt Correct typo /proc/ instead of /pros/ 2015-03-20 23:25:11 +01:00
GotoBLAS_06WeirdPerformance.txt Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
LICENSE Update organization info. 2014-11-25 15:28:58 +08:00
Makefile Change install_name on osx to match linux 2019-07-08 17:14:35 -05:00
Makefile.alpha Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
Makefile.arm Makefile.arm: remove -march flags 2019-05-05 18:37:28 +02:00
Makefile.arm64 add TARGET support for HiSilicon tsv110 CPUs 2019-03-04 16:41:21 +08:00
Makefile.generic Respect user's LDFLAGS 2013-07-25 14:08:37 -07:00
Makefile.ia64 Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
Makefile.install Change install_name on osx to match linux 2019-07-08 17:14:35 -05:00
Makefile.mips MIPS P5600(32 bit) and I6400(64 bit) cores support added. 2016-04-22 14:03:18 +05:30
Makefile.mips64 Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
Makefile.power Add gfortran workaround for ABI violations 2019-06-06 10:24:16 +02:00
Makefile.prebuild Add mips32r2 api target 2018-05-02 20:17:26 +02:00
Makefile.rule Increment version to 0.3.8.dev 2019-08-11 23:28:47 +02:00
Makefile.sparc Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
Makefile.system Replace several POWER8/9 C kernels with their gcc7-generated assembly versions (#2263) 2019-09-22 22:35:22 +02:00
Makefile.tail Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
Makefile.x86 Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
Makefile.x86_64 Update Makefile.x86_64 2019-06-14 08:08:11 +02:00
Makefile.zarch [ZARCH] Z14 support, BLAS 1/2 single precision implementations, Some missing double precision implementations, Gemv optimization 2018-08-06 18:20:40 +03:00
README.md DOC: Add Azure CI status badge 2019-05-08 15:15:50 -07:00
TargetList.txt Merge branch 'develop' into develop 2019-03-29 19:36:29 +01:00
USAGE.md Underline importance of NUM_THREADS setting for BUFFER allocation 2018-04-04 22:26:51 +02:00
appveyor.yml Fix C compiler handling and BINARY=32 mode in CMAKE builds (#2248) 2019-09-10 08:27:06 +02:00
azure-pipelines.yml TST: add SkylakeX AVX512 CI test 2019-05-14 11:32:23 -07:00
c_check c_check: Unlink correct file 2019-06-05 17:31:01 +02:00
cblas.h Add declarations for ?sum and cblas_?sum 2019-03-30 21:58:03 +01:00
common.h Add option USE_LOCKING for single-threaded build with locking support 2019-05-15 23:18:43 +02:00
common_alpha.h add fallback blas_lock implementation 2015-08-16 18:59:17 +02:00
common_arm.h arm: Determine the abi from compiler if not specified on command line 2017-06-30 18:20:59 +05:30
common_arm64.h build: LLVM: Add Flang compiler support and enable OpenMP for Clang 2017-05-25 17:03:20 +01:00
common_c.h Add declarations for ?sum and cblas_?sum 2019-03-30 21:58:03 +01:00
common_d.h Add declarations for ?sum and cblas_?sum 2019-03-30 21:58:03 +01:00
common_ia64.h add fallback blas_lock implementation 2015-08-16 18:59:17 +02:00
common_interface.h Add declarations for ?sum and cblas_?sum 2019-03-30 21:58:03 +01:00
common_lapack.h add missing defines and headers 2019-09-08 11:14:49 -04:00
common_level1.h Add declarations for ?sum and cblas_?sum 2019-03-30 21:58:03 +01:00
common_level2.h Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
common_level3.h Add a "sgemm direct" mode for small matrixes 2018-12-13 13:47:31 +00:00
common_linux.h Init IBM z system (s390x) porting. 2016-04-15 18:02:24 -04:00
common_macro.h add missing defines and headers 2019-09-08 11:14:49 -04:00
common_mips.h mips: remove incorrect blas_lock implementations 2017-05-05 17:28:03 +01:00
common_mips64.h Update common_mips64.h 2018-10-09 11:20:16 +08:00
common_param.h Add declarations for ?sum and cblas_?sum 2019-03-30 21:58:03 +01:00
common_power.h Fix build for PPC970 on FreeBSD pt. 1 2019-06-28 10:29:44 +00:00
common_q.h Add declarations for ?sum 2019-03-31 22:12:23 +02:00
common_reference.h Update organization info. 2014-11-25 15:28:58 +08:00
common_s.h Add declarations for ?sum and cblas_?sum 2019-03-30 21:58:03 +01:00
common_sparc.h add fallback blas_lock implementation 2015-08-16 18:59:17 +02:00
common_stackalloc.h Misc. typo fixes 2019-04-29 17:03:56 -04:00
common_thread.h Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
common_x.h Add declarations for ?sum 2019-03-31 22:12:23 +02:00
common_x86.h Misc. typo fixes 2019-04-29 17:03:56 -04:00
common_x86_64.h Fix mov syntax 2019-06-16 18:35:43 +02:00
common_z.h Add declarations for ?sum and cblas_?sum 2019-03-30 21:58:03 +01:00
common_zarch.h dtrmm and dgemm for z13 2017-01-04 19:32:33 +04:00
cpuid.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
cpuid.h Add support for Hygon Dhyana 2019-01-16 14:25:19 +08:00
cpuid_alpha.c Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
cpuid_arm.c Update cpuid_arm.c 2018-12-28 14:33:18 +01:00
cpuid_arm64.c Count cpu cores on ARMV8 and use that to pick the GEMM_PQ parameters (#2267) 2019-09-25 23:13:24 +02:00
cpuid_ia64.c Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
cpuid_mips.c Update cpuid_mips.c 2018-12-28 14:34:38 +01:00
cpuid_mips64.c Update cpuid_mips64.c 2018-12-28 14:36:39 +01:00
cpuid_power.c power9 makefile. dgemm based on power8 kernel with following changes : 32x unrolled 16x4 kernel and 8x4 kernel using (lxv stxv butterfly rank1 update). improvement from 17 to 22-23gflops. dtrmm cases were added into dgemm itself 2019-03-29 15:49:40 +00:00
cpuid_sparc.c Fix my copypaste blunder with get_corename 2018-02-01 22:06:04 +01:00
cpuid_x86.c Add Intel Goldmont Plus CPUID 2019-08-19 14:19:21 +02:00
cpuid_zarch.c Add cache sizes for Z14 2019-01-31 21:16:44 +01:00
ctest.c ctest.c : add __POWERPC__ for PowerMac 2019-03-06 20:55:06 -08:00
ctest1.c Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
ctest2.c Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
dynamic.c Add Intel Goldmont Plus CPUID 2019-08-19 14:19:21 +02:00
f_check Misc. typo fixes 2019-04-29 17:03:56 -04:00
ftest.f Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
ftest2.f Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
ftest3.f Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
gen_config_h.c Add 64bit support for Microsoft Visual Studio 2017-06-21 13:38:22 -07:00
getarch.c Merge branch 'develop' into develop 2019-03-29 19:36:29 +01:00
getarch_2nd.c Delete LOCAL_BUFFER_SIZE for other architectures. 2016-04-12 11:49:28 -04:00
l1param.h Added BULLDOZER target. So far it uses barcelona kernels. 2012-12-07 00:53:31 +08:00
l2param.h Support AMD Piledriver by bulldozer kernels. 2013-07-06 12:06:43 -03:00
make.inc (Plain make) build system fixes for AIX 2017-09-18 01:29:21 +02:00
openblas.pc.in Rename blas.pc.in to openblas.pc.in 2017-02-12 14:34:03 +01:00
openblas_config_template.h Fix complex support for MSVC headers 2017-07-28 11:50:29 +05:30
param.h Count cpu cores on ARMV8 and use that to pick the GEMM_PQ parameters (#2267) 2019-09-25 23:13:24 +02:00
quickbuild.32bit Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
quickbuild.64bit Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
quickbuild.win32 Added the tip for Windows. 2012-08-09 20:37:55 +08:00
quickbuild.win64 Refs #63. delete prefix for mingw64 toolchain. 2014-04-27 13:05:26 +08:00
segfaults.patch Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
symcopy.h Changed a number of inline calls to use __inline. 2015-02-11 11:13:17 -06:00
version.h Update organization info. 2014-11-25 15:28:58 +08:00

README.md

OpenBLAS

Join the chat at https://gitter.im/xianyi/OpenBLAS

Introduction

OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.

Please read the documentation on the OpenBLAS wiki pages: https://github.com/xianyi/OpenBLAS/wiki.

Binary Packages

We provide official binary packages for the following platform:

  • Windows x86/x86_64

You can download them from file hosting on sourceforge.net.

Installation from Source

Download the source from the project homepage, https://xianyi.github.com/OpenBLAS/, or check out the code using Git from https://github.com/xianyi/OpenBLAS.git.

Dependencies

Building OpenBLAS requires the following to be installed:

  • GNU Make
  • A C compiler, e.g. GCC or Clang
  • A Fortran compiler (optional, for LAPACK)
  • IBM MASS (optional, see below)

Normal compile

Simply invoking make (or gmake on BSD) will detect the CPU automatically. To set a specific target CPU, use make TARGET=xxx, e.g. make TARGET=NEHALEM. The full target list is in the file TargetList.txt.
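
Before forcing a target, it can help to confirm that the name actually appears in TargetList.txt. The helper below is a minimal sketch and not part of OpenBLAS: is_known_target is a hypothetical name, and it assumes the list file holds one target name per line with no trailing whitespace.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with OpenBLAS): succeed if the given
# target name appears on a line of its own in the target list file.
# Assumption: one target name per line, as in TargetList.txt.
is_known_target() {
    target=$1
    list_file=${2:-TargetList.txt}
    # -x matches whole lines only, -F treats the name as a fixed string.
    [ -n "$target" ] && [ -f "$list_file" ] && grep -qxF -- "$target" "$list_file"
}

# Example use from the top of the source tree:
#   is_known_target NEHALEM && make TARGET=NEHALEM
```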

Cross compile

Set CC and FC to point to the cross toolchains, and set HOSTCC to your host C compiler. The target must be specified explicitly when cross compiling.

Examples:

  • On an x86 box, compile this library for a loongson3a CPU:

    make BINARY=64 CC=mips64el-unknown-linux-gnu-gcc FC=mips64el-unknown-linux-gnu-gfortran HOSTCC=gcc TARGET=LOONGSON3A
    
  • On an x86 box, compile this library for a loongson3a CPU with loongcc (based on Open64) compiler:

    make CC=loongcc FC=loongf95 HOSTCC=gcc TARGET=LOONGSON3A CROSS=1 CROSS_SUFFIX=mips64el-st-linux-gnu- NO_LAPACKE=1 NO_SHARED=1 BINARY=32
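
The first example above follows a pattern that can be wrapped in a small script. This is only a sketch: cross_make_cmd is a hypothetical name, it assumes the toolchain names its compilers <prefix>gcc and <prefix>gfortran, and it prints the command instead of running it so the result can be inspected first.

```shell
#!/bin/sh
# Hypothetical wrapper: print (do not run) a cross-compile make command.
#   $1  toolchain prefix, e.g. mips64el-unknown-linux-gnu-
#   $2  OpenBLAS TARGET name, e.g. LOONGSON3A
# Assumption: the compilers are called <prefix>gcc and <prefix>gfortran.
cross_make_cmd() {
    prefix=$1
    target=$2
    # The target must be given explicitly when cross compiling.
    if [ -z "$prefix" ] || [ -z "$target" ]; then
        echo "usage: cross_make_cmd TOOLCHAIN_PREFIX TARGET" >&2
        return 1
    fi
    echo "make CC=${prefix}gcc FC=${prefix}gfortran HOSTCC=gcc TARGET=${target}"
}

# Example:
#   cross_make_cmd mips64el-unknown-linux-gnu- LOONGSON3A
```

Further variables such as BINARY=64 can be appended to the printed command as needed.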
    

Debug version

A debug version can be built using make DEBUG=1.

Compile with MASS support on Power CPU (optional)

The IBM MASS library consists of a set of mathematical functions for C, C++, and Fortran applications that are tuned for optimum performance on POWER architectures. OpenBLAS with MASS requires a 64-bit, little-endian OS on POWER. The library can be installed as shown:

  • On Ubuntu:

    wget -q http://public.dhe.ibm.com/software/server/POWER/Linux/xl-compiler/eval/ppc64le/ubuntu/public.gpg -O- | sudo apt-key add -
    echo "deb http://public.dhe.ibm.com/software/server/POWER/Linux/xl-compiler/eval/ppc64le/ubuntu/ trusty main" | sudo tee /etc/apt/sources.list.d/ibm-xl-compiler-eval.list
    sudo apt-get update
    sudo apt-get install libxlmass-devel.8.1.5
    
  • On RHEL/CentOS:

    wget http://public.dhe.ibm.com/software/server/POWER/Linux/xl-compiler/eval/ppc64le/rhel7/repodata/repomd.xml.key
    sudo rpm --import repomd.xml.key
    wget http://public.dhe.ibm.com/software/server/POWER/Linux/xl-compiler/eval/ppc64le/rhel7/ibm-xl-compiler-eval.repo
    sudo cp ibm-xl-compiler-eval.repo /etc/yum.repos.d/
    sudo yum install libxlmass-devel.8.1.5
    

After installing the MASS library, compile OpenBLAS with USE_MASS=1. For example, to compile on Power8 with MASS support: make USE_MASS=1 TARGET=POWER8.

Install to a specific directory (optional)

Use PREFIX= when invoking make install, for example:

make install PREFIX=your_installation_directory

The default installation directory is /opt/OpenBLAS.

Supported CPUs and Operating Systems

Please read GotoBLAS_01Readme.txt.

Additional supported CPUs

x86/x86-64

  • Intel Xeon 56xx (Westmere): Used GotoBLAS2 Nehalem codes.
  • Intel Sandy Bridge: Optimized Level-3 and Level-2 BLAS with AVX on x86-64.
  • Intel Haswell: Optimized Level-3 and Level-2 BLAS with AVX2 and FMA on x86-64.
  • Intel Skylake: Optimized Level-3 and Level-2 BLAS with AVX512 and FMA on x86-64.
  • AMD Bobcat: Used GotoBLAS2 Barcelona codes.
  • AMD Bulldozer: x86-64 ?GEMM FMA4 kernels. (Thanks to Werner Saar)
  • AMD PILEDRIVER: Uses Bulldozer codes with some optimizations.
  • AMD STEAMROLLER: Uses Bulldozer codes with some optimizations.
  • AMD ZEN: Uses Haswell codes with some optimizations.

MIPS64

  • ICT Loongson 3A: Optimized Level-3 BLAS and part of Level-1,2.
  • ICT Loongson 3B: Experimental

ARM

  • ARMv6: Optimized BLAS for vfpv2 and vfpv3-d16 (e.g. BCM2835, Cortex M0+)
  • ARMv7: Optimized BLAS for vfpv3-d32 (e.g. Cortex A8, A9 and A15)

ARM64

  • ARMv8: Experimental
  • ARM Cortex-A57: Experimental

PPC/PPC64

  • POWER8: Optimized BLAS, only for PPC64LE (Little Endian), only with USE_OPENMP=1
  • POWER9: Optimized Level-3 BLAS (real) and some Level-1,2. PPC64LE with OpenMP only.

IBM zEnterprise System

  • Z13: Optimized Level-3 BLAS and Level-1,2 (double precision)
  • Z14: Optimized Level-3 BLAS and Level-1,2 (single precision)

Supported OS

Usage

Statically link with libopenblas.a or dynamically link with -lopenblas if OpenBLAS was compiled as a shared library.
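
As a rough sketch of the two linking modes, the helper below prints the flags for each. The name openblas_flags is hypothetical, app.c stands in for your own source file, and the layout (headers in <prefix>/include, libraries in <prefix>/lib) is an assumption about a typical install.

```shell
#!/bin/sh
# Hypothetical helper: print compiler and linker flags for an OpenBLAS install.
#   $1  install prefix (defaults to /opt/OpenBLAS)
#   $2  "static" or "shared" (defaults to shared)
# Assumption: headers live in <prefix>/include and libraries in <prefix>/lib.
openblas_flags() {
    prefix=${1:-/opt/OpenBLAS}
    mode=${2:-shared}
    case $mode in
        static) echo "-I${prefix}/include ${prefix}/lib/libopenblas.a" ;;
        shared) echo "-I${prefix}/include -L${prefix}/lib -lopenblas" ;;
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}

# Example (app.c is a placeholder):
#   cc app.c $(openblas_flags /opt/OpenBLAS shared) -o app
```

Depending on the platform, a static link may additionally need system libraries such as -lpthread.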

Setting the number of threads using environment variables

Environment variables are used to specify a maximum number of threads. For example,

export OPENBLAS_NUM_THREADS=4
export GOTO_NUM_THREADS=4
export OMP_NUM_THREADS=4

The priorities are OPENBLAS_NUM_THREADS > GOTO_NUM_THREADS > OMP_NUM_THREADS.

If you compile this library with USE_OPENMP=1, you should set the OMP_NUM_THREADS environment variable; OpenBLAS ignores OPENBLAS_NUM_THREADS and GOTO_NUM_THREADS when compiled with USE_OPENMP=1.
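
The precedence described above can be illustrated with a short script. It only mirrors the documented rules and is not how OpenBLAS reads the variables internally; effective_thread_setting and its "openmp" argument are hypothetical names chosen for this sketch.

```shell
#!/bin/sh
# Illustration of the documented precedence only.
#   $1  "openmp" if the library was built with USE_OPENMP=1,
#       any other value for a non-OpenMP build.
# Prints the value that would apply, or "unset" if no variable is set.
effective_thread_setting() {
    if [ "$1" = "openmp" ]; then
        # OpenMP builds ignore OPENBLAS_NUM_THREADS and GOTO_NUM_THREADS.
        echo "${OMP_NUM_THREADS:-unset}"
    elif [ -n "${OPENBLAS_NUM_THREADS:-}" ]; then
        echo "$OPENBLAS_NUM_THREADS"
    elif [ -n "${GOTO_NUM_THREADS:-}" ]; then
        echo "$GOTO_NUM_THREADS"
    else
        echo "${OMP_NUM_THREADS:-unset}"
    fi
}

# Example:
#   export OPENBLAS_NUM_THREADS=4 OMP_NUM_THREADS=8
#   effective_thread_setting pthread   # OPENBLAS_NUM_THREADS wins
#   effective_thread_setting openmp    # only OMP_NUM_THREADS is considered
```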

Setting the number of threads at runtime

We provide the following functions to control the number of threads at runtime:

void goto_set_num_threads(int num_threads);
void openblas_set_num_threads(int num_threads);

If you compile this library with USE_OPENMP=1, you should use the above functions too.

Reporting bugs

Please submit an issue in https://github.com/xianyi/OpenBLAS/issues.

Contact

Change log

Please see Changelog.txt to view the differences between OpenBLAS and GotoBLAS2 1.13 BSD version.

Troubleshooting

  • Please read the FAQ first.
  • Please use GCC version 4.6 and above to compile Sandy Bridge AVX kernels on Linux/MinGW/BSD.
  • Please use Clang version 3.1 and above to compile the library on Sandy Bridge microarchitecture. Clang 3.0 will generate the wrong AVX binary code.
  • Please use GCC version 6 or LLVM version 6 and above to compile Skylake AVX512 kernels.
  • The number of CPUs/cores should be less than or equal to 256. On Linux x86_64 (amd64), there is experimental support for up to 1024 CPUs/cores and 128 NUMA nodes if you build the library with BIGNUMA=1.
  • OpenBLAS does not set processor affinity by default. On Linux, you can enable processor affinity by commenting out the line NO_AFFINITY=1 in Makefile.rule. However, note that this may conflict with the R parallel package.
  • On Loongson 3A, make test may fail with a pthread_create error (EAGAIN). However, the same test case passes when run directly from the shell.
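
The CPU/core limits in the list above can be turned into a quick pre-build check. This is a sketch only: bignuma_option is a hypothetical name, and the getconf call in the usage comment is an assumption about how the core count is obtained on the build machine.

```shell
#!/bin/sh
# Hypothetical helper: print the extra make option suggested by the limits
# above for a given CPU/core count.
#   $1  number of CPUs/cores
# Prints nothing when the default limit of 256 is sufficient.
bignuma_option() {
    ncpus=$1
    if [ "$ncpus" -le 256 ]; then
        return 0
    elif [ "$ncpus" -le 1024 ]; then
        echo "BIGNUMA=1"
    else
        echo "more than 1024 CPUs/cores exceeds the documented limit" >&2
        return 1
    fi
}

# Example on Linux x86_64:
#   make $(bignuma_option "$(getconf _NPROCESSORS_ONLN)")
```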

Contributing

  1. Check for open issues or open a fresh issue to start a discussion around a feature idea or a bug.
  2. Fork the OpenBLAS repository to start making your changes.
  3. Write a test which shows that the bug was fixed or that the feature works as expected.
  4. Send a pull request. Make sure to add yourself to CONTRIBUTORS.md.

Donation

Please read this wiki page.