Martin Kroeker e3c50643bb
Fix my copypaste blunder with get_corename
2018-02-01 22:06:04 +01:00
benchmark Increasing flexibility of GEMM benchmark. 2017-09-28 12:56:29 -07:00
cmake CMake: Remove unused wall option when FC=flang 2018-01-26 14:09:48 -06:00
ctest Fix compiler warnings in ctest 2017-12-03 18:19:30 +01:00
driver Merge pull request #1419 from brada4/develop 2018-01-31 23:48:34 +01:00
exports update cmakefiles for lapack 3.8.0 2017-11-23 21:22:01 +01:00
interface fix couple of dead assignment warnings 2017-12-22 00:56:35 +01:00
kernel Merge branch 'develop' into develop 2018-01-31 16:17:04 -08:00
lapack add missing brackets to silence indentation warnings gcc721 2018-01-19 23:11:12 +01:00
lapack-netlib Add conditionals around ar calls for optional modules 2017-12-21 20:42:30 +01:00
reference Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
relapack Add cmake build list file for ReLAPACK 2017-10-12 17:00:00 +02:00
test Remove _static usages for tests 2017-08-20 00:13:46 +10:00
utest Add trivial smoketest for xpotrf 2017-09-30 19:07:54 +02:00
.gitignore Don't change timestamps 2017-08-01 13:43:59 +05:30
.travis.yml Prefix make jobs with travis_wait (#1378) 2017-12-02 12:59:27 +01:00
BACKERS.md Added backers. 2013-09-05 15:39:45 +08:00
CMakeLists.txt CMake: Use the correct library name on windows 2018-01-11 11:34:53 -06:00
CONTRIBUTORS.md Optimized standard Blas Level-1,2 (excluding nrm2 functions) for z13 (double precision) 2017-09-06 16:41:08 +04:00
Changelog.txt Update doc for 0.2.20 version. 2017-07-24 11:55:10 +08:00
GotoBLAS_00License.txt rename documents in GotoBLAS. 2011-01-24 15:57:23 +00:00
GotoBLAS_01Readme.txt Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
GotoBLAS_02QuickInstall.txt Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
GotoBLAS_03FAQ.txt Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
GotoBLAS_04FAQ.txt rename documents in GotoBLAS. 2011-01-24 15:57:23 +00:00
GotoBLAS_05LargePage.txt Correct typo /proc/ instead of /pros/ 2015-03-20 23:25:11 +01:00
GotoBLAS_06WeirdPerformance.txt Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
LICENSE Update organization info. 2014-11-25 15:28:58 +08:00
Makefile Update LAPACK to 3.8.0 2017-11-23 18:13:35 +01:00
Makefile.alpha Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
Makefile.arm arm: Determine the abi from compiler if not specified on command line 2017-06-30 18:20:59 +05:30
Makefile.arm64 arm64: Change mtune/mcpu options for THUNDERX2T99 target 2017-07-01 11:17:10 -07:00
Makefile.generic Respect user's LDFLAGS 2013-07-25 14:08:37 -07:00
Makefile.ia64 Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
Makefile.install Make: escape paths to pkg-config file 2018-01-05 17:08:55 -05:00
Makefile.mips MIPS P5600(32 bit) and I6400(64 bit) cores support added. 2016-04-22 14:03:18 +05:30
Makefile.mips64 Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
Makefile.power build: fix libxlmass errors building on Power CPU 2017-05-24 14:51:52 +08:00
Makefile.prebuild Added mips I6500 core 2017-09-22 11:57:43 +05:30
Makefile.rule Bump develop version for 0.3.0. 2017-07-24 12:06:29 +08:00
Makefile.sparc Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
Makefile.system When forcing USE_THREAD=0, override USE_OPENMP as well 2018-01-23 21:33:21 +01:00
Makefile.tail Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
Makefile.x86 Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
Makefile.x86_64 Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
Makefile.zarch dtrmm and dgemm for z13 2017-01-04 19:32:33 +04:00
README.md README: Use the SVG Travis badge 2017-11-29 15:21:12 -08:00
TargetList.txt Added mips I6500 core 2017-09-22 11:57:43 +05:30
USAGE.md collected usage notes 2016-02-27 16:57:22 +01:00
appveyor.yml Appveyor: enable building fortran with ninja 2017-12-29 19:58:35 -06:00
c_check arm: Remove unnecessary files/code 2017-07-02 03:06:36 +05:30
cblas.h Fix declaration of cblas_Xdotc_sub and cblas_Xdotu_sub 2017-11-18 18:56:30 +01:00
common.h typedefs only for c 2017-07-29 20:38:16 +05:30
common_alpha.h add fallback blas_lock implementation 2015-08-16 18:59:17 +02:00
common_arm.h arm: Determine the abi from compiler if not specified on command line 2017-06-30 18:20:59 +05:30
common_arm64.h build: LLVM: Add Flang compiler support and enable OpenMP for Clang 2017-05-25 17:03:20 +01:00
common_c.h Improved Ximatcopy when lda==ldb. 2015-09-07 14:36:16 +02:00
common_d.h Improved Ximatcopy when lda==ldb. 2015-09-07 14:36:16 +02:00
common_ia64.h add fallback blas_lock implementation 2015-08-16 18:59:17 +02:00
common_interface.h Add ATLAS-style ?geadd function 2015-02-16 13:46:20 +01:00
common_lapack.h Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
common_level1.h Changed _Complex types in common_level1.h to use the typedef. 2015-02-11 11:12:14 -06:00
common_level2.h Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
common_level3.h Improved Ximatcopy when lda==ldb. 2015-09-07 14:36:16 +02:00
common_linux.h Init IBM z system (s390x) porting. 2016-04-15 18:02:24 -04:00
common_macro.h ARM64: Add the VULCAN Target 2017-01-10 15:01:17 +05:30
common_mips.h mips: remove incorrect blas_lock implementations 2017-05-05 17:28:03 +01:00
common_mips64.h mips: remove incorrect blas_lock implementations 2017-05-05 17:28:03 +01:00
common_param.h Correct zgeadd_k prototype 2017-11-29 19:57:35 +01:00
common_power.h optimized dgemm for 20 threads 2016-05-16 14:14:25 +02:00
common_q.h Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
common_reference.h Update organization info. 2014-11-25 15:28:58 +08:00
common_s.h Improved Ximatcopy when lda==ldb. 2015-09-07 14:36:16 +02:00
common_sparc.h add fallback blas_lock implementation 2015-08-16 18:59:17 +02:00
common_stackalloc.h Refs #727. Align stack buffer address on 32-bytes. 2016-02-11 03:52:02 +08:00
common_thread.h Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
common_x.h Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
common_x86.h Fix a minor compiler error in VisualStudio with CMake 2016-03-20 18:58:18 +08:00
common_x86_64.h build: Flang has the same interface as PGI 2017-05-27 06:26:48 +01:00
common_z.h Improved Ximatcopy when lda==ldb. 2015-09-07 14:36:16 +02:00
common_zarch.h dtrmm and dgemm for z13 2017-01-04 19:32:33 +04:00
cpuid.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
cpuid.h Add ZEN support (tested for auto-detected static backend) 2017-03-19 15:32:50 +01:00
cpuid_alpha.c Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
cpuid_arm.c Fix for issue #1024: arm-linux-androideabi-g++ Compiler Error in /cpuid_arm.c 2016-12-02 09:28:31 -08:00
cpuid_arm64.c fix detection of generic ARMv8 CPUs 2017-08-18 14:53:29 +02:00
cpuid_ia64.c Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
cpuid_mips.c MIPS P5600(32 bit) and I6400(64 bit) cores support added. 2016-04-22 14:03:18 +05:30
cpuid_mips64.c Added mips I6500 core 2017-09-22 11:57:43 +05:30
cpuid_power.c added dgemm-, dtrmm-, zgemm- and ztrmm-kernel for power8 2016-03-01 07:33:56 +01:00
cpuid_sparc.c Fix my copypaste blunder with get_corename 2018-02-01 22:06:04 +01:00
cpuid_x86.c Fix coretype detection for Bay Trail Atom 2017-08-27 13:06:54 +02:00
cpuid_zarch.c detect CPU on zArch 2017-04-20 21:13:41 +02:00
ctest.c Merge branch 'z13' into develop 2017-01-09 05:52:42 -05:00
ctest1.c Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
ctest2.c Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
f_check (Plain make) build system fixes for AIX 2017-09-18 01:29:21 +02:00
ftest.f Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
ftest2.f Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
ftest3.f Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
gen_config_h.c Add 64bit support for Microsoft Visual Studio 2017-06-21 13:38:22 -07:00
getarch.c Use get_corename for SPARC as well 2018-02-01 18:20:38 +01:00
getarch_2nd.c Delete LOCAL_BUFFER_SIZE for other architectures. 2016-04-12 11:49:28 -04:00
l1param.h Added BULLDOZER target. So far it uses barcelona kernels. 2012-12-07 00:53:31 +08:00
l2param.h Support AMD Piledriver by bulldozer kernels. 2013-07-06 12:06:43 -03:00
make.inc (Plain make) build system fixes for AIX 2017-09-18 01:29:21 +02:00
openblas.pc.in Rename blas.pc.in to openblas.pc.in 2017-02-12 14:34:03 +01:00
openblas_config_template.h Fix complex support for MSVC headers 2017-07-28 11:50:29 +05:30
param.h Added mips I6500 core 2017-09-22 11:57:43 +05:30
quickbuild.32bit Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
quickbuild.64bit Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
quickbuild.win32 Added the tip for Windows. 2012-08-09 20:37:55 +08:00
quickbuild.win64 Refs #63. delete prefix for mingw64 toolchain. 2014-04-27 13:05:26 +08:00
segfaults.patch Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
symcopy.h Changed a number of inline calls to use __inline. 2015-02-11 11:13:17 -06:00
version.h Update organization info. 2014-11-25 15:28:58 +08:00

README.md

OpenBLAS

Join the chat at https://gitter.im/xianyi/OpenBLAS

Introduction

OpenBLAS is an optimized BLAS library based on the GotoBLAS2 1.13 BSD version.

Please read the documentation on the OpenBLAS wiki pages: http://github.com/xianyi/OpenBLAS/wiki

Binary Packages

We provide binary packages for the following platform.

  • Windows x86/x86_64

You can download them from file hosting on sourceforge.net.

Installation from Source

Download from the project homepage: http://xianyi.github.com/OpenBLAS/

Or check out the code from git://github.com/xianyi/OpenBLAS.git

Normal compile

  • Type "make" to detect the CPU automatically, or
  • type "make TARGET=xxx" to set the target CPU, e.g. "make TARGET=NEHALEM". The full target list is in the file TargetList.txt.
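
Before passing a TARGET to make, it can help to confirm that the name is actually listed. The following is a minimal sketch, assuming TargetList.txt holds one target name per line; the helper name is_known_target is hypothetical and not part of OpenBLAS.

```shell
# Hypothetical helper: succeeds if NAME appears as a whole line in the target list.
# Assumption: the list file contains one target name per line.
is_known_target() {
    name="$1"
    list="${2:-TargetList.txt}"
    # Reject an empty name, which would otherwise match blank lines.
    [ -n "$name" ] || return 1
    # -x: match the whole line, -F: treat the name as a fixed string.
    grep -qxF -- "$name" "$list"
}

# Usage: build only if the requested target is listed.
# is_known_target NEHALEM && make TARGET=NEHALEM
```

Matching whole lines avoids accepting a prefix such as NEHAL by accident.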

Cross compile

Please set CC and FC to the cross toolchain compilers. Then set HOSTCC to your host C compiler. Finally, set TARGET explicitly.

Examples:

On an x86 box, compile this library for the Loongson 3A CPU.

make BINARY=64 CC=mips64el-unknown-linux-gnu-gcc FC=mips64el-unknown-linux-gnu-gfortran HOSTCC=gcc TARGET=LOONGSON3A

On an x86 box, compile this library for the Loongson 3A CPU with the loongcc (based on Open64) compiler.

make CC=loongcc FC=loongf95 HOSTCC=gcc TARGET=LOONGSON3A CROSS=1 CROSS_SUFFIX=mips64el-st-linux-gnu-   NO_LAPACKE=1 NO_SHARED=1 BINARY=32

Debug version

make DEBUG=1

Compile with MASS Support on Power CPU (Optional dependency)

The IBM MASS library consists of a set of mathematical functions for C, C++, and Fortran-language applications that are tuned for optimum performance on POWER architectures. OpenBLAS with MASS requires a 64-bit, little-endian OS on POWER. The library can be installed as below -

After installing the MASS library, compile OpenBLAS with USE_MASS=1.

Example:

Compiling on Power8 with MASS support -

make USE_MASS=1 TARGET=POWER8

Install to the directory (optional)

Example:

make install PREFIX=your_installation_directory

The default directory is /opt/OpenBLAS

Supported CPUs & OS

Please read GotoBLAS_01Readme.txt

Additional supported CPUs:

x86/x86-64:

  • Intel Xeon 56xx (Westmere): Uses GotoBLAS2 Nehalem code.
  • Intel Sandy Bridge: Optimized Level-3 and Level-2 BLAS with AVX on x86-64.
  • Intel Haswell: Optimized Level-3 and Level-2 BLAS with AVX2 and FMA on x86-64.
  • AMD Bobcat: Uses GotoBLAS2 Barcelona code.
  • AMD Bulldozer: x86-64 ?GEMM FMA4 kernels. (Thanks to Werner Saar)
  • AMD PILEDRIVER: Uses Bulldozer code with some optimizations.
  • AMD STEAMROLLER: Uses Bulldozer code with some optimizations.

MIPS64:

  • ICT Loongson 3A: Optimized Level-3 BLAS and part of Level-1,2.
  • ICT Loongson 3B: Experimental

ARM:

  • ARMV6: Optimized BLAS for vfpv2 and vfpv3-d16 ( e.g. BCM2835, Cortex M0+ )
  • ARMV7: Optimized BLAS for vfpv3-d32 ( e.g. Cortex A8, A9 and A15 )

ARM64:

  • ARMV8: Experimental
  • ARM Cortex-A57: Experimental

PPC/PPC64

  • POWER8: Optimized Level-3 BLAS and some Level-1, only with USE_OPENMP=1

IBM zEnterprise System:

  • Z13: Optimized Level-3 BLAS and Level-1,2 (double precision)

Supported OS:

Usages

Link with libopenblas.a for the static library, or with -lopenblas for the shared library.
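
As a rough illustration of linking against the shared library, the sketch below prints the compiler and linker flags for a given install prefix. It assumes that "make install" placed the headers in PREFIX/include and the libraries in PREFIX/lib; the helper name openblas_flags and the file app.c are hypothetical.

```shell
# Hypothetical helper: prints compiler and linker flags for an OpenBLAS install prefix.
# Assumption: headers live in PREFIX/include and libraries in PREFIX/lib.
openblas_flags() {
    prefix="${1:-/opt/OpenBLAS}"   # default install directory named in this README
    printf '%s\n' "-I${prefix}/include -L${prefix}/lib -lopenblas"
}

# Usage (app.c is a placeholder for your own source file):
# gcc -o app app.c $(openblas_flags /opt/OpenBLAS)
```

If the prefix is not on the dynamic linker's search path, the resulting program may also need LD_LIBRARY_PATH (or an rpath) pointing at PREFIX/lib to run.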

Set the number of threads with environment variables.

Examples:

export OPENBLAS_NUM_THREADS=4

or

export GOTO_NUM_THREADS=4

or

export OMP_NUM_THREADS=4

The priorities are OPENBLAS_NUM_THREADS > GOTO_NUM_THREADS > OMP_NUM_THREADS.

If you compile this library with USE_OPENMP=1, you should set the OMP_NUM_THREADS environment variable. With USE_OPENMP=1, OpenBLAS ignores OPENBLAS_NUM_THREADS and GOTO_NUM_THREADS.
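
The priority rules above can be sketched as a small shell function. This only mirrors the documented order for illustration; it is not how OpenBLAS itself reads the variables, and the name effective_blas_threads is hypothetical.

```shell
# Hypothetical helper: prints the value the documented priority order would select.
# Pass "openmp" as the first argument to model a USE_OPENMP=1 build,
# where only OMP_NUM_THREADS is consulted.
effective_blas_threads() {
    if [ "${1:-}" = "openmp" ]; then
        echo "${OMP_NUM_THREADS:-unset}"
    elif [ -n "${OPENBLAS_NUM_THREADS:-}" ]; then
        echo "$OPENBLAS_NUM_THREADS"
    elif [ -n "${GOTO_NUM_THREADS:-}" ]; then
        echo "$GOTO_NUM_THREADS"
    elif [ -n "${OMP_NUM_THREADS:-}" ]; then
        echo "$OMP_NUM_THREADS"
    else
        echo "unset"
    fi
}

# Usage:
# export OPENBLAS_NUM_THREADS=4 OMP_NUM_THREADS=8
# effective_blas_threads          # selects OPENBLAS_NUM_THREADS
# effective_blas_threads openmp   # selects OMP_NUM_THREADS
```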

Set the number of threads at runtime.

We provide the functions below to control the number of threads at runtime.

void goto_set_num_threads(int num_threads);

void openblas_set_num_threads(int num_threads);

If you compile this lib with USE_OPENMP=1, you should use the above functions, too.

Report Bugs

Please open an issue at https://github.com/xianyi/OpenBLAS/issues

Contact

ChangeLog

Please see Changelog.txt for the differences from the GotoBLAS2 1.13 BSD version.

Troubleshooting

  • Please read the FAQ first.
  • Please use gcc version 4.6 or above to compile Sandy Bridge AVX kernels on Linux/MinGW/BSD.
  • Please use Clang version 3.1 or above to compile the library on the Sandy Bridge microarchitecture. Clang 3.0 generates incorrect AVX binary code.
  • The number of CPUs/cores should be less than or equal to 256. On Linux x86_64 (amd64), there is experimental support for up to 1024 CPUs/cores and 128 NUMA nodes if you build the library with BIGNUMA=1.
  • OpenBLAS does not set processor affinity by default. On Linux, you can enable processor affinity by commenting out the line NO_AFFINITY=1 in Makefile.rule. However, this may cause a conflict with R parallel.
  • On Loongson 3A, make test may fail because of a pthread_create error. The error code is EAGAIN. However, the same test case runs fine when you run it from the shell.
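
For the CPU count limit mentioned above, a build script could check the core count before choosing whether to pass BIGNUMA=1. This is a minimal sketch, assuming a system where getconf _NPROCESSORS_ONLN reports the number of online CPUs; the helper name exceeds_default_cpu_limit is hypothetical.

```shell
# Hypothetical helper: succeeds when the CPU count exceeds the documented
# default limit of 256. Takes the count as an optional argument; otherwise
# asks getconf for the number of online processors.
exceeds_default_cpu_limit() {
    count="${1:-$(getconf _NPROCESSORS_ONLN)}"
    [ "$count" -gt 256 ]
}

# Usage on Linux x86_64, where BIGNUMA=1 is described as experimental:
# if exceeds_default_cpu_limit; then make BIGNUMA=1; else make; fi
```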

Contributing

  1. Check for open issues or open a fresh issue to start a discussion around a feature idea or a bug.
  2. Fork the OpenBLAS repository to start making your changes.
  3. Write a test which shows that the bug was fixed or that the feature works as expected.
  4. Send a pull request. Make sure to add yourself to CONTRIBUTORS.md.

Donation

Please read this wiki page.