Zhang Xianyi 66198faab6 Refs #63. delete prefix for mingw64 toolchain. 2014-04-27 13:05:26 +08:00
benchmark Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
ctest Respect user's LDFLAGS 2013-07-25 14:08:37 -07:00
driver Add cast to function pointer to remove warning 2014-02-25 11:08:32 +01:00
exports Refs #324. Upgrade LAPACK to 3.5.0 version. 2013-12-09 16:50:02 +08:00
interface bugfix for sdsdot 2014-02-28 14:59:36 +01:00
kernel reduced stack usage on windows to 16K 2014-04-24 14:09:26 +02:00
lapack Merge remote branch 'origin/develop' into armv7 2013-12-01 13:16:41 +01:00
lapack-netlib Fixed #334 a makefile bug in lapacke. 2014-01-19 23:28:11 +08:00
reference Fixed a build bug with NO_LAPACK=1 and SANNITY_CHECK=1. 2011-05-03 14:42:11 +08:00
test Respect user's LDFLAGS 2013-07-25 14:08:37 -07:00
utest Make sure that fork_test.c is not built under windows 2014-02-19 19:14:13 +01:00
.gitignore Refs #247. Included lapack source codes. Avoid downloading tar.gz from netlib.org 2013-07-09 18:13:48 +08:00
.travis.yml Updated travis. 2013-07-12 21:41:12 +08:00
BACKERS.md Added backers. 2013-09-05 15:39:45 +08:00
CONTRIBUTORS.md Release 0.2.9 rc1 version. 2013-12-13 20:48:05 +08:00
Changelog.txt #351. Release 0.2.9 rc2. 2014-03-06 17:44:03 +08:00
GotoBLAS_00License.txt rename documents in GotoBLAS. 2011-01-24 15:57:23 +00:00
GotoBLAS_01Readme.txt rename documents in GotoBLAS. 2011-01-24 15:57:23 +00:00
GotoBLAS_02QuickInstall.txt rename documents in GotoBLAS. 2011-01-24 15:57:23 +00:00
GotoBLAS_03FAQ.txt Refs #85 #104. Use patch instead of git to apply this segfaults.patch. 2012-05-08 23:50:46 +08:00
GotoBLAS_04FAQ.txt rename documents in GotoBLAS. 2011-01-24 15:57:23 +00:00
GotoBLAS_05LargePage.txt rename documents in GotoBLAS. 2011-01-24 15:57:23 +00:00
GotoBLAS_06WeirdPerformance.txt rename documents in GotoBLAS. 2011-01-24 15:57:23 +00:00
LICENSE Fixed a typo in license file. 2012-03-27 14:17:13 +08:00
Makefile refs #287. Don't enable OpenMP for netlib LAPACK sequential Fortran codes. 2013-11-02 15:09:33 +08:00
Makefile.alpha Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
Makefile.arm added cpu detection and target ARMV6, used in raspberry pi 2013-11-21 20:18:51 +01:00
Makefile.arm64 added experimental support for ARMV8 2013-11-24 15:47:00 +01:00
Makefile.generic Respect user's LDFLAGS 2013-07-25 14:08:37 -07:00
Makefile.ia64 Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
Makefile.install More robust OPENBLAS_ prefixing of macros in openblas_config.h 2014-02-24 13:21:06 +01:00
Makefile.mips64 Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
Makefile.power Respect user's LDFLAGS 2013-07-25 14:08:37 -07:00
Makefile.prebuild get rid of the generated cblas_noconst.h file 2013-08-28 16:52:24 +02:00
Makefile.rule #351. Release 0.2.9 rc2. 2014-03-06 17:44:03 +08:00
Makefile.sparc Respect user's LDFLAGS 2013-07-25 14:08:37 -07:00
Makefile.system Refs #329 #287. Only disable -fopenmp for LAPACK Fortran codes on Windows. 2014-01-24 15:39:46 +08:00
Makefile.tail Avoid argument list too long issue in make clean. 2013-11-07 13:06:42 +08:00
Makefile.x86 Respect user's LDFLAGS 2013-07-25 14:08:37 -07:00
Makefile.x86_64 Respect user's LDFLAGS 2013-07-25 14:08:37 -07:00
README.md Added backers. 2013-09-05 15:39:45 +08:00
TargetList.txt TargetList.txt: minor re-ordering 2013-03-17 23:03:05 +08:00
c_check modified c_check 2013-12-01 17:35:18 +01:00
cblas.h get rid of the generated cblas_noconst.h file 2013-08-28 16:52:24 +02:00
cblas_noconst.h added missing file cblas_noconst.h to the armv7 branch 2013-11-03 11:04:16 +01:00
common.h Used SwitchToThread for YIELDING on AMD piledriver with Windows. 2014-01-28 16:40:19 +08:00
common_alpha.h Refs #262. Fixed compatibility issues of GNU stack markings with PathScale EKOPath(tm) Compiler Suite: Version 4.0.12.1 2013-09-22 09:37:59 +08:00
common_arm.h redefined functions for TIMING and YIELDING for ARMV7 processor 2013-11-03 10:34:04 +01:00
common_arm64.h added experimental support for ARMV8 2013-11-24 15:47:00 +01:00
common_c.h Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
common_d.h Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
common_ia64.h Refs #262. Fixed compatibility issues of GNU stack markings with PathScale EKOPath(tm) Compiler Suite: Version 4.0.12.1 2013-09-22 09:37:59 +08:00
common_interface.h Fixed #141. make f77blas.h compatible with compilers which lack C99 complex number. 2012-10-08 12:48:20 +08:00
common_lapack.h Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
common_level1.h Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
common_level2.h Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
common_level3.h Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
common_linux.h Refs #248. Fixed the LSB compatiable issue for BLAS only. 2013-07-09 15:38:03 +08:00
common_macro.h Refs #47. On Loongson 3A, set DGEMM_R parameter depending on different number of threads. It would improve double precision BLAS3 on multi-threads. 2011-09-05 15:13:52 +00:00
common_mips64.h Refs #262. Fixed compatibility issues of GNU stack markings with PathScale EKOPath(tm) Compiler Suite: Version 4.0.12.1 2013-09-22 09:37:59 +08:00
common_param.h refs #55. Added DTB_ENTRIES into dynamic arch setting parameters. Now, it can read DTB_ENTRIES on runtime. 2011-09-05 17:37:07 +08:00
common_power.h Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
common_q.h Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
common_reference.h Added the test case for samax. 2012-04-26 16:17:17 +08:00
common_s.h bugfix for sdsdot 2014-02-28 14:59:36 +01:00
common_sparc.h Refs #262. Fixed compatibility issues of GNU stack markings with PathScale EKOPath(tm) Compiler Suite: Version 4.0.12.1 2013-09-22 09:37:59 +08:00
common_thread.h Refs #281. Detect __CYGWIN__ macro for Cygwin x86_64. 2013-08-24 14:50:17 +08:00
common_x.h Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
common_x86.h merged common_x86.h and common_x86_64.h from develop 2013-12-01 11:23:36 +01:00
common_x86_64.h merged common_x86.h and common_x86_64.h from develop 2013-12-01 11:23:36 +01:00
common_z.h Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
cpuid.S Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
cpuid.h Init code base for Intel Haswell. 2013-08-13 00:54:59 +08:00
cpuid_alpha.c refs #55. Added DTB_ENTRIES into dynamic arch setting parameters. Now, it can read DTB_ENTRIES on runtime. 2011-09-05 17:37:07 +08:00
cpuid_arm.c added cpu detection and target ARMV6, used in raspberry pi 2013-11-21 20:18:51 +01:00
cpuid_ia64.c refs #55. Added DTB_ENTRIES into dynamic arch setting parameters. Now, it can read DTB_ENTRIES on runtime. 2011-09-05 17:37:07 +08:00
cpuid_mips.c Fixed the detection bug on Loongson 3A server. 2012-09-21 10:14:07 +00:00
cpuid_power.c Refs #220. Support Power7 by old Power6 kernels. 2013-05-21 22:59:45 +08:00
cpuid_sparc.c refs #55. Added DTB_ENTRIES into dynamic arch setting parameters. Now, it can read DTB_ENTRIES on runtime. 2011-09-05 17:37:07 +08:00
cpuid_x86.c Refs #335. Added the fallback of L2 size detection for some virtual machines. 2014-01-08 11:16:21 +08:00
ctest.c Refs #355. Fixed ARM detection bug. 2014-03-22 15:08:18 +08:00
ctest1.c Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
ctest2.c Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
f_check Refs #266. Fixed the compiling bug with Open64 5.0. 2013-07-31 14:41:39 +08:00
ftest.f Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
ftest2.f Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
ftest3.f Refs #266. Fixed the compiling bug with Open64 5.0. 2013-07-31 14:41:39 +08:00
getarch.c Merge remote branch 'origin/haswell' into develop 2013-12-01 18:09:12 +01:00
getarch_2nd.c Fixed typo in getarch_2nd.c. 2013-07-29 15:42:00 +08:00
l1param.h Added BULLDOZER target. So far it uses barcelona kernels. 2012-12-07 00:53:31 +08:00
l2param.h Support AMD Piledriver by bulldozer kernels. 2013-07-06 12:06:43 -03:00
make.inc Refs #176. Fixed make.inc overriding RANLIB bug when cross-compiling LAPACK. 2013-01-03 01:47:31 +08:00
openblas_config_template.h Fixed #315. Added OPENBLAS_ prefix to openblas_config.h. 2013-11-02 15:59:00 +08:00
param.h reduced stack usage on windows to 16K 2014-04-24 14:09:26 +02:00
patch.for_lapack-3.1.1 Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
patch.for_lapack-3.4.0 Refs #88. Fixed the build bug about LAPACKE C Interface to LAPACKE. 2012-04-13 23:12:06 +08:00
patch.for_lapack-3.4.1 Refs #130 Prevent reading ipiv array beyond the bound in ?laswp. Use laswp instead of laswp_oncopy in getrf. 2012-08-09 20:06:51 +08:00
patch.for_lapack-3.4.2 Added the patch for lapacke example. 2012-11-13 00:53:26 +08:00
quickbuild.32bit Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
quickbuild.64bit Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
quickbuild.win32 Added the tip for Windows. 2012-08-09 20:37:55 +08:00
quickbuild.win64 Refs #63. delete prefix for mingw64 toolchain. 2014-04-27 13:05:26 +08:00
segfaults.patch Refs #85 #104. Use patch instead of git to apply this segfaults.patch. 2012-05-08 23:50:46 +08:00
symcopy.h Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
version.h changed library name to openblas and modified environment variable. 2011-01-24 17:58:05 +00:00

README.md

OpenBLAS


Introduction

OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.

Please read the documentation on the OpenBLAS wiki: http://github.com/xianyi/OpenBLAS/wiki.

Binary Packages

We provide binary packages for the following platform:

  • Windows x86/x86_64

You can download them from the project's file hosting on sourceforge.net.

Installation from Source

Download the source from the project homepage: http://xianyi.github.com/OpenBLAS/

Or check out the code from git://github.com/xianyi/OpenBLAS.git

Normal compile

  • Type "make" to detect the CPU automatically, or
  • type "make TARGET=xxx" to set the target CPU explicitly, e.g. "make TARGET=NEHALEM". The full target list is in the file TargetList.txt.

Cross compile

Set CC and FC to the cross-toolchain C and Fortran compilers, set HOSTCC to your host C compiler, and set TARGET explicitly.

Examples:

On an x86 box, compile this library for the Loongson 3A CPU:

make BINARY=64 CC=mips64el-unknown-linux-gnu-gcc FC=mips64el-unknown-linux-gnu-gfortran HOSTCC=gcc TARGET=LOONGSON3A

On an x86 box, compile this library for the Loongson 3A CPU with the loongcc compiler (based on Open64):

make CC=loongcc FC=loongf95 HOSTCC=gcc TARGET=LOONGSON3A CROSS=1 CROSS_SUFFIX=mips64el-st-linux-gnu- NO_LAPACKE=1 NO_SHARED=1 BINARY=32

Debug version

make DEBUG=1

Install to a directory (optional)

Example:

make install PREFIX=your_installation_directory

The default directory is /opt/OpenBLAS

Supported CPUs & OS

Please read GotoBLAS_01Readme.txt for the platforms inherited from GotoBLAS2.

Additional supported CPUs:

x86/x86-64:

  • Intel Xeon 56xx (Westmere): Uses the GotoBLAS2 Nehalem code.
  • Intel Sandy Bridge: Optimized Level-3 BLAS with AVX on x86-64.
  • Intel Haswell: Optimized Level-3 BLAS with AVX on x86-64 (identical to Sandy Bridge).
  • AMD Bobcat: Uses the GotoBLAS2 Barcelona code.
  • AMD Bulldozer: x86-64 S/DGEMM AVX kernels. (Thanks to Werner Saar.)
  • AMD Piledriver: Uses the Bulldozer code.

MIPS64:

  • ICT Loongson 3A: Optimized Level-3 BLAS and part of Level-1 and Level-2 BLAS.
  • ICT Loongson 3B: Experimental

Supported OS:

Usage

Link with libopenblas.a for the static library, or with -lopenblas for the shared library.

Set the number of threads with environment variables.

Examples:

export OPENBLAS_NUM_THREADS=4

or

export GOTO_NUM_THREADS=4

or

export OMP_NUM_THREADS=4

The priority order is OPENBLAS_NUM_THREADS > GOTO_NUM_THREADS > OMP_NUM_THREADS.

If you compile this library with USE_OPENMP=1, set the OMP_NUM_THREADS environment variable instead. With USE_OPENMP=1, OpenBLAS ignores OPENBLAS_NUM_THREADS and GOTO_NUM_THREADS.

Set the number of threads at runtime.

We provide the following functions to control the number of threads at runtime.
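The lookup order described above can be sketched in C. This is only an illustration of the documented priorities, not the code OpenBLAS itself uses. The helper names (parse_positive_int, pick_num_threads, num_threads_from_env) are hypothetical, and the choice to skip an empty or non-numeric value and fall through to the next variable is an assumption of this sketch.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Parse a positive decimal integer; return 0 if text is NULL, empty,
 * not a number, non-positive, or too large for an int. */
static int parse_positive_int(const char *text)
{
    char *end = NULL;
    long value;

    if (text == NULL || *text == '\0')
        return 0;
    errno = 0;
    value = strtol(text, &end, 10);
    if (errno != 0 || *end != '\0' || value <= 0 || value > INT_MAX)
        return 0;
    return (int)value;
}

/* Hypothetical helper: pick a thread count from the three values in
 * priority order OPENBLAS_NUM_THREADS > GOTO_NUM_THREADS > OMP_NUM_THREADS.
 * When use_openmp is nonzero (a build with USE_OPENMP=1), only the
 * OMP_NUM_THREADS value is consulted.
 * Returns 0 if no usable value is found (assumption: caller then keeps
 * whatever default the library chooses). */
static int pick_num_threads(const char *openblas_val, const char *goto_val,
                            const char *omp_val, int use_openmp)
{
    const char *by_priority[3];
    size_t first = use_openmp ? 2 : 0;
    size_t i;

    by_priority[0] = openblas_val;
    by_priority[1] = goto_val;
    by_priority[2] = omp_val;

    for (i = first; i < 3; i++) {
        int n = parse_positive_int(by_priority[i]);
        if (n > 0)
            return n;
    }
    return 0;
}

/* Thin wrapper that reads the real environment. */
static int num_threads_from_env(int use_openmp)
{
    return pick_num_threads(getenv("OPENBLAS_NUM_THREADS"),
                            getenv("GOTO_NUM_THREADS"),
                            getenv("OMP_NUM_THREADS"),
                            use_openmp);
}
```

Calling num_threads_from_env(0) models a default build, and num_threads_from_env(1) models a build with USE_OPENMP=1. Keeping the selection logic in pick_num_threads, separate from getenv, makes the priority rule easy to check without touching the environment.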

void goto_set_num_threads(int num_threads);

void openblas_set_num_threads(int num_threads);

If you compile this library with USE_OPENMP=1, you can use the above functions, too.
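A minimal sketch of calling openblas_set_num_threads from C follows. The prototype is the one listed above. Everything else is an assumption made for illustration: the HAVE_OPENBLAS macro, the stand-in stub that lets the sketch build without the library, and the set_blas_threads helper that clamps the request to a caller-supplied range before applying it.

```c
#ifdef HAVE_OPENBLAS
/* Prototype as listed in this README; build with -DHAVE_OPENBLAS
 * and link with -lopenblas to call the real library. */
void openblas_set_num_threads(int num_threads);
#else
/* Stand-in (not part of OpenBLAS): records the last requested value
 * so the sketch compiles and runs without the library. */
static int stub_threads = 0;
static void openblas_set_num_threads(int num_threads)
{
    stub_threads = num_threads;
}
#endif

/* Hypothetical helper: clamp the request to [1, max_threads], apply it
 * through openblas_set_num_threads, and return the value applied. */
static int set_blas_threads(int requested, int max_threads)
{
    int n = requested;

    if (max_threads < 1)
        max_threads = 1;
    if (n < 1)
        n = 1;
    if (n > max_threads)
        n = max_threads;

    openblas_set_num_threads(n);
    return n;
}
```

For example, set_blas_threads(16, 8) applies 8 threads and set_blas_threads(0, 8) applies 1. The upper bound is left to the caller, since a sensible limit depends on the machine and on how many threads the application itself uses.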

Report Bugs

Please open an issue at https://github.com/xianyi/OpenBLAS/issues

Contact

ChangeLog

Please see Changelog.txt for the differences from the GotoBLAS2 1.13 BSD version.

Troubleshooting

  • Please read the FAQ first.
  • Please use gcc version 4.6 or above to compile the Sandy Bridge AVX kernels on Linux/MingW/BSD.
  • Please use Clang version 3.1 or above to compile the library on the Sandy Bridge microarchitecture. Clang 3.0 generates incorrect AVX binary code.
  • The number of CPUs/cores should be less than or equal to 256.
  • On Linux, OpenBLAS sets the processor affinity by default. This may conflict with R parallel. To avoid this, build the library with NO_AFFINITY=1.
  • On Loongson 3A, make test may fail because of a pthread_create error (error code EAGAIN). However, the same test case passes when you run it directly from the shell.

Contributing

  1. Check for open issues or open a fresh issue to start a discussion around a feature idea or a bug.
  2. Fork the OpenBLAS repository to start making your changes.
  3. Write a test which shows that the bug was fixed or that the feature works as expected.
  4. Send a pull request. Make sure to add yourself to CONTRIBUTORS.md.

Donation

Please read this wiki page.