wernsaar b06550519e added optimized cgemv_t c-kernel 2014-08-12 12:15:41 +02:00
benchmark added benchmarks for lapack potrf, potrs and potri functions 2014-08-01 21:08:37 +02:00
ctest Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
driver added experimental support for big numa machines 2014-08-02 13:40:16 +02:00
exports Fixed #407. Support outputing the CPU corename on runtime. 2014-07-08 12:48:08 +08:00
interface adjust number of threads for small size in cgemv and zgemv 2014-07-15 16:27:02 +02:00
kernel added optimized cgemv_t c-kernel 2014-08-12 12:15:41 +02:00
lapack Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
lapack-netlib added additional test value to dstest.in 2014-07-13 18:29:19 +02:00
reference Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
test Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
utest Refs #406. Fixed utest building bug. 2014-07-08 17:26:49 +08:00
.gitignore .gitignore: add some more entries concerned with kernel 2014-06-27 13:58:42 -07:00
.travis.yml Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
BACKERS.md Added backers. 2013-09-05 15:39:45 +08:00
CONTRIBUTORS.md Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
Changelog.txt OpenBLAS 0.2.10 version. 2014-07-16 18:04:18 +08:00
GotoBLAS_00License.txt rename documents in GotoBLAS. 2011-01-24 15:57:23 +00:00
GotoBLAS_01Readme.txt Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
GotoBLAS_02QuickInstall.txt Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
GotoBLAS_03FAQ.txt Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
GotoBLAS_04FAQ.txt rename documents in GotoBLAS. 2011-01-24 15:57:23 +00:00
GotoBLAS_05LargePage.txt Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
GotoBLAS_06WeirdPerformance.txt Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
LICENSE Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
Makefile changed string GFORTRAN to lowercase 2014-07-16 17:08:43 +02:00
Makefile.alpha Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
Makefile.arm added ARMV5 as reference platform 2014-05-13 17:25:19 +02:00
Makefile.arm64 added experimental support for ARMV8 2013-11-24 15:47:00 +01:00
Makefile.generic Respect user's LDFLAGS 2013-07-25 14:08:37 -07:00
Makefile.ia64 Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
Makefile.install Don't create an absolute symlink when installing on Darwin 2014-07-16 15:31:27 -04:00
Makefile.mips64 Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
Makefile.power Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
Makefile.prebuild get rid of the generated cblas_noconst.h file 2013-08-28 16:52:24 +02:00
Makefile.rule added experimental support for big numa machines 2014-08-02 13:40:16 +02:00
Makefile.sparc Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
Makefile.system added experimental support for big numa machines 2014-08-02 13:40:16 +02:00
Makefile.tail Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
Makefile.x86 Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
Makefile.x86_64 Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
README.md Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
TargetList.txt TargetList.txt: minor re-ordering 2013-03-17 23:03:05 +08:00
c_check Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
cblas.h Fixed #407. Support outputing the CPU corename on runtime. 2014-07-08 12:48:08 +08:00
cblas_noconst.h Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
common.h bugfix for linux affinity code 2014-08-01 23:10:08 +02:00
common_alpha.h Refs #262. Fixed compatibility issues of GNU stack markings with PathScale EKOPath(tm) Compiler Suite: Version 4.0.12.1 2013-09-22 09:37:59 +08:00
common_arm.h Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
common_arm64.h Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
common_c.h Ref #51: added blas extensions zomatcopy and comatcopy 2014-06-10 10:34:54 +02:00
common_d.h Ref #51: added blas extension domatcopy as not opimized reference 2014-06-09 17:11:07 +02:00
common_ia64.h Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
common_interface.h Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
common_lapack.h Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
common_level1.h Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
common_level2.h Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
common_level3.h Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
common_linux.h Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
common_macro.h Ref #51: added blas extensions zomatcopy and comatcopy 2014-06-10 10:34:54 +02:00
common_mips64.h Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
common_param.h allow to set custom value for ?GEMM_DEFAULT_UNROLL_MN, optimizations for syrk 2014-07-24 18:43:31 +02:00
common_power.h Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
common_q.h Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
common_reference.h Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
common_s.h Ref #51: added blas extension somatcopy 2014-06-09 20:21:13 +02:00
common_sparc.h Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
common_thread.h Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
common_x.h Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
common_x86.h Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
common_x86_64.h Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
common_z.h Ref #51: added blas extensions zomatcopy and comatcopy 2014-06-10 10:34:54 +02:00
cpuid.S Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
cpuid.h Init code base for Intel Haswell. 2013-08-13 00:54:59 +08:00
cpuid_alpha.c Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
cpuid_arm.c Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
cpuid_ia64.c Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
cpuid_mips.c Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
cpuid_power.c Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
cpuid_sparc.c refs #55. Added DTB_ENTRIES into dynamic arch setting parameters. Now, it can read DTB_ENTRIES on runtime. 2011-09-05 17:37:07 +08:00
cpuid_x86.c Refs #401. Added NO_AVX2 flag for old binutils (e.g. RHEL6) 2014-07-16 08:38:25 +08:00
ctest.c Refs #355. Fixed ARM detection bug. 2014-03-22 15:08:18 +08:00
ctest1.c Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
ctest2.c Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
f_check Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
ftest.f Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
ftest2.f Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
ftest3.f Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
getarch.c Fixed #407. Support outputing the CPU corename on runtime. 2014-07-08 12:48:08 +08:00
getarch_2nd.c Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
l1param.h Added BULLDOZER target. So far it uses barcelona kernels. 2012-12-07 00:53:31 +08:00
l2param.h Support AMD Piledriver by bulldozer kernels. 2013-07-06 12:06:43 -03:00
lapack-devel.log Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
make.inc modification, to run blas-test on Windows 2014-06-29 10:15:29 +02:00
openblas_config_template.h Fixed #315. Added OPENBLAS_ prefix to openblas_config.h. 2013-11-02 15:59:00 +08:00
param.h optimization of sandybridge cgemm-kernel 2014-07-29 19:07:21 +02:00
quickbuild.32bit Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
quickbuild.64bit Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00
quickbuild.win32 Added the tip for Windows. 2012-08-09 20:37:55 +08:00
quickbuild.win64 Refs #63. delete prefix for mingw64 toolchain. 2014-04-27 13:05:26 +08:00
segfaults.patch Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
symcopy.h Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00
version.h Remove all trailing whitespace except lapack-netlib 2014-06-27 12:05:18 -07:00

README.md

OpenBLAS

Introduction

OpenBLAS is an optimized BLAS library based on the GotoBLAS2 1.13 BSD version.

Please read the documentation on the OpenBLAS wiki pages: http://github.com/xianyi/OpenBLAS/wiki.

Binary Packages

We provide binary packages for the following platform:

  • Windows x86/x86_64

You can download them from the project's file hosting on sourceforge.net.

Installation from Source

Download the source from the project homepage: http://xianyi.github.com/OpenBLAS/

Or check out the code from git://github.com/xianyi/OpenBLAS.git

Normal compile

  • Type "make" to detect the CPU automatically, or
  • type "make TARGET=xxx" to set the target CPU explicitly, e.g. "make TARGET=NEHALEM". The full target list is in the file TargetList.txt.

Cross compile

Set CC and FC to the compilers of your cross toolchain, set HOSTCC to your host C compiler, and finally set TARGET explicitly.

Examples:

On an x86 box, compile the library for the Loongson 3A CPU:

make BINARY=64 CC=mips64el-unknown-linux-gnu-gcc FC=mips64el-unknown-linux-gnu-gfortran HOSTCC=gcc TARGET=LOONGSON3A

On an x86 box, compile the library for the Loongson 3A CPU with the loongcc (based on Open64) compiler:

make CC=loongcc FC=loongf95 HOSTCC=gcc TARGET=LOONGSON3A CROSS=1 CROSS_SUFFIX=mips64el-st-linux-gnu- NO_LAPACKE=1 NO_SHARED=1 BINARY=32

Debug version

make DEBUG=1

Install to a directory (optional)

Example:

make install PREFIX=your_installation_directory

The default installation directory is /opt/OpenBLAS.

Supported CPUs & OS

Please read GotoBLAS_01Readme.txt.

Additional supported CPUs:

x86/x86-64:

  • Intel Xeon 56xx (Westmere): Uses the GotoBLAS2 Nehalem code.
  • Intel Sandy Bridge: Optimized Level-3 BLAS with AVX on x86-64.
  • Intel Haswell: Optimized Level-3 BLAS with AVX on x86-64 (identical to Sandy Bridge).
  • AMD Bobcat: Uses the GotoBLAS2 Barcelona code.
  • AMD Bulldozer: x86-64 S/DGEMM AVX kernels. (Thanks to Werner Saar)
  • AMD Piledriver: Uses the Bulldozer code.

MIPS64:

  • ICT Loongson 3A: Optimized Level-3 BLAS and part of Level-1 and Level-2.
  • ICT Loongson 3B: Experimental.

Supported OS:

Usage

Link with libopenblas.a for the static library, or with -lopenblas for the shared library.

Set the number of threads with environment variables.

Examples:

export OPENBLAS_NUM_THREADS=4

or

export GOTO_NUM_THREADS=4

or

export OMP_NUM_THREADS=4

The order of priority is OPENBLAS_NUM_THREADS > GOTO_NUM_THREADS > OMP_NUM_THREADS.

If you compile the library with USE_OPENMP=1, you should set the OMP_NUM_THREADS environment variable instead. With USE_OPENMP=1, OpenBLAS ignores OPENBLAS_NUM_THREADS and GOTO_NUM_THREADS.
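The precedence rule can be pictured with a small sketch. This is not the OpenBLAS source; `env_thread_count` and `resolve_num_threads` are hypothetical helpers written only to illustrate the documented order, and treating an unset or non-numeric variable as "fall through to the next one" is an assumption of this sketch.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper (not part of OpenBLAS): read a positive thread
 * count from an environment variable. Returns 0 if the variable is
 * unset, empty, not a plain decimal number, or not positive. */
static int env_thread_count(const char *name) {
    const char *value = getenv(name);
    if (value == NULL || *value == '\0') {
        return 0;
    }
    char *end = NULL;
    long n = strtol(value, &end, 10);
    if (*end != '\0' || n <= 0 || n > INT_MAX) {
        return 0;
    }
    return (int)n;
}

/* Illustrates the documented priority:
 *   OPENBLAS_NUM_THREADS > GOTO_NUM_THREADS > OMP_NUM_THREADS.
 * When use_openmp is non-zero (modelling a USE_OPENMP=1 build), only
 * OMP_NUM_THREADS is consulted. fallback is returned when no usable
 * variable is found; it is an assumption of this sketch. */
int resolve_num_threads(int use_openmp, int fallback) {
    static const char *const names[] = {
        "OPENBLAS_NUM_THREADS",
        "GOTO_NUM_THREADS",
        "OMP_NUM_THREADS",
    };
    assert(fallback >= 1);
    size_t first = use_openmp ? 2 : 0; /* skip the first two for OpenMP */
    for (size_t i = first; i < 3; i++) {
        int n = env_thread_count(names[i]);
        if (n > 0) {
            return n;
        }
    }
    return fallback;
}
```

For instance, with OPENBLAS_NUM_THREADS=4 and OMP_NUM_THREADS=2 both exported, `resolve_num_threads(0, 1)` yields 4, while `resolve_num_threads(1, 1)` yields 2 because the OpenMP case looks only at OMP_NUM_THREADS.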

Set the number of threads at runtime.

We provide the following functions to control the number of threads at runtime.

void goto_set_num_threads(int num_threads);

void openblas_set_num_threads(int num_threads);

If you compile the library with USE_OPENMP=1, you should use the above functions, too.

Report Bugs

Please open an issue at https://github.com/xianyi/OpenBLAS/issues

Contact

ChangeLog

Please see Changelog.txt for the differences from the GotoBLAS2 1.13 BSD version.

Troubleshooting

  • Please read the FAQ first.
  • Please use GCC version 4.6 or above to compile the Sandy Bridge AVX kernels on Linux/MinGW/BSD.
  • Please use Clang version 3.1 or above to compile the library on the Sandy Bridge microarchitecture. Clang 3.0 generates incorrect AVX binary code.
  • The number of CPUs/cores should be less than or equal to 256.
  • On Linux, OpenBLAS sets the processor affinity by default. This may cause a conflict with R parallel. To avoid it, build the library with NO_AFFINITY=1.
  • On Loongson 3A, make test may fail because of a pthread_create error (the error code is EAGAIN). However, the same test case passes when you run it directly from the shell.

Contributing

  1. Check for open issues or open a fresh issue to start a discussion around a feature idea or a bug.
  2. Fork the OpenBLAS repository to start making your changes.
  3. Write a test which shows that the bug was fixed or that the feature works as expected.
  4. Send a pull request. Make sure to add yourself to CONTRIBUTORS.md.

Donation

Please read this wiki page.