Commit Graph

6983 Commits

Author SHA1 Message Date
Zhang Xianyi d744c9590a In OpenMP threading, preallocate the thread buffer instead of allocating the buffer every time. This patch improved the performance slightly. 2013-03-01 14:36:47 +08:00
Zhang Xianyi 3cc6ae793e Refs #174. Return sb pointer when OpenMP or Windows. 2013-02-26 00:48:21 +08:00
Zhang Xianyi 4c2123c334 Fixed the overflow bug in single-threaded Cholesky factorization. 2013-02-23 13:00:52 +08:00
Zhang Xianyi 5155e3f509 Refs #174. Fixed the buffer overflow bug in multithreaded hbmv and sbmv.
Instead of using thread 0 buffer, each thread uses its own sb buffer.
Thus, it can avoid overflowing thread 0 buffer.
2013-02-13 16:05:58 +08:00
Zhang Xianyi 5c8bf6ae0e Merge branch 'bulldozer' into develop 2013-02-10 01:19:42 +08:00
Zhang Xianyi 6ae2f868fd Set the affinity. Only use 1 core of each module on bulldozer. 2013-02-09 18:19:02 +01:00
Zhang Xianyi a1ead62f28 Disable the warning of sgemm bulldozer kernel. 2013-02-09 17:03:13 +01:00
Zhang Xianyi 0133580148 Used sgemm bulldozer kernel on 64 bit. 2013-02-09 16:29:14 +01:00
Zhang Xianyi 274246651d Merge branch 'bulldozer' of git://github.com/wernsaar/OpenBLAS into bulldozer 2013-02-09 16:25:07 +01:00
Zhang Xianyi 299b5a44dc Merge branch 'develop' of github.com:xianyi/OpenBLAS into bulldozer 2013-02-09 16:22:04 +01:00
Zaheer Chothia a9500d0079 Missing line continuation -- follow-up to last commit (64ad8b9809). 2013-02-01 09:34:12 +01:00
Zaheer Chothia 64ad8b9809 Refs #193. Don't use C99 complex numbers when building C++ code. 2013-02-01 09:24:44 +01:00
Zaheer Chothia 875d520ccf Refs #193. cblas: move #include out of extern "C" block.
Standard headers may contain C++ templates which are not permitted inside an
extern "C" block. This might be the case when we include <complex.h>.
2013-01-31 08:48:27 +01:00
Zhang Xianyi d311236dfd Refs #189. Fixed the bug of s/cdot about invalid reading NAN on x86_64. 2013-01-25 20:56:14 +08:00
Zhang Xianyi 36e0982966 Refs #187. Use perl to generate cblas_noconst.h instead of sed.
Thanks to Dan Povey for the patch. https://github.com/xianyi/OpenBLAS/issues/187
2013-01-22 00:29:54 +08:00
Zhang Xianyi 8cdb795438 Refs #187. Use binary code for xgetbv, which is compatible with old compiler. 2013-01-22 00:25:08 +08:00
Zaheer Chothia 4db6660de4 Refs #185. Add missing 'const' to declarations in <cblas.h>. Thanks to Dan Povey!
The 'const' modifications were done automatically using these scripts:
https://kaldi.svn.sourceforge.net/svnroot/kaldi/sandbox/dan/tools/for_openblas
2013-01-20 22:52:51 +01:00
Zhang Xianyi 0b08f7479e Refs #154. Fixed the gemv_t bug that overflowed the 16MB buffer on x86. 2013-01-20 21:22:12 +08:00
Zaheer Chothia 200e4acf15 cblas: typedef enums for improved compatibility with Intel MKL.
Netlib style:
    enum CBLAS_XYZ {X=1, Y=2, Z=3};

Intel MKL style:
    typedef enum {X=1, Y=2, Z=3} CBLAS_XYZ;

With this hybrid style, code written in the latter form won't need any
modifications to be built with OpenBLAS.  This change should not affect existing
code, although a warning may be emitted for C code which does the following
(does not occur with C++):
    typedef enum CBLAS_XYZ CBLAS_XYZ;
    warning: redefinition of typedef 'CBLAS_XYZ' [-pedantic]
2013-01-19 22:57:13 +01:00
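The hybrid enum style described in commit 200e4acf15 can be sketched as follows. This is a minimal illustration, not the actual contents of <cblas.h>: `CBLAS_XYZ` is the placeholder name from the commit message, and the two helper functions are hypothetical. It assumes the hybrid form declares the enum tag and the typedef name together, so both spellings of the type are accepted.

```c
/* Hybrid style: one declaration provides both the enum tag (Netlib style)
 * and the typedef name (Intel MKL style). CBLAS_XYZ is a placeholder. */
typedef enum CBLAS_XYZ {X = 1, Y = 2, Z = 3} CBLAS_XYZ;

/* Hypothetical Netlib-style caller: names the type through its enum tag. */
static int netlib_style(enum CBLAS_XYZ v)
{
    return (int)v;
}

/* Hypothetical MKL-style caller: names the type through the typedef. */
static int mkl_style(CBLAS_XYZ v)
{
    return (int)v;
}
```

Both `netlib_style(Y)` and `mkl_style(Y)` compile against the same declaration, which is why code written in either form should not need changes.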
Zhang Xianyi 99d1978df7 Fixed #180. the typos in kernel/x86_64/sgemv_t.S 2013-01-12 12:31:14 +08:00
Zhang Xianyi 08bf6674d5 Refs #177. Fixed sgemv_t compiling bug on Win64. 2013-01-05 11:36:39 +08:00
Zhang Xianyi 8b122ff9dc Refs #176. Fixed make.inc overriding RANLIB bug when cross-compiling LAPACK. 2013-01-03 01:47:31 +08:00
Zhang Xianyi 69200884e1 Refs #173. Fixed the internal buffer overflow bug of gemv_n on x86 2012-12-25 09:27:49 +08:00
Zhang Xianyi 0d1518add9 Refs #173. Fixed the internal buffer overflow bug of sgemv_t on x86 2012-12-25 09:10:17 +08:00
Zhang Xianyi 91ed4e4450 Refs #171. Prevent loading the dirty number from the buffer in sgemv_t x86 kernel. 2012-12-23 23:14:17 +08:00
Zhang Xianyi fd3046b32a Refs #173. Fixed the internal buffer overflow bug of gemv_t on x86. 2012-12-23 21:47:22 +08:00
Zhang Xianyi a4ee6f3915 Fixed #172. Support Intel Xeon E7540. 2012-12-18 08:57:46 +08:00
Zhang Xianyi a0363e9b48 Merge branch 'master' into develop 2012-12-18 08:51:30 +08:00
Zhang Xianyi b471d52e61 Merge pull request #170 from juliantaylor/athlon-defaults
set parameters for CORE_ATHLON
2012-12-15 15:50:02 -08:00
Julian Taylor 9fb341a9f8 set parameters for CORE_ATHLON
else dgemm_p is set to zero leading to a segfault in alloc_mmap due to
allocsize being zero
2012-12-15 16:05:33 +01:00
Zhang Xianyi fba6b590f2 Merge branch 'master' into develop 2012-12-15 22:49:37 +08:00
Zhang Xianyi 97f68f7f3a Merge pull request #169 from juliantaylor/sanity-check-cpu
add a sanity check on the detected cpu type
2012-12-15 06:46:48 -08:00
Julian Taylor 1138817dd2 add a sanity check on the detected cpu type
if we have 64 bit pointers we can't have a 32 bit cpu, so fall back to
the 64bit cpu fallback (prescott)
E.g. the cpu detection fails in amd qemu64 emulation (family 6 model 2)
causing it to use the uninitialized gotoblas_ATHLON
2012-12-15 13:29:46 +01:00
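The sanity check described in commit 1138817dd2 can be sketched roughly as below. This is not the OpenBLAS detection code: the `cpu_kind` enum, its members, and `sanitize_cpu` are hypothetical names chosen for illustration. The only idea taken from the commit message is that a 32-bit-only CPU type cannot be correct when pointers are 64 bits wide, so the result falls back to the 64-bit default (prescott).

```c
#include <stddef.h>

/* Hypothetical CPU classes; real detection distinguishes many more cores. */
typedef enum {
    CPU_ATHLON_32BIT,    /* a 32-bit-only core type */
    CPU_PRESCOTT_64BIT,  /* the 64-bit fallback named in the commit */
    CPU_OTHER_64BIT
} cpu_kind;

static int is_32bit_only(cpu_kind k)
{
    return k == CPU_ATHLON_32BIT;
}

/* If the process has 64-bit pointers, a 32-bit-only detection result must be
 * wrong (for example under emulation), so return the 64-bit fallback. */
static cpu_kind sanitize_cpu(cpu_kind detected, size_t pointer_size)
{
    if (pointer_size == 8 && is_32bit_only(detected))
        return CPU_PRESCOTT_64BIT;
    return detected;
}
```

A caller would pass `sizeof(void *)` as `pointer_size`; it is a parameter here only so that both cases can be exercised on one machine.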
Zhang Xianyi 13f8fc0b1a Write FMA4 flag to the configure file. 2012-12-11 10:55:10 +01:00
Zhang Xianyi bdf8d9411e Refs #163. Obtain the build configuration at runtime.
The openblas_get_config function returns the configuration string.
So far, it supports USE64BITINT, NO_CBLAS, NO_LAPACK, NO_LAPACKE,
DYNAMIC_ARCH, NO_AFFINITY.

Example:
#include <stdio.h>
extern char *openblas_get_config(void);
int main(void)
{
  /* Print the build configuration string of the linked library. */
  printf("%s\n", openblas_get_config());
  return 0;
}
2012-12-10 15:52:51 +08:00
Zhang Xianyi bb10cb8442 Refs #165. fall back of DTB_DEFAULT_ENTRIES for some virtual machines. 2012-12-10 11:51:39 +08:00
wernsaar d48cff8cf1 Added optimized sgemm_kernel 2012-12-08 18:50:53 +01:00
Zhang Xianyi f19af5ecc0 Refs #54. Added AMD Bulldozer x86_64 dgemm kernel developed by Werner Saar <wernsaar at googlemail.com>
Based on the dgemm kernel for AMD Barcelona, he used AVX and FMA4 instructions.
Thanks to Werner Saar!
2012-12-07 01:05:11 +08:00
Zhang Xianyi bfaaa975e6 Added BULLDOZER target. So far it uses barcelona kernels. 2012-12-07 00:53:31 +08:00
Zhang Xianyi b7c0fa6bd2 Init AMD Bulldozer codebase. 2012-12-06 07:29:54 -05:00
Zhang Xianyi 7110d17146 Added -lgomp for generating DLL on Windows. 2012-11-28 12:52:28 +08:00
Zhang Xianyi e01b3d4b54 Merge branch 'develop' 2012-11-27 07:24:53 +08:00
Zhang Xianyi cea1a885b5 Refs #154. Fixed the build bug of dgemv_t on MinGW64. 2012-11-27 07:24:04 +08:00
Zhang Xianyi f78eb335d6 Merge branch 'develop' 2012-11-26 17:32:56 +08:00
Zhang Xianyi 2345bdec68 Update the doc for 0.2.5 version. 2012-11-26 17:32:25 +08:00
Zhang Xianyi 5f0117385e Refs #154. Fixed a SEGFAULT bug of dgemv_t when m is very large.
It overflowed the internal buffer. Thus, we split vector x into blocks when m is very large.

Thanks to @wangqian for this patch.
2012-11-19 22:32:27 +08:00
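The blocking idea in commit 5f0117385e (split vector x into blocks when m is very large) can be sketched in plain C as below. This is an illustrative reference loop, not the optimized kernel: `gemv_t_blocked`, `X_BLOCK`, and the stack buffer are assumptions made for the example, and the block size is deliberately tiny. The point it shows is that only a fixed-size piece of x is ever copied into the internal buffer, so the buffer cannot overflow however large m is.

```c
#include <stddef.h>
#include <string.h>

#define X_BLOCK 4  /* hypothetical block size, kept tiny for illustration */

/* Computes y[j] += alpha * sum_i A[i + j*lda] * x[i] for a column-major
 * m-by-n matrix A (the transposed matrix-vector product), processing x in
 * blocks of at most X_BLOCK elements. */
static void gemv_t_blocked(size_t m, size_t n, double alpha,
                           const double *a, size_t lda,
                           const double *x, double *y)
{
    double buffer[X_BLOCK];  /* fixed-size stand-in for the internal buffer */

    for (size_t start = 0; start < m; start += X_BLOCK) {
        size_t len = (m - start < X_BLOCK) ? (m - start) : X_BLOCK;

        /* Copy only the current block of x; never more than X_BLOCK items. */
        memcpy(buffer, x + start, len * sizeof(double));

        /* Accumulate this block's contribution to every output element. */
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t i = 0; i < len; i++)
                sum += a[(start + i) + j * lda] * buffer[i];
            y[j] += alpha * sum;
        }
    }
}
```

Because each block only adds its partial sums into y, the blocked loop gives the same mathematical result as a single pass over all m rows.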
Zhang Xianyi 6caf1bab73 Fixed #160. Merge branch 'master' of https://github.com/sebastien-villemot/OpenBLAS into develop 2012-11-15 18:21:04 +08:00
Sébastien Villemot 01e3c984ce Fix compilation with TARGET=GENERIC
Patch applied to Debian package
2012-11-14 21:04:05 +01:00
Zhang Xianyi 6751f7b9a7 Fixed #157. Only detect the number of physical CPU cores on Mac OSX. 2012-11-13 15:48:57 +08:00
Zhang Xianyi d5717a97ea Compile lapacke with ILP64 model when INTERFACE64=1 2012-11-13 00:54:20 +08:00