wernsaar
8eaa04acbb
added zgemm_kernel_2x2_bulldozer.S
2013-06-11 12:00:49 +02:00
wernsaar
d854b30ae6
Added UNROLL values for 3M to getarch_2nd.c, Makefile.system and Makefile.L3
2013-06-09 17:26:42 +02:00
wernsaar
d65bbec99b
added new sgemm kernel for BULLDOZER
2013-06-09 15:57:42 +02:00
wernsaar
e4c39c7c26
changed stack touching
2013-06-08 10:43:08 +02:00
wernsaar
ba800f0883
correct GEMM_THREAD in param.h
2013-06-08 10:03:59 +02:00
wernsaar
25491e42f9
New dgemm kernel for BULLDOZER: dgemm_kernel_8x2_bulldozer.S
2013-06-08 09:40:17 +02:00
wernsaar
731220f870
changed DGEMM_DEFAULT_P and DGEMM_DEFAULT_Q to 248 for BULLDOZER 64bit
2013-04-30 10:07:17 +02:00
wernsaar
69aa6c8fb1
bad performance with some data
2013-04-28 11:14:23 +02:00
wernsaar
60b263f3d2
removed trsm_kernel_RT_4x4_bulldozer.S: it produced wrong results
2013-04-27 17:23:08 +02:00
wernsaar
7ac306e0da
added trsm_kernel_RT_4x4_bulldozer.S
2013-04-27 16:48:48 +02:00
wernsaar
4cb454cdf2
added trsm_kernel_LT_4x4_bulldozer.S
2013-04-27 14:30:00 +02:00
wernsaar
19ad2fb128
prefetch improved. Defined 2 different kernels for inner loop
2013-04-27 13:40:49 +02:00
wernsaar
6821677489
minor improvements and code cleanup
2013-04-26 20:05:42 +02:00
wernsaar
7641f6e253
Merged some improvements into dgemm_kernel_4x4_bulldozer.S.
Changed the copy functions to generic to solve prefetch conflicts
2013-04-16 19:05:06 +02:00
wernsaar
6e3f6f25a5
New version of dgemm_kernel_4x4_bulldozer.S
The peak performance with 8 cores is now 90 GFlops
2013-04-12 17:55:51 +02:00
wernsaar
f300ce3df5
new optimization of dgemm kernel for bulldozer: 10% performance increase
2013-03-06 17:26:03 +01:00
wernsaar
66e64131ed
optimized bulldozer dgemm kernel again
2013-03-05 19:51:37 +01:00
wernsaar
9405f26f4b
new dgemm_kernel for bulldozer
2013-03-04 17:37:38 +01:00
Zhang Xianyi
54e7b37630
Merge branch 'develop'
2013-03-02 14:42:06 +08:00
Zhang Xianyi
529f1b5006
Refs #194. Export the missing LAPACK s/dlamc3 functions.
2013-03-02 14:41:18 +08:00
Zhang Xianyi
e5ac3007e0
Merge branch 'develop'
2013-03-02 14:24:23 +08:00
Zhang Xianyi
0d0405b434
Updated the doc for 0.2.6 version.
2013-03-02 14:22:27 +08:00
Zhang Xianyi
f1ce74ffdd
Improved the message printed when the OS doesn't support AVX.
2013-03-02 14:15:54 +08:00
Zhang Xianyi
d744c9590a
In OpenMP threading, preallocate the thread buffer instead of allocating the buffer every time. This patch improved the performance slightly.
2013-03-01 14:36:47 +08:00
Zhang Xianyi
3cc6ae793e
Refs #174. Return the sb pointer when using OpenMP or Windows.
2013-02-26 00:48:21 +08:00
Zhang Xianyi
4c2123c334
Fixed the overflow bug in single-threaded Cholesky factorization.
2013-02-23 13:00:52 +08:00
Zhang Xianyi
5155e3f509
Refs #174. Fixed the buffer overflow bug in multithreaded hbmv and sbmv.
Instead of using the thread 0 buffer, each thread uses its own sb buffer.
This avoids overflowing the thread 0 buffer.
2013-02-13 16:05:58 +08:00
Zhang Xianyi
5c8bf6ae0e
Merge branch 'bulldozer' into develop
2013-02-10 01:19:42 +08:00
Zaheer Chothia
a9500d0079
Missing line continuation -- follow-up to last commit (64ad8b9809).
2013-02-01 09:34:12 +01:00
Zaheer Chothia
64ad8b9809
Refs #193. Don't use C99 complex numbers when building C++ code.
2013-02-01 09:24:44 +01:00
Zaheer Chothia
875d520ccf
Refs #193. cblas: move #include out of extern "C" block.
Standard headers may contain C++ templates which are not permitted inside an
extern "C" block. This might be the case when we include <complex.h>.
2013-01-31 08:48:27 +01:00
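The header layout this commit describes can be sketched roughly as follows. This is only an illustration, not the actual cblas.h: `example_sum` is a hypothetical function, and `<stddef.h>` stands in for whichever standard header is being included. The point is that the `#include` sits before the `extern "C"` block rather than inside it.

```c
/* Standard headers are included OUTSIDE the extern "C" block, so any
   C++ templates they may contain are not given C linkage. */
#include <stddef.h>

#ifdef __cplusplus
extern "C" {
#endif

/* Hypothetical declaration, used only for this illustration. */
double example_sum(const double *x, size_t n);

#ifdef __cplusplus
}
#endif

/* Hypothetical definition so the sketch links on its own. */
double example_sum(const double *x, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; ++i)
        s += x[i];
    return s;
}
```

The same file compiles as C, where the `extern "C"` lines are skipped by the preprocessor, and as C++, where only the declarations receive C linkage.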
Zhang Xianyi
d311236dfd
Refs #189. Fixed the s/cdot bug involving an invalid read of NaN on x86_64.
2013-01-25 20:56:14 +08:00
Zhang Xianyi
36e0982966
Refs #187. Use perl to generate cblas_noconst.h instead of sed.
Thanks to Dan Povey for the patch. https://github.com/xianyi/OpenBLAS/issues/187
2013-01-22 00:29:54 +08:00
Zhang Xianyi
8cdb795438
Refs #187. Use binary code for xgetbv, which is compatible with older compilers.
2013-01-22 00:25:08 +08:00
Zaheer Chothia
4db6660de4
Refs #185. Add missing 'const' to declarations in <cblas.h>. Thanks to Dan Povey!
The 'const' modifications were done automatically using these scripts:
https://kaldi.svn.sourceforge.net/svnroot/kaldi/sandbox/dan/tools/for_openblas
2013-01-20 22:52:51 +01:00
Zhang Xianyi
0b08f7479e
Refs #154. Fixed the gemv_t bug that overflowed the 16MB buffer on x86.
2013-01-20 21:22:12 +08:00
Zaheer Chothia
200e4acf15
cblas: typedef enums for improved compatibility with Intel MKL.
Netlib style:
    enum CBLAS_XYZ {X=1, Y=2, Z=3};
Intel MKL style:
    typedef enum {X=1, Y=2, Z=3} CBLAS_XYZ;
With this hybrid style, code written in the latter form won't need any
modifications to be built with OpenBLAS. This change should not affect existing
code, although a warning may be emitted for C code which does the following
(does not occur with C++):
    typedef enum CBLAS_XYZ CBLAS_XYZ;
    warning: redefinition of typedef 'CBLAS_XYZ' [-pedantic]
2013-01-19 22:57:13 +01:00
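As a rough sketch of the hybrid style this commit describes, the declaration below keeps the enum tag (Netlib style) and also adds a typedef name (Intel MKL style). `CBLAS_XYZ`, `X`, `Y` and `Z` are the placeholder names from the commit message, and `netlib_style` and `mkl_style` are hypothetical helpers added only to show that both spellings of the type are accepted.

```c
/* Hybrid declaration: the tag CBLAS_XYZ and the typedef CBLAS_XYZ
   both exist, so either spelling of the type can be used. */
typedef enum CBLAS_XYZ {X=1, Y=2, Z=3} CBLAS_XYZ;

/* Caller written in the Netlib style: uses the enum tag. */
static int netlib_style(enum CBLAS_XYZ v)
{
    return (int)v;
}

/* Caller written in the Intel MKL style: uses the typedef name. */
static int mkl_style(CBLAS_XYZ v)
{
    return (int)v;
}
```

Both helpers take the same underlying type, so code written against either convention should build unchanged against a header declared this way.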
Zhang Xianyi
99d1978df7
Fixed #180. Corrected the typos in kernel/x86_64/sgemv_t.S
2013-01-12 12:31:14 +08:00
Zhang Xianyi
08bf6674d5
Refs #177. Fixed the sgemv_t compile bug on Win64.
2013-01-05 11:36:39 +08:00
Zhang Xianyi
8b122ff9dc
Refs #176. Fixed the bug where make.inc overrides RANLIB when cross-compiling LAPACK.
2013-01-03 01:47:31 +08:00
Zhang Xianyi
69200884e1
Refs #173. Fixed the internal buffer overflow bug in gemv_n on x86.
2012-12-25 09:27:49 +08:00
Zhang Xianyi
0d1518add9
Refs #173. Fixed the internal buffer overflow bug in sgemv_t on x86.
2012-12-25 09:10:17 +08:00
Zhang Xianyi
91ed4e4450
Refs #171. Prevent loading dirty values from the buffer in the sgemv_t x86 kernel.
2012-12-23 23:14:17 +08:00
Zhang Xianyi
fd3046b32a
Refs #173. Fixed the internal buffer overflow bug in gemv_t on x86.
2012-12-23 21:47:22 +08:00
Zhang Xianyi
a4ee6f3915
Fixed #172. Support Intel Xeon E7540.
2012-12-18 08:57:46 +08:00
Zhang Xianyi
a0363e9b48
Merge branch 'master' into develop
2012-12-18 08:51:30 +08:00
Zhang Xianyi
b471d52e61
Merge pull request #170 from juliantaylor/athlon-defaults
set parameters for CORE_ATHLON
2012-12-15 15:50:02 -08:00
Julian Taylor
9fb341a9f8
set parameters for CORE_ATHLON
Otherwise dgemm_p is set to zero, leading to a segfault in alloc_mmap because
allocsize is zero.
2012-12-15 16:05:33 +01:00
Zhang Xianyi
fba6b590f2
Merge branch 'master' into develop
2012-12-15 22:49:37 +08:00
Zhang Xianyi
97f68f7f3a
Merge pull request #169 from juliantaylor/sanity-check-cpu
add a sanity check on the detected cpu type
2012-12-15 06:46:48 -08:00