Zhang Xianyi
32d2ca3035
Refs #214 , #221 , #246 . Fixed the getrf overflow bug on Windows.
...
I used a smaller threshold since the stack size is 1MB on Windows.
2013-07-11 03:20:02 +08:00
Zhang Xianyi
6df39ad9e7
Refs #248 . Support LAPACK and LAPACKE with lsbcc.
...
For LAPACKE, use LAPACK_COMPLEX_STRUCTURE.
The reason is that lsbcc does not define complex I in complex.h.
2013-07-10 16:02:27 +08:00
Zhang Xianyi
3a96e4cbcb
Merge pull request #249 from wernsaar/develop
...
replaced defined(DOUBLE) by !defined(XDOUBLE)
2013-07-10 01:01:03 -07:00
wernsaar
6f008abcef
replaced defined(DOUBLE) by !defined(XDOUBLE)
2013-07-09 18:17:50 +02:00
Zhang Xianyi
3eb5af1955
Refs #247 . Included the LAPACK source code to avoid downloading the tar.gz from netlib.org.
...
Based on 3.4.2 version, apply patch.for_lapack-3.4.2.
2013-07-09 18:13:48 +08:00
Zhang Xianyi
fbb75e58b1
Fixed the typo in getarch.c
2013-07-09 16:26:59 +08:00
Zhang Xianyi
f54f5bac9e
Refs #248 . Fixed the LSB compatibility issue for BLAS only.
...
For example, make CC=lsbcc NO_LAPACK=1.
2013-07-09 15:38:03 +08:00
Zhang Xianyi
5d3312142a
Refs #221 #246 . Fixed the stack overflow bug in multithreaded BLAS3.
...
When NUM_THREADS (MAX_CPU_NUMBER) is very large, e.g. 256, the following stack allocation overflows:
typedef struct {
  volatile BLASLONG working[MAX_CPU_NUMBER][CACHE_LINE_SIZE * DIVIDE_RATE];
} job_t;
job_t job[MAX_CPU_NUMBER];
The job array takes 8MB.
Thus, we use malloc instead of stack allocation.
2013-07-08 01:07:05 +08:00
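The 8MB figure in the commit above can be reproduced with a small sketch. The values CACHE_LINE_SIZE = 8, DIVIDE_RATE = 2, and an 8-byte BLASLONG are assumptions chosen for illustration (the commit message does not state them), and the helper names job_array_bytes and alloc_jobs are hypothetical, not taken from the source tree. With those values each job_t is 256 * 16 * 8 = 32KB, so 256 of them need 8MB, far more than a typical thread stack.

```c
#include <stddef.h>
#include <stdlib.h>

/* Assumed values for illustration only; not stated in the commit message. */
#define MAX_CPU_NUMBER  256
#define CACHE_LINE_SIZE 8
#define DIVIDE_RATE     2

/* Assumption: BLASLONG is an 8-byte integer. */
typedef long long BLASLONG;

typedef struct {
  volatile BLASLONG working[MAX_CPU_NUMBER][CACHE_LINE_SIZE * DIVIDE_RATE];
} job_t;

/* Hypothetical helper: total bytes needed for one job_t per thread. */
size_t job_array_bytes(void) {
  return (size_t)MAX_CPU_NUMBER * sizeof(job_t);
}

/* Hypothetical helper: allocate the job array on the heap instead of
   declaring `job_t job[MAX_CPU_NUMBER];` on the stack.
   Returns NULL on failure; the caller must free() the result. */
job_t *alloc_jobs(void) {
  return (job_t *)malloc(job_array_bytes());
}
```

A caller would check the result of alloc_jobs() for NULL, use the array as before, and call free() on every exit path.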
Zhang Xianyi
886cbaf4e4
Support AMD Piledriver by bulldozer kernels.
2013-07-06 12:06:43 -03:00
Zhang Xianyi
0c4074e10b
Added Travis CI status image.
2013-07-05 15:28:41 +08:00
Zhang Xianyi
cc522aa21d
Use quiet make for Travis CI.
2013-07-05 14:52:57 +08:00
Zhang Xianyi
9c78fad721
Install gfortran in Travis CI.
2013-07-05 11:11:18 +08:00
Zhang Xianyi
6028232ad1
Added travis.yml file.
2013-07-04 23:30:53 +08:00
Zhang Xianyi
feb9a3889a
Improved make clean on Mac OS X.
2013-07-02 14:37:30 +08:00
Zhang Xianyi
32dbeb636d
Refs #221 . Set stack limit to 16MB to prevent a SEGFAULT bug on Mac OS X with DYNAMIC_ARCH=1 & NUM_THREADS=256.
2013-07-02 14:17:55 +08:00
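The commit above does not say how the 16MB stack limit is set. One POSIX way to raise a soft stack limit from inside a program is getrlimit/setrlimit, sketched below; the function name raise_stack_soft_limit is hypothetical. Whether a raised limit affects the already-running main thread is platform-dependent, so treat this as an illustration of the API and not as the mechanism used in the commit.

```c
#include <sys/resource.h>

/* Hypothetical helper: raise the soft stack limit to at least `wanted`
   bytes, capped at the hard limit. Returns 0 on success, -1 on failure. */
int raise_stack_soft_limit(rlim_t wanted) {
  struct rlimit rl;

  if (getrlimit(RLIMIT_STACK, &rl) != 0)
    return -1;

  /* Nothing to do if the soft limit is already large enough. */
  if (rl.rlim_cur == RLIM_INFINITY || rl.rlim_cur >= wanted)
    return 0;

  /* An unprivileged process cannot exceed the hard limit. */
  if (rl.rlim_max != RLIM_INFINITY && wanted > rl.rlim_max)
    wanted = rl.rlim_max;

  rl.rlim_cur = wanted;
  return setrlimit(RLIMIT_STACK, &rl);
}
```

For example, raise_stack_soft_limit((rlim_t)16 * 1024 * 1024) requests 16MB; a process started afterwards inherits the new soft limit.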
Zhang Xianyi
57944538b6
Use ALIGN_5 instead of .align 32 in assembly kernels. Added ALIGN_5 for 32-bit OS X.
2013-07-01 16:09:05 +08:00
Zhang Xianyi
3ce2c62b0b
Merge pull request #242 from danluu/readme.haswell
...
Update README to reflect Haswell support, etc.
2013-06-30 09:40:32 -07:00
Dan Luu
50464997a3
Fix miscellaneous typos
2013-06-30 11:36:13 -05:00
Zhang Xianyi
8e7cad1650
Fixed #217 openblas_config.h bug on Windows 64.
2013-07-01 00:35:14 +08:00
Dan Luu
590e6aeafc
Add Haswell support
2013-06-30 11:35:00 -05:00
Dan Luu
88ef307cef
Refs #241 . Add Haswell support (using sandybridge optimizations)
2013-06-30 22:35:14 +08:00
Zhang Xianyi
6e8501c8a1
Fixed #239 bug in param.h about BARCELONA and BULLDOZER.
2013-06-29 10:36:01 +08:00
Zhang Xianyi
fa916a0fac
Fixed #238 bug in lsame on x86.
2013-06-28 22:43:41 +08:00
Zhang Xianyi
fb298b34ae
Merge pull request #235 from wernsaar/develop
...
Added ddot, daxpy, dcopy kernels for AMD bulldozer.
2013-06-21 17:59:26 -07:00
wernsaar
16012767f4
added dcopy_bulldozer.S
2013-06-21 16:06:51 +02:00
wernsaar
bcbac31b47
added ddot_bulldozer.S
2013-06-20 16:15:09 +02:00
wernsaar
8dc0c72583
added daxpy_bulldozer.S
2013-06-20 14:07:54 +02:00
wernsaar
89405a1a0b
cleanup of dgemm_ncopy_8_bulldozer.S
2013-06-19 19:31:38 +02:00
wernsaar
4f2b12b8a8
added dgemv_t_bulldozer.S
2013-06-19 17:32:42 +02:00
Zhang Xianyi
646e168d26
Merge pull request #233 from wernsaar/develop
...
added dgemv_n and some faster gemm_copy routines to BULLDOZER.
2013-06-18 20:02:36 -07:00
wernsaar
93dbbe1fb8
added dgemm_ncopy_8_bulldozer.S
2013-06-18 13:29:23 +02:00
wernsaar
a135f5d9ed
added gemm_tcopy_2_bulldozer.S
2013-06-18 11:01:33 +02:00
wernsaar
d0b6299b13
added dgemm_tcopy_8_bulldozer.S
2013-06-17 14:19:09 +02:00
wernsaar
9e58dd509e
added gemm_ncopy_2_bulldozer.S
2013-06-17 12:55:12 +02:00
wernsaar
7c8227101b
cleanup of dgemv_n_bulldozer.S and optimization of inner loop
2013-06-16 12:50:45 +02:00
wernsaar
f67fa62851
added dgemv_n_bulldozer.S
2013-06-15 16:42:37 +02:00
Zhang Xianyi
cd1d473ba0
Merge pull request #230 from wernsaar/develop
...
Refs #230 . New dgemm and sgemm Kernel for BULLDOZER
2013-06-13 07:29:27 -07:00
Zhang Xianyi
56f160134d
Refs #231 . Change the default C compiler to clang on Mac OS X.
2013-06-13 22:15:19 +08:00
wernsaar
0ded1fcc1c
performance optimizations in sgemm_kernel_16x2_bulldozer.S
2013-06-13 11:35:15 +02:00
wernsaar
a789b588cd
added cgemm_kernel_4x2_bulldozer.S
2013-06-12 15:55:27 +02:00
wernsaar
8eaa04acbb
added zgemm_kernel_2x2_bulldozer.S
2013-06-11 12:00:49 +02:00
wernsaar
d854b30ae6
Added UNROLL values for 3M to getarch_2nd.c, Makefile.system and Makefile.L3
2013-06-09 17:26:42 +02:00
wernsaar
d65bbec99b
added new sgemm kernel for BULLDOZER
2013-06-09 15:57:42 +02:00
wernsaar
e4c39c7c26
changed stack touching
2013-06-08 10:43:08 +02:00
wernsaar
ba800f0883
correct GEMM_THREAD in param.h
2013-06-08 10:03:59 +02:00
wernsaar
25491e42f9
New dgemm kernel for BULLDOZER: dgemm_kernel_8x2_bulldozer.S
2013-06-08 09:40:17 +02:00
Zhang Xianyi
960b0c88a7
Refs #227 . Detected LLVM/Clang compiler.
2013-06-06 23:43:40 +08:00
Zhang Xianyi
65ffead0cf
Refs #124 . Check XSAVE flag on x86 CPU.
2013-06-06 22:50:43 +08:00
Zhang Xianyi
f2fb8c7035
Change LIBSUFFIX from .lib to .a on Windows.
2013-06-04 16:05:28 +08:00
Zhang Xianyi
9f59f384d8
Refs #223 . Fixed s/dgemv bug on Windows.
2013-06-04 16:01:05 +08:00