wernsaar
|
d854b30ae6
|
Added UNROLL values for 3M to getarch_2nd.c, Makefile.system and Makefile.L3
|
2013-06-09 17:26:42 +02:00 |
wernsaar
|
d65bbec99b
|
added new sgemm kernel for BULLDOZER
|
2013-06-09 15:57:42 +02:00 |
wernsaar
|
e4c39c7c26
|
changed stack touching
|
2013-06-08 10:43:08 +02:00 |
wernsaar
|
ba800f0883
|
correct GEMM_THREAD in param.h
|
2013-06-08 10:03:59 +02:00 |
wernsaar
|
25491e42f9
|
New dgemm kernel for BULLDOZER: dgemm_kernel_8x2_bulldozer.S
|
2013-06-08 09:40:17 +02:00 |
Zhang Xianyi
|
960b0c88a7
|
Refs #227. Detected LLVM/Clang compiler.
|
2013-06-06 23:43:40 +08:00 |
Zhang Xianyi
|
65ffead0cf
|
Refs #124. Check XSAVE flag on x86 CPU.
|
2013-06-06 22:50:43 +08:00 |
Zhang Xianyi
|
f2fb8c7035
|
Change LIBSUFFIX from .lib to .a on windows.
|
2013-06-04 16:05:28 +08:00 |
Zhang Xianyi
|
9f59f384d8
|
Refs #223. Fixed s/dgemv bug on windows.
|
2013-06-04 16:01:05 +08:00 |
wangqian
|
23965f164c
|
Fixed overflow internal buffer bug of (s/d/c/z)gemv on x86_64.
|
2013-05-29 19:48:31 +08:00 |
wangqian
|
6a72840945
|
Fixed overflow internal buffer bug of (s/d/c/z)gemv on x86.
|
2013-05-29 13:23:12 +08:00 |
Zhang Xianyi
|
947457fb7c
|
Fixed the bug in testing for the existence of the LAPACK tar package.
|
2013-05-24 15:52:35 +08:00 |
Zhang Xianyi
|
79120bf9a0
|
Refs #205. Merge boegel's codes about downloading LAPACK.
|
2013-05-24 15:29:10 +08:00 |
Zhang Xianyi
|
acb11905d5
|
Fixed #199. Saved USE_THREAD switch for make install.
|
2013-05-24 15:15:52 +08:00 |
Zhang Xianyi
|
109500178c
|
Refs #220. Support Power7 by old Power6 kernels.
|
2013-05-21 22:59:45 +08:00 |
Zhang Xianyi
|
e50a664865
|
Refs #215. Fixed the compatibility between <complex.h> and <complex> in C++.
|
2013-05-17 16:41:05 +08:00 |
Zhang Xianyi
|
357078b93e
|
Refs #216. Revert the default value of GEMM_MULTITHREAD_THRESHOLD to 4.
|
2013-05-03 09:08:54 +08:00 |
wernsaar
|
731220f870
|
changed DGEMM_DEFAULT_P and DGEMM_DEFAULT_Q to 248 for BULLDOZER 64bit
|
2013-04-30 10:07:17 +02:00 |
wernsaar
|
69aa6c8fb1
|
bad performance with some data
|
2013-04-28 11:14:23 +02:00 |
wernsaar
|
60b263f3d2
|
removed trsm_kernel_RT_4x4_bulldozer.S. wrong results
|
2013-04-27 17:23:08 +02:00 |
wernsaar
|
7ac306e0da
|
added trsm_kernel_RT_4x4_bulldozer.S
|
2013-04-27 16:48:48 +02:00 |
wernsaar
|
4cb454cdf2
|
added trsm_kernel_LT_4x4_bulldozer.S
|
2013-04-27 14:30:00 +02:00 |
wernsaar
|
19ad2fb128
|
prefetch improved. Defined 2 different kernels for inner loop
|
2013-04-27 13:40:49 +02:00 |
Zhang Xianyi
|
5d96e4f224
|
Refs #210. Disable checking /lib/libpthread.so*.
|
2013-04-27 15:02:04 +08:00 |
wernsaar
|
6821677489
|
minor improvements and code cleanup
|
2013-04-26 20:05:42 +02:00 |
Xianyi Zhang
|
dbbda55e67
|
Updated the mailing list for OpenBLAS.
|
2013-04-25 00:45:42 +08:00 |
Xianyi Zhang
|
6c34a7f43c
|
Updated the mailing list for OpenBLAS.
|
2013-04-25 00:44:22 +08:00 |
Zhang Xianyi
|
3326f3152c
|
Merge pull request #213 from wernsaar/develop
Merged some improvements into dgemm_kernel_4x4_bulldozer.S.
|
2013-04-17 23:56:09 -07:00 |
wernsaar
|
7641f6e253
|
Merged some improvements into dgemm_kernel_4x4_bulldozer.S.
Changed the copy functions to generic to solve prefetch conflicts
|
2013-04-16 19:05:06 +02:00 |
Zhang Xianyi
|
48bdc1ad3b
|
Added NO_PARALLEL_MAKE flag to disable parallel make.
|
2013-04-15 21:37:30 +08:00 |
Zhang Xianyi
|
3ad29452d1
|
Merge pull request #211 from wernsaar/develop
New version of dgemm_kernel_4x4_bulldozer.S
|
2013-04-15 00:20:55 -07:00 |
wernsaar
|
6e3f6f25a5
|
New version of dgemm_kernel_4x4_bulldozer.S
The peak performance with 8 cores is now 90 GFlops
|
2013-04-12 17:55:51 +02:00 |
Zhang Xianyi
|
990efcab6e
|
Merge branch 'loongson3b' into loongson3a
|
2013-04-11 16:11:03 +00:00 |
Zhang Xianyi
|
75a5dc3975
|
Added the configure for the host loongcc compiling on Loongson3.
|
2013-04-11 16:10:47 +00:00 |
Xianyi Zhang
|
986d542acb
|
Merge branch 'loongson3a' into loongson3b
|
2013-04-11 16:07:59 +08:00 |
Xianyi Zhang
|
6958c1a1aa
|
Fixed the SEGFAULT bug with Loongcc and Loongson3.
|
2013-04-11 15:33:43 +08:00 |
Zhang Xianyi
|
a068d54981
|
Refs #209. Export the missing cblas_cdotc_sub functions.
|
2013-04-08 23:21:28 +08:00 |
Xianyi Zhang
|
d692ee07f7
|
Merge branch 'loongson3a' into loongson3b
|
2013-04-08 14:56:39 +08:00 |
Xianyi Zhang
|
1a57717b1a
|
Added the configuration of Loongcc compiler for Loongson 3 CPU.
|
2013-04-07 15:42:07 +08:00 |
Xianyi Zhang
|
6b01d58712
|
Disable the optimization of multi-threading gemm on the Loongson3A.
|
2013-03-30 20:12:43 +00:00 |
Xianyi Zhang
|
35b943f17f
|
Merge branch 'develop' into loongson3a
|
2013-03-27 14:36:15 +00:00 |
Zhang Xianyi
|
e029242870
|
Merge pull request #206 from wlbksy/patch-1
Fix #204 wget in mingw/msys sometimes download file with trailing name,
|
2013-03-23 09:57:41 -07:00 |
wlbksy
|
7a9b94b519
|
Fix #204
|
2013-03-23 14:41:26 +08:00 |
Kenneth Hoste
|
66b919d99f
|
adjusted Makefile to allow providing the required LAPACK source files rather than downloading them
|
2013-03-22 19:45:11 +01:00 |
Zhang Xianyi
|
f4846afbad
|
Merge pull request #201 from Explorer09/develop
|
2013-03-18 07:31:30 -07:00 |
Explorer09
|
53588bc786
|
getarch.c: Minor re-ordering of architecture list
|
2013-03-17 23:09:23 +08:00 |
Explorer09
|
b47f13ee4c
|
getarch.c: Minor re-ordering of architecture list
|
2013-03-17 23:07:48 +08:00 |
Explorer09
|
309f90e563
|
TargetList.txt: minor re-ordering
|
2013-03-17 23:03:05 +08:00 |
Explorer09
|
773c01f496
|
Typo correction in README.md
|
2013-03-17 22:48:24 +08:00 |
Zhang Xianyi
|
d831b2ff8b
|
Override CFLAGS in LAPACK make.in.
|
2013-03-10 01:01:16 +08:00 |