645 Commits

Author SHA1 Message Date
wernsaar d854b30ae6 Added UNROLL values for 3M to getarch_2nd.c, Makefile.system and Makefile.L3 2013-06-09 17:26:42 +02:00
wernsaar d65bbec99b added new sgemm kernel for BULLDOZER 2013-06-09 15:57:42 +02:00
wernsaar e4c39c7c26 changed stack touching 2013-06-08 10:43:08 +02:00
wernsaar ba800f0883 correct GEMM_THREAD in param.h 2013-06-08 10:03:59 +02:00
wernsaar 25491e42f9 New dgemm kernel for BULLDOZER: dgemm_kernel_8x2_bulldozer.S 2013-06-08 09:40:17 +02:00
Zhang Xianyi 960b0c88a7 Refs #227. Detected LLVM/Clang compiler. 2013-06-06 23:43:40 +08:00
Zhang Xianyi 65ffead0cf Refs #124. Check XSAVE flag on x86 CPU. 2013-06-06 22:50:43 +08:00
Zhang Xianyi f2fb8c7035 Change LIBSUFFIX from .lib to .a on windows. 2013-06-04 16:05:28 +08:00
Zhang Xianyi 9f59f384d8 Refs #223. Fixed s/dgemv bug on windows. 2013-06-04 16:01:05 +08:00
wangqian 23965f164c Fixed overflow internal buffer bug of (s/d/c/z)gemv on x86_64. 2013-05-29 19:48:31 +08:00
wangqian 6a72840945 Fixed overflow internal buffer bug of (s/d/c/z)gemv on x86. 2013-05-29 13:23:12 +08:00
Zhang Xianyi 947457fb7c Fixed the bug about testing the exist of lapack tar package. 2013-05-24 15:52:35 +08:00
Zhang Xianyi 79120bf9a0 Refs #205. Merge boegel's codes about downloading LAPACK. 2013-05-24 15:29:10 +08:00
Zhang Xianyi acb11905d5 Fixed #199. Saved USE_THREAD switch for make install. 2013-05-24 15:15:52 +08:00
Zhang Xianyi 109500178c Refs #220. Support Power7 by old Power6 kernels. 2013-05-21 22:59:45 +08:00
Zhang Xianyi e50a664865 Refs #215. Fixed the compatible between <complex.h> and <complex> in C++. 2013-05-17 16:41:05 +08:00
Zhang Xianyi 357078b93e Refs #216. Revert the default value of GEMM_MULTITHREAD_THRESHOLD to 4. 2013-05-03 09:08:54 +08:00
wernsaar 731220f870 changed DGEMM_DEFAULT_P and DGEMM_DEFAULT_Q to 248 for BULLDOZER 64bit 2013-04-30 10:07:17 +02:00
wernsaar 69aa6c8fb1 bad performance with some data 2013-04-28 11:14:23 +02:00
wernsaar 60b263f3d2 removed trsm_kernel_RT_4x4_bulldozer.S. wrong results 2013-04-27 17:23:08 +02:00
wernsaar 7ac306e0da added trsm_kernel_RT_4x4_bulldozer.S 2013-04-27 16:48:48 +02:00
wernsaar 4cb454cdf2 added trsm_kernel_LT_4x4_bulldozer.S 2013-04-27 14:30:00 +02:00
wernsaar 19ad2fb128 prefetch improved. Defined 2 different kernels for inner loop 2013-04-27 13:40:49 +02:00
Zhang Xianyi 5d96e4f224 Refs #210. Disable checking /lib/libpthread.so*. 2013-04-27 15:02:04 +08:00
wernsaar 6821677489 minor improvements and code cleanup 2013-04-26 20:05:42 +02:00
Xianyi Zhang dbbda55e67 Updated the mailing list for OpenBLAS. 2013-04-25 00:45:42 +08:00
Xianyi Zhang 6c34a7f43c Updated the mailing list for OpenBLAS. 2013-04-25 00:44:22 +08:00
Zhang Xianyi 3326f3152c Merge pull request #213 from wernsaar/develop 2013-04-17 23:56:09 -07:00
    Merged some improvements into dgemm_kernel_4x4_bulldozer.S.
wernsaar 7641f6e253 Merged some improvements into dgemm_kernel_4x4_bulldozer.S. 2013-04-16 19:05:06 +02:00
    Changed the copy functions to generic to solve prefetch conflicts
Zhang Xianyi 48bdc1ad3b Added NO_PARALLEL_MAKE flag to disable parallel make. 2013-04-15 21:37:30 +08:00
Zhang Xianyi 3ad29452d1 Merge pull request #211 from wernsaar/develop 2013-04-15 00:20:55 -07:00
    New version of dgemm_kernel_4x4_bulldozer.S
wernsaar 6e3f6f25a5 New version of dgemm_kernel_4x4_bulldozer.S 2013-04-12 17:55:51 +02:00
    The peak performance with 8 cores is now 90 GFlops
Zhang Xianyi 990efcab6e Merge branch 'loongson3b' into loongson3a 2013-04-11 16:11:03 +00:00
Zhang Xianyi 75a5dc3975 Added the configure for the host loongcc compiling on Loongson3. 2013-04-11 16:10:47 +00:00
Xianyi Zhang 986d542acb Merge branch 'loongson3a' into loongson3b 2013-04-11 16:07:59 +08:00
Xianyi Zhang 6958c1a1aa Fixed the SEGFAULT bug with Loongcc and Loongson3. 2013-04-11 15:33:43 +08:00
Zhang Xianyi a068d54981 Refs #209. Export the missing cblas_cdotc_sub functions. 2013-04-08 23:21:28 +08:00
Xianyi Zhang d692ee07f7 Merge branch 'loongson3a' into loongson3b 2013-04-08 14:56:39 +08:00
Xianyi Zhang 1a57717b1a Added the configuration of Loongcc compiler for Loongson 3 CPU. 2013-04-07 15:42:07 +08:00
Xianyi Zhang 6b01d58712 Disable the optimization of muli-threading gemm on the Loongson3A. 2013-03-30 20:12:43 +00:00
Xianyi Zhang 35b943f17f Merge branch 'develop' into loongson3a 2013-03-27 14:36:15 +00:00
Zhang Xianyi e029242870 Merge pull request #206 from wlbksy/patch-1 2013-03-23 09:57:41 -07:00
    Fix #204 wget in mingw/msys sometimes download file with trailing name,
wlbksy 7a9b94b519 Fix #204 2013-03-23 14:41:26 +08:00
Kenneth Hoste 66b919d99f adjusted Makefile to allow for provided required LAPACK source files rather than downloading them 2013-03-22 19:45:11 +01:00
Zhang Xianyi f4846afbad Merge pull request #201 from Explorer09/develop 2013-03-18 07:31:30 -07:00
Explorer09 53588bc786 getarch.c: Minor re-ordering of architecture list 2013-03-17 23:09:23 +08:00
Explorer09 b47f13ee4c getarch.c: Minor re-ordering of architecture list 2013-03-17 23:07:48 +08:00
Explorer09 309f90e563 TargetList.txt: minor re-ordering 2013-03-17 23:03:05 +08:00
Explorer09 773c01f496 Typo correction in README.md 2013-03-17 22:48:24 +08:00
Zhang Xianyi d831b2ff8b Override CFLAGS in LAPACK make.in. 2013-03-10 01:01:16 +08:00