Commit Graph

1318 Commits

Author SHA1 Message Date
Zhang Xianyi 3326f3152c Merge pull request #213 from wernsaar/develop
Merged some improvements into dgemm_kernel_4x4_bulldozer.S.
2013-04-17 23:56:09 -07:00
wernsaar 7641f6e253 Merged some improvements into dgemm_kernel_4x4_bulldozer.S.
Changed the copy functions to generic to solve prefetch conflicts
2013-04-16 19:05:06 +02:00
Zhang Xianyi 48bdc1ad3b Added NO_PARALLEL_MAKE flag to disable parallel make. 2013-04-15 21:37:30 +08:00
Zhang Xianyi 3ad29452d1 Merge pull request #211 from wernsaar/develop
New version of dgemm_kernel_4x4_bulldozer.S
2013-04-15 00:20:55 -07:00
wernsaar 6e3f6f25a5 New version of dgemm_kernel_4x4_bulldozer.S
The peak performance with 8 cores is now 90 GFlops
2013-04-12 17:55:51 +02:00
Zhang Xianyi 990efcab6e Merge branch 'loongson3b' into loongson3a 2013-04-11 16:11:03 +00:00
Zhang Xianyi 75a5dc3975 Added the configure for the host loongcc compiling on Loongson3. 2013-04-11 16:10:47 +00:00
Xianyi Zhang 986d542acb Merge branch 'loongson3a' into loongson3b 2013-04-11 16:07:59 +08:00
Xianyi Zhang 6958c1a1aa Fixed the SEGFAULT bug with Loongcc and Loongson3. 2013-04-11 15:33:43 +08:00
Zhang Xianyi a068d54981 Refs #209. Export the missing cblas_cdotc_sub functions. 2013-04-08 23:21:28 +08:00
Xianyi Zhang d692ee07f7 Merge branch 'loongson3a' into loongson3b 2013-04-08 14:56:39 +08:00
Xianyi Zhang 1a57717b1a Added the configuration of Loongcc compiler for Loongson 3 CPU. 2013-04-07 15:42:07 +08:00
Xianyi Zhang 6b01d58712 Disable the optimization of multi-threading gemm on the Loongson3A. 2013-03-30 20:12:43 +00:00
Xianyi Zhang 35b943f17f Merge branch 'develop' into loongson3a 2013-03-27 14:36:15 +00:00
Zhang Xianyi e029242870 Merge pull request #206 from wlbksy/patch-1
Fix #204 wget in mingw/msys sometimes download file with trailing name,
2013-03-23 09:57:41 -07:00
wlbksy 7a9b94b519 Fix #204 2013-03-23 14:41:26 +08:00
Kenneth Hoste 66b919d99f adjusted Makefile to allow for provided required LAPACK source files rather than downloading them 2013-03-22 19:45:11 +01:00
Zhang Xianyi f4846afbad Merge pull request #201 from Explorer09/develop 2013-03-18 07:31:30 -07:00
Explorer09 53588bc786 getarch.c: Minor re-ordering of architecture list 2013-03-17 23:09:23 +08:00
Explorer09 b47f13ee4c getarch.c: Minor re-ordering of architecture list 2013-03-17 23:07:48 +08:00
Explorer09 309f90e563 TargetList.txt: minor re-ordering 2013-03-17 23:03:05 +08:00
Explorer09 773c01f496 Typo correction in README.md 2013-03-17 22:48:24 +08:00
Zhang Xianyi d831b2ff8b Override CFLAGS in LAPACK make.in. 2013-03-10 01:01:16 +08:00
Zhang Xianyi 724ae159ce Fixed the Windows x86_64 ABI bug in s/daxpy kernels. 2013-03-08 22:28:34 +08:00
Zhang Xianyi 2c9a203bd1 Merge pull request #198 from wernsaar/develop
new optimization of dgemm kernel for bulldozer: 10% performance increase
2013-03-06 13:39:53 -08:00
wernsaar f300ce3df5 new optimization of dgemm kernel for bulldozer: 10% performance increase 2013-03-06 17:26:03 +01:00
Zhang Xianyi e2c7c75715 Merge pull request #197 from wernsaar/develop
optimized again bulldozer dgemm kernel
2013-03-06 01:11:08 -08:00
wernsaar 66e64131ed optimized again bulldozer dgemm kernel 2013-03-05 19:51:37 +01:00
Zhang Xianyi 5900b1462e Merge pull request #195 from wernsaar/develop
Develop dgemm for bulldozer
2013-03-05 05:35:42 -08:00
wernsaar 9405f26f4b new dgemm_kernel for bulldozer 2013-03-04 17:37:38 +01:00
Zhang Xianyi 54e7b37630 Merge branch 'develop' 2013-03-02 14:42:06 +08:00
Zhang Xianyi 529f1b5006 Refs #194. Export the missing LAPACK s/dlamc3 functions. 2013-03-02 14:41:18 +08:00
Zhang Xianyi e5ac3007e0 Merge branch 'develop' 2013-03-02 14:24:23 +08:00
Zhang Xianyi 0d0405b434 Updated the doc for 0.2.6 version. 2013-03-02 14:22:27 +08:00
Zhang Xianyi f1ce74ffdd Improved the message printed when the OS doesn't support AVX. 2013-03-02 14:15:54 +08:00
Zhang Xianyi d744c9590a In OpenMP threading, preallocate the thread buffer instead of allocating the buffer every time. This patch improved the performance slightly. 2013-03-01 14:36:47 +08:00
Zhang Xianyi 3cc6ae793e Refs #174. Return sb pointer when OpenMP or Windows. 2013-02-26 00:48:21 +08:00
Zhang Xianyi 4c2123c334 Fixed the overflowing bug in single thread cholesky factorization. 2013-02-23 13:00:52 +08:00
Zhang Xianyi 5155e3f509 Refs #174. Fixed the overflowing buffer bug of multithreading hbmv and sbmv.
Instead of using thread 0 buffer, each thread uses its own sb buffer.
Thus, it can avoid overflowing thread 0 buffer.
2013-02-13 16:05:58 +08:00
Zhang Xianyi 5c8bf6ae0e Merge branch 'bulldozer' into develop 2013-02-10 01:19:42 +08:00
Zhang Xianyi 6ae2f868fd Set the affinity. Only use 1 core of each module on bulldozer. 2013-02-09 18:19:02 +01:00
Zhang Xianyi a1ead62f28 Disable the warning of sgemm bulldozer kernel. 2013-02-09 17:03:13 +01:00
Zhang Xianyi 0133580148 Used sgemm bulldozer kernel on 64 bit. 2013-02-09 16:29:14 +01:00
Zhang Xianyi 274246651d Merge branch 'bulldozer' of git://github.com/wernsaar/OpenBLAS into bulldozer 2013-02-09 16:25:07 +01:00
Zhang Xianyi 299b5a44dc Merge branch 'develop' of github.com:xianyi/OpenBLAS into bulldozer 2013-02-09 16:22:04 +01:00
Zaheer Chothia a9500d0079 Missing line continuation -- follow-up to last commit (64ad8b9809). 2013-02-01 09:34:12 +01:00
Zaheer Chothia 64ad8b9809 Refs #193. Don't use C99 complex numbers when building C++ code. 2013-02-01 09:24:44 +01:00
Zaheer Chothia 875d520ccf Refs #193. cblas: move #include out of extern "C" block.
Standard headers may contain C++ templates which are not permitted inside an
extern "C" block. This might be the case when we include <complex.h>.
2013-01-31 08:48:27 +01:00
Zhang Xianyi d311236dfd Refs #189. Fixed the bug of s/cdot about invalid reading NAN on x86_64. 2013-01-25 20:56:14 +08:00
Zhang Xianyi 36e0982966 Refs #187. Use perl to generate cblas_noconst.h instead of sed.
Thanks to Dan Povey for the patch. https://github.com/xianyi/OpenBLAS/issues/187
2013-01-22 00:29:54 +08:00