Commit Graph

100 Commits

Author SHA1 Message Date
traz c8360e3ae5 Complete all the complex single precision functions of level3 on Loongson3a; the performance is 2.3 GFlops. 2011-07-18 17:03:38 +00:00
traz 68532fa9ec Merge branch 'loongson3a' of github.com:xianyi/OpenBLAS into loongson3a 2011-06-24 09:28:12 +00:00
traz 708d2b6255 Fix compute error in ztrmm. 2011-06-24 09:27:41 +00:00
traz e72113f06a Add ztrmm and ztrsm part on loongson3a. The average performance is 2.2G. 2011-06-23 21:11:00 +00:00
traz 14f81da375 Change prefetch length of A and B, the performance is 2.1G now. 2011-06-23 10:46:58 +00:00
Xianyi Zhang fc21f7ad28 Merge branch 'release-v0.1alpha2' into loongson3a 2011-06-23 16:08:23 +08:00
traz 1c96d345e2 Improve zgemm performance from 1G to 1.8G, change block size in param.h. 2011-06-21 22:16:23 +00:00
Xianyi Zhang c4efde7713 Merge branch 'loongson3a' into release-v0.1alpha2 2011-06-21 17:50:00 +08:00
Xianyi Zhang 32353a9d30 Refs #20. Fixed the installation bug with DYNAMIC_ARCH=1. 2011-06-21 17:39:08 +08:00
Xianyi Zhang b3d1887745 Fixed #35 a build bug with NO_LAPACK=1 DYNAMIC_ARCH=1 FC=gfortran. I forgot to test it with gfortran in last bug fixed commit. 2011-06-09 22:59:49 +08:00
Xianyi Zhang 8d50a9fd1a Fixed #35 a build bug with NO_LAPACK=1 & DYNAMIC_ARCH=1. 2011-06-09 11:38:59 +08:00
Wang Qian 4335bca2f7 Fixed #33 ztrmm bug on Nehalem. 2011-06-07 12:53:25 +08:00
Xianyi 31040e4d80 Fixed #32 a SEGFAULT bug with gcc-4.6. According to the i386 calling convention, the called function should remove the hidden return value address from the stack. 2011-06-03 13:19:54 +08:00
traz 88d94d0ec8 Fixed #30 strmm computational error on Loongson3A. 2011-05-28 09:48:34 +00:00
traz fc84909115 Modify single precision compiler conditions, increasing single precision kernel code on Loongson3a. 2011-05-27 09:47:17 +00:00
traz 5ca4e51df0 Remove the useless code, modify code comments and format. 2011-05-18 10:54:51 +00:00
Xianyi Zhang fcb5ce011b Fixed #28. Convert the result to double precision in MIPS64 dsdot_k kernel. 2011-05-17 21:24:00 +00:00
traz a9320f896e Fixed #25 dtrmm and dtrsm computational error on Loongson3A. 2011-05-14 22:00:57 +00:00
Xianyi Zhang b206fc7075 Fixed #28. Convert the result to double precision in the end of dsdot kernel. 2011-05-13 02:34:30 +08:00
traz 29dce62b8f Finish dtrsm_kernel_Rx.S on Loongson3A. 2011-05-11 10:44:23 +00:00
traz 432c309f63 Finish dtrsm_kernel_Lx.S on Loongson3A. 2011-05-10 12:48:43 +00:00
traz d2f351d819 Modify dtrsm compiler options 2011-05-09 17:31:58 +00:00
traz 5a991b7149 Fixed #24 dtrmm error on Loongson3A 2011-05-09 17:28:20 +00:00
traz 9320933520 Complete dtrmm function. 2011-04-17 20:26:49 +00:00
traz 921caefa56 Added handling for the trmm part, no edge handling. Test size (M and N) must be a multiple of 4. 2011-04-15 21:56:25 +00:00
traz ecd4c1f3d9 Modify prefetching C. 2011-04-11 22:46:36 +00:00
traz ab9e4ce351 Adjust kc size from 112 to 116. 2011-04-11 22:17:57 +00:00
traz 782205a693 Add dgemm compiler Options in KERNEL.LOONGSON3A. 2011-04-06 10:38:34 +00:00
traz ac494c0d04 New kernel in LOONGSON3A. 2011-04-06 10:36:44 +00:00
Xianyi Zhang f405b5bcc5 Fixed the bug about Loongson3A gsLQC1 & gsSQC1 instructions in daxpy kernel. Now daxpy is correct. 2011-03-18 23:05:56 +00:00
Wang Qian d5cffd506a Modified the default kernel makefile in MIPS64 arch. 2011-03-07 11:23:12 +00:00
Xianyi Zhang 5838f12995 Support unaligned addresses in daxpy on loongson3a simd. 2011-03-05 10:17:10 +08:00
Xianyi Zhang 5444a3f8f7 Unroll to 16 in daxpy on loongson3a. 2011-03-04 17:50:17 +08:00
Xianyi Zhang 88cbfcc5b5 Merge commit 'origin/x86' into loongson3a 2011-03-04 14:11:52 +00:00
Xianyi Zhang ce78abe37e Merge branch 'x86' of github.com:xianyi/OpenBLAS into x86 2011-03-04 11:53:04 +08:00
Xianyi Zhang 8f1090d32a Support NO_LAPACK=1 to build the lib without LAPACK functions. 2011-03-04 11:51:32 +08:00
Xianyi 272f62a2b6 Changed movlps macro name in capital in x86/zdot_sse2.S file. 2011-03-03 00:46:39 +08:00
Xianyi 36016fe349 On x86 32bits, gcc 4.4.3 generated wrong code (movsd) from movlps in zdot_sse2.S line 191. This would cause zdotu & zdotc failures. Instead, use movlpd to work around it. Fixed #8. Fixed #9. 2011-03-02 18:45:43 +08:00
Xianyi Zhang 6eb02bbb9c Merge remote branch 'origin/x86' into loongson3a 2011-03-02 13:52:05 +08:00
Xianyi 12214e1d0f Fixed #7. Modified axpy kernel codes to avoid unloop with incx==0 or incy==0 in x86 32bits arch. 2011-02-23 20:08:34 +08:00
Xianyi Zhang 0cfd29a819 Fixed #7. 1)Disable the multi-thread and 2) Modified kernel codes to avoid unloop in axpy function when incx==0 or incy==0. 2011-02-21 00:24:21 +08:00
Xianyi bfaa80c316 fixed #4 csrot & drot returned the wrong result when incx==incy==0 on i686 arch. 2011-02-18 03:00:58 +08:00
Xianyi Zhang c5852d4e30 fixed #4 csrot returned the wrong result when incx==incy==0. 2011-02-16 23:39:43 +08:00
Xianyi Zhang 84ba64e65b fixed a bug in drot when incx or incy equals zero. 2011-02-16 23:35:41 +08:00
Xianyi Zhang 1e671b49f3 Did the experiment with Loongson 3A 128bit load & store instruction. 2011-01-29 03:05:27 +08:00
Xianyi Zhang 77b7020d69 changed prefetch order. 2011-01-29 03:03:34 +08:00
Xianyi Zhang e003b811ab load x & y contiguously in axpy. 2011-01-28 11:18:50 +08:00
Xianyi Zhang ebe2da8474 Modified aligned size. Added an additional prefetch instruction because the cache line is 32 bytes in Loongson 3A. 2011-01-27 23:07:06 +08:00
Xianyi Zhang c0b5992fab added axpy kernel with prefetch for Loongson3A. To-Do: tuning prefetch distance & instruction order. 2011-01-26 22:34:33 +08:00
Xianyi Zhang 342bbc3871 Import GotoBLAS2 1.13 BSD version codes. 2011-01-24 14:54:24 +00:00