Martin Kroeker
|
6c7b691083
|
Really revert xDOT changes from 1832
neglected to rebase #1892 on merging
|
2018-11-30 21:32:01 +01:00 |
Martin Kroeker
|
5f4c550c27
|
Merge pull request #1892 from martin-frbg/mipsdot
revert MIPS64 xDOT kernel changes from #1832
|
2018-11-30 21:28:21 +01:00 |
Martin Kroeker
|
95a5542e3c
|
Revert DOT kernel changes from #1834
as the failures seen on Loongson3A appear to be limited to DSDOT/SDSDOT (i.e. my hackish "fix" from #1684)
|
2018-11-30 11:16:24 +01:00 |
Martin Kroeker
|
7a2e1bc804
|
Use generic kernel for DSDOT/SDSDOT
as discussed in #1834
|
2018-11-30 10:57:09 +01:00 |
fengruilin
|
43bb386b10
|
fix dot problem on 64bit mips
|
2018-11-15 11:11:59 +08:00 |
fengrl
|
2d8064174c
|
register push/pop command change
64-bit push/pop register commands should be used. Otherwise, data will be lost.
|
2018-10-26 17:55:15 +08:00 |
fengruilin
|
6fc85a6359
|
test_axpy work error on LOONGSON3A platform #1777
|
2018-09-26 15:14:04 +08:00 |
Martin Kroeker
|
4e103c822c
|
typo fix
|
2018-07-16 12:56:39 +02:00 |
Martin Kroeker
|
d2142760e0
|
Fix precision problem in DSDOT
|
2018-07-15 17:11:40 +02:00 |
Martin Kroeker
|
2fbfc64da8
|
Use C kernels for default c/zAXPY, xROT, c/zSWAP
|
2018-07-15 17:09:55 +02:00 |
Shivraj Patil
|
e3d844b062
|
Added mips I6500 core
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
|
2017-09-22 11:57:43 +05:30 |
Shivraj Patil
|
beb1d076a4
|
Added MSA optimization for GEMV_N, GEMV_T, ASUM, DOT functions
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
|
2016-07-15 18:38:25 +05:30 |
Aleksey Kuleshov
|
fca66262c4
|
mips64/axpy: fix error when INCY == 0
|
2016-05-23 13:30:27 +03:00 |
Shivraj Patil
|
2c3dfe2bf3
|
MIPS P5600(32 bit) and I6400(64 bit) cores support added.
Separated mips and mips64 files.
Configurations support for mips 32 bit.
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
|
2016-04-22 14:03:18 +05:30 |
Zhang Xianyi
|
2fb02626da
|
Update organization info.
|
2014-11-25 15:28:58 +08:00 |
Timothy Gu
|
6c2ead30f0
|
Remove all trailing whitespace except lapack-netlib
Signed-off-by: Timothy Gu <timothygu99@gmail.com>
|
2014-06-27 12:05:18 -07:00 |
Wang Qian
|
8e53b57bb2
|
Appending gemmkernel and trmmkernel C code in kernel/generic; this code can be used on a new platform that does not have an optimized assembly kernel.
|
2012-01-10 17:16:13 +00:00 |
Wang Qian
|
66904fc4e8
|
BLAS3 used standard MIPS instructions without extensions on Loongson 3B.
|
2011-11-25 11:20:25 +00:00 |
Xianyi Zhang
|
0884f6b78d
|
Merge branch 'loongson3a' of github.com:xianyi/OpenBLAS into loongson3b
|
2011-11-11 14:26:49 +00:00 |
traz
|
2d78fb05c8
|
Add conjugate condition to gemv.
|
2011-11-10 15:38:48 +00:00 |
Xianyi Zhang
|
b95ad4cfaf
|
Support detecting ICT Loongson-3B CPU.
|
2011-11-09 19:29:50 +00:00 |
traz
|
a32e56500a
|
Fix the compute error of gemv when incx and incy are negative numbers.
|
2011-11-04 19:32:21 +00:00 |
traz
|
c1e618ea2d
|
Add complete gemv function on Loongson3a platform.
|
2011-11-03 13:53:48 +00:00 |
traz
|
e08cfaf9ca
|
Complete all the complex single-precision functions of level3, but the performance needs further improvement.
|
2011-09-16 17:50:40 +00:00 |
traz
|
ee4bb8bd25
|
Add ctrmm part in cgemm_kernel_loongson3a_4x2_ps.S.
|
2011-09-16 16:08:39 +00:00 |
traz
|
7fa3d23dd9
|
Complete cgemm function, but no optimization.
|
2011-09-15 16:08:23 +00:00 |
traz
|
9679dd077e
|
Fix some compute error.
|
2011-09-14 20:00:35 +00:00 |
traz
|
d238a768ab
|
Use ps instructions in cgemm.
|
2011-09-14 15:32:25 +00:00 |
traz
|
74d4cdb81a
|
Fix an illegal instruction for strmm_RTLU.
|
2011-09-02 19:41:06 +00:00 |
traz
|
7906146836
|
Fix an error for strmm_LLTN.
|
2011-09-02 16:57:33 +00:00 |
traz
|
3274ff47b8
|
Fix an error for strmm_LLTN.
|
2011-09-02 16:50:50 +00:00 |
traz
|
a059c553a1
|
Fix a compute error for strmm.
|
2011-09-02 16:00:04 +00:00 |
traz
|
23e182ca7c
|
Fix stack-pointer bug for strmm.
|
2011-09-02 15:28:01 +00:00 |
traz
|
a15bc95824
|
Add strmm part.
|
2011-09-02 09:15:09 +00:00 |
traz
|
09f49fa891
|
Using PS instructions to improve the performance of sgemm and it is 4.2Gflops now.
|
2011-08-31 21:24:03 +00:00 |
traz
|
cb0214787b
|
Modify compile options.
|
2011-08-30 20:57:00 +00:00 |
traz
|
2e8cdd1542
|
Using ps instruction.
|
2011-08-30 20:54:19 +00:00 |
traz
|
c8360e3ae5
|
Complete all the plura single precision functions of level3 on Loongson3a, the performance is 2.3GFlops.
|
2011-07-18 17:03:38 +00:00 |
traz
|
68532fa9ec
|
Merge branch 'loongson3a' of github.com:xianyi/OpenBLAS into loongson3a
|
2011-06-24 09:28:12 +00:00 |
traz
|
708d2b6255
|
Fix compute error in ztrmm.
|
2011-06-24 09:27:41 +00:00 |
traz
|
e72113f06a
|
Add ztrmm and ztrsm part on loongson3a. The average performance is 2.2G.
|
2011-06-23 21:11:00 +00:00 |
traz
|
14f81da375
|
Change prefetch length of A and B, the performance is 2.1G now.
|
2011-06-23 10:46:58 +00:00 |
Xianyi Zhang
|
fc21f7ad28
|
Merge branch 'release-v0.1alpha2' into loongson3a
|
2011-06-23 16:08:23 +08:00 |
traz
|
1c96d345e2
|
Improve zgemm performance from 1G to 1.8G, change block size in param.h.
|
2011-06-21 22:16:23 +00:00 |
Xianyi Zhang
|
c4efde7713
|
Merge branch 'loongson3a' into release-v0.1alpha2
|
2011-06-21 17:50:00 +08:00 |
traz
|
88d94d0ec8
|
Fixed #30 strmm computational error on Loongson3A.
|
2011-05-28 09:48:34 +00:00 |
traz
|
fc84909115
|
Modify single precision compiler conditions, increasing single precision kernel code on Loongson3a.
|
2011-05-27 09:47:17 +00:00 |
traz
|
5ca4e51df0
|
Remove the useless code, modify code comments and format.
|
2011-05-18 10:54:51 +00:00 |
Xianyi Zhang
|
fcb5ce011b
|
Fixed #28. Convert the result to double precision in MIPS64 dsdot_k kernel.
|
2011-05-17 21:24:00 +00:00 |
traz
|
a9320f896e
|
Fixed #25 dtrmm and dtrsm computational error on Loongson3A.
|
2011-05-14 22:00:57 +00:00 |