Werner Saar
|
cd5241d0cf
|
modified KERNEL for power, to use the generic DSDOT-KERNEL
|
2016-03-06 09:07:24 +01:00 |
Werner Saar
|
8d652f11e7
|
updated smallscaling.c to build without C99 or C11
increased the threshold value of nep.in to 40
|
2016-03-06 08:40:51 +01:00 |
Zhang Xianyi
|
6c86570e1f
|
Merge pull request #790 from jeromerobert/bug786
ztrmv_L.c: no longer need a 4kB buffer
|
2016-03-05 15:25:27 -05:00 |
Jerome Robert
|
53ba1a77c8
|
ztrmv_L.c: no longer need a 4kB buffer
Fix #786
|
2016-03-05 19:07:03 +01:00 |
Zhang Xianyi
|
d23c7c713c
|
Fixed #789 Fix utest/ctest.h on Mingw.
|
2016-03-05 09:34:37 -05:00 |
Zhang Xianyi
|
8c43d7fa5f
|
Merge remote-tracking branch 'origin/power8' into develop
Refs #774
|
2016-03-05 06:03:19 -05:00 |
Werner Saar
|
085f215257
|
Modified assembly label names so that they are hidden.
Added license information.
|
2016-03-05 10:27:27 +01:00 |
Zhang Xianyi
|
8f758eeff9
|
Refs #786. avoid old assembly c/zgemv kernels.
|
2016-03-05 08:32:03 +08:00 |
Werner Saar
|
0afc76fd65
|
enabled gemm_beta assembly kernels
|
2016-03-04 15:01:15 +01:00 |
Werner Saar
|
91e1c5080c
|
modified configuration, to use power6 sgemm kernel for power8
|
2016-03-04 13:38:57 +01:00 |
Werner Saar
|
73f04c2c72
|
enabled hemv assembly function for power8
|
2016-03-04 13:20:50 +01:00 |
Werner Saar
|
3e633152c6
|
enabled symv assembly kernels on power8
|
2016-03-04 13:08:18 +01:00 |
Werner Saar
|
d5130ce7e3
|
enabled gemv assembly on power8
|
2016-03-04 12:53:31 +01:00 |
Werner Saar
|
4824b88fcb
|
enabled all level1 assembly kernels for power8
|
2016-03-04 12:35:25 +01:00 |
Werner Saar
|
cc26d888b8
|
BUGFIX: increased BUFFER_SIZE for POWER8
|
2016-03-04 10:26:53 +01:00 |
Zhang Xianyi
|
8577be2a95
|
Modify travis script.
|
2016-03-04 04:24:43 +08:00 |
Zhang Xianyi
|
1edf30b790
|
Change Opteron(SSE3) to Opteron_SSE3 in dynamic core name.
|
2016-03-01 20:13:08 +08:00 |
Werner Saar
|
b752858d6c
|
added dgemm-, dtrmm-, zgemm- and ztrmm-kernel for power8
|
2016-03-01 07:33:56 +01:00 |
Zhang Xianyi
|
4fc8c937d4
|
Refs #695 add testcase.
|
2016-03-01 01:05:56 -05:00 |
Zhang Xianyi
|
efa4f5c936
|
Refs #695 #783. Replace default x86_64 cgemv_t
asm kernel by C kernel.
|
2016-03-01 11:18:56 +08:00 |
Zhang Xianyi
|
17d655fa64
|
Merge pull request #784 from peterph/develop
collected usage notes
|
2016-02-27 11:24:20 -05:00 |
Petr Cerny
|
f68141cf1d
|
collected usage notes
|
2016-02-27 16:57:22 +01:00 |
Zhang Xianyi
|
aa90518201
|
Update Changelog for 0.2.16.rc1.
|
2016-02-24 15:21:22 -05:00 |
Zhang Xianyi
|
6b85dbb6dc
|
Refs #696. Turn off stack limit setting on Linux.
I cannot reproduce SEGFAULT of lapack-test with default stack size
on ARM Linux.
|
2016-02-24 14:21:42 -05:00 |
Zhang Xianyi
|
a0debd4293
|
Refs #696. Turn off stack limit setting on Linux.
I cannot reproduce SEGFAULT of lapack-test with default stack size
on ARM Linux.
|
2016-02-24 14:18:39 -05:00 |
Zhang Xianyi
|
937493bfeb
|
Release 0.2.16 rc1
|
2016-02-23 18:29:21 -05:00 |
Zhang Xianyi
|
74b0672223
|
Fix c/zaxpyc kernel bug on Cortex-A57.
|
2016-02-23 22:47:53 +00:00 |
Zhang Xianyi
|
6e7be06e07
|
Refs JuliaLang/julia#5728. Fix gemv performance bug on Haswell Mac OSX.
On Mac OS X, it should use .align 4 (equal to .align 16 on Linux).
I didn't get the performance benefit from .align. Thus, I deleted it.
|
2016-02-19 17:56:07 -05:00 |
Zhang Xianyi
|
a04d0555ba
|
[av skip] Fix utest makefile bug on travis ci.
|
2016-02-20 00:21:43 +08:00 |
Zhang Xianyi
|
3761c30ba4
|
Fix makefile bug for utest.
|
2016-02-18 17:01:48 -05:00 |
Zhang Xianyi
|
38593cd3a3
|
Fix compiling bug on ARM Cortex-A57.
|
2016-02-13 15:38:52 +00:00 |
Zhang Xianyi
|
e3b7781c2b
|
Update readme.
|
2016-02-13 00:33:53 +08:00 |
Zhang Xianyi
|
5e6965ea47
|
Run utest when building.
|
2016-02-13 00:33:31 +08:00 |
Zhang Xianyi
|
5cc0301fc3
|
Enable utest for appveyor.
|
2016-02-12 01:50:20 -05:00 |
Zhang Xianyi
|
19a6dedfd6
|
Add utest for CMake.
|
2016-02-12 05:38:13 +08:00 |
Zhang Xianyi
|
0e2b92e216
|
Added missing lapacke files for CMake.
|
2016-02-12 05:28:16 +08:00 |
Zhang Xianyi
|
d06b92906a
|
Add gemm3m building for CMake.
|
2016-02-12 05:02:51 +08:00 |
Zhang Xianyi
|
8e98478ff3
|
Update ctest.h from github.com:xianyi/ctest.git.
|
2016-02-12 05:01:57 +08:00 |
Zhang Xianyi
|
fb8968fb83
|
Refs #707. Bugfix for previous commit.
|
2016-02-11 05:14:53 +08:00 |
Zhang Xianyi
|
dae6b82a71
|
Refs #707. Add BUILD_LAPACK_DEPRECATED flag in Makefile.rule.
To build the LAPACK functions deprecated since LAPACK 3.6.0, run:
make BUILD_LAPACK_DEPRECATED=1
|
2016-02-11 04:22:53 +08:00 |
Zhang Xianyi
|
d73244b825
|
Refs #727. Align stack buffer address on 32-bytes.
|
2016-02-11 03:52:02 +08:00 |
Zhang Xianyi
|
233c6b959f
|
Merge pull request #780 from jeromerobert/bug727
Bug727
|
2016-02-08 13:24:40 -05:00 |
Jerome Robert
|
16ec5323c9
|
Fix zgemv.c compilation when stack allocation is disabled
|
2016-02-08 12:05:02 +01:00 |
Jerome Robert
|
0ad02ef2d6
|
update CONTRIBUTORS.md
|
2016-02-08 11:26:51 +01:00 |
Jerome Robert
|
73397faf68
|
Add benchmark/smallscaling.c
* Bench small matrices with multi-threading
* Close #727
|
2016-02-08 11:25:27 +01:00 |
Jerome Robert
|
5fc2203d8a
|
zgemv: Add a workaround for #746
|
2016-02-08 11:25:15 +01:00 |
Jerome Robert
|
78dcf5c3d5
|
Improve performance of ztrmv on small matrices
* Use stack allocation
* Disable multi-threading
* Ref #727
|
2016-02-08 11:25:02 +01:00 |
Jerome Robert
|
32f793195f
|
Use stack allocation in zgemv and zger
For better performance with small matrices
Ref #727
|
2016-02-08 11:24:21 +01:00 |
Zhang Xianyi
|
be4e5fcd20
|
Fixed #778. Merge branch 'buffer51-develop' into develop
|
2016-02-05 08:39:08 +08:00 |
buffer51
|
855e0cb700
|
Restored LAPACK_COMPLEX_STRUCTURE for Android prior to 21. Refs #682.
|
2016-02-04 17:20:07 -05:00 |