Commit Graph

2032 Commits

Author SHA1 Message Date
Werner Saar
b752858d6c added dgemm-, dtrmm-, zgemm- and ztrmm-kernel for power8 2016-03-01 07:33:56 +01:00
Zhang Xianyi
4fc8c937d4 Refs #695 add testcase. 2016-03-01 01:05:56 -05:00
Zhang Xianyi
efa4f5c936 Refs #695 #783. Replace default x86_64 cgemv_t
asm kernel by C kernel.
2016-03-01 11:18:56 +08:00
Zhang Xianyi
17d655fa64 Merge pull request #784 from peterph/develop
collected usage notes
2016-02-27 11:24:20 -05:00
Petr Cerny
f68141cf1d collected usage notes 2016-02-27 16:57:22 +01:00
Zhang Xianyi
aa90518201 Update Changelog for 0.2.16.rc1. 2016-02-24 15:21:22 -05:00
Zhang Xianyi
6b85dbb6dc Refs #696. Turn off stack limit setting on Linux.
I cannot reproduce SEGFAULT of lapack-test with default stack size
on ARM Linux.
2016-02-24 14:21:42 -05:00
Zhang Xianyi
a0debd4293 Refs #696. Turn off stack limit setting on Linux.
I cannot reproduce SEGFAULT of lapack-test with default stack size
on ARM Linux.
2016-02-24 14:18:39 -05:00
Zhang Xianyi
937493bfeb Release 0.2.16 rc1 v0.2.16.rc1 2016-02-23 18:29:21 -05:00
Zhang Xianyi
74b0672223 Fix c/zaxpyc kernel bug on Cortex-A57. 2016-02-23 22:47:53 +00:00
Zhang Xianyi
6e7be06e07 Refs JuliaLang/julia#5728. Fix gemv performance bug on Haswell Mac OSX.
On Mac OS X, it should use .align 4 (equal to .align 16 on Linux).
I didn't get the performance benefit from .align. Thus, I deleted it.
2016-02-19 17:56:07 -05:00
Zhang Xianyi
a04d0555ba [av skip] Fix utest makefile bug on travis ci. 2016-02-20 00:21:43 +08:00
Zhang Xianyi
3761c30ba4 Fix makefile bug for utest. 2016-02-18 17:01:48 -05:00
Zhang Xianyi
38593cd3a3 Fix compiling bug on ARM Cortex-A57. 2016-02-13 15:38:52 +00:00
Zhang Xianyi
e3b7781c2b Update readme. 2016-02-13 00:33:53 +08:00
Zhang Xianyi
5e6965ea47 Run utest when building. 2016-02-13 00:33:31 +08:00
Zhang Xianyi
5cc0301fc3 Enable utest for appveyor. 2016-02-12 01:50:20 -05:00
Zhang Xianyi
19a6dedfd6 Add utest for CMake. 2016-02-12 05:38:13 +08:00
Zhang Xianyi
0e2b92e216 Added mising lapacke files for CMake. 2016-02-12 05:28:16 +08:00
Zhang Xianyi
d06b92906a Add gemm3m building for CMake. 2016-02-12 05:02:51 +08:00
Zhang Xianyi
8e98478ff3 Update ctest.h from github.com:xianyi/ctest.git. 2016-02-12 05:01:57 +08:00
Zhang Xianyi
fb8968fb83 Refs #707. Bugfix for previous commit. 2016-02-11 05:14:53 +08:00
Zhang Xianyi
dae6b82a71 Refs #707. Add BUILD_LAPACK_DEPRECATED flag in Makefile.rule.
If you want to build LAPACK deprecated functions since LAPACK 3.6.0

make BUILD_LAPACK_DEPRECATED=1
2016-02-11 04:22:53 +08:00
Zhang Xianyi
d73244b825 Refs #727. Align stack buffer address on 32-bytes. 2016-02-11 03:52:02 +08:00
Zhang Xianyi
233c6b959f Merge pull request #780 from jeromerobert/bug727
Bug727
2016-02-08 13:24:40 -05:00
Jerome Robert
16ec5323c9 Fix zgemv.c compilation when stack allocation is disabled 2016-02-08 12:05:02 +01:00
Jerome Robert
0ad02ef2d6 update CONTRIBUTORS.md 2016-02-08 11:26:51 +01:00
Jerome Robert
73397faf68 Add benchmark/smallscaling.c
* Bench small matrices with multi-threading
* Close #727
2016-02-08 11:25:27 +01:00
Jerome Robert
5fc2203d8a zgemv: Add a workaround for #746 2016-02-08 11:25:15 +01:00
Jerome Robert
78dcf5c3d5 Improve performances of ztrmv on small matrices
* Use stack allocation
* Disable multi-threading
* Ref #727
2016-02-08 11:25:02 +01:00
Jerome Robert
32f793195f Use stack allocation in zgemv and zger
For better performance with small matrices
Ref #727
2016-02-08 11:24:21 +01:00
Zhang Xianyi
be4e5fcd20 Fixed #778. Merge branch 'buffer51-develop' into develop 2016-02-05 08:39:08 +08:00
buffer51
855e0cb700 Restored LAPACK_COMPLEX_STRUCTURE for Android prior to 21. Refs #682. 2016-02-04 17:20:07 -05:00
buffer51
7f7d04dcd2 Fixed linking error when compiling ARMv7 for Android (disabled -lpthread and added -Wl,--no-warn-mismatch). 2016-02-04 17:05:31 -05:00
buffer51
4e1b521e27 Fix lapack complex implementation of lauu2 and potf2 for Android (use FLOAT instead of FLOAT[2] as imaginary part is not used). 2016-02-04 16:59:56 -05:00
Zhang Xianyi
a1a96589aa Fixed #773 blas_quickdivide bug on CMake and Visual Studio x86 32-bit. 2016-02-04 15:23:32 -05:00
Zhang Xianyi
0e68beb89f Fixed #711, #698. Merge branch 'byzhang-develop' into develop 2016-02-03 02:56:27 +08:00
Zhang Xianyi
926ba8b7ca Merge branch 'develop' of https://github.com/byzhang/OpenBLAS into byzhang-develop 2016-02-03 02:48:32 +08:00
Zhang Xianyi
9f080c47e1 Merge pull request #743 from tkelman/patch-1
re enable Fortran optimization flag on windows
2016-02-02 13:46:12 -05:00
Zhang Xianyi
52eba814ce Fixed #769. Merge branch 'martin-frbg-develop' into develop 2016-02-02 13:43:51 -05:00
Martin Kroeker
935356c34f Update dynamic.c and cpuid_x86.c for Intel Avoton.
Second part of "support Intel Avoton via Nehalem kernel"
2016-02-02 13:42:55 -05:00
Zhang Xianyi
ff9388d625 Refs #768. Swap the result of zdot x87 fp kernel. 2016-02-02 13:38:01 -05:00
Martin Kroeker
4f05c23673 Update cpuid_x86.c
Add recognition of Intel Atom C27xx (Avoton, model code 4D)
2016-02-02 13:38:01 -05:00
Benyu Zhang
4a1263f609 Fix the source paths 2016-02-01 18:32:42 -08:00
Zhang Xianyi
962376664d Refs #768. Swap the result of zdot x87 fp kernel. 2016-02-02 09:15:02 +08:00
Tony Kelman
5fef0d1b75 re enable Fortran optimization flag on windows
partial revert of 299cdcdc29
from #696, was not explained why that was needed
2016-01-30 01:13:51 -08:00
Zhang Xianyi
578f471808 Fix utest bug when INTERFACE64=1. 2016-01-28 22:18:38 -06:00
Zhang Xianyi
5a8447e97e Use ctest.h for unit test. Enable unit test on travis CI. 2016-01-29 11:35:31 +08:00
Zhang Xianyi
be95bdaf47 Detect ARMV8 on 32-bit mode by using ARMV7 kernels. 2016-01-28 17:30:26 +00:00
Zhang Xianyi
c44ff4d648 Refs #714. avoid compiling warnings. 2016-01-28 04:38:07 +08:00