Dan Horák
81fed55782
detect CPU on zArch
2017-04-20 21:13:41 +02:00
Martin Kroeker
35387edb8d
Merge pull request #1160 from gcp/extra-streamroller-cpuid
...
Add an extra familiy/model combination used by AMD Steamrolller.
2017-04-19 20:03:23 +02:00
Gian-Carlo Pascutto
9c884986ad
Add an extra familiy/model combination used by AMD Steamrolller (Godavari).
2017-04-19 19:15:47 +02:00
Martin Kroeker
f2f0e98bb5
Merge pull request #1158 from martin-frbg/force-zen
...
Make FORCE_ZEN option in getarch.c actually set target names to ZEN
2017-04-19 15:04:41 +02:00
Martin Kroeker
166d64eb7c
Fix FORCE_ZEN option in getarch.c
2017-04-19 14:20:42 +02:00
Martin Kroeker
e078339e8d
Merge pull request #1157 from gcp/revert-zen-param
...
Revert Zen param.h to Haswell values (instead of Excavator).
2017-04-18 13:32:16 +02:00
Gian-Carlo Pascutto
832a272784
Revert Zen param.h to Haswell values (instead of Excavator).
2017-04-18 12:40:25 +02:00
Martin Kroeker
356606314c
Merge pull request #1156 from SoapGentoo/cmake-fixes
...
Use GNUInstallDirs to allow changing target directories
2017-04-18 09:00:24 +02:00
David Seifert
ed79a29d87
Use GNUInstallDirs to allow changing target directories
...
* Multi-lib distributions need to change the libdir
which is only portably possible with `GNUInstallDirs`.
* Multi-arch distributions such as Debian and Exherbo
need to be able to change the bindir.
2017-04-16 00:43:47 +02:00
Martin Kroeker
77d16ffc69
Merge pull request #1154 from sharkcz/s390x
...
add lapack laswp directory for zarch
2017-04-13 16:37:29 +02:00
Dan Horák
56762d5e4c
add lapack laswp for zarch
2017-04-13 15:38:59 +02:00
Zhang Xianyi
90dd190a6d
Build shared library for Android.
2017-04-11 12:01:18 +08:00
Martin Kroeker
ab9ec4ab4e
Merge pull request #1148 from gcp/fix-dynamic-zen
...
Fix dynamic detection for ZEN CPUs.
2017-04-10 20:17:14 +02:00
Gian-Carlo Pascutto
0cbd2d34e4
Recognize ZEN when passed as OPENBLAS_CORETYPE.
2017-04-10 20:05:16 +02:00
Gian-Carlo Pascutto
62979fd104
Fix dynamic detection for ZEN CPUs.
2017-04-10 19:08:37 +02:00
Martin Kroeker
20a413e154
Merge pull request #1142 from amodra/develop
...
Power8 inline assembly tweaks
2017-04-06 16:20:01 +02:00
Alan Modra
dc40bc7368
Power8 inline assembly tweaks
...
Further fixes on top of 9e2f316ed
. Writing some doco for gcc on
inline assembly woke me up to some more errors.
- dgemv_kernel_4x4 asm did not mention *ap as a memory input, and
*y is both read and write.
- sasum_kernel_32 and casum_kernel_16 did not use %x for a vsx insn
operand, a problem if the "=f" sum output was ever allocated a vsx
reg in the altivec set. This might be possible with inlining and
future gcc optimisation.
2017-04-04 23:13:54 +09:30
Martin Kroeker
1acfc78c8f
Merge pull request #1140 from JohannesBuchner/develop
...
Autodetect AMD A8-6410 as BARCELONA
2017-04-03 09:47:09 +02:00
Johannes Buchner
b4071d0d16
Autodetect AMD A8-6410 as BARCELONA
2017-04-03 17:07:27 +10:00
Martin Kroeker
7908efafc8
Fix integer overflow in LAPACK DBDSQR, SBDSQR ( #1135 )
...
* Fix integer overflow in DBDSQR
As noted in lapack issue 135, an integer overflow in the calculation of the iteration limit could lead to an immediate return without any iterations having been performed if the input matrix is sufficiently big.
* Fix integer overflow in SBDSQR
As noted in lapack issue 135, an integer overflow in the calculation of the iteration limit could lead to an immediate return without any iterations having been performed if the input matrix is sufficiently big.
* Fix integer overflow in threshold calculation
Related to lapack issue 135, the threshold calculation can overflow as well as the multiplication is evaluated from left to right.
Without explicit parentheses, the calculation would overflow for N >= 18919
* Fix integer overflow in threshold calculation
Related to lapack issue 135, the threshold calculation can overflow as well as the multiplication is evaluated from left to right.
Without explicit parentheses, the calculation would overflow for N >= 18919
2017-03-24 22:05:22 +01:00
Martin Kroeker
66dc10b019
Merge pull request #1133 from steckdenis/develop
...
Add ZEN support
2017-03-24 13:47:32 +01:00
Zhang Xianyi
b5c96fcfcd
Support ARM SOFTFP ABI for saxpy, sdot, snrm2, sscal, sgemv, sger.
2017-03-20 17:39:25 +08:00
Denis Steckelmacher
c9ff735da6
Add ZEN support (tested for auto-detected static backend)
2017-03-19 15:32:50 +01:00
Andrew
99880f7906
Address unlikely memleak in zimatcopy interface ( #1129 )
...
* fix unlikely memleak in zimatcopy interface
* fix only unlikely memleak in zimatcopy interface
* fix only unlikely memleak in zimatcopy interface
2017-03-16 13:13:31 +01:00
Martin Kroeker
cd135e2b59
Merge pull request #1130 from quickwritereader/develop
...
Blas 3 for single precision
2017-03-15 10:00:52 +01:00
Martin Kroeker
ad124a5e8b
Merge pull request #1126 from martin-frbg/pgi
...
Fix compilation with PGI by replacing verbatim _real_, _imag_ extensions and updating macro definitions for modern, C99-capable versions of the PGI compiler
2017-03-14 17:17:39 +01:00
Martin Kroeker
211d2eceb5
Update zdot.c
2017-03-13 18:08:00 +01:00
Martin Kroeker
5813ed095b
Update zdot.c
2017-03-13 17:49:07 +01:00
Martin Kroeker
e44b028fe5
Replace gnu _real_, _imag_ extensions in initializers
2017-03-13 00:40:11 +01:00
Martin Kroeker
a6efabf155
Replace gnu _real_ , _imag_ extensions in initializers
2017-03-13 00:38:37 +01:00
Martin Kroeker
ea26b00c06
Fix CREAL,CIMAG macros for PGI
2017-03-13 00:36:01 +01:00
Abdurrauf
08786c4b95
strmm and ctrmm
2017-03-13 01:23:16 +04:00
Martin Kroeker
12e476f7a2
Merge pull request #1124 from martin-frbg/c_check-ppc
...
Update c_check.cmake to label ppc64 as power ARCH
2017-03-10 12:58:38 +01:00
Martin Kroeker
8de40955ad
Update c_check.cmake
2017-03-10 11:45:48 +01:00
Martin Kroeker
9b24688eed
Merge pull request #1122 from martin-frbg/zlasyf
...
Fix misspelling of zlasyf_aa from previous commit
2017-03-10 09:51:34 +01:00
Martin Kroeker
43224f7273
Fix misspelling of zlasyf_aa from previous commit
2017-03-10 08:44:49 +01:00
Martin Kroeker
9254a701f3
Merge pull request #1121 from staticfloat/sf/Xsymv_export
...
Add `csymv` and `zsymv` into `@lapackobjs2` for exporting
2017-03-10 08:33:36 +01:00
Elliot Saba
26a614fdd1
Whitespace cleanup/reformatting
2017-03-09 15:30:43 -08:00
Elliot Saba
7ae64f4f9c
Add `csymv` and `zsymv` into `@lapackobjs2` for exporting
2017-03-09 15:22:40 -08:00
Zhang Xianyi
90e02ccf68
Support ARM softfp ABI for sgemm on ARMV7.
...
make ARM_SOFTFP_ABI=1
2017-03-06 22:16:13 +08:00
Zhang Xianyi
503dcbfde6
Merge branch 'develop' into arm_soft_fp_abi
2017-03-06 13:53:56 +08:00
Abdurrauf
82e80fa82b
initial strmm(sgemm). not tuned yet
2017-03-06 04:27:40 +04:00
Martin Kroeker
4227049c7d
Merge pull request #1111 from martin-frbg/kaby-no-avx
...
Fix core detection for Kaby Lake without AVX (G4560)
2017-03-02 18:43:59 +01:00
Martin Kroeker
688267edf3
Fix core detection for Kaby Lake without AVX (G4560)
...
Should fix #1109 )
2017-03-02 17:36:16 +01:00
Martin Kroeker
d1fe040d9b
Merge pull request #1110 from quickwritereader/develop
...
Conventional usage of the register save area.
2017-03-01 23:08:07 +01:00
Abdurrauf
411982715c
conventional usage of the register save area
2017-03-01 20:39:39 +04:00
Abdurrauf
e831d6924e
changed to conventional register save area
2017-03-01 03:13:21 +04:00
Martin Kroeker
ffc1d6c468
Merge pull request #1108 from ashwinyes/develop_20170203_thunderx2t99
...
Optimized Implementations for ThunderX2T99
2017-02-28 16:02:19 +01:00
Ashwin Sekhar T K
a86474c6f7
THUNDERX2T99: Performance fix for ZGEMM
2017-02-28 06:05:00 -08:00
Ashwin Sekhar T K
67473d09dd
THUNDERX2T99: Bug Fixes in D/Z NRM2 and ZGEMM
2017-02-28 01:11:38 -08:00