Martin Kroeker
ab9ec4ab4e
Merge pull request #1148 from gcp/fix-dynamic-zen
Fix dynamic detection for ZEN CPUs.
2017-04-10 20:17:14 +02:00
Gian-Carlo Pascutto
0cbd2d34e4
Recognize ZEN when passed as OPENBLAS_CORETYPE.
2017-04-10 20:05:16 +02:00
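A minimal sketch of how a dynamic build might honour a core name passed in OPENBLAS_CORETYPE: a case-insensitive lookup in a table of known names. The table contents and the helper names (`equal_nocase`, `coretype_index`, `forced_coretype`) are hypothetical and are not OpenBLAS's own code. The point is only that a name such as ZEN is recognized when it appears in the table.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical subset of core names; not OpenBLAS's actual table. */
static const char *const core_names[] = {
    "PRESCOTT", "BARCELONA", "HASWELL", "ZEN",
};

/* Case-insensitive string equality using only the C standard library. */
static int equal_nocase(const char *a, const char *b) {
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b; /* equal only if both strings ended together */
}

/* Index of `name` in core_names, or -1 if it is NULL or unknown. */
static int coretype_index(const char *name) {
    if (name == NULL)
        return -1;
    for (size_t i = 0; i < sizeof core_names / sizeof core_names[0]; i++)
        if (equal_nocase(name, core_names[i]))
            return (int)i;
    return -1;
}

/* Reads the override from the environment; -1 means "autodetect". */
static int forced_coretype(void) {
    return coretype_index(getenv("OPENBLAS_CORETYPE"));
}
```

In this sketch a name missing from the table yields -1, and the caller would fall back to autodetection.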
Gian-Carlo Pascutto
62979fd104
Fix dynamic detection for ZEN CPUs.
2017-04-10 19:08:37 +02:00
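Dynamic detection typically starts from the processor signature returned in EAX by CPUID leaf 1. The sketch below decodes family, model and stepping from that value using the documented extended-family and extended-model rules; AMD's Zen parts report family 0x17. It takes the raw EAX value as a parameter instead of executing CPUID, and `decode_signature` and `is_amd_zen` are illustrative names, not OpenBLAS functions.

```c
#include <stdint.h>

/* Family, model and stepping as encoded in CPUID leaf 1, EAX. */
struct cpu_signature {
    unsigned family;
    unsigned model;
    unsigned stepping;
};

static struct cpu_signature decode_signature(uint32_t eax) {
    unsigned base_family = (eax >> 8) & 0xF;
    unsigned base_model = (eax >> 4) & 0xF;
    struct cpu_signature s;

    s.stepping = eax & 0xF;
    s.family = base_family;
    s.model = base_model;
    /* The extended family field (bits 20-27) is added only when the
       base family is 0xF. */
    if (base_family == 0xF)
        s.family += (eax >> 20) & 0xFF;
    /* The extended model field (bits 16-19) supplies the high nibble of
       the model for base family 0xF, and for family 0x6 on Intel. */
    if (base_family == 0x6 || base_family == 0xF)
        s.model += ((eax >> 16) & 0xF) << 4;
    return s;
}

/* Assumes the vendor string has already been checked to be AMD. */
static int is_amd_zen(struct cpu_signature s) {
    return s.family == 0x17;
}
```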
Martin Kroeker
20a413e154
Merge pull request #1142 from amodra/develop
Power8 inline assembly tweaks
2017-04-06 16:20:01 +02:00
Alan Modra
dc40bc7368
Power8 inline assembly tweaks
Further fixes on top of 9e2f316ed. Writing some doco for gcc on
inline assembly woke me up to some more errors.
- dgemv_kernel_4x4 asm did not mention *ap as a memory input, and
*y is both read and write.
- sasum_kernel_32 and casum_kernel_16 did not use %x for a vsx insn
operand, a problem if the "=f" sum output was ever allocated a vsx
reg in the altivec set. This might be possible with inlining and
future gcc optimisation.
2017-04-04 23:13:54 +09:30
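The dgemv fix concerns the operand list of a GCC extended asm statement: memory the asm reads must appear as an input, and memory it both reads and writes must be an in/out ("+m") operand; otherwise the compiler may keep stale values in registers or move loads and stores across the asm. A minimal, architecture-neutral sketch, assuming GCC or Clang: the template is empty (a stand-in for real vector code), and `vec4`, `declare_kernel_operands` and the four-element block size are hypothetical. The pointers are cast to a fixed-size struct so that each constraint covers all four doubles, not just the first one.

```c
/* Hypothetical 4-element block used to describe a memory range to the
   compiler; casting the pointer is a common idiom for asm memory operands. */
typedef struct { double v[4]; } vec4;

/* The empty template emits no instructions. The operand list is what a
   kernel that reads ap[0..3] and updates y[0..3] in place must declare. */
static void declare_kernel_operands(const double *ap, double *y) {
    __asm__ volatile(""
                     : "+m"(*(vec4 *)y)        /* y[0..3]: read and written */
                     : "m"(*(const vec4 *)ap)  /* ap[0..3]: read only */
    );
}
```

Declaring y as a write-only "=m" operand instead would let the compiler treat earlier stores to y as dead, and omitting the "m" input for ap would let it delay or skip stores to ap that precede the asm.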
Martin Kroeker
1acfc78c8f
Merge pull request #1140 from JohannesBuchner/develop
Autodetect AMD A8-6410 as BARCELONA
2017-04-03 09:47:09 +02:00
Johannes Buchner
b4071d0d16
Autodetect AMD A8-6410 as BARCELONA
2017-04-03 17:07:27 +10:00
Martin Kroeker
7908efafc8
Fix integer overflow in LAPACK DBDSQR, SBDSQR (#1135)
* Fix integer overflow in DBDSQR
As noted in lapack issue 135, an integer overflow in the calculation of the iteration limit could lead to an immediate return without any iterations having been performed if the input matrix is sufficiently big.
* Fix integer overflow in SBDSQR
As noted in lapack issue 135, an integer overflow in the calculation of the iteration limit could lead to an immediate return without any iterations having been performed if the input matrix is sufficiently big.
* Fix integer overflow in threshold calculation
Related to lapack issue 135, the threshold calculation can overflow as well, because the multiplication is evaluated from left to right.
Without explicit parentheses, the calculation would overflow for N >= 18919.
* Fix integer overflow in threshold calculation
Related to lapack issue 135, the threshold calculation can overflow as well, because the multiplication is evaluated from left to right.
Without explicit parentheses, the calculation would overflow for N >= 18919.
2017-03-24 22:05:22 +01:00
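The N >= 18919 bound can be checked directly. Assuming MAXITR = 6 (the value of that parameter in reference LAPACK's xBDSQR) and a 32-bit default INTEGER, the product MAXITR*N*N first exceeds 2^31 - 1 at N = 18919. The sketch below finds that bound using 64-bit arithmetic and shows one parenthesization, MAXITR*(N*(N*UNFL)), that converts to floating point before any large integer product is formed. The helper names are hypothetical.

```c
#include <stdint.h>

/* Assumed value of the MAXITR parameter in reference xBDSQR. */
#define MAXITR 6

/* Smallest N for which MAXITR*N*N exceeds INT32_MAX, i.e. the first size
   at which a 32-bit integer product would overflow. Uses 64-bit arithmetic
   so the search itself cannot overflow. */
static int64_t first_overflowing_n(void) {
    int64_t n = 1;
    while ((int64_t)MAXITR * n * n <= INT32_MAX)
        n++;
    return n;
}

/* Threshold term with explicit parentheses: n * unfl is already a double,
   so no integer product larger than n is ever formed. */
static double thresh_term(int32_t n, double unfl) {
    return MAXITR * (n * (n * unfl));
}
```

For N = 18919 the exact product is 6 * 18919 * 18919 = 2147571366, just above 2147483647.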
Martin Kroeker
66dc10b019
Merge pull request #1133 from steckdenis/develop
Add ZEN support
2017-03-24 13:47:32 +01:00
Zhang Xianyi
b5c96fcfcd
Support ARM SOFTFP ABI for saxpy, sdot, snrm2, sscal, sgemv, sger.
2017-03-20 17:39:25 +08:00
Denis Steckelmacher
c9ff735da6
Add ZEN support (tested for auto-detected static backend)
2017-03-19 15:32:50 +01:00
Andrew
99880f7906
Address unlikely memleak in zimatcopy interface (#1129)
* fix unlikely memleak in zimatcopy interface
* fix only unlikely memleak in zimatcopy interface
* fix only unlikely memleak in zimatcopy interface
2017-03-16 13:13:31 +01:00
Martin Kroeker
cd135e2b59
Merge pull request #1130 from quickwritereader/develop
Blas 3 for single precision
2017-03-15 10:00:52 +01:00
Martin Kroeker
ad124a5e8b
Merge pull request #1126 from martin-frbg/pgi
Fix compilation with PGI by replacing the GNU __real__, __imag__ extensions and updating macro definitions for modern, C99-capable versions of the PGI compiler
2017-03-14 17:17:39 +01:00
Martin Kroeker
211d2eceb5
Update zdot.c
2017-03-13 18:08:00 +01:00
Martin Kroeker
5813ed095b
Update zdot.c
2017-03-13 17:49:07 +01:00
Martin Kroeker
e44b028fe5
Replace GNU __real__, __imag__ extensions in initializers
2017-03-13 00:40:11 +01:00
Martin Kroeker
a6efabf155
Replace GNU __real__, __imag__ extensions in initializers
2017-03-13 00:38:37 +01:00
Martin Kroeker
ea26b00c06
Fix CREAL, CIMAG macros for PGI
2017-03-13 00:36:01 +01:00
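Replacing the GCC-only __real__ and __imag__ extensions needs a portable way to build a complex value from two parts. One option, sketched here under the assumption of a C99 compiler with <complex.h>, relies on the guarantee in C99 section 6.2.5 that a complex type has the same representation as a two-element array of its real type, real part first. `complex_bits` and `make_complex` are hypothetical names; this is not necessarily how the commits above or OpenBLAS's CREAL/CIMAG macros do it. Reading the parts back can use the standard creal() and cimag() functions.

```c
#include <complex.h>

/* C99 6.2.5: a complex type is laid out like an array of two elements of
   its real type, real part first. The union exposes that layout. */
typedef union {
    double complex z;
    double parts[2];
} complex_bits;

/* Portable replacement for "__real__ z = re; __imag__ z = im;" */
static double complex make_complex(double re, double im) {
    complex_bits u;
    u.parts[0] = re;
    u.parts[1] = im;
    return u.z;
}
```

Writing the parts directly also avoids the NaN real part that the expression `re + im * I` can produce when im is infinite.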
Abdurrauf
08786c4b95
strmm and ctrmm
2017-03-13 01:23:16 +04:00
Martin Kroeker
12e476f7a2
Merge pull request #1124 from martin-frbg/c_check-ppc
Update c_check.cmake to label ppc64 as power ARCH
2017-03-10 12:58:38 +01:00
Martin Kroeker
8de40955ad
Update c_check.cmake
2017-03-10 11:45:48 +01:00
Martin Kroeker
9b24688eed
Merge pull request #1122 from martin-frbg/zlasyf
Fix misspelling of zlasyf_aa from previous commit
2017-03-10 09:51:34 +01:00
Martin Kroeker
43224f7273
Fix misspelling of zlasyf_aa from previous commit
2017-03-10 08:44:49 +01:00
Martin Kroeker
9254a701f3
Merge pull request #1121 from staticfloat/sf/Xsymv_export
Add `csymv` and `zsymv` into `@lapackobjs2` for exporting
2017-03-10 08:33:36 +01:00
Elliot Saba
26a614fdd1
Whitespace cleanup/reformatting
2017-03-09 15:30:43 -08:00
Elliot Saba
7ae64f4f9c
Add `csymv` and `zsymv` into `@lapackobjs2` for exporting
2017-03-09 15:22:40 -08:00
Zhang Xianyi
90e02ccf68
Support ARM softfp ABI for sgemm on ARMV7.
make ARM_SOFTFP_ABI=1
2017-03-06 22:16:13 +08:00
Zhang Xianyi
503dcbfde6
Merge branch 'develop' into arm_soft_fp_abi
2017-03-06 13:53:56 +08:00
Abdurrauf
82e80fa82b
Initial strmm (sgemm), not tuned yet
2017-03-06 04:27:40 +04:00
Martin Kroeker
4227049c7d
Merge pull request #1111 from martin-frbg/kaby-no-avx
Fix core detection for Kaby Lake without AVX (G4560)
2017-03-02 18:43:59 +01:00
Martin Kroeker
688267edf3
Fix core detection for Kaby Lake without AVX (G4560)
Should fix #1109.
2017-03-02 17:36:16 +01:00
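A Pentium G4560 reports a Kaby Lake family and model but has no AVX, so choosing a kernel from the model number alone is not enough. A minimal sketch of the extra check, assuming the CPUID leaf 1 ECX value and the XCR0 value (read with XGETBV) are already available: AVX is usable only if the CPU advertises AVX (ECX bit 28) and OSXSAVE (ECX bit 27), and the OS has enabled both SSE and AVX register state (XCR0 bits 1 and 2). The function name `avx_usable` is hypothetical.

```c
#include <stdint.h>

static int avx_usable(uint32_t cpuid1_ecx, uint64_t xcr0) {
    const uint32_t osxsave = UINT32_C(1) << 27; /* OS uses XSAVE/XGETBV */
    const uint32_t avx = UINT32_C(1) << 28;     /* CPU implements AVX */
    const uint64_t ymm_state = UINT64_C(0x6);   /* XCR0: SSE (bit 1) + AVX (bit 2) */

    if ((cpuid1_ecx & (osxsave | avx)) != (osxsave | avx))
        return 0;
    return (xcr0 & ymm_state) == ymm_state;
}
```

A detector could then map such a part to a non-AVX core type instead of an AVX-based one; which core type OpenBLAS actually picks is not shown here.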
Martin Kroeker
d1fe040d9b
Merge pull request #1110 from quickwritereader/develop
Conventional usage of the register save area.
2017-03-01 23:08:07 +01:00
Abdurrauf
411982715c
conventional usage of the register save area
2017-03-01 20:39:39 +04:00
Abdurrauf
e831d6924e
changed to conventional register save area
2017-03-01 03:13:21 +04:00
Martin Kroeker
ffc1d6c468
Merge pull request #1108 from ashwinyes/develop_20170203_thunderx2t99
Optimized Implementations for ThunderX2T99
2017-02-28 16:02:19 +01:00
Ashwin Sekhar T K
a86474c6f7
THUNDERX2T99: Performance fix for ZGEMM
2017-02-28 06:05:00 -08:00
Ashwin Sekhar T K
67473d09dd
THUNDERX2T99: Bug Fixes in D/Z NRM2 and ZGEMM
2017-02-28 01:11:38 -08:00
Ashwin Sekhar T K
19ba133383
THUNDERX2T99: Add Optimized ZGEMM Implementation
2017-02-28 05:31:41 +00:00
Martin Kroeker
f09a9afa03
Merge pull request #1107 from quickwritereader/develop
ztrmm(zgemm) complex double precision kernel for ibm z13
2017-02-26 09:49:01 +01:00
Abdurrauf
0d96b0e2a7
Merge branch 'z13' into develop
2017-02-26 06:17:33 +04:00
Abdurrauf
848cb27b1e
ztrmm kernel.
2017-02-26 06:14:12 +04:00
Martin Kroeker
dc34a0da96
Merge pull request #915 from mdong/small_fix_for_icc
remove input from clobbered list
2017-02-23 20:00:22 +01:00
Ashwin Sekhar T K
a3935f0dfb
THUNDERX2T99: Add Optimized D/Z NRM2 Implementation
2017-02-23 10:02:15 -08:00
Martin Kroeker
47e9fe0bb4
Merge pull request #1105 from martin-frbg/testing-eig-typos
TESTING/EIG: fix spurious EXTERNAL references to nonexistent functions
2017-02-22 22:42:52 +01:00
Martin Kroeker
c7bc0ee823
Remove spurious names from EXTERNAL list
Remove unused (and nonexistent) functions ZHETRD_SY2SB and ZHETRD_SB2ST from comment and EXTERNAL declaration
2017-02-22 21:48:35 +01:00
Martin Kroeker
6bdee6d50a
Remove spurious names from EXTERNAL list
Remove unused (and nonexistent) ZHETRD_SY2SB and ZHETRD_SB2ST
2017-02-22 21:45:27 +01:00
Martin Kroeker
009c0d2e5a
Fix typo in EXTERNAL declaration
ZHBTRD_HB2ST should be ZHETRD_HB2ST
2017-02-22 21:41:07 +01:00
Martin Kroeker
4d88e1a4ad
Merge pull request #1104 from martin-frbg/lapack-comma
LAPACK: fix missing comma on continued lines
2017-02-22 10:31:39 +01:00
Martin Kroeker
0958b49811
Fix missing comma on continued line
EXTERNAL declaration of subroutines missed a comma before the continuation line,
causing a strange run-together name to appear in the object when compiled with ifort.
2017-02-22 08:40:39 +01:00