3390 Commits

Author SHA1 Message Date
Martin Kroeker ab9ec4ab4e Merge pull request #1148 from gcp/fix-dynamic-zen
Fix dynamic detection for ZEN CPUs.
2017-04-10 20:17:14 +02:00
Gian-Carlo Pascutto 0cbd2d34e4 Recognize ZEN when passed as OPENBLAS_CORETYPE. 2017-04-10 20:05:16 +02:00
Gian-Carlo Pascutto 62979fd104 Fix dynamic detection for ZEN CPUs. 2017-04-10 19:08:37 +02:00
Martin Kroeker 20a413e154 Merge pull request #1142 from amodra/develop
Power8 inline assembly tweaks
2017-04-06 16:20:01 +02:00
Alan Modra dc40bc7368 Power8 inline assembly tweaks
Further fixes on top of 9e2f316ed.  Writing some doco for gcc on
inline assembly woke me up to some more errors.

- dgemv_kernel_4x4 asm did not mention *ap as a memory input, and
  *y is both read and write.
- sasum_kernel_32 and casum_kernel_16 did not use %x for a vsx insn
  operand, a problem if the "=f" sum output was ever allocated a vsx
  reg in the altivec set.  This might be possible with inlining and
  future gcc optimisation.
2017-04-04 23:13:54 +09:30
Martin Kroeker 1acfc78c8f Merge pull request #1140 from JohannesBuchner/develop
Autodetect AMD A8-6410 as BARCELONA
2017-04-03 09:47:09 +02:00
Johannes Buchner b4071d0d16 Autodetect AMD A8-6410 as BARCELONA 2017-04-03 17:07:27 +10:00
Martin Kroeker 7908efafc8 Fix integer overflow in LAPACK DBDSQR, SBDSQR (#1135)
* Fix integer overflow in DBDSQR

As noted in lapack issue 135, an integer overflow in the calculation of the iteration limit could lead to an immediate return without any iterations having been performed if the input matrix is sufficiently big.

* Fix integer overflow in SBDSQR

As noted in lapack issue 135, an integer overflow in the calculation of the iteration limit could lead to an immediate return without any iterations having been performed if the input matrix is sufficiently big.

* Fix integer overflow in threshold calculation

Related to lapack issue 135, the threshold calculation can overflow as well, because the multiplication is evaluated from left to right.
Without explicit parentheses, the calculation would overflow for N >= 18919.
2017-03-24 22:05:22 +01:00
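The N >= 18919 bound quoted in the commit above can be checked with a short sketch. It assumes the iteration limit has the form MAXITR*N*N with MAXITR = 6, that the default INTEGER is a 32-bit signed type which wraps around on overflow (typical, but not guaranteed by the Fortran standard), and that the parenthesized threshold has the form MAXITR*(N*(N*UNFL)). The function names are hypothetical and are not part of LAPACK or OpenBLAS.

```python
INT32_MAX = 2**31 - 1
MAXITR = 6  # assumption: iteration multiplier used in the limit calculation


def wrap_int32(value):
    """Reduce an integer to the range of a signed 32-bit two's-complement type."""
    return (value + 2**31) % 2**32 - 2**31


def iteration_limit_int32(n):
    """MAXITR*N*N evaluated left to right, wrapping after each 32-bit multiply."""
    return wrap_int32(wrap_int32(MAXITR * n) * n)


def smallest_overflowing_n():
    """Smallest N for which the exact value of MAXITR*N*N exceeds INT32_MAX."""
    n = 1
    while MAXITR * n * n <= INT32_MAX:
        n += 1
    return n


def threshold_left_to_right(n, unfl):
    """MAXITR*N*N*UNFL: the integer product is formed (and may wrap) first."""
    return iteration_limit_int32(n) * unfl


def threshold_parenthesized(n, unfl):
    """MAXITR*(N*(N*UNFL)): N*UNFL is already floating point, so nothing wraps."""
    return MAXITR * (n * (n * unfl))


if __name__ == "__main__":
    n = smallest_overflowing_n()
    unfl = 2.0**-1022  # smallest normal double, standing in for UNFL
    print("first overflowing N:", n)
    print("wrapped iteration limit:", iteration_limit_int32(n))
    print("left-to-right threshold:", threshold_left_to_right(n, unfl))
    print("parenthesized threshold:", threshold_parenthesized(n, unfl))
```

Under these assumptions the wrapped iteration limit turns negative at the first overflowing N, which is consistent with the immediate return described in the commit message, while the parenthesized threshold stays positive.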
Martin Kroeker 66dc10b019 Merge pull request #1133 from steckdenis/develop
Add ZEN support
2017-03-24 13:47:32 +01:00
Zhang Xianyi b5c96fcfcd Support ARM SOFTFP ABI for saxpy, sdot, snrm2, sscal, sgemv, sger. 2017-03-20 17:39:25 +08:00
Denis Steckelmacher c9ff735da6 Add ZEN support (tested for auto-detected static backend) 2017-03-19 15:32:50 +01:00
Andrew 99880f7906 Address unlikely memleak in zimatcopy interface (#1129)
* fix unlikely memleak in zimatcopy interface

* fix only unlikely memleak in zimatcopy interface
2017-03-16 13:13:31 +01:00
Martin Kroeker cd135e2b59 Merge pull request #1130 from quickwritereader/develop
Blas 3 for single precision
2017-03-15 10:00:52 +01:00
Martin Kroeker ad124a5e8b Merge pull request #1126 from martin-frbg/pgi
Fix compilation with PGI by replacing verbatim __real__, __imag__ extensions and updating macro definitions for modern, C99-capable versions of the PGI compiler
2017-03-14 17:17:39 +01:00
Martin Kroeker 211d2eceb5 Update zdot.c 2017-03-13 18:08:00 +01:00
Martin Kroeker 5813ed095b Update zdot.c 2017-03-13 17:49:07 +01:00
Martin Kroeker e44b028fe5 Replace gnu __real__, __imag__ extensions in initializers 2017-03-13 00:40:11 +01:00
Martin Kroeker a6efabf155 Replace gnu __real__, __imag__ extensions in initializers 2017-03-13 00:38:37 +01:00
Martin Kroeker ea26b00c06 Fix CREAL,CIMAG macros for PGI 2017-03-13 00:36:01 +01:00
Abdurrauf 08786c4b95 strmm and ctrmm 2017-03-13 01:23:16 +04:00
Martin Kroeker 12e476f7a2 Merge pull request #1124 from martin-frbg/c_check-ppc
Update c_check.cmake to label ppc64 as power ARCH
2017-03-10 12:58:38 +01:00
Martin Kroeker 8de40955ad Update c_check.cmake 2017-03-10 11:45:48 +01:00
Martin Kroeker 9b24688eed Merge pull request #1122 from martin-frbg/zlasyf
Fix misspelling of zlasyf_aa from previous commit
2017-03-10 09:51:34 +01:00
Martin Kroeker 43224f7273 Fix misspelling of zlasyf_aa from previous commit 2017-03-10 08:44:49 +01:00
Martin Kroeker 9254a701f3 Merge pull request #1121 from staticfloat/sf/Xsymv_export
Add `csymv` and `zsymv` into `@lapackobjs2` for exporting
2017-03-10 08:33:36 +01:00
Elliot Saba 26a614fdd1 Whitespace cleanup/reformatting 2017-03-09 15:30:43 -08:00
Elliot Saba 7ae64f4f9c Add `csymv` and `zsymv` into `@lapackobjs2` for exporting 2017-03-09 15:22:40 -08:00
Zhang Xianyi 90e02ccf68 Support ARM softfp ABI for sgemm on ARMV7.
make ARM_SOFTFP_ABI=1
2017-03-06 22:16:13 +08:00
Zhang Xianyi 503dcbfde6 Merge branch 'develop' into arm_soft_fp_abi 2017-03-06 13:53:56 +08:00
Abdurrauf 82e80fa82b initial strmm(sgemm). not tuned yet 2017-03-06 04:27:40 +04:00
Martin Kroeker 4227049c7d Merge pull request #1111 from martin-frbg/kaby-no-avx
Fix core detection for Kaby Lake without AVX (G4560)
2017-03-02 18:43:59 +01:00
Martin Kroeker 688267edf3 Fix core detection for Kaby Lake without AVX (G4560)
(Should fix #1109)
2017-03-02 17:36:16 +01:00
Martin Kroeker d1fe040d9b Merge pull request #1110 from quickwritereader/develop
Conventional usage of the register save area.
2017-03-01 23:08:07 +01:00
Abdurrauf 411982715c conventional usage of the register save area 2017-03-01 20:39:39 +04:00
Abdurrauf e831d6924e changed to conventional register save area 2017-03-01 03:13:21 +04:00
Martin Kroeker ffc1d6c468 Merge pull request #1108 from ashwinyes/develop_20170203_thunderx2t99
Optimized Implementations for ThunderX2T99
2017-02-28 16:02:19 +01:00
Ashwin Sekhar T K a86474c6f7 THUNDERX2T99: Performance fix for ZGEMM 2017-02-28 06:05:00 -08:00
Ashwin Sekhar T K 67473d09dd THUNDERX2T99: Bug Fixes in D/Z NRM2 and ZGEMM 2017-02-28 01:11:38 -08:00
Ashwin Sekhar T K 19ba133383 THUNDERX2T99: Add Optimized ZGEMM Implementation 2017-02-28 05:31:41 +00:00
Martin Kroeker f09a9afa03 Merge pull request #1107 from quickwritereader/develop
ztrmm(zgemm) complex double precision kernel for ibm z13
2017-02-26 09:49:01 +01:00
Abdurrauf 0d96b0e2a7 Merge branch 'z13' into develop 2017-02-26 06:17:33 +04:00
Abdurrauf 848cb27b1e ztrmm kernel. 2017-02-26 06:14:12 +04:00
Martin Kroeker dc34a0da96 Merge pull request #915 from mdong/small_fix_for_icc
remove input from clobbered list
2017-02-23 20:00:22 +01:00
Ashwin Sekhar T K a3935f0dfb THUNDERX2T99: Add Optimized D/Z NRM2 Implementation 2017-02-23 10:02:15 -08:00
Martin Kroeker 47e9fe0bb4 Merge pull request #1105 from martin-frbg/testing-eig-typos
TESTING/EIG: fix spurious EXTERNAL references to nonexistent functions
2017-02-22 22:42:52 +01:00
Martin Kroeker c7bc0ee823 Remove spurious names from EXTERNAL list
Remove unused (and nonexistent) functions ZHETRD_SY2SB and ZHETRD_SB2ST from comment and EXTERNAL declaration
2017-02-22 21:48:35 +01:00
Martin Kroeker 6bdee6d50a Remove spurious names from EXTERNAL list
Remove unused (and nonexistent) ZHETRD_SY2SB and ZHETRD_SB2ST
2017-02-22 21:45:27 +01:00
Martin Kroeker 009c0d2e5a Fix typo in EXTERNAL declaration
ZHBTRD_HB2ST should be ZHETRD_HB2ST
2017-02-22 21:41:07 +01:00
Martin Kroeker 4d88e1a4ad Merge pull request #1104 from martin-frbg/lapack-comma
LAPACK: fix missing comma on continued lines
2017-02-22 10:31:39 +01:00
Martin Kroeker 0958b49811 Fix missing comma on continued line
EXTERNAL declaration of subroutines missed a comma before the continuation line,
causing a strange run-together name to appear in the object when compiled with ifort.
2017-02-22 08:40:39 +01:00