Commit Graph

7452 Commits

Author SHA1 Message Date
Dan Horák 81fed55782 detect CPU on zArch 2017-04-20 21:13:41 +02:00
Martin Kroeker 35387edb8d Merge pull request #1160 from gcp/extra-streamroller-cpuid
Add an extra family/model combination used by AMD Steamroller.
2017-04-19 20:03:23 +02:00
Gian-Carlo Pascutto 9c884986ad Add an extra family/model combination used by AMD Steamroller (Godavari). 2017-04-19 19:15:47 +02:00
Martin Kroeker f2f0e98bb5 Merge pull request #1158 from martin-frbg/force-zen
Make FORCE_ZEN option in getarch.c actually set target names to ZEN
2017-04-19 15:04:41 +02:00
Martin Kroeker 166d64eb7c Fix FORCE_ZEN option in getarch.c 2017-04-19 14:20:42 +02:00
Martin Kroeker e078339e8d Merge pull request #1157 from gcp/revert-zen-param
Revert Zen param.h to Haswell values (instead of Excavator).
2017-04-18 13:32:16 +02:00
Gian-Carlo Pascutto 832a272784 Revert Zen param.h to Haswell values (instead of Excavator). 2017-04-18 12:40:25 +02:00
Martin Kroeker 356606314c Merge pull request #1156 from SoapGentoo/cmake-fixes
Use GNUInstallDirs to allow changing target directories
2017-04-18 09:00:24 +02:00
David Seifert ed79a29d87 Use GNUInstallDirs to allow changing target directories
* Multi-lib distributions need to change the libdir
  which is only portably possible with `GNUInstallDirs`.
* Multi-arch distributions such as Debian and Exherbo
  need to be able to change the bindir.
2017-04-16 00:43:47 +02:00
Martin Kroeker 77d16ffc69 Merge pull request #1154 from sharkcz/s390x
add lapack laswp directory for zarch
2017-04-13 16:37:29 +02:00
Dan Horák 56762d5e4c add lapack laswp for zarch 2017-04-13 15:38:59 +02:00
Zhang Xianyi 90dd190a6d Build shared library for Android. 2017-04-11 12:01:18 +08:00
Martin Kroeker ab9ec4ab4e Merge pull request #1148 from gcp/fix-dynamic-zen
Fix dynamic detection for ZEN CPUs.
2017-04-10 20:17:14 +02:00
Gian-Carlo Pascutto 0cbd2d34e4 Recognize ZEN when passed as OPENBLAS_CORETYPE. 2017-04-10 20:05:16 +02:00
Gian-Carlo Pascutto 62979fd104 Fix dynamic detection for ZEN CPUs. 2017-04-10 19:08:37 +02:00
Martin Kroeker 20a413e154 Merge pull request #1142 from amodra/develop
Power8 inline assembly tweaks
2017-04-06 16:20:01 +02:00
Alan Modra dc40bc7368 Power8 inline assembly tweaks
Further fixes on top of 9e2f316ed.  Writing some doco for gcc on
inline assembly woke me up to some more errors.

- dgemv_kernel_4x4 asm did not mention *ap as a memory input, and
  *y is both read and write.
- sasum_kernel_32 and casum_kernel_16 did not use %x for a vsx insn
  operand, a problem if the "=f" sum output was ever allocated a vsx
  reg in the altivec set.  This might be possible with inlining and
  future gcc optimisation.
2017-04-04 23:13:54 +09:30
Martin Kroeker 1acfc78c8f Merge pull request #1140 from JohannesBuchner/develop
Autodetect AMD A8-6410 as BARCELONA
2017-04-03 09:47:09 +02:00
Johannes Buchner b4071d0d16 Autodetect AMD A8-6410 as BARCELONA 2017-04-03 17:07:27 +10:00
Martin Kroeker 7908efafc8 Fix integer overflow in LAPACK DBDSQR, SBDSQR (#1135)
* Fix integer overflow in DBDSQR

As noted in lapack issue 135, an integer overflow in the calculation of the iteration limit could lead to an immediate return without any iterations having been performed if the input matrix is sufficiently big.

* Fix integer overflow in SBDSQR

As noted in lapack issue 135, an integer overflow in the calculation of the iteration limit could lead to an immediate return without any iterations having been performed if the input matrix is sufficiently big.

* Fix integer overflow in threshold calculation

Related to lapack issue 135, the threshold calculation can overflow as well, because the multiplication is evaluated from left to right.
Without explicit parentheses, the calculation would overflow for N >= 18919.

2017-03-24 22:05:22 +01:00
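
The overflow bound quoted in the commit above (N >= 18919) can be checked with a short sketch. It is not the LAPACK code itself. It assumes 32-bit default integers and an iteration multiplier `MAXITR = 6`, a value chosen here because it reproduces the stated bound. The helper names (`to_int32`, `smallest_overflowing_n`, `thresh_left_to_right`, `thresh_parenthesized`) are hypothetical, and the two's-complement wraparound only models what compilers commonly do, since integer overflow is formally undefined.

```python
import sys

INT32_MAX = 2**31 - 1
MAXITR = 6  # assumption: multiplier consistent with the N >= 18919 bound in the commit message


def to_int32(value):
    """Wrap an integer to signed 32-bit two's complement (models typical overflow behaviour)."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value >= 2**31 else value


def smallest_overflowing_n(maxitr=MAXITR):
    """Return the smallest N for which maxitr*N*N no longer fits in a signed 32-bit integer."""
    n = 1
    while maxitr * n * n <= INT32_MAX:
        n += 1
    return n


def thresh_left_to_right(n, unfl, maxitr=MAXITR):
    """Evaluate maxitr*n*n*unfl left to right: the integer product is formed (and wraps) first."""
    integer_part = to_int32(to_int32(maxitr * n) * n)
    return integer_part * unfl


def thresh_parenthesized(n, unfl, maxitr=MAXITR):
    """Evaluate maxitr*(n*(n*unfl)): every product involves a float, so no integer overflow."""
    return maxitr * (n * (n * unfl))


if __name__ == "__main__":
    unfl = sys.float_info.min  # stand-in for the underflow threshold
    n = smallest_overflowing_n()
    print("smallest overflowing N:", n)
    print("left to right :", thresh_left_to_right(n, unfl))
    print("parenthesized :", thresh_parenthesized(n, unfl))
```

Under these assumptions the left-to-right form yields a negative value once the integer product wraps, while the parenthesized form stays positive, which matches the reasoning given in the commit message.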
Martin Kroeker 66dc10b019 Merge pull request #1133 from steckdenis/develop
Add ZEN support
2017-03-24 13:47:32 +01:00
Zhang Xianyi b5c96fcfcd Support ARM SOFTFP ABI for saxpy, sdot, snrm2, sscal, sgemv, sger. 2017-03-20 17:39:25 +08:00
Denis Steckelmacher c9ff735da6 Add ZEN support (tested for auto-detected static backend) 2017-03-19 15:32:50 +01:00
Andrew 99880f7906 Address unlikely memleak in zimatcopy interface (#1129)
* fix unlikely memleak in zimatcopy interface

* fix only unlikely memleak in zimatcopy interface
2017-03-16 13:13:31 +01:00
Martin Kroeker cd135e2b59 Merge pull request #1130 from quickwritereader/develop
Blas 3 for single precision
2017-03-15 10:00:52 +01:00
Martin Kroeker ad124a5e8b Merge pull request #1126 from martin-frbg/pgi
Fix compilation with PGI by replacing verbatim _real_, _imag_ extensions and updating macro definitions for modern, C99-capable versions of the PGI compiler
2017-03-14 17:17:39 +01:00
Martin Kroeker 211d2eceb5 Update zdot.c 2017-03-13 18:08:00 +01:00
Martin Kroeker 5813ed095b Update zdot.c 2017-03-13 17:49:07 +01:00
Martin Kroeker e44b028fe5 Replace gnu _real_, _imag_ extensions in initializers 2017-03-13 00:40:11 +01:00
Martin Kroeker a6efabf155 Replace gnu _real_ , _imag_ extensions in initializers 2017-03-13 00:38:37 +01:00
Martin Kroeker ea26b00c06 Fix CREAL,CIMAG macros for PGI 2017-03-13 00:36:01 +01:00
Abdurrauf 08786c4b95 strmm and ctrmm 2017-03-13 01:23:16 +04:00
Martin Kroeker 12e476f7a2 Merge pull request #1124 from martin-frbg/c_check-ppc
Update c_check.cmake to label ppc64 as power ARCH
2017-03-10 12:58:38 +01:00
Martin Kroeker 8de40955ad Update c_check.cmake 2017-03-10 11:45:48 +01:00
Martin Kroeker 9b24688eed Merge pull request #1122 from martin-frbg/zlasyf
Fix misspelling of zlasyf_aa from previous commit
2017-03-10 09:51:34 +01:00
Martin Kroeker 43224f7273 Fix misspelling of zlasyf_aa from previous commit 2017-03-10 08:44:49 +01:00
Martin Kroeker 9254a701f3 Merge pull request #1121 from staticfloat/sf/Xsymv_export
Add `csymv` and `zsymv` into `@lapackobjs2` for exporting
2017-03-10 08:33:36 +01:00
Elliot Saba 26a614fdd1 Whitespace cleanup/reformatting 2017-03-09 15:30:43 -08:00
Elliot Saba 7ae64f4f9c Add `csymv` and `zsymv` into `@lapackobjs2` for exporting 2017-03-09 15:22:40 -08:00
Zhang Xianyi 90e02ccf68 Support ARM softfp ABI for sgemm on ARMV7.
make ARM_SOFTFP_ABI=1
2017-03-06 22:16:13 +08:00
Zhang Xianyi 503dcbfde6 Merge branch 'develop' into arm_soft_fp_abi 2017-03-06 13:53:56 +08:00
Abdurrauf 82e80fa82b initial strmm(sgemm). not tuned yet 2017-03-06 04:27:40 +04:00
Martin Kroeker 4227049c7d Merge pull request #1111 from martin-frbg/kaby-no-avx
Fix core detection for Kaby Lake without AVX (G4560)
2017-03-02 18:43:59 +01:00
Martin Kroeker 688267edf3 Fix core detection for Kaby Lake without AVX (G4560)
(Should fix #1109)
2017-03-02 17:36:16 +01:00
Martin Kroeker d1fe040d9b Merge pull request #1110 from quickwritereader/develop
Conventional usage of the register save area.
2017-03-01 23:08:07 +01:00
Abdurrauf 411982715c conventional usage of the register save area 2017-03-01 20:39:39 +04:00
Abdurrauf e831d6924e changed to conventional register save area 2017-03-01 03:13:21 +04:00
Martin Kroeker ffc1d6c468 Merge pull request #1108 from ashwinyes/develop_20170203_thunderx2t99
Optimized Implementations for ThunderX2T99
2017-02-28 16:02:19 +01:00
Ashwin Sekhar T K a86474c6f7 THUNDERX2T99: Performance fix for ZGEMM 2017-02-28 06:05:00 -08:00
Ashwin Sekhar T K 67473d09dd THUNDERX2T99: Bug Fixes in D/Z NRM2 and ZGEMM 2017-02-28 01:11:38 -08:00