Commit Graph

3078 Commits

Author SHA1 Message Date
Paul Osmialowski 42bbe74791 build: LLVM: Add Flang compiler support and enable OpenMP for Clang
Signed-off-by: Paul Osmialowski <pawel.osmialowski@arm.com>
2017-05-25 17:03:20 +01:00
Zhang Xianyi c8322c65e4 Merge pull request #1187 from mine260309/develop
build: fix libxlmass errors building on Power CPU
2017-05-24 15:54:58 +08:00
Lei YU 87dde1fde6
build: fix libxlmass errors building on Power CPU
The IBM MASS library has been upgraded to 8.1.5, and 8.1.3 is no longer available.
Update README.md and Makefile.power to use version 8.1.5 of libxlmass.
2017-05-24 14:51:52 +08:00
Martin Kroeker 42466e54fa Merge pull request #1182 from martin-frbg/martin-frbg-patch-1
Build shared library on Android without SONAME versioning
2017-05-10 19:39:09 +02:00
Martin Kroeker 3b0624d50f Build shared library on Android without SONAME versioning
Android does not support versioned SONAME entries, ref. #1173
2017-05-10 13:08:13 +02:00
Martin Kroeker fd4e68128e Merge pull request #1178 from jcowgill/mips-fixes
MIPS threading fixes
2017-05-06 17:20:10 +02:00
Martin Kroeker 6464d1723a Merge pull request #1179 from jcowgill/memory-fixes
Fixes to driver/others/memory.c
2017-05-06 13:08:46 +02:00
James Cowgill 59c97cfee4 memory: Fix buffer overflow when position == NUM_BUFFERS 2017-05-05 17:47:03 +01:00
James Cowgill de7875ca5d mips: remove incorrect blas_lock implementations
MIPS 32-bit currently has an empty blas_lock implementation, which is
worse than nothing at all. MIPS 64-bit does have a blas_lock
implementation, but it is broken. Remove them and fall back to the generic
version in common.h, which should do the right thing on MIPS.
2017-05-05 17:28:03 +01:00
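An empty blas_lock gives no mutual exclusion at all, so callers silently race. For readers unfamiliar with what a working lock looks like, here is a minimal test-and-set spinlock built on GCC `__sync` builtins. It is a sketch only: the function names are hypothetical, and it is not the generic code from common.h.

```c
#include <assert.h>

/* Hypothetical names; an illustration, not the OpenBLAS common.h code. */
static void spin_lock_sketch(volatile unsigned long *lock) {
  /* __sync_lock_test_and_set stores 1 and returns the previous value
     with acquire semantics. A previous value of 0 means we own the lock. */
  while (__sync_lock_test_and_set(lock, 1UL)) {
    /* Spin on plain reads until the holder releases, then retry the swap. */
    while (*lock)
      ;
  }
}

static void spin_unlock_sketch(volatile unsigned long *lock) {
  /* Writes 0 with release semantics, publishing the critical section. */
  __sync_lock_release(lock);
}
```

Spinning on a plain read before retrying the atomic swap keeps the cache line shared while the lock is held, instead of bouncing it between cores on every iteration.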
James Cowgill 67836c2ab4 mips: implement MB and WMB
The MIPS architecture has weak memory ordering and therefore requires
suitable memory barriers when doing lock-free programming with multiple
threads (just like ARM does). This commit implements those barriers for
MIPS and MIPS64 using GCC builtins, which is probably the easiest way.
2017-05-05 17:14:03 +01:00
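As an illustration of why MB and WMB matter on a weakly ordered CPU, the sketch below defines both as GCC's full barrier `__sync_synchronize()` and uses them in a publish/consume handoff. Mapping both macros to the same builtin is an assumption for this example, and `publish`/`consume` are hypothetical helpers written in the pre-C11 style that OpenBLAS-era code uses.

```c
#include <assert.h>

/* Assumption: both barriers map to GCC's full memory barrier. */
#define MB  __sync_synchronize()
#define WMB __sync_synchronize()

static int payload;
static volatile int ready;

static void publish(int value) {
  payload = value;
  WMB;        /* payload store must become visible before the flag store */
  ready = 1;
}

static int consume(void) {
  while (!ready)
    ;         /* wait until the producer sets the flag */
  MB;         /* flag read must complete before the payload read */
  return payload;
}
```

Without the barriers, a MIPS core may make `ready = 1` visible to another thread before `payload`, and that thread would then read a stale value.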
James Cowgill 5fecfe0f42 memory: switch loop condition around in blas_memory_free
Before this commit, GCC completely optimized away the "position < NUM_BUFFERS"
loop condition in blas_memory_free. This is
because the condition can only be false after undefined behavior has
already been invoked (reading past the end of an array). As a
consequence of this bug, GCC also removed the subsequent if statement
and all the code after the error label, because all of it was dead.

This commit switches the loop condition around so it works as intended.
2017-05-05 16:01:58 +01:00
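The fix relies on `&&` short-circuiting: if the bounds test comes first, `memory[position]` is never read once `position` reaches `NUM_BUFFERS`, so the compiler can no longer assume the bound always holds. A minimal sketch of the safe ordering follows; the array, struct, buffer count, and `find_slot` are hypothetical stand-ins, not the actual memory.c code.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_BUFFERS 4                 /* hypothetical size for illustration */

struct mem_slot { void *addr; };      /* hypothetical stand-in for memory[] */
static struct mem_slot memory[NUM_BUFFERS];

/* Returns the index of the slot holding free_area, or -1 if none does. */
static int find_slot(void *free_area) {
  int position = 0;
  /* Bounds check first: memory[position] is only read while position is a
     valid index. With the operands reversed, a missing area causes a read of
     memory[NUM_BUFFERS], which is undefined behavior the optimizer may
     exploit by deleting the bound test and the error path. */
  while (position < NUM_BUFFERS && memory[position].addr != free_area)
    position++;
  if (position == NUM_BUFFERS)
    return -1;                        /* not found: take the error path */
  return position;
}
```

The explicit `position == NUM_BUFFERS` check after the loop also covers the related overflow fixed in 59c97cfee4: the "not found" index must never be used to index the array.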
Martin Kroeker bba6676803 Merge pull request #1175 from martin-frbg/lapack_143
Fix workspace computation in LAPACKE ?tpmqrt
2017-05-05 12:00:04 +02:00
Martin Kroeker 5649b2c53a Merge pull request #1176 from staticfloat/sf/dynamic_arch
Fix DYNAMIC_ARCH=1 breaking builds on non-x86 platforms
2017-05-05 11:59:41 +02:00
Elliot Saba 6e972994b2 Force `DYNAMIC_ARCH` to empty when `DYNAMIC_CORE` is not set 2017-05-04 12:55:31 -07:00
Elliot Saba 5b04cf7ab4 Add Makefile debugging trick so that we can inspect runtime Makefile variables 2017-05-04 11:51:29 -07:00
Martin Kroeker d5ea8fd823 Fix workspace computation for side=L
From netlib PR#144
2017-05-04 20:01:41 +02:00
Martin Kroeker 4beffaaa4b Fix workspace computation for side=L
From netlib PR#144
2017-05-04 19:59:02 +02:00
Martin Kroeker fb28e4adc9 Fix workspace computation for side=L
From netlib PR#144
2017-05-04 19:55:02 +02:00
Martin Kroeker 26faa3ca47 Fix workspace allocation in lapacke_ctp for side=L
from netlib PR #144
2017-05-04 19:49:51 +02:00
Martin Kroeker 4f75989634 Merge pull request #1169 from martin-frbg/cblas_xerbla
Add trivial implementation of cblas_xerbla
2017-05-04 19:32:50 +02:00
Martin Kroeker 1e06b49854 Update xerbla.c 2017-04-26 20:29:30 +02:00
Martin Kroeker 7f546f54fa Add cblas_xerbla 2017-04-26 20:01:34 +02:00
Martin Kroeker a809431e34 Add cblas_xerbla() 2017-04-26 19:58:59 +02:00
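A "trivial" cblas_xerbla only has to report which argument of which routine was invalid. The sketch below assumes the variadic prototype `cblas_xerbla(int p, const char *rout, const char *form, ...)` (OpenBLAS may declare the parameter types differently); the message wording and the `last_bad_param` variable are assumptions added for illustration and testing.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical: records the last reported parameter number. */
static int last_bad_param = 0;

/* Sketch of a minimal error handler; not the OpenBLAS implementation. */
void cblas_xerbla(int p, const char *rout, const char *form, ...) {
  va_list args;
  last_bad_param = p;
  if (p != 0)
    fprintf(stderr, "Parameter %d to routine %s was incorrect\n", p, rout);
  va_start(args, form);
  vfprintf(stderr, form, args);   /* optional caller-supplied detail */
  va_end(args);
}
```

The variadic `form` argument lets a caller append extra context with printf-style formatting. Passing an empty string prints nothing beyond the standard message.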
Martin Kroeker 5ee1cf0223 Merge pull request #1165 from rcoscali/patch-1
README.md update
2017-04-21 15:14:16 +02:00
Rémi Cohen-Scali 9aea7a0d9a Update README.md 2017-04-21 14:18:57 +02:00
Martin Kroeker da0987507c Merge pull request #1164 from sharkcz/s390x
detect CPU on zArch
2017-04-21 10:53:49 +02:00
Dan Horák 81fed55782 detect CPU on zArch 2017-04-20 21:13:41 +02:00
Martin Kroeker 35387edb8d Merge pull request #1160 from gcp/extra-streamroller-cpuid
Add an extra family/model combination used by AMD Steamroller.
2017-04-19 20:03:23 +02:00
Gian-Carlo Pascutto 9c884986ad Add an extra family/model combination used by AMD Steamroller (Godavari). 2017-04-19 19:15:47 +02:00
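CPU detection matches on the family and model decoded from the CPUID leaf 1 EAX signature. The sketch below applies AMD's documented rule (extended fields are used only when the base family is 0Fh). The function name is hypothetical and the test signatures are synthetic, not the actual Godavari values.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical decoder for a CPUID leaf 1 EAX value (AMD rule). */
static void decode_signature(uint32_t eax, unsigned *family, unsigned *model) {
  unsigned base_family = (eax >> 8) & 0xF;
  unsigned base_model  = (eax >> 4) & 0xF;
  unsigned ext_family  = (eax >> 20) & 0xFF;
  unsigned ext_model   = (eax >> 16) & 0xF;

  *family = base_family;
  *model  = base_model;
  if (base_family == 0xF) {
    *family += ext_family;            /* e.g. 0xF + 0x6 = 0x15 */
    *model  |= ext_model << 4;        /* extended model is the high nibble */
  }
}
```

Because each product line can ship under several models within one family, detection code typically needs one entry per family/model pair it recognizes, which is why a new combination had to be added.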
Martin Kroeker f2f0e98bb5 Merge pull request #1158 from martin-frbg/force-zen
Make FORCE_ZEN option in getarch.c actually set target names to ZEN
2017-04-19 15:04:41 +02:00
Martin Kroeker 166d64eb7c Fix FORCE_ZEN option in getarch.c 2017-04-19 14:20:42 +02:00
Martin Kroeker e078339e8d Merge pull request #1157 from gcp/revert-zen-param
Revert Zen param.h to Haswell values (instead of Excavator).
2017-04-18 13:32:16 +02:00
Gian-Carlo Pascutto 832a272784 Revert Zen param.h to Haswell values (instead of Excavator). 2017-04-18 12:40:25 +02:00
Martin Kroeker 356606314c Merge pull request #1156 from SoapGentoo/cmake-fixes
Use GNUInstallDirs to allow changing target directories
2017-04-18 09:00:24 +02:00
David Seifert ed79a29d87 Use GNUInstallDirs to allow changing target directories
* Multi-lib distributions need to change the libdir
  which is only portably possible with `GNUInstallDirs`.
* Multi-arch distributions such as Debian and Exherbo
  need to be able to change the bindir.
2017-04-16 00:43:47 +02:00
Martin Kroeker 77d16ffc69 Merge pull request #1154 from sharkcz/s390x
add lapack laswp directory for zarch
2017-04-13 16:37:29 +02:00
Dan Horák 56762d5e4c add lapack laswp for zarch 2017-04-13 15:38:59 +02:00
Zhang Xianyi 90dd190a6d Build shared library for Android. 2017-04-11 12:01:18 +08:00
Martin Kroeker ab9ec4ab4e Merge pull request #1148 from gcp/fix-dynamic-zen
Fix dynamic detection for ZEN CPUs.
2017-04-10 20:17:14 +02:00
Gian-Carlo Pascutto 0cbd2d34e4 Recognize ZEN when passed as OPENBLAS_CORETYPE. 2017-04-10 20:05:16 +02:00
Gian-Carlo Pascutto 62979fd104 Fix dynamic detection for ZEN CPUs. 2017-04-10 19:08:37 +02:00
Martin Kroeker 20a413e154 Merge pull request #1142 from amodra/develop
Power8 inline assembly tweaks
2017-04-06 16:20:01 +02:00
Alan Modra dc40bc7368 Power8 inline assembly tweaks
Further fixes on top of 9e2f316ed.  Writing some doco for gcc on
inline assembly woke me up to some more errors.

- dgemv_kernel_4x4 asm did not mention *ap as a memory input, and
  *y is both read and write.
- sasum_kernel_32 and casum_kernel_16 did not use %x for a vsx insn
  operand, a problem if the "=f" sum output was ever allocated a vsx
  reg in the altivec set.  This might be possible with inlining and
  future gcc optimisation.
2017-04-04 23:13:54 +09:30
Martin Kroeker 1acfc78c8f Merge pull request #1140 from JohannesBuchner/develop
Autodetect AMD A8-6410 as BARCELONA
2017-04-03 09:47:09 +02:00
Johannes Buchner b4071d0d16 Autodetect AMD A8-6410 as BARCELONA 2017-04-03 17:07:27 +10:00
Martin Kroeker 7908efafc8 Fix integer overflow in LAPACK DBDSQR, SBDSQR (#1135)
* Fix integer overflow in DBDSQR

As noted in lapack issue 135, an integer overflow in the calculation of the iteration limit could lead to an immediate return without any iterations having been performed if the input matrix is sufficiently big.

* Fix integer overflow in SBDSQR

As noted in lapack issue 135, an integer overflow in the calculation of the iteration limit could lead to an immediate return without any iterations having been performed if the input matrix is sufficiently big.

* Fix integer overflow in threshold calculation

Related to lapack issue 135, the threshold calculation can overflow as well, because the multiplication is evaluated from left to right in integer arithmetic.
Without explicit parentheses, the calculation overflows for N >= 18919.

* Fix integer overflow in threshold calculation

Related to lapack issue 135, the threshold calculation can overflow as well, because the multiplication is evaluated from left to right in integer arithmetic.
Without explicit parentheses, the calculation overflows for N >= 18919.
2017-03-24 22:05:22 +01:00
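The N >= 18919 bound follows from ?BDSQR's MAXITR = 6: left to right, MAXITR*N*N is formed as a 32-bit integer before UNFL is applied, and 6*18918^2 = 2,147,344,344 still fits below 2^31 - 1 = 2,147,483,647, while 6*18919^2 = 2,147,571,366 does not. The sketch below checks that bound in 64-bit arithmetic and shows the parenthesized form MAXITR*(N*(N*UNFL)), which keeps every intermediate in floating point. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MAXITR 6   /* iteration factor from ?BDSQR */

/* Exact MAXITR*N*N in 64-bit, to test where 32-bit evaluation would overflow. */
static int64_t maxitr_n_n(int64_t n) { return MAXITR * n * n; }

static int overflows_int32(int64_t n) { return maxitr_n_n(n) > INT32_MAX; }

/* Safe form: n * unfl is already a double, so no integer product can overflow. */
static double thresh_term(int n, double unfl) {
  return MAXITR * (n * (n * unfl));
}
```

Signed overflow is undefined behavior in C (and gfortran typically just wraps), which is why the sketch measures the product in `int64_t` instead of letting a 32-bit `int` overflow.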
Martin Kroeker 66dc10b019 Merge pull request #1133 from steckdenis/develop
Add ZEN support
2017-03-24 13:47:32 +01:00
Zhang Xianyi b5c96fcfcd Support ARM SOFTFP ABI for saxpy, sdot, snrm2, sscal, sgemv, sger. 2017-03-20 17:39:25 +08:00
Denis Steckelmacher c9ff735da6 Add ZEN support (tested for auto-detected static backend) 2017-03-19 15:32:50 +01:00
Andrew 99880f7906 Address unlikely memleak in zimatcopy interface (#1129)
* fix unlikely memleak in zimatcopy interface

* fix only unlikely memleak in zimatcopy interface

* fix only unlikely memleak in zimatcopy interface
2017-03-16 13:13:31 +01:00