3414 Commits

Author SHA1 Message Date
Martin Kroeker e5e47cfdb5 Merge pull request #1220 from ashwinyes/develop_aarch64_20170701_t99_options
arm64: Change mtune/mcpu options for THUNDERX2T99 target
2017-07-01 20:43:23 +02:00
Ashwin Sekhar T K ebf9e9dabe arm64: Change mtune/mcpu options for THUNDERX2T99 target 2017-07-01 11:17:10 -07:00
Ashwin Sekhar T K 83bd547517 arm: add softfp support in kernel/arm/swap_vfp.S 2017-07-01 20:37:40 +05:30
Ashwin Sekhar T K e25f4c01d6 arm: add softfp support in kernel/arm/nrm2_vfp*.S 2017-07-01 19:57:28 +05:30
Ashwin Sekhar T K 54915ce343 arm: add softfp support in kernel/arm/*dot_vfp.S 2017-06-30 23:46:02 +05:30
Ashwin Sekhar T K 0150fabdb6 arm: add softfp support in kernel/arm/rot_vfp.S 2017-06-30 21:52:32 +05:30
Ashwin Sekhar T K 4f0773f07d arm: add softfp support in kernel/arm/axpy_vfp.S 2017-06-30 20:25:59 +05:30
Ashwin Sekhar T K aa5edebc80 arm: add softfp support in kernel/arm/asum_vfp.S 2017-06-30 18:21:05 +05:30
Ashwin Sekhar T K 89924b3d5b arm: Use assembly implementations based on the ARM abi
In case of the softfp abi, assembly implementations are used only for
those APIs that don't have a floating point argument or return value.

In case of hard abi, all assembly implementations are used.
2017-06-30 18:21:05 +05:30
Ashwin Sekhar T K da7f0ff425 generic: add some generic gemm and trmm kernels
Added generic 4x4 and 4x2 gemm kernels
Added generic 4x2 trmm kernel
2017-06-30 18:21:05 +05:30
Ashwin Sekhar T K 0d5c8e5386 arm: Determine the abi from compiler if not specified on command line
If the ARM abi is not explicitly specified on the command line, then set the
arm abi to softfp or hard according to the compiler environment.
This assumes that the compiler sets the defines __ARM_PCS and __ARM_PCS_VFP
accordingly.
2017-06-30 18:20:59 +05:30
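
The detection described in the commit above can be sketched roughly as follows. This is an illustration only, not the code from the commit: the enum and the function name `detect_arm_float_abi` are hypothetical, and the sketch assumes the compiler defines `__ARM_PCS_VFP` for the hard-float calling convention and `__ARM_PCS` for the base one, as the commit message itself assumes.

```c
/* Hypothetical names, for illustration only. */
enum arm_float_abi { ARM_ABI_UNKNOWN, ARM_ABI_SOFTFP, ARM_ABI_HARD };

/* Pick the ABI from the compiler's predefined macros. */
static enum arm_float_abi detect_arm_float_abi(void)
{
#if defined(__ARM_PCS_VFP)
    /* Floating point arguments are passed in VFP registers. */
    return ARM_ABI_HARD;
#elif defined(__ARM_PCS)
    /* Base calling convention: floating point arguments are passed
       in integer registers, treated here as softfp. */
    return ARM_ABI_SOFTFP;
#else
    /* Not a 32-bit ARM target, or the compiler does not set the macros. */
    return ARM_ABI_UNKNOWN;
#endif
}
```

On a non-ARM host the function returns `ARM_ABI_UNKNOWN`, so a build system using such a check would still need an explicit setting as a fallback.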
Martin Kroeker 912410f214 Add ReLAPACK to Makefiles 2017-06-28 18:15:21 +02:00
Martin Kroeker b122413fb0 Restore ReLAPACK test folder 2017-06-28 18:13:14 +02:00
Martin Kroeker 9b7b5f7fdc Add Elmar Peise's ReLAPACK 2017-06-28 17:38:41 +02:00
Neil Shipp 34513be726 Add Microsoft Windows 10 UWP build support 2017-06-23 13:07:34 -07:00
Zhang Xianyi 482015f8d6 Merge branch 'arm_soft_fp_abi' into develop 2017-06-23 11:35:25 +08:00
Zhang Xianyi 639000e34f Merge pull request #1211 from neilsh-msft/develop
Add 64bit support for Microsoft Visual Studio
2017-06-23 11:33:09 +08:00
Neil Shipp 5de7727cc7 Reorder dependencies to allow in-place build to succeed the first time. 2017-06-22 18:05:19 -07:00
Neil Shipp 96df4b9b17 Avoid truncating cblas.h when compiling gencblas target 2017-06-22 17:08:09 -07:00
Neil Shipp 29dc8e0c61 Revert changes to sed and awk 2017-06-21 17:49:57 -07:00
Neil Shipp 65e56cb29d Add 64bit support for Microsoft Visual Studio 2017-06-21 13:38:22 -07:00
Matt Brown bd831a03a8 Optimise sscal for POWER9
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better on the new POWER architecture than lxvw4x.
2017-06-14 17:02:46 +10:00
Matt Brown edc97918f8 Optimise srot for POWER9
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better on the new POWER architecture than lxvw4x.
2017-06-14 17:02:35 +10:00
Matt Brown e0034de22d Optimise sdot for POWER9
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better on the new POWER architecture than lxvw4x.
2017-06-14 17:02:19 +10:00
Matt Brown 32c7fe6bff Optimise sasum for POWER9
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better on the new POWER architecture than lxvw4x.
2017-06-14 17:02:10 +10:00
Matt Brown 19bdf9d52b Optimise casum for POWER9
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better on the new POWER architecture than lxvw4x.
2017-06-14 17:00:07 +10:00
Matt Brown 4f09030fdc Optimise cswap for POWER9
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better on the new POWER architecture than lxvw4x.
2017-06-14 16:59:53 +10:00
Matt Brown 6f4eca5ea4 Optimise sswap for POWER9
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better on the new POWER architecture than lxvw4x.
2017-06-14 16:59:13 +10:00
Matt Brown be55f96cbd Optimise scopy for POWER9
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better on the new POWER architecture than lxvw4x.
2017-06-14 16:59:13 +10:00
Matt Brown 96dd0ef4f7 Optimise ccopy for POWER9
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better on the new POWER architecture than lxvw4x.
2017-06-14 16:58:59 +10:00
Martin Kroeker 8f0d6c06a9 Fix installation of header files with cmake (#1186)
* Fix installation of header files with cmake 

Install only the required header files, with openblas_config.h preprocessed like in Makefile.install
Fixes #1184

* Update CMakeLists.txt

Escape remaining semicolons in awk argument list (to get it working on Windows as well)

* Update CMakeLists.txt

* Update CMakeLists.txt

* Update CMakeLists.txt

* Update CMakeLists.txt

* Update CMakeLists.txt

* Add files via upload

* Update CMakeLists.txt

* Update CMakeLists.txt

* Update CMakeLists.txt

see if it is the single quotes that cause the problem on windows

* Update CMakeLists.txt

* Update CMakeLists.txt

* Update CMakeLists.txt

* Update CMakeLists.txt

* Update CMakeLists.txt

* Use C utility instead of awk for header generation in cmake builds

* Update CMakeLists.txt

* Fix generation and installation of header files

Generate openblas_config.h and f77blas.h with same contents as in plain Makefile builds and install only the public header files
2017-06-01 16:36:26 +02:00
Martin Kroeker 410a07cbec Merge pull request #1190 from oviradoi/utest_make_complex
Update test to use openblas_make_complex_float and openblas_make_comp…
2017-06-01 16:35:52 +02:00
Ovidiu Radoi 72f95a0acc Update test to use openblas_make_complex_float and openblas_make_complex_double functions 2017-05-30 12:12:49 +03:00
Martin Kroeker e545b81e76 Merge pull request #1189 from pawosm-arm/flang
build: Flang has the same interface as PGI
2017-05-28 11:07:57 +02:00
Paul Osmialowski d7afdf9137 build: Flang has the same interface as PGI
Signed-off-by: Paul Osmialowski <pawel.osmialowski@arm.com>
2017-05-27 06:26:48 +01:00
Martin Kroeker 4f4daaa42a Merge pull request #1188 from pawosm-arm/flang
build: Flang compiler support
2017-05-26 23:02:47 +02:00
Paul Osmialowski 42bbe74791 build: LLVM: Add Flang compiler support and enable OpenMP for Clang
Signed-off-by: Paul Osmialowski <pawel.osmialowski@arm.com>
2017-05-25 17:03:20 +01:00
Zhang Xianyi c8322c65e4 Merge pull request #1187 from mine260309/develop
build: fix libxlmass errors building on Power CPU
2017-05-24 15:54:58 +08:00
Lei YU 87dde1fde6 build: fix libxlmass errors building on Power CPU
IBM MASS library is upgraded to 8.1.5 and 8.1.3 is not available.
Update README.md and Makefile.power to use version 8.1.5 of libxlmass.
2017-05-24 14:51:52 +08:00
Martin Kroeker 42466e54fa Merge pull request #1182 from martin-frbg/martin-frbg-patch-1
Build shared library on Android without SONAME versioning
2017-05-10 19:39:09 +02:00
Martin Kroeker 3b0624d50f Build shared library on Android without SONAME versioning
Android does not support versioned SONAME entries, ref. #1173
2017-05-10 13:08:13 +02:00
Martin Kroeker fd4e68128e Merge pull request #1178 from jcowgill/mips-fixes
MIPS threading fixes
2017-05-06 17:20:10 +02:00
Martin Kroeker 6464d1723a Merge pull request #1179 from jcowgill/memory-fixes
Fixes to driver/others/memory.c
2017-05-06 13:08:46 +02:00
James Cowgill 59c97cfee4 memory: Fix buffer overflow when position == NUM_BUFFERS 2017-05-05 17:47:03 +01:00
James Cowgill de7875ca5d mips: remove incorrect blas_lock implementations
MIPS 32-bit currently has an empty blas_lock implementation which is
worse than nothing at all. MIPS 64-bit does have a blas_lock
implementation, but it is broken. Remove them and fall back to the generic
version in common.h which should do the right thing on MIPS.
2017-05-05 17:28:03 +01:00
James Cowgill 67836c2ab4 mips: implement MB and WMB
The MIPS architecture has weak memory ordering and therefore requires
suitable memory barriers when doing lock free programming with multiple
threads (just like ARM does). This commit implements those barriers for
MIPS and MIPS64 using GCC builtins, which is probably the easiest way.
2017-05-05 17:14:03 +01:00
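
A minimal sketch of barrier macros built on a GCC builtin is shown below. It assumes `__sync_synchronize()` (a full barrier provided by GCC and Clang) as the builtin; the commit's actual definitions are not shown in this log, and `publish`, `consume`, `payload` and `ready` are hypothetical names used only to show where the barriers go.

```c
/* Assumed definitions: both macros expand to a full memory barrier. */
#define MB  __sync_synchronize()
#define WMB __sync_synchronize()

static int payload;          /* data handed from writer to reader */
static volatile int ready;   /* flag telling the reader the data is valid */

/* Writer: store the data, then raise the flag. */
static void publish(int value)
{
    payload = value;
    WMB;                     /* order the payload store before the flag store */
    ready = 1;
}

/* Reader: wait for the flag, then read the data. */
static int consume(void)
{
    while (!ready) { }       /* spin until the writer raises the flag */
    MB;                      /* order the flag load before the payload load */
    return payload;
}
```

Without the barriers, a weakly ordered CPU may let another thread observe `ready == 1` before the store to `payload` becomes visible.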
James Cowgill 5fecfe0f42 memory: switch loop condition around in blas_memory_free
Before this commit, the "position < NUM_BUFFERS" loop condition from
blas_memory_free will be completely optimized away by GCC. This is
because the condition can only be false after undefined behavior has
already been invoked (reading past the end of an array). As a
consequence of this bug, GCC also removes the subsequent if statement
and all the code after the error label because all of it is dead.

This commit switches the loop condition around so it works as intended.
2017-05-05 16:01:58 +01:00
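
The ordering issue described above can be illustrated with a small search loop. This is a sketch and not the code from driver/others/memory.c: `NUM_SLOTS`, `find_slot` and the table layout are hypothetical stand-ins for `NUM_BUFFERS` and the loop in blas_memory_free.

```c
#include <stddef.h>

#define NUM_SLOTS 4   /* hypothetical stand-in for NUM_BUFFERS */

/* Return the index of addr in table, or -1 if it is not present. */
static int find_slot(void *const table[NUM_SLOTS], const void *addr)
{
    int position = 0;

    /* The bounds check comes first, so table[position] is only read
       while position is a valid index. With the operands swapped
       (table[position] != addr && position < NUM_SLOTS), the array is
       read at index NUM_SLOTS before the check runs. That read is
       undefined behaviour, so the compiler may assume the check is
       always true and delete it along with the error path below. */
    while (position < NUM_SLOTS && table[position] != addr)
        position++;

    return position < NUM_SLOTS ? position : -1;
}
```

Because `&&` short-circuits, the array access is skipped once `position` reaches `NUM_SLOTS`, and the not-found case stays reachable.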
Martin Kroeker bba6676803 Merge pull request #1175 from martin-frbg/lapack_143
Fix workspace computation in LAPACKE ?tpmqrt
2017-05-05 12:00:04 +02:00
Martin Kroeker 5649b2c53a Merge pull request #1176 from staticfloat/sf/dynamic_arch
Fix DYNAMIC_ARCH=1 breaking builds on non-x86 platforms
2017-05-05 11:59:41 +02:00
Elliot Saba 6e972994b2 Force `DYNAMIC_ARCH` to empty when `DYNAMIC_CORE` is not set 2017-05-04 12:55:31 -07:00