Commit Graph

5177 Commits

Author SHA1 Message Date
Marius Hillenbrand 77ea73f5e5 s390x: for clang use fp-contract=on instead of fast
Make clang slightly more cautious when contracting floating-point
operations (e.g., when applying fused multiply add) by setting
-ffp-contract=on (instead of fast).

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-09-21 11:32:08 +02:00
Marius Hillenbrand f91057cbad s390x: move common vector definitions and utils into header
... to facilitate reuse beyond gemm_vec.c and avoid code duplication.

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-09-21 11:32:08 +02:00
Martin Kroeker 992d7ca63d
Merge pull request #2845 from martin-frbg/lapack443
Fix workspace query in LAPACK xGELQ (Reference-LAPACK 443)
2020-09-18 23:18:41 +02:00
Martin Kroeker 7e4d5c237c
Fix workspace query in xGELQ (Reference-LAPACK PR443) 2020-09-18 09:19:46 +02:00
Martin Kroeker 8d12027a79
Merge pull request #86 from xianyi/develop
rebase
2020-09-18 09:17:49 +02:00
Martin Kroeker b1e0bcceec
Merge pull request #2844 from RajalakshmiSR/daxpy_p10
Optimize daxpy/zaxpy for POWER10
2020-09-17 23:46:32 +02:00
Rajalakshmi Srinivasaraghavan be43d2cb96 Optimize daxpy/zaxpy for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores. Tested in simulator and no new failures.
2020-09-17 12:56:28 -05:00
Martin Kroeker 2855e6000c
Merge pull request #2841 from martin-frbg/cpp_gemvtest
Make thread safety tests available to CMAKE and support running only the GEMV version
2020-09-17 17:29:56 +02:00
Martin Kroeker 144a03446d
Merge pull request #2843 from mhillenibm/fixup_merge_dynamic_zarch
s390x/DYNAMIC_ARCH: fixup broken merge and reapply simplification
2020-09-17 17:28:43 +02:00
Marius Hillenbrand 75d440caa0 s390x/DYNAMIC_ARCH: fixup broken merge and reapply simplification
An unrelated commit and merge inadvertently reverted our recent two
changes for simplifying DYNAMIC_ARCH on s390x. Simply reapply the
changes.

Simplify detection of which kernels we can compile on s390x. Instead of
decoding the gcc version in a complicated manner, just check if CC
supports a given -march=archXY flag. Together with the next patch, we
thereby gain support for builds with LLVM/clang with DYNAMIC_ARCH=1.

To enable builds with DYNAMIC_ARCH with older compiler releases, the
Makefile and drivers/other/dynamic_arch.c need a common view of the
architecture support built into the library.

We follow the notation from x86 when used with DYNAMIC_LIST, where
defines DYN_<ARCH NAME> denote support for a given generation to be
built in. Since there are far fewer architecture generations in OpenBLAS
for s390x, that does not bloat command lines too much.

Closes: #2842
Fixes: ba644378dc ("Copy BUILD_ options available to the compiler flags"

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-09-17 17:09:03 +02:00
Martin Kroeker 6abca76c4e
Add option for running only the less demanding GEMV version of the thread safety tests 2020-09-17 13:49:24 +02:00
Martin Kroeker 84c00c3c6e
Support running just the GEMV version of the thread safety test 2020-09-17 13:46:41 +02:00
Martin Kroeker 8c5c991bd7
Add cpp_thread_test options 2020-09-17 13:45:40 +02:00
Martin Kroeker 2e3b15d68b
Add CMakeLists.txt 2020-09-17 13:43:55 +02:00
Martin Kroeker eaf7f825bd
Merge pull request #85 from xianyi/develop
rebase
2020-09-17 13:42:47 +02:00
Martin Kroeker 4c10a1673d
Merge pull request #2840 from martin-frbg/fixup2833
Fix for cmake BUILD_ settings PR 2833
2020-09-16 18:55:50 +02:00
Martin Kroeker c4aeeeb9f4
Activate all BUILD_ options if none was specified 2020-09-15 23:15:34 +02:00
Martin Kroeker 3843bd188c
Merge pull request #84 from xianyi/develop
rebase
2020-09-15 23:13:30 +02:00
Martin Kroeker ddec244a5a
Merge pull request #2838 from austinpagan/gordon_trmm
Adding performance patch for trmm, just like trsm (#2836)
2020-09-15 21:17:48 +02:00
fossum dfeca46098 Adding performance patch for trmm, just like #2836 2020-09-15 08:59:50 -05:00
Martin Kroeker f8950f40a2
Merge pull request #2836 from austinpagan/gordon_trsm
Fixing a performance bug in trsm_[LR].c.
2020-09-15 11:26:37 +02:00
fossum 274d6e015b Fixing a performance bug in trsm_[LR].c. 2020-09-14 13:10:48 -05:00
Martin Kroeker 91c84e1c01
Merge pull request #2796 from Guobing-Chen/BF16_dot_coversion_apis
Add bfloat16 based dot and conversion with single/double
2020-09-14 15:00:19 +02:00
Martin Kroeker 1ee1e7b495
Merge pull request #2833 from martin-frbg/issue2830
Make building the tests for individual data types conditional on the respective BUILD option
2020-09-14 07:24:23 +02:00
Martin Kroeker ba644378dc
Copy BUILD_ options available to the compiler flags 2020-09-14 00:03:33 +02:00
Martin Kroeker 9e11c2d62f
Add BUILD_SINGLE etc 2020-09-13 23:55:11 +02:00
Martin Kroeker 4d250d0cdf
Rearrange ifdefs 2020-09-13 23:29:01 +02:00
Martin Kroeker de139337b8
Remove spurious tests for complex ASUM and NRM2 2020-09-13 22:20:41 +02:00
Martin Kroeker ec2948f147
Make tests conditional on BUILD_DOUBLE 2020-09-13 22:17:46 +02:00
Martin Kroeker ce89398636
Make tests for individual variable types conditional on the respective BUILD_ option 2020-09-13 21:52:18 +02:00
Martin Kroeker 593ce9e237
Make building individual tests depend on BUILD_SINGLE etc defines 2020-09-13 21:50:12 +02:00
Martin Kroeker 74e358bcd5
Remove spurious complex16 tests 2020-09-13 21:49:01 +02:00
Martin Kroeker 26792d2096
Copy BUILD_* directives to the compiler options to allow ifdef in tests 2020-09-13 21:47:55 +02:00
Martin Kroeker 6b52c7e172
Merge pull request #2832 from martin-frbg/issue2831
Fix gfortran detection by vendor matching
2020-09-13 21:20:30 +02:00
Martin Kroeker 746ad3bd19
Fix vendor match for GCC gfortran 2020-09-13 18:40:59 +02:00
Martin Kroeker 55d4d470ec
Merge pull request #83 from xianyi/develop
rebase
2020-09-13 18:30:11 +02:00
Martin Kroeker a270894730
Merge pull request #2829 from mhillenibm/clang_s390x
Fix DYNAMIC_ARCH=1 with clang s390x
2020-09-08 23:36:41 +02:00
Marius Hillenbrand 047b8d7aff Add an s390 build with clang to the Travis configuration
Since clang builds have been fixed on s390x, including support for
DYNAMIC_ARCH, cover that build type in Travis.

Explicitly request Ubuntu 20.04 (codename focal) to get a recent
LLVM/clang version 10.x and thereby cover all s390x architecture
generations supported in OpenBLAS. Ubuntu 18.10's LLVM/clang 6.x cannot
build the inline assembly in some of the Z13 and Z14 kernels.

LLVM/clang currently does not support OpenMP on s390x, so disable that
in the build.

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-09-08 20:59:06 +02:00
Marius Hillenbrand f7731a358a Update CONTRIBUTERS.md - clang build fixes for IBM z
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-09-08 19:34:18 +02:00
Marius Hillenbrand a55fe06f25 s390x/DYNAMIC_ARCH: define a HW_CAP flag to support slightly older glibc versions
Enable building DYNAMIC_ARCH support with older versions of glibc that
do not know about the hwcap flag HWCAP_S390_VXE yet.

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-09-08 19:34:18 +02:00
Marius Hillenbrand 4f34bcfb5e s390x/DYNAMIC_ARCH: pass supported arch levels from Makefile to run-time code
... instead of duplicating the (old) mechanism from the Makefile that
aimed to derive supported architecture generations from the gcc
version.

To enable builds with DYNAMIC_ARCH with older compiler releases, the
Makefile and drivers/other/dynamic_arch.c need a common view of the
architecture support built into the library.

We follow the notation from x86 when used with DYNAMIC_LIST, where
defines DYN_<ARCH NAME> denote support for a given generation to be
built in. Since there are far fewer architecture generations in OpenBLAS
for s390x, that does not bloat command lines too much.

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-09-08 19:34:18 +02:00
Marius Hillenbrand 0629d8ebdb s390x/DYNAMIC_ARCH: generalize detecting supported archs for clang
Simplify detection of which kernels we can compile on s390x. Instead of
decoding the gcc version in a complicated manner, just check if CC
supports a given -march=archXY flag. Together with the next patch, we
thereby gain support for builds with LLVM/clang with DYNAMIC_ARCH=1.

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-09-08 19:34:18 +02:00
Martin Kroeker 15da2f9acb
Merge pull request #2828 from martin-frbg/lapack438
Correct xLASET arguments in LAPACK EIG tests
2020-09-08 10:25:19 +02:00
Martin Kroeker 7d9c77f421
Correct dimension argument to xLASET
from Reference-LAPACK PR 438
2020-09-07 22:03:46 +02:00
Martin Kroeker c8f029a518
Merge pull request #82 from xianyi/develop
rebase
2020-09-07 21:59:13 +02:00
Martin Kroeker e72430fe46
Merge pull request #2803 from xiegengxin/AVX2-asum
Implementaion of dasum, sasum with AVX2 & AVX512 intrinsic
2020-09-06 18:32:15 +02:00
Martin Kroeker 6e0f6c5f00
Merge pull request #2824 from martin-frbg/asumbench
Use POSIX2001 clock.gettime in asum benchmark if available
2020-09-06 10:05:47 +02:00
Martin Kroeker 6f8fad87c5
Use POSIX2001 clock.gettime for higher resolution 2020-09-05 19:44:01 +02:00
Martin Kroeker ed0f2d3dd7
Merge pull request #2816 from martin-frbg/silicon
Add basic support for Apple Vortex (ARM64) cpu
2020-09-05 19:17:59 +02:00
Martin Kroeker 43a31b7786
Merge pull request #2823 from martin-frbg/fix2778
Improve fix for lapack-test EIG/cchkhb2stg from PR 2778
2020-09-05 17:29:38 +02:00