7452 Commits

Author SHA1 Message Date
Martin Kroeker e5e2fbd593
Support building only selected types 2020-09-22 23:21:30 +02:00
Martin Kroeker 3287848c8f
Support building only selected types 2020-09-22 23:20:51 +02:00
Martin Kroeker 26611af8e1
fix grouping of sources used for more than one type 2020-09-22 23:20:05 +02:00
Martin Kroeker b886bd672b
add defines for building a subset of types 2020-09-22 23:18:55 +02:00
Martin Kroeker 61fae59298
Merge pull request #88 from xianyi/develop
rebase
2020-09-22 23:15:33 +02:00
Martin Kroeker 33d22f99f1
Merge pull request #2851 from martin-frbg/travis-xcode12
Add an OSX build with xcode12
2020-09-22 21:44:55 +02:00
Martin Kroeker 5ba01dd1a8
Add an OSX build with xcode12 2020-09-22 17:26:19 +02:00
Qiyu8 14f7dad3b7 performance improved 2020-09-22 16:52:15 +08:00
y00512012 06cf73a239 fix a bug of trmm 2020-09-22 16:47:10 +08:00
Qiyu8 325b539c26 Optimize the performance of daxpy by using universal intrinsics 2020-09-22 10:38:35 +08:00
Martin Kroeker 0f112077e6
Merge pull request #2847 from mhillenibm/fixup_cscal
s390x: fix cscal and zscal implementations
2020-09-21 22:22:43 +02:00
Marius Hillenbrand 22aa81f3e5 s390x: fix cscal and zscal implementations
The implementation of complex scalar * vector multiplication for Z14
makes some LAPACK tests fail because the numerical differences to the
reference implementation exceed the threshold (as can be seen by running
make lapack-test and replacing kernel/zarch/cscal.c with a generic
implementation for comparison).

The complex multiplication uses terms of the form a * b + c * d for both
real and imaginary parts. The assembly code (and compiler-emitted code
as well) uses fused multiply add operations for the second product and
sum. The results can be "surprising", for example when both terms in the
imaginary part nearly cancel each other out. In that case, the second
product contributes more digits to the sum than the first product, which
has already been rounded.

One option is to use separate multiplications (which then round the same
way) and a distinct add. Change the code to pursue that path, by (1)
requesting the compiler not to contract the operations into FMAs and (2)
replacing the assembly kernel with corresponding vectorized C code
(where change 1 also applies).

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-09-21 13:10:05 +02:00
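
The rounding effect described in commit 22aa81f3e5 above can be reproduced outside the kernel. The following is a minimal sketch, not the code from the commit: it emulates a fused multiply-add in Python with exact rational arithmetic and compares a * b + c * d computed with two separately rounded products against the same expression with the second product fused into the sum. The function names and the input values are assumptions chosen for illustration.

```python
from fractions import Fraction


def fma(x: float, y: float, z: float) -> float:
    """Emulate a fused multiply-add: x*y + z, rounded only once at the end.

    Fraction holds the exact values of the doubles, so the product and the
    sum are exact and the conversion back to float is the single rounding.
    """
    return float(Fraction(x) * Fraction(y) + Fraction(z))


def sum_of_products_separate(a: float, b: float, c: float, d: float) -> float:
    """a*b + c*d with both products rounded before the add."""
    return a * b + c * d


def sum_of_products_fused(a: float, b: float, c: float, d: float) -> float:
    """a*b + c*d where only the first product is rounded; the second
    product enters the sum exactly, as with a hardware FMA."""
    return fma(c, d, a * b)


# Illustrative inputs (an assumption, not taken from the LAPACK tests):
# the exact product a*b is 1 + 2**-29 + 2**-60, which does not fit into
# a double, so a*b is rounded to 1 + 2**-29.
a = 1.0 + 2.0 ** -30
b = 1.0 + 2.0 ** -30

# The two terms cancel exactly in real arithmetic: a*b + (-a)*b == 0.
separate = sum_of_products_separate(a, b, -a, b)
fused = sum_of_products_fused(a, b, -a, b)

print("separate multiplications:", separate)  # 0.0
print("fused second product:   ", fused)      # -2**-60, the rounding error of a*b
```

With separate multiplications both products round the same way and cancel. With the fused variant the result is the rounding error of the first product, which is the kind of "surprising" difference the commit message refers to.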
Marius Hillenbrand 77ea73f5e5 s390x: for clang use fp-contract=on instead of fast
Make clang slightly more cautious when contracting floating-point
operations (e.g., when applying fused multiply add) by setting
-ffp-contract=on (instead of fast).

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-09-21 11:32:08 +02:00
Marius Hillenbrand f91057cbad s390x: move common vector definitions and utils into header
... to facilitate reuse beyond gemm_vec.c and avoid code duplication.

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-09-21 11:32:08 +02:00
Martin Kroeker 992d7ca63d
Merge pull request #2845 from martin-frbg/lapack443
Fix workspace query in LAPACK xGELQ (Reference-LAPACK 443)
2020-09-18 23:18:41 +02:00
Martin Kroeker 7e4d5c237c
Fix workspace query in xGELQ (Reference-LAPACK PR443) 2020-09-18 09:19:46 +02:00
Martin Kroeker 8d12027a79
Merge pull request #86 from xianyi/develop
rebase
2020-09-18 09:17:49 +02:00
Martin Kroeker b1e0bcceec
Merge pull request #2844 from RajalakshmiSR/daxpy_p10
Optimize daxpy/zaxpy for POWER10
2020-09-17 23:46:32 +02:00
Rajalakshmi Srinivasaraghavan be43d2cb96 Optimize daxpy/zaxpy for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores. Tested in simulator and no new failures.
2020-09-17 12:56:28 -05:00
Martin Kroeker 2855e6000c
Merge pull request #2841 from martin-frbg/cpp_gemvtest
Make thread safety tests available to CMAKE and support running only the GEMV version
2020-09-17 17:29:56 +02:00
Martin Kroeker 144a03446d
Merge pull request #2843 from mhillenibm/fixup_merge_dynamic_zarch
s390x/DYNAMIC_ARCH: fixup broken merge and reapply simplification
2020-09-17 17:28:43 +02:00
Marius Hillenbrand 75d440caa0 s390x/DYNAMIC_ARCH: fixup broken merge and reapply simplification
An unrelated commit and merge inadvertently reverted our recent two
changes for simplifying DYNAMIC_ARCH on s390x. Simply reapply the
changes.

Simplify detection of which kernels we can compile on s390x. Instead of
decoding the gcc version in a complicated manner, just check if CC
supports a given -march=archXY flag. Together with the next patch, we
thereby gain support for builds with LLVM/clang with DYNAMIC_ARCH=1.

To enable builds with DYNAMIC_ARCH with older compiler releases, the
Makefile and drivers/other/dynamic_arch.c need a common view of the
architecture support built into the library.

We follow the notation from x86 when used with DYNAMIC_LIST, where
defines DYN_<ARCH NAME> denote support for a given generation to be
built in. Since there are far fewer architecture generations in OpenBLAS
for s390x, that does not bloat command lines too much.

Closes: #2842
Fixes: ba644378dc ("Copy BUILD_ options available to the compiler flags")

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-09-17 17:09:03 +02:00
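
The detection approach described in commit 75d440caa0 above (ask the compiler whether it accepts a given -march=archXY flag instead of decoding its version) can be sketched as follows. This is an illustration in Python, not the Makefile logic from the commit. The helper names, the probe source, and the mapping of generation names to -march values are assumptions made for the example.

```python
import os
import subprocess
import tempfile


def cc_supports_flag(cc: str, flag: str) -> bool:
    """Return True if `cc` compiles a trivial C file when given `flag`."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "probe.c")
        obj = os.path.join(tmp, "probe.o")
        with open(src, "w") as f:
            f.write("int main(void) { return 0; }\n")
        try:
            result = subprocess.run(
                [cc, flag, "-c", src, "-o", obj],
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
        except OSError:
            # Compiler not found or not executable: treat as unsupported.
            return False
        return result.returncode == 0


# Hypothetical mapping from generation name to -march value, used only
# for this example.
GENERATIONS = {"Z13": "arch11", "Z14": "arch12"}


def dyn_defines(supports, generations=GENERATIONS):
    """Build DYN_<ARCH NAME> defines for every generation whose -march
    flag is accepted. `supports` is a callable taking the flag string."""
    return [
        f"-DDYN_{name}"
        for name, march in generations.items()
        if supports(f"-march={march}")
    ]


if __name__ == "__main__":
    cc = os.environ.get("CC", "cc")
    print(dyn_defines(lambda flag: cc_supports_flag(cc, flag)))
```

Passing the probe as a callable keeps the selection logic independent of any particular compiler, so the same list of defines can be handed to both the build system and the C sources, which is the "common view" the commit message asks for.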
Martin Kroeker 6abca76c4e
Add option for running only the less demanding GEMV version of the thread safety tests 2020-09-17 13:49:24 +02:00
Martin Kroeker 84c00c3c6e
Support running just the GEMV version of the thread safety test 2020-09-17 13:46:41 +02:00
Martin Kroeker 8c5c991bd7
Add cpp_thread_test options 2020-09-17 13:45:40 +02:00
Martin Kroeker 2e3b15d68b
Add CMakeLists.txt 2020-09-17 13:43:55 +02:00
Martin Kroeker eaf7f825bd
Merge pull request #85 from xianyi/develop
rebase
2020-09-17 13:42:47 +02:00
Martin Kroeker 4c10a1673d
Merge pull request #2840 from martin-frbg/fixup2833
Fix for cmake BUILD_ settings PR 2833
2020-09-16 18:55:50 +02:00
Martin Kroeker c4aeeeb9f4
Activate all BUILD_ options if none was specified 2020-09-15 23:15:34 +02:00
Martin Kroeker 3843bd188c
Merge pull request #84 from xianyi/develop
rebase
2020-09-15 23:13:30 +02:00
Martin Kroeker ddec244a5a
Merge pull request #2838 from austinpagan/gordon_trmm
Adding performance patch for trmm, just like trsm (#2836)
2020-09-15 21:17:48 +02:00
fossum dfeca46098 Adding performance patch for trmm, just like #2836 2020-09-15 08:59:50 -05:00
Martin Kroeker f8950f40a2
Merge pull request #2836 from austinpagan/gordon_trsm
Fixing a performance bug in trsm_[LR].c.
2020-09-15 11:26:37 +02:00
fossum 274d6e015b Fixing a performance bug in trsm_[LR].c. 2020-09-14 13:10:48 -05:00
Martin Kroeker 91c84e1c01
Merge pull request #2796 from Guobing-Chen/BF16_dot_coversion_apis
Add bfloat16 based dot and conversion with single/double
2020-09-14 15:00:19 +02:00
Martin Kroeker 1ee1e7b495
Merge pull request #2833 from martin-frbg/issue2830
Make building the tests for individual data types conditional on the respective BUILD option
2020-09-14 07:24:23 +02:00
Martin Kroeker ba644378dc
Copy BUILD_ options available to the compiler flags 2020-09-14 00:03:33 +02:00
Martin Kroeker 9e11c2d62f
Add BUILD_SINGLE etc 2020-09-13 23:55:11 +02:00
Martin Kroeker 4d250d0cdf
Rearrange ifdefs 2020-09-13 23:29:01 +02:00
Martin Kroeker de139337b8
Remove spurious tests for complex ASUM and NRM2 2020-09-13 22:20:41 +02:00
Martin Kroeker ec2948f147
Make tests conditional on BUILD_DOUBLE 2020-09-13 22:17:46 +02:00
Martin Kroeker ce89398636
Make tests for individual variable types conditional on the respective BUILD_ option 2020-09-13 21:52:18 +02:00
Martin Kroeker 593ce9e237
Make building individual tests depend on BUILD_SINGLE etc defines 2020-09-13 21:50:12 +02:00
Martin Kroeker 74e358bcd5
Remove spurious complex16 tests 2020-09-13 21:49:01 +02:00
Martin Kroeker 26792d2096
Copy BUILD_* directives to the compiler options to allow ifdef in tests 2020-09-13 21:47:55 +02:00
Martin Kroeker 6b52c7e172
Merge pull request #2832 from martin-frbg/issue2831
Fix gfortran detection by vendor matching
2020-09-13 21:20:30 +02:00
Martin Kroeker 746ad3bd19
Fix vendor match for GCC gfortran 2020-09-13 18:40:59 +02:00
Martin Kroeker 55d4d470ec
Merge pull request #83 from xianyi/develop
rebase
2020-09-13 18:30:11 +02:00
Martin Kroeker a270894730
Merge pull request #2829 from mhillenibm/clang_s390x
Fix DYNAMIC_ARCH=1 with clang s390x
2020-09-08 23:36:41 +02:00
Marius Hillenbrand 047b8d7aff Add an s390 build with clang to the Travis configuration
Since clang builds have been fixed on s390x, including support for
DYNAMIC_ARCH, cover that build type in Travis.

Explicitly request Ubuntu 20.04 (codename focal) to get a recent
LLVM/clang version 10.x and thereby cover all s390x architecture
generations supported in OpenBLAS. Ubuntu 18.10's LLVM/clang 6.x cannot
build the inline assembly in some of the Z13 and Z14 kernels.

LLVM/clang currently does not support OpenMP on s390x, so disable that
in the build.

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-09-08 20:59:06 +02:00