Commit Graph

5621 Commits

Author SHA1 Message Date
fossum dfeca46098 Adding performance patch for trmm, just like #2836 2020-09-15 08:59:50 -05:00
Martin Kroeker f8950f40a2
Merge pull request #2836 from austinpagan/gordon_trsm
Fixing a performance bug in trsm_[LR].c.
2020-09-15 11:26:37 +02:00
fossum 274d6e015b Fixing a performance bug in trsm_[LR].c. 2020-09-14 13:10:48 -05:00
Martin Kroeker 91c84e1c01
Merge pull request #2796 from Guobing-Chen/BF16_dot_coversion_apis
Add bfloat16 based dot and conversion with single/double
2020-09-14 15:00:19 +02:00
Martin Kroeker 1ee1e7b495
Merge pull request #2833 from martin-frbg/issue2830
Make building the tests for individual data types conditional on the respective BUILD option
2020-09-14 07:24:23 +02:00
Martin Kroeker ba644378dc
Copy BUILD_ options available to the compiler flags 2020-09-14 00:03:33 +02:00
Martin Kroeker 9e11c2d62f
Add BUILD_SINGLE etc 2020-09-13 23:55:11 +02:00
Martin Kroeker 4d250d0cdf
Rearrange ifdefs 2020-09-13 23:29:01 +02:00
Martin Kroeker de139337b8
Remove spurious tests for complex ASUM and NRM2 2020-09-13 22:20:41 +02:00
Martin Kroeker ec2948f147
Make tests conditional on BUILD_DOUBLE 2020-09-13 22:17:46 +02:00
Martin Kroeker ce89398636
Make tests for individual variable types conditional on the respective BUILD_ option 2020-09-13 21:52:18 +02:00
Martin Kroeker 593ce9e237
Make building individual tests depend on BUILD_SINGLE etc defines 2020-09-13 21:50:12 +02:00
Martin Kroeker 74e358bcd5
Remove spurious complex16 tests 2020-09-13 21:49:01 +02:00
Martin Kroeker 26792d2096
Copy BUILD_* directives to the compiler options to allow ifdef in tests 2020-09-13 21:47:55 +02:00
Martin Kroeker 6b52c7e172
Merge pull request #2832 from martin-frbg/issue2831
Fix gfortran detection by vendor matching
2020-09-13 21:20:30 +02:00
Martin Kroeker 746ad3bd19
Fix vendor match for GCC gfortran 2020-09-13 18:40:59 +02:00
Martin Kroeker 55d4d470ec
Merge pull request #83 from xianyi/develop
rebase
2020-09-13 18:30:11 +02:00
Martin Kroeker a270894730
Merge pull request #2829 from mhillenibm/clang_s390x
Fix DYNAMIC_ARCH=1 with clang s390x
2020-09-08 23:36:41 +02:00
Marius Hillenbrand 047b8d7aff Add an s390 build with clang to the Travis configuration
Since clang builds have been fixed on s390x, including support for
DYNAMIC_ARCH, cover that build type in Travis.

Explicitly request Ubuntu 20.04 (codename focal) to get a recent
LLVM/clang version 10.x and thereby cover all s390x architecture
generations supported in OpenBLAS. Ubuntu 18.10's LLVM/clang 6.x cannot
build the inline assembly in some of the Z13 and Z14 kernels.

LLVM/clang currently does not support OpenMP on s390x, so disable that
in the build.

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-09-08 20:59:06 +02:00
Marius Hillenbrand f7731a358a Update CONTRIBUTORS.md - clang build fixes for IBM z
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-09-08 19:34:18 +02:00
Marius Hillenbrand a55fe06f25 s390x/DYNAMIC_ARCH: define a HW_CAP flag to support slightly older glibc versions
Enable building DYNAMIC_ARCH support with older versions of glibc that
do not know about the hwcap flag HWCAP_S390_VXE yet.

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-09-08 19:34:18 +02:00
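The fallback described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the helper names hwcap_has_vxe and cpu_has_vxe are hypothetical, and the value 8192 is an assumption taken from the s390 hwcap definitions in current glibc and kernel headers.

```c
#ifdef __linux__
#include <sys/auxv.h>   /* getauxval, AT_HWCAP; newer glibc also defines HWCAP_S390_* on s390x */
#endif

/* Older glibc headers may not know this flag yet, so supply it ourselves.
 * Assumption: 8192 matches the value used by current glibc/kernel headers. */
#ifndef HWCAP_S390_VXE
#define HWCAP_S390_VXE 8192
#endif

/* Hypothetical helper: test the VXE bit in a hwcap word. */
int hwcap_has_vxe(unsigned long hwcap)
{
    return (hwcap & HWCAP_S390_VXE) != 0;
}

/* Hypothetical helper: query the running CPU; reports 0 on anything
 * other than Linux on s390x. */
int cpu_has_vxe(void)
{
#if defined(__linux__) && defined(__s390x__)
    return hwcap_has_vxe(getauxval(AT_HWCAP));
#else
    return 0;
#endif
}
```

Because the define is guarded by #ifndef, a newer glibc that already provides HWCAP_S390_VXE takes precedence and the fallback is never used.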
Marius Hillenbrand 4f34bcfb5e s390x/DYNAMIC_ARCH: pass supported arch levels from Makefile to run-time code
... instead of duplicating the (old) mechanism from the Makefile that
aimed to derive supported architecture generations from the gcc
version.

To enable builds with DYNAMIC_ARCH with older compiler releases, the
Makefile and drivers/other/dynamic_arch.c need a common view of the
architecture support built into the library.

We follow the notation from x86 when used with DYNAMIC_LIST, where
defines DYN_<ARCH NAME> denote support for a given generation to be
built in. Since there are far fewer architecture generations in OpenBLAS
for s390x, that does not bloat command lines too much.

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-09-08 19:34:18 +02:00
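The idea of sharing one view of the built-in architecture levels between the Makefile and the run-time code can be sketched as below. This is only an illustration under stated assumptions: the macro names DYN_Z13 and DYN_Z14 follow the DYN_<ARCH NAME> pattern from the commit message, while the table and the function arch_is_built_in are hypothetical and not taken from drivers/other/dynamic_arch.c.

```c
#include <stddef.h>
#include <string.h>

/* Assumption: the build system passes -DDYN_Z13 and/or -DDYN_Z14 for each
 * architecture generation whose kernels were actually compiled in. */
static const char *const built_in_archs[] = {
    "generic",            /* always available as a fallback */
#ifdef DYN_Z13
    "z13",
#endif
#ifdef DYN_Z14
    "z14",
#endif
};

/* Hypothetical run-time query: returns 1 if kernels for `name` were built in. */
int arch_is_built_in(const char *name)
{
    size_t count = sizeof built_in_archs / sizeof built_in_archs[0];
    for (size_t i = 0; i < count; i++) {
        if (strcmp(built_in_archs[i], name) == 0)
            return 1;
    }
    return 0;
}
```

With this arrangement the run-time code never has to guess from a compiler version which kernels exist; it only reports what the build actually enabled.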
Marius Hillenbrand 0629d8ebdb s390x/DYNAMIC_ARCH: generalize detecting supported archs for clang
Simplify detection of which kernels we can compile on s390x. Instead of
decoding the gcc version in a complicated manner, just check if CC
supports a given -march=archXY flag. Together with the next patch, we
thereby gain support for builds with LLVM/clang with DYNAMIC_ARCH=1.

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
2020-09-08 19:34:18 +02:00
Martin Kroeker 15da2f9acb
Merge pull request #2828 from martin-frbg/lapack438
Correct xLASET arguments in LAPACK EIG tests
2020-09-08 10:25:19 +02:00
Martin Kroeker 7d9c77f421
Correct dimension argument to xLASET
from Reference-LAPACK PR 438
2020-09-07 22:03:46 +02:00
Martin Kroeker c8f029a518
Merge pull request #82 from xianyi/develop
rebase
2020-09-07 21:59:13 +02:00
Martin Kroeker e72430fe46
Merge pull request #2803 from xiegengxin/AVX2-asum
Implementation of dasum, sasum with AVX2 & AVX512 intrinsics
2020-09-06 18:32:15 +02:00
Martin Kroeker 6e0f6c5f00
Merge pull request #2824 from martin-frbg/asumbench
Use POSIX2001 clock_gettime in asum benchmark if available
2020-09-06 10:05:47 +02:00
Martin Kroeker 6f8fad87c5
Use POSIX2001 clock_gettime for higher resolution 2020-09-05 19:44:01 +02:00
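Timing a kernel with the POSIX clock_gettime call might look like the sketch below. It is not the benchmark code from the commit: the function names elapsed_seconds and time_abs_sum are hypothetical, the summation loop merely stands in for an asum-style kernel, and the choice of CLOCK_MONOTONIC is an assumption made here.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* request the POSIX.1-2001 interfaces */
#endif
#include <time.h>

/* Difference between two timespec readings, in seconds. */
double elapsed_seconds(const struct timespec *start, const struct timespec *end)
{
    return (double)(end->tv_sec - start->tv_sec)
         + (double)(end->tv_nsec - start->tv_nsec) * 1e-9;
}

/* Hypothetical stand-in for a benchmarked kernel: sums |x[i]| and
 * returns the wall-clock time the loop took, in seconds. */
double time_abs_sum(const double *x, int n, double *result)
{
    struct timespec t0, t1;
    double sum = 0.0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < n; i++)
        sum += (x[i] < 0.0) ? -x[i] : x[i];
    clock_gettime(CLOCK_MONOTONIC, &t1);

    *result = sum;
    return elapsed_seconds(&t0, &t1);
}
```

struct timespec carries nanosecond fields, which is the source of the higher resolution compared with microsecond-based timers such as gettimeofday.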
Martin Kroeker ed0f2d3dd7
Merge pull request #2816 from martin-frbg/silicon
Add basic support for Apple Vortex (ARM64) cpu
2020-09-05 19:17:59 +02:00
Martin Kroeker 43a31b7786
Merge pull request #2823 from martin-frbg/fix2778
Improve fix for lapack-test EIG/cchkhb2stg from PR 2778
2020-09-05 17:29:38 +02:00
Martin Kroeker 8a2a137a9e
Correct argument to SLASET (Improves fix from PR2778)
As explained by serguei-patchkovskii in Reference-LAPACK/lapack#438 (comment), passing in an index of 1 instead of N leads to a standards violation when SLASET accesses matrix A, i.e. undefined behavior.
2020-09-05 13:06:31 +02:00
Martin Kroeker 0d1f30a297
Merge pull request #81 from xianyi/develop
rebase
2020-09-05 12:47:03 +02:00
Martin Kroeker 70a254d507
Merge pull request #2822 from martin-frbg/issue2821
Fix potential domain error in sqrt
2020-09-05 12:39:32 +02:00
Martin Kroeker 330044d821
Fix potential domain error in sqrt 2020-09-05 09:44:33 +02:00
Martin Kroeker 97636b2c8a
Merge pull request #2819 from h-vetinari/carry_lapack_437
Carry lapack#437
2020-09-04 23:50:43 +02:00
Martin Kroeker 4d36711547
Merge pull request #2820 from RajalakshmiSR/clang
POWER9: Fix mcpu option with clang
2020-09-04 23:09:31 +02:00
Rajalakshmi Srinivasaraghavan 718f67421a POWER9: Fix mcpu option with clang
Adding check for compiler type before checking GCC version in Makefile.
This allows clang to use power9 instead of power8 when CORE is POWER9.
2020-09-04 10:36:19 -05:00
H. Vetinari 3426519ae2 adapt ?ggsv?-functions to ambient code style in LAPACKE/include/lapack.h 2020-09-04 17:33:24 +02:00
H. Vetinari 1c6c71fa85 Follow-up to lapack#434 & lapack#409: add missing 'const' in signatures
Based on how the surrounding functions in lapack.h are handling the
parameters, particularly the ?ggsv?3-variants of the affected functions
2020-09-04 17:33:11 +02:00
H. Vetinari 860247b5da Follow-up to lapack#434 & lapack#409: fix signature mismatches 2020-09-04 17:32:53 +02:00
Martin Kroeker c61771e335
Merge pull request #2778 from martin-frbg/lapackeig
Fix various wrong calls to SLASET/DLASET in the EIG part of the LAPACK testsuite
2020-09-04 10:06:02 +02:00
Chen, Guobing deaeb6c5b8 Add bfloat16 based dot and conversion with single/double
1. Added bfloat16 based dot as new API: shdot
2. Implemented generic kernel and cooperlake-specific (AVX512-BF16) kernel for shdot
3. Added 4 conversion APIs for bfloat16 data type <=> single/double: shstobf16 shdtobf16 sbf16tos dbf16tod
     shstobf16 -- convert single float array to bfloat16 array
     shdtobf16 -- convert double float array to bfloat16 array
     sbf16tos  -- convert bfloat16 array to single float array
     dbf16tod  -- convert bfloat16 array to double float array
4. Implemented generic kernels for all 4 conversion APIs, and cooperlake-specific kernel for shstobf16 and shdtobf16
5. Updated the level-1 threading helper functions and macros to support multi-threading for these new APIs
6. Fixed Cooperlake platform detection and selection in dynamic-arch builds
7. Changed the typedef of bfloat16 from unsigned short to the stricter uint16_t

Signed-off-by: Chen, Guobing <guobing.chen@intel.com>
2020-09-04 02:31:25 +08:00
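A generic (non-vectorized) version of the conversions and the dot product listed in the commit above could be sketched as follows. This is an illustration only: the names bf16_t, float_to_bf16, bf16_to_float and bf16_dot are hypothetical and are not the OpenBLAS API, and round-to-nearest-even with float accumulation is an assumption made here, not something the commit message specifies.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint16_t bf16_t;   /* hypothetical name; bfloat16 stored as raw 16 bits */

/* Single precision -> bfloat16, rounding to nearest with ties to even. */
bf16_t float_to_bf16(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);            /* reinterpret without aliasing issues */
    if ((bits & 0x7fffffffu) > 0x7f800000u)    /* NaN: keep it a (quiet) NaN */
        return (bf16_t)((bits >> 16) | 0x0040u);
    bits += 0x7fffu + ((bits >> 16) & 1u);     /* rounding bias; ties go to even */
    return (bf16_t)(bits >> 16);
}

/* bfloat16 -> single precision: the 16 bits become the upper half of a float. */
float bf16_to_float(bf16_t h)
{
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Dot product of two bfloat16 vectors, accumulated in single precision. */
float bf16_dot(size_t n, const bf16_t *x, const bf16_t *y)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += bf16_to_float(x[i]) * bf16_to_float(y[i]);
    return acc;
}
```

Since bfloat16 keeps the sign and the full 8-bit exponent of a float and only shortens the mantissa, widening is a plain shift, and only the narrowing direction needs a rounding decision.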
Martin Kroeker c7ef7174e4
Merge pull request #2817 from martin-frbg/lapack436
LAPACKE: fix declaration of work arrays in [cz]gesvdq
2020-09-03 17:10:23 +02:00
Martin Kroeker 775a87242d
Rename KERNEL.SILICON to KERNEL.VORTEX 2020-09-03 08:44:20 +02:00
Martin Kroeker af5bc95503
Rename SILICON to VORTEX and fix duplicate numbering 2020-09-03 08:43:26 +02:00
Martin Kroeker ea3a58c844
Rename SILICON to VORTEX 2020-09-03 08:38:53 +02:00
Martin Kroeker 17dca035de
rename SILICON to VORTEX 2020-09-03 08:38:08 +02:00
Gengxin Xie 1b0f17eeed align to 64, using SSE when input size is small 2020-09-03 14:25:54 +08:00
Martin Kroeker c31b72965e
Fix data type of work array in zgesvdq prototype 2020-09-02 23:44:44 +02:00