OpenBLAS

Commit Graph

Author	SHA1	Message	Date
Marius Hillenbrand	f7731a358a	Update CONTRIBUTORS.md - clang build fixes for IBM z Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>	2020-09-08 19:34:18 +02:00
Marius Hillenbrand	a55fe06f25	s390x/DYNAMIC_ARCH: define a HW_CAP flag to support slightly older glibc versions Enable building DYNAMIC_ARCH support with older versions of glibc that do not know about the hwcap flag HWCAP_S390_VXE yet. Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>	2020-09-08 19:34:18 +02:00
Marius Hillenbrand	4f34bcfb5e	s390x/DYNAMIC_ARCH: pass supported arch levels from Makefile to run-time code ... instead of duplicating the (old) mechanism from the Makefile that aimed to derive supported architecture generations from the gcc version. To enable builds with DYNAMIC_ARCH with older compiler releases, the Makefile and drivers/other/dynamic_arch.c need a common view of the architecture support built into the library. We follow the notation from x86 when used with DYNAMIC_LIST, where defines DYN_<ARCH NAME> denote support for a given generation to be built in. Since there are far fewer architecture generations in OpenBLAS for s390x, that does not bloat command lines too much. Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>	2020-09-08 19:34:18 +02:00
Marius Hillenbrand	0629d8ebdb	s390x/DYNAMIC_ARCH: generalize detecting supported archs for clang Simplify detection of which kernels we can compile on s390x. Instead of decoding the gcc version in a complicated manner, just check if CC supports a given -march=archXY flag. Together with the next patch, we thereby gain support for builds with LLVM/clang with DYNAMIC_ARCH=1. Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>	2020-09-08 19:34:18 +02:00
Martin Kroeker	15da2f9acb	Merge pull request #2828 from martin-frbg/lapack438 Correct xLASET arguments in LAPACK EIG tests	2020-09-08 10:25:19 +02:00
Martin Kroeker	7d9c77f421	Correct dimension argument to xLASET from Reference-LAPACK PR 438	2020-09-07 22:03:46 +02:00
Martin Kroeker	c8f029a518	Merge pull request #82 from xianyi/develop rebase	2020-09-07 21:59:13 +02:00
Martin Kroeker	e72430fe46	Merge pull request #2803 from xiegengxin/AVX2-asum Implementation of dasum, sasum with AVX2 & AVX512 intrinsics	2020-09-06 18:32:15 +02:00
Martin Kroeker	6e0f6c5f00	Merge pull request #2824 from martin-frbg/asumbench Use POSIX.1-2001 clock_gettime in asum benchmark if available	2020-09-06 10:05:47 +02:00
Martin Kroeker	6f8fad87c5	Use POSIX.1-2001 clock_gettime for higher resolution	2020-09-05 19:44:01 +02:00
Martin Kroeker	ed0f2d3dd7	Merge pull request #2816 from martin-frbg/silicon Add basic support for Apple Vortex (ARM64) cpu	2020-09-05 19:17:59 +02:00
Martin Kroeker	43a31b7786	Merge pull request #2823 from martin-frbg/fix2778 Improve fix for lapack-test EIG/cchkhb2stg from PR 2778	2020-09-05 17:29:38 +02:00
Martin Kroeker	8a2a137a9e	Correct argument to SLASET (improves fix from PR 2778). As explained by serguei-patchkovskii in Reference-LAPACK/lapack#438 (comment), passing in an index of 1 instead of N leads to a standards violation when accessing matrix A in SLASET, i.e. undefined behavior	2020-09-05 13:06:31 +02:00
Martin Kroeker	0d1f30a297	Merge pull request #81 from xianyi/develop rebase	2020-09-05 12:47:03 +02:00
Martin Kroeker	70a254d507	Merge pull request #2822 from martin-frbg/issue2821 Fix potential domain error in sqrt	2020-09-05 12:39:32 +02:00
Martin Kroeker	330044d821	Fix potential domain error in sqrt	2020-09-05 09:44:33 +02:00
Martin Kroeker	97636b2c8a	Merge pull request #2819 from h-vetinari/carry_lapack_437 Carry lapack#437	2020-09-04 23:50:43 +02:00
Martin Kroeker	4d36711547	Merge pull request #2820 from RajalakshmiSR/clang POWER9: Fix mcpu option with clang	2020-09-04 23:09:31 +02:00
Rajalakshmi Srinivasaraghavan	718f67421a	POWER9: Fix mcpu option with clang Adding check for compiler type before checking GCC version in Makefile. This allows clang to use power9 instead of power8 when CORE is POWER9.	2020-09-04 10:36:19 -05:00
H. Vetinari	3426519ae2	adapt ?ggsv?-functions to ambient code style in LAPACKE/include/lapack.h	2020-09-04 17:33:24 +02:00
H. Vetinari	1c6c71fa85	Follow-up to lapack#434 & lapack#409: add missing 'const' in signatures Based on how the surrounding functions in lapack.h are handling the parameters, particularly the ?ggsv?3-variants of the affected functions	2020-09-04 17:33:11 +02:00
H. Vetinari	860247b5da	Follow-up to lapack#434 & lapack#409: fix signature mismatches	2020-09-04 17:32:53 +02:00
Martin Kroeker	c61771e335	Merge pull request #2778 from martin-frbg/lapackeig Fix various wrong calls to SLASET/DLASET in the EIG part of the LAPACK testsuite	2020-09-04 10:06:02 +02:00
Chen, Guobing	deaeb6c5b8	Add bfloat16 based dot and conversion with single/double 1. Added bfloat16 based dot as new API: shdot 2. Implemented generic kernel and cooperlake-specific (AVX512-BF16) kernel for shdot 3. Added 4 conversion APIs for bfloat16 data type <=> single/double: shstobf16 shdtobf16 sbf16tos dbf16tod shstobf16 -- convert single float array to bfloat16 array shdtobf16 -- convert double float array to bfloat16 array sbf16tos -- convert bfloat16 array to single float array dbf16tod -- convert bfloat16 array to double float array 4. Implemented generic kernels for all 4 conversion APIs, and cooperlake-specific kernel for shstobf16 and shdtobf16 5. Update level1 thread facilitate functions and macros to support multi-threading for these new APIs 6. Fix Cooperlake platform detection/specify issue when under dynamic-arch building 7. Change the typedef of bfloat16 from unsigned short to more strict uint16_t Signed-off-by: Chen, Guobing <guobing.chen@intel.com>	2020-09-04 02:31:25 +08:00
Martin Kroeker	c7ef7174e4	Merge pull request #2817 from martin-frbg/lapack436 LAPACKE: fix declaration of work arrays in [cz]gesvdq	2020-09-03 17:10:23 +02:00
Martin Kroeker	775a87242d	Rename KERNEL.SILICON to KERNEL.VORTEX	2020-09-03 08:44:20 +02:00
Martin Kroeker	af5bc95503	Rename SILICON to VORTEX and fix duplicate numbering	2020-09-03 08:43:26 +02:00
Martin Kroeker	ea3a58c844	Rename SILICON to VORTEX	2020-09-03 08:38:53 +02:00
Martin Kroeker	17dca035de	rename SILICON to VORTEX	2020-09-03 08:38:08 +02:00
Gengxin Xie	1b0f17eeed	align to 64, using SSE when input size is small	2020-09-03 14:25:54 +08:00
Martin Kroeker	c31b72965e	Fix data type of work array in zgesvdq prototype	2020-09-02 23:44:44 +02:00
Martin Kroeker	0ce2aa3163	Fix data type of rwork array	2020-09-02 23:41:51 +02:00
Martin Kroeker	80794fe8fd	Create KERNEL.SILICON	2020-09-02 22:56:58 +02:00
Martin Kroeker	4a4d1ca6e0	Add AppleSilicon cpu	2020-09-02 22:52:12 +02:00
Martin Kroeker	b37d17382a	Add Apple Silicon	2020-09-02 22:48:49 +02:00
Martin Kroeker	029fd01cfb	Detect AppleSilicon cpu on OSX	2020-09-02 22:47:38 +02:00
Martin Kroeker	9d1ea75aa0	Merge pull request #80 from xianyi/develop rebase	2020-09-02 22:16:41 +02:00
Martin Kroeker	776d005f4c	Merge pull request #2815 from mhillenibm/clang_s390x Fix build with clang on s390x	2020-09-02 16:56:01 +02:00
Marius Hillenbrand	2ee5b899ce	s390x: enable S/DGEMM block with explicit loop unrolling + interleaving with clang The code for SGEMM 16x4 and DGEMM 8x4 blocks on z14 and z15 uses explicit unrolling and interleaving to improve performance. The code employs an empty inline asm statement with operands that constrain the compiler's instruction scheduling and thereby enforce proper overlapping of load and compute phases. Fix an ifdef to apply that for clang builds, as well. Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>	2020-09-02 13:49:31 +02:00
Marius Hillenbrand	095f4e6964	s390x: allow clang to emit fused multiply-adds (replicates gcc's default behavior) gcc's default setting for floating-point expression contraction is "fast", which allows the compiler to emit fused multiply adds instead of separate multiplies and adds (amongst others). Fused multiply-adds, which assembly kernels typically apply, also bring a significant performance advantage to the C implementation for matrix-matrix multiplication on s390x. To enable that performance advantage for builds with clang, add -ffp-contract=fast to the compiler options. Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>	2020-09-02 13:49:31 +02:00
Marius Hillenbrand	87e5bbd887	s390x: avoid variable-length arrays in struct for asm operands ... since it is not required and clang does not support that gcc extension. Instead, use a variable-length array directly for these operands. Note that, while the actual inline assembly code does not directly use these memory operands, they serve to inform the compiler that it cannot reorder reads or writes to/from the input and output data across the inline asm statements. Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>	2020-09-02 13:49:31 +02:00
Marius Hillenbrand	b9b3265ec8	s390x: avoid inline assembly for vector loads for clang ... since clang does not support the instruction format for inline assembly and also it is not required for current versions of clang. Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>	2020-09-02 13:49:30 +02:00
Marius Hillenbrand	a1616a0b86	s390x: replace nop with "nop 0" in inline assembly ... as a bandaid for building with clang until LLVM's internal assembler supports nops without operand. Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>	2020-09-02 13:49:30 +02:00
Marius Hillenbrand	60ef193258	s390x: use "lghi" for immediate values to fix build with clang Some of the kernels written in assembly utilize a "load address" instruction for loading an immediate value into a register. That is both unnecessarily complex and LLVM's assembler does not understand that specific syntax. Thus, replace with the appropriate "load immediate" instruction, which is also clearer to read. Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>	2020-09-02 13:49:30 +02:00
Martin Kroeker	18bfb6d6f7	Merge pull request #2813 from martin-frbg/issue2804-2 Fix for c_check misinterpreting arm64 in uname -m output as armv7	2020-09-01 23:39:46 +02:00
Martin Kroeker	e4900caa11	Fix c_check misinterpreting arm64 in uname output to mean armv7. Additional fix for upcoming OSX on ARM64, related to #2804, as suggested by fxcoudert in #2805	2020-09-01 19:54:08 +02:00
Martin Kroeker	68b1713c30	Merge pull request #2811 from martin-frbg/issue2806 Make NO_AVX512 option override the AVX512 compile test in CMAKE builds as well	2020-09-01 17:19:14 +02:00
Martin Kroeker	4074770d00	Merge pull request #2797 from martin-frbg/relafixes1 ReLAPACK fixes	2020-09-01 16:04:03 +02:00
Martin Kroeker	b87a77da02	Merge pull request #79 from xianyi/develop rebase	2020-09-01 12:03:53 +02:00
Martin Kroeker	f42e84d46c	Fix misnaming of LAPACK_?ggsvp function prototypes as LAPACKE_ (#2808) * Fix misnaming of LAPACK_?ggsvp and ?ggsvd function prototypes as LAPACKE_ * Drop the LAPACKE matrix_layout parameter from the argument lists, change ints to pointers and add missing work arguments.	2020-09-01 10:44:48 +02:00
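Commit a55fe06f25 describes defining the HWCAP_S390_VXE flag locally so that DYNAMIC_ARCH builds still compile against glibc versions that predate it. A minimal sketch of that fallback pattern follows. It assumes glibc >= 2.16 (for getauxval) on s390x Linux and assumes the fallback value 8192 (bit 13, the value newer glibc uses). The helper names hwcap_has_vxe and cpu_has_vxe are hypothetical and are not the actual code in drivers/other/dynamic_arch.c.

```c
/* Sketch only: hypothetical helpers, not OpenBLAS's dynamic_arch.c. */
#if defined(__linux__) && defined(__s390x__)
#include <sys/auxv.h>   /* getauxval(), AT_HWCAP, HWCAP_S390_* (glibc >= 2.16 assumed) */
#endif

/* Older glibc headers do not know the vector-enhancements-facility-1 bit.
   Assumed fallback: 8192 (bit 13), matching newer glibc. */
#ifndef HWCAP_S390_VXE
#define HWCAP_S390_VXE 8192
#endif

/* Test a given hwcap word for the VXE bit (z14 and later). */
static int hwcap_has_vxe(unsigned long hwcap)
{
    return (hwcap & HWCAP_S390_VXE) != 0;
}

/* Query the running CPU; on anything other than s390x Linux, report "no VXE". */
static int cpu_has_vxe(void)
{
#if defined(__linux__) && defined(__s390x__)
    return hwcap_has_vxe(getauxval(AT_HWCAP));
#else
    return 0;
#endif
}
```

Because the #ifndef guard only fires when the system header lacks the macro, newer glibc keeps its own definition and the fallback costs nothing there.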
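Commit deaeb6c5b8 adds bfloat16 conversions (shstobf16, sbf16tos and the double variants) and a bfloat16 dot product (shdot), with bfloat16 typedef'd as uint16_t. The following is a generic, scalar sketch of what such conversions can look like, because bfloat16 is simply the upper 16 bits of an IEEE-754 binary32 value. The rounding mode (round-to-nearest-even with NaNs kept quiet), the single-precision accumulation in the dot product, and all function names are assumptions for illustration. They are not taken from OpenBLAS's generic or Cooper Lake kernels, and increments/strides are omitted.

```c
/* Sketch only: hypothetical helpers, not the OpenBLAS shstobf16/sbf16tos/shdot API. */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint16_t bfloat16;  /* same underlying type as described in deaeb6c5b8 */

/* float -> bfloat16, assuming round-to-nearest-even. */
static bfloat16 float_to_bf16(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);  /* reinterpret the bits without aliasing UB */
    if ((bits & 0x7F800000u) == 0x7F800000u && (bits & 0x007FFFFFu) != 0) {
        /* NaN: truncate but set a mantissa bit so the result stays a (quiet) NaN. */
        return (bfloat16)((bits >> 16) | 0x0040u);
    }
    /* Add just under half an ULP, plus the lowest kept bit, so ties go to even. */
    bits += 0x7FFFu + ((bits >> 16) & 1u);
    return (bfloat16)(bits >> 16);
}

/* bfloat16 -> float is exact: put the 16 bits back into the top half. */
static float bf16_to_float(bfloat16 h)
{
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Contiguous array conversions (no increments, unlike a BLAS-style interface). */
static void float_array_to_bf16(size_t n, const float *in, bfloat16 *out)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = float_to_bf16(in[i]);
}

static void bf16_array_to_float(size_t n, const bfloat16 *in, float *out)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = bf16_to_float(in[i]);
}

/* Dot product of two bfloat16 vectors, accumulated in single precision (assumed). */
static float bf16_dot(size_t n, const bfloat16 *x, const bfloat16 *y)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i)
        acc += bf16_to_float(x[i]) * bf16_to_float(y[i]);
    return acc;
}
```

Widening back to float loses nothing, so all precision loss happens in float_to_bf16. That is why the rounding choice there matters more than anything in the dot-product loop.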

7452 Commits