OpenBLAS

Commit Graph

Author	SHA1	Message	Date
Martin Kroeker	eaf7f825bd	Merge pull request #85 from xianyi/develop rebase	2020-09-17 13:42:47 +02:00
Martin Kroeker	4c10a1673d	Merge pull request #2840 from martin-frbg/fixup2833 Fix for cmake BUILD_ settings PR 2833	2020-09-16 18:55:50 +02:00
Martin Kroeker	c4aeeeb9f4	Activate all BUILD_ options if none was specified	2020-09-15 23:15:34 +02:00
Martin Kroeker	3843bd188c	Merge pull request #84 from xianyi/develop rebase	2020-09-15 23:13:30 +02:00
Martin Kroeker	ddec244a5a	Merge pull request #2838 from austinpagan/gordon_trmm Adding performance patch for trmm, just like trsm (#2836)	2020-09-15 21:17:48 +02:00
fossum	dfeca46098	Adding performance patch for trmm, just like #2836	2020-09-15 08:59:50 -05:00
Martin Kroeker	f8950f40a2	Merge pull request #2836 from austinpagan/gordon_trsm Fixing a performance bug in trsm_[LR].c.	2020-09-15 11:26:37 +02:00
fossum	274d6e015b	Fixing a performance bug in trsm_[LR].c.	2020-09-14 13:10:48 -05:00
Martin Kroeker	91c84e1c01	Merge pull request #2796 from Guobing-Chen/BF16_dot_coversion_apis Add bfloat16 based dot and conversion with single/double	2020-09-14 15:00:19 +02:00
Martin Kroeker	1ee1e7b495	Merge pull request #2833 from martin-frbg/issue2830 Make building the tests for individual data types conditional on the respective BUILD option	2020-09-14 07:24:23 +02:00
Martin Kroeker	ba644378dc	Copy BUILD_ options available to the compiler flags	2020-09-14 00:03:33 +02:00
Martin Kroeker	9e11c2d62f	Add BUILD_SINGLE etc	2020-09-13 23:55:11 +02:00
Martin Kroeker	4d250d0cdf	Rearrange ifdefs	2020-09-13 23:29:01 +02:00
Martin Kroeker	de139337b8	Remove spurious tests for complex ASUM and NRM2	2020-09-13 22:20:41 +02:00
Martin Kroeker	ec2948f147	Make tests conditional on BUILD_DOUBLE	2020-09-13 22:17:46 +02:00
Martin Kroeker	ce89398636	Make tests for individual variable types conditional on the respective BUILD_ option	2020-09-13 21:52:18 +02:00
Martin Kroeker	593ce9e237	Make building individual tests depend on BUILD_SINGLE etc defines	2020-09-13 21:50:12 +02:00
Martin Kroeker	74e358bcd5	Remove spurious complex16 tests	2020-09-13 21:49:01 +02:00
Martin Kroeker	26792d2096	Copy BUILD_* directives to the compiler options to allow ifdef in tests	2020-09-13 21:47:55 +02:00
Martin Kroeker	6b52c7e172	Merge pull request #2832 from martin-frbg/issue2831 Fix gfortran detection by vendor matching	2020-09-13 21:20:30 +02:00
Martin Kroeker	746ad3bd19	Fix vendor match for GCC gfortran	2020-09-13 18:40:59 +02:00
Martin Kroeker	55d4d470ec	Merge pull request #83 from xianyi/develop rebase	2020-09-13 18:30:11 +02:00
Martin Kroeker	a270894730	Merge pull request #2829 from mhillenibm/clang_s390x Fix DYNAMIC_ARCH=1 with clang s390x	2020-09-08 23:36:41 +02:00
Marius Hillenbrand	047b8d7aff	Add an s390 build with clang to the Travis configuration Since clang builds have been fixed on s390x, including support for DYNAMIC_ARCH, cover that build type in Travis. Explicitly request Ubuntu 20.04 (codename focal) to get a recent LLVM/clang version 10.x and thereby cover all s390x architecture generations supported in OpenBLAS. Ubuntu 18.10's LLVM/clang 6.x cannot build the inline assembly in some of the Z13 and Z14 kernels. LLVM/clang currently does not support OpenMP on s390x, so disable that in the build. Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>	2020-09-08 20:59:06 +02:00
Marius Hillenbrand	f7731a358a	Update CONTRIBUTERS.md - clang build fixes for IBM z Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>	2020-09-08 19:34:18 +02:00
Marius Hillenbrand	a55fe06f25	s390x/DYNAMIC_ARCH: define a HW_CAP flag to support slightly older glibc versions Enable building DYNAMIC_ARCH support with older versions of glibc that do not know about the hwcap flag HWCAP_S390_VXE yet. Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>	2020-09-08 19:34:18 +02:00
Marius Hillenbrand	4f34bcfb5e	s390x/DYNAMIC_ARCH: pass supported arch levels from Makefile to run-time code ... instead of duplicating the (old) mechanism from the Makefile that aimed to derive supported architecture generations from the gcc version. To enable builds with DYNAMIC_ARCH with older compiler releases, the Makefile and drivers/other/dynamic_arch.c need a common view of the architecture support built into the library. We follow the notation from x86 when used with DYNAMIC_LIST, where defines DYN_<ARCH NAME> denote support for a given generation to be built in. Since there are far fewer architecture generations in OpenBLAS for s390x, that does not bloat command lines too much. Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>	2020-09-08 19:34:18 +02:00
Marius Hillenbrand	0629d8ebdb	s390x/DYNAMIC_ARCH: generalize detecting supported archs for clang Simplify detection of which kernels we can compile on s390x. Instead of decoding the gcc version in a complicated manner, just check if CC supports a given -march=archXY flag. Together with the next patch, we thereby gain support for builds with LLVM/clang with DYNAMIC_ARCH=1. Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>	2020-09-08 19:34:18 +02:00
Martin Kroeker	15da2f9acb	Merge pull request #2828 from martin-frbg/lapack438 Correct xLASET arguments in LAPACK EIG tests	2020-09-08 10:25:19 +02:00
Martin Kroeker	7d9c77f421	Correct dimension argument to xLASET from Reference-LAPACK PR 438	2020-09-07 22:03:46 +02:00
Martin Kroeker	c8f029a518	Merge pull request #82 from xianyi/develop rebase	2020-09-07 21:59:13 +02:00
Martin Kroeker	e72430fe46	Merge pull request #2803 from xiegengxin/AVX2-asum Implementation of dasum, sasum with AVX2 & AVX512 intrinsic	2020-09-06 18:32:15 +02:00
Martin Kroeker	6e0f6c5f00	Merge pull request #2824 from martin-frbg/asumbench Use POSIX2001 clock_gettime in asum benchmark if available	2020-09-06 10:05:47 +02:00
Martin Kroeker	6f8fad87c5	Use POSIX2001 clock_gettime for higher resolution	2020-09-05 19:44:01 +02:00
Martin Kroeker	ed0f2d3dd7	Merge pull request #2816 from martin-frbg/silicon Add basic support for Apple Vortex (ARM64) cpu	2020-09-05 19:17:59 +02:00
Martin Kroeker	43a31b7786	Merge pull request #2823 from martin-frbg/fix2778 Improve fix for lapack-test EIG/cchkhb2stg from PR 2778	2020-09-05 17:29:38 +02:00
Martin Kroeker	8a2a137a9e	Correct argument to SLASET (improves fix from PR 2778). As explained by serguei-patchkovskii in Reference-LAPACK/lapack#438, passing in an index of 1 instead of N leads to a standards violation accessing matrix A in SLASET, i.e. undefined behavior	2020-09-05 13:06:31 +02:00
Martin Kroeker	0d1f30a297	Merge pull request #81 from xianyi/develop rebase	2020-09-05 12:47:03 +02:00
Martin Kroeker	70a254d507	Merge pull request #2822 from martin-frbg/issue2821 Fix potential domain error in sqrt	2020-09-05 12:39:32 +02:00
Martin Kroeker	330044d821	Fix potential domain error in sqrt	2020-09-05 09:44:33 +02:00
Martin Kroeker	97636b2c8a	Merge pull request #2819 from h-vetinari/carry_lapack_437 Carry lapack#437	2020-09-04 23:50:43 +02:00
Martin Kroeker	4d36711547	Merge pull request #2820 from RajalakshmiSR/clang POWER9: Fix mcpu option with clang	2020-09-04 23:09:31 +02:00
Rajalakshmi Srinivasaraghavan	718f67421a	POWER9: Fix mcpu option with clang Adding check for compiler type before checking GCC version in Makefile. This allows clang to use power9 instead of power8 when CORE is POWER9.	2020-09-04 10:36:19 -05:00
H. Vetinari	3426519ae2	adapt ?ggsv?-functions to ambient code style in LAPACKE/include/lapack.h	2020-09-04 17:33:24 +02:00
H. Vetinari	1c6c71fa85	Follow-up to lapack#434 & lapack#409: add missing 'const' in signatures Based on how the surrounding functions in lapack.h are handling the parameters, particularly the ?ggsv?3-variants of the affected functions	2020-09-04 17:33:11 +02:00
H. Vetinari	860247b5da	Follow-up to lapack#434 & lapack#409: fix signature mismatches	2020-09-04 17:32:53 +02:00
Martin Kroeker	c61771e335	Merge pull request #2778 from martin-frbg/lapackeig Fix various wrong calls to SLASET/DLASET in the EIG part of the LAPACK testsuite	2020-09-04 10:06:02 +02:00
Chen, Guobing	deaeb6c5b8	Add bfloat16 based dot and conversion with single/double 1. Added bfloat16 based dot as new API: shdot 2. Implemented generic kernel and cooperlake-specific (AVX512-BF16) kernel for shdot 3. Added 4 conversion APIs for bfloat16 data type <=> single/double: shstobf16 shdtobf16 sbf16tos dbf16tod shstobf16 -- convert single float array to bfloat16 array shdtobf16 -- convert double float array to bfloat16 array sbf16tos -- convert bfloat16 array to single float array dbf16tod -- convert bfloat16 array to double float array 4. Implemented generic kernels for all 4 conversion APIs, and cooperlake-specific kernel for shstobf16 and shdtobf16 5. Update level1 thread facilitate functions and macros to support multi-threading for these new APIs 6. Fix Cooperlake platform detection/specify issue when under dynamic-arch building 7. Change the typedef of bfloat16 from unsigned short to more strict uint16_t Signed-off-by: Chen, Guobing <guobing.chen@intel.com>	2020-09-04 02:31:25 +08:00
Martin Kroeker	c7ef7174e4	Merge pull request #2817 from martin-frbg/lapack436 LAPACKE: fix declaration of work arrays in [cz]gesvdq	2020-09-03 17:10:23 +02:00
Martin Kroeker	775a87242d	Rename KERNEL.SILICON to KERNEL.VORTEX	2020-09-03 08:44:20 +02:00


5876 Commits