OpenBLAS

Commit Graph

Author	SHA1	Message	Date
Martin Kroeker	186368ddc3	Fix compilation with CLANG	2021-03-16 16:52:57 +01:00
Martin Kroeker	1a3ad4b670	Fix signatures of the TLS-mode dll_callback and p_process_term functions for Win64	2021-02-22 19:40:36 +01:00
Peter Hawkins	dbbf92c1d1	Fix race in blas_thread_shutdown. blas_server_avail was read without holding server_lock. If multiple threads call blas_thread_shutdown simultaneously, for example, by calling fork(), then they can attempt to shut down multiple times. This can lead to a segmentation fault.	2021-02-18 13:46:50 -05:00
Martin Kroeker	cb429d6b12	Merge pull request #3110 from martin-frbg/issue3108 Fix get_num_procs() in the USE_TLS branch for non-glibc systems	2021-02-18 15:45:25 +01:00
Martin Kroeker	b0bded3f2f	Fix get_num_procs() in the USE_TLS branch for non-glibc systems	2021-02-18 11:14:05 +01:00
Martin Kroeker	e4e5042e38	Recognize Intel Tiger Lake as SkylakeX	2021-02-11 20:17:11 +01:00
Martin Kroeker	0cc36770f1	Merge pull request #3073 from xoviat/embedded add embedded option	2021-01-31 18:02:41 +01:00
Martin Kroeker	eea0c0f2ed	Merge pull request #3085 from alexhenrie/memory_alloc Fix null pointer check in blas_memory_alloc	2021-01-26 20:11:42 +01:00
Martin Kroeker	0cb9e9fc8d	Remove the VORTEX support bits again for now	2021-01-25 19:02:21 +01:00
Alex Henrie	113840da12	Fix null pointer check in blas_memory_alloc	2021-01-24 22:20:44 -07:00
Martin Kroeker	deb2e66bcc	Add DYNAMIC_LIST support for ARM64	2021-01-24 23:18:52 +01:00
xoviat	2e8d6e8690	add functions for embedded	2021-01-23 22:12:17 -06:00
Martin Kroeker	b94dab5250	patch to support power10 in builtin_cpu_is was backported to gcc 10.2, so allow that as well	2021-01-20 21:34:36 +01:00
Martin Kroeker	63fa3c3f8f	Require gcc 11 for builtin_cpu_is(power10) fixes #3074	2021-01-20 15:41:04 +01:00
xoviat	b60de4447a	add cortex-m platform	2021-01-19 08:57:44 -06:00
Martin Kroeker	2c445be8ba	Merge pull request #3051 from martin-frbg/rocketlake Add CPUID information for Intel Rocket Lake	2021-01-14 15:56:25 +01:00
Martin Kroeker	6fe0f1fab9	Label get_cpu_ftr as volatile to keep gcc from rearranging the code	2021-01-11 19:05:29 +01:00
Martin Kroeker	17c16f2a71	Implement builtin_cpu_is and limit cpu choices to P8 and P9 for NVIDIA compilers	2020-12-19 23:21:22 +01:00
Martin Kroeker	865676682d	Add Intel Rocket Lake	2020-12-14 22:40:23 +01:00
Martin Kroeker	6232237dba	Make fallback from P10 to P9 conditional on suitable compiler	2020-12-11 23:41:17 +01:00
Martin Kroeker	18d8a67485	Merge pull request #2994 from antonblanchard/power10-fixes Power10 fixes	2020-12-11 23:37:30 +01:00
gxw	4b548857d6	Add msa support for loongson: 1. Using core loongson3r3 and loongson3r4 for loongson. 2. Add DYNAMIC_ARCH for loongson. Change-Id: I1c6b54dbeca3a0cc31d1222af36a7e9bd6ab54c1	2020-12-09 10:28:46 +08:00
Martin Kroeker	bc5b1ddf0d	Merge pull request #3004 from martin-frbg/bsd_getauxval ARM64 DYNAMIC_ARCH build fix for BSD/OSX	2020-11-23 08:35:12 +01:00
Martin Kroeker	e7bf8ced6c	Build fix for systems that do not support getauxval	2020-11-22 20:20:28 +01:00
Martin Kroeker	5fa305172a	Use ifeq instead of ifdef for user-definable options	2020-11-22 16:29:56 +01:00
Alexander Grund	60005eb47b	Don't overwrite blas_thread_buffer if already set. After a fork it is possible that blas_thread_buffer has already allocated memory buffers: goto_set_num_threads does allocate those already and it may be called by num_cpu_avail in case the OpenBLAS NUM_THREADS differs from the OMP num threads. This leads to a memory leak which can cause subsequent execution of BLAS kernels to fail. Fixes #2993	2020-11-19 14:51:51 +01:00
Anton Blanchard	043f3d6faa	POWER10: Use POWER9 as a fallback. If the toolchain is too old, or the mma feature isn't set on a POWER10, fall back to the POWER9 loops.	2020-11-19 21:04:10 +11:00
Martin Kroeker	ff16329cb7	Merge pull request #2972 from xiegengxin/rot-intrinsic Improve the performance of rot by using AVX512 and AVX2 intrinsic	2020-11-08 22:43:00 +01:00
Gengxin Xie	d9ba49165a	Improve the performance of rot by using AVX512 and AVX2 intrinsic	2020-11-05 15:12:36 +08:00
Martin Kroeker	aa21cb5217	Merge pull request #2960 from thrasibule/avx2_detection fix avx2 detection	2020-10-31 20:24:21 +01:00
Guillaume Horel	1f564d729b	fix avx2 detection; reword commits to make it clearer	2020-10-31 10:00:48 -04:00
Chen, Guobing	a7b1f9b1bb	Implementation of BF16 based gemv: 1. Add a new API -- sbgemv to support bfloat16 based gemv. 2. Implement a generic kernel for sbgemv. 3. Implement an avx512-bf16 based kernel for sbgemv. Signed-off-by: Chen, Guobing <guobing.chen@intel.com>	2020-10-29 02:08:23 +08:00
Martin Kroeker	2207a16235	Merge pull request #2952 from martin-frbg/issue2931 Try to read cpu ID from /sys/devices/.../cpu0 if HWCAP_CPUID fails	2020-10-28 09:37:32 +01:00
Martin Kroeker	b937d78a6d	Try to read cpu information from /sys/devices/system/cpu/cpu0 if HWCAP_CPUID fails	2020-10-27 17:51:32 +01:00
Martin Kroeker	fd7da56965	Move definitions that are neither needed nor supported on SUNOS	2020-10-25 12:01:50 +01:00
Martin Kroeker	ff65952e46	Move HAVE_P10_SUPPORT to the build system to be able to include a binutils version check	2020-10-20 00:55:41 +02:00
Martin Kroeker	85154c2e18	Change "HALF" and "sh" to "BFLOAT16" and "sb"	2020-10-12 00:05:05 +02:00
Martin Kroeker	ac653c94f3	Merge branch 'develop' into issue2588-cmake	2020-10-11 13:57:07 +02:00
Martin Kroeker	f032d8966e	Merge pull request #2874 from Flamefire/memory_fixes Avoid out of bounds access on invalid memory free	2020-10-04 15:16:51 +02:00
Martin Kroeker	f6e4cf2f9d	Merge pull request #2876 from Flamefire/omp_fork_fix Lazily reinit threads after a fork in OMP mode	2020-10-03 22:52:17 +02:00
User User-User	d2333e7842	aarch64 fix std=c18 compilation	2020-10-03 18:00:34 +03:00
Alexander Grund	3094fc6c83	Lazily reinit threads after a fork in OMP mode. This initializes the per-thread memory buffers which get cleared/released on a fork via pthread_at_fork. Not doing so leads to each thread calling blas_memory_alloc on almost every execution, which slows down the code significantly as the threads race for the memory allocation, using locks to serialize that.	2020-10-01 15:41:42 +02:00
Alexander Grund	3c05f54df8	Avoid out of bounds access on invalid memory free	2020-10-01 10:48:45 +02:00
Alexander Grund	dee7c49938	Fix TABs and trailing space	2020-10-01 10:43:16 +02:00
Martin Kroeker	896bbd55e1	Add support for building only selected variable types	2020-09-26 23:25:55 +02:00
Martin Kroeker	357bff06b5	Add BUILD_vartype defines	2020-09-22 23:24:22 +02:00
Martin Kroeker	91c84e1c01	Merge pull request #2796 from Guobing-Chen/BF16_dot_coversion_apis Add bfloat16 based dot and conversion with single/double	2020-09-14 15:00:19 +02:00
Marius Hillenbrand	a55fe06f25	s390x/DYNAMIC_ARCH: define a HW_CAP flag to support slightly older glibc versions. Enable building DYNAMIC_ARCH support with older versions of glibc that do not know about the hwcap flag HWCAP_S390_VXE yet. Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>	2020-09-08 19:34:18 +02:00
Marius Hillenbrand	4f34bcfb5e	s390x/DYNAMIC_ARCH: pass supported arch levels from Makefile to run-time code ... instead of duplicating the (old) mechanism from the Makefile that aimed to derive supported architecture generations from the gcc version. To enable builds with DYNAMIC_ARCH with older compiler releases, the Makefile and drivers/other/dynamic_arch.c need a common view of the architecture support built into the library. We follow the notation from x86 when used with DYNAMIC_LIST, where defines DYN_<ARCH NAME> denote support for a given generation to be built in. Since there are far fewer architecture generations in OpenBLAS for s390x, that does not bloat command lines too much. Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>	2020-09-08 19:34:18 +02:00
Chen, Guobing	deaeb6c5b8	Add bfloat16 based dot and conversion with single/double. 1. Added bfloat16 based dot as new API: shdot. 2. Implemented generic kernel and cooperlake-specific (AVX512-BF16) kernel for shdot. 3. Added 4 conversion APIs for bfloat16 data type <=> single/double (shstobf16, shdtobf16, sbf16tos, dbf16tod): shstobf16 -- convert single float array to bfloat16 array; shdtobf16 -- convert double float array to bfloat16 array; sbf16tos -- convert bfloat16 array to single float array; dbf16tod -- convert bfloat16 array to double float array. 4. Implemented generic kernels for all 4 conversion APIs, and cooperlake-specific kernel for shstobf16 and shdtobf16. 5. Update level1 thread facilitate functions and macros to support multi-threading for these new APIs. 6. Fix Cooperlake platform detection/specify issue when under dynamic-arch building. 7. Change the typedef of bfloat16 from unsigned short to more strict uint16_t. Signed-off-by: Chen, Guobing <guobing.chen@intel.com>	2020-09-04 02:31:25 +08:00


419 Commits