OpenBLAS

Commit Graph

Author	SHA1	Message	Date
Martin Kroeker	c8d05aa7a5	Move the threads overflow flag under the protection of the local blas lock (#3476) * Move accesses to the overflow flag into the scope of the blas lock	2021-12-13 08:34:52 +01:00
Rafael Cardoso Fernandes Sousa	214fbcee15	Fix cmake for power	2021-12-09 08:28:17 -06:00
Martin Kroeker	4f057bffd6	Fix NULL pointer checks in blas_memory_alloc	2021-11-05 10:43:17 +01:00
Martin Kroeker	08f8bb66c0	Add CPUIDs for Alder Lake and other recent Intel cpus	2021-11-04 20:36:39 +01:00
Martin Kroeker	efb16fafb0	Fix miscounting of threadpool size on Linux with OMP_PROC_BIND=TRUE (#3437) * return OMP places (if available, or SC_NPROCESSORS_CONF) for maximum thread count when built with OpenMP	2021-11-04 12:11:16 +01:00
Marius Hillenbrand	77747bc536	cpuid_zarch/hwcaps: add documentation and dump hwcaps in init Add pointers to the definition of the hardware capability flags in glibc and describe how they relate to the levels CPU_Z13 and CPU_Z14 for optimized kernels. To aid identifying available hardware capabilities and in debugging potential build issues, dump their value in dynamic_arch_init() when OPENBLAS_VERBOSE is set to 2 or higher. Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>	2021-10-28 12:08:48 +02:00
Martin Kroeker	22a616bd8f	Add model number for Tiger Lake H (mobile variant)	2021-10-27 22:17:58 +02:00
Marius Hillenbrand	44950ca173	s390x: use DYNAMIC_ARCH's cpu detection for compile-time choice On s390x, the run-time detection for DYNAMIC_ARCH and the compile-time choice in cpuid_zarch use different methods for identifying the supported CPU features. To make cpuid_zarch future-proof and both easier to maintain, switch cpuid_zarch to the same mechanism as DYNAMIC_ZARCH (i.e., derive the supported CPU features from hwcap flags) and share code between both (in a new header cpuid_zarch.h). Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>	2021-10-26 16:19:14 +02:00
Wangyang Guo	3dc6052c7e	initial support for Sapphire Rapids platform	2021-10-12 01:30:40 -07:00
Rafael Cardoso Fernandes Sousa	0e8b4adf22	Remove unused commented code (#if directive)	2021-09-15 22:18:48 +00:00
Martin Kroeker	dd09f0173e	Remove extraneous qualifiers from struct definition	2021-09-14 21:52:26 +02:00
Wangyang Guo	045ed5c91d	sbgemm: fix build error in BFLOAT16 disabled	2021-09-07 23:37:08 +08:00
Wangyang Guo	8356a604f0	sbgemm: cooperlake: tuning for block params	2021-09-07 21:30:46 +08:00
Martin Kroeker	cd10d1c03b	Fix typo	2021-08-30 14:38:28 +02:00
Martin Kroeker	2db1a99aca	Clean up debug messages	2021-08-30 14:21:25 +02:00
Martin Kroeker	89fc5b8f4f	Fix unmap logic	2021-08-29 19:50:24 +02:00
Martin Kroeker	7fd12a5e69	Add likely() hints for gcc	2021-08-29 13:54:51 +02:00
Martin Kroeker	2ba9a567aa	Fix typo	2021-08-28 17:14:59 +02:00
Martin Kroeker	b4b952eece	Add auxiliary tracking space for thread buffer frees too	2021-08-28 17:03:53 +02:00
Martin Kroeker	7d1becc575	Allocate an auxiliary struct when running out of preconfigured threads	2021-08-28 14:18:36 +02:00
Martin Kroeker	898212efcd	Actually add the message to the TLS section	2021-08-02 14:50:14 +02:00
Martin Kroeker	210a1584c5	Rebase source and edit TLS version of the message as well	2021-08-02 14:19:16 +02:00
Martin Kroeker	f2a7a67f5a	Improve the "tried to allocate too many buffers" error message	2021-07-31 17:23:40 +02:00
Craig Watson	4d7dfe4845	Include Haiku in processor count checks	2021-07-27 09:00:30 +00:00
JonasZhou	0fca36c8c3	Add cpu detection support for Zhaoxin processors Signed-off-by: JonasZhou <JonasZhou@zhaoxin.com>	2021-07-12 13:43:45 +08:00
River Dillon	2f6326a630	Remove <linux/unistd.h>	2021-07-10 00:36:07 -07:00
Martin Kroeker	8f22ac552b	Add vendor string Shanghai as successor to Centaur	2021-07-08 18:28:49 +02:00
Martin Kroeker	eb2fdd3af0	Recognize newer Zhaoxin/Centaur processors as Nehalem	2021-07-08 12:23:15 +02:00
User User-User	750719528a	bugz	2021-06-20 16:40:43 +02:00
User User-User	6423b282a1	dynamic_arch	2021-06-20 14:19:41 +02:00
Martin Kroeker	cbfd3c87e1	Recognize Intel Ice Lake SP as Cooper Lake	2021-05-14 20:44:06 +02:00
Martin Kroeker	623d580b4c	Restore __volatile__ keyword	2021-04-16 10:27:32 +02:00
Martin Kroeker	186368ddc3	Fix compilation with CLANG	2021-03-16 16:52:57 +01:00
Martin Kroeker	1a3ad4b670	Fix signatures of the TLS-mode dll_callback and p_process_term functions for Win64	2021-02-22 19:40:36 +01:00
Peter Hawkins	dbbf92c1d1	Fix race in blas_thread_shutdown. blas_server_avail was read without holding server_lock. If multiple threads call blas_thread_shutdown simultaneously, for example, by calling fork(), then they can attempt to shut down multiple times. This can lead to a segmentation fault.	2021-02-18 13:46:50 -05:00
Martin Kroeker	cb429d6b12	Merge pull request #3110 from martin-frbg/issue3108 Fix get_num_procs() in the USE_TLS branch for non-glibc systems	2021-02-18 15:45:25 +01:00
Martin Kroeker	b0bded3f2f	Fix get_num_procs() in the USE_TLS branch for non-glibc systems	2021-02-18 11:14:05 +01:00
Martin Kroeker	e4e5042e38	Recognize Intel Tiger Lake as SkylakeX	2021-02-11 20:17:11 +01:00
Martin Kroeker	0cc36770f1	Merge pull request #3073 from xoviat/embedded add embedded option	2021-01-31 18:02:41 +01:00
Martin Kroeker	eea0c0f2ed	Merge pull request #3085 from alexhenrie/memory_alloc Fix null pointer check in blas_memory_alloc	2021-01-26 20:11:42 +01:00
Martin Kroeker	0cb9e9fc8d	Remove the VORTEX support bits again for now	2021-01-25 19:02:21 +01:00
Alex Henrie	113840da12	Fix null pointer check in blas_memory_alloc	2021-01-24 22:20:44 -07:00
Martin Kroeker	deb2e66bcc	Add DYNAMIC_LIST support for ARM64	2021-01-24 23:18:52 +01:00
xoviat	2e8d6e8690	add functions for embedded	2021-01-23 22:12:17 -06:00
Martin Kroeker	b94dab5250	patch to support power10 in builtin_cpu_is was backported to gcc 10.2, so allow that as well	2021-01-20 21:34:36 +01:00
Martin Kroeker	63fa3c3f8f	Require gcc 11 for builtin_cpu_is(power10) fixes #3074	2021-01-20 15:41:04 +01:00
xoviat	b60de4447a	add cortex-m platform	2021-01-19 08:57:44 -06:00
Martin Kroeker	2c445be8ba	Merge pull request #3051 from martin-frbg/rocketlake Add CPUID information for Intel Rocket Lake	2021-01-14 15:56:25 +01:00
Martin Kroeker	6fe0f1fab9	Label get_cpu_ftr as volatile to keep gcc from rearranging the code	2021-01-11 19:05:29 +01:00
Martin Kroeker	17c16f2a71	Implement builtin_cpu_is and limit cpu choices to P8 and P9 for NVIDIA compilers	2020-12-19 23:21:22 +01:00
Martin Kroeker	865676682d	Add Intel Rocket Lake	2020-12-14 22:40:23 +01:00
Martin Kroeker	6232237dba	Make fallback from P10 to P9 conditional on suitable compiler	2020-12-11 23:41:17 +01:00
Martin Kroeker	18d8a67485	Merge pull request #2994 from antonblanchard/power10-fixes Power10 fixes	2020-12-11 23:37:30 +01:00
gxw	4b548857d6	Add msa support for loongson 1. Using core loongson3r3 and loongson3r4 for loongson 2. Add DYNAMIC_ARCH for loongson Change-Id: I1c6b54dbeca3a0cc31d1222af36a7e9bd6ab54c1	2020-12-09 10:28:46 +08:00
Martin Kroeker	bc5b1ddf0d	Merge pull request #3004 from martin-frbg/bsd_getauxval ARM64 DYNAMIC_ARCH build fix for BSD/OSX	2020-11-23 08:35:12 +01:00
Martin Kroeker	e7bf8ced6c	Build fix for systems that do not support getauxval	2020-11-22 20:20:28 +01:00
Martin Kroeker	5fa305172a	Use ifeq instead of ifdef for user-definable options	2020-11-22 16:29:56 +01:00
Alexander Grund	60005eb47b	Don't overwrite blas_thread_buffer if already set After a fork it is possible that blas_thread_buffer has already allocated memory buffers: goto_set_num_threads does allocate those already and it may be called by num_cpu_avail in case the OpenBLAS NUM_THREADS differ from the OMP num threads. This leads to a memory leak which can cause subsequent execution of BLAS kernels to fail. Fixes #2993	2020-11-19 14:51:51 +01:00
Anton Blanchard	043f3d6faa	POWER10: Use POWER9 as a fallback If the toolchain is too old, or the mma feature isn't set on a POWER10, fall back to the POWER9 loops.	2020-11-19 21:04:10 +11:00
Martin Kroeker	ff16329cb7	Merge pull request #2972 from xiegengxin/rot-intrinsic Improve the performance of rot by using AVX512 and AVX2 intrinsic	2020-11-08 22:43:00 +01:00
Gengxin Xie	d9ba49165a	Improve the performance of rot by using AVX512 and AVX2 intrinsic	2020-11-05 15:12:36 +08:00
Martin Kroeker	aa21cb5217	Merge pull request #2960 from thrasibule/avx2_detection fix avx2 detection	2020-10-31 20:24:21 +01:00
Guillaume Horel	1f564d729b	fix avx2 detection reword commits to make it clearer	2020-10-31 10:00:48 -04:00
Chen, Guobing	a7b1f9b1bb	Implementation of BF16 based gemv 1. Add a new API -- sbgemv to support bfloat16 based gemv 2. Implement a generic kernel for sbgemv 3. Implement an avx512-bf16 based kernel for sbgemv Signed-off-by: Chen, Guobing <guobing.chen@intel.com>	2020-10-29 02:08:23 +08:00
Martin Kroeker	2207a16235	Merge pull request #2952 from martin-frbg/issue2931 Try to read cpu ID from /sys/devices/.../cpu0 if HWCAP_CPUID fails	2020-10-28 09:37:32 +01:00
Martin Kroeker	b937d78a6d	Try to read cpu information from /sys/devices/system/cpu/cpu0 if HWCAP_CPUID fails	2020-10-27 17:51:32 +01:00
Martin Kroeker	fd7da56965	Move definitions that are neither needed nor supported on SUNOS	2020-10-25 12:01:50 +01:00
Martin Kroeker	ff65952e46	Move HAVE_P10_SUPPORT to the build system to be able to include a binutils version check	2020-10-20 00:55:41 +02:00
Martin Kroeker	85154c2e18	Change "HALF" and "sh" to "BFLOAT16" and "sb"	2020-10-12 00:05:05 +02:00
Martin Kroeker	ac653c94f3	Merge branch 'develop' into issue2588-cmake	2020-10-11 13:57:07 +02:00
Martin Kroeker	f032d8966e	Merge pull request #2874 from Flamefire/memory_fixes Avoid out of bounds access on invalid memory free	2020-10-04 15:16:51 +02:00
Martin Kroeker	f6e4cf2f9d	Merge pull request #2876 from Flamefire/omp_fork_fix Lazily reinit threads after a fork in OMP mode	2020-10-03 22:52:17 +02:00
User User-User	d2333e7842	aarch64 fix std=c18 compilation	2020-10-03 18:00:34 +03:00
Alexander Grund	3094fc6c83	Lazily reinit threads after a fork in OMP mode This initializes the per-thread memory buffers which get cleared/released on a fork via pthread_at_fork. Not doing so leads to each thread calling blas_memory_alloc on almost every execution which slows down the code significantly as the threads race for the memory allocation using locks to serialize that.	2020-10-01 15:41:42 +02:00
Alexander Grund	3c05f54df8	Avoid out of bounds access on invalid memory free	2020-10-01 10:48:45 +02:00
Alexander Grund	dee7c49938	Fix TABs and trailing space	2020-10-01 10:43:16 +02:00
Martin Kroeker	896bbd55e1	Add support for building only selected variable types	2020-09-26 23:25:55 +02:00
Martin Kroeker	357bff06b5	Add BUILD_vartype defines	2020-09-22 23:24:22 +02:00
Martin Kroeker	91c84e1c01	Merge pull request #2796 from Guobing-Chen/BF16_dot_coversion_apis Add bfloat16 based dot and conversion with single/double	2020-09-14 15:00:19 +02:00
Marius Hillenbrand	a55fe06f25	s390x/DYNAMIC_ARCH: define a HW_CAP flag to support slightly older glibc versions Enable building DYNAMIC_ARCH support with older versions of glibc that do not know about the hwcap flag HWCAP_S390_VXE yet. Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>	2020-09-08 19:34:18 +02:00
Marius Hillenbrand	4f34bcfb5e	s390x/DYNAMIC_ARCH: pass supported arch levels from Makefile to run-time code ... instead of duplicating the (old) mechanism from the Makefile that aimed to derive supported architecture generations from the gcc version. To enable builds with DYNAMIC_ARCH with older compiler releases, the Makefile and drivers/other/dynamic_arch.c need a common view of the architecture support built into the library. We follow the notation from x86 when used with DYNAMIC_LIST, where defines DYN_<ARCH NAME> denote support for a given generation to be built in. Since there are far fewer architecture generations in OpenBLAS for s390x, that does not bloat command lines too much. Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>	2020-09-08 19:34:18 +02:00
Chen, Guobing	deaeb6c5b8	Add bfloat16 based dot and conversion with single/double 1. Added bfloat16 based dot as new API: shdot 2. Implemented generic kernel and cooperlake-specific (AVX512-BF16) kernel for shdot 3. Added 4 conversion APIs for bfloat16 data type <=> single/double: shstobf16 shdtobf16 sbf16tos dbf16tod shstobf16 -- convert single float array to bfloat16 array shdtobf16 -- convert double float array to bfloat16 array sbf16tos -- convert bfloat16 array to single float array dbf16tod -- convert bfloat16 array to double float array 4. Implemented generic kernels for all 4 conversion APIs, and cooperlake-specific kernel for shstobf16 and shdtobf16 5. Update level1 thread facilitate functions and macros to support multi-threading for these new APIs 6. Fix Cooperlake platform detection/specify issue when under dynamic-arch building 7. Change the typedef of bfloat16 from unsigned short to more strict uint16_t Signed-off-by: Chen, Guobing <guobing.chen@intel.com>	2020-09-04 02:31:25 +08:00
Chen, Guobing	0c1c903f1e	Fix OMP num specify issue In current code, no matter what number of threads specified, all available CPU count is used when invoking OMP, which leads to very bad performance if the workload is small while all available CPUs are big. Lots of time are wasted on inter-thread sync. Fix this issue by really using the number specified by the variable 'num' from calling API. Signed-off-by: Chen, Guobing <guobing.chen@intel.com>	2020-08-24 02:45:54 +08:00
Chen, Guobing	e740c4873d	Enable COOPERLAKE build target Enable new build target platform -- COOPERLAKE. This target platform supports all the SKYLAKEX supported ISAs + avx512bf16. So all the SKYLAKEX specific kernels/drivers and related code are now extended to be also active on COOPERLAKE. Besides, new BF16 related kernels are active under this target.	2020-08-13 06:18:00 +08:00
Martin Kroeker	60cd5e55fc	Protect against inadvertent activation of USE_CUDA	2020-08-01 12:31:39 +02:00
Martin Kroeker	7c02f4b1f7	Merge pull request #2744 from martin-frbg/issue2738 Add AMD Renoir/Matisse cpu autodetection and preliminary support for Zen3	2020-07-28 19:32:04 +02:00
Martin Kroeker	12918358aa	Add AMD Renoir/Matisse and preliminary support for Zen3 as Zen2 also support AMD family 22 Jaguar/Puma as Bobcat	2020-07-28 13:53:17 +00:00
Ashwin Sekhar T K	4e1be0e481	ARM64: Add THUNDERX3T110 Target	2020-07-26 23:32:24 -07:00
Martin Kroeker	09eb9d2584	Update conditional for atomics to HAVE_C11	2020-07-18 17:07:38 +00:00
Martin Kroeker	791e046744	Update conditional for atomics to use HAVE_C11	2020-07-18 17:05:59 +00:00
Martin Kroeker	94bab9d1f9	Update conditional for atomics to use HAVE_C11	2020-07-18 17:03:31 +00:00
Rajalakshmi Srinivasaraghavan	af1e140e35	Change minimum gcc version for POWER10 As the MMA patches for POWER10 are backported to gcc10.2, changing the minimum gcc version needed to build OpenBLAS for POWER10.	2020-07-09 21:46:06 -05:00
Rajalakshmi Srinivasaraghavan	45d819ca82	Changing mcpu option as power10 As compiler enabled mcpu option as power10, changing it from future.	2020-07-07 11:25:20 -05:00
Martin Kroeker	584ef8d4ae	Add support for Comet Lake H & S	2020-06-27 14:36:37 +02:00
Matthew Treinish	f37e941d52	Add support to driver/others/dynamic.c too	2020-06-25 11:56:49 -04:00
User User-User	e6b9275034	address vs2019 C4293	2020-06-24 09:12:23 +03:00
Martin Kroeker	6eaeb01263	Merge pull request #2658 from RajalakshmiSR/p10 powerpc: Add support for future processor	2020-06-23 00:02:37 +02:00
Martin Kroeker	007d9f97d7	Make gotoblas_corename report the name of the selected TARGET rather than its aliases	2020-06-13 19:25:28 +02:00
Rajalakshmi Srinivasaraghavan	9fe930f205	powerpc: Add support for future processor This is the initial patch to support build infrastructure for POWER10 architecture.	2020-06-11 15:47:20 -05:00
Marius Hillenbrand	0dbe61a612	s390x: choose SIMD kernels at run-time based on OS and compiler support Extend and simplify the run-time detection for dynamic architecture support for z to check HW_CAP and only use SIMD features if advertised by the OS. While at it, also honor the env variable LD_HWCAP_MASK and do not use the CPU features masked there. Note that we can only use the SIMD features on z13 or newer (i.e., Vector Facility or Vector-Enhancements Facilities) when the operating system supports properly context-switching the vector registers. The OS advertises that support as a bit in the HW_CAP value in the auxiliary vector. While all recent Linux kernels have that support, we should maintain compatibility with older versions that may still be in use. Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>	2020-05-12 11:01:16 +02:00


501 Commits