186368ddc3 
								
							 
						 
						
							
							
								
								Fix compilation with CLANG  
							
							
							
						 
						
							2021-03-16 16:52:57 +01:00  
				
					
						
							
							
								 
						
							
								1a3ad4b670 
								
							 
						 
						
							
							
								
								Fix signatures of the TLS-mode dll_callback and p_process_term functions for Win64  
							
							
							
						 
						
							2021-02-22 19:40:36 +01:00  
				
					
						
							
							
								 
						
							
								dbbf92c1d1 
								
							 
						 
						
							
							
								
								Fix race in blas_thread_shutdown.  
							
							
							
							
							blas_server_avail was read without holding server_lock. If multiple threads call blas_thread_shutdown simultaneously (for example, by calling fork()), each of them can attempt the shutdown, so it runs more than once. This can lead to a segmentation fault. 
							
						 
						
							2021-02-18 13:46:50 -05:00  
				
					
						
							
							
								 
						
							
								cb429d6b12 
								
							 
						 
						
							
							
								
								Merge pull request  #3110  from martin-frbg/issue3108  
							
							
							
							
							Fix get_num_procs()  in the USE_TLS branch for non-glibc systems 
							
						 
						
							2021-02-18 15:45:25 +01:00  
				
					
						
							
							
								 
						
							
								b0bded3f2f 
								
							 
						 
						
							
							
								
								Fix get_num_procs()  in the USE_TLS branch for non-glibc systems  
							
							
							
						 
						
							2021-02-18 11:14:05 +01:00  
				
					
						
							
							
								 
						
							
								e4e5042e38 
								
							 
						 
						
							
							
								
								Recognize Intel Tiger Lake as SkylakeX  
							
							
							
						 
						
							2021-02-11 20:17:11 +01:00  
				
					
						
							
							
								 
						
							
								0cc36770f1 
								
							 
						 
						
							
							
								
								Merge pull request  #3073  from xoviat/embedded  
							
							
							
							
							add embedded option 
							
						 
						
							2021-01-31 18:02:41 +01:00  
				
					
						
							
							
								 
						
							
								eea0c0f2ed 
								
							 
						 
						
							
							
								
								Merge pull request  #3085  from alexhenrie/memory_alloc  
							
							
							
							
							Fix null pointer check in blas_memory_alloc 
							
						 
						
							2021-01-26 20:11:42 +01:00  
				
					
						
							
							
								 
						
							
								0cb9e9fc8d 
								
							 
						 
						
							
							
								
								Remove the VORTEX support bits again for now  
							
							
							
						 
						
							2021-01-25 19:02:21 +01:00  
				
					
						
							
							
								 
						
							
								113840da12 
								
							 
						 
						
							
							
								
								Fix null pointer check in blas_memory_alloc  
							
							
							
						 
						
							2021-01-24 22:20:44 -07:00  
				
					
						
							
							
								 
						
							
								deb2e66bcc 
								
							 
						 
						
							
							
								
								Add DYNAMIC_LIST support for ARM64  
							
							
							
						 
						
							2021-01-24 23:18:52 +01:00  
				
					
						
							
							
								 
						
							
								2e8d6e8690 
								
							 
						 
						
							
							
								
								add functions for embedded  
							
							
							
						 
						
							2021-01-23 22:12:17 -06:00  
				
					
						
							
							
								 
						
							
								b94dab5250 
								
							 
						 
						
							
							
								
								patch to support power10 in builtin_cpu_is was backported to gcc 10.2, so allow that as well  
							
							
							
						 
						
							2021-01-20 21:34:36 +01:00  
				
					
						
							
							
								 
						
							
								63fa3c3f8f 
								
							 
						 
						
							
							
								
								Require gcc 11 for builtin_cpu_is(power10)  
							
							
							
							
							fixes  #3074  
						
							2021-01-20 15:41:04 +01:00  
				
					
						
							
							
								 
						
							
								b60de4447a 
								
							 
						 
						
							
							
								
								add cortex-m platform  
							
							
							
						 
						
							2021-01-19 08:57:44 -06:00  
				
					
						
							
							
								 
						
							
								2c445be8ba 
								
							 
						 
						
							
							
								
								Merge pull request  #3051  from martin-frbg/rocketlake  
							
							
							
							
							Add CPUID information for Intel Rocket Lake 
							
						 
						
							2021-01-14 15:56:25 +01:00  
				
					
						
							
							
								 
						
							
								6fe0f1fab9 
								
							 
						 
						
							
							
								
								Label get_cpu_ftr as volatile to keep gcc from rearranging the code  
							
							
							
						 
						
							2021-01-11 19:05:29 +01:00  
				
					
						
							
							
								 
						
							
								17c16f2a71 
								
							 
						 
						
							
							
								
								Implement builtin_cpu_is and limit cpu choices to P8 and P9 for NVIDIA compilers  
							
							
							
						 
						
							2020-12-19 23:21:22 +01:00  
				
					
						
							
							
								 
						
							
								865676682d 
								
							 
						 
						
							
							
								
								Add Intel Rocket Lake  
							
							
							
						 
						
							2020-12-14 22:40:23 +01:00  
				
					
						
							
							
								 
						
							
								6232237dba 
								
							 
						 
						
							
							
								
								Make fallback from P10 to P9 conditional on suitable compiler  
							
							
							
						 
						
							2020-12-11 23:41:17 +01:00  
				
					
						
							
							
								 
						
							
								18d8a67485 
								
							 
						 
						
							
							
								
								Merge pull request  #2994  from antonblanchard/power10-fixes  
							
							
							
							
							Power10 fixes 
							
						 
						
							2020-12-11 23:37:30 +01:00  
				
					
						
							
							
								 
						
							
								83de62c20d 
								
							 
						 
						
							
							
								
								Merge pull request  #3026  from martin-frbg/revert747  
							
							
							
							
							Revert PR747 - SYRK parameter changes for Haswell and related targets 
							
						 
						
							2020-12-10 16:29:41 +01:00  
				
					
						
							
							
								 
						
							
								4b548857d6 
								
							 
						 
						
							
							
								
								Add msa support for loongson  
							
							
							
							
							1. Using core loongson3r3 and loongson3r4 for loongson
2. Add DYNAMIC_ARCH for loongson
Change-Id: I1c6b54dbeca3a0cc31d1222af36a7e9bd6ab54c1 
							
						 
						
							2020-12-09 10:28:46 +08:00  
				
					
						
							
							
								 
						
							
								a554712439 
								
							 
						 
						
							
							
								
								remove extra/intermediate size step for min_jj introduced in PR747  
							
							
							
						 
						
							2020-12-08 21:01:36 +01:00  
				
					
						
							
							
								 
						
							
								5d26223f4a 
								
							 
						 
						
							
							
								
								remove extra/intermediate size step of min_jj from PR747  
							
							
							
						 
						
							2020-12-08 20:59:56 +01:00  
				
					
						
							
							
								 
						
							
								bc5b1ddf0d 
								
							 
						 
						
							
							
								
								Merge pull request  #3004  from martin-frbg/bsd_getauxval  
							
							
							
							
							ARM64 DYNAMIC_ARCH build fix for BSD/OSX 
							
						 
						
							2020-11-23 08:35:12 +01:00  
				
					
						
							
							
								 
						
							
								e7bf8ced6c 
								
							 
						 
						
							
							
								
								Build fix for systems that do not support getauxval  
							
							
							
						 
						
							2020-11-22 20:20:28 +01:00  
				
					
						
							
							
								 
						
							
								5fa305172a 
								
							 
						 
						
							
							
								
								Use ifeq instead of ifdef for user-definable options  
							
							
							
						 
						
							2020-11-22 16:29:56 +01:00  
				
					
						
							
							
								 
						
							
								d3ff1f889f 
								
							 
						 
						
							
							
								
								Convert ifndefs to ifneq  
							
							
							
						 
						
							2020-11-22 16:27:17 +01:00  
				
					
						
							
							
								 
						
							
								60005eb47b 
								
							 
						 
						
							
							
								
								Don't overwrite blas_thread_buffer if already set  
							
							
							
							
							After a fork it is possible that blas_thread_buffer already holds
allocated memory buffers: goto_set_num_threads allocates them,
and it may be called by num_cpu_avail when the OpenBLAS
NUM_THREADS differs from the OMP thread count.
Overwriting the buffers leads to a memory leak, which can cause subsequent execution of BLAS
kernels to fail.
Fixes  #2993  
							
						 
						
							2020-11-19 14:51:51 +01:00  
				
					
						
							
							
								 
						
							
								043f3d6faa 
								
							 
						 
						
							
							
								
								POWER10: Use POWER9 as a fallback  
							
							
							
							
							If the toolchain is too old, or the mma feature isn't set on a POWER10,
fall back to the POWER9 loops. 
							
						 
						
							2020-11-19 21:04:10 +11:00  
				
					
						
							
							
								 
						
							
								ff16329cb7 
								
							 
						 
						
							
							
								
								Merge pull request  #2972  from xiegengxin/rot-intrinsic  
							
							
							
							
							Improve the performance of rot by using AVX512 and AVX2 intrinsic 
							
						 
						
							2020-11-08 22:43:00 +01:00  
				
					
						
							
							
								 
						
							
								d9ba49165a 
								
							 
						 
						
							
							
								
								Improve the performance of rot by using AVX512 and AVX2 intrinsic  
							
							
							
						 
						
							2020-11-05 15:12:36 +08:00  
				
					
						
							
							
								 
						
							
								aa21cb5217 
								
							 
						 
						
							
							
								
								Merge pull request  #2960  from thrasibule/avx2_detection  
							
							
							
							
							fix avx2 detection 
							
						 
						
							2020-10-31 20:24:21 +01:00  
				
					
						
							
							
								 
						
							
								1f564d729b 
								
							 
						 
						
							
							
								
								fix avx2 detection  
							
							
							
							
							reword commits to make it clearer 
							
						 
						
							2020-10-31 10:00:48 -04:00  
				
					
						
							
							
								 
						
							
								a7b1f9b1bb 
								
							 
						 
						
							
							
								
								Implementation of BF16 based gemv  
							
							
							
							
							1. Add a new API -- sbgemv to support bfloat16 based gemv
2. Implement a generic kernel for sbgemv
3. Implement an avx512-bf16 based kernel for sbgemv
Signed-off-by: Chen, Guobing <guobing.chen@intel.com> 
							
						 
						
							2020-10-29 02:08:23 +08:00  
				
					
						
							
							
								 
						
							
								2207a16235 
								
							 
						 
						
							
							
								
								Merge pull request  #2952  from martin-frbg/issue2931  
							
							
							
							
							Try to read cpu ID from /sys/devices/.../cpu0 if HWCAP_CPUID fails 
							
						 
						
							2020-10-28 09:37:32 +01:00  
				
					
						
							
							
								 
						
							
								b937d78a6d 
								
							 
						 
						
							
							
								
								Try to read cpu information from /sys/devices/system/cpu/cpu0 if HWCAP_CPUID fails  
							
							
							
						 
						
							2020-10-27 17:51:32 +01:00  
				
					
						
							
							
								 
						
							
								fd7da56965 
								
							 
						 
						
							
							
								
								Move definitions that are neither needed nor supported on SUNOS  
							
							
							
						 
						
							2020-10-25 12:01:50 +01:00  
				
					
						
							
							
								 
						
							
								ff65952e46 
								
							 
						 
						
							
							
								
								Move HAVE_P10_SUPPORT to the build system  
							
							
							
							
							to be able to include a binutils version check 
							
						 
						
							2020-10-20 00:55:41 +02:00  
				
					
						
							
							
								 
						
							
								b5d30b390d 
								
							 
						 
						
							
							
								
								Fix build issues with bfloat16  
							
							
							
							
							This patch fixes compilation errors due to recent renaming from SH to SB
with BUILD_BFLOAT16. 
							
						 
						
							2020-10-13 11:00:22 -05:00  
				
					
						
							
							
								 
						
							
								006c7f6671 
								
							 
						 
						
							
							
								
								Change "HALF" and "sh" to "BFLOAT16" and "sb"  
							
							
							
						 
						
							2020-10-12 00:06:06 +02:00  
				
					
						
							
							
								 
						
							
								85154c2e18 
								
							 
						 
						
							
							
								
								Change "HALF" and "sh" to "BFLOAT16" and "sb"  
							
							
							
						 
						
							2020-10-12 00:05:05 +02:00  
				
					
						
							
							
								 
						
							
								887e00fd7f 
								
							 
						 
						
							
							
								
								Adapt for supporting only a subset of variable types  
							
							
							
						 
						
							2020-10-11 14:58:57 +02:00  
				
					
						
							
							
								 
						
							
								886a8e3190 
								
							 
						 
						
							
							
								
								Adapt for supporting only a subset of variable types  
							
							
							
						 
						
							2020-10-11 14:57:32 +02:00  
				
					
						
							
							
								 
						
							
								ac653c94f3 
								
							 
						 
						
							
							
								
								Merge branch 'develop' into issue2588-cmake  
							
							
							
						 
						
							2020-10-11 13:57:07 +02:00  
				
					
						
							
							
								 
						
							
								f032d8966e 
								
							 
						 
						
							
							
								
								Merge pull request  #2874  from Flamefire/memory_fixes  
							
							
							
							
							Avoid out-of-bounds access on invalid memory free 
							
						 
						
							2020-10-04 15:16:51 +02:00  
				
					
						
							
							
								 
						
							
								f6e4cf2f9d 
								
							 
						 
						
							
							
								
								Merge pull request  #2876  from Flamefire/omp_fork_fix  
							
							
							
							
							Lazily reinit threads after a fork in OMP mode 
							
						 
						
							2020-10-03 22:52:17 +02:00  
				
					
						
							
							
								 
						
							
								d2333e7842 
								
							 
						 
						
							
							
								
								aarch64 fix std=c18 compilation  
							
							
							
						 
						
							2020-10-03 18:00:34 +03:00  
				
					
						
							
							
								 
						
							
								3094fc6c83 
								
							 
						 
						
							
							
								
								Lazily reinit threads after a fork in OMP mode  
							
							
							
							
							This initializes the per-thread memory buffers, which get
cleared/released on a fork via pthread_atfork. Not doing so leads to
each thread calling blas_memory_alloc on almost every execution, which
slows down the code significantly because the threads contend for the
locks that serialize memory allocation. 
							
						 
						
							2020-10-01 15:41:42 +02:00