b5ba95a6c0 
								
							 
						 
						
							
							
								
								Modernize obsolete inline order  
							
							
							
						 
						
							2023-08-16 00:48:40 +02:00  
				
					
						
							
							
								 
						
							
								9773a9d6b3 
								
							 
						 
						
							
							
								
								undefine YIELDING for the Emscripten js converter  
							
							
							
						 
						
							2022-09-14 17:04:11 +02:00  
				
					
						
							
							
								 
						
							
								84a5f0e2eb 
								
							 
						 
						
							
							
								
								Fixes   #3743 .  
							
							
							
						 
						
							2022-08-26 11:44:11 +02:00  
				
					
						
							
							
								 
						
							
								bc93f468ef 
								
							 
						 
						
							
							
								
								Add Elbrus E2000 architecture as generic x86_64 compatible  
							
							
							
						 
						
							2022-01-22 18:53:38 +01:00  
				
					
						
							
							
								 
						
							
								af0a69f355 
								
							 
						 
						
							
							
								
								Add support for LOONGARCH64  
							
							
							
						 
						
							2021-07-27 15:29:12 +08:00  
				
					
						
							
							
								 
						
							
								53ee0b76bb 
								
							 
						 
						
							
							
								
								x86: Enable Intel CET  
							
							
							
							
							When Intel CET is enabled, we need to include <cet.h> in assembly codes
to mark Intel CET support and place _CET_ENDBR at the function entry. 
							
						 
						
							2021-04-30 19:45:39 -07:00  
				
					
						
							
							
								 
						
							
								b60de4447a 
								
							 
						 
						
							
							
								
								add cortex-m platform  
							
							
							
						 
						
							2021-01-19 08:57:44 -06:00  
				
					
						
							
							
								 
						
							
								d7ba7679b6 
								
							 
						 
						
							
							
								
								Merge branch 'develop' into risc-v  
							
							
							
						 
						
							2020-10-16 23:27:38 +08:00  
				
					
						
							
							
								 
						
							
								84949754a0 
								
							 
						 
						
							
							
								
								Fix bfloat16 conditional  
							
							
							
						 
						
							2020-10-13 09:11:36 +02:00  
				
					
						
							
							
								 
						
							
								ca31c32693 
								
							 
						 
						
							
							
								
								Rename "HALF" and "sh" to "BFLOAT16" and "sb"  
							
							
							
						 
						
							2020-10-11 23:49:22 +02:00  
				
					
						
							
							
								 
						
							
								dc8e4e1959 
								
							 
						 
						
							
							
								
								Reduce the BLAS3 heap allocation threshold to 32 and mark it as configurable  
							
							
							
						 
						
							2020-10-04 22:59:24 +02:00  
				
					
						
							
							
								 
						
							
								d2333e7842 
								
							 
						 
						
							
							
								
								aarch64 fix std=c18 compilation  
							
							
							
						 
						
							2020-10-03 18:00:34 +03:00  
				
					
						
							
							
								 
						
							
								deaeb6c5b8 
								
							 
						 
						
							
							
								
								Add bfloat16 based dot and conversion with single/double  
							
							
							
							
							1. Added bfloat16 based dot as new API: shdot
2. Implemented generic kernel and cooperlake-specific (AVX512-BF16) kernel for shdot
3. Added 4 conversion APIs for bfloat16 data type <=> single/double: shstobf16 shdtobf16 sbf16tos dbf16tod
     shstobf16 -- convert single float array to bfloat16 array
     shdtobf16 -- convert double float array to bfloat16 array
     sbf16tos  -- convert bfloat16 array to single float array
     dbf16tod  -- convert bfloat16 array to double float array
4. Implemented generic kernels for all 4 conversion APIs, and cooperlake-specific kernel for shstobf16 and shdtobf16
5. Update level1 thread facilitate functions and macros to support multi-threading for these new APIs
6. Fix Cooperlake platform detection/specify issue when under dynamic-arch building
7. Change the typedef of bfloat16 from unsigned short to more strict uint16_t
Signed-off-by: Chen, Guobing <guobing.chen@intel.com> 
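
The conversion direction described in item 3 can be illustrated with a minimal sketch. It assumes bfloat16 is the upper 16 bits of an IEEE 754 single and uses round-to-nearest-even; the commit does not state which rounding mode the real kernels use. All names here (`bf16_demo`, `float_to_bf16_demo`, and so on) are hypothetical and are not the library's API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical storage type, mirroring the uint16_t typedef from item 7. */
typedef uint16_t bf16_demo;

/* Convert one float to bfloat16 with round-to-nearest-even (assumed mode). */
static bf16_demo float_to_bf16_demo(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);          /* safe type punning */
    if ((bits & 0x7fffffffu) > 0x7f800000u) {
        /* NaN: truncate and force a mantissa bit so it stays a NaN. */
        return (bf16_demo)((bits >> 16) | 0x0040u);
    }
    /* Add 0x7fff plus the lowest kept bit so exact ties round to even. */
    bits += 0x7fffu + ((bits >> 16) & 1u);
    return (bf16_demo)(bits >> 16);
}

/* Widening back to float is exact: the low 16 bits become zero. */
static float bf16_to_float_demo(bf16_demo h) {
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Array form, in the spirit of the "single float array to bfloat16 array"
   routine described above. */
static void float_array_to_bf16_demo(size_t n, const float *in, bf16_demo *out) {
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        out[i] = float_to_bf16_demo(in[i]);
    }
}
```

Values exactly representable in bfloat16, such as 1.0f or -2.0f, survive a round trip through these helpers unchanged; other values lose their low 16 mantissa bits to rounding.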
							
						 
						
							2020-09-04 02:31:25 +08:00  
				
					
						
							
							
								 
						
							
								a36eb19ae0 
								
							 
						 
						
							
							
								
								Update conditional for C11 atomics to use HAVE_C11  
							
							
							
						 
						
							2020-07-18 17:13:24 +00:00  
				
					
						
							
							
								 
						
							
								9fe930f205 
								
							 
						 
						
							
							
								
								powerpc: Add support for future processor  
							
							
							
							
							This is the initial patch to support build infrastructure
for POWER10 architecture. 
							
						 
						
							2020-06-11 15:47:20 -05:00  
				
					
						
							
							
								 
						
							
								67cc4b9e16 
								
							 
						 
						
							
							
								
								Fix warnings in clang and export symbol  
							
							
							
						 
						
							2020-04-15 19:15:23 -05:00  
				
					
						
							
							
								 
						
							
								7eb55504b1 
								
							 
						 
						
							
							
								
								RFC : Add half precision gemm for bfloat16 in OpenBLAS  
							
							
							
							
							This patch adds support for bfloat16 data type matrix multiplication kernel.
For architectures that don't support bfloat16, it is defined as unsigned short
(2 bytes).  Default unroll sizes can be changed as per architecture as done for
SGEMM and for now 8 and 4 are used for M and N.  Size of ncopy/tcopy can be
changed as per architecture requirement and for now, size 2 is used.
Added shgemm in kernel/power/KERNEL.POWER9 and tested in powerpc64le and
powerpc64.  For reference, added a small test compare_sgemm_shgemm.c to compare
sgemm and shgemm output.
This patch does not cover OpenBLAS test, benchmark and lapack tests for shgemm.
Complex type implementation can be discussed and added once this is approved. 
							
						 
						
							2020-04-14 14:55:08 -05:00  
				
					
						
							
							
								 
						
							
								79fd006c58 
								
							 
						 
						
							
							
								
								Expose the support_avx512 function provided in dynamic.c  
							
							
							
						 
						
							2020-03-26 21:25:39 +01:00  
				
					
						
							
							
								 
						
							
								4aa2d89217 
								
							 
						 
						
							
							
								
								Merge branch 'develop' into risc-v  
							
							
							
						 
						
							2020-02-27 13:53:49 +08:00  
				
					
						
							
							
								 
						
							
								d2cb610272 
								
							 
						 
						
							
							
								
								Add option USE_LOCKING for single-threaded build with locking support  
							
							
							
							
							for calling from concurrent threads 
							
						 
						
							2019-05-15 23:18:43 +02:00  
				
					
						
							
							
								 
						
							
								40e53e52d6 
								
							 
						 
						
							
							
								
								snprintf define consolidated to common.h  
							
							
							
						 
						
							2019-04-22 17:01:34 -07:00  
				
					
						
							
							
								 
						
							
								7c51cc8527 
								
							 
						 
						
							
							
								
								Merge branch 'develop' into develop  
							
							
							
						 
						
							2019-03-29 19:36:29 +01:00  
				
					
						
							
							
								 
						
							
								853a18bc17 
								
							 
						 
						
							
							
								
								power9 makefile. dgemm based on power8 kernel with following changes : 32x unrolled 16x4 kernel and 8x4 kernel using (lxv stxv butterfly rank1 update). improvement from 17 to 22-23gflops. dtrmm cases were added into dgemm itself  
							
							
							
						 
						
							2019-03-29 15:49:40 +00:00  
				
					
						
							
							
								 
						
							
								1006ff8a7b 
								
							 
						 
						
							
							
								
								Use POSIX getenv on Cygwin  
							
							
							
							
							The Windows-native GetEnvironmentVariable cannot be relied on, as
Cygwin does not always copy environment variables set through Cygwin
to the Windows environment block, particularly after fork(). 
							
						 
						
							2019-03-15 15:06:30 +01:00  
				
					
						
							
							
								 
						
							
								9531d0e175 
								
							 
						 
						
							
							
								
								lets fit it in one 4k page  
							
							
							
						 
						
							2018-11-06 17:51:24 +00:00  
				
					
						
							
							
								 
						
							
								3fd41313fc 
								
							 
						 
						
							
							
								
								add low bound for number of buffers  
							
							
							
						 
						
							2018-11-06 09:40:13 +00:00  
				
					
						
							
							
								 
						
							
								48610a4524 
								
							 
						 
						
							
							
								
								fix blasabs for windows  
							
							
							
							
							Bugfix in #1713  for Windows (LLP64), where `blasabs` needs to be `llabs` rather than `labs` for the 64-bit API. 
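
The LLP64 point can be sketched as follows. On 64-bit Windows `long` is 32 bits wide, so `labs` would truncate a 64-bit index, while `llabs` takes `long long`. The macro and type names below (`DEMO_INT64`, `blasint_demo`, `blasabs_demo`) are hypothetical stand-ins for illustration and are not copied from common.h.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical switch: define DEMO_INT64 to model the 64-bit integer API. */
#ifdef DEMO_INT64
typedef int64_t blasint_demo;
#define DEMO_INT_MIN INT64_MIN
#else
typedef int blasint_demo;
#define DEMO_INT_MIN INT_MIN
#endif

/* Choose an absolute-value function at least as wide as blasint_demo. */
#if defined(DEMO_INT64) && defined(_WIN32)
#define blasabs_demo(x) llabs(x)   /* LLP64: long is only 32 bits */
#elif defined(DEMO_INT64)
#define blasabs_demo(x) labs(x)    /* LP64: long is 64 bits */
#else
#define blasabs_demo(x) abs(x)     /* 32-bit integer API */
#endif

/* Wrapper that rejects the one input whose absolute value is not
   representable (undefined behaviour for abs/labs/llabs). */
static blasint_demo demo_abs(blasint_demo x) {
    assert(x != DEMO_INT_MIN);
    return (blasint_demo)blasabs_demo(x);
}
```

Selecting the function at preprocessing time keeps the call sites identical across platforms, which matches the intent described for the `blasabs` macro.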
							
						 
						
							2018-08-05 08:18:51 -04:00  
				
					
						
							
							
								 
						
							
								4a553e8678 
								
							 
						 
						
							
							
								
								Merge pull request  #1713  from martin-frbg/issue1710  
							
							
							
							
							Introduce blasabs macro and use it to switch between abs and labs for INTERFACE64 
							
						 
						
							2018-08-04 23:51:31 +02:00  
				
					
						
							
							
								 
						
							
								40c068a875 
								
							 
						 
						
							
							
								
								Introduce blasabs() to switch between abs() and labs() for INTERFACE64  
							
							
							
						 
						
							2018-08-04 20:07:59 +02:00  
				
					
						
							
							
								 
						
							
								6463bffd59 
								
							 
						 
						
							
							
								
								Haiku supporting patches  
							
							
							
						 
						
							2018-08-02 20:49:14 +02:00  
				
					
						
							
							
								 
						
							
								de8fff671d 
								
							 
						 
						
							
							
								
								Revert "Use usleep instead of sched_yield by default"  
							
							
							
						 
						
							2018-06-11 17:05:27 +02:00  
				
					
						
							
							
								 
						
							
								ed7c4a043b 
								
							 
						 
						
							
							
								
								Use usleep instead of sched_yield by default  
							
							
							
							
							sched_yield only burns cpu cycles, fixes  #900 ,  see also #923 , #1560  
							
						 
						
							2018-06-07 10:18:26 +02:00  
				
					
						
							
							
								 
						
							
								83da278093 
								
							 
						 
						
							
							
								
								Update common.h  
							
							
							
						 
						
							2018-06-06 09:27:49 +02:00  
				
					
						
							
							
								 
						
							
								358d4df2bd 
								
							 
						 
						
							
							
								
								Merge branch 'develop' into issue1593-2  
							
							
							
						 
						
							2018-06-06 09:21:41 +02:00  
				
					
						
							
							
								 
						
							
								06d43760e4 
								
							 
						 
						
							
							
								
								Restore _Atomic define before stdatomic.h for old gcc  
							
							
							
							
							see #1593  
							
						 
						
							2018-06-06 09:18:10 +02:00  
				
					
						
							
							
								 
						
							
								354a976a59 
								
							 
						 
						
							
							
								
								Fix inverted condition in _Atomic declaration  
							
							
							
							
							fixes  #1593  
						
							2018-06-05 10:31:34 +02:00  
				
					
						
							
							
								 
						
							
								53457f222f 
								
							 
						 
						
							
							
								
								move _Atomic define to common.h  
							
							
							
						 
						
							2018-05-11 00:13:16 -07:00  
				
					
						
							
							
								 
						
							
								1b83341d19 
								
							 
						 
						
							
							
								
								Fix race condition in blas_server_omp.c  
							
							
							
							
							Change-Id: Ic896276cd073d6b41930c7c5a29d66348cd1725d 
							
						 
						
							2018-04-27 17:00:42 +08:00  
				
					
						
							
							
								 
						
							
								c167a3d6f4 
								
							 
						 
						
							
							
								
								Added RISCV build  
							
							
							
						 
						
							2018-04-16 14:08:31 -07:00  
				
					
						
							
							
								 
						
							
								a41d241a0e 
								
							 
						 
						
							
							
								
								Add support for DragonFly BSD  
							
							
							
						 
						
							2018-04-03 16:39:29 -07:00  
				
					
						
							
							
								 
						
							
								8da6b6ae52 
								
							 
						 
						
							
							
								
								Allow building on OpenBSD  
							
							
							
							
							With this change, OpenBLAS builds and all tests pass on OpenBSD 6.2
using Clang. Tested on x86-64 only, with and without DYNAMIC_ARCH=1. 
							
						 
						
							2018-04-02 10:48:22 -07:00  
				
					
						
							
							
								 
						
							
								eb98fdddfc 
								
							 
						 
						
							
							
								
								typedefs only for c  
							
							
							
						 
						
							2017-07-29 20:38:16 +05:30  
				
					
						
							
							
								 
						
							
								ca17b4b75c 
								
							 
						 
						
							
							
								
								Fix complex support for MSVC headers  
							
							
							
						 
						
							2017-07-28 11:50:29 +05:30  
				
					
						
							
							
								 
						
							
								34513be726 
								
							 
						 
						
							
							
								
								Add Microsoft Windows 10 UWP build support  
							
							
							
						 
						
							2017-06-23 13:07:34 -07:00  
				
					
						
							
							
								 
						
							
								ea26b00c06 
								
							 
						 
						
							
							
								
								Fix CREAL,CIMAG macros for PGI  
							
							
							
						 
						
							2017-03-13 00:36:01 +01:00  
				
					
						
							
							
								 
						
							
								b678471d65 
								
							 
						 
						
							
							
								
								Merge branch 'z13' into develop  
							
							
							
							
							Conflicts:
	CONTRIBUTORS.md 
							
						 
						
							2017-01-09 05:52:42 -05:00  
				
					
						
							
							
								 
						
							
								a94f2b7848 
								
							 
						 
						
							
							
								
								Change to allow compiling with USE_OPENMP on MSVC  
							
							
							
							
							MSVC treats the declaration of omp_in_parallel and omp_get_num_procs without the modifiers __declspec(dllimport) and __cdecl as a redefinition. 
							
						 
						
							2016-06-14 14:37:28 -04:00  
				
					
						
							
							
								 
						
							
								6a2bde7a2d 
								
							 
						 
						
							
							
								
								optimized dgemm and dgetrf for POWER8  
							
							
							
						 
						
							2016-05-17 14:45:27 +02:00  
				
					
						
							
							
								 
						
							
								2c3dfe2bf3 
								
							 
						 
						
							
							
								
								MIPS P5600(32 bit) and I6400(64 bit) cores support added.  
							
							
							
							
							Separated mips and mips64 files.
Configurations support for mips 32 bit.
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com> 
							
						 
						
							2016-04-22 14:03:18 +05:30  
				
					
						
							
							
								 
						
							
								dd43661cfd 
								
							 
						 
						
							
							
								
								Init IBM z system (s390x) porting.  
							
							
							
						 
						
							2016-04-15 18:02:24 -04:00