67cc4b9e16
Fix warnings in clang and export symbol
2020-04-15 19:15:23 -05:00

7eb55504b1
RFC: Add half-precision GEMM for bfloat16 in OpenBLAS

This patch adds a matrix multiplication kernel for the bfloat16 data type.
On architectures without native bfloat16 support, the type is defined as
unsigned short (2 bytes). As for SGEMM, the default unroll sizes can be
changed per architecture; for now, 8 and 4 are used for M and N. The
ncopy/tcopy size can likewise be changed per architecture; for now, size 2
is used. Added shgemm to kernel/power/KERNEL.POWER9 and tested it on
powerpc64le and powerpc64. For reference, added a small test,
compare_sgemm_shgemm.c, that compares sgemm and shgemm output.
This patch does not cover the OpenBLAS test, benchmark, or LAPACK tests for
shgemm. A complex-type implementation can be discussed and added once this
is approved.

2020-04-14 14:55:08 -05:00
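
A minimal sketch in C of what such a kernel computes, assuming bfloat16 is stored as a 16-bit unsigned integer, as the commit describes for targets without native support. The names bf16_t, float_to_bf16, bf16_to_float and shgemm_ref are hypothetical and are not OpenBLAS's API. The naive loop is the kind of baseline an optimized kernel could be checked against. NaN inputs are not handled.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical storage type: the upper 16 bits of an IEEE-754 float. */
typedef uint16_t bf16_t;

/* Narrow float -> bfloat16, rounding to nearest, ties to even
   (NaN handling omitted for brevity). */
static bf16_t float_to_bf16(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    uint32_t rounding_bias = 0x7FFFu + ((bits >> 16) & 1u);
    return (bf16_t)((bits + rounding_bias) >> 16);
}

/* Widen bfloat16 -> float: the stored bits become the high half. */
static float bf16_to_float(bf16_t h) {
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Naive reference C = alpha*A*B + beta*C, column-major, no transposes.
   A is m x k and B is k x n in bfloat16; C is m x n in float.
   Products are accumulated in float. */
static void shgemm_ref(int m, int n, int k, float alpha,
                       const bf16_t *a, int lda, const bf16_t *b, int ldb,
                       float beta, float *c, int ldc) {
    for (int j = 0; j < n; j++) {
        for (int i = 0; i < m; i++) {
            float acc = 0.0f;
            for (int p = 0; p < k; p++)
                acc += bf16_to_float(a[i + p * lda]) * bf16_to_float(b[p + j * ldb]);
            c[i + j * ldc] = alpha * acc + beta * c[i + j * ldc];
        }
    }
}
```

Accumulating in float keeps the 8-bit bfloat16 mantissa from compounding rounding error across the k products. This is also why comparing against sgemm output is a reasonable check.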

79fd006c58
Expose the support_avx512 function provided in dynamic.c
2020-03-26 21:25:39 +01:00

d2cb610272
Add option USE_LOCKING for single-threaded build with locking support

for calling from concurrent threads

2019-05-15 23:18:43 +02:00
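
A sketch of what locking protects in a single-threaded build. The library starts no threads of its own, but an application may still call it from several threads at once, so shared internal state needs a mutex. USE_LOCKING is defined in the source here only for illustration; the commit adds it as a build option. The counter and function names are hypothetical stand-ins for internal bookkeeping. POSIX threads are assumed.

```c
#include <pthread.h>
#include <stddef.h>

/* Illustration only: normally supplied by the build (e.g. -DUSE_LOCKING). */
#ifndef USE_LOCKING
#define USE_LOCKING
#endif

/* Hypothetical shared state standing in for an internal buffer table. */
static int buffers_handed_out = 0;

#ifdef USE_LOCKING
static pthread_mutex_t alloc_lock = PTHREAD_MUTEX_INITIALIZER;
#define LOCK_ALLOC()   pthread_mutex_lock(&alloc_lock)
#define UNLOCK_ALLOC() pthread_mutex_unlock(&alloc_lock)
#else
/* Without locking, concurrent callers can race on the counter. */
#define LOCK_ALLOC()   ((void)0)
#define UNLOCK_ALLOC() ((void)0)
#endif

/* Hands out a slot index; the read-modify-write is serialized by the lock. */
static int acquire_slot(void) {
    LOCK_ALLOC();
    int slot = buffers_handed_out++;
    UNLOCK_ALLOC();
    return slot;
}

/* Simulates an application thread calling into the library repeatedly. */
static void *caller_thread(void *arg) {
    (void)arg;
    for (int i = 0; i < 10000; i++)
        acquire_slot();
    return NULL;
}
```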

40e53e52d6
snprintf define consolidated to common.h
2019-04-22 17:01:34 -07:00
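
For context, Visual Studio releases before 2015 (_MSC_VER below 1900) lack a C99 snprintf. They offer only the non-standard _snprintf, which does not guarantee NUL termination on truncation. The sketch below shows the kind of compatibility define such a consolidation places in one shared header, assuming that version cut-off. OpenBLAS's exact condition may differ, and format_version is a hypothetical caller.

```c
#include <stdio.h>
#include <stddef.h>

/* Map snprintf to _snprintf only on old MSVC; elsewhere use the C99 function. */
#if defined(_MSC_VER) && _MSC_VER < 1900
#define snprintf _snprintf
#endif

/* Hypothetical caller: any file that includes the shared header can use snprintf. */
static int format_version(char *buf, size_t len, int major, int minor) {
    return snprintf(buf, len, "%d.%d", major, minor);
}
```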

7c51cc8527
Merge branch 'develop' into develop
2019-03-29 19:36:29 +01:00

853a18bc17
POWER9 makefile. DGEMM based on the POWER8 kernel with the following changes: 32x-unrolled 16x4 kernel and 8x4 kernel using lxv/stxv butterfly rank-1 updates. Improvement from 17 to 22-23 GFLOPS. DTRMM cases were added into DGEMM itself.
2019-03-29 15:49:40 +00:00

1006ff8a7b
Use POSIX getenv on Cygwin

The Windows-native GetEnvironmentVariable cannot be relied on, as
Cygwin does not always copy environment variables set through Cygwin
to the Windows environment block, particularly after fork().

2019-03-15 15:06:30 +01:00
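
A sketch of the platform split this describes. The Win32 call is used only for native Windows builds. Cygwin, like other POSIX systems, goes through getenv, which reads the C runtime's own environment rather than the Windows environment block. The helper name read_env_int and the 64-byte buffer are hypothetical.

```c
#include <stdlib.h>

#if defined(_WIN32) && !defined(__CYGWIN__)
#include <windows.h>
#endif

/* Hypothetical helper: read an integer setting from the environment,
   returning fallback when the variable is unset or empty. */
static int read_env_int(const char *name, int fallback) {
#if defined(_WIN32) && !defined(__CYGWIN__)
    /* Native Windows: query the Windows environment block. */
    char buf[64];
    DWORD n = GetEnvironmentVariableA(name, buf, (DWORD)sizeof buf);
    if (n == 0 || n >= sizeof buf)
        return fallback;
    return atoi(buf);
#else
    /* POSIX systems, including Cygwin: getenv sees variables set inside
       the process, whether or not they reached the Windows block. */
    const char *value = getenv(name);
    return (value != NULL && *value != '\0') ? atoi(value) : fallback;
#endif
}
```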

9531d0e175
Let's fit it in one 4K page
2018-11-06 17:51:24 +00:00

3fd41313fc
Add lower bound for number of buffers
2018-11-06 09:40:13 +00:00

48610a4524
fix blasabs for windows

Bugfix in #1713 for Windows (LLP64), where `blasabs` needs to be `llabs` rather than `labs` for the 64-bit API.

2018-08-05 08:18:51 -04:00
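
Why the fix is needed: Windows uses the LLP64 data model, where long stays 32 bits even on 64-bit systems. A 64-bit integer type there is long long, and taking its absolute value needs llabs. On LP64 systems such as Linux, long is already 64 bits and labs suffices. The sketch below assumes INTERFACE64 selects the 64-bit integer API and _WIN64 identifies 64-bit Windows. The typedefs are simplified, not copied from common.h.

```c
#include <stdlib.h>

#ifdef INTERFACE64
#  if defined(_WIN64)
/* LLP64: long is 32 bits, so 64-bit integers are long long. */
typedef long long blasint;
#    define blasabs(x) llabs(x)
#  else
/* LP64: long is 64 bits. */
typedef long blasint;
#    define blasabs(x) labs(x)
#  endif
#else
/* Default 32-bit integer interface. */
typedef int blasint;
#  define blasabs(x) abs(x)
#endif
```

Using labs on a long long under LLP64 would silently truncate the argument to 32 bits. That is the class of bug this commit fixes.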

4a553e8678
Merge pull request #1713 from martin-frbg/issue1710

Introduce blasabs macro and use it to switch between abs and labs for INTERFACE64

2018-08-04 23:51:31 +02:00

40c068a875
Introduce blasabs() to switch between abs() and labs() for INTERFACE64
2018-08-04 20:07:59 +02:00

6463bffd59
Haiku supporting patches
2018-08-02 20:49:14 +02:00

de8fff671d
Revert "Use usleep instead of sched_yield by default"
2018-06-11 17:05:27 +02:00

ed7c4a043b
Use usleep instead of sched_yield by default

sched_yield only burns CPU cycles; fixes #900, see also #923, #1560

2018-06-07 10:18:26 +02:00
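
The trade-off behind this change, which was later reverted, shows up in a spin-wait loop. sched_yield() returns immediately when no other thread is runnable, so the waiting thread keeps its core busy. usleep() actually sleeps, freeing the core at the cost of wake-up latency. A minimal sketch of selecting between the two with a macro. The names YIELDING and USE_USLEEP are assumptions for illustration, and the flag uses C11 atomics.

```c
#include <sched.h>
#include <stdatomic.h>
#include <unistd.h>

/* Choose what a waiting thread does on each loop iteration. */
#ifdef USE_USLEEP
#define YIELDING usleep(1)      /* sleep at least 1 microsecond: lower CPU use */
#else
#define YIELDING sched_yield()  /* returns at once if nothing else is runnable */
#endif

/* Spin until another thread sets *flag to a nonzero value. */
static void wait_for_flag(atomic_int *flag) {
    while (atomic_load(flag) == 0)
        YIELDING;
}
```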

83da278093
Update common.h
2018-06-06 09:27:49 +02:00

358d4df2bd
Merge branch 'develop' into issue1593-2
2018-06-06 09:21:41 +02:00

06d43760e4
Restore _Atomic define before stdatomic.h for old gcc

see #1593

2018-06-06 09:18:10 +02:00

354a976a59
Fix inverted condition in _Atomic declaration

fixes #1593

2018-06-05 10:31:34 +02:00

53457f222f
move _Atomic define to common.h
2018-05-11 00:13:16 -07:00

1b83341d19
Fix race condition in blas_server_omp.c

Change-Id: Ic896276cd073d6b41930c7c5a29d66348cd1725d

2018-04-27 17:00:42 +08:00

a41d241a0e
Add support for DragonFly BSD
2018-04-03 16:39:29 -07:00

8da6b6ae52
Allow building on OpenBSD

With this change, OpenBLAS builds and all tests pass on OpenBSD 6.2
using Clang. Tested on x86-64 only, with and without DYNAMIC_ARCH=1.

2018-04-02 10:48:22 -07:00

eb98fdddfc
typedefs only for C
2017-07-29 20:38:16 +05:30

ca17b4b75c
Fix complex support for MSVC headers
2017-07-28 11:50:29 +05:30

34513be726
Add Microsoft Windows 10 UWP build support
2017-06-23 13:07:34 -07:00

ea26b00c06
Fix CREAL, CIMAG macros for PGI
2017-03-13 00:36:01 +01:00
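
The commit message does not describe the change. What follows is only a sketch of the general approach to such accessor macros: use GCC's __real__/__imag__ operators where they are supported, and fall back to C99 creal()/cimag() otherwise. Treating PGI, identified by its __PGI predefined macro, as a compiler that takes the standard path is an assumption for illustration. The sketch targets double complex values.

```c
#include <complex.h>

/* GCC and Clang support __real__/__imag__ on complex operands; other
   compilers (here, by assumption, PGI) use the standard functions. */
#if defined(__GNUC__) && !defined(__PGI)
#define CREAL(x) (__real__ (x))
#define CIMAG(x) (__imag__ (x))
#else
#define CREAL(x) creal(x)
#define CIMAG(x) cimag(x)
#endif
```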

b678471d65
Merge branch 'z13' into develop

Conflicts:
	CONTRIBUTORS.md

2017-01-09 05:52:42 -05:00

a94f2b7848
Change to allow compiling with USE_OPENMP on MSVC

MSVC treats the declaration of omp_in_parallel and omp_get_num_procs without the modifiers __declspec(dllimport) and __cdecl as a redefinition.

2016-06-14 14:37:28 -04:00

6a2bde7a2d
optimized dgemm and dgetrf for POWER8
2016-05-17 14:45:27 +02:00

2c3dfe2bf3
MIPS P5600 (32-bit) and I6400 (64-bit) core support added.

Separated mips and mips64 files.
Configuration support for MIPS 32-bit.
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>

2016-04-22 14:03:18 +05:30

dd43661cfd
Init IBM z system (s390x) porting.
2016-04-15 18:02:24 -04:00

2e6333f74e
modified common.h for piledriver
2016-03-09 15:48:29 +01:00

a1a96589aa
Fixed #773 blas_quickdivide bug on CMake and Visual Studio x86 32-bit.
2016-02-04 15:23:32 -05:00
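
blas_quickdivide replaces an integer division with a multiplication by a precomputed reciprocal. The sketch below shows that technique; it is not OpenBLAS's implementation. With m = floor(2^32 / y) + 1, the quotient x / y equals (x * m) >> 32 whenever x * y < 2^32. That condition covers small operands, such as a work size split across a few dozen threads. The table size, the names, and the use of 64-bit arithmetic are assumptions. The 64-bit product keeps the sketch free of overflow on 32-bit and 64-bit targets alike.

```c
#include <stdint.h>

/* Largest divisor the table supports (e.g. a maximum thread count). */
#define QD_MAX_DIVISOR 64

/* qd_table[y] = floor(2^32 / y) + 1; index 0 is unused. */
static uint64_t qd_table[QD_MAX_DIVISOR + 1];

/* Fill the reciprocal table once before calling quick_divide. */
static void qd_init(void) {
    for (uint64_t y = 1; y <= QD_MAX_DIVISOR; y++)
        qd_table[y] = (UINT64_C(1) << 32) / y + 1;
}

/* Returns x / y for 1 <= y <= QD_MAX_DIVISOR; exact when x * y < 2^32.
   The product cannot overflow: x < 2^32 and qd_table[y] <= 2^32 + 1. */
static uint32_t quick_divide(uint32_t x, uint32_t y) {
    return (uint32_t)(((uint64_t)x * qd_table[y]) >> 32);
}
```

Exactness follows from writing m * y = 2^32 + e with 0 < e <= y. Then (x * m) / 2^32 = x / y + x * e / (y * 2^32), and the extra term is too small to push the floor past the true quotient while x * y < 2^32.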

87a2ccc37c
Factorize MAX_STACK_ALLOC code to common_stackalloc.h

Ref #727

2016-01-08 16:03:52 +01:00
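
The pattern behind MAX_STACK_ALLOC puts small temporary buffers on the stack, avoiding allocator overhead. Anything larger than a fixed limit falls back to the heap. A sketch with hypothetical names (SKETCH_MAX_STACK_ALLOC, scaled_sum); the real macros in common_stackalloc.h are not reproduced here.

```c
#include <stdlib.h>

#define SKETCH_MAX_STACK_ALLOC 2048  /* bytes; hypothetical limit */

/* Computes sum(alpha * x[i]) through a temporary work array, placed on the
   stack when it fits and on the heap otherwise. Returns 0 on success,
   -1 if the heap allocation fails. */
static int scaled_sum(const double *x, size_t n, double alpha, double *out) {
    double stack_buf[SKETCH_MAX_STACK_ALLOC / sizeof(double)];
    double *work = stack_buf;
    int on_heap = 0;

    if (n > sizeof stack_buf / sizeof stack_buf[0]) {
        work = malloc(n * sizeof *work);
        if (work == NULL)
            return -1;
        on_heap = 1;
    }

    for (size_t i = 0; i < n; i++)
        work[i] = alpha * x[i];

    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += work[i];
    *out = sum;

    if (on_heap)
        free(work);
    return 0;
}
```

Keeping the limit small matters because thread stacks can be much smaller than the main thread's stack, especially inside OpenMP or pthread workers.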

5f2fa15e04
include sched.h if OS is Android
2016-01-05 12:36:49 +01:00

9742dba595
Fix compiler errors in common.h
2015-11-09 14:15:50 +05:30

63c56d3da9
Only include complex.h since Android 5.0
2015-10-27 10:47:55 -05:00

8fade093aa
Fixed cmake bug on Visual Studio.
2015-10-20 14:37:22 -05:00

94b125255f
Merge branch 'develop' into cmake

Conflicts:
	driver/others/memory.c

2015-10-13 04:46:08 +08:00

3684706a12
Include time.h.
2015-10-08 15:18:54 +00:00

2297a2d989
Fixed error in common.h for Android compilation introduced by e12cf1123e
2015-09-03 20:54:21 -04:00

3efeaed0d8
correct a minor mistake
2015-08-16 20:12:04 +02:00

6b92204a7c
add fallback blas_lock implementation

to be used on armv5 and new platforms

2015-08-16 18:59:17 +02:00
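
A fallback lock of this kind can be a plain spinlock built on an atomic test-and-set. The sketch below uses C11 <stdatomic.h>. The 2015 code may rely on compiler builtins or another primitive instead, and the names here are hypothetical.

```c
#include <stdatomic.h>

/* Hypothetical lock type: a single atomic flag, clear when the lock is free. */
typedef atomic_flag sketch_lock_t;

/* Spin until test-and-set observes the flag clear, i.e. we took the lock.
   Acquire ordering keeps the critical section's accesses after the lock. */
static void sketch_lock(sketch_lock_t *lock) {
    while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
        ; /* busy-wait; a real implementation might also yield here */
}

/* Release ordering publishes the critical section's writes before unlocking. */
static void sketch_unlock(sketch_lock_t *lock) {
    atomic_flag_clear_explicit(lock, memory_order_release);
}
```

atomic_flag is the one atomic type C11 guarantees to be lock-free. That makes it a reasonable base for a generic fallback on platforms without a hand-written assembly lock.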

e12cf1123e
add fallback rpcc implementation

- use on arm, arm64 and any new platform
- use faster integer math instead of double
- use similar scale as rdtsc so that timeouts work

2015-08-16 18:59:16 +02:00

f8eba3d548
Fixed cmake build bugs on Linux.
2015-08-11 16:25:16 -05:00

f874465bb8
Use cmake to build OpenBLAS GENERIC Target on MSVC x86 64-bit.

Disable CBLAS and LAPACK.

2015-08-10 14:10:44 -05:00

dcd5ba4443
Merge branch 'cmake' of https://github.com/hpanderson/OpenBLAS into hpanderson_cmake
2015-07-22 04:06:39 +08:00

a11555c715
Support Android NDK armeabi-v7a-hard ABI. (-mfloat-abi=hard)

e.g.
make HOSTCC=gcc CC=arm-linux-androideabi-gcc NO_LAPACK=1 TARGET=ARMV7
The Android NDK build uses the armeabi-v7a-hard ABI:
TARGET_CFLAGS += -mhard-float -D_NDK_MATH_NO_SOFTFP=1
TARGET_LDFLAGS += -Wl,--no-warn-mismatch -lm_hard
For more information, see the hard-float example at
android_ndk/tests/device/hard-float/jni/.

2015-05-20 21:57:27 -05:00