d50abc8903 
								
							 
						 
						
							
							
								
								ARM64: Move parameters from parameter.c to param.h  
							
							
							
							
							Remove the runtime setting of P, Q, R parameters for
targets ARMV8, THUNDERX2T99. Instead set them as constants
in param.h at compile time. 
							
						 
						
							2018-10-22 01:45:51 -07:00  
				
					
						
							
							
								 
						
							
								21f46a1cf2 
								
							 
						 
						
							
							
								
								ARM64: Use THUNDERX2T99 Neon Kernels for ARMV8  
							
							
							
							
							Currently the generic ARMV8 target uses C implementations
for many routines. Replace these with the NEON implementations
written for the THUNDERX2T99 target, which are up to 6x faster for
certain routines. 
							
						 
						
							2018-10-17 10:44:37 -07:00  
				
					
						
							
							
								 
						
							
								99c7bba8e4 
								
							 
						 
						
							
							
								
								Initial support for SkylakeX / AVX512  
							
							
							
							
							This patch adds the basic infrastructure for adding the SkylakeX (Intel Skylake server)
target. The SkylakeX target will use the AVX512 (AVX512VL level) instruction set,
which brings 2 basic things:
1) 512 bit wide SIMD (2x width of AVX2)
2) 32 SIMD registers (2x the number on AVX2)
This initial patch only contains a trivial transformation of the Haswell SGEMM kernel
to AVX512VL; more will follow later but this patch aims to get the infrastructure
in place for this "later".
Full performance tuning has not been done yet; with more registers and wider SIMD
it's in theory possible to retune the kernels but even without that there's an
interesting enough performance increase (30-40% range) with just this change. 
							
						 
						
							2018-06-03 07:58:52 +00:00  
				
					
						
							
							
								 
						
							
								c9ff735da6 
								
							 
						 
						
							
							
								
								Add ZEN support (tested for auto-detected static backend)  
							
							
							
						 
						
							2017-03-19 15:32:50 +01:00  
				
					
						
							
							
								 
						
							
								a86474c6f7 
								
							 
						 
						
							
							
								
								THUNDERX2T99: Performance fix for ZGEMM  
							
							
							
						 
						
							2017-02-28 06:05:00 -08:00  
				
					
						
							
							
								 
						
							
								19ba133383 
								
							 
						 
						
							
							
								
								THUNDERX2T99: Add Optimized ZGEMM Implementation  
							
							
							
						 
						
							2017-02-28 05:31:41 +00:00  
				
					
						
							
							
								 
						
							
								2757b49767 
								
							 
						 
						
							
							
								
								THUNDERX2T99: Add Optimized CGEMM Implementation  
							
							
							
						 
						
							2017-01-30 17:44:26 +05:30  
				
					
						
							
							
								 
						
							
								f279ff4789 
								
							 
						 
						
							
							
								
								THUNDERX2T99: Add Optimized SGEMM Implementation  
							
							
							
						 
						
							2017-01-16 21:44:33 +05:30  
				
					
						
							
							
								 
						
							
								0863a0d4b4 
								
							 
						 
						
							
							
								
								Merge pull request  #1061  from ashwinyes/develop_aarch64_vulcan_thunderx_patch  
							
							
							
							
							Add new targets for ARM64 
							
						 
						
							2017-01-16 13:20:10 +08:00  
				
					
						
							
							
								 
						
							
								c1c5a63d3c 
								
							 
						 
						
							
							
								
								prepared parameter.c for UNROLL values, that are not a power of two  
							
							
							
						 
						
							2017-01-11 09:50:28 +01:00  
				
					
						
							
							
								 
						
							
								4b55fae337 
								
							 
						 
						
							
							
								
								ARM64: Add Cavium THUNDERX2T99 Target  
							
							
							
						 
						
							2017-01-11 11:18:40 +05:30  
				
					
						
							
							
								 
						
							
								0b8e876d89 
								
							 
						 
						
							
							
								
								VULCAN: Add optimized DGEMM implementation  
							
							
							
						 
						
							2017-01-10 15:01:37 +05:30  
				
					
						
							
							
								 
						
							
								4713e7c47f 
								
							 
						 
						
							
							
								
								ARM64: Add the VULCAN Target  
							
							
							
						 
						
							2017-01-10 15:01:17 +05:30  
				
					
						
							
							
								 
						
							
								78b05f6476 
								
							 
						 
						
							
							
								
								bugfix for EXCAVATOR and DYNAMIC_ARCH  
							
							
							
						 
						
							2016-04-25 10:13:30 +02:00  
				
					
						
							
							
								 
						
							
								05196a8497 
								
							 
						 
						
							
							
								
								Refs  #716 . Only call getenv at init function.  
							
							
							
						 
						
							2016-03-09 12:50:07 -05:00  
				
					
						
							
							
								 
						
							
								4319769b79 
								
							 
						 
						
							
							
								
								added target processor STEAMROLLER  
							
							
							
						 
						
							2014-12-28 20:16:46 +08:00  
				
					
						
							
							
								 
						
							
								a64fe9bcc9 
								
							 
						 
						
							
							
								
								added optimized sgemv_n kernel for sandybridge  
							
							
							
						 
						
							2014-09-06 08:41:53 +02:00  
				
					
						
							
							
								 
						
							
								2021d0f9d6 
								
							 
						 
						
							
							
								
								experimentally removed expensive function calls  
							
							
							
						 
						
							2014-09-05 15:05:53 +02:00  
				
					
						
							
							
								 
						
							
								50e99a52ea 
								
							 
						 
						
							
							
								
								added definitions for PILEDRIVER and HASWELL  
							
							
							
						 
						
							2014-07-06 12:08:27 +02:00  
				
					
						
							
							
								 
						
							
								7a8949e0ce 
								
							 
						 
						
							
							
								
								Merge branch 'develop' of  https://github.com/TimothyGu/OpenBLAS  into TimothyGu-develop  
							
							
							
							
							Conflicts:
	driver/others/memory.c 
							
						 
						
							2014-06-28 20:51:31 +08:00  
				
					
						
							
							
								 
						
							
								6c2ead30f0 
								
							 
						 
						
							
							
								
								Remove all trailing whitespace except lapack-netlib  
							
							
							
							
							Signed-off-by: Timothy Gu <timothygu99@gmail.com> 
							
						 
						
							2014-06-27 12:05:18 -07:00  
				
					
						
							
							
								 
						
							
								f41f03ab83 
								
							 
						 
						
							
							
								
								fix   #394 . this cleans up some handles after using them, and doesn't disable ALL process privileges upon success  
							
							
							
						 
						
							2014-06-27 12:16:57 -04:00  
				
					
						
							
							
								 
						
							
								bfaaa975e6 
								
							 
						 
						
							
							
								
								Added BULLDOZER target. So far it uses barcelona kernels.  
							
							
							
						 
						
							2012-12-07 00:53:31 +08:00  
				
					
						
							
							
								 
						
							
								d3b67d0bd8 
								
							 
						 
						
							
							
								
								Refs  #113 . Fixed the typo BOBCATE -> BOBCAT  
							
							
							
						 
						
							2012-05-31 22:40:15 +08:00  
				
					
						
							
							
								 
						
							
								d6cab3f37e 
								
							 
						 
						
							
							
								
								Refs  #113 . Support AMD Bobcate using Barcelona kernel codes. Replace 3DNow! with MMX.  
							
							
							
						 
						
							2012-05-31 18:17:45 +08:00  
				
					
						
							
							
								 
						
							
								19a48b82cf 
								
							 
						 
						
							
							
								
								Init Sandybridge codes based on Nehalem.  
							
							
							
						 
						
							2012-03-30 20:01:03 +08:00  
				
					
						
							
							
								 
						
							
								8163ab7e55 
								
							 
						 
						
							
							
								
								Change the block size on Loongson 3B.  
							
							
							
						 
						
							2011-11-23 18:41:49 +00:00  
				
					
						
							
							
								 
						
							
								b95ad4cfaf 
								
							 
						 
						
							
							
								
								Support detecting ICT Loongson-3B CPU.  
							
							
							
						 
						
							2011-11-09 19:29:50 +00:00  
				
					
						
							
							
								 
						
							
								831858b883 
								
							 
						 
						
							
							
								
								Modify the aligned addresses of sa and sb to improve multithreaded performance.  
							
							
							
						 
						
							2011-09-23 20:59:48 +00:00  
				
					
						
							
							
								 
						
							
								16fc083322 
								
							 
						 
						
							
							
								
								Refs #47. Fixed the parameter-setting bug in the Loongson 3A single-thread version.  
							
							
							
						 
						
							2011-09-08 16:39:34 +00:00  
				
					
						
							
							
								 
						
							
								4727fe8abf 
								
							 
						 
						
							
							
								
								Refs #47. On Loongson 3A, set the DGEMM_R parameter according to the number of threads. This improves double-precision BLAS3 performance in multithreaded runs.  
							
							
							
						 
						
							2011-09-05 15:13:52 +00:00  
				
					
						
							
							
								 
						
							
								342bbc3871 
								
							 
						 
						
							
							
								
								Import GotoBLAS2 1.13 BSD version codes.  
							
							
							
						 
						
							2011-01-24 14:54:24 +00:00