3dc6052c7e 
								
							 
						 
						
							
							
								
								initial support for Sapphire Rapids platform  
							
							
							
						 
						
							2021-10-12 01:30:40 -07:00  
				
					
						
							
							
								 
						
							
								e740c4873d 
								
							 
						 
						
							
							
								
								Enable COOPERLAKE build target  
							
							
							
							
							Enable a new build target platform, COOPERLAKE. This target platform
supports all the ISAs supported by SKYLAKEX, plus avx512bf16. All the
SKYLAKEX-specific kernels/drivers and related code are therefore extended
to also be active on COOPERLAKE. In addition, new BF16-related kernels
are active under this target. 
							
						 
						
							2020-08-13 06:18:00 +08:00  
				
					
						
							
							
								 
						
							
								aef9804089 
								
							 
						 
						
							
							
								
								Fix unwanted case-sensitivity in x86 LSAME for (AMD) processors without CMOV  
							
							
							
							
							The problem was already noticed some years ago in #238, but back then it was only corrected in one of the #ifdef branches.
Fixes  #2214  
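For context, LSAME is the BLAS/LAPACK helper that reports whether two characters are the same letter regardless of case. The following is a minimal C sketch of that intended behaviour; `lsame_sketch` is a hypothetical name used only for illustration and is not the x86 assembly kernel touched by this commit.

```c
#include <ctype.h>

/* Hypothetical illustration, not the OpenBLAS kernel: returns 1 when ca
   and cb are the same letter ignoring case, and 0 otherwise. */
int lsame_sketch(char ca, char cb)
{
    /* Cast to unsigned char first, as toupper() requires. */
    return toupper((unsigned char)ca) == toupper((unsigned char)cb);
}
```

Under this sketch, `lsame_sketch('u', 'U')` is true while `lsame_sketch('N', 'T')` is false; the bug described above corresponds to the first kind of call wrongly returning false on the affected code path.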
							
						 
						
							2019-08-13 10:19:10 +02:00  
				
					
						
							
							
								 
						
							
								100d94f94e 
								
							 
						 
						
							
							
								
								Add ?sum  
							
							
							
						 
						
							2019-03-31 13:55:05 +02:00  
				
					
						
							
							
								 
						
							
								e3bc83f2a8 
								
							 
						 
						
							
							
								
								Add x86 implementation of ?sum  
							
							
							
							
							as trivial copy of ?asum with the fabs calls removed 
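The relationship between the two routines can be sketched in plain C, assuming a positive stride. The function names and signatures here are hypothetical and only illustrate the idea; the actual kernels in this commit are x86 implementations.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical sketch of ?asum: sum of absolute values. */
double asum_sketch(size_t n, const double *x, size_t inc_x)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += fabs(x[i * inc_x]);
    return acc;
}

/* Hypothetical sketch of ?sum: the same loop with the fabs call removed. */
double sum_sketch(size_t n, const double *x, size_t inc_x)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += x[i * inc_x];
    return acc;
}
```

For the vector {1, -2, 3} with unit stride, the first function yields 6 and the second yields 2.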
							
						 
						
							2019-03-30 22:26:10 +01:00  
				
					
						
							
							
								 
						
							
								0023515733 
								
							 
						 
						
							
							
								
								Typo fix (misplaced parenthesis)  
							
							
							
						 
						
							2018-06-03 13:22:59 +02:00  
				
					
						
							
							
								 
						
							
								99c7bba8e4 
								
							 
						 
						
							
							
								
								Initial support for SkylakeX / AVX512  
							
							
							
							
							This patch adds the basic infrastructure for adding the SkylakeX (Intel Skylake server)
target. The SkylakeX target will use the AVX512 (AVX512VL level) instruction set,
which brings 2 basic things:
1) 512 bit wide SIMD (2x width of AVX2)
2) 32 SIMD registers (2x the number on AVX2)
This initial patch only contains a trivial transformation of the Haswell SGEMM kernel
to AVX512VL; more will follow later but this patch aims to get the infrastructure
in place for this "later".
Full performance tuning has not been done yet; with more registers and wider SIMD
it's in theory possible to retune the kernels but even without that there's an
interesting enough performance increase (30-40% range) with just this change. 
							
						 
						
							2018-06-03 07:58:52 +00:00  
				
					
						
							
							
								 
						
							
								7df8c4f76f 
								
							 
						 
						
							
							
								
								typo fix  
							
							
							
						 
						
							2018-05-31 17:23:08 +02:00  
				
					
						
							
							
								 
						
							
								2fc748bf72 
								
							 
						 
						
							
							
								
								Restore optimized swap kernel now that we have a proper fix  
							
							
							
						 
						
							2018-05-31 13:41:12 +02:00  
				
					
						
							
							
								 
						
							
								d1b7be14aa 
								
							 
						 
						
							
							
								
								Handle INCX=0,INCY=0 case  
							
							
							
							
							Fixes  #1575  (sswap/dswap failing the swap utest on x86) as suggested by atsampson. 
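To illustrate why zero increments are a special case, here is a reference-style swap loop in C. It follows the indexing of the reference BLAS routine and is an assumption about the expected semantics, not the code changed by this commit; `swap_sketch` is a hypothetical name.

```c
#include <stddef.h>

/* Hypothetical reference-style swap, not the OpenBLAS x86 kernel.
   With inc_x == 0 and inc_y == 0 the loop swaps x[0] and y[0] n times. */
void swap_sketch(long n, double *x, long inc_x, double *y, long inc_y)
{
    if (n <= 0)
        return;
    /* Negative strides start from the far end, as in reference BLAS. */
    long ix = inc_x < 0 ? (1 - n) * inc_x : 0;
    long iy = inc_y < 0 ? (1 - n) * inc_y : 0;
    for (long i = 0; i < n; i++) {
        double tmp = x[ix];
        x[ix] = y[iy];
        y[iy] = tmp;
        ix += inc_x;
        iy += inc_y;
    }
}
```

In this sketch, an odd n with both increments zero leaves the two first elements exchanged, while an even n leaves them unchanged. An optimized kernel that assumes non-zero strides can easily get this wrong.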
						
							2018-05-31 12:52:04 +02:00  
				
					
						
							
							
								 
						
							
								28ac9ea5a6 
								
							 
						 
						
							
							
								
								Use generic/dot.c instead of the inferior arm/dot.c for x86 DSDOT  
							
							
							
							
							to resolve dsdot utest failure seen in #1492  
							
						 
						
							2018-03-17 13:49:15 +01:00  
				
					
						
							
							
								 
						
							
								e7366a4161 
								
							 
						 
						
							
							
								
								Restore the remaining utests ( #1462 )  
							
							
							
							
							* Restore the remaining utests
* Try fork test on Cygwin and Linux only, it hangs on at least ARMv8/Android as well
* Use generic sswap/dswap kernels for NEHALEM 32bit to fix fault found by the restored swap utest
* Disable zdotu test for MS cl to work around runtime error -1073741819 on AppVeyor for now
(probably coding error in the initialization of the complex numbers or wrong choice of zdotu API) 
							
						 
						
							2018-02-20 10:07:17 +01:00  
				
					
						
							
							
								 
						
							
								c9ff735da6 
								
							 
						 
						
							
							
								
								Add ZEN support (tested for auto-detected static backend)  
							
							
							
						 
						
							2017-03-19 15:32:50 +01:00  
				
					
						
							
							
								 
						
							
								53b6023a6c 
								
							 
						 
						
							
							
								
								Fix cmake bug on MSVC 32-bit.  
							
							
							
						 
						
							2015-10-26 14:52:13 -05:00  
				
					
						
							
							
								 
						
							
								7df0820160 
								
							 
						 
						
							
							
								
								Use C kernels for s/dgemv on x86.  
							
							
							
						 
						
							2015-08-19 08:07:47 -05:00  
				
					
						
							
							
								 
						
							
								1cf2b10224 
								
							 
						 
						
							
							
								
								Use pure C generic target on x86 and x86_64.  
							
							
							
							
							make TARGET=GENERIC
?gemm3m is unimplemented on generic target. 
							
						 
						
							2015-08-03 23:55:56 -05:00  
				
					
						
							
							
								 
						
							
								0884b73c69 
								
							 
						 
						
							
							
								
								Lapack-test Windows 32bit now error free  
							
							
							
						 
						
							2014-07-10 11:01:47 +02:00  
				
					
						
							
							
								 
						
							
								9bd9472ae9 
								
							 
						 
						
							
							
								
								Lapack-test: cleanup of x86 32bit KERNEL file  
							
							
							
						 
						
							2014-07-09 16:08:19 +02:00  
				
					
						
							
							
								 
						
							
								6c2ead30f0 
								
							 
						 
						
							
							
								
								Remove all trailing whitespace except lapack-netlib  
							
							
							
							
							Signed-off-by: Timothy Gu <timothygu99@gmail.com> 
							
						 
						
							2014-06-27 12:05:18 -07:00  
				
					
						
							
							
								 
						
							
								793509a3b5 
								
							 
						 
						
							
							
								
								replaced files for sdot, sgemv_n and sgemv_t for bug  #348  
							
							
							
						 
						
							2014-05-06 15:29:39 +02:00  
				
					
						
							
							
								 
						
							
								9423f980f6 
								
							 
						 
						
							
							
								
								modified trsm kernel  
							
							
							
						 
						
							2013-12-02 10:08:14 +01:00  
				
					
						
							
							
								 
						
							
								c6156b2ef2 
								
							 
						 
						
							
							
								
								added trsm kernels from origin  
							
							
							
						 
						
							2013-12-01 22:39:39 +01:00  
				
					
						
							
							
								 
						
							
								6216ab8a7e 
								
							 
						 
						
							
							
								
								removed obsolete gemm_kernels from haswell branch  
							
							
							
						 
						
							2013-11-04 08:33:04 +01:00  
				
					
						
							
							
								 
						
							
								f51a849d91 
								
							 
						 
						
							
							
								
								Merge pull request  #278  from wernsaar/haswell  
							
							
							
							
							Merge wernsaar's Haswell gemm kernels. 
							
						 
						
							2013-08-17 08:24:37 -07:00  
				
					
						
							
							
								 
						
							
								4070d9a123 
								
							 
						 
						
							
							
								
								added dgemm_kernel_16x2_haswell.S  
							
							
							
						 
						
							2013-08-15 19:17:20 +02:00  
				
					
						
							
							
								 
						
							
								0b90c0ec64 
								
							 
						 
						
							
							
								
								added sgemm_kernel_16x4_haswell.S  
							
							
							
						 
						
							2013-08-15 18:46:14 +02:00  
				
					
						
							
							
								 
						
							
								2638370844 
								
							 
						 
						
							
							
								
								Init code base for Intel Haswell.  
							
							
							
						 
						
							2013-08-13 00:54:59 +08:00  
				
					
						
							
							
								 
						
							
								886cbaf4e4 
								
							 
						 
						
							
							
								
								Support AMD Piledriver by bulldozer kernels.  
							
							
							
						 
						
							2013-07-06 12:06:43 -03:00  
				
					
						
							
							
								 
						
							
								fa916a0fac 
								
							 
						 
						
							
							
								
								Fixed   #238  bug in lsame on x86.  
							
							
							
						 
						
							2013-06-28 22:43:41 +08:00  
				
					
						
							
							
								 
						
							
								6a72840945 
								
							 
						 
						
							
							
								
								Fixed internal buffer overflow bug in (s/d/c/z)gemv on x86.  
							
							
							
						 
						
							2013-05-29 13:23:12 +08:00  
				
					
						
							
							
								 
						
							
								5c8bf6ae0e 
								
							 
						 
						
							
							
								
								Merge branch 'bulldozer' into develop  
							
							
							
						 
						
							2013-02-10 01:19:42 +08:00  
				
					
						
							
							
								 
						
							
								0b08f7479e 
								
							 
						 
						
							
							
								
								Refs #154. Fixed a gemv_t bug that overflowed the 16MB buffer on x86.  
							
							
							
						 
						
							2013-01-20 21:22:12 +08:00  
				
					
						
							
							
								 
						
							
								69200884e1 
								
							 
						 
						
							
							
								
								Refs #173. Fixed internal buffer overflow bug in gemv_n on x86.  
							
							
							
						 
						
							2012-12-25 09:27:49 +08:00  
				
					
						
							
							
								 
						
							
								0d1518add9 
								
							 
						 
						
							
							
								
								Refs #173. Fixed internal buffer overflow bug in sgemv_t on x86.  
							
							
							
						 
						
							2012-12-25 09:10:17 +08:00  
				
					
						
							
							
								 
						
							
								91ed4e4450 
								
							 
						 
						
							
							
								
								Refs  #171 . Prevent loading the dirty number from the buffer in sgemv_t x86 kernel.  
							
							
							
						 
						
							2012-12-23 23:14:17 +08:00  
				
					
						
							
							
								 
						
							
								fd3046b32a 
								
							 
						 
						
							
							
								
								Refs #173. Fixed internal buffer overflow bug in gemv_t on x86.  
							
							
							
						 
						
							2012-12-23 21:47:22 +08:00  
				
					
						
							
							
								 
						
							
								bfaaa975e6 
								
							 
						 
						
							
							
								
								Added BULLDOZER target. So far it uses barcelona kernels.  
							
							
							
						 
						
							2012-12-07 00:53:31 +08:00  
				
					
						
							
							
								 
						
							
								b7c0fa6bd2 
								
							 
						 
						
							
							
								
								Init AMD Bulldozer codebase.  
							
							
							
						 
						
							2012-12-06 07:29:54 -05:00  
				
					
						
							
							
								 
						
							
								2573311308 
								
							 
						 
						
							
							
								
								refs  #140 . Fixed zdot incompatibility ABI issue with GCC 4.7 on Win 32.  
							
							
							
							
							GCC 4.7 uses MSVC ABI on Win 32. This means the caller pops the hidden pointer for returning
aggregate structures larger than 8 bytes. 
							
						 
						
							2012-09-24 20:34:33 +08:00  
				
					
						
							
							
								 
						
							
								d3b67d0bd8 
								
							 
						 
						
							
							
								
								Refs  #113 . Fixed the typo BOBCATE -> BOBCAT  
							
							
							
						 
						
							2012-05-31 22:40:15 +08:00  
				
					
						
							
							
								 
						
							
								d6cab3f37e 
								
							 
						 
						
							
							
								
								Refs  #113 . Support AMD Bobcate using Barcelona kernel codes. Replace 3DNow! with MMX.  
							
							
							
						 
						
							2012-05-31 18:17:45 +08:00  
				
					
						
							
							
								 
						
							
								a53c6e2440 
								
							 
						 
						
							
							
								
								Merge branch 'develop' into sandybridge  
							
							
							
						 
						
							2012-05-25 23:16:44 +08:00  
				
					
						
							
							
								 
						
							
								5d657c6e67 
								
							 
						 
						
							
							
								
								Fixed #96, a SEGFAULT bug in samax on x86.  
							
							
							
						 
						
							2012-04-26 16:50:57 +08:00  
				
					
						
							
							
								 
						
							
								03b0eb19f7 
								
							 
						 
						
							
							
								
								Refs #86. Test alpha=NaN in x86/x86_64 dscale.  
							
							
							
						 
						
							2012-04-05 18:16:18 +08:00  
				
					
						
							
							
								 
						
							
								19a48b82cf 
								
							 
						 
						
							
							
								
								Init Sandybridge codes based on Nehalem.  
							
							
							
						 
						
							2012-03-30 20:01:03 +08:00  
				
					
						
							
							
								 
						
							
								dff146e306 
								
							 
						 
						
							
							
								
								refs  #80 . Used GEMV SSE2 kernels on x86.  
							
							
							
						 
						
							2012-03-19 17:56:22 +08:00  
				
					
						
							
							
								 
						
							
								7b410b7f0e 
								
							 
						 
						
							
							
								
								Fixed #58, a zdot SEGFAULT bug with GCC-4.6. Thanks to Mr. John for this patch.  
							
							
							
							
							In the i386 calling convention, the caller puts the address of the zdot return value into the first hidden parameter.
Thus, the callee should remove this address from the stack before returning.
I had already fixed the same bug in x86/zdot_sse2.S (issue #32). However, that was not a good implementation, as it used 3 instructions. Mr. John told me to use "ret $0x4" to skip the first hidden address (4 bytes). 
							
						 
						
							2011-09-14 23:52:51 +08:00  
				
					
						
							
							
								 
						
							
								b1fe26c45a 
								
							 
						 
						
							
							
								
								refs  #55 . Changed  DTB_ENTRIES to DTB_DEFAULT_ENTRIES in x86 gemv_n kernel codes.  
							
							
							
						 
						
							2011-09-06 14:14:07 +08:00  
				
					
						
							
							
								 
						
							
								31040e4d80 
								
							 
						 
						
							
							
								
								Fixed #32, a SEGFAULT bug with gcc-4.6. According to the i386 calling convention, the called function should remove the hidden return value address from the stack.  
							
							
							
						 
						
							2011-06-03 13:19:54 +08:00  
				
					
						
							
							
								 
						
							
								272f62a2b6 
								
							 
						 
						
							
							
								
								Changed the movlps macro name to uppercase in the x86/zdot_sse2.S file.  
							
							
							
						 
						
							2011-03-03 00:46:39 +08:00