Martin Kroeker
							
						 
						
							 
							
							
								
								
							
							
							
								
							
							
								e964ebd0d0 
								
							 
						 
						
							
							
								
								Add compiler option for AVX512-capable Ryzen(4)  
							
							 
							
							
							
						 
						
							2023-02-02 19:04:05 +01:00  
						
					 
				
					
						
							
							
								 
								Martin Kroeker
							
						 
						
							 
							
							
								
								
							
							
							
								
							
							
								a0a4f7c447 
								
							 
						 
						
							
							
								
								Add -mfma to -mavx2 for clang, and add AVX2 declaration for Zen in DYNAMIC_ARCH builds  
							
							 
							
							
							
						 
						
							2022-09-13 22:47:00 +02:00  
						
					 
				
					
						
							
							
								 
								Martin Kroeker
							
						 
						
							 
							
							
								
								
							
							
							
								
							
							
								85fd3c4279 
								
							 
						 
						
							
							
								
								Support compilation with the Cray C and Fortran compilers ( #3712 )  
							
							 
							
							
							
							
							* Add support for the Cray Fortran compiler 
							
						 
						
							2022-08-04 20:42:18 +02:00  
						
					 
				
					
						
							
							
								 
								Martin Kroeker
							
						 
						
							 
							
							
								
								
							
							
							
								
							
							
								18b19d135b 
								
							 
						 
						
							
							
								
								C_LAPACK: Fixes to make it compile with MSVC ( #3605 )  
							
							 
							
							
							
							
							* Fix f2c-like support functions to compile with MSVC, and
re-enable C_LAPACK for MSVC in CMAKE
* Add MSVC&flang build to Azure CI in order to check C_LAPACK correctness 
							
						 
						
							2022-04-17 17:49:38 +02:00  
						
					 
				
					
						
							
							
								 
								Martin Kroeker
							
						 
						
							 
							
							
								
								
							
							
							
								
							
							
								b7873605d4 
								
							 
						 
						
							
							
								
								Use f2c translations of LAPACK when no Fortran compiler is available ( #3539 )  
							
							 
							
							
							
							
							* Add C equivalents of the Fortran routines from Reference-LAPACK as fallbacks, and C_LAPACK variable to trigger their use 
							
						 
						
							2022-04-09 22:38:58 +02:00  
						
					 
				
					
						
							
							
								 
								Rafael Cardoso Fernandes Sousa
							
						 
						
							 
							
							
							
							
								
							
							
								d38110a5ce 
								
							 
						 
						
							
							
								
								Use CMake variables instead of as  
							
							 
							
							
							
						 
						
							2021-12-10 17:46:53 -06:00  
						
					 
				
					
						
							
							
								 
								Rafael Cardoso Fernandes Sousa
							
						 
						
							 
							
							
							
							
								
							
							
								214fbcee15 
								
							 
						 
						
							
							
								
								Fix cmake for power  
							
							 
							
							
							
						 
						
							2021-12-09 08:28:17 -06:00  
						
					 
				
					
						
							
							
								 
								Markus Mützel
							
						 
						
							 
							
							
							
							
								
							
							
								de2ed66596 
								
							 
						 
						
							
							
								
								cmake: Set SUFFIX64 also for NOFORTRAN  
							
							 
							
							
							
						 
						
							2021-11-15 08:53:52 +01:00  
						
					 
				
					
						
							
							
								 
								Wangyang Guo
							
						 
						
							 
							
							
							
							
								
							
							
								3dc6052c7e 
								
							 
						 
						
							
							
								
								initial support for Sapphire Rapids platform  
							
							 
							
							
							
						 
						
							2021-10-12 01:30:40 -07:00  
						
					 
				
					
						
							
							
								 
								Martin Kroeker
							
						 
						
							 
							
							
								
								
							
							
							
								
							
							
								e02df9fc55 
								
							 
						 
						
							
							
								
								Propagate BUILD_BFLOAT16 to CFLAGS  
							
							 
							
							
							
						 
						
							2021-09-14 16:12:27 +02:00  
						
					 
				
					
						
							
							
								 
								Wangyang Guo
							
						 
						
							 
							
							
							
							
								
							
							
								76ea8db4da 
								
							 
						 
						
							
							
								
								Small Matrix: enable by default for x86_64 arch  
							
							 
							
							
							
							
							If no customized GEMM_SMALL_M_PERMIT kernel is defined, it will just bypass to the normal path. 
							
						 
						
							2021-08-05 02:59:36 +00:00  
						
					 
				
					
						
							
							
								 
								Wangyang Guo
							
						 
						
							 
							
							
							
							
								
							
							
								fee5abd84b 
								
							 
						 
						
							
							
								
								Small Matrix: support cmake build  
							
							 
							
							
							
						 
						
							2021-08-04 08:50:15 +00:00  
						
					 
				
					
						
							
							
								 
								Martin Kroeker
							
						 
						
							 
							
							
								
								
							
							
							
								
							
							
								30f23be0f9 
								
							 
						 
						
							
							
								
								Rework setting of -mfma to only apply it where necessary  
							
							 
							
							
							
						 
						
							2021-07-22 12:00:03 +02:00  
						
					 
				
					
						
							
							
								 
								User User-User
							
						 
						
							 
							
							
							
							
								
							
							
								91e2b11d3c 
								
							 
						 
						
							
							
								
								add to cmake listings too  
							
							 
							
							
							
						 
						
							2021-06-20 15:32:42 +02:00  
						
					 
				
					
						
							
							
								 
								刘雨培
							
						 
						
							 
							
							
							
							
								
							
							
								725432efaa 
								
							 
						 
						
							
							
								
								pass NO_AVX512 macro def  
							
							 
							
							
							
						 
						
							2021-04-07 00:10:41 +08:00  
						
					 
				
					
						
							
							
								 
								Martin Kroeker
							
						 
						
							 
							
							
								
								
							
							
							
								
							
							
								33b5670122 
								
							 
						 
						
							
							
								
								Merge pull request  #3096  from martin-frbg/fixclangcmake  
							
							 
							
							
							
							
							Fix Cooperlake/DYNAMIC_ARCH builds with clang on Windows 
							
						 
						
							2021-02-02 13:33:15 +01:00  
						
					 
				
					
						
							
							
								 
								Martin Kroeker
							
						 
						
							 
							
							
								
								
							
							
							
								
							
							
								95e19e2e23 
								
							 
						 
						
							
							
								
								fix case in compiler name check  
							
							 
							
							
							
							
							Co-authored-by: xoviat <49173759+xoviat@users.noreply.github.com> 
							
						 
						
							2021-02-02 10:53:46 +01:00  
						
					 
				
					
						
							
							
								 
								Martin Kroeker
							
						 
						
							 
							
							
								
								
							
							
							
								
							
							
								99ac042702 
								
							 
						 
						
							
							
								
								remove spurious lines (probably editor malfunction)  
							
							 
							
							
							
						 
						
							2021-02-01 21:02:53 +01:00  
						
					 
				
					
						
							
							
								 
								Martin Kroeker
							
						 
						
							 
							
							
								
								
							
							
							
								
							
							
								774b9f8653 
								
							 
						 
						
							
							
								
								handle AppleClang in Cooperlake support condition  
							
							 
							
							
							
						 
						
							2021-02-01 20:18:53 +01:00  
						
					 
				
					
						
							
							
								 
								Martin Kroeker
							
						 
						
							 
							
							
								
								
							
							
							
								
							
							
								eb1d2344f7 
								
							 
						 
						
							
							
								
								Fix compiler version check for Intel Cooperlake support (clang-cl does not accept -dumpversion)  
							
							 
							
							
							
						 
						
							2021-02-01 19:45:25 +01:00  
						
					 
				
					
						
							
							
								 
								xoviat
							
						 
						
							 
							
							
							
							
								
							
							
								b60de4447a 
								
							 
						 
						
							
							
								
								add cortex-m platform  
							
							 
							
							
							
						 
						
							2021-01-19 08:57:44 -06:00  
						
					 
				
					
						
							
							
								 
								Martin Kroeker
							
						 
						
							 
							
							
								
								
							
							
							
								
							
							
								438a8e5624 
								
							 
						 
						
							
							
								
								Fix placement of getarch call and spurious cpu property accumulation in DYNAMIC_ARCH builds  
							
							 
							
							
							
						 
						
							2020-11-07 20:26:12 +01:00  
						
					 
				
					
						
							
							
								 
								Martin Kroeker
							
						 
						
							 
							
							
								
								
							
							
							
								
							
							
								0155cd53a3 
								
							 
						 
						
							
							
								
								Add -msse3 where needed for DYNAMIC_ARCH builds  
							
							 
							
							
							
						 
						
							2020-11-03 23:45:49 +01:00  
						
					 
				
					
						
							
							
								 
								Martin Kroeker
							
						 
						
							 
							
							
								
								
							
							
							
								
							
							
								b9bc76aec4 
								
							 
						 
						
							
							
								
								Add files via upload  
							
							 
							
							
							
						 
						
							2020-11-02 22:43:50 +01:00  
						
					 
				
					
						
							
							
								 
								Martin Kroeker
							
						 
						
							 
							
							
								
								
							
							
							
								
							
							
								f64243ff57 
								
							 
						 
						
							
							
								
								Add compiler options for sse/sse2/ssse3/sse4.1  
							
							 
							
							
							
						 
						
							2020-10-16 10:47:06 +02:00  
						
					 
				
					
						
							
							
								 
								Martin Kroeker
							
						 
						
							 
							
							
								
								
							
							
							
								
							
							
								e3a29f6b58 
								
							 
						 
						
							
							
								
								Change "HALF" and "sh" to "BFLOAT16" and "sb"  
							
							 
							
							
							
						 
						
							2020-10-12 00:07:37 +02:00  
						
					 
				
					
						
							
							
								 
								Martin Kroeker
							
						 
						
							 
							
							
								
								
							
							
							
								
							
							
								68e6823d36 
								
							 
						 
						
							
							
								
								Adapt for supporting only a subset of variable types  
							
							 
							
							
							
						 
						
							2020-10-11 15:01:32 +02:00  
						
					 
				
					
						
							
							
								 
								Martin Kroeker
							
						 
						
							 
							
							
								
								
							
							
							
								
							
							
								e1b7123bbe 
								
							 
						 
						
							
							
								
								Merge pull request  #2867  from Qiyu8/usimd-floatdot  
							
							 
							
							
							
							
							Optimize the performance of dot by using universal intrinsics in X86/ARM 
							
						 
						
							2020-10-10 12:10:25 +02:00  
						
					 
				
					
						
							
							
								 
								Qiyu8
							
						 
						
							 
							
							
							
							
								
							
							
								f32d34a015 
								
							 
						 
						
							
							
								
								add sse3 compiler flag  
							
							 
							
							
							
						 
						
							2020-10-10 10:36:15 +08:00  
						
					 
				
					
						
							
							
								 
								Martin Kroeker
							
						 
						
							 
							
							
								
								
							
							
							
								
							
							
								a5feea6611 
								
							 
						 
						
							
							
								
								make BLAS3_MEM_ALLOC_THRESHOLD configurable on non-Windows  
							
							 
							
							
							
						 
						
							2020-10-04 23:01:06 +02:00  
						
					 
				
					
						
							
							
								 
								Martin Kroeker
							
						 
						
							 
							
							
								
								
							
							
							
								
							
							
								c4aeeeb9f4 
								
							 
						 
						
							
							
								
								Activate all BUILD_ options if none was specified  
							
							 
							
							
							
						 
						
							2020-09-15 23:15:34 +02:00  
						
					 
				
					
						
							
							
								 
								Martin Kroeker
							
						 
						
							 
							
							
								
								
							
							
							
								
							
							
								26792d2096 
								
							 
						 
						
							
							
								
								Copy BUILD_* directives to the compiler options to allow ifdef in tests  
							
							 
							
							
							
						 
						
							2020-09-13 21:47:55 +02:00  
						
					 
				
					
						
							
							
								 
								Martin Kroeker
							
						 
						
							 
							
							
								
								
							
							
							
								
							
							
								68b1713c30 
								
							 
						 
						
							
							
								
								Merge pull request  #2811  from martin-frbg/issue2806  
							
							 
							
							
							
							
							Make NO_AVX512 option override the AVX512 compile test in CMAKE builds as well 
							
						 
						
							2020-09-01 17:19:14 +02:00  
						
					 
				
					
						
							
							
								 
								Martin Kroeker
							
						 
						
							 
							
							
								
								
							
							
							
								
							
							
								bd3207b4b4 
								
							 
						 
						
							
							
								
								Update system.cmake  
							
							 
							
							
							
						 
						
							2020-08-19 22:51:10 +02:00  
						
					 
				
					
						
							
							
								 
								Martin Kroeker
							
						 
						
							 
							
							
								
								
							
							
							
								
							
							
								b8ebfc9335 
								
							 
						 
						
							
							
								
								Update system.cmake  
							
							 
							
							
							
						 
						
							2020-08-19 22:30:19 +02:00  
						
					 
				
					
						
							
							
								 
								Martin Kroeker
							
						 
						
							 
							
							
								
								
							
							
							
								
							
							
								71d33c952d 
								
							 
						 
						
							
							
								
								Typo fix  
							
							 
							
							
							
						 
						
							2020-08-19 17:44:23 +02:00  
						
					 
				
					
						
							
							
								 
								Martin Kroeker
							
						 
						
							 
							
							
								
								
							
							
							
								
							
							
								6a3c074786 
								
							 
						 
						
							
							
								
								-march=cooperlake requires gcc10  
							
							 
							
							
							
						 
						
							2020-08-19 17:22:12 +02:00  
						
					 
				
					
						
							
							
								 
								Chen, Guobing
							
						 
						
							 
							
							
							
							
								
							
							
								e740c4873d 
								
							 
						 
						
							
							
								
								Enable COOPERLAKE build target  
							
							 
							
							
							
							
							Enable new build target platform -- COOPERLAKE. This target platform
supports all the SKYLAKEX supported ISAs + avx512bf16. So all the
SKYLAKEX specific kernels/drivers and related code are now extended
to be also active on COOPERLAKE. Besides, new BF16 related kernels
are active under this target. 
							
						 
						
							2020-08-13 06:18:00 +08:00  
						
					 
				
					
						
							
							
								 
								Martin Kroeker
							
						 
						
							 
							
							
								
								
							
							
							
								
							
							
								6876221cf3 
								
							 
						 
						
							
							
								
								Remove optimization level limit for flang again and add -fno-unroll-loops for AOCC flang 2.x instead  
							
							 
							
							
							
						 
						
							2020-06-14 17:40:24 +02:00  
						
					 
				
					
						
							
							
								 
								Martin Kroeker
							
						 
						
							 
							
							
								
								
							
							
							
								
							
							
								3ce469a34f 
								
							 
						 
						
							
							
								
								Limit optimization level to O1 for flang and add -frecursive  
							
							 
							
							
							
						 
						
							2020-06-09 16:11:13 +02:00  
						
					 
				
					
						
							
							
								 
								Martin Kroeker
							
						 
						
							 
							
							
								
								
							
							
							
								
							
							
								bb12c2c854 
								
							 
						 
						
							
							
								
								Limit MAX_STACK_ALLOC availability to non-Windows  
							
							 
							
							
							
						 
						
							2020-06-04 19:07:27 +02:00  
						
					 
				
					
						
							
							
								 
								Martin Kroeker
							
						 
						
							 
							
							
								
								
							
							
							
								
							
							
								6e97df7b47 
								
							 
						 
						
							
							
								
								Add CMAKE support for MAX_STACK_ALLOC setting  
							
							 
							
							
							
						 
						
							2020-06-04 14:45:31 +02:00  
						
					 
				
					
						
							
							
								 
								Rajalakshmi Srinivasaraghavan
							
						 
						
							 
							
							
							
							
								
							
							
								7eb55504b1 
								
							 
						 
						
							
							
								
								RFC : Add half precision gemm for bfloat16 in OpenBLAS  
							
							 
							
							
							
							
							This patch adds support for bfloat16 data type matrix multiplication kernel.
For architectures that don't support bfloat16, it is defined as unsigned short
(2 bytes).  Default unroll sizes can be changed as per architecture as done for
SGEMM and for now 8 and 4 are used for M and N.  Size of ncopy/tcopy can be
changed as per architecture requirement and for now, size 2 is used.
Added shgemm in kernel/power/KERNEL.POWER9 and tested in powerpc64le and
powerpc64.  For reference, added a small test compare_sgemm_shgemm.c to compare
sgemm and shgemm output.
This patch does not cover OpenBLAS test, benchmark and lapack tests for shgemm.
Complex type implementation can be discussed and added once this is approved. 
							
						 
						
							2020-04-14 14:55:08 -05:00  
						
					 
				
					
						
							
							
								 
								Martin Kroeker
							
						 
						
							 
							
							
							
							
								
							
							
								7f0d523b42 
								
							 
						 
						
							
							
								
								Make BUFFER_SIZE configurable  
							
							 
							
							
							
						 
						
							2020-02-09 23:32:57 +01:00  
						
					 
				
					
						
							
							
								 
								Martin Kroeker
							
						 
						
							 
							
							
								
								
							
							
							
								
							
							
								e3d846ab57 
								
							 
						 
						
							
							
								
								Do not use -march=native with the PGI compiler  
							
							 
							
							
							
						 
						
							2019-08-16 08:58:10 +02:00  
						
					 
				
					
						
							
							
								 
								Martin Kroeker
							
						 
						
							 
							
							
								
								
							
							
							
								
							
							
								f69a0be712 
								
							 
						 
						
							
							
								
								Add getarch flags to disable AVX on x86  
							
							 
							
							
							
							
							(and other small fixes to match Makefile behaviour) 
							
						 
						
							2019-07-06 15:02:39 +02:00  
						
					 
				
					
						
							
							
								 
								Michael Lass
							
						 
						
							 
							
							
							
							
								
							
							
								7a9a4dbc4f 
								
							 
						 
						
							
							
								
								Fix detection of AVX512 capable compilers in getarch  
							
							 
							
							
							
							
							21eda8b5  introduced a check in getarch.c to test if the compiler is capable of
AVX512. This check currently fails, since the used __AVX2__ macro is only
defined if getarch itself was compiled with AVX2/AVX512 support. Make sure this
is the case by building getarch with -march=native on x86_64. It is only
supposed to run on the build host anyway. 
							
						 
						
							2019-06-05 17:30:56 +02:00  
						
					 
				
					
						
							
							
								 
								Martin Kroeker
							
						 
						
							 
							
							
								
								
							
							
							
								
							
							
								1e52572be3 
								
							 
						 
						
							
							
								
								Add option USE_LOCKING for single-threaded build with locking support  
							
							 
							
							
							
						 
						
							2019-05-15 23:19:30 +02:00  
						
					 
				
					
						
							
							
								 
								luz.paz
							
						 
						
							 
							
							
							
							
								
							
							
								daf2fec12d 
								
							 
						 
						
							
							
								
								Misc. typo fixes  
							
							 
							
							
							
							
							Found via `codespell -q 3 -w -L ith,als,dum,nd,amin,nto,wis,ba -S ./relapack,./kernel,./lapack-netlib` 
							
						 
						
							2019-04-29 17:03:56 -04:00  
						
					 
				
					
						
							
							
								 
								Martin Kroeker
							
						 
						
							 
							
							
								
								
							
							
							
								
							
							
								5952e586ce 
								
							 
						 
						
							
							
								
								Support DYNAMIC_LIST option in cmake  
							
							 
							
							
							
							
							e.g. cmake -DDYNAMIC_ARCH=1 -DDYNAMIC_LIST="NEHALEM;HASWELL;ZEN" ..
original issue was #1639  
							
						 
						
							2019-02-05 23:51:40 +01:00