b07d733a71 
								
							 
						 
						
							
							
								
								added updates for syrk and syr2k  
							
							
							
						 
						
							2016-01-21 13:16:44 +01:00  
				
					
						
							
							
								 
						
							
								055b481386 
								
							 
						 
						
							
							
								
								Fixed CMake bug for single core.  
							
							
							
						 
						
							2016-01-15 06:42:54 +08:00  
				
					
						
							
							
								 
						
							
								fbc21266e6 
								
							 
						 
						
							
							
								
								Minor C code fixes in driver/  
							
							
							
						 
						
							2015-11-09 14:15:49 +05:30  
				
					
						
							
							
								 
						
							
								d8392c1245 
								
							 
						 
						
							
							
								
								Fixed CMake config bugs.  
							
							
							
						 
						
							2015-10-20 04:30:55 +08:00  
				
					
						
							
							
								 
						
							
								f874465bb8 
								
							 
						 
						
							
							
								
								Use cmake to build OpenBLAS GENERIC Target on MSVC x86 64-bit.  
							
							
							
							
							Disable CBLAS and LAPACK. 
							
						 
						
							2015-08-10 14:10:44 -05:00  
				
					
						
							
							
								 
						
							
								9eaea02f33 
								
							 
						 
						
							
							
								
								Added additional gemm defines for complex types.  
							
							
							
						 
						
							2015-02-25 09:39:11 -06:00  
				
					
						
							
							
								 
						
							
								0d8e227ea7 
								
							 
						 
						
							
							
								
								Changed strategy for setting preprocessor definitions.  
							
							
							
							
							Instead of generating separate object files for each permutation of
defines for a source file, GenerateNamedObjects now writes an entirely
new source file and inserts the defines as #define statements.
This solves a problem I ran into with ar.exe where it was refusing to
link objects that had the same filename despite having different paths. 
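A minimal sketch of the idea described above, written in C rather than CMake purely for illustration. The function name `write_wrapper_source` is hypothetical, and pulling in the original file with `#include` is an assumption; the commit message only says that a new source file is written with the defines inserted as `#define` statements.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: writes the text of a generated wrapper source into
 * `out`. Each define becomes a "#define NAME" line, followed by an #include
 * of the original source (an assumed way of reusing the original code).
 * Returns 0 on success, -1 if `out` is too small.
 */
int write_wrapper_source(char *out, size_t out_size,
                         const char *original_source,
                         const char *const *defines, size_t define_count)
{
    size_t used = 0;
    int n;

    for (size_t i = 0; i < define_count; i++) {
        n = snprintf(out + used, out_size - used, "#define %s\n", defines[i]);
        if (n < 0 || (size_t)n >= out_size - used) return -1;
        used += (size_t)n;
    }

    n = snprintf(out + used, out_size - used, "#include \"%s\"\n",
                 original_source);
    if (n < 0 || (size_t)n >= out_size - used) return -1;
    return 0;
}
```

Because every permutation of defines gets its own uniquely named generated file, the resulting object files no longer share a filename, which is the situation the commit message says ar.exe rejected.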
							
						 
						
							2015-02-24 12:26:33 -06:00  
				
					
						
							
							
								 
						
							
								371071d461 
								
							 
						 
						
							
							
								
								Added CONJ defines for trmm/trsm.  
							
							
							
						 
						
							2015-02-21 10:59:02 -06:00  
				
					
						
							
							
								 
						
							
								8a143516e3 
								
							 
						 
						
							
							
								
								Added alternate_name to a couple of the name mangling schemes.  
							
							
							
							
							Added zherk_k sources to driver/level3. 
							
						 
						
							2015-02-20 17:03:33 -06:00  
				
					
						
							
							
								 
						
							
								e5897ecb9b 
								
							 
						 
						
							
							
								
								Added zherk_kernel.c objects to driver/level3.  
							
							
							
						 
						
							2015-02-19 16:19:56 -06:00  
				
					
						
							
							
								 
						
							
								4662a0b13a 
								
							 
						 
						
							
							
								
								Changed generate functions to iterate through a list of float types.  
							
							
							
							
							This will generate obj files for SINGLE/DOUBLE/COMPLEX/DOUBLE COMPLEX. 
							
						 
						
							2015-02-15 17:44:37 -06:00  
				
					
						
							
							
								 
						
							
								e74462a3f5 
								
							 
						 
						
							
							
								
								Moved declarations to start of functions to satisfy MSVC C89 implementation.  
							
							
							
						 
						
							2015-02-11 11:16:57 -06:00  
				
					
						
							
							
								 
						
							
								056ba26755 
								
							 
						 
						
							
							
								
								Changed a number of inline calls to use __inline.  
							
							
							
							
							MSVC doesn't implement C99, so can't use the inline keyword. __inline
appears to work in MSVC and GCC. 
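A small sketch of how the keyword choice can be kept in one place. The macro name `PORTABLE_INLINE` and the function `min_int` are hypothetical and do not come from the commit; the sketch only relies on `__inline` being accepted by MSVC and by GCC-compatible compilers, as the message above reports.

```c
/* PORTABLE_INLINE is a hypothetical macro name used only for illustration. */
#if defined(_MSC_VER) || defined(__GNUC__)
#define PORTABLE_INLINE __inline   /* accepted by MSVC and GCC-compatible compilers */
#else
#define PORTABLE_INLINE inline     /* fall back to the C99 keyword elsewhere */
#endif

/* Example of a small helper declared with the portable spelling. */
static PORTABLE_INLINE int min_int(int a, int b)
{
    return a < b ? a : b;
}
```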
							
						 
						
							2015-02-11 11:13:17 -06:00  
				
					
						
							
							
								 
						
							
								e8c39138c6 
								
							 
						 
						
							
							
								
								Removed return value from GenerateNamedObjects.  
							
							
							
							
							It sets DBLAS_OBJS directly to save a bunch of list appending in the
CMakeLists.txt files. 
							
						 
						
							2015-02-09 12:28:09 -06:00  
				
					
						
							
							
								 
						
							
								627d5e7401 
								
							 
						 
						
							
							
								
								Added SMP objects to driver/level3.  
							
							
							
						 
						
							2015-02-05 12:22:48 -06:00  
				
					
						
							
							
								 
						
							
								943fa2fb58 
								
							 
						 
						
							
							
								
								Fixed object names in level2.  
							
							
							
						 
						
							2015-02-05 10:49:11 -06:00  
				
					
						
							
							
								 
						
							
								461e691127 
								
							 
						 
						
							
							
								
								Codes when define is absent are now a parameter to AllCombinations.  
							
							
							
							
							The level3 object names should now be correct. 
							
						 
						
							2015-02-05 09:23:47 -06:00  
				
					
						
							
							
								 
						
							
								cfaf1c678f 
								
							 
						 
						
							
							
								
								Added option to append define codes with an underscore.  
							
							
							
							
							Fixed the code array not getting reset on subsequent AllCombinations
calls. 
							
						 
						
							2015-02-05 09:17:18 -06:00  
				
					
						
							
							
								 
						
							
								0d7bad1f35 
								
							 
						 
						
							
							
								
								Changed GenerateObjects to append combination codes (e.g. dtrmm_TU).  
							
							
							
						 
						
							2015-02-05 09:02:54 -06:00  
				
					
						
							
							
								 
						
							
								d11bde60d0 
								
							 
						 
						
							
							
								
								DOUBLE define for DBLAS objects is now set in main CMakeLists.txt.  
							
							
							
							
							Since the objects are the same, could generate SINGLE/COMPLEX/etc here
without having to rewrite all the object enumeration code again. 
							
						 
						
							2015-02-02 15:00:44 -06:00  
				
					
						
							
							
								 
						
							
								5057a4b4df 
								
							 
						 
						
							
							
								
								Added openblas add_library call that uses DBLAS_OBJS objects.  
							
							
							
						 
						
							2015-01-30 15:21:21 -06:00  
				
					
						
							
							
								 
						
							
								d3dcdddf75 
								
							 
						 
						
							
							
								
								Moved functions into util cmake file.  
							
							
							
						 
						
							2015-01-30 13:47:40 -06:00  
				
					
						
							
							
								 
						
							
								e5e7595bf9 
								
							 
						 
						
							
							
								
								Added parameter to GenerateObjects for defines that affect all sources.  
							
							
							
						 
						
							2015-01-30 13:31:13 -06:00  
				
					
						
							
							
								 
						
							
								7693887d61 
								
							 
						 
						
							
							
								
								Added empty set to the combinations generated by AllCombinations.  
							
							
							
						 
						
							2015-01-30 13:01:11 -06:00  
				
					
						
							
							
								 
						
							
								8d9b196e0d 
								
							 
						 
						
							
							
								
								Moved loop over define combos into a function.  
							
							
							
							
							This function takes a set of sources and a set of preprocessor
definitions. It will iterate over the sources and build an object
file for each combination of preprocessor definitions for each
source file. 
							
						 
						
							2015-01-30 12:14:44 -06:00  
				
					
						
							
							
								 
						
							
								a6cf8aafc0 
								
							 
						 
						
							
							
								
								Updated level3/CMakeLists with correct defines using all combos.  
							
							
							
						 
						
							2015-01-30 11:21:50 -06:00  
				
					
						
							
							
								 
						
							
								dbdca7bf0c 
								
							 
						 
						
							
							
								
								Added first pass at driver/level3 Makefile conversion.  
							
							
							
							
							Added a rather convoluted CMake function to find all combinations
of a given list. This will be useful for the object files that are
compiled multiple times with different combinations of preprocessor
definitions. 
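One common way to enumerate every combination of a list is to treat each combination as a bitmask. The sketch below shows that technique in C; it is not the CMake function from this commit, and the names `format_combination` and `combination_count` are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/*
 * Writes the subset of `items` selected by `mask` into `out` as a
 * space-separated list (empty string for mask 0). Bit i of `mask` selects
 * items[i]. Returns 0 on success, -1 if `out` is too small.
 */
int format_combination(char *out, size_t out_size,
                       const char *const *items, size_t count, unsigned mask)
{
    size_t used = 0;

    if (out_size == 0) return -1;
    out[0] = '\0';

    for (size_t i = 0; i < count; i++) {
        if (!(mask & (1u << i))) continue;   /* item i not in this subset */
        int n = snprintf(out + used, out_size - used, "%s%s",
                         used ? " " : "", items[i]);
        if (n < 0 || (size_t)n >= out_size - used) return -1;
        used += (size_t)n;
    }
    return 0;
}

/*
 * Number of combinations of `count` items, including the empty set: 2^count.
 * Only valid while `count` is smaller than the bit width of unsigned.
 */
unsigned combination_count(size_t count)
{
    return 1u << count;
}
```

Looping `mask` from 0 to `combination_count(count) - 1` visits every subset once; starting at 1 instead skips the empty set.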
							
						 
						
							2015-01-29 22:53:11 -06:00  
				
					
						
							
							
								 
						
							
								7aae4a62e7 
								
							 
						 
						
							
							
								
								enabled use of GEMM3M functions  
							
							
							
						 
						
							2014-09-20 14:27:10 +02:00  
				
					
						
							
							
								 
						
							
								1d33547222 
								
							 
						 
						
							
							
								
								optimized zgemm kernel for haswell  
							
							
							
						 
						
							2014-07-27 11:51:42 +02:00  
				
					
						
							
							
								 
						
							
								3ea4dadd30 
								
							 
						 
						
							
							
								
								optimizations for trsm  
							
							
							
						 
						
							2014-07-25 11:59:17 +02:00  
				
					
						
							
							
								 
						
							
								1b10ff129a 
								
							 
						 
						
							
							
								
								optimizations for trmm  
							
							
							
						 
						
							2014-07-25 10:00:23 +02:00  
				
					
						
							
							
								 
						
							
								125610d23b 
								
							 
						 
						
							
							
								
								allow to set custom value for ?GEMM_DEFAULT_UNROLL_MN, optimizations for syrk  
							
							
							
						 
						
							2014-07-24 18:43:31 +02:00  
				
					
						
							
							
								 
						
							
								be94db096c 
								
							 
						 
						
							
							
								
								disabled *3M functions for x86_64 platforms  
							
							
							
						 
						
							2014-07-01 16:18:05 +02:00  
				
					
						
							
							
								 
						
							
								6c2ead30f0 
								
							 
						 
						
							
							
								
								Remove all trailing whitespace except lapack-netlib  
							
							
							
							
							Signed-off-by: Timothy Gu <timothygu99@gmail.com> 
							
						 
						
							2014-06-27 12:05:18 -07:00  
				
					
						
							
							
								 
						
							
								c947ab85dc 
								
							 
						 
						
							
							
								
								changed level3.c  
							
							
							
						 
						
							2013-12-01 13:46:30 +01:00  
				
					
						
							
							
								 
						
							
								2840d56aeb 
								
							 
						 
						
							
							
								
								added dgemm_kernel for Piledriver  
							
							
							
						 
						
							2013-10-19 09:47:15 +02:00  
				
					
						
							
							
								 
						
							
								77b572fa0b 
								
							 
						 
						
							
							
								
								Merge branch 'loongson3a' into develop  
							
							
							
							
							Conflicts:
	Makefile.system 
							
						 
						
							2013-07-20 22:33:17 +08:00  
				
					
						
							
							
								 
						
							
								32d2ca3035 
								
							 
						 
						
							
							
								
								Refs #214, #221, #246. Fixed the getrf overflow bug on Windows.  
							
							
							
							
							I used a smaller threshold since the stack size is 1MB on Windows. 
							
						 
						
							2013-07-11 03:20:02 +08:00  
				
					
						
							
							
								 
						
							
								6f008abcef 
								
							 
						 
						
							
							
								
								replaced defined(DOUBLE) by !defined(XDOUBLE)  
							
							
							
						 
						
							2013-07-09 18:17:50 +02:00  
				
					
						
							
							
								 
						
							
								5d3312142a 
								
							 
						 
						
							
							
								
								Refs #221, #246. Fixed the stack overflow bug in multithreaded BLAS3.  
							
							
							
							
							When NUM_THREADS (MAX_CPU_NUMBER) is very large, e.g. 256, the job array
declared below takes 8MB:
typedef struct {
  volatile BLASLONG working[MAX_CPU_NUMBER][CACHE_LINE_SIZE * DIVIDE_RATE];
} job_t;
job_t          job[MAX_CPU_NUMBER];
Thus, we use malloc instead of stack allocation. 
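The 8MB figure can be checked with a short sketch. The values `CACHE_LINE_SIZE = 8`, `DIVIDE_RATE = 2`, and a 64-bit `BLASLONG` are assumptions chosen because they reproduce the size quoted above; the commit message itself only states `MAX_CPU_NUMBER = 256`. The helpers `job_array_bytes` and `alloc_jobs` are hypothetical names.

```c
#include <stdint.h>
#include <stdlib.h>

/* Values assumed for illustration; the commit message only gives 256. */
#define MAX_CPU_NUMBER  256
#define CACHE_LINE_SIZE 8
#define DIVIDE_RATE     2

typedef int64_t BLASLONG;   /* assumed to be 64 bits wide */

typedef struct {
    volatile BLASLONG working[MAX_CPU_NUMBER][CACHE_LINE_SIZE * DIVIDE_RATE];
} job_t;

/* Total bytes needed for an array of MAX_CPU_NUMBER job_t entries. */
size_t job_array_bytes(void)
{
    return (size_t)MAX_CPU_NUMBER * sizeof(job_t);
}

/*
 * Allocates the job array on the heap instead of the stack.
 * Returns NULL on failure; the caller must free() the result.
 */
job_t *alloc_jobs(void)
{
    return (job_t *)malloc(job_array_bytes());
}
```

Under these assumptions each `job_t` is 256 * 16 * 8 = 32768 bytes, so 256 of them take 8388608 bytes (8MB), which matches or exceeds a typical default thread stack size.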
							
						 
						
							2013-07-08 01:07:05 +08:00  
				
					
						
							
							
								 
						
							
								25491e42f9 
								
							 
						 
						
							
							
								
								New dgemm kernel for BULLDOZER: dgemm_kernel_8x2_bulldozer.S  
							
							
							
						 
						
							2013-06-08 09:40:17 +02:00  
				
					
						
							
							
								 
						
							
								6b01d58712 
								
							 
						 
						
							
							
								
								Disable the optimization of multi-threading gemm on the Loongson3A.  
							
							
							
						 
						
							2013-03-30 20:12:43 +00:00  
				
					
						
							
							
								 
						
							
								8163ab7e55 
								
							 
						 
						
							
							
								
								Change the block size on Loongson 3B.  
							
							
							
						 
						
							2011-11-23 18:41:49 +00:00  
				
					
						
							
							
								 
						
							
								9fe3049de6 
								
							 
						 
						
							
							
								
								Added conditional compilation (#if defined(LOONGSON3A)) to avoid affecting the performance of other platforms.  
							
							
							
						 
						
							2011-09-26 15:21:45 +00:00  
				
					
						
							
							
								 
						
							
								831858b883 
								
							 
						 
						
							
							
								
								Modified the aligned addresses of sa and sb to improve multi-threaded performance.  
							
							
							
						 
						
							2011-09-23 20:59:48 +00:00  
				
					
						
							
							
								 
						
							
								1b97ec1a7c 
								
							 
						 
						
							
							
								
								Added DEBUG option in Makefile.rule. Fixed DEBUG typo mistakes.  
							
							
							
						 
						
							2011-02-26 11:19:54 +08:00  
				
					
						
							
							
								 
						
							
								342bbc3871 
								
							 
						 
						
							
							
								
								Import GotoBLAS2 1.13 BSD version codes.  
							
							
							
						 
						
							2011-01-24 14:54:24 +00:00