711ca33bc6 
								
							 
						 
						
							
							
								
								Improved Ximatcopy when lda==ldb.  
							
							
							
							
							The Ximatcopy functions create a copy of the input matrix
even though they appear to work in place. The new XIMATCOPY_K_YY
routines perform the operation in place when the leading
dimension does not change (lda == ldb).
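To illustrate why an unchanged leading dimension allows a true in-place update, here is a minimal sketch in C. The helper name `imatcopy_cn_sketch` and its signature are hypothetical and are not the XIMATCOPY_K_YY kernels themselves. It covers only the column-major, no-transpose case and assumes the buffer holds at least max(lda, ldb) * cols elements.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper (not an OpenBLAS routine): scale a column-major
 * rows x cols matrix by alpha without transposing it, while changing its
 * leading dimension from lda to ldb inside the same buffer `a`.
 * Returns 0 on success, -1 if the temporary copy cannot be allocated. */
static int imatcopy_cn_sketch(size_t rows, size_t cols, double alpha,
                              double *a, size_t lda, size_t ldb)
{
    assert(lda >= rows && ldb >= rows);
    if (rows == 0 || cols == 0)
        return 0;

    if (lda == ldb) {
        /* Layout is unchanged: each element keeps its position,
         * so it can be scaled where it is and no copy is needed. */
        for (size_t j = 0; j < cols; j++)
            for (size_t i = 0; i < rows; i++)
                a[i + j * lda] *= alpha;
        return 0;
    }

    /* Layout changes: source and destination positions overlap,
     * so this sketch goes through a packed temporary copy. */
    double *tmp = malloc(rows * cols * sizeof *tmp);
    if (tmp == NULL)
        return -1;
    for (size_t j = 0; j < cols; j++)
        for (size_t i = 0; i < rows; i++)
            tmp[i + j * rows] = alpha * a[i + j * lda];
    for (size_t j = 0; j < cols; j++)
        for (size_t i = 0; i < rows; i++)
            a[i + j * ldb] = tmp[i + j * rows];
    free(tmp);
    return 0;
}
```

With lda == ldb the function touches only the rows x cols elements and leaves the padding rows alone; the temporary buffer is needed only when the layout actually changes.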
							
						 
						
							2015-09-07 14:36:16 +02:00  
				
					
						
							
							
								 
						
							
								f8f2e261fe 
								
							 
						 
						
							
							
								
								use only 1 thread if m or n < 2*GEMM_MULTITHREAD_THRESHOLD  
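The rule in this subject line can be sketched as a small C function. The function name `choose_threads` and the threshold value 4 are assumptions for illustration only; the commit does not say which interface routine applies the check or what value the build uses.

```c
#include <assert.h>

/* Assumed value for illustration; the real value is a build setting. */
#define GEMM_MULTITHREAD_THRESHOLD 4

/* Hypothetical helper: pick the number of threads for an m x n problem.
 * Falls back to a single thread when either dimension is below
 * 2 * GEMM_MULTITHREAD_THRESHOLD, as the commit subject describes. */
static int choose_threads(long m, long n, int available)
{
    assert(m >= 0 && n >= 0);
    if (m < 2L * GEMM_MULTITHREAD_THRESHOLD ||
        n < 2L * GEMM_MULTITHREAD_THRESHOLD)
        return 1;
    return available > 0 ? available : 1;
}
```

The point of such a check is that for small dimensions the cost of dispatching work to several threads can outweigh the work itself.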
							
							
							
						 
						
							2015-05-06 10:41:53 +02:00  
				
					
						
							
							
								 
						
							
								ab567d8443 
								
							 
						 
						
							
							
								
								gemv: Ensure stack buffer is large enough to handle memory alignment  
							
							
							
							
							Ref #478  
							
						 
						
							2015-04-24 10:12:49 +02:00  
				
					
						
							
							
								 
						
							
								847e19c04e 
								
							 
						 
						
							
							
								
								Refs #478, #482. Enable stack alloc for s/dgemv_t (revert 9798491).  
							
							
							
						 
						
							2015-04-20 23:22:40 -05:00  
				
					
						
							
							
								 
						
							
								fd9fd42936 
								
							 
						 
						
							
							
								
								Refs #478, #482. Fixed bug in previous commit.  
							
							
							
						 
						
							2015-04-13 23:22:27 -05:00  
				
					
						
							
							
								 
						
							
								9798481979 
								
							 
						 
						
							
							
								
								Refs #478, #482. Fix segfault bug for gemv_t with MAX_ALLOC_STACK flag.  
							
							
							
							
							For gemv_t, directly use malloc to create the buffer. 
							
						 
						
							2015-04-13 19:45:27 -05:00  
				
					
						
							
							
								 
						
							
								cdefdb21cd 
								
							 
						 
						
							
							
								
								Refs #492. Fixed c/zsyr bug with negative incx.  
							
							
							
						 
						
							2015-02-26 06:37:03 +08:00  
				
					
						
							
							
								 
						
							
								39cc6b21d3 
								
							 
						 
						
							
							
								
								Add ATLAS-style ?geadd function  
							
							
							
						 
						
							2015-02-16 13:46:20 +01:00  
				
					
						
							
							
								 
						
							
								b17ccb4c5c 
								
							 
						 
						
							
							
								
								Fix a segfault in gemv when MAX_STACK_ALLOC is set  
							
							
							
							
							* stack_alloc_size is needed after the implementation call,
but it may be overwritten if it is optimized into a register,
because some gemv implementations (e.g. dgemv_n.S) do not
restore all registers (e.g. r10).
* do the same in ger.c for the same reason, even though the bug
has not been observed there. 
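One common way to keep such a value out of a register across a call is to declare it `volatile`, so the compiler re-reads it from memory afterwards. The commit message does not state which mechanism was used, so the sketch below is an assumption. `fake_kernel`, `run_with_buffer`, and `STACK_ELEMS` are hypothetical names, and the plain C stand-in kernel does not reproduce the register clobbering of the assembly routines.

```c
#include <assert.h>
#include <stdlib.h>

#define STACK_ELEMS 64  /* assumed capacity of the on-stack buffer */

/* Hypothetical stand-in for an assembly kernel such as dgemv_n.S. */
static void fake_kernel(double *buf, int n)
{
    for (int i = 0; i < n; i++)
        buf[i] = (double)i;
}

/* Returns 1 if the stack buffer was used, 0 if the heap was used,
 * and -1 if heap allocation failed. */
static int run_with_buffer(int n)
{
    assert(n > 0);

    /* volatile: the value lives in memory and is reloaded after the call,
     * so the free decision below does not depend on a register that the
     * kernel might have failed to restore. */
    volatile int stack_alloc_size = (n <= STACK_ELEMS) ? n : 0;

    double stack_buf[STACK_ELEMS];
    double *buf = stack_alloc_size ? stack_buf : malloc((size_t)n * sizeof *buf);
    if (buf == NULL)
        return -1;

    fake_kernel(buf, n);

    /* The size is needed again here, after the implementation call. */
    int used_stack = (stack_alloc_size != 0);
    if (!used_stack)
        free(buf);
    return used_stack;
}
```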
							
						 
						
							2015-01-29 09:55:57 +01:00  
				
					
						
							
							
								 
						
							
								e9d9a8eae3 
								
							 
						 
						
							
							
								
								Allow gemv and ger buffer allocation on the stack  
							
							
							
							
							ger and gemv call blas_memory_alloc/free, which in turn
call blas_lock. blas_lock creates thread contention when matrices
are small and the number of threads is high enough. We avoid
calling blas_memory_alloc by replacing it with stack allocation.
This can be enabled with:
make -DMAX_STACK_ALLOC=2048
The given size (in bytes) must be high enough to avoid thread contention
and small enough to avoid stack overflow.
Fix #478  
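The idea described above can be sketched in C as follows. This is a minimal illustration and not the code from ger.c or gemv.c. `scaled_sum` and `kernel_with_scratch` are hypothetical names, `malloc` stands in for blas_memory_alloc, and the 2048-byte limit simply mirrors the example value in the message.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_STACK_ALLOC 2048  /* bytes; mirrors the example value above */

/* Hypothetical kernel that needs a scratch buffer of n doubles:
 * writes 2 * x[i] into scratch and returns the sum of those values. */
static double kernel_with_scratch(const double *x, size_t n, double *scratch)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        scratch[i] = 2.0 * x[i];
        sum += scratch[i];
    }
    return sum;
}

/* Uses a stack buffer when the scratch space fits in MAX_STACK_ALLOC bytes
 * and falls back to the heap otherwise. Returns 0 on success, -1 on
 * allocation failure. *used_stack reports which path was taken. */
static int scaled_sum(const double *x, size_t n, double *out, int *used_stack)
{
    double stack_buf[MAX_STACK_ALLOC / sizeof(double)];
    size_t bytes = n * sizeof(double);
    double *buf = stack_buf;

    *used_stack = (bytes <= MAX_STACK_ALLOC);
    if (!*used_stack) {
        /* Large problem: the shared allocator path (malloc here). */
        buf = malloc(bytes);
        if (buf == NULL)
            return -1;
    }

    *out = kernel_with_scratch(x, n, buf);

    if (!*used_stack)
        free(buf);
    return 0;
}
```

Small calls never reach the allocator, so they cannot contend on its lock, while large calls keep the heap path and so do not risk overflowing the stack.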
							
						 
						
							2014-12-27 14:33:12 +01:00  
				
					
						
							
							
								 
						
							
								9e829ce98f 
								
							 
						 
						
							
							
								
								enabled cblas gemm3m functions  
							
							
							
						 
						
							2014-09-20 17:20:02 +02:00  
				
					
						
							
							
								 
						
							
								d49fd33885 
								
							 
						 
						
							
							
								
								disabled SYMM3M and HEMM3M functions because of segmentation violations  
							
							
							
						 
						
							2014-09-20 15:27:40 +02:00  
				
					
						
							
							
								 
						
							
								7aae4a62e7 
								
							 
						 
						
							
							
								
								enabled use of GEMM3M functions  
							
							
							
						 
						
							2014-09-20 14:27:10 +02:00  
				
					
						
							
							
								 
						
							
								3300f5ebff 
								
							 
						 
						
							
							
								
								optimized multithreading lower limits  
							
							
							
						 
						
							2014-09-15 11:38:25 +02:00  
				
					
						
							
							
								 
						
							
								fd2478c9e2 
								
							 
						 
						
							
							
								
								optimized interface/zgemv.c for multithreading  
							
							
							
						 
						
							2014-09-12 19:18:23 +02:00  
				
					
						
							
							
								 
						
							
								1cba8e7b11 
								
							 
						 
						
							
							
								
								Merge pull request #446 from grisuthedragon/cblas_matcopy  
							
							
							
							
							Add a CBLAS interface for the BLAS extension s/d/c/z*matcopy routines. 
							
						 
						
							2014-09-10 16:31:31 +08:00  
				
					
						
							
							
								 
						
							
								a057e5434d 
								
							 
						 
						
							
							
								
								add CBLAS interface for s/d/c/zimatcopy  
							
							
							
						 
						
							2014-09-09 09:52:13 +02:00  
				
					
						
							
							
								 
						
							
								7794766d3c 
								
							 
						 
						
							
							
								
								Add cblas_(s/d/c/z)omatcopy to provide a CBLAS interface for them.  
							
							
							
						 
						
							2014-09-08 17:57:44 +02:00  
				
					
						
							
							
								 
						
							
								f511807fc0 
								
							 
						 
						
							
							
								
								modified multithreading threshold  
							
							
							
						 
						
							2014-09-08 12:27:32 +02:00  
				
					
						
							
							
								 
						
							
								d1800397f5 
								
							 
						 
						
							
							
								
								optimized interface/gemv.c for multithreading  
							
							
							
						 
						
							2014-09-02 17:36:07 +02:00  
				
					
						
							
							
								 
						
							
								f4ff889491 
								
							 
						 
						
							
							
								
								updated interface/gemv.c for multithreading  
							
							
							
						 
						
							2014-09-02 16:30:04 +02:00  
				
					
						
							
							
								 
						
							
								51413925bd 
								
							 
						 
						
							
							
								
								adjust number of threads for small size in cgemv and zgemv  
							
							
							
						 
						
							2014-07-15 16:27:02 +02:00  
				
					
						
							
							
								 
						
							
								b985cea65d 
								
							 
						 
						
							
							
								
								adjust number of threads for sgemv and dgemv  
							
							
							
						 
						
							2014-07-15 16:04:46 +02:00  
				
					
						
							
							
								 
						
							
								d286daa2ba 
								
							 
						 
						
							
							
								
								adjusted number of threads for small size  
							
							
							
						 
						
							2014-07-15 14:41:35 +02:00  
				
					
						
							
							
								 
						
							
								cedc1f4b14 
								
							 
						 
						
							
							
								
								Ref #410: disabled optimized potri functions (single threading bug)  
							
							
							
						 
						
							2014-07-10 13:42:32 +02:00  
				
					
						
							
							
								 
						
							
								02a504c0b8 
								
							 
						 
						
							
							
								
								fixed my bug in ger.c  
							
							
							
						 
						
							2014-07-02 10:39:33 +02:00  
				
					
						
							
							
								 
						
							
								be94db096c 
								
							 
						 
						
							
							
								
								disabled *3M functions for x86_64 platforms  
							
							
							
						 
						
							2014-07-01 16:18:05 +02:00  
				
					
						
							
							
								 
						
							
								aee61456a4 
								
							 
						 
						
							
							
								
								disabled SMP for sbmv and zsbmv again  
							
							
							
						 
						
							2014-06-29 21:18:38 +02:00  
				
					
						
							
							
								 
						
							
								01a119abfc 
								
							 
						 
						
							
							
								
								enabled SMP for sbmv and zsbmv, but only for 64bit binaries  
							
							
							
						 
						
							2014-06-29 20:35:56 +02:00  
				
					
						
							
							
								 
						
							
								1fad2b759f 
								
							 
						 
						
							
							
								
								enabled smp for ger.c and zger.c, but only for 64bit binaries  
							
							
							
						 
						
							2014-06-29 16:43:04 +02:00  
				
					
						
							
							
								 
						
							
								6c2ead30f0 
								
							 
						 
						
							
							
								
								Remove all trailing whitespace except lapack-netlib  
							
							
							
							
							Signed-off-by: Timothy Gu <timothygu99@gmail.com> 
							
						 
						
							2014-06-27 12:05:18 -07:00  
				
					
						
							
							
								 
						
							
								15d5dfa92c 
								
							 
						 
						
							
							
								
								fixed compiler warnings  
							
							
							
						 
						
							2014-06-25 11:32:44 +02:00  
				
					
						
							
							
								 
						
							
								86d8c8978b 
								
							 
						 
						
							
							
								
								Ref #391: disabled SMP in ger.c and zger.c  
							
							
							
						 
						
							2014-06-22 12:01:24 +02:00  
				
					
						
							
							
								 
						
							
								a19d209005 
								
							 
						 
						
							
							
								
								Ref #103: enhancement for small matrix dimensions  
							
							
							
						 
						
							2014-06-18 15:04:11 +02:00  
				
					
						
							
							
								 
						
							
								faeab93df0 
								
							 
						 
						
							
							
								
								Ref #51: added blas extensions simatcopy, dimatcopy, cimatcopy, zimatcopy  
							
							
							
						 
						
							2014-06-10 16:14:34 +02:00  
				
					
						
							
							
								 
						
							
								cee257f384 
								
							 
						 
						
							
							
								
								Ref #51: added blas extensions zomatcopy and comatcopy  
							
							
							
						 
						
							2014-06-10 10:34:54 +02:00  
				
					
						
							
							
								 
						
							
								7bfb3011e8 
								
							 
						 
						
							
							
								
								Ref #51: added blas extension somatcopy  
							
							
							
						 
						
							2014-06-09 20:21:13 +02:00  
				
					
						
							
							
								 
						
							
								8c8f596238 
								
							 
						 
						
							
							
								
								Ref #51: added blas extension domatcopy as a non-optimized reference  
							
							
							
						 
						
							2014-06-09 17:11:07 +02:00  
				
					
						
							
							
								 
						
							
								bff575d0b1 
								
							 
						 
						
							
							
								
								Ref #375: added workaround for small sizes to scal.c and zscal.c  
							
							
							
						 
						
							2014-06-08 13:49:19 +02:00  
				
					
						
							
							
								 
						
							
								faf3ac0aad 
								
							 
						 
						
							
							
								
								Ref #285: added axpby kernels  
							
							
							
						 
						
							2014-06-08 11:54:24 +02:00  
				
					
						
							
							
								 
						
							
								b31ec99372 
								
							 
						 
						
							
							
								
								Fixed #374.  
							
							
							
							
							Merge branch 'TimothyGu-develop' into develop 
							
						 
						
							2014-06-05 17:01:44 +08:00  
				
					
						
							
							
								 
						
							
								25e899b60b 
								
							 
						 
						
							
							
								
								fixed function profile in zpotri.c  
							
							
							
						 
						
							2014-05-25 09:15:22 +02:00  
				
					
						
							
							
								 
						
							
								89da450800 
								
							 
						 
						
							
							
								
								enabled and tested optimized potri lapack functions  
							
							
							
						 
						
							2014-05-23 12:14:30 +02:00  
				
					
						
							
							
								 
						
							
								c26bbee489 
								
							 
						 
						
							
							
								
								enabled and tested optimized trtri lapack functions  
							
							
							
						 
						
							2014-05-23 10:55:39 +02:00  
				
					
						
							
							
								 
						
							
								ced13574a0 
								
							 
						 
						
							
							
								
								Random "walk (a)round" --> "work-around" typo fixes  
							
							
							
							
							Signed-off-by: Timothy Gu <timothygu99@gmail.com> 
							
						 
						
							2014-05-22 18:11:52 -07:00  
				
					
						
							
							
								 
						
							
								a748d3a75d 
								
							 
						 
						
							
							
								
								enabled optimized trti2 lapack functions again  
							
							
							
						 
						
							2014-05-21 11:02:07 +02:00  
				
					
						
							
							
								 
						
							
								a5ab231ad4 
								
							 
						 
						
							
							
								
								enabled optimized complex lauum lapack functions again  
							
							
							
						 
						
							2014-05-21 10:35:28 +02:00  
				
					
						
							
							
								 
						
							
								dbaeea7b59 
								
							 
						 
						
							
							
								
								enabled lauu2 and lauum lapack functions again  
							
							
							
						 
						
							2014-05-21 09:49:18 +02:00  
				
					
						
							
							
								 
						
							
								0d75f3b6a2 
								
							 
						 
						
							
							
								
								enabled and tested optimized gesv lapack functions  
							
							
							
						 
						
							2014-05-19 14:44:53 +02:00  
				
					
						
							
							
								 
						
							
								abad6f66d6 
								
							 
						 
						
							
							
								
								marked trti2.c and ztrti2.c as bad  
							
							
							
						 
						
							2014-05-19 13:50:02 +02:00