| Jia-Chen | 302f22693a | MOD: optimize normal DGEMM on ARMV8 cortex-A53 & cortex-A55 | 2021-11-18 21:14:43 +08:00 |
| Martin Kroeker | 22bf5c27ba | Add basic support for the Fujitsu A64FX (#3415) * Add initial support for Fujitsu A64FX as generic ARMV8 | 2021-10-18 15:00:19 +02:00 |
| Wangyang Guo | 63a103ba6e | sbgemm: spr: disable small matrix path by default | 2021-10-17 19:08:03 -07:00 |
| Wangyang Guo | 82194ea9d2 | sbgemm: spr: implement otcopy_16 | 2021-10-17 19:08:03 -07:00 |
| Wangyang Guo | 8632380a96 | sbgemm: spr: reuse ncopy_16 from cooperlake as incopy | 2021-10-17 19:08:03 -07:00 |
| Wangyang Guo | 6bc8204ce5 | sbgemm: spr: optimization for tmp_c buffer | 2021-10-17 19:08:03 -07:00 |
| Wangyang Guo | f018aa342a | sbgemm: spr: kernel handle alpha != 1.0 | 2021-10-17 19:08:03 -07:00 |
| Wangyang Guo | a52456b168 | sbgemm: spr: oncopy: use tile load/store instead | 2021-10-17 19:08:03 -07:00 |
| Wangyang Guo | f2485352a6 | sbgemm: spr: only load A once in tail_k handling | 2021-10-17 19:08:03 -07:00 |
| Wangyang Guo | 9ab33228bb | sbgemm: spr: process k2 and odd k at the same time | 2021-10-17 19:08:03 -07:00 |
| Wangyang Guo | 10d52646e2 | sbgemm: spr: oncopy: avoid handling too much pointer at a time | 2021-10-17 19:08:03 -07:00 |
| Wangyang Guo | 88154ed02d | sbgemm: spr: reduce tile conf loading by seperate tail k handling | 2021-10-17 19:08:03 -07:00 |
| Wangyang Guo | a70bfb52d5 | sbgemm: spr: kernel works for NN case when alpha is 1.0 | 2021-10-17 19:08:03 -07:00 |
| Wangyang Guo | 6051c86741 | sbgemm: spr: kernel works for m32 in NN case | 2021-10-17 19:08:03 -07:00 |
| Wangyang Guo | d0b253ac6e | sbgemm: spr: implement oncopy_16 | 2021-10-17 19:08:03 -07:00 |
| Wangyang Guo | 1d48b7cb16 | sbgemm: spr: add dummy source files | 2021-10-17 19:08:03 -07:00 |
| Wangyang Guo | 3dc6052c7e | initial support for Sapphire Rapids platform | 2021-10-12 01:30:40 -07:00 |
| Martin Kroeker | 8c20ca345a | Use Neoverse's current mix of ThunderX2 kernels for Vortex as well | 2021-10-06 11:06:43 +02:00 |
| Martin Kroeker | 8e4c209002 | Merge pull request #3398 from kavanabhat/aix_p10_gnuas Big Endian Changes for Power10 kernels | 2021-10-05 18:59:47 +02:00 |
| kavanabhat | 9cc95e5657 | AIX changes for P10 with GNU Compiler | 2021-10-01 05:18:35 -05:00 |
| kavanabhat | fe3c778c51 | AIX changes for P10 with GNU Compiler | 2021-09-30 06:06:27 -05:00 |
| Wangyang Guo | ee5ca8a328 | x86_64: BFLOAT16: fix build warning | 2021-09-28 18:30:06 +08:00 |
| Martin Kroeker | 90cc944625 | Move alphaI to x22 to leave x18 unused (reserved on OSX) | 2021-09-17 09:53:18 +02:00 |
| Martin Kroeker | 590fbff06e | move alpha to x19/x20 to leave x18 unused for OSX | 2021-09-17 09:42:17 +02:00 |
| Martin Kroeker | 380940271b | Move temp to x21 to leave x18 unused (reserved on OSX) | 2021-09-17 09:28:19 +02:00 |
| Martin Kroeker | 7d75177446 | Move temp to x21 to leave x18 unused (reserved on OSX) | 2021-09-17 09:24:11 +02:00 |
| Martin Kroeker | 0a4ac4b585 | Use x21 for I to leave x18 unused (reserved on OSX) | 2021-09-17 09:19:51 +02:00 |
| Martin Kroeker | 7d4a221579 | Remove unused TEMP2 and reshuffle to leave x18 unused (reserved on OSX) | 2021-09-17 09:18:25 +02:00 |
| Martin Kroeker | d3a9c7ef7f | Merge pull request #3382 from rafaelcfsousa/rafael/cwarnings [POWER] Remove unused variable warnings. | 2021-09-17 09:15:16 +02:00 |
| Martin Kroeker | 8dfa61a61c | Initialize abs_mask1 with itself to silence a gcc warning | 2021-09-15 22:11:35 +02:00 |
| Martin Kroeker | 99aa10b3ff | Initialize abs_mask1 with itself to silence a gcc warning actual initialization is via the _mm_cmpeq_ep18, which I've seen claimed to be the fastest way to set an xmm register to all 1s | 2021-09-15 22:10:43 +02:00 |
| Rafael Cardoso Fernandes Sousa | b751edf624 | Fix unused variable warnings on Power | 2021-09-15 13:36:07 -05:00 |
| Martin Kroeker | 80346b8813 | Merge pull request #3379 from martin-frbg/issue3369-2 Add casts to fix compiler warnings for SkylakeX sasum/dasum | 2021-09-15 07:18:57 +02:00 |
| Martin Kroeker | ce036a2fc0 | Add casts | 2021-09-14 21:41:53 +02:00 |
| Martin Kroeker | ddf106f769 | Add dedicated entries for BFLOAT16 kernels | 2021-09-14 16:17:18 +02:00 |
| Martin Kroeker | af8843875a | Merge pull request #3376 from martin-frbg/issue3370 Fix a few harmless compiler warnings | 2021-09-12 00:01:31 +02:00 |
| Martin Kroeker | 0925dfe2c9 | One instance of kernel_4x1 is used even on SKX | 2021-09-11 15:30:19 +02:00 |
| Martin Kroeker | 7d873a329f | Add ifdefs around conditionally used functions | 2021-09-11 14:38:47 +02:00 |
| Martin Kroeker | ef24712030 | Move a conditionally used variable | 2021-09-11 14:37:44 +02:00 |
| Martin Kroeker | d17238599b | Add casts | 2021-09-11 13:38:28 +02:00 |
| Wangyang Guo | 59a1114d03 | sbgemm: cooperlake: tuning for small matrix | 2021-09-07 21:30:46 +08:00 |
| Wangyang Guo | 682d66555d | sbgemm: cooperlake: implement ncopy_16 | 2021-09-07 21:30:46 +08:00 |
| Wangyang Guo | beccb83b16 | sbgemm: cooperlake: add n24 kernel for tcopy_4 | 2021-09-07 21:30:46 +08:00 |
| Wangyang Guo | 5fcacad32b | sbgemm: cooperlake: implement tcopy_4 | 2021-09-07 21:30:46 +08:00 |
| Wangyang Guo | bb1c4fa5bd | sbgemm: cooperlake: prefetch A & B | 2021-09-07 21:30:46 +08:00 |
| Wangyang Guo | 7a2d1601ec | sbgemm: cooperlake: unroll core loop by 2 | 2021-09-07 21:30:46 +08:00 |
| Wangyang Guo | 45fdf951b6 | sbgemm: cooperlake: reorder ptr increase for performance | 2021-09-07 21:30:46 +08:00 |
| Wangyang Guo | cece3541ab | sbgemm: cooperlake: fix bug in m64n12 | 2021-09-07 21:30:46 +08:00 |
| Wangyang Guo | 9df0953cde | sbgemm: cooperlake: kernel works for NN | 2021-09-07 21:30:45 +08:00 |
| Wangyang Guo | 2ec9f3a8aa | sbgemm: cooperlake: change kernel size to 16x4 | 2021-09-07 21:30:45 +08:00 |