OpenBLAS

Commit Graph

Author	SHA1	Message	Date
Jia-Chen	b610d2de37	optimize cgemm on ARM cortex A53 & cortex A55	2021-12-12 17:22:52 +08:00
Bine Brank	a8f62a347b	fix UNROLL_MN and add to targets for SVE	2021-12-11 16:37:23 +01:00
Bine Brank	a1fea1fe2a	sgemm v2x8 SVE kernel	2021-12-05 18:47:29 +01:00
Bine Brank	abe1ce3434	strmm sve v1x8 kernel	2021-12-05 14:03:08 +01:00
Bine Brank	0de36f7b5c	trmm sve copy functions for single precision	2021-11-29 21:25:05 +01:00
Bine Brank	86ae89bf33	add sgemm kernel and copy functions for sgemm and ssymm	2021-11-28 18:12:47 +01:00
Martin Kroeker	454edd741c	Merge pull request #3425 from binebrank/arm_sve_dgemm Add dgemm kernel for arm64 SVE	2021-11-26 16:14:55 +01:00
Jia-Chen	5c1cd5e0c2	MOD: add comments to a53 zgemm kernel	2021-11-25 22:48:48 +08:00
Jia-Chen	9f59b19fcd	MOD: optimize zgemm on cortex-A53/cortex-A55	2021-11-24 21:51:45 +08:00
Bine Brank	531a28b6a0	removed unused code (compiler warnings)	2021-11-22 10:12:34 +01:00
Bine Brank	9b9cb90bb1	modify Makefile for SVE copy	2021-11-22 09:54:20 +01:00
Bine Brank	b58d4f31ab	some clean-up & commentary	2021-11-21 14:56:27 +01:00
Bine Brank	e6ed4be02e	symm SVE copy routines	2021-11-20 16:35:29 +01:00
Jia-Chen	302f22693a	MOD: optimize normal DGEMM on ARMV8 cortex-A53 & cortex-A55	2021-11-18 21:14:43 +08:00
Bine Brank	3c7eed0e53	add remaining trmm copy routines for SVE	2021-11-14 16:00:10 +01:00
Bine Brank	7d996b1c36	dtrmm_utcopy sve function	2021-11-13 18:48:53 +01:00
Bine Brank	ab7917910d	add v2x8 kernel + fix sve dtrmm	2021-11-07 20:37:51 +01:00
Bine Brank	7093372e32	add ARMV8SVE target	2021-11-01 22:53:21 +01:00
Bine Brank	a8fbdbac34	fix sve dgemm kernel + sve dtrmm	2021-10-31 10:24:25 +01:00
Bine Brank	746b4f0f17	added SVE ncopy and tcopy	2021-10-30 12:11:44 +02:00
Bine Brank	1a10d3e09d	add sve dgemm prototype	2021-10-27 16:37:18 +02:00
Martin Kroeker	22bf5c27ba	Add basic support for the Fujitsu A64FX (#3415) * Add initial support for Fujitsu A64FX as generic ARMV8	2021-10-18 15:00:19 +02:00
Martin Kroeker	8c20ca345a	Use Neoverse's current mix of ThunderX2 kernels for Vortex as well	2021-10-06 11:06:43 +02:00
Martin Kroeker	90cc944625	Move alphaI to x22 to leave x18 unused (reserved on OSX)	2021-09-17 09:53:18 +02:00
Martin Kroeker	590fbff06e	move alpha to x19/x20 to leave x18 unused for OSX	2021-09-17 09:42:17 +02:00
Martin Kroeker	380940271b	Move temp to x21 to leave x18 unused (reserved on OSX)	2021-09-17 09:28:19 +02:00
Martin Kroeker	7d75177446	Move temp to x21 to leave x18 unused (reserved on OSX)	2021-09-17 09:24:11 +02:00
Martin Kroeker	0a4ac4b585	Use x21 for I to leave x18 unused (reserved on OSX)	2021-09-17 09:19:51 +02:00
Martin Kroeker	7d4a221579	Remove unused TEMP2 and reshuffle to leave x18 unused (reserved on OSX)	2021-09-17 09:18:25 +02:00
User User-User	39ef0880ae	copy conf	2021-06-19 21:49:58 +02:00
Gilles Gouaillardet	9d292d37b2	arm64: add the missing d9 register to the clobber list Refs. numpy/numpy#18422 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2021-06-14 17:01:28 +09:00
CodesWithWolves	d2bda3b56a	Remove Unnecessary/Erroneous Reads In sgemm_tcopy_16.S COPY1x8 Macro There appears to have been some code leak when copying from the COPY2x8 macro above where we're reading 8 bytes into d4-d7 directly after reading 4 bytes into s4-s7. These 32 bytes in d4-7 are unused and can possibly overrun the boundary of allocated memory -- Valgrind detected this which is what dragged my attention to it for a 128,1 copy. Additionally, there is no need to update the addresses stored in A0-A7 as the only possible paths after running this macro will overwrite A0-7 if looping to the next 8 rows, or overwrite A0-3 if moving to 4 rows -- in which case A4-7 are unused.	2021-03-31 15:44:25 -04:00
Martin Kroeker	b716c0ef01	Add workaround for NVIDIA HPC	2021-01-12 16:51:35 +01:00
Martin Kroeker	2efa3b70dc	Add workaround for NVIDIA HPC	2021-01-12 16:49:39 +01:00
Martin Kroeker	49959d4f1c	Add workaround for NVIDIA HPC	2021-01-12 16:47:15 +01:00
Martin Kroeker	0f27a03607	Add workaround for NVIDIA HPC mishandling of the asm DOT kernels	2021-01-12 16:39:35 +01:00
Martin Kroeker	c2a8ebfe69	Add workaround for NVIDIA HPC mishandling of the asm DOT kernels	2021-01-12 16:38:51 +01:00
Ashwin Sekhar T K	1b2508362b	arm64: Fix nrm2 for input vectors with Inf Fix double precision nrm2 kernels returning NaN when the input vectors contain Inf/-Inf.	2021-01-01 02:49:37 -08:00
Martin Kroeker	8631e2976a	Temporarily revert to the old nrm2 kernels	2020-12-21 07:45:13 +01:00
Martin Kroeker	2768bc1764	Temporarily revert to the old nrm2 kernels	2020-12-21 07:42:51 +01:00
Martin Kroeker	6f4698ee1f	Temporarily revert to the old nrm2 kernel	2020-12-21 07:41:18 +01:00
Martin Kroeker	e1b7123bbe	Merge pull request #2867 from Qiyu8/usimd-floatdot Optimize the performance of dot by using universal intrinsics in X86/ARM	2020-10-10 12:10:25 +02:00
User User-User	d2333e7842	aarch64 fix std=c18 compilation	2020-10-03 18:00:34 +03:00
Qiyu8	60e6c68e38	Adapt ARM architect	2020-09-29 16:36:14 +08:00
Martin Kroeker	775a87242d	Rename KERNEL.SILICON to KERNEL.VORTEX	2020-09-03 08:44:20 +02:00
Martin Kroeker	80794fe8fd	Create KERNEL.SILICON	2020-09-02 22:56:58 +02:00
Ashwin Sekhar T K	4e1be0e481	ARM64: Add THUNDERX3T110 Target	2020-07-26 23:32:24 -07:00
ZhangDanfeng	bc6fd20a40	fix INIT8x4 Signed-off-by: ZhangDanfeng <467688405@qq.com>	2020-06-10 01:01:16 +08:00
ZhangDanfeng	9b7877ccf1	sgemm copy source init Signed-off-by: ZhangDanfeng <467688405@qq.com>	2020-06-04 02:10:45 +08:00
ZhangDanfeng	f82fa802d1	Insert prefetch Signed-off-by: ZhangDanfeng <467688405@qq.com>	2020-06-04 02:08:48 +08:00

166 Commits