OpenBLAS

Commit Graph

Author	SHA1	Message	Date
Martin Kroeker	25b0c48082	Update zscal.c	2024-01-08 09:49:18 +01:00
Martin Kroeker	5e7f714e93	Update zscal.c	2024-01-08 08:17:40 +01:00
Martin Kroeker	cf8b03ae8b	Use NAN rather than SNAN for portability	2024-01-07 23:09:57 +01:00
Martin Kroeker	f0808d856b	Handle NAN in input	2024-01-07 20:27:29 +01:00
Martin Kroeker	acf17a825d	Handle NAN in input	2024-01-07 20:26:16 +01:00
Martin Kroeker	c9df62e883	Fix handling of NAN	2024-01-07 17:49:40 +01:00
Martin Kroeker	def4996170	Fix handling of NAN and INF arguments	2024-01-07 15:29:42 +01:00
Martin Kroeker	519b40fad9	Merge pull request #4398 from yinshiyou/la-dev Add Optimizations for LoongArch.	2023-12-30 19:51:08 +01:00
pengxu	a5d0d21378	loongarch64: Add zgemm and cgemm optimization	2023-12-29 18:06:26 +08:00
gxw	546f13558c	loongarch64: Add {c/z}swap and {c/z}sum optimization	2023-12-29 17:30:57 +08:00
Hao Chen	edabb93668	loongarch64: Refine axpby optimization functions.	2023-12-29 17:30:57 +08:00
Hao Chen	1ec5dded43	loongarch64: Add c/zrot optimization functions. Signed-off-by: Hao Chen <chenhao@loongson.cn>	2023-12-29 17:30:57 +08:00
Hao Chen	3c53ded315	loongarch64: Add c/znrm2 optimization functions.	2023-12-29 17:30:57 +08:00
Hao Chen	fbd612f8c4	loongarch64: Add ic/zamin optimization functions.	2023-12-29 17:30:57 +08:00
Hao Chen	d97272cb35	loongarch64: Add c/zdot optimization functions.	2023-12-29 17:30:57 +08:00
Hao Chen	65a0aeb128	loongarch64: Add c/zcopy optimization functions. Signed-off-by: Hao Chen <chenhao@loongson.cn>	2023-12-29 17:30:57 +08:00
Hao Chen	2a34fb4b80	loongarch64: Add and refine scal optimization functions. Signed-off-by: Hao Chen <chenhao@loongson.cn>	2023-12-29 17:30:57 +08:00
Hao Chen	8785e948b5	loongarch64: Add camin optimization function.	2023-12-29 17:30:57 +08:00
Hao Chen	0753848e03	loongarch64: Refine and add axpy optimization functions. Signed-off-by: Hao Chen <chenhao@loongson.cn>	2023-12-29 17:30:57 +08:00
Hao Chen	06fd5b5995	loongarch64: Add and Refine asum optimization functions.	2023-12-29 17:30:57 +08:00
guxiwei	e771be185e	Optimize copy functions with lsx. Signed-off-by: Hao Chen <chenhao@loongson.cn>	2023-12-29 17:30:57 +08:00
Hao Chen	179ed51d3b	Add dgemm_kernel_8x4.S file.	2023-12-29 17:30:57 +08:00
Hao Chen	173a65d4e6	loongarch64: Add and refine iamax optimization functions.	2023-12-29 17:30:57 +08:00
zhoupeng	ea70e165c7	loongarch64: Refine rot optimization.	2023-12-29 17:30:57 +08:00
zhoupeng	116aee7527	loongarch64: Refine imin optimization.	2023-12-29 17:30:57 +08:00
zhoupeng	8be2654193	loongarch64: Refine imax optimization.	2023-12-29 17:30:57 +08:00
zhoupeng	154baad454	loongarch64: Refine iamin optimization.	2023-12-29 17:30:57 +08:00
Shiyou Yin	36c12c4971	loongarch64: Refine copy,swap,nrm2,sum optimization.	2023-12-29 17:30:57 +08:00
Shiyou Yin	c6996a80e9	loongarch64: Refine amax,amin,max,min optimization.	2023-12-29 17:30:57 +08:00
Chris Sidebottom	ecae1389df	Reduce duplication in kernel definitions These files are exactly the same, so I believe we can reduce these files down. Other files require a slightly more complex unpicking.	2023-12-23 12:39:53 +00:00
Chris Sidebottom	60e66725e4	Use numeric labels to allow repeated inlining	2023-12-19 13:11:06 +00:00
Chris Sidebottom	7a4fef4f60	Tweak SVE dot kernel This changes the SVE dot kernel to only predicate when necessary as well as streamlining the assembly a bit. The benchmarks seem to indicate this can improve performance by ~33%.	2023-12-19 12:08:54 +00:00
Martin Kroeker	f06b535566	Use C kernel for dgemv_t due to limitations of the old assembly one	2023-12-15 09:58:44 +01:00
barracuda156	d9653af018	KERNEL.PPC970, KERNEL.PPCG4: unbreak CMake parsing Fixes: https://github.com/OpenMathLib/OpenBLAS/issues/4366	2023-12-14 12:00:11 +08:00
Chip-Kerchner	93747fb377	Merge remote-tracking branch 'origin/develop' into power10Copies	2023-12-12 09:32:49 -06:00
Chip-Kerchner	4e738e561a	Replace two vector loads with one vector pair load and fix endianess of stores.	2023-12-08 12:36:08 -06:00
yancheng	d32f38fb37	loongarch64: Add optimizations for nrm2.	2023-12-07 14:36:26 +08:00
yancheng	f9b468990e	loongarch64: Add optimizations for rot.	2023-12-07 14:36:26 +08:00
yancheng	c80e7e27d1	loongarch64: Add optimizations for sum and asum.	2023-12-07 14:36:26 +08:00
yancheng	d4c96a35a8	loongarch64: Add optimizations for axpy and axpby.	2023-12-07 14:36:26 +08:00
yancheng	360acc0a41	loongarch64: Add optimizations for swap.	2023-12-07 14:36:26 +08:00
yancheng	174c25766b	loongarch64: Add optimizations for copy.	2023-12-07 14:36:26 +08:00
yancheng	49829b2b7d	loongarch64: Add optimizations for iamin.	2023-12-07 14:36:07 +08:00
yancheng	be83f5e4e0	loongarch64: Add optimizations for iamax.	2023-12-07 14:36:07 +08:00
yancheng	e3fb2b5afa	loongarch64: Add optimizations for imin.	2023-12-07 14:36:07 +08:00
yancheng	e46b48e372	loongarch64: Add optimizations for imax.	2023-12-07 14:36:07 +08:00
yancheng	702fc1d56d	loongarch64: Add optimization for min.	2023-12-07 14:36:07 +08:00
yancheng	346b384d1c	loongarch64: Add optimization for max.	2023-12-07 14:36:07 +08:00
yancheng	ff2ecc6cda	loongarch64: Add optimization for amin.	2023-12-07 14:36:07 +08:00
yancheng	265b5f2e80	loongarch64: Add optimizations for amax.	2023-12-07 14:36:07 +08:00
yancheng	993ede7c70	loongarch64: Add optimizations for scal.	2023-12-07 14:36:07 +08:00
Octavian Maghiar	4a12cf53ec	[RISC-V] Improve RVV kernel generator LMUL usage The RVV kernel generation script uses the provided LMUL to increase the number of accumulator registers. Since the effect of the LMUL is to group together the vector registers into larger ones, it actually should be used as a multiplier in the calculation of vlenmax. At the moment, no matter what LMUL is provided, the generated kernels would only set the maximum number of vector elements equal to VLEN/SEW. Commit changes the use of LMUL to properly adjust vlenmax. Note that an increase in LMUL results in a decrease in the number of effective vector registers.	2023-12-04 11:13:35 +00:00
Octavian Maghiar	e4586e81b8	[RISC-V] Add RISC-V Vector 128-bit target Current RVV x280 target depends on vlen=512-bits for Level 3 operations. Commit adds generic target that supports vlen=128-bits. New target uses the same scalable kernels as x280 for Level 1&2 operations, and autogenerated kernels for Level 3 operations. Functional correctness of Level 3 operations tested on vlen=128-bits using QEMU v8.1.1 for ctests and BLAS-Tester.	2023-12-04 11:02:18 +00:00
Martin Kroeker	39bf8ece20	Merge pull request #4340 from yinshiyou/la-dev Add some refines and optimizations for LoongArch.	2023-11-29 08:22:25 +01:00
Shiyou Yin	9fe07d82fd	loongarch: Add LSX optimization for dot.	2023-11-28 20:24:18 +08:00
Shiyou Yin	13b8c44b44	loongarch: Add optimization for dsdot kernel.	2023-11-28 20:24:16 +08:00
Shiyou Yin	3def6a8143	loongarch: Add LASX optimization for dot.	2023-11-28 20:24:14 +08:00
Bart Oldeman	c34e2cf380	Use _mm_set1_epi{32,64x} to init mask in x86-64 [cz]asum for skylake kernels. This is the same method as used in [sd]asum. _mm_set1_epi64x was commented out for zasum, but has the advantage of avoiding possible undefined behaviour (using an uninitialized variable), optimized out by NVHPC and icx. The new code works fine with those compilers. For GCC 12.3 the generated code is identical; no matter what method you use, the compiler optimizes the code into a compile-time constant, there is no performance benefit using mm_cmpeq_epi8 since the corresponding instruction (VPCMPEQB) isn't actually generated!	2023-11-19 21:28:35 +00:00
Martin Kroeker	22aa401656	Temporarily disable the AVX512 CASUM/ZASUM microkernels for any version of NVIDIA HPC (#4327 ) * Temporarily disable the C/ZASUM microkernels for any version of NVHPC	2023-11-19 00:04:31 +01:00
Bart Oldeman	f8ad5344c2	Fix casum fallback kernel. This kernel is only used on Skylake+ if the kernel with AVX512 intrinsics can't be used, but used the variable x1 incorrectly in the tail end of the loop, as it is still at the initial value instead of where x points to. This caused 55 "other error"s in the LAPACK tests (https://github.com/OpenMathLib/OpenBLAS/issues/4282) This change makes casum.c as similar as possible as zasum.c, because zasum.c does this correctly.	2023-11-17 23:53:56 +00:00
Martin Kroeker	04bc801999	(Re)apply fixes for supporting only a subset of precision types from PR 3915	2023-11-04 23:48:59 +01:00
Martin Kroeker	9019bc4945	Use SkylakeX ?ASUM microkernel for Cooperlake/Sapphirerapids as well	2023-11-04 22:10:06 +01:00
Martin Kroeker	3bfa4d4dcc	Fix outdated SVE kernel definitions for Cortex cpus by aliasing to ARMV8SVE	2023-11-03 14:55:31 +01:00
Rajalakshmi Srinivasaraghavan	980f702f72	POWER: AIX: Make use of power10 optimization POWER10 optimizations are disabled when using default AIX assembler. As we have fixed many issues recently, enabling optimization path for default assembler.	2023-10-19 18:48:19 -05:00
Rajalakshmi Srinivasaraghavan	9f42570e33	POWER: Increase macro size limit for AIX This patch increases the macro size limit from 4096 to 16384 to allow compiling larger assembly files in AIX. Tested with GCC and IBM Open XL C.	2023-10-12 12:37:40 -05:00
Martin Kroeker	9f49aef91b	Merge pull request #4255 from RajalakshmiSR/AIX-P10 POWER10: Fix compilation issues with Open XL C	2023-10-12 18:59:17 +02:00
Martin Kroeker	e7d05402e0	Fix up S/D GEMM copy function definitions after #4009	2023-10-12 14:24:53 +02:00
Rajalakshmi Srinivasaraghavan	71d733e5f7	POWER: Avoid m4 conversions for C files This patch removes intermediate m4 conversions used in sbgemm compilation as it is not needed for .c files. Tested on AIX with gcc and IBM Open XL C.	2023-10-11 17:18:42 -05:00
Rajalakshmi Srinivasaraghavan	82fc29a57a	POWER10: Fallback to POWER8 functions As cgemm and zgemm kernels are not optimized for big endian falling back to POWER8 versions. Tested on AIX using gcc and Open XL C.	2023-10-11 17:04:42 -05:00
Rajalakshmi Srinivasaraghavan	db0805906b	powerpc: Fix build errors with Open XL C This patch fixes errors when using Open XL C compiler on AIX. Tested with gcc/xlf and ibm-clang/xlf compiler combinations.	2023-10-04 14:04:03 -05:00
Martin Kroeker	675cd551da	fix improper function prototypes (empty parentheses)	2023-09-30 12:56:38 +02:00
gxw	d15e0a055c	LoongArch64: Fixed compilation issues when enable DYNAMIC_ARCH	2023-09-27 10:05:27 +08:00
gxw	4670eb1462	LoongArch64: Add dtrsm kernel	2023-09-26 15:45:14 +08:00
gxw	f2cf929374	LoongArch64: Add sgemv kernel	2023-09-04 14:28:37 +08:00
Martin Kroeker	8e6d93359d	Merge pull request #4196 from TiborGY/obsolete_inlines Modernize obsolete inline order	2023-09-03 14:12:42 +02:00
gxw	394a1fd1bf	LoongArch64: Compatible with early internal toolchain __loongarch_grlen and __loongarch_frlen were introduced in gcc version 8.3.0 (Loongnix 8.3.0-6.lnd.vec.31) internally within Loongson to standardize the general and floating-point register widths. However, previous versions did not have them, requiring additional checks to be added.	2023-08-31 16:55:29 +08:00
Martin Kroeker	9c4ae4d4fb	Merge pull request #4206 from martin-frbg/issue4201-2 Work around miscompilation of zdot_thunderx2t99 by the current NVIDIA HPC compiler	2023-08-26 10:17:27 +02:00
Martin Kroeker	88435104c8	Merge pull request #4204 from martin-frbg/llvm17-2 Work around LLVM17 miscompiling the AVX512 microkernels for CASUM/ZASUM	2023-08-26 00:32:18 +02:00
Martin Kroeker	fc8894dd98	Workaround miscompilation by NVIDIA nvc	2023-08-26 00:30:17 +02:00
Martin Kroeker	7a6203ffa1	restore default Neoverse SVE build instructions for non-NVIDIA compilers	2023-08-25 18:25:51 +02:00
Martin Kroeker	2c3034ff7f	Disable the C/ZASUM AVX512 microkernels when compiling with LLVM17 as well	2023-08-25 17:22:51 +02:00
Martin Kroeker	8794544b43	Add support for compiling the Neoverse SVE kernels with the NVIDIA HPC compiler	2023-08-25 16:47:32 +02:00
gxw	553cc1372f	LoongArch64: Add sgemm_kernel	2023-08-23 16:08:43 +08:00
Martin Kroeker	12ede72ab7	Merge pull request #4192 from imciner2/im/clangfix Fix cooperlake and sapphire rapids march flags on clang	2023-08-21 15:46:35 +02:00
Ian McInerney	79c15db348	Fix power10 gcc intrinsic check __builtin_vsx_assemble_pair was only in GCC 10-11.2 and was replaced by __builtin_vsx_build_pair thereafter.	2023-08-17 15:05:29 +01:00
TGY	b5ba95a6c0	Modernize obsolete inline order	2023-08-16 00:48:40 +02:00
Ian McInerney	8a8a8479be	Fix cooperlake and sapphire rapids march flags on clang The march=cooperlake and march=sapphirerapids flags were never getting added when building with Clang targetting those architectures. Instead it was falling back to the skylake AVX512 implementation. Clang added support for these two architectures in Clang 9 and Clang 12, so introduce new checks for those versions to enable the appropriate march flag, and fallback to skylake otherwise.	2023-08-14 16:12:35 +01:00
Martin Kroeker	34da1a067d	Allow negative INCX (API change from version 3.10 of the reference implementation)	2023-08-10 17:01:50 +02:00
Martin Kroeker	07e32c4cb8	Allow negative INCX (API change from version 3.10 of the reference implementation)	2023-08-10 17:00:18 +02:00
Martin Kroeker	c211da0688	Allow negative INCX (API change from version 3.10 of the reference implementation)	2023-08-10 16:58:57 +02:00
Martin Kroeker	a34a0a7abc	Allow negative INCX (API change from version 3.10 of the reference implementation)	2023-08-10 16:56:52 +02:00
Martin Kroeker	54d3246fc6	Allow negative INCX (API change from version 3.10 of the reference implementation)	2023-08-10 16:55:17 +02:00
Martin Kroeker	7dd441d5db	Allow negative INCX (API change from version 3.10 of the reference implementation)	2023-08-10 16:53:33 +02:00
Martin Kroeker	f692178792	Allow negative INCX (API change from version 3.10 of the reference implementation)	2023-08-10 16:52:09 +02:00
Martin Kroeker	d15ffb7fdf	Allow negative INCX (API change from version 3.10 of the reference implementation)	2023-08-10 16:50:44 +02:00
Martin Kroeker	a2d867f4d1	Allow negative iNCX (API change from version 3.10 of the reference implementation)	2023-08-10 16:49:05 +02:00
gxw	4d0f000db6	MIPS: Enable MSA	2023-08-07 21:00:10 +08:00
Martin Kroeker	afdc56a421	Merge pull request #4158 from XiWeiGu/loongarch64_update_dgemm_kernel LoongArch64: Update dgemm kernel	2023-08-07 12:44:09 +02:00
gxw	e8b571d245	LoongArch64: Add dgemv_t_8_lasx.S and dgemv_n_8_lasx.S V2	2023-08-07 11:20:42 +08:00
gxw	71fcee6eef	LoongArch64: Update dgemm kernel	2023-08-07 11:06:52 +08:00
Martin Kroeker	0f521ece25	Merge pull request #4183 from martin-frbg/issue4181 Apply USE_TRMM to MIPS64_GENERIC as to GENERIC in gmake builds	2023-08-06 18:59:50 +02:00
Martin Kroeker	41c31bc1d4	Revert "LoongArch64: Add dgemv_t_8_lasx.S and dgemv_n_8_lasx.S"	2023-08-06 16:00:03 +02:00
Martin Kroeker	61d803547a	Apply USE_TRMM to MIPS64_GENERIC as to GENERIC	2023-08-06 15:17:38 +02:00
Martin Kroeker	f8ee309402	Merge pull request #4153 from XiWeiGu/dgemv LoongArch64: Add dgemv_t_8_lasx.S and dgemv_n_8_lasx.S	2023-08-06 08:49:16 +02:00
gxw	ec1e96aac8	LoongArch64: Add dgemv_t_8_lasx.S and dgemv_n_8_lasx.S	2023-08-05 10:24:17 +08:00
gxw	d46772e037	LoongArch64: Add compiler feature checks	2023-08-05 10:21:43 +08:00
Martin Kroeker	4664b57e6e	use shortcut only when both incx and incy are zero	2023-08-04 12:25:34 +02:00
Martin Kroeker	09131f79a6	Merge pull request #4164 from martin-frbg/issue4162 Enable use of AVX512 microkernels with NVIDIA HPC from version 22.3	2023-07-29 15:07:20 +02:00
Martin Kroeker	6a428b5629	Update casum_microk_skylakex-2.c	2023-07-29 12:24:30 +02:00
Martin Kroeker	ebb447e32e	Update zasum_microk_skylakex-2.c	2023-07-29 12:23:57 +02:00
Martin Kroeker	9f6847583a	nvc currently miscompiles this, hopefully fixed in release 23.09	2023-07-29 11:50:16 +02:00
Martin Kroeker	fe54ee3d15	nvc currently miscompiles this, hopefully fixed in release 23.09	2023-07-29 11:48:38 +02:00
Martin Kroeker	5720fa02c5	Merge pull request #4168 from Mousius/sve-zgemm-cgemm Use SVE zgemm/cgemm on Arm(R) Neoverse(TM) V1 core	2023-07-27 17:41:45 +02:00
Chris Sidebottom	84a268b6ca	Use SVE zgemm/cgemm on Arm(R) Neoverse(TM) V1 core This patch removes the prefetches from cgemm/zgemm which improves the performance similar to sgemm/dgemm did in #3868, this means I'm happy to enable this on any applicable cores. I also replicated the unrolling the copies from sgemm and dgemm.	2023-07-27 14:12:20 +01:00
Chris Sidebottom	730ca04b48	Fix ZHEMM copy for SVE Whilst disambiguating whilelt, I inadvertantly used the wrong datatype for offsets, which can be negative. This rectifies that.	2023-07-27 13:27:28 +01:00
Martin Kroeker	2a62d2df96	Enable use of AVX512 microkernels with NVIDIA HPC from version 22.3	2023-07-26 19:39:11 +02:00
Martin Kroeker	849c8806b8	Merge pull request #4161 from Mousius/non-sve-kernels Use latest non-SVE kernels in ARMV8SVE	2023-07-26 15:49:40 +02:00
Chris Sidebottom	24586bc4ff	Disambiguate whilelt	2023-07-25 20:15:44 +01:00
Chris Sidebottom	aea2a4622b	Use latest non-SVE kernels in ARMV8SVE These are generally better and, in some cases, include threading which helps in the cores we're targeting here.	2023-07-25 14:12:26 +01:00
Octavian Maghiar	826a9d5fa4	Adds tail undisturbed for RVV Level 2 operations During the last iteration of some RVV operations, accumulators can get overwritten when VL < VLMAX and tail policy is agnostic. Commit changes intrinsics tail policy to undistrubed.	2023-07-25 11:36:23 +01:00
martin-frbg	7976deff80	Fix file permissions (issue 4095)	2023-07-23 20:37:07 +02:00
Octavian Maghiar	8df0289db6	Adds tail undisturbed for RVV Level 1 operations During the last iteration of some RVV operations, accumulators can get overwritten when VL < VLMAX and tail policy is agnostic. Commit changes intrinsics tail policy to undistrubed.	2023-07-20 15:28:35 +01:00
Martin Kroeker	76ef1672f8	Override DSDOT with generic code to get rid of qemu precision error	2023-07-19 22:31:07 +02:00
Martin Kroeker	49077e7bde	Merge pull request #4145 from martin-frbg/issue4144 Restore zero-initialization of variables in generic ztrsm_utcopy	2023-07-14 12:44:05 +02:00
Martin Kroeker	3d31191b0f	Work around Clang failing to disambiguate SVE intrinsics and add AppleClang crossbuild to MacOS/arm64 DYNAMIC_ARCH in AzureCI (#4140 ) * Add AppleClang crossbuild to MacOS/arm64 DYNAMIC_ARCH * add casts to disambiguate svwhilelt for clang	2023-07-14 11:06:48 +02:00
Martin Kroeker	cfa0a80664	Restore initialization of data variables	2023-07-13 23:23:12 +02:00
Martin Kroeker	9567305e4c	Restore initialization of data01,data02	2023-07-13 23:21:18 +02:00
Octavian Maghiar	1e4a3a2b5e	Fixes RVV masked intrinsics for izamax/izamin kernels	2023-07-12 12:55:50 +01:00
Octavian Maghiar	e1958eb705	Fixes RVV masked intrinsics for iamax/iamin/imax/imin kernels Changes masked intrinsics from _m to _mu and reintroduces maskedoff argument.	2023-07-05 11:34:00 +01:00
Xianyi Zhang	e14a025bb1	Temporily walk around zaxpy vector kernel bug.	2023-06-28 11:17:38 +00:00
Martin Kroeker	772b0cc715	Fix early bailout	2023-06-27 16:12:27 +02:00
Martin Kroeker	d6be5036d7	Fix IDAMAX	2023-06-26 21:19:33 +02:00
Martin Kroeker	1fe96f8da7	Fix failures to handle increments of zero	2023-06-25 22:36:57 +02:00
Martin Kroeker	73b30b1dec	Fix VLEV_FLOAT/VSEV_FLOAT macros to compile with t-head 2.6.1	2023-06-18 17:46:29 +02:00
Martin Kroeker	c3a2d407a0	Merge pull request #4048 from imzhuhl/spr_sbgemm_fix Sapphire Rapids sbgemm fix	2023-06-17 20:47:09 +02:00
Manjul Mohan	58b88aa5f0	POWER10: Fix compiler warnings This patch removes the warning messages related to unused variables in sbgemm_kernel_power10.c. Signed-off-by: Manjul Mohan <manjul@linux.vnet.ibm.com>	2023-06-12 01:08:59 -04:00
ZhengSh	2a8bc38cdc	Merge branch 'xianyi:risc-v' into risc-v	2023-06-09 20:01:03 +08:00
Heller Zheng	0954746380	remove argument unused during compilation. fix wrong vr = VFMVVF_FLOAT(0, vl);	2023-06-04 20:06:58 -07:00
sh-zheng	d3bf5a5401	Combine two reduction operations of zhe/symv into one, with tail undisturbed setted.	2023-05-22 22:39:45 +08:00
Honglin Zhu	9e80a194d6	Fix dynamic_list build and gcc version check error	2023-05-21 19:52:58 +08:00
Honglin Zhu	a76afdc047	Compatible with older version of GNU make	2023-05-20 13:58:23 +08:00
sh-zheng	18d7afe69d	Add rvv support for zsymv and active rvv support for zhemv	2023-05-20 01:19:44 +08:00
Honglin Zhu	90f041e348	Invoke the syscall to allow the use of amx tiles	2023-05-19 10:48:18 +08:00
Honglin Zhu	0b83088887	spr dynamic arch support	2023-05-19 10:48:18 +08:00
Honglin Zhu	f249ccb741	Fix spr sbgemm error	2023-05-19 10:48:18 +08:00
Martin Kroeker	e9a8d5b45f	Merge pull request #4015 from martin-frbg/issue4013-2 [WIP] Disable gcc's tree-vectorizer for x86_64 CGEMV	2023-04-23 18:51:12 +02:00
Martin Kroeker	72caceb324	Merge pull request #4009 from Mousius/sve-gemm Use SVE kernel for SGEMM/DGEMM on Arm(R) Neoverse(TM) V1	2023-04-22 13:56:45 +02:00
Martin Kroeker	84bcf6639f	Disable gcc's tree-vectorizer pass on all operating systems	2023-04-20 23:24:52 +02:00
Martin Kroeker	c9174ae8d7	Disable gcc's tree-vectorizer pass on all operating systems	2023-04-19 23:45:44 +02:00
Martin Kroeker	c2fe9cb91f	Disable gcc's tree-vectorizer pass on all operating systems	2023-04-19 23:45:14 +02:00

2213 Commits