Commit Graph

2213 Commits

Author SHA1 Message Date
Martin Kroeker 25b0c48082
Update zscal.c 2024-01-08 09:49:18 +01:00
Martin Kroeker 5e7f714e93
Update zscal.c 2024-01-08 08:17:40 +01:00
Martin Kroeker cf8b03ae8b
Use NAN rather than SNAN for portability 2024-01-07 23:09:57 +01:00
Martin Kroeker f0808d856b
Handle NAN in input 2024-01-07 20:27:29 +01:00
Martin Kroeker acf17a825d
Handle NAN in input 2024-01-07 20:26:16 +01:00
Martin Kroeker c9df62e883
Fix handling of NAN 2024-01-07 17:49:40 +01:00
Martin Kroeker def4996170
Fix handling of NAN and INF arguments 2024-01-07 15:29:42 +01:00
Martin Kroeker 519b40fad9
Merge pull request #4398 from yinshiyou/la-dev
Add Optimizations for LoongArch.
2023-12-30 19:51:08 +01:00
pengxu a5d0d21378 loongarch64: Add zgemm and cgemm optimization 2023-12-29 18:06:26 +08:00
gxw 546f13558c loongarch64: Add {c/z}swap and {c/z}sum optimization 2023-12-29 17:30:57 +08:00
Hao Chen edabb93668 loongarch64: Refine axpby optimization functions. 2023-12-29 17:30:57 +08:00
Hao Chen 1ec5dded43 loongarch64: Add c/zrot optimization functions.
Signed-off-by: Hao Chen <chenhao@loongson.cn>
2023-12-29 17:30:57 +08:00
Hao Chen 3c53ded315 loongarch64: Add c/znrm2 optimization functions. 2023-12-29 17:30:57 +08:00
Hao Chen fbd612f8c4 loongarch64: Add ic/zamin optimization functions. 2023-12-29 17:30:57 +08:00
Hao Chen d97272cb35 loongarch64: Add c/zdot optimization functions. 2023-12-29 17:30:57 +08:00
Hao Chen 65a0aeb128 loongarch64: Add c/zcopy optimization functions.
Signed-off-by: Hao Chen <chenhao@loongson.cn>
2023-12-29 17:30:57 +08:00
Hao Chen 2a34fb4b80 loongarch64: Add and refine scal optimization functions.
Signed-off-by: Hao Chen <chenhao@loongson.cn>
2023-12-29 17:30:57 +08:00
Hao Chen 8785e948b5 loongarch64: Add camin optimization function. 2023-12-29 17:30:57 +08:00
Hao Chen 0753848e03 loongarch64: Refine and add axpy optimization functions.
Signed-off-by: Hao Chen <chenhao@loongson.cn>
2023-12-29 17:30:57 +08:00
Hao Chen 06fd5b5995 loongarch64: Add and Refine asum optimization functions. 2023-12-29 17:30:57 +08:00
guxiwei e771be185e Optimize copy functions with lsx.
Signed-off-by: Hao Chen <chenhao@loongson.cn>
2023-12-29 17:30:57 +08:00
Hao Chen 179ed51d3b Add dgemm_kernel_8x4.S file. 2023-12-29 17:30:57 +08:00
Hao Chen 173a65d4e6 loongarch64: Add and refine iamax optimization functions. 2023-12-29 17:30:57 +08:00
zhoupeng ea70e165c7 loongarch64: Refine rot optimization. 2023-12-29 17:30:57 +08:00
zhoupeng 116aee7527 loongarch64: Refine imin optimization. 2023-12-29 17:30:57 +08:00
zhoupeng 8be2654193 loongarch64: Refine imax optimization. 2023-12-29 17:30:57 +08:00
zhoupeng 154baad454 loongarch64: Refine iamin optimization. 2023-12-29 17:30:57 +08:00
Shiyou Yin 36c12c4971 loongarch64: Refine copy,swap,nrm2,sum optimization. 2023-12-29 17:30:57 +08:00
Shiyou Yin c6996a80e9 loongarch64: Refine amax,amin,max,min optimization. 2023-12-29 17:30:57 +08:00
Chris Sidebottom ecae1389df Reduce duplication in kernel definitions
These files are exactly the same, so I believe we can reduce these files
down. Other files require a slightly more complex unpicking.
2023-12-23 12:39:53 +00:00
Chris Sidebottom 60e66725e4 Use numeric labels to allow repeated inlining 2023-12-19 13:11:06 +00:00
Chris Sidebottom 7a4fef4f60 Tweak SVE dot kernel
This changes the SVE dot kernel to only predicate when necessary as well
as streamlining the assembly a bit. The benchmarks seem to indicate this
can improve performance by ~33%.
2023-12-19 12:08:54 +00:00
Martin Kroeker f06b535566
Use C kernel for dgemv_t due to limitations of the old assembly one 2023-12-15 09:58:44 +01:00
barracuda156 d9653af018 KERNEL.PPC970, KERNEL.PPCG4: unbreak CMake parsing
Fixes: https://github.com/OpenMathLib/OpenBLAS/issues/4366
2023-12-14 12:00:11 +08:00
Chip-Kerchner 93747fb377 Merge remote-tracking branch 'origin/develop' into power10Copies 2023-12-12 09:32:49 -06:00
Chip-Kerchner 4e738e561a Replace two vector loads with one vector pair load and fix endianess of stores. 2023-12-08 12:36:08 -06:00
yancheng d32f38fb37 loongarch64: Add optimizations for nrm2. 2023-12-07 14:36:26 +08:00
yancheng f9b468990e loongarch64: Add optimizations for rot. 2023-12-07 14:36:26 +08:00
yancheng c80e7e27d1 loongarch64: Add optimizations for sum and asum. 2023-12-07 14:36:26 +08:00
yancheng d4c96a35a8 loongarch64: Add optimizations for axpy and axpby. 2023-12-07 14:36:26 +08:00
yancheng 360acc0a41 loongarch64: Add optimizations for swap. 2023-12-07 14:36:26 +08:00
yancheng 174c25766b loongarch64: Add optimizations for copy. 2023-12-07 14:36:26 +08:00
yancheng 49829b2b7d loongarch64: Add optimizations for iamin. 2023-12-07 14:36:07 +08:00
yancheng be83f5e4e0 loongarch64: Add optimizations for iamax. 2023-12-07 14:36:07 +08:00
yancheng e3fb2b5afa loongarch64: Add optimizations for imin. 2023-12-07 14:36:07 +08:00
yancheng e46b48e372 loongarch64: Add optimizations for imax. 2023-12-07 14:36:07 +08:00
yancheng 702fc1d56d loongarch64: Add optimization for min. 2023-12-07 14:36:07 +08:00
yancheng 346b384d1c loongarch64: Add optimization for max. 2023-12-07 14:36:07 +08:00
yancheng ff2ecc6cda loongarch64: Add optimization for amin. 2023-12-07 14:36:07 +08:00
yancheng 265b5f2e80 loongarch64: Add optimizations for amax. 2023-12-07 14:36:07 +08:00
yancheng 993ede7c70 loongarch64: Add optimizations for scal. 2023-12-07 14:36:07 +08:00
Octavian Maghiar 4a12cf53ec [RISC-V] Improve RVV kernel generator LMUL usage
The RVV kernel generation script uses the provided LMUL to increase the number of accumulator registers.
Since the effect of the LMUL is to group together the vector registers into larger ones, it actually should be used as a multiplier in the calculation of vlenmax.
At the moment, no matter what LMUL is provided, the generated kernels would only set the maximum number of vector elements equal to VLEN/SEW.
Commit changes the use of LMUL to properly adjust vlenmax. Note that an increase in LMUL results in a decrease in the number of effective vector registers.
2023-12-04 11:13:35 +00:00
Octavian Maghiar e4586e81b8 [RISC-V] Add RISC-V Vector 128-bit target
Current RVV x280 target depends on vlen=512-bits for Level 3 operations.
Commit adds generic target that supports vlen=128-bits.
New target uses the same scalable kernels as x280 for Level 1&2 operations, and autogenerated kernels for Level 3 operations.
Functional correctness of Level 3 operations tested on vlen=128-bits using QEMU v8.1.1 for ctests and BLAS-Tester.
2023-12-04 11:02:18 +00:00
Martin Kroeker 39bf8ece20
Merge pull request #4340 from yinshiyou/la-dev
Add some refines and optimizations for LoongArch.
2023-11-29 08:22:25 +01:00
Shiyou Yin 9fe07d82fd loongarch: Add LSX optimization for dot. 2023-11-28 20:24:18 +08:00
Shiyou Yin 13b8c44b44 loongarch: Add optimization for dsdot kernel. 2023-11-28 20:24:16 +08:00
Shiyou Yin 3def6a8143 loongarch: Add LASX optimization for dot. 2023-11-28 20:24:14 +08:00
Bart Oldeman c34e2cf380 Use _mm_set1_epi{32,64x} to init mask in x86-64 [cz]asum
for skylake kernels. This is the same method as used in [sd]asum.
_mm_set1_epi64x was commented out for zasum, but has the advantage
of avoiding possible undefined behaviour (using an uninitialized
variable), optimized out by NVHPC and icx. The new code works
fine with those compilers.

For GCC 12.3 the generated code is identical; no matter what method
you use, the compiler optimizes the code into a compile-time
constant, there is no performance benefit using mm_cmpeq_epi8
since the corresponding instruction (VPCMPEQB) isn't actually
generated!
2023-11-19 21:28:35 +00:00
Martin Kroeker 22aa401656
Temporarily disable the AVX512 CASUM/ZASUM microkernels for any version of NVIDIA HPC (#4327)
* Temporarily disable the C/ZASUM microkernels for any version of NVHPC
2023-11-19 00:04:31 +01:00
Bart Oldeman f8ad5344c2 Fix casum fallback kernel.
This kernel is only used on Skylake+ if the kernel with AVX512
intrinsics can't be used, but used the variable x1 incorrectly
in the tail end of the loop, as it is still at the initial
value instead of where x points to.

This caused 55 "other error"s in the LAPACK tests
(https://github.com/OpenMathLib/OpenBLAS/issues/4282)

This change makes casum.c as similar as possible as zasum.c,
because zasum.c does this correctly.
2023-11-17 23:53:56 +00:00
Martin Kroeker 04bc801999
(Re)apply fixes for supporting only a subset of precision types from PR 3915 2023-11-04 23:48:59 +01:00
Martin Kroeker 9019bc4945
Use SkylakeX ?ASUM microkernel for Cooperlake/Sapphirerapids as well 2023-11-04 22:10:06 +01:00
Martin Kroeker 3bfa4d4dcc
Fix outdated SVE kernel definitions for Cortex cpus by aliasing to ARMV8SVE 2023-11-03 14:55:31 +01:00
Rajalakshmi Srinivasaraghavan 980f702f72 POWER: AIX: Make use of power10 optimization
POWER10 optimizations are disabled when using default AIX assembler.
As we have fixed many issues recently, enabling optimization path
for default assembler.
2023-10-19 18:48:19 -05:00
Rajalakshmi Srinivasaraghavan 9f42570e33 POWER: Increase macro size limit for AIX
This patch increases the macro size limit from 4096 to 16384 to
allow compiling larger assembly files in AIX.
Tested with GCC and IBM Open XL C.
2023-10-12 12:37:40 -05:00
Martin Kroeker 9f49aef91b
Merge pull request #4255 from RajalakshmiSR/AIX-P10
POWER10: Fix compilation issues with Open XL C
2023-10-12 18:59:17 +02:00
Martin Kroeker e7d05402e0
Fix up S/D GEMM copy function definitions after #4009 2023-10-12 14:24:53 +02:00
Rajalakshmi Srinivasaraghavan 71d733e5f7 POWER: Avoid m4 conversions for C files
This patch removes intermediate m4 conversions used in sbgemm
compilation as it is not needed for .c files.
Tested on AIX with gcc and IBM Open XL C.
2023-10-11 17:18:42 -05:00
Rajalakshmi Srinivasaraghavan 82fc29a57a POWER10: Fallback to POWER8 functions
As cgemm and zgemm kernels are not optimized for big endian falling
back to POWER8 versions.  Tested on AIX using gcc and Open XL C.
2023-10-11 17:04:42 -05:00
Rajalakshmi Srinivasaraghavan db0805906b powerpc: Fix build errors with Open XL C
This patch fixes errors when using Open XL C compiler on AIX.
Tested with gcc/xlf and ibm-clang/xlf compiler combinations.
2023-10-04 14:04:03 -05:00
Martin Kroeker 675cd551da
fix improper function prototypes (empty parentheses) 2023-09-30 12:56:38 +02:00
gxw d15e0a055c LoongArch64: Fixed compilation issues when enable DYNAMIC_ARCH 2023-09-27 10:05:27 +08:00
gxw 4670eb1462 LoongArch64: Add dtrsm kernel 2023-09-26 15:45:14 +08:00
gxw f2cf929374 LoongArch64: Add sgemv kernel 2023-09-04 14:28:37 +08:00
Martin Kroeker 8e6d93359d
Merge pull request #4196 from TiborGY/obsolete_inlines
Modernize obsolete inline order
2023-09-03 14:12:42 +02:00
gxw 394a1fd1bf LoongArch64: Compatible with early internal toolchain
__loongarch_grlen and __loongarch_frlen were introduced in gcc version 8.3.0
(Loongnix 8.3.0-6.lnd.vec.31) internally within Loongson to standardize the
general and floating-point register widths. However, previous versions did
not have them, requiring additional checks to be added.
2023-08-31 16:55:29 +08:00
Martin Kroeker 9c4ae4d4fb
Merge pull request #4206 from martin-frbg/issue4201-2
Work around miscompilation of zdot_thunderx2t99 by the current NVIDIA HPC compiler
2023-08-26 10:17:27 +02:00
Martin Kroeker 88435104c8
Merge pull request #4204 from martin-frbg/llvm17-2
Work around LLVM17 miscompiling the AVX512 microkernels for CASUM/ZASUM
2023-08-26 00:32:18 +02:00
Martin Kroeker fc8894dd98
Workaround miscompilation by NVIDIA nvc 2023-08-26 00:30:17 +02:00
Martin Kroeker 7a6203ffa1
restore default Neoverse SVE build instructions for non-NVIDIA compilers 2023-08-25 18:25:51 +02:00
Martin Kroeker 2c3034ff7f
Disable the C/ZASUM AVX512 microkernels when compiling with LLVM17 as well 2023-08-25 17:22:51 +02:00
Martin Kroeker 8794544b43
Add support for compiling the Neoverse SVE kernels with the NVIDIA HPC compiler 2023-08-25 16:47:32 +02:00
gxw 553cc1372f LoongArch64: Add sgemm_kernel 2023-08-23 16:08:43 +08:00
Martin Kroeker 12ede72ab7
Merge pull request #4192 from imciner2/im/clangfix
Fix cooperlake and sapphire rapids march flags on clang
2023-08-21 15:46:35 +02:00
Ian McInerney 79c15db348 Fix power10 gcc intrinsic check
__builtin_vsx_assemble_pair was only in GCC 10-11.2 and was replaced by
__builtin_vsx_build_pair thereafter.
2023-08-17 15:05:29 +01:00
TGY b5ba95a6c0 Modernize obsolete inline order 2023-08-16 00:48:40 +02:00
Ian McInerney 8a8a8479be Fix cooperlake and sapphire rapids march flags on clang
The march=cooperlake and march=sapphirerapids flags were never getting
added when building with Clang targetting those architectures. Instead
it was falling back to the skylake AVX512 implementation.

Clang added support for these two architectures in Clang 9 and Clang 12,
so introduce new checks for those versions to enable the appropriate
march flag, and fallback to skylake otherwise.
2023-08-14 16:12:35 +01:00
Martin Kroeker 34da1a067d
Allow negative INCX (API change from version 3.10 of the reference implementation) 2023-08-10 17:01:50 +02:00
Martin Kroeker 07e32c4cb8
Allow negative INCX (API change from version 3.10 of the reference implementation) 2023-08-10 17:00:18 +02:00
Martin Kroeker c211da0688
Allow negative INCX (API change from version 3.10 of the reference implementation) 2023-08-10 16:58:57 +02:00
Martin Kroeker a34a0a7abc
Allow negative INCX (API change from version 3.10 of the reference implementation) 2023-08-10 16:56:52 +02:00
Martin Kroeker 54d3246fc6
Allow negative INCX (API change from version 3.10 of the reference implementation) 2023-08-10 16:55:17 +02:00
Martin Kroeker 7dd441d5db
Allow negative INCX (API change from version 3.10 of the reference implementation) 2023-08-10 16:53:33 +02:00
Martin Kroeker f692178792
Allow negative INCX (API change from version 3.10 of the reference implementation) 2023-08-10 16:52:09 +02:00
Martin Kroeker d15ffb7fdf
Allow negative INCX (API change from version 3.10 of the reference implementation) 2023-08-10 16:50:44 +02:00
Martin Kroeker a2d867f4d1
Allow negative iNCX (API change from version 3.10 of the reference implementation) 2023-08-10 16:49:05 +02:00
gxw 4d0f000db6 MIPS: Enable MSA 2023-08-07 21:00:10 +08:00
Martin Kroeker afdc56a421
Merge pull request #4158 from XiWeiGu/loongarch64_update_dgemm_kernel
LoongArch64: Update dgemm kernel
2023-08-07 12:44:09 +02:00
gxw e8b571d245 LoongArch64: Add dgemv_t_8_lasx.S and dgemv_n_8_lasx.S V2 2023-08-07 11:20:42 +08:00
gxw 71fcee6eef LoongArch64: Update dgemm kernel 2023-08-07 11:06:52 +08:00
Martin Kroeker 0f521ece25
Merge pull request #4183 from martin-frbg/issue4181
Apply USE_TRMM to MIPS64_GENERIC as to GENERIC in gmake builds
2023-08-06 18:59:50 +02:00
Martin Kroeker 41c31bc1d4
Revert "LoongArch64: Add dgemv_t_8_lasx.S and dgemv_n_8_lasx.S" 2023-08-06 16:00:03 +02:00
Martin Kroeker 61d803547a
Apply USE_TRMM to MIPS64_GENERIC as to GENERIC 2023-08-06 15:17:38 +02:00
Martin Kroeker f8ee309402
Merge pull request #4153 from XiWeiGu/dgemv
LoongArch64: Add dgemv_t_8_lasx.S and dgemv_n_8_lasx.S
2023-08-06 08:49:16 +02:00
gxw ec1e96aac8 LoongArch64: Add dgemv_t_8_lasx.S and dgemv_n_8_lasx.S 2023-08-05 10:24:17 +08:00
gxw d46772e037 LoongArch64: Add compiler feature checks 2023-08-05 10:21:43 +08:00
Martin Kroeker 4664b57e6e
use shortcut only when both incx and incy are zero 2023-08-04 12:25:34 +02:00
Martin Kroeker 09131f79a6
Merge pull request #4164 from martin-frbg/issue4162
Enable use of AVX512 microkernels with NVIDIA HPC from version 22.3
2023-07-29 15:07:20 +02:00
Martin Kroeker 6a428b5629
Update casum_microk_skylakex-2.c 2023-07-29 12:24:30 +02:00
Martin Kroeker ebb447e32e
Update zasum_microk_skylakex-2.c 2023-07-29 12:23:57 +02:00
Martin Kroeker 9f6847583a
nvc currently miscompiles this, hopefully fixed in release 23.09 2023-07-29 11:50:16 +02:00
Martin Kroeker fe54ee3d15
nvc currently miscompiles this, hopefully fixed in release 23.09 2023-07-29 11:48:38 +02:00
Martin Kroeker 5720fa02c5
Merge pull request #4168 from Mousius/sve-zgemm-cgemm
Use SVE zgemm/cgemm on Arm(R) Neoverse(TM) V1 core
2023-07-27 17:41:45 +02:00
Chris Sidebottom 84a268b6ca Use SVE zgemm/cgemm on Arm(R) Neoverse(TM) V1 core
This patch removes the prefetches from cgemm/zgemm which improves the performance similar to sgemm/dgemm did in #3868, this means I'm happy to enable this on any applicable cores.

I also replicated the unrolling the copies from sgemm and dgemm.
2023-07-27 14:12:20 +01:00
Chris Sidebottom 730ca04b48 Fix ZHEMM copy for SVE
Whilst disambiguating whilelt, I inadvertantly used the wrong datatype
for offsets, which can be negative. This rectifies that.
2023-07-27 13:27:28 +01:00
Martin Kroeker 2a62d2df96
Enable use of AVX512 microkernels with NVIDIA HPC from version 22.3 2023-07-26 19:39:11 +02:00
Martin Kroeker 849c8806b8
Merge pull request #4161 from Mousius/non-sve-kernels
Use latest non-SVE kernels in ARMV8SVE
2023-07-26 15:49:40 +02:00
Chris Sidebottom 24586bc4ff Disambiguate whilelt 2023-07-25 20:15:44 +01:00
Chris Sidebottom aea2a4622b Use latest non-SVE kernels in ARMV8SVE
These are generally better and, in some cases, include threading which helps in the cores we're targeting here.
2023-07-25 14:12:26 +01:00
Octavian Maghiar 826a9d5fa4 Adds tail undisturbed for RVV Level 2 operations
During the last iteration of some RVV operations, accumulators can get overwritten when VL < VLMAX and tail policy is agnostic.
Commit changes intrinsics tail policy to undistrubed.
2023-07-25 11:36:23 +01:00
martin-frbg 7976deff80 Fix file permissions (issue 4095) 2023-07-23 20:37:07 +02:00
Octavian Maghiar 8df0289db6 Adds tail undisturbed for RVV Level 1 operations
During the last iteration of some RVV operations, accumulators can get overwritten when VL < VLMAX and tail policy is agnostic.
Commit changes intrinsics tail policy to undistrubed.
2023-07-20 15:28:35 +01:00
Martin Kroeker 76ef1672f8
Override DSDOT with generic code to get rid of qemu precision error 2023-07-19 22:31:07 +02:00
Martin Kroeker 49077e7bde
Merge pull request #4145 from martin-frbg/issue4144
Restore zero-initialization of variables in generic ztrsm_utcopy
2023-07-14 12:44:05 +02:00
Martin Kroeker 3d31191b0f
Work around Clang failing to disambiguate SVE intrinsics and add AppleClang crossbuild to MacOS/arm64 DYNAMIC_ARCH in AzureCI (#4140)
* Add AppleClang crossbuild to MacOS/arm64 DYNAMIC_ARCH

* add casts to disambiguate svwhilelt for clang
2023-07-14 11:06:48 +02:00
Martin Kroeker cfa0a80664
Restore initialization of data variables 2023-07-13 23:23:12 +02:00
Martin Kroeker 9567305e4c
Restore initialization of data01,data02 2023-07-13 23:21:18 +02:00
Octavian Maghiar 1e4a3a2b5e Fixes RVV masked intrinsics for izamax/izamin kernels 2023-07-12 12:55:50 +01:00
Octavian Maghiar e1958eb705 Fixes RVV masked intrinsics for iamax/iamin/imax/imin kernels
Changes masked intrinsics from _m to _mu and reintroduces maskedoff argument.
2023-07-05 11:34:00 +01:00
Xianyi Zhang e14a025bb1 Temporily walk around zaxpy vector kernel bug. 2023-06-28 11:17:38 +00:00
Martin Kroeker 772b0cc715
Fix early bailout 2023-06-27 16:12:27 +02:00
Martin Kroeker d6be5036d7
Fix IDAMAX 2023-06-26 21:19:33 +02:00
Martin Kroeker 1fe96f8da7
Fix failures to handle increments of zero 2023-06-25 22:36:57 +02:00
Martin Kroeker 73b30b1dec
Fix VLEV_FLOAT/VSEV_FLOAT macros to compile with t-head 2.6.1 2023-06-18 17:46:29 +02:00
Martin Kroeker c3a2d407a0
Merge pull request #4048 from imzhuhl/spr_sbgemm_fix
Sapphire Rapids sbgemm fix
2023-06-17 20:47:09 +02:00
Manjul Mohan 58b88aa5f0 POWER10: Fix compiler warnings
This patch removes the warning messages related to unused variables in
sbgemm_kernel_power10.c.

Signed-off-by: Manjul Mohan <manjul@linux.vnet.ibm.com>
2023-06-12 01:08:59 -04:00
ZhengSh 2a8bc38cdc
Merge branch 'xianyi:risc-v' into risc-v 2023-06-09 20:01:03 +08:00
Heller Zheng 0954746380 remove argument unused during compilation.
fix wrong vr = VFMVVF_FLOAT(0, vl);
2023-06-04 20:06:58 -07:00
sh-zheng d3bf5a5401 Combine two reduction operations of zhe/symv into one, with tail undisturbed setted. 2023-05-22 22:39:45 +08:00
Honglin Zhu 9e80a194d6 Fix dynamic_list build and gcc version check error 2023-05-21 19:52:58 +08:00
Honglin Zhu a76afdc047 Compatible with older version of GNU make 2023-05-20 13:58:23 +08:00
sh-zheng 18d7afe69d Add rvv support for zsymv and active rvv support for zhemv 2023-05-20 01:19:44 +08:00
Honglin Zhu 90f041e348 Invoke the syscall to allow the use of amx tiles 2023-05-19 10:48:18 +08:00
Honglin Zhu 0b83088887 spr dynamic arch support 2023-05-19 10:48:18 +08:00
Honglin Zhu f249ccb741 Fix spr sbgemm error 2023-05-19 10:48:18 +08:00
Martin Kroeker e9a8d5b45f
Merge pull request #4015 from martin-frbg/issue4013-2
[WIP] Disable gcc's tree-vectorizer for x86_64 CGEMV
2023-04-23 18:51:12 +02:00
Martin Kroeker 72caceb324
Merge pull request #4009 from Mousius/sve-gemm
Use SVE kernel for SGEMM/DGEMM on Arm(R) Neoverse(TM) V1
2023-04-22 13:56:45 +02:00
Martin Kroeker 84bcf6639f
Disable gcc's tree-vectorizer pass on all operating systems 2023-04-20 23:24:52 +02:00
Martin Kroeker c9174ae8d7
Disable gcc's tree-vectorizer pass on all operating systems 2023-04-19 23:45:44 +02:00
Martin Kroeker c2fe9cb91f
Disable gcc's tree-vectorizer pass on all operating systems 2023-04-19 23:45:14 +02:00