Commit Graph

1939 Commits

Author SHA1 Message Date
Octavian Maghiar 8df0289db6 Adds tail undisturbed for RVV Level 1 operations
During the last iteration of some RVV operations, accumulators can get overwritten when VL < VLMAX and tail policy is agnostic.
Commit changes intrinsics tail policy to undistrubed.
2023-07-20 15:28:35 +01:00
Octavian Maghiar 1e4a3a2b5e Fixes RVV masked intrinsics for izamax/izamin kernels 2023-07-12 12:55:50 +01:00
Octavian Maghiar e1958eb705 Fixes RVV masked intrinsics for iamax/iamin/imax/imin kernels
Changes masked intrinsics from _m to _mu and reintroduces maskedoff argument.
2023-07-05 11:34:00 +01:00
ZhengSh 2a8bc38cdc
Merge branch 'xianyi:risc-v' into risc-v 2023-06-09 20:01:03 +08:00
Heller Zheng 0954746380 remove argument unused during compilation.
fix wrong vr = VFMVVF_FLOAT(0, vl);
2023-06-04 20:06:58 -07:00
sh-zheng d3bf5a5401 Combine two reduction operations of zhe/symv into one, with tail undisturbed setted. 2023-05-22 22:39:45 +08:00
sh-zheng 18d7afe69d Add rvv support for zsymv and active rvv support for zhemv 2023-05-20 01:19:44 +08:00
Heller Zheng 1374a2d08b This PR adapts latest spec changes
Add prefix (_riscv) for all riscv intrinsics
Update some intrinsics' parameter, like vfredxxxx, vmerge
2023-03-19 23:59:03 -07:00
Zhang Xianyi 19f17c8bc6
Merge pull request #3893 from HellerZheng/develop
add riscv level3 C,Z kernel functions.
2023-03-15 10:17:13 +08:00
Sergei Lewis 9b61be4545 factoring riscv64/dot.c fix into separate PR as requested 2023-03-01 17:40:42 +00:00
Sergei Lewis 2406958629 * update intrinsics to match latest spec at https://github.com/riscv-non-isa/rvv-intrinsic-doc (in particular, __riscv_ prefixes for rvv intrinsics)
* fix multiple numerical stability and corner case issues
* add a script to generate arbitrary gemm kernel shapes
* add a generic zvl256b target to demonstrate large gemm kernel unrolls
2023-02-24 10:45:03 +00:00
Heller Zheng 63cf4d0166 add riscv level3 C,Z kernel functions. 2023-02-01 19:13:44 -08:00
Xianyi Zhang c19dff0a31 Fix T-Head RVV intrinsic API changes. 2023-01-25 19:33:32 +08:00
Xianyi Zhang e5313f53d5 Merge branch 'develop' of https://github.com/HellerZheng/OpenBLAS_riscv_x280 into HellerZheng-develop 2022-12-03 12:00:52 +08:00
Chris Sidebottom eea006a688 Wrap SVE header with __has_include check 2022-12-01 12:07:55 +00:00
Chris Sidebottom fd4f52c797 Add SVE implementation for sdot/ddot
This adds an SVE implementation to sdot/ddot when available, falling back to the previous Advanced SIMD kernel where there's no SVE implementation for the kernel.

All the targets were essentially treating `dot_thunderx2t99.c` as the Advanced SIMD implementation so I've renamed it to better fit with the feature detection.
2022-12-01 12:07:50 +00:00
Chris Sidebottom 4f7b77e08a Remove unnecessary instructions from Advanced SIMD dot
The existing kernel was issuing extra instructions to organise the arguments into the same registers they would usually be in and similarly to put the result into the appropriate register.

This has an impact on smaller sized dots and seemed like a quick fix
2022-11-25 16:19:03 +00:00
Heller Zheng 3918d8504e nrm2 simple optimization 2022-11-21 19:06:07 -08:00
HellerZheng 943372bdf5
Merge branch 'develop' into develop 2022-11-18 10:12:46 +08:00
Martin Kroeker f73cfb7e2c
change line endings from CRLF to LF 2022-11-17 09:39:56 +01:00
Martin Kroeker 1688c7da43
change line endings from CRLF to LF 2022-11-16 22:24:01 +01:00
Heller Zheng 5d0d1c5551 Remove redundant files 2022-11-15 18:22:21 -08:00
Heller Zheng bef47917bd Initial version for riscv sifive x280 2022-11-15 00:06:25 -08:00
Bart Oldeman 6c1043eb41 Add [cz]scal microkernels for SKYLAKEX
These are as similar to dscal_microk_skylakex-2.c as possible
for consistency.

Note that before this change SKYLAKEX+ uses generic C functions for
cscal/zscal via commit 2271c350 from #2610 (which is masked by
commit 086d87a30). However now #3799 disables FMAs (in turn enabled
by `-march=skylake-avx512`) in the plain C code which fixes excessive
LAPACK test failures more nicely.
2022-11-09 08:57:03 -05:00
Martin Kroeker c9d78dc3b2
Remove excess initializer (leftover from rework of PR 3793) 2022-10-31 16:57:03 +01:00
Martin Kroeker 65338a9493
Merge pull request #3799 from bartoldeman/cscal-zscal-no-fma
x86_64: prevent GCC and Clang from generating FMAs in cscal/zscal.
2022-10-30 18:56:10 +01:00
Honglin Zhu 79066b6bf3 Change file name to match the norm and delete useless code. 2022-10-28 17:09:39 +08:00
Bart Oldeman e7e3aa2948 x86_64: prevent GCC and Clang from generating FMAs in cscal/zscal.
If e.g. -march=haswell is set in CFLAGS, GCC generates FMAs by default, which
is inconsistent with the microkernels, none of which use FMAs. These
inconsistencies cause a few failures in the LAPACK testcases, where
eigenvalue results with/without eigenvectors are compared.

Moreover using FMAs for multiplication of complex numbers can give surprising
results, see 22aa81f for more information.

This uses the same syntax as used in 22aa81f for zarch (s390x).
2022-10-27 18:16:43 -04:00
Honglin Zhu 4989e039a5 Define SBGEMM_ALIGN_K for DYNAMIC_ARCH build 2022-10-27 14:10:26 +08:00
Honglin Zhu 843e9fd0b9 Fix typo error 2022-10-26 17:06:33 +08:00
Honglin Zhu b00d5b9746 New sbgemm implementation for Neoverse N2
1. Use UZP instructions but not gather load and scatter store instructions to get lower latency.
    2. Padding k to a power of 4.
2022-10-26 15:09:41 +08:00
Martin Kroeker f6f35a4288
fix copyobj declarations to work with DYNAMIC_ARCH 2022-09-29 08:47:14 +02:00
Martin Kroeker b1d69fb3ac
Add MIPS64_GENERIC as a copy of GENERIC 2022-09-17 23:52:32 +02:00
gxw edea1bcfaf MIPS64: Fixed failed utest dsdot:dsdot_n_1 when TARGET=I6500 2022-09-17 16:43:22 +08:00
Martin Kroeker 101a2c77c3
Fix warnings 2022-09-15 09:19:19 +02:00
Martin Kroeker 23d59baaf1
Add -mfma to -mavx2 for Apple clang, and set AVX2 options for Zen as well 2022-09-13 22:39:27 +02:00
gxw 365936ae1b MIPS64: Using the macro MTC rather than MTC1 2022-09-13 16:39:40 +08:00
Martin Kroeker 739c3c44a7
Work around windows/osx gcc12 x86_64 tree-optimizer problem and add an osx/gcc12 build to Azure CI (#3745)
Add pragma to disable the gcc tree-optimizer for some x86_64 S and Z kernels with gcc12 on OSX or Windows
2022-09-03 15:01:22 +02:00
Martin Kroeker bd30120ba7
Merge pull request #3720 from FlyGoat/mips64
Make it work on general MIPS64 processors
2022-08-19 20:24:27 +02:00
Jiaxun Yang a50b29c540 Provide a fallback MIPS64_GENERIC target
It is really dangerous to fallback to Loongson core on other
MIPS64 processors.

Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
2022-08-12 13:13:28 +01:00
Jiaxun Yang 50c4eeb97d alpha: Remove include of version.h
It will be defined by preprocessor argument.

Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
2022-08-11 15:02:58 +01:00
Ivan Pribec 802e71bf05 Add const attribute to lsame 2022-08-08 15:15:52 +02:00
gxw fbfe1daf6e LoongArch64: Add DYNAMIC_ARCH support 2022-07-28 14:28:45 +08:00
Martin Kroeker cd8e57040c
Merge pull request #3691 from martin-frbg/issue3679-sparc
SPARC: fix DNRM2 returning INF instead of zero due to intermediate overflow
2022-07-25 15:41:15 +02:00
Martin Kroeker 6c118b7977
Fix DNRM2 returning INF instead of zero due to intermediate overflow 2022-07-24 17:42:31 +02:00
Martin Kroeker c43ec53bdd
Merge pull request #3690 from RajalakshmiSR/cdotp10
POWER: Fix complex dot function failures
2022-07-19 13:59:16 +02:00
Martin Kroeker b7c65d08cb
Merge pull request #3689 from RajalakshmiSR/dgemvgcc10
POWER10: dgemv builtin rename
2022-07-19 10:25:01 +02:00
Martin Kroeker 06ef015234
fix DNRM2 returning INF instead of zero due to intermediate overflow 2022-07-19 10:19:27 +02:00
Rajalakshmi Srinivasaraghavan a612e78a97 POWER: Fix complex dot function failures
There are some test failures in complex dot functions when compiling with gcc12.
The machine constraints used now do not update all the four elements in the
expected result array. Fixing this with a reduced level of optimization.
This is not changing any performance numbers but will be converted to C code in future.
2022-07-18 14:48:43 -05:00
Rajalakshmi Srinivasaraghavan 432fd99445 POWER10: dgemv builtin rename
Add check to use correct builtin name for older versions
of gcc10 compilers.
2022-07-18 09:48:01 -05:00