martin-frbg
7976deff80
Fix file permissions (issue 4095)
2023-07-23 20:37:07 +02:00
Octavian Maghiar
8df0289db6
Adds tail undisturbed for RVV Level 1 operations
...
During the last iteration of some RVV operations, accumulators can get overwritten when VL < VLMAX and tail policy is agnostic.
This commit changes the intrinsics' tail policy to undisturbed.
2023-07-20 15:28:35 +01:00
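The failure mode described in the commit above can be illustrated with a scalar simulation. This is only a sketch, not the actual RVV kernel code: `VLMAX`, `sim_vfadd`, `sim_sum`, and the `-1.0` clobber value are hypothetical names and values chosen for illustration. It assumes a kernel that accumulates lane-wise and then reduces over all `VLMAX` lanes. The sentinel stands in for whatever an implementation may write into tail lanes under the agnostic policy.

```c
#include <assert.h>
#include <stddef.h>

#define VLMAX 4 /* hypothetical lane count for this simulation */

typedef enum { TAIL_AGNOSTIC, TAIL_UNDISTURBED } tail_policy;

/* Simulated vector add: acc[i] += x[i] for the first vl lanes.
 * Under the agnostic policy the tail lanes (i >= vl) may be clobbered;
 * an arbitrary sentinel is written here to make that visible.
 * Under the undisturbed policy the tail lanes keep their old values. */
static void sim_vfadd(double acc[VLMAX], const double *x, size_t vl,
                      tail_policy tp)
{
    assert(vl <= VLMAX);
    for (size_t i = 0; i < vl; i++)
        acc[i] += x[i];
    if (tp == TAIL_AGNOSTIC) {
        for (size_t i = vl; i < VLMAX; i++)
            acc[i] = -1.0; /* stand-in for an implementation-defined clobber */
    }
}

/* Strip-mined sum: accumulate per lane, then reduce over all VLMAX lanes. */
static double sim_sum(const double *x, size_t n, tail_policy tp)
{
    double acc[VLMAX] = {0.0};
    size_t i = 0;
    while (i < n) {
        size_t vl = (n - i < VLMAX) ? (n - i) : VLMAX;
        sim_vfadd(acc, x + i, vl, tp);
        i += vl;
    }
    double s = 0.0;
    for (size_t k = 0; k < VLMAX; k++)
        s += acc[k];
    return s;
}
```

With six inputs and four lanes, the last iteration runs with `vl = 2`. In the undisturbed case lanes 2 and 3 keep the partial sums from the first iteration, whereas in the agnostic case those partial sums are lost before the final reduction.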
Martin Kroeker
76ef1672f8
Override DSDOT with generic code to get rid of qemu precision error
2023-07-19 22:31:07 +02:00
Martin Kroeker
49077e7bde
Merge pull request #4145 from martin-frbg/issue4144
...
Restore zero-initialization of variables in generic ztrsm_utcopy
2023-07-14 12:44:05 +02:00
Martin Kroeker
3d31191b0f
Work around Clang failing to disambiguate SVE intrinsics and add AppleClang crossbuild to MacOS/arm64 DYNAMIC_ARCH in AzureCI (#4140)
...
* Add AppleClang crossbuild to MacOS/arm64 DYNAMIC_ARCH
* add casts to disambiguate svwhilelt for clang
2023-07-14 11:06:48 +02:00
Martin Kroeker
cfa0a80664
Restore initialization of data variables
2023-07-13 23:23:12 +02:00
Martin Kroeker
9567305e4c
Restore initialization of data01,data02
2023-07-13 23:21:18 +02:00
Octavian Maghiar
1e4a3a2b5e
Fixes RVV masked intrinsics for izamax/izamin kernels
2023-07-12 12:55:50 +01:00
Octavian Maghiar
e1958eb705
Fixes RVV masked intrinsics for iamax/iamin/imax/imin kernels
...
Changes masked intrinsics from _m to _mu and reintroduces maskedoff argument.
2023-07-05 11:34:00 +01:00
Xianyi Zhang
e14a025bb1
Temporarily work around zaxpy vector kernel bug.
2023-06-28 11:17:38 +00:00
Martin Kroeker
772b0cc715
Fix early bailout
2023-06-27 16:12:27 +02:00
Martin Kroeker
d6be5036d7
Fix IDAMAX
2023-06-26 21:19:33 +02:00
Martin Kroeker
1fe96f8da7
Fix failures to handle increments of zero
2023-06-25 22:36:57 +02:00
Martin Kroeker
73b30b1dec
Fix VLEV_FLOAT/VSEV_FLOAT macros to compile with t-head 2.6.1
2023-06-18 17:46:29 +02:00
Martin Kroeker
c3a2d407a0
Merge pull request #4048 from imzhuhl/spr_sbgemm_fix
...
Sapphire Rapids sbgemm fix
2023-06-17 20:47:09 +02:00
Manjul Mohan
58b88aa5f0
POWER10: Fix compiler warnings
...
This patch removes the warning messages related to unused variables in
sbgemm_kernel_power10.c.
Signed-off-by: Manjul Mohan <manjul@linux.vnet.ibm.com>
2023-06-12 01:08:59 -04:00
ZhengSh
2a8bc38cdc
Merge branch 'xianyi:risc-v' into risc-v
2023-06-09 20:01:03 +08:00
Heller Zheng
0954746380
remove argument unused during compilation.
...
fix wrong vr = VFMVVF_FLOAT(0, vl);
2023-06-04 20:06:58 -07:00
sh-zheng
d3bf5a5401
Combine two reduction operations of zhe/symv into one, with tail undisturbed set.
2023-05-22 22:39:45 +08:00
Honglin Zhu
9e80a194d6
Fix dynamic_list build and gcc version check error
2023-05-21 19:52:58 +08:00
Honglin Zhu
a76afdc047
Make compatible with older versions of GNU make
2023-05-20 13:58:23 +08:00
sh-zheng
18d7afe69d
Add rvv support for zsymv and activate rvv support for zhemv
2023-05-20 01:19:44 +08:00
Honglin Zhu
90f041e348
Invoke the syscall to allow the use of amx tiles
2023-05-19 10:48:18 +08:00
Honglin Zhu
0b83088887
spr dynamic arch support
2023-05-19 10:48:18 +08:00
Honglin Zhu
f249ccb741
Fix spr sbgemm error
2023-05-19 10:48:18 +08:00
Martin Kroeker
e9a8d5b45f
Merge pull request #4015 from martin-frbg/issue4013-2
...
[WIP] Disable gcc's tree-vectorizer for x86_64 CGEMV
2023-04-23 18:51:12 +02:00
Martin Kroeker
72caceb324
Merge pull request #4009 from Mousius/sve-gemm
...
Use SVE kernel for SGEMM/DGEMM on Arm(R) Neoverse(TM) V1
2023-04-22 13:56:45 +02:00
Martin Kroeker
84bcf6639f
Disable gcc's tree-vectorizer pass on all operating systems
2023-04-20 23:24:52 +02:00
Martin Kroeker
c9174ae8d7
Disable gcc's tree-vectorizer pass on all operating systems
2023-04-19 23:45:44 +02:00
Martin Kroeker
c2fe9cb91f
Disable gcc's tree-vectorizer pass on all operating systems
2023-04-19 23:45:14 +02:00
Martin Kroeker
66b39b835c
Disable gcc's tree-vectorizer pass on all operating systems
2023-04-19 23:44:45 +02:00
Martin Kroeker
bb6d6735bf
Disable gcc's tree-vectorizer pass on all operating systems
2023-04-19 23:44:15 +02:00
Martin Kroeker
d18efaed20
Disable gcc's tree-vectorizer pass on all operating systems
2023-04-19 23:43:43 +02:00
Martin Kroeker
99f6d31ed5
Disable gcc's tree-vectorizer pass on all operating systems
2023-04-19 23:42:55 +02:00
Martin Kroeker
7de9335c56
Disable gcc's tree-vectorizer pass on all operating systems
2023-04-19 23:42:09 +02:00
Martin Kroeker
437c0bf2b4
Merge pull request #3843 from Mousius/switch-ratio
...
Propagate SWITCH_RATIO to DYNAMIC_ARCH builds
2023-04-19 11:51:54 +02:00
Chris Sidebottom
ec334e69dc
Use SVE kernel for SGEMM/DGEMM on Arm(R) Neoverse(TM) V1
...
This re-spins #3869 with some additional copy unrolling which helps maintain SYRK performance.
After #3868, the SVE kernels represent a pretty good boost.
This re-uses ARMV8SVE as a base and I'm going to incrementally move everything to use ARMV8SVE in additional patches (as well as fix up anything that's not already in ARMV8SVE).
2023-04-17 17:38:42 +01:00
Chris Sidebottom
32f2fafde7
Propagate SWITCH_RATIO to DYNAMIC_ARCH builds
...
Previously dynamic builds were either using the default SWITCH_RATIO
or one from the higher level architecture; this patch ensures the
dynamic builds can use this parameter as well.
2023-04-17 15:34:12 +01:00
Martin Kroeker
44164e3a3d
revert "move alpha out of register 18" (out of PR scope, no SVE on Apple hw)
2023-04-17 14:23:13 +02:00
Martin Kroeker
8be68fa7f4
move declaration of sca to really keep the compiler from throwing it out (for now)
2023-04-15 12:02:39 +02:00
Martin Kroeker
3727672a74
Improve workaround and keep compilers from optimizing it out
2023-04-13 18:07:52 +02:00
Martin Kroeker
108a21e47a
Move ALPHA out of register 18 (reserved on OSX)
2023-04-13 18:05:14 +02:00
Martin Kroeker
0b1acb0ba3
Move ALPHA_I out of register 18 (reserved on OSX)
2023-04-13 18:03:35 +02:00
Martin Kroeker
c7bbad09ad
Move ALPHA_I out of register 18 (reserved on OSX)
2023-04-13 18:00:47 +02:00
Martin Kroeker
cda29633a3
move ALPHA_I out of register 18 (reserved on OSX)
2023-04-13 17:59:48 +02:00
Martin Kroeker
09ace3cf23
Merge pull request #3846 from lilh9598/sbgemm_opt
...
Improve the performance of sbgemm_tcopy on neoversen2
2023-03-26 19:04:57 +02:00
Heller Zheng
1374a2d08b
This PR adapts to the latest spec changes
...
Add prefix (_riscv) for all riscv intrinsics
Update some intrinsics' parameters, like vfredxxxx, vmerge
2023-03-19 23:59:03 -07:00
Zhang Xianyi
19f17c8bc6
Merge pull request #3893 from HellerZheng/develop
...
add riscv level3 C,Z kernel functions.
2023-03-15 10:17:13 +08:00
Sergei Lewis
cb0a70e0e2
dot.c early bail fix
2023-03-02 09:51:10 +00:00
Sergei Lewis
9b61be4545
factoring riscv64/dot.c fix into separate PR as requested
2023-03-01 17:40:42 +00:00