The implementation in `riscv64/dot.c` fails the `test_dsdot` test, and the generic kernel seems to have better precision. Tested on SiFive FU740 (HiFive Unmatched) and QEMU. Also see #1469.