The existing kernel was issuing extra instructions to organise the arguments into the same registers they would usually be in and similarly to put the result into the appropriate register.
This has an impact on smaller sized dots and seemed like a quick fix