Add a check to use the correct builtin name for older versions of the GCC 10 compiler.
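A minimal sketch of what such a check might look like. It assumes the builtin in question is the vector-pair assemble builtin, which early GCC 10 releases exposed as `__builtin_mma_assemble_pair` and later compilers expose as `__builtin_vsx_assemble_pair`; the change itself does not name the builtin. The macros `ASSEMBLE_PAIR` and `ASSEMBLE_PAIR_NAME` and the helper `assemble_pair_builtin_name` are hypothetical names used only for illustration.

```c
#include <stddef.h>

/* __has_builtin exists in clang and in GCC 10 and later. Provide a
   fallback so the check still compiles with other compilers. */
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

/* Pick whichever spelling the compiler provides (assumed pair of names).
   ASSEMBLE_PAIR is left undefined when neither builtin is available,
   for example when not targeting POWER10. */
#if __has_builtin(__builtin_vsx_assemble_pair)
#define ASSEMBLE_PAIR      __builtin_vsx_assemble_pair
#define ASSEMBLE_PAIR_NAME "__builtin_vsx_assemble_pair"
#elif __has_builtin(__builtin_mma_assemble_pair)
#define ASSEMBLE_PAIR      __builtin_mma_assemble_pair
#define ASSEMBLE_PAIR_NAME "__builtin_mma_assemble_pair"
#else
#define ASSEMBLE_PAIR_NAME "unavailable"
#endif

/* Report which spelling was selected at compile time. */
const char *assemble_pair_builtin_name(void)
{
    return ASSEMBLE_PAIR_NAME;
}
```

Kernel code would then call `ASSEMBLE_PAIR(...)` instead of spelling out either name, so the selection lives in one place.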
This patch uses compiler builtins and matches the performance of the assembly implementation. Tested with Clang 14 and GCC 12.
Use the new POWER10 vector-pair instructions in dgemv_n and dgemv_t. Also add a new 4x128 block to make use of the Matrix-Multiply Assist (MMA) feature introduced in Power ISA v3.1. Tested on a simulator; there are no new test failures.
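For orientation, here is a plain, portable C sketch of the two operations these kernels compute on a column-major matrix: dgemv_n accumulates y += alpha * A * x and dgemv_t accumulates y += alpha * A^T * x. The sketch assumes any beta scaling of y has already been applied by the caller. It does not use POWER10 vector-pair or MMA builtins, and the function names `dgemv_n_ref` and `dgemv_t_ref` as well as the four-column grouping are illustrative assumptions, not the code in this change.

```c
#include <stddef.h>

/* Reference y += alpha * A * x for column-major A with m rows, n columns
   and leading dimension lda. x has n entries, y has m entries. */
void dgemv_n_ref(size_t m, size_t n, double alpha,
                 const double *a, size_t lda,
                 const double *x, double *y)
{
    size_t j = 0;

    /* Main loop: four columns per step, so each y[i] is read and
       written once per group of four columns. */
    for (; j + 4 <= n; j += 4) {
        const double *a0 = a + (j + 0) * lda;
        const double *a1 = a + (j + 1) * lda;
        const double *a2 = a + (j + 2) * lda;
        const double *a3 = a + (j + 3) * lda;
        const double x0 = alpha * x[j + 0];
        const double x1 = alpha * x[j + 1];
        const double x2 = alpha * x[j + 2];
        const double x3 = alpha * x[j + 3];

        for (size_t i = 0; i < m; i++)
            y[i] += a0[i] * x0 + a1[i] * x1 + a2[i] * x2 + a3[i] * x3;
    }

    /* Tail: remaining columns one at a time. */
    for (; j < n; j++) {
        const double *aj = a + j * lda;
        const double xj = alpha * x[j];

        for (size_t i = 0; i < m; i++)
            y[i] += aj[i] * xj;
    }
}

/* Reference y += alpha * A^T * x for the same layout.
   x has m entries, y has n entries. */
void dgemv_t_ref(size_t m, size_t n, double alpha,
                 const double *a, size_t lda,
                 const double *x, double *y)
{
    for (size_t j = 0; j < n; j++) {
        const double *aj = a + j * lda;
        double dot = 0.0;

        /* Dot product of column j with x. */
        for (size_t i = 0; i < m; i++)
            dot += aj[i] * x[i];

        y[j] += alpha * dot;
    }
}
```

A scalar reference like this can be handy for checking an optimized kernel against known results on small inputs before comparing performance.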