Martin Kroeker
81dcfdcf39
Multiply by 2 instead of left-shifting a potentially negative number
fixes GCC ubsan warning in the BLAS tests
2020-08-02 18:29:56 +02:00
Martin Kroeker
0ef4b3f1f2
Multiply instead of doing a left shift of a potentially negative number
fixes GCC ubsan report in the BLAS tests
2020-08-02 18:27:40 +02:00
Martin Kroeker
aa53a8a5cb
Multiply by two instead of left-shifting one place
fixes GCC ubsan report of "left shift of negative value -2" in the BLAS tests
2020-08-02 18:25:09 +02:00
Martin Kroeker
aa3a1e7d8c
Multiply by two rather than left shift by one place
fixes GCC ubsan report of "left shift of negative value -2" in the BLAS tests
2020-08-02 18:22:31 +02:00
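The four commits above all replace a one-place left shift with a multiplication. A minimal sketch of why that silences the sanitizer follows; the function name `twice` is hypothetical and is not taken from the OpenBLAS sources.

```c
/* Hypothetical helper illustrating the change described in the commits above.
 *
 * In C, `x << 1` has undefined behaviour when x is a negative signed value,
 * which is what GCC's UBSan reports as "left shift of negative value".
 * `x * 2` is well defined for negative x as long as the result fits in an int.
 */
int twice(int x)
{
    return x * 2; /* was conceptually: x << 1 */
}
```

With this form, `twice(-2)` yields -4 without triggering a sanitizer report.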
Rajalakshmi Srinivasaraghavan
f77b6a83f4
dgemv optimization for POWER10
Make use of the new POWER10 vector pair instructions in dgemv_n and dgemv_t.
Also add a new 4x128 block to make use of the Matrix-Multiply Assist (MMA)
feature introduced in POWER ISA v3.1. Tested on a simulator; there
are no new test failures.
2020-07-29 18:59:32 -05:00
Rajalakshmi Srinivasaraghavan
d557584b71
Fix compilation issues with clang on POWER
Remove the -malign-power option, as gcc defaults to it. Also
add -fno-integrated-as so that the GNU assembler is used for the powerpc
assembly optimization files. Fix other compilation errors
reported in the dgemv_t.c file.
2020-07-27 14:11:07 -05:00
Ashwin Sekhar T K
4e1be0e481
ARM64: Add THUNDERX3T110 Target
2020-07-26 23:32:24 -07:00
Rajalakshmi Srinivasaraghavan
9be2688c78
Fix to store results in correct order for POWER10 GEMM kernels
A recent compiler change in __builtin_mma_disassemble_acc() affects
the order in which results are stored on POWER10. Also remove the new LDFLAG
-mno-power10-stub, as it is handled by the linker automatically.
2020-07-24 23:08:11 -05:00
Martin Kroeker
6a2a60038c
Merge pull request #2720 from martin-frbg/issue2694
WIP Further fixes for 32bit POWER8
2020-07-24 23:19:45 +02:00
Martin Kroeker
251a09ec90
Typo fix
2020-07-24 16:04:58 +00:00
Martin Kroeker
95d37e1575
Regroup the 32 and 64bit sections and restore 64bit CAXPY
2020-07-24 10:13:46 +00:00
Martin Kroeker
3523bb778e
Merge pull request #2721 from martin-frbg/p8align
Fix alignment errors in the power8 saxpy kernel
2020-07-24 11:06:20 +02:00
Martin Kroeker
bf1f0734ff
Use OPENBLAS_MAKE_COMPLEX_FLOAT on PPC only
2020-07-23 20:40:13 +00:00
Martin Kroeker
ca3561cab9
Add ifdefs around call to altivec microkernel
2020-07-23 18:30:42 +00:00
Martin Kroeker
21072e502a
Typo fix
2020-07-23 17:34:56 +00:00
Martin Kroeker
7c6e56b5df
Rewrite assignment to complex for better portability
2020-07-23 17:10:59 +02:00
Martin Kroeker
661c6bfa5a
Exclude altivec code paths if the compiler does not support them
2020-07-23 17:08:20 +02:00
Martin Kroeker
0033f8be0d
Use vec_vsx_ld/st to fix misaligned accesses flagged by asan
2020-07-16 23:32:54 +02:00
Martin Kroeker
f308e741b2
remove debug output and revert changes to cdot and crot
2020-07-15 10:00:07 +02:00
Martin Kroeker
da17abec87
fix trailing whitespace
2020-07-14 18:20:03 +02:00
Martin Kroeker
f8c2697701
Use POWER6 GEMM, TRMM and DTRSM on 32bit POWER8
2020-07-14 18:11:19 +02:00
Martin Kroeker
b144423f0f
Do not define USE_TRMM for 32bit POWER8
2020-07-14 18:10:12 +02:00
Martin Kroeker
ed7e155c35
Merge branch 'develop' into aix
2020-07-07 18:52:06 +02:00
EGuesnet
634e1305f9
Update cgemm_kernel_8x4_power8.S
2020-06-30 15:16:39 +02:00
Martin Kroeker
28d69e0097
Merge pull request #2687 from martin-frbg/utfbom
Strip UTF8 byte order marker from source files
2020-06-26 22:53:09 +02:00
Martin Kroeker
c2467c9619
Merge pull request #2686 from RajalakshmiSR/p10_shgemm
powerpc: Optimized SHGEMM kernel for POWER10
2020-06-26 22:52:45 +02:00
Martin Kroeker
d199c2787d
Merge pull request #2680 from kavanabhat/aix_makefile_fix
Fix for #2671
2020-06-26 11:27:28 +02:00
Martin Kroeker
e30ad0e521
Strip UTF8 byte order marker from source
2020-06-26 09:00:43 +02:00
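For reference, the UTF-8 byte order marker is the three-byte sequence EF BB BF at the start of a file. The sketch below shows one way to detect it; the helper name `utf8_bom_length` is hypothetical and does not describe how the commit itself removed the marker.

```c
#include <stddef.h>

/* Hypothetical helper: returns how many leading bytes should be skipped.
 * The result is 3 if the buffer starts with the UTF-8 byte order marker
 * (EF BB BF) and 0 otherwise. */
size_t utf8_bom_length(const unsigned char *buf, size_t len)
{
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return 3;
    return 0;
}
```

A caller would skip the returned number of bytes before writing the file contents back out.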
Rajalakshmi Srinivasaraghavan
d23419accc
powerpc: Optimized SHGEMM kernel for POWER10
This patch introduces a new optimized version of the SHGEMM kernel
using the POWER10 Matrix-Multiply Assist (MMA) feature introduced in
POWER ISA v3.1. It makes use of the new POWER10 compute instructions
for matrix multiplication.
Tested on a simulator; there are no new test failures.
2020-06-25 22:19:08 -05:00
Martin Kroeker
c854ef5471
Fix variable names in conditional
2020-06-25 13:29:52 +02:00
Martin Kroeker
c0afc11742
Fix POWERPC builds on AIX (gcc/gfortran 7)
1. macro preprocessing for POWER8 and later kernels only
2. default buffer size used by AIX version of m4 is too small
2020-06-25 13:12:36 +02:00
Gordon Fossum
bb2f52844b
powerpc: Optimized ZGEMM kernel for POWER10
This patch introduces a new optimized version of the ZGEMM kernel
using the POWER10 Matrix-Multiply Assist (MMA) feature introduced in
POWER ISA v3.1. It makes use of the new POWER10 compute instructions
for matrix multiplication.
Tested on a simulator; there are no new test failures.
Cycle count is reduced by 30-50% compared to the POWER9 version, depending on
M/N/K sizes.
2020-06-24 14:50:12 -05:00
Rajalakshmi Srinivasaraghavan
571eadb880
powerpc: Optimized SGEMM/DGEMM/CGEMM for POWER10
This patch introduces new optimized versions of SGEMM, CGEMM and DGEMM
using the POWER10 Matrix-Multiply Assist (MMA) feature introduced in
POWER ISA v3.1. It makes use of the new POWER10 compute instructions
for matrix multiplication.
Tested on a simulator; there are no new test failures.
Cycle count is reduced by 30-50% compared to the POWER9 version, depending on
M/N/K sizes.
MMA GCC patch for reference:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=8ee2640bfdc62f835ec9740278f948034bc7d9f1
2020-06-24 14:48:15 -05:00
Kavana Bhat
df4ade070f
Fix for #2671
2020-06-24 04:25:47 -05:00
Martin Kroeker
93592d1260
Merge pull request #2675 from wjc404/develop
AVX512 DGEMM TCOPY_16 Function
2020-06-23 09:29:02 +02:00
wjc404
086d87a302
AVX512 dgemm tcopy_16 function
2020-06-20 00:07:43 +08:00
Rajalakshmi Srinivasaraghavan
9fe930f205
powerpc: Add support for future processor
This is the initial patch to support build infrastructure
for POWER10 architecture.
2020-06-11 15:47:20 -05:00
ZhangDanfeng
bc6fd20a40
fix INIT8x4
Signed-off-by: ZhangDanfeng <467688405@qq.com>
2020-06-10 01:01:16 +08:00
Martin Kroeker
89091e6b64
Merge pull request #2645 from martin-frbg/misc_fixes
Miscellaneous fixes
2020-06-07 19:44:50 +02:00
Martin Kroeker
c3574ffe53
Merge pull request #2646 from wjc404/develop
Optimize AVX512 parallel DGEMM performance
2020-06-07 13:18:22 +02:00
wjc404
0e3ac4a06b
Add files via upload
2020-06-06 14:56:57 +08:00
Martin Kroeker
7f60fb6b91
Delete spurious copy of common_param.h
2020-06-05 10:04:16 +02:00
ZhangDanfeng
9b7877ccf1
sgemm copy source init
Signed-off-by: ZhangDanfeng <467688405@qq.com>
2020-06-04 02:10:45 +08:00
ZhangDanfeng
f82fa802d1
Insert prefetch
Signed-off-by: ZhangDanfeng <467688405@qq.com>
2020-06-04 02:08:48 +08:00
Martin Kroeker
b1ee81228a
Change complex DOT and ROT to generic kernels and switch CGEMM
in response to test failures seen in #2628 and BLAS-Tester
2020-06-03 09:13:29 +02:00
张丹枫
9df79ae9a3
Update sgemm and strmm kernel selection strategy
2020-05-20 22:26:58 +08:00
张丹枫
a1fc6041cd
Use general registers for speedup
2020-05-20 22:26:58 +08:00
张丹枫
edb423d772
Align general register usage with strmm_kernel_8x8
2020-05-20 22:26:58 +08:00
zhangdanfeng
0e6eb8c247
sgemm kernel use sgemm_kernel_8x8_cortexa53
Signed-off-by: zhangdanfeng <zhangdanfeng@cloudwalk.cn>
2020-05-20 22:26:58 +08:00
zhangdanfeng
d475db29c6
optimized for cortex-a53
Signed-off-by: zhangdanfeng <zhangdanfeng@cloudwalk.cn>
2020-05-20 22:26:58 +08:00