Commit Graph

1568 Commits

Author SHA1 Message Date
Martin Kroeker f1bf040b25
Merge pull request #2988 from xiegengxin/smp-asum
Improve the performance of dasum and sasum when SMP is defined
2020-11-22 12:24:13 +01:00
Xianyi Zhang 7037849498 Merge branch 'develop' into risc-v 2020-11-22 16:04:50 +08:00
Martin Kroeker 7e9cb39a25
Merge pull request #2981 from Qiyu8/fix-sum
Fix sum optimize issues
2020-11-16 08:40:46 +01:00
Gengxin Xie d6e7e05bb3 Improve the performance of dasum and sasum when SMP is defined 2020-11-13 14:20:52 +08:00
Qiyu8 ae0b1dea19 modify system.cmake to enable fma flag 2020-11-13 10:20:24 +08:00
Qiyu8 e0dac6b53b fix the CI failure of target specific option mismatch 2020-11-12 20:31:03 +08:00
Qiyu8 e5c2ceb675 fix the CI failure of lack the head 2020-11-12 17:35:17 +08:00
Qiyu8 a87e537b8c modify macro 2020-11-11 15:53:48 +08:00
Qiyu8 5bc0a7583f only FMA3 and vector larger than 128 have positive effects. 2020-11-11 15:18:01 +08:00
Qiyu8 8c0b206d4c Optimize the performance of rot by using universal intrinsics 2020-11-11 14:33:12 +08:00
Qiyu8 c4c591ac5a fix sum optimize issues 2020-11-10 16:16:38 +08:00
Xianyi Zhang fc35b72ae1 Refs #2899
Merge branch 'openblas-open-910' of git://github.com/damonyu1989/OpenBLAS into damonyu1989-openblas-open-910
2020-11-10 09:38:04 +08:00
Xianyi Zhang 913cc9a4ca Merge branch 'develop' into risc-v 2020-11-10 09:18:25 +08:00
Martin Kroeker ff16329cb7
Merge pull request #2972 from xiegengxin/rot-intrinsic
Improve the performance of rot by using AVX512 and AVX2 intrinsic
2020-11-08 22:43:00 +01:00
Martin Kroeker 110c7a6de0
Merge pull request #2979 from RajalakshmiSR/dot_power10
Optimize sdot/ddot for POWER10
2020-11-08 10:19:34 +01:00
Rajalakshmi Srinivasaraghavan 6e364981a8 Optimize sdot/ddot for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
2020-11-07 15:21:58 -06:00
Martin Kroeker b976a0bf40
Remove previous workaround for compiler flags related to cpu capabilities in x86_64 DYNAMIC_ARCH builds 2020-11-07 20:39:56 +01:00
Martin Kroeker ff74319ea5
Merge pull request #2977 from martin-frbg/issue2976
Fix macro name used in ifdef for POWERPC/PGI
2020-11-07 14:41:34 +01:00
Martin Kroeker 28d2dfe2b3
Fix macro name used in ifdef 2020-11-07 12:17:49 +01:00
Gengxin Xie 725ffbf041 fix typo 2020-11-05 16:25:17 +08:00
Gengxin Xie d9ba49165a Improve the performance of rot by using AVX512 and AVX2 intrinsic 2020-11-05 15:12:36 +08:00
Rajalakshmi Srinivasaraghavan dd7a9cc5bf POWER10: Change dgemm unroll factors
Changing the unroll factors for dgemm to 8 shows improved performance with
POWER10 MMA feature.   Also made some minor changes in sgemm for edge cases.
2020-10-31 18:28:57 -05:00
Rajalakshmi Srinivasaraghavan b435491885 Optimize caxpy for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
2020-10-29 14:57:51 -05:00
Chen, Guobing a7b1f9b1bb Implementation of BF16 based gemv
1. Add a new API -- sbgemv to support bfloat16 based gemv
2. Implement a generic kernel for sbgemv
3. Implement an avx512-bf16 based kernel for sbgemv

Signed-off-by: Chen, Guobing <guobing.chen@intel.com>
2020-10-29 02:08:23 +08:00
Martin Kroeker 67f39ad813
Merge pull request #2939 from thrasibule/Makefile_cleanup
reuse variables defined in Makefile.system
2020-10-28 09:38:40 +01:00
Rajalakshmi Srinivasaraghavan c24ba8b1dd Optimize saxpy for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
2020-10-26 13:24:59 -05:00
Martin Kroeker 6f9460f0f6
Merge pull request #2937 from martin-frbg/pwr-buffersz
Increase and unify BUFFERSIZE on POWER;fix gcc inline warning
2020-10-23 07:15:32 +02:00
Guillaume Horel 1917a4e7b8 reuse variables defined in Makefile.system 2020-10-22 22:04:25 -04:00
Martin Kroeker 34c3c407ef
label always_inline function as inline to silence a gcc warning 2020-10-22 22:14:26 +02:00
Martin Kroeker 2e48d560ba
Fix compiler version check 2020-10-22 16:23:29 +02:00
Rajalakshmi Srinivasaraghavan ad745c0bae Optimize scopy/ccopy for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores. Also reorganized all variants of copy functions
to make use of same kernel.
2020-10-21 09:53:45 -05:00
İsmail Dönmez 4a1d00f589
Fix build with -Werror=return-type
dgemm_tcopy_16_skylakex.c CNAME function should return an int, add a
return 0 similar to other files.
2020-10-21 08:43:39 +02:00
Bart Oldeman b073d759d0 x86_64: clobber all xmm registers after vzeroupper
As observed using GCC 10 using -march=native -ftree-vectorize
on Knights Landing, it is now smart enough to find clobbers inside
non-inlined static functions.

In particular, sgemv counted on a kernel to preserve the whole
%ymm2 register (since it was not in the clobber list), but the top
part was destroyed by vzeroupper. This caused many tests to fail.

This patch makes sure all xmm (and ymm/zmm by extension) registers
are listed as clobbered to avoid this happening, as most kernels
already did correctly in fact.
2020-10-20 02:16:47 +00:00
Martin Kroeker dc6e44c3f8
Merge pull request #2916 from martin-frbg/issue2911
Clean up duplicate definitions in POWER8 kernels and fix power10 option passing
2020-10-19 23:33:31 +02:00
Martin Kroeker a61c086408
Fix spurious trailing whitespace in comment 2020-10-19 09:12:12 +02:00
Bart Oldeman 03e781b766 sgemm_direct_skylakex: fix 75eeb26 regression.
The
`#if defined(SKYLAKEX) || defined (COOPERLAKE)`
from that commit was before #include "common.h" so caused the
compiled function to be empty, returning garbage results for
qualifying sgemm's on those architectures.

Closes #2914
2020-10-18 19:58:07 +00:00
Martin Kroeker f1a4071d8c
Clean up STACKSIZE redefinition 2020-10-18 19:41:43 +02:00
Martin Kroeker 97cf10062f
Clean up STACKSIZE redefinition 2020-10-18 19:39:18 +02:00
Martin Kroeker 17e288e18d
Clean up STACKSIZE redefinition 2020-10-18 19:37:04 +02:00
Martin Kroeker c1422f3e46
Clean up STACKSIZE redefinition 2020-10-18 19:31:01 +02:00
Martin Kroeker d85b24e103
Clean up STACKSIZE redefinition 2020-10-18 19:29:45 +02:00
Zhang Xianyi d7ba7679b6 Merge branch 'develop' into risc-v 2020-10-16 23:27:38 +08:00
Martin Kroeker df70667043
fix core list for sse/sse2 2020-10-16 09:55:48 +02:00
Martin Kroeker f071d1207a
add sse2 2020-10-15 22:10:32 +02:00
Martin Kroeker dc6cefd2f5
Expressly enable -msse for 32bit DYNAMIC_ARCH kernels 2020-10-15 20:16:15 +02:00
Martin Kroeker c339c40c01
Silence a redefinition warning 2020-10-15 19:08:12 +02:00
Martin Kroeker 10379fc83b
Use ifdef instead of if 2020-10-15 19:05:37 +02:00
Martin Kroeker 4c25910da0
Merge pull request #2896 from martin-frbg/intrin-double
Add compiler flag for SSE4 where available
2020-10-15 11:12:35 +02:00
damonyu ef8e7d0279 Add the support for RISC-V Vector.
Change-Id: Iae7800a32f5af3903c330882cdf6f292d885f266
2020-10-15 16:09:02 +08:00
Martin Kroeker ae6ac83991
Revert "add double precision SSE" 2020-10-15 08:37:02 +02:00