Martin Kroeker
f1bf040b25
Merge pull request #2988 from xiegengxin/smp-asum
...
Improve the performance of dasum and sasum when SMP is defined
2020-11-22 12:24:13 +01:00
Xianyi Zhang
7037849498
Merge branch 'develop' into risc-v
2020-11-22 16:04:50 +08:00
Martin Kroeker
7e9cb39a25
Merge pull request #2981 from Qiyu8/fix-sum
...
Fix sum optimize issues
2020-11-16 08:40:46 +01:00
Gengxin Xie
d6e7e05bb3
Improve the performance of dasum and sasum when SMP is defined
2020-11-13 14:20:52 +08:00
Qiyu8
ae0b1dea19
modify system.cmake to enable fma flag
2020-11-13 10:20:24 +08:00
Qiyu8
e0dac6b53b
fix the CI failure of target specific option mismatch
2020-11-12 20:31:03 +08:00
Qiyu8
e5c2ceb675
fix the CI failure of lack the head
2020-11-12 17:35:17 +08:00
Qiyu8
a87e537b8c
modify macro
2020-11-11 15:53:48 +08:00
Qiyu8
5bc0a7583f
only FMA3 and vector larger than 128 have positive effects.
2020-11-11 15:18:01 +08:00
Qiyu8
8c0b206d4c
Optimize the performance of rot by using universal intrinsics
2020-11-11 14:33:12 +08:00
Qiyu8
c4c591ac5a
fix sum optimize issues
2020-11-10 16:16:38 +08:00
Xianyi Zhang
fc35b72ae1
Refs #2899
...
Merge branch 'openblas-open-910' of git://github.com/damonyu1989/OpenBLAS into damonyu1989-openblas-open-910
2020-11-10 09:38:04 +08:00
Xianyi Zhang
913cc9a4ca
Merge branch 'develop' into risc-v
2020-11-10 09:18:25 +08:00
Martin Kroeker
ff16329cb7
Merge pull request #2972 from xiegengxin/rot-intrinsic
...
Improve the performance of rot by using AVX512 and AVX2 intrinsic
2020-11-08 22:43:00 +01:00
Martin Kroeker
110c7a6de0
Merge pull request #2979 from RajalakshmiSR/dot_power10
...
Optimize sdot/ddot for POWER10
2020-11-08 10:19:34 +01:00
Rajalakshmi Srinivasaraghavan
6e364981a8
Optimize sdot/ddot for POWER10
...
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
2020-11-07 15:21:58 -06:00
Martin Kroeker
b976a0bf40
Remove previous workaround for compiler flags related to cpu capabilities in x86_64 DYNAMIC_ARCH builds
2020-11-07 20:39:56 +01:00
Martin Kroeker
ff74319ea5
Merge pull request #2977 from martin-frbg/issue2976
...
Fix macro name used in ifdef for POWERPC/PGI
2020-11-07 14:41:34 +01:00
Martin Kroeker
28d2dfe2b3
Fix macro name used in ifdef
2020-11-07 12:17:49 +01:00
Gengxin Xie
725ffbf041
fix typo
2020-11-05 16:25:17 +08:00
Gengxin Xie
d9ba49165a
Improve the performance of rot by using AVX512 and AVX2 intrinsic
2020-11-05 15:12:36 +08:00
Rajalakshmi Srinivasaraghavan
dd7a9cc5bf
POWER10: Change dgemm unroll factors
...
Changing the unroll factors for dgemm to 8 shows improved performance with
POWER10 MMA feature. Also made some minor changes in sgemm for edge cases.
2020-10-31 18:28:57 -05:00
Rajalakshmi Srinivasaraghavan
b435491885
Optimize caxpy for POWER10
...
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
2020-10-29 14:57:51 -05:00
Chen, Guobing
a7b1f9b1bb
Implementation of BF16 based gemv
...
1. Add a new API -- sbgemv to support bfloat16 based gemv
2. Implement a generic kernel for sbgemv
3. Implement an avx512-bf16 based kernel for sbgemv
Signed-off-by: Chen, Guobing <guobing.chen@intel.com>
2020-10-29 02:08:23 +08:00
Martin Kroeker
67f39ad813
Merge pull request #2939 from thrasibule/Makefile_cleanup
...
reuse variables defined in Makefile.system
2020-10-28 09:38:40 +01:00
Rajalakshmi Srinivasaraghavan
c24ba8b1dd
Optimize saxpy for POWER10
...
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
2020-10-26 13:24:59 -05:00
Martin Kroeker
6f9460f0f6
Merge pull request #2937 from martin-frbg/pwr-buffersz
...
Increase and unify BUFFERSIZE on POWER;fix gcc inline warning
2020-10-23 07:15:32 +02:00
Guillaume Horel
1917a4e7b8
reuse variables defined in Makefile.system
2020-10-22 22:04:25 -04:00
Martin Kroeker
34c3c407ef
label always_inline function as inline to silence a gcc warning
2020-10-22 22:14:26 +02:00
Martin Kroeker
2e48d560ba
Fix compiler version check
2020-10-22 16:23:29 +02:00
Rajalakshmi Srinivasaraghavan
ad745c0bae
Optimize scopy/ccopy for POWER10
...
This patch makes use of new POWER10 vector pair instructions for
loads and stores. Also reorganized all variants of copy functions
to make use of same kernel.
2020-10-21 09:53:45 -05:00
İsmail Dönmez
4a1d00f589
Fix build with -Werror=return-type
...
dgemm_tcopy_16_skylakex.c CNAME function should return an int, add a
return 0 similar to other files.
2020-10-21 08:43:39 +02:00
Bart Oldeman
b073d759d0
x86_64: clobber all xmm registers after vzeroupper
...
As observed using GCC 10 using -march=native -ftree-vectorize
on Knights Landing, it is now smart enough to find clobbers inside
non-inlined static functions.
In particular, sgemv counted on a kernel to preserve the whole
%ymm2 register (since it was not in the clobber list), but the top
part was destroyed by vzeroupper. This caused many tests to fail.
This patch makes sure all xmm (and ymm/zmm by extension) registers
are listed as clobbered to avoid this happening, as most kernels
already did correctly in fact.
2020-10-20 02:16:47 +00:00
Martin Kroeker
dc6e44c3f8
Merge pull request #2916 from martin-frbg/issue2911
...
Clean up duplicate definitions in POWER8 kernels and fix power10 option passing
2020-10-19 23:33:31 +02:00
Martin Kroeker
a61c086408
Fix spurious trailing whitespace in comment
2020-10-19 09:12:12 +02:00
Bart Oldeman
03e781b766
sgemm_direct_skylakex: fix 75eeb26
regression.
...
The
`#if defined(SKYLAKEX) || defined (COOPERLAKE)`
from that commit was before #include "common.h" so caused the
compiled function to be empty, returning garbage results for
qualifying sgemm's on those architectures.
Closes #2914
2020-10-18 19:58:07 +00:00
Martin Kroeker
f1a4071d8c
Clean up STACKSIZE redefinition
2020-10-18 19:41:43 +02:00
Martin Kroeker
97cf10062f
Clean up STACKSIZE redefinition
2020-10-18 19:39:18 +02:00
Martin Kroeker
17e288e18d
Clean up STACKSIZE redefinition
2020-10-18 19:37:04 +02:00
Martin Kroeker
c1422f3e46
Clean up STACKSIZE redefinition
2020-10-18 19:31:01 +02:00
Martin Kroeker
d85b24e103
Clean up STACKSIZE redefinition
2020-10-18 19:29:45 +02:00
Zhang Xianyi
d7ba7679b6
Merge branch 'develop' into risc-v
2020-10-16 23:27:38 +08:00
Martin Kroeker
df70667043
fix core list for sse/sse2
2020-10-16 09:55:48 +02:00
Martin Kroeker
f071d1207a
add sse2
2020-10-15 22:10:32 +02:00
Martin Kroeker
dc6cefd2f5
Expressly enable -msse for 32bit DYNAMIC_ARCH kernels
2020-10-15 20:16:15 +02:00
Martin Kroeker
c339c40c01
Silence a redefinition warning
2020-10-15 19:08:12 +02:00
Martin Kroeker
10379fc83b
Use ifdef instead of if
2020-10-15 19:05:37 +02:00
Martin Kroeker
4c25910da0
Merge pull request #2896 from martin-frbg/intrin-double
...
Add compiler flag for SSE4 where available
2020-10-15 11:12:35 +02:00
damonyu
ef8e7d0279
Add the support for RISC-V Vector.
...
Change-Id: Iae7800a32f5af3903c330882cdf6f292d885f266
2020-10-15 16:09:02 +08:00
Martin Kroeker
ae6ac83991
Revert "add double precision SSE"
2020-10-15 08:37:02 +02:00