Commit Graph

5181 Commits

Author SHA1 Message Date
Martin Kroeker
dc35477317 Merge pull request #2942 from martin-frbg/makebuildtypes
Comment out BUILD_SINGLE etc. in Makefile.rule and add a short explanation
2020-10-24 09:26:50 +02:00
Martin Kroeker
365f28787c Comment out BUILD_SINGLE etc. and add a short explanation 2020-10-23 23:32:06 +02:00
Martin Kroeker
2f2e9ddb65 Merge pull request #2941 from martin-frbg/exportsfix
Fix grouping of sladiv1/dladiv1/ilaenv2stage in gensymbol
2020-10-23 20:47:35 +02:00
Martin Kroeker
0d140e61ac Fix wrong grouping of dcombssq 2020-10-23 15:53:40 +02:00
Martin Kroeker
4c45cd6294 fix missing split of sladiv1/dladiv/ilaenv2stage by build type 2020-10-23 15:31:25 +02:00
Martin Kroeker
680f744abf Merge pull request #108 from xianyi/develop
rebase
2020-10-23 15:29:48 +02:00
Martin Kroeker
6f9460f0f6 Merge pull request #2937 from martin-frbg/pwr-buffersz
Increase and unify BUFFERSIZE on POWER; fix gcc inline warning
2020-10-23 07:15:32 +02:00
Martin Kroeker
6c970fa998 Merge pull request #2938 from martin-frbg/2934-3
Fix twisted spelling that broke the gfortran version test again
2020-10-23 00:19:49 +02:00
Martin Kroeker
b23cb05231 Fix twisted spelling that broke the gfortran version test again 2020-10-23 00:18:29 +02:00
Martin Kroeker
1d4c96fa0c Increase BUFFERSIZE further 2020-10-23 00:12:06 +02:00
Martin Kroeker
34c3c407ef label always_inline function as inline to silence a gcc warning 2020-10-22 22:14:26 +02:00
Martin Kroeker
3f84a9ca15 Merge pull request #2936 from martin-frbg/issue2934-2
Fix compiler version check for -mavx2 support (DYNAMIC_ARCH case)
2020-10-22 22:08:46 +02:00
Martin Kroeker
7e265c50bf Merge pull request #2935 from martin-frbg/lapack458
Fix macro used in argument conversion (LAPACK PR 458)
2020-10-22 19:25:58 +02:00
Martin Kroeker
ee90f30384 Increase BUFFERSIZE for POWER8-10 and use same value for POWER6
to fix overflow warning for PWR8 ZGEMM and PWR9 C/ZGEMM and avoid size mismatches in DYNAMIC_ARCH
2020-10-22 18:47:07 +02:00
Martin Kroeker
2e48d560ba Fix compiler version check 2020-10-22 16:23:29 +02:00
Martin Kroeker
ab7f466467 Merge pull request #106 from xianyi/develop
rebase
2020-10-22 16:21:09 +02:00
Martin Kroeker
f95031204e Fix macro used in argument conversion (LAPACK PR 458) 2020-10-22 16:19:26 +02:00
Martin Kroeker
909068facf Merge pull request #2932 from RajalakshmiSR/copyp10
Optimize scopy/ccopy for POWER10
2020-10-22 00:29:46 +02:00
Martin Kroeker
5b7438fdde Merge pull request #2934 from thrasibule/improve_version_check
actually check that version is greater than 4.7
2020-10-22 00:29:02 +02:00
Guillaume Horel
47696b43e9 actually check that version is greater than 4.7 2020-10-21 16:42:37 -04:00
Rajalakshmi Srinivasaraghavan
ad745c0bae Optimize scopy/ccopy for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores. Also reorganized all variants of copy functions
to make use of same kernel.
2020-10-21 09:53:45 -05:00
Martin Kroeker
17c46bf06a Merge pull request #2930 from ismail/fix-no-return
Fix build with -Werror=return-type
2020-10-21 11:43:01 +02:00
Martin Kroeker
28242096cd Merge pull request #2928 from martin-frbg/issue2917
Enable -mavx2 for flang as well where supported
2020-10-21 10:11:02 +02:00
İsmail Dönmez
4a1d00f589 Fix build with -Werror=return-type
dgemm_tcopy_16_skylakex.c CNAME function should return an int, add a
return 0 similar to other files.
2020-10-21 08:43:39 +02:00
Martin Kroeker
00813363be Enable -mavx2 for flang as well 2020-10-20 23:56:30 +02:00
Martin Kroeker
336e35469a Merge pull request #105 from xianyi/develop
rebase
2020-10-20 23:48:53 +02:00
Martin Kroeker
29668458f7 Merge pull request #2925 from martin-frbg/issue2911-2
Add binutils version check as prerequisite for POWER10 in DYNAMIC_ARCH build
2020-10-20 11:27:36 +02:00
Martin Kroeker
ee83e29046 Merge pull request #2926 from bartoldeman/vzeroupper-clobber-all
x86_64: clobber all xmm registers after vzeroupper
2020-10-20 09:24:47 +02:00
Martin Kroeker
1a0f57c8f0 Fix missing backquotes 2020-10-20 08:37:53 +02:00
Bart Oldeman
b073d759d0 x86_64: clobber all xmm registers after vzeroupper
As observed with GCC 10 using -march=native -ftree-vectorize
on Knights Landing, it is now smart enough to find clobbers inside
non-inlined static functions.

In particular, sgemv counted on a kernel to preserve the whole
%ymm2 register (since it was not in the clobber list), but the top
part was destroyed by vzeroupper. This caused many tests to fail.

This patch makes sure all xmm (and ymm/zmm by extension) registers
are listed as clobbered to avoid this happening, as most kernels
already did correctly in fact.
2020-10-20 02:16:47 +00:00
Martin Kroeker
eddc65c7b7 Add POWER10 support flag (unconditionally for now) 2020-10-20 01:09:49 +02:00
Martin Kroeker
bb8c3f6861 Add ld/binutils version check for POWER10 support 2020-10-20 01:04:20 +02:00
Martin Kroeker
ff65952e46 Move HAVE_P10_SUPPORT to the build system
to be able to include a binutils version check
2020-10-20 00:55:41 +02:00
Martin Kroeker
6208c9899e Merge pull request #104 from xianyi/develop
rebase
2020-10-20 00:52:08 +02:00
Martin Kroeker
8e20ab21c8 Merge pull request #2924 from martin-frbg/issue2920
Put back all symbols accidentally dropped in the reorganization of gensymbol
2020-10-19 23:33:45 +02:00
Martin Kroeker
dc6e44c3f8 Merge pull request #2916 from martin-frbg/issue2911
Clean up duplicate definitions in POWER8 kernels and fix power10 option passing
2020-10-19 23:33:31 +02:00
Martin Kroeker
4ad33c46b0 Add back symbols that got dropped when splitting by type 2020-10-19 20:37:52 +02:00
Martin Kroeker
fe2a922ada Add POWER10 compiler options to CCOMMON_OPT rather than COMMON_OPT 2020-10-19 17:43:53 +02:00
Martin Kroeker
9cac379655 Merge pull request #103 from xianyi/develop
rebase
2020-10-19 15:56:20 +02:00
Martin Kroeker
a61c086408 Fix spurious trailing whitespace in comment 2020-10-19 09:12:12 +02:00
Martin Kroeker
5b9ebe4f8a Merge pull request #2919 from isuruf/export
Fix exporting some lapack and cblas symbols
2020-10-19 08:14:27 +02:00
Martin Kroeker
7eddaf0d6f Remove -mmma again (redundant with cpu=power10) and add override statements 2020-10-19 08:11:22 +02:00
Isuru Fernando
14b1d33933 Fix exporting some lapack and cblas 2020-10-18 22:45:58 -05:00
Martin Kroeker
77669b019d Merge pull request #2915 from bartoldeman/no-empty_sgemm_direct_skylakex
sgemm_direct_skylakex: fix 75eeb26 regression.
2020-10-19 00:09:54 +02:00
Martin Kroeker
5e8ddc9001 Merge pull request #2913 from martin-frbg/issue2910
Support cross-compiling for Apple Vortex
2020-10-18 23:04:56 +02:00
Bart Oldeman
03e781b766 sgemm_direct_skylakex: fix 75eeb26 regression.
The
`#if defined(SKYLAKEX) || defined (COOPERLAKE)`
from that commit was before #include "common.h" so caused the
compiled function to be empty, returning garbage results for
qualifying sgemm's on those architectures.

Closes #2914
2020-10-18 19:58:07 +00:00
Martin Kroeker
f1a4071d8c Clean up STACKSIZE redefinition 2020-10-18 19:41:43 +02:00
Martin Kroeker
97cf10062f Clean up STACKSIZE redefinition 2020-10-18 19:39:18 +02:00
Martin Kroeker
17e288e18d Clean up STACKSIZE redefinition 2020-10-18 19:37:04 +02:00
Martin Kroeker
c1422f3e46 Clean up STACKSIZE redefinition 2020-10-18 19:31:01 +02:00