Commit Graph

5876 Commits

Author SHA1 Message Date
Rajalakshmi Srinivasaraghavan ad745c0bae Optimize scopy/ccopy for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores. Also reorganized all variants of copy functions
to make use of same kernel.
2020-10-21 09:53:45 -05:00
Martin Kroeker 17c46bf06a
Merge pull request #2930 from ismail/fix-no-return
Fix build with -Werror=return-type
2020-10-21 11:43:01 +02:00
Martin Kroeker 28242096cd
Merge pull request #2928 from martin-frbg/issue2917
Enable -mavx2 for flang as well where supported
2020-10-21 10:11:02 +02:00
İsmail Dönmez 4a1d00f589
Fix build with -Werror=return-type
The CNAME function in dgemm_tcopy_16_skylakex.c should return an int;
add a return 0, as in other files.
2020-10-21 08:43:39 +02:00
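As an illustration of the pattern this commit describes, here is a minimal sketch. The function name `copy_kernel_sketch` and its signature are hypothetical; the real CNAME kernel in dgemm_tcopy_16_skylakex.c is not reproduced here. A function declared to return int that reaches its end without a return statement is what -Werror=return-type rejects, and a closing return 0 resolves it.

```c
#include <stddef.h>

/* Hypothetical stand-in for a copy kernel; not the actual OpenBLAS CNAME. */
int copy_kernel_sketch(size_t n, const double *src, double *dst) {
    for (size_t i = 0; i < n; i++) {
        dst[i] = src[i];
    }
    /* Without this line the function falls off its end, which
       -Wreturn-type reports and -Werror=return-type turns into an error. */
    return 0;
}
```

Callers that ignore the result are unaffected; the return value only has to exist so that the declared type is honoured.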
Martin Kroeker 00813363be
Enable -mavx2 for flang as well 2020-10-20 23:56:30 +02:00
Martin Kroeker 336e35469a
Merge pull request #105 from xianyi/develop
rebase
2020-10-20 23:48:53 +02:00
Martin Kroeker 29668458f7
Merge pull request #2925 from martin-frbg/issue2911-2
Add binutils version check as prerequisite for POWER10 in DYNAMIC_ARCH build
2020-10-20 11:27:36 +02:00
Martin Kroeker ee83e29046
Merge pull request #2926 from bartoldeman/vzeroupper-clobber-all
x86_64: clobber all xmm registers after vzeroupper
2020-10-20 09:24:47 +02:00
Martin Kroeker 1a0f57c8f0
Fix missing backquotes 2020-10-20 08:37:53 +02:00
Bart Oldeman b073d759d0 x86_64: clobber all xmm registers after vzeroupper
As observed with GCC 10 and -march=native -ftree-vectorize
on Knights Landing, the compiler is now smart enough to find clobbers
inside non-inlined static functions.

In particular, sgemv counted on a kernel to preserve the whole
%ymm2 register (since it was not in the clobber list), but the top
part was destroyed by vzeroupper. This caused many tests to fail.

This patch makes sure all xmm (and ymm/zmm by extension) registers
are listed as clobbered to avoid this happening, as most kernels
already did correctly in fact.
2020-10-20 02:16:47 +00:00
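The clobber-list idea in this commit can be sketched as follows. This is not the OpenBLAS kernel code: the helper names are hypothetical, and the sketch assumes GCC-style extended inline assembly on x86_64. Listing xmm0 through xmm15 as clobbered tells the compiler not to keep a live value in any of those registers (or, per the commit message, in the ymm/zmm registers that contain them) across the vzeroupper.

```c
#if defined(__x86_64__) && defined(__GNUC__)
/* Hypothetical helper: executes vzeroupper and declares every xmm register
   clobbered, so the compiler must not assume any of them is preserved. */
static void vzeroupper_all_clobbered(void) {
    __asm__ __volatile__(
        "vzeroupper"
        : /* no outputs */
        : /* no inputs */
        : "xmm0", "xmm1", "xmm2", "xmm3", "xmm4", "xmm5", "xmm6", "xmm7",
          "xmm8", "xmm9", "xmm10", "xmm11", "xmm12", "xmm13", "xmm14",
          "xmm15");
}
#endif

/* Returns 1 if vzeroupper was executed, 0 if the build target is not
   x86_64 or the running CPU lacks AVX and the instruction was skipped. */
int run_vzeroupper_demo(void) {
#if defined(__x86_64__) && defined(__GNUC__)
    if (__builtin_cpu_supports("avx")) {
        vzeroupper_all_clobbered();
        return 1;
    }
#endif
    return 0;
}
```

The runtime AVX check is only there so the sketch does not fault on a CPU without AVX; a real kernel built for an AVX target would not need it.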
Martin Kroeker eddc65c7b7
Add POWER10 support flag (unconditionally for now) 2020-10-20 01:09:49 +02:00
Martin Kroeker bb8c3f6861
Add ld/binutils version check for POWER10 support 2020-10-20 01:04:20 +02:00
Martin Kroeker ff65952e46
Move HAVE_P10_SUPPORT to the build system
to be able to include a binutils version check
2020-10-20 00:55:41 +02:00
Martin Kroeker 6208c9899e
Merge pull request #104 from xianyi/develop
rebase
2020-10-20 00:52:08 +02:00
Martin Kroeker 8e20ab21c8
Merge pull request #2924 from martin-frbg/issue2920
Put back all symbols accidentally dropped in the reorganization of gensymbol
2020-10-19 23:33:45 +02:00
Martin Kroeker dc6e44c3f8
Merge pull request #2916 from martin-frbg/issue2911
Clean up duplicate definitions in POWER8 kernels and fix power10 option passing
2020-10-19 23:33:31 +02:00
Martin Kroeker 4ad33c46b0
Add back symbols that got dropped when splitting by type 2020-10-19 20:37:52 +02:00
Martin Kroeker fe2a922ada
Add POWER10 compiler options to CCOMMON_OPT rather than COMMON_OPT 2020-10-19 17:43:53 +02:00
Martin Kroeker 9cac379655
Merge pull request #103 from xianyi/develop
rebase
2020-10-19 15:56:20 +02:00
Martin Kroeker a61c086408
Fix spurious trailing whitespace in comment 2020-10-19 09:12:12 +02:00
Martin Kroeker 5b9ebe4f8a
Merge pull request #2919 from isuruf/export
Fix exporting some lapack and cblas symbols
2020-10-19 08:14:27 +02:00
Martin Kroeker 7eddaf0d6f
Remove -mmma again (redundant with cpu=power10) and add override statements 2020-10-19 08:11:22 +02:00
Isuru Fernando 14b1d33933 Fix exporting some lapack and cblas 2020-10-18 22:45:58 -05:00
Martin Kroeker 77669b019d
Merge pull request #2915 from bartoldeman/no-empty_sgemm_direct_skylakex
sgemm_direct_skylakex: fix 75eeb26 regression.
2020-10-19 00:09:54 +02:00
Martin Kroeker 5e8ddc9001
Merge pull request #2913 from martin-frbg/issue2910
Support cross-compiling for Apple Vortex
2020-10-18 23:04:56 +02:00
Bart Oldeman 03e781b766 sgemm_direct_skylakex: fix 75eeb26 regression.
The
`#if defined(SKYLAKEX) || defined (COOPERLAKE)`
from that commit came before #include "common.h", so it caused the
compiled function to be empty, returning garbage results for
qualifying sgemm calls on those architectures.

Closes #2914
2020-10-18 19:58:07 +00:00
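The ordering problem described in this commit can be reproduced in a single file. The sketch below assumes that the architecture macro only becomes defined once the header has been processed; the macro `HYPOTHETICAL_TARGET` and the plain `#define` standing in for `#include "common.h"` are illustrative, not OpenBLAS names.

```c
/* First check: evaluated BEFORE the defining header, so the macro is
   still unknown and the guarded code would be compiled out. */
#if defined(HYPOTHETICAL_TARGET)
#define GUARD_SEEN_BEFORE_INCLUDE 1
#else
#define GUARD_SEEN_BEFORE_INCLUDE 0
#endif

/* Stand-in for #include "common.h", assumed to be where the
   architecture macro gets defined. */
#define HYPOTHETICAL_TARGET 1

/* Second check: evaluated AFTER the definition, so the guard is true. */
#if defined(HYPOTHETICAL_TARGET)
#define GUARD_SEEN_AFTER_INCLUDE 1
#else
#define GUARD_SEEN_AFTER_INCLUDE 0
#endif

int guard_before_include(void) { return GUARD_SEEN_BEFORE_INCLUDE; }
int guard_after_include(void) { return GUARD_SEEN_AFTER_INCLUDE; }
```

Because the preprocessor works strictly top to bottom, the first check silently takes the #else branch; no warning is issued, which is why an empty function body can go unnoticed until results are wrong.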
Martin Kroeker f1a4071d8c
Clean up STACKSIZE redefinition 2020-10-18 19:41:43 +02:00
Martin Kroeker 97cf10062f
Clean up STACKSIZE redefinition 2020-10-18 19:39:18 +02:00
Martin Kroeker 17e288e18d
Clean up STACKSIZE redefinition 2020-10-18 19:37:04 +02:00
Martin Kroeker c1422f3e46
Clean up STACKSIZE redefinition 2020-10-18 19:31:01 +02:00
Martin Kroeker d85b24e103
Clean up STACKSIZE redefinition 2020-10-18 19:29:45 +02:00
Martin Kroeker 7d6c85f9da
Add compiler option -mmma for POWER10 2020-10-18 19:27:51 +02:00
Martin Kroeker 2e7ee7c716
Fix naming of L2 cache size item reported for Vortex 2020-10-18 19:22:05 +02:00
Martin Kroeker efd47b0104
Merge pull request #2909 from isuruf/patch-1
Need a space when redirecting to file
2020-10-18 19:16:08 +02:00
Martin Kroeker f5902ab0a1
Support cross-compiling for Apple Vortex 2020-10-18 19:10:58 +02:00
Martin Kroeker 1a0c185122
Support cross-compiling for Apple Vortex 2020-10-18 18:54:54 +02:00
Martin Kroeker 89eea6b455
Merge pull request #102 from xianyi/develop
rebase
2020-10-18 18:49:59 +02:00
Isuru Fernando a5c667b55c
Need a space when redirecting to file
The following two commands have completely different meanings:
perl ./gensymbol objcopy x86_64 _ 0 0  0 0 0 0 "" "64_" 1 0 1 1 1 1 > objcopy.def
perl ./gensymbol objcopy x86_64 _ 0 0  0 0 0 0 "" "64_" 1 0 1 1 1 1> objcopy.def
2020-10-18 09:40:31 -05:00
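The difference comes from shell grammar: a digit written directly before `>` is read as a file descriptor number rather than as an argument, so in the second command the final 1 never reaches gensymbol. A minimal sketch that demonstrates this with echo instead of gensymbol, assuming a POSIX system where popen runs commands through /bin/sh and where mktemp is available (the helper and constant names are hypothetical):

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>

/* Run cmd through the shell via popen and store its first output line,
   without the trailing newline, in buf. Returns 0 on success. */
int first_output_line(const char *cmd, char *buf, size_t len) {
    if (len == 0) {
        return -1;
    }
    FILE *pipe = popen(cmd, "r");
    if (pipe == NULL) {
        return -1;
    }
    if (fgets(buf, (int)len, pipe) == NULL) {
        buf[0] = '\0';
    }
    buf[strcspn(buf, "\n")] = '\0';
    return pclose(pipe) == 0 ? 0 : -1;
}

/* With a space, the trailing 1 is an ordinary argument to echo. */
const char *CMD_WITH_SPACE =
    "f=$(mktemp) && echo arg 1 > \"$f\" && cat \"$f\" && rm -f \"$f\"";

/* Without the space, "1>" means "redirect file descriptor 1",
   so echo never sees the 1. */
const char *CMD_WITHOUT_SPACE =
    "f=$(mktemp) && echo arg 1> \"$f\" && cat \"$f\" && rm -f \"$f\"";
```

Passing each constant to `first_output_line` shows what ended up in the redirected file: the first command writes both words, the second writes only `arg`.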
Martin Kroeker 0ac6102708
Update version string to 0.3.11.dev 2020-10-17 22:40:47 +02:00
Martin Kroeker 26a701f4ad
Update version string to 0.3.11.dev 2020-10-17 22:40:06 +02:00
Martin Kroeker fcd0fa1a3a
Merge pull request #2908 from xianyi/release-0.3.0
Synchronize tag with release 0.3.11
2020-10-17 22:38:58 +02:00
Martin Kroeker 51c22612eb
Merge pull request #2907 from xianyi/develop
Update from develop for 0.3.11
2020-10-17 22:14:12 +02:00
Martin Kroeker b8f689200e
Update version number to 0.3.11 2020-10-17 22:11:34 +02:00
Martin Kroeker fe9015b619
Update version for 0.3.11 release 2020-10-17 22:10:50 +02:00
Martin Kroeker f99b8c1502
Merge pull request #2906 from martin-frbg/changelog-0311
Update Changelog.txt with the 0.3.11 changes
2020-10-17 22:07:14 +02:00
Martin Kroeker 5381a18056
Update Changelog.txt with the 0.3.11 changes 2020-10-17 22:05:36 +02:00
Martin Kroeker e35576c6fc
Merge pull request #2905 from martin-frbg/aocc-clang
Add -mavx for clang & aocc
2020-10-17 09:45:22 +02:00
Martin Kroeker f1bb85d378
Add AVX flags for clang/aocc as well 2020-10-16 20:52:15 +02:00
Martin Kroeker 25907e672b
Merge pull request #101 from xianyi/develop
rebase
2020-10-16 20:48:58 +02:00
Zhang Xianyi d7ba7679b6 Merge branch 'develop' into risc-v 2020-10-16 23:27:38 +08:00