Commit Graph

5148 Commits

Author SHA1 Message Date
Bart Oldeman
b073d759d0 x86_64: clobber all xmm registers after vzeroupper
As observed with GCC 10 using -march=native -ftree-vectorize
on Knights Landing, it is now smart enough to find clobbers inside
non-inlined static functions.

In particular, sgemv counted on a kernel to preserve the whole
%ymm2 register (since it was not in the clobber list), but the top
part was destroyed by vzeroupper. This caused many tests to fail.

This patch makes sure all xmm (and ymm/zmm by extension) registers
are listed as clobbered to prevent this; most kernels in fact
already did so correctly.
2020-10-20 02:16:47 +00:00
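The following is a minimal sketch of the clobber pattern this commit describes. It assumes GCC or Clang extended inline asm on x86_64. `add8_avx` and `add8` are hypothetical illustrations, not OpenBLAS kernels. VZEROUPPER zeroes bits 128 and above of ymm0-ymm15 in 64-bit mode, so an asm statement that executes it changes registers it never names in its instructions. Following the commit, the sketch lists every xmm register as clobbered, which covers the ymm/zmm extensions as well. It also includes a portable scalar fallback for other targets or CPUs without AVX.

```c
/* Sketch only: add8_avx and add8 are hypothetical, not OpenBLAS code. */
#if defined(__x86_64__) && defined(__GNUC__)
/* out[0..7] = a[0..7] + b[0..7] using one 256-bit AVX add. */
static void add8_avx(const float *a, const float *b, float *out)
{
    __asm__ __volatile__(
        "vmovups (%0), %%ymm0          \n\t"
        "vaddps  (%1), %%ymm0, %%ymm0  \n\t"
        "vmovups %%ymm0, (%2)          \n\t"
        /* Zeroes bits 128+ of ymm0-ymm15, not only ymm0. */
        "vzeroupper                    \n\t"
        :
        : "r"(a), "r"(b), "r"(out)
        /* "memory": the asm reads/writes through the pointers.
           All xmm0-xmm15 are named so the compiler keeps no live
           vector value (such as the %ymm2 in sgemv) across the asm. */
        : "memory",
          "xmm0", "xmm1", "xmm2",  "xmm3",  "xmm4",  "xmm5",  "xmm6",  "xmm7",
          "xmm8", "xmm9", "xmm10", "xmm11", "xmm12", "xmm13", "xmm14", "xmm15");
}
#endif

/* Dispatch: take the AVX path only when the runtime reports AVX usable. */
void add8(const float *a, const float *b, float *out)
{
#if defined(__x86_64__) && defined(__GNUC__)
    if (__builtin_cpu_supports("avx")) {
        add8_avx(a, b, out);
        return;
    }
#endif
    for (int i = 0; i < 8; i++)   /* portable scalar fallback */
        out[i] = a[i] + b[i];
}
```

Listing only `xmm0` here would let the compiler assume `%ymm1`-`%ymm15` survive the statement intact. That assumption holds on paper but is broken by `vzeroupper`.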
Martin Kroeker
8e20ab21c8 Merge pull request #2924 from martin-frbg/issue2920
Put back all symbols accidentally dropped in the reorganization of gensymbol
2020-10-19 23:33:45 +02:00
Martin Kroeker
dc6e44c3f8 Merge pull request #2916 from martin-frbg/issue2911
Clean up duplicate definitions in POWER8 kernels and fix power10 option passing
2020-10-19 23:33:31 +02:00
Martin Kroeker
4ad33c46b0 Add back symbols that got dropped when splitting by type 2020-10-19 20:37:52 +02:00
Martin Kroeker
fe2a922ada Add POWER10 compiler options to CCOMMON_OPT rather than COMMON_OPT 2020-10-19 17:43:53 +02:00
Martin Kroeker
9cac379655 Merge pull request #103 from xianyi/develop
rebase
2020-10-19 15:56:20 +02:00
Martin Kroeker
a61c086408 Fix spurious trailing whitespace in comment 2020-10-19 09:12:12 +02:00
Martin Kroeker
5b9ebe4f8a Merge pull request #2919 from isuruf/export
Fix exporting some lapack and cblas symbols
2020-10-19 08:14:27 +02:00
Martin Kroeker
7eddaf0d6f Remove -mmma again (redundant with cpu=power10) and add override statements 2020-10-19 08:11:22 +02:00
Isuru Fernando
14b1d33933 Fix exporting some lapack and cblas 2020-10-18 22:45:58 -05:00
Martin Kroeker
77669b019d Merge pull request #2915 from bartoldeman/no-empty_sgemm_direct_skylakex
sgemm_direct_skylakex: fix 75eeb26 regression.
2020-10-19 00:09:54 +02:00
Martin Kroeker
5e8ddc9001 Merge pull request #2913 from martin-frbg/issue2910
Support cross-compiling for Apple Vortex
2020-10-18 23:04:56 +02:00
Bart Oldeman
03e781b766 sgemm_direct_skylakex: fix 75eeb26 regression.
The
`#if defined(SKYLAKEX) || defined (COOPERLAKE)`
from that commit was before #include "common.h" so caused the
compiled function to be empty, returning garbage results for
qualifying sgemm's on those architectures.

Closes #2914
2020-10-18 19:58:07 +00:00
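This regression comes down to ordinary C preprocessing order. A directive is evaluated where it appears, so an `#if defined(...)` cannot see a macro that a later `#include` defines. The minimal self-contained sketch below illustrates this. All names are hypothetical: `DEMO_SKYLAKEX` stands in for the architecture macro, and the `#define` line stands in for `#include "common.h"`.

```c
/* Hypothetical names: DEMO_SKYLAKEX stands in for the architecture
   macro; the #define below stands in for #include "common.h". */

/* Broken ordering: this guard is evaluated before the macro exists,
   so the real body is compiled out. */
int kernel_guard_too_early(void)
{
#if defined(DEMO_SKYLAKEX)
    return 1;   /* the real kernel would run here */
#else
    return 0;   /* compiled-out path (in the regression, the body was empty) */
#endif
}

#define DEMO_SKYLAKEX 1   /* stand-in for #include "common.h" */

/* Fixed ordering: the same guard, placed after the "include". */
int kernel_guard_after_include(void)
{
#if defined(DEMO_SKYLAKEX)
    return 1;
#else
    return 0;
#endif
}
```

In this sketch, placing the guard below the include, as in the second function, is enough to make the macro visible. The sketch returns a sentinel on the compiled-out path because falling off the end of a non-void function and using the result is undefined behaviour. That undefined behaviour is why the empty function returned garbage.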
Martin Kroeker
f1a4071d8c Clean up STACKSIZE redefinition 2020-10-18 19:41:43 +02:00
Martin Kroeker
97cf10062f Clean up STACKSIZE redefinition 2020-10-18 19:39:18 +02:00
Martin Kroeker
17e288e18d Clean up STACKSIZE redefinition 2020-10-18 19:37:04 +02:00
Martin Kroeker
c1422f3e46 Clean up STACKSIZE redefinition 2020-10-18 19:31:01 +02:00
Martin Kroeker
d85b24e103 Clean up STACKSIZE redefinition 2020-10-18 19:29:45 +02:00
Martin Kroeker
7d6c85f9da Add compiler option -mmma for POWER10 2020-10-18 19:27:51 +02:00
Martin Kroeker
2e7ee7c716 Fix naming of L2 cache size item reported for Vortex 2020-10-18 19:22:05 +02:00
Martin Kroeker
efd47b0104 Merge pull request #2909 from isuruf/patch-1
Need a space when redirecting to file
2020-10-18 19:16:08 +02:00
Martin Kroeker
f5902ab0a1 Support cross-compiling for Apple Vortex 2020-10-18 19:10:58 +02:00
Martin Kroeker
1a0c185122 Support cross-compiling for Apple Vortex 2020-10-18 18:54:54 +02:00
Martin Kroeker
89eea6b455 Merge pull request #102 from xianyi/develop
rebase
2020-10-18 18:49:59 +02:00
Isuru Fernando
a5c667b55c Need a space when redirecting to file
The following two commands have completely different meanings:
perl ./gensymbol objcopy x86_64 _ 0 0  0 0 0 0 "" "64_" 1 0 1 1 1 1 > objcopy.def
perl ./gensymbol objcopy x86_64 _ 0 0  0 0 0 0 "" "64_" 1 0 1 1 1 1> objcopy.def
2020-10-18 09:40:31 -05:00
Martin Kroeker
0ac6102708 Update version string to 0.3.11.dev 2020-10-17 22:40:47 +02:00
Martin Kroeker
26a701f4ad Update version string to 0.3.11.dev 2020-10-17 22:40:06 +02:00
Martin Kroeker
fcd0fa1a3a Merge pull request #2908 from xianyi/release-0.3.0
Synchronise tag with release 0.3.11
2020-10-17 22:38:58 +02:00
Martin Kroeker
51c22612eb Merge pull request #2907 from xianyi/develop
Update from develop for 0.3.11
v0.3.11
2020-10-17 22:14:12 +02:00
Martin Kroeker
b8f689200e Update version number to 0.3.11 2020-10-17 22:11:34 +02:00
Martin Kroeker
fe9015b619 Update version for 0.3.11 release 2020-10-17 22:10:50 +02:00
Martin Kroeker
f99b8c1502 Merge pull request #2906 from martin-frbg/changelog-0311
Update Changelog.txt with the 0.3.11 changes
2020-10-17 22:07:14 +02:00
Martin Kroeker
5381a18056 Update Changelog.txt with the 0.3.11 changes 2020-10-17 22:05:36 +02:00
Martin Kroeker
e35576c6fc Merge pull request #2905 from martin-frbg/aocc-clang
Add -mavx for clang & aocc
2020-10-17 09:45:22 +02:00
Martin Kroeker
f1bb85d378 Add AVX flags for clang/aocc as well 2020-10-16 20:52:15 +02:00
Martin Kroeker
25907e672b Merge pull request #101 from xianyi/develop
rebase
2020-10-16 20:48:58 +02:00
Martin Kroeker
9789375389 Merge pull request #2900 from martin-frbg/fixcmake_sse
Add compiler options for SSE to the cmake support files
2020-10-16 16:17:36 +02:00
Martin Kroeker
f64243ff57 Add compiler options for sse/sse2/ssse3/sse4.1 2020-10-16 10:47:06 +02:00
Martin Kroeker
786c0a3ce8 Add sse options for use of intrinsics with older compilers 2020-10-16 10:41:53 +02:00
Martin Kroeker
df70667043 fix core list for sse/sse2 2020-10-16 09:55:48 +02:00
Martin Kroeker
e6c5b13a18 Merge pull request #2898 from martin-frbg/morefixes
More pre-release fixes
2020-10-16 07:26:39 +02:00
Martin Kroeker
f071d1207a add sse2 2020-10-15 22:10:32 +02:00
Martin Kroeker
dc6cefd2f5 Expressly enable -msse for 32bit DYNAMIC_ARCH kernels 2020-10-15 20:16:15 +02:00
Martin Kroeker
c339c40c01 Silence a redefinition warning 2020-10-15 19:08:12 +02:00
Martin Kroeker
ac8af9cec6 Add -msse where supported, apparently required for older gcc 2020-10-15 19:06:45 +02:00
Martin Kroeker
10379fc83b Use ifdef instead of if 2020-10-15 19:05:37 +02:00
Martin Kroeker
a85ac71633 Merge pull request #100 from xianyi/develop
rebase
2020-10-15 18:54:20 +02:00
Martin Kroeker
4c25910da0 Merge pull request #2896 from martin-frbg/intrin-double
Add compiler flag for SSE4 where available
2020-10-15 11:12:35 +02:00
Martin Kroeker
9b9ee92d5f Merge pull request #2897 from Qiyu8/usimd-double
Add double precision universal intrinsics for X86/ARM
2020-10-15 08:38:24 +02:00
Martin Kroeker
ae6ac83991 Revert "add double precision SSE" 2020-10-15 08:37:02 +02:00