Bart Oldeman
b073d759d0
x86_64: clobber all xmm registers after vzeroupper
...
As observed with GCC 10 using -march=native -ftree-vectorize
on Knights Landing, the compiler is now smart enough to find clobbers
inside non-inlined static functions.
In particular, sgemv counted on a kernel to preserve the whole
%ymm2 register (since it was not in the clobber list), but the top
part was destroyed by vzeroupper. This caused many tests to fail.
This patch makes sure all xmm (and, by extension, ymm/zmm) registers
are listed as clobbered so this cannot happen; most kernels
already did this correctly.
2020-10-20 02:16:47 +00:00
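The clobber issue described in b073d759d0 can be illustrated with a minimal sketch. This is not an OpenBLAS kernel: `toy_kernel`, `cpu_has_avx` and `dot_around_kernel` are hypothetical names, the clobber list covers only xmm0 through xmm15 (an AVX-512 target has more registers), and "cc" and "memory" are included only to be conservative. The scalar arithmetic here would not notice a missing clobber; the point is the clobber-list syntax that tells GCC or clang which registers may not hold live values across the asm statement.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#define TOY_X86_64_GNU_ASM 1
#else
#define TOY_X86_64_GNU_ASM 0
#endif

/* Hypothetical helper: report whether the running CPU supports AVX. */
static int cpu_has_avx(void)
{
#if TOY_X86_64_GNU_ASM
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx") != 0;
#else
    return 0;
#endif
}

/* Hypothetical stand-in for a kernel that ends with vzeroupper.
 * vzeroupper zeroes the upper halves of the ymm registers, so every
 * xmm register is named as clobbered; the compiler then will not keep
 * a live vector value in any of them across this statement. */
static void toy_kernel(void)
{
#if TOY_X86_64_GNU_ASM
    if (!cpu_has_avx())
        return; /* avoid an illegal instruction on CPUs without AVX */
    __asm__ __volatile__(
        "vzeroupper\n\t"
        : /* no outputs */
        : /* no inputs */
        : "cc", "memory",
          "xmm0", "xmm1", "xmm2", "xmm3",
          "xmm4", "xmm5", "xmm6", "xmm7",
          "xmm8", "xmm9", "xmm10", "xmm11",
          "xmm12", "xmm13", "xmm14", "xmm15");
#endif
}

/* Accumulate half of a dot product, call the kernel, then finish.
 * The accumulator is live across the call. */
float dot_around_kernel(const float *x, const float *y, size_t n)
{
    assert(x != NULL && y != NULL);
    float acc = 0.0f;
    size_t half = n / 2;
    for (size_t i = 0; i < half; i++)
        acc += x[i] * y[i];
    toy_kernel();
    for (size_t i = half; i < n; i++)
        acc += x[i] * y[i];
    return acc;
}
```

Calling `dot_around_kernel` on two four-element arrays returns the ordinary dot product whether or not the AVX path runs; on non-x86_64 targets the asm statement is compiled out entirely.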
Martin Kroeker
8e20ab21c8
Merge pull request #2924 from martin-frbg/issue2920
...
Put back all symbols accidentally dropped in the reorganization of gensymbol
2020-10-19 23:33:45 +02:00
Martin Kroeker
dc6e44c3f8
Merge pull request #2916 from martin-frbg/issue2911
...
Clean up duplicate definitions in POWER8 kernels and fix power10 option passing
2020-10-19 23:33:31 +02:00
Martin Kroeker
4ad33c46b0
Add back symbols that got dropped when splitting by type
2020-10-19 20:37:52 +02:00
Martin Kroeker
fe2a922ada
Add POWER10 compiler options to CCOMMON_OPT rather than COMMON_OPT
2020-10-19 17:43:53 +02:00
Martin Kroeker
9cac379655
Merge pull request #103 from xianyi/develop
...
rebase
2020-10-19 15:56:20 +02:00
Martin Kroeker
a61c086408
Fix spurious trailing whitespace in comment
2020-10-19 09:12:12 +02:00
Martin Kroeker
5b9ebe4f8a
Merge pull request #2919 from isuruf/export
...
Fix exporting some lapack and cblas symbols
2020-10-19 08:14:27 +02:00
Martin Kroeker
7eddaf0d6f
Remove -mmma again (redundant with cpu=power10) and add override statements
2020-10-19 08:11:22 +02:00
Isuru Fernando
14b1d33933
Fix exporting some lapack and cblas symbols
2020-10-18 22:45:58 -05:00
Martin Kroeker
77669b019d
Merge pull request #2915 from bartoldeman/no-empty_sgemm_direct_skylakex
...
sgemm_direct_skylakex: fix 75eeb26 regression.
2020-10-19 00:09:54 +02:00
Martin Kroeker
5e8ddc9001
Merge pull request #2913 from martin-frbg/issue2910
...
Support cross-compiling for Apple Vortex
2020-10-18 23:04:56 +02:00
Bart Oldeman
03e781b766
sgemm_direct_skylakex: fix 75eeb26 regression.
...
The `#if defined(SKYLAKEX) || defined (COOPERLAKE)` guard
from that commit came before #include "common.h", so the
compiled function was empty and returned garbage results for
qualifying sgemm calls on those architectures.
Closes #2914
2020-10-18 19:58:07 +00:00
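The ordering problem fixed in 03e781b766 can be reproduced in miniature. The sketch below is an illustration under stated assumptions, not the OpenBLAS source: `TOY_SKYLAKEX` stands in for the architecture macro, the `#define` in the middle stands in for what `#include "common.h"` would provide, and both `toy_direct_*` functions are hypothetical. A guard evaluated before the macro exists sees it as undefined, so the function body is compiled out and the output is never written.

```c
/* Guard evaluated BEFORE the configuration macro is defined:
 * the preprocessor sees TOY_SKYLAKEX as undefined here. */
#if defined(TOY_SKYLAKEX)
#define TOY_GUARD_EARLY 1
#else
#define TOY_GUARD_EARLY 0
#endif

/* Stand-in for #include "common.h" (hypothetical macro name). */
#define TOY_SKYLAKEX 1

/* Same guard evaluated AFTER the definition: now it is true. */
#if defined(TOY_SKYLAKEX)
#define TOY_GUARD_LATE 1
#else
#define TOY_GUARD_LATE 0
#endif

/* Body is compiled out, so *c keeps whatever the caller left in it. */
void toy_direct_early(float a, float b, float *c)
{
#if TOY_GUARD_EARLY
    *c = a * b;
#else
    (void)a; (void)b; (void)c; /* empty function */
#endif
}

/* Body is present, so *c receives the product. */
void toy_direct_late(float a, float b, float *c)
{
#if TOY_GUARD_LATE
    *c = a * b;
#else
    (void)a; (void)b; (void)c;
#endif
}
```

With an output variable initialised to a sentinel, `toy_direct_early` leaves the sentinel in place while `toy_direct_late` overwrites it, which mirrors how the caller of an empty kernel would read back stale data.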
Martin Kroeker
f1a4071d8c
Clean up STACKSIZE redefinition
2020-10-18 19:41:43 +02:00
Martin Kroeker
97cf10062f
Clean up STACKSIZE redefinition
2020-10-18 19:39:18 +02:00
Martin Kroeker
17e288e18d
Clean up STACKSIZE redefinition
2020-10-18 19:37:04 +02:00
Martin Kroeker
c1422f3e46
Clean up STACKSIZE redefinition
2020-10-18 19:31:01 +02:00
Martin Kroeker
d85b24e103
Clean up STACKSIZE redefinition
2020-10-18 19:29:45 +02:00
Martin Kroeker
7d6c85f9da
Add compiler option -mmma for POWER10
2020-10-18 19:27:51 +02:00
Martin Kroeker
2e7ee7c716
Fix naming of L2 cache size item reported for Vortex
2020-10-18 19:22:05 +02:00
Martin Kroeker
efd47b0104
Merge pull request #2909 from isuruf/patch-1
...
Need a space when redirecting to file
2020-10-18 19:16:08 +02:00
Martin Kroeker
f5902ab0a1
Support cross-compiling for Apple Vortex
2020-10-18 19:10:58 +02:00
Martin Kroeker
1a0c185122
Support cross-compiling for Apple Vortex
2020-10-18 18:54:54 +02:00
Martin Kroeker
89eea6b455
Merge pull request #102 from xianyi/develop
...
rebase
2020-10-18 18:49:59 +02:00
Isuru Fernando
a5c667b55c
Need a space when redirecting to file
...
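The following two commands have completely different meanings:
perl ./gensymbol objcopy x86_64 _ 0 0 0 0 0 0 "" "64_" 1 0 1 1 1 1 > objcopy.def
perl ./gensymbol objcopy x86_64 _ 0 0 0 0 0 0 "" "64_" 1 0 1 1 1 1> objcopy.def
In the second, the shell parses `1>` as a redirection of file descriptor 1, so the final `1` argument never reaches gensymbol.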
2020-10-18 09:40:31 -05:00
Martin Kroeker
0ac6102708
Update version string to 0.3.11.dev
2020-10-17 22:40:47 +02:00
Martin Kroeker
26a701f4ad
Update version string to 0.3.11.dev
2020-10-17 22:40:06 +02:00
Martin Kroeker
fcd0fa1a3a
Merge pull request #2908 from xianyi/release-0.3.0
...
Synchronize tag with release 0.3.11
2020-10-17 22:38:58 +02:00
Martin Kroeker
51c22612eb
Merge pull request #2907 from xianyi/develop
...
Update from develop for 0.3.11
v0.3.11
2020-10-17 22:14:12 +02:00
Martin Kroeker
b8f689200e
Update version number to 0.3.11
2020-10-17 22:11:34 +02:00
Martin Kroeker
fe9015b619
Update version for 0.3.11 release
2020-10-17 22:10:50 +02:00
Martin Kroeker
f99b8c1502
Merge pull request #2906 from martin-frbg/changelog-0311
...
Update Changelog.txt with the 0.3.11 changes
2020-10-17 22:07:14 +02:00
Martin Kroeker
5381a18056
Update Changelog.txt with the 0.3.11 changes
2020-10-17 22:05:36 +02:00
Martin Kroeker
e35576c6fc
Merge pull request #2905 from martin-frbg/aocc-clang
...
Add -mavx for clang & aocc
2020-10-17 09:45:22 +02:00
Martin Kroeker
f1bb85d378
Add AVX flags for clang/aocc as well
2020-10-16 20:52:15 +02:00
Martin Kroeker
25907e672b
Merge pull request #101 from xianyi/develop
...
rebase
2020-10-16 20:48:58 +02:00
Martin Kroeker
9789375389
Merge pull request #2900 from martin-frbg/fixcmake_sse
...
Add compiler options for SSE to the cmake support files
2020-10-16 16:17:36 +02:00
Martin Kroeker
f64243ff57
Add compiler options for sse/sse2/ssse3/sse4.1
2020-10-16 10:47:06 +02:00
Martin Kroeker
786c0a3ce8
Add sse options for use of intrinsics with older compilers
2020-10-16 10:41:53 +02:00
Martin Kroeker
df70667043
fix core list for sse/sse2
2020-10-16 09:55:48 +02:00
Martin Kroeker
e6c5b13a18
Merge pull request #2898 from martin-frbg/morefixes
...
More pre-release fixes
2020-10-16 07:26:39 +02:00
Martin Kroeker
f071d1207a
add sse2
2020-10-15 22:10:32 +02:00
Martin Kroeker
dc6cefd2f5
Expressly enable -msse for 32bit DYNAMIC_ARCH kernels
2020-10-15 20:16:15 +02:00
Martin Kroeker
c339c40c01
Silence a redefinition warning
2020-10-15 19:08:12 +02:00
Martin Kroeker
ac8af9cec6
Add -msse where supported, apparently required for older gcc
2020-10-15 19:06:45 +02:00
Martin Kroeker
10379fc83b
Use ifdef instead of if
2020-10-15 19:05:37 +02:00
Martin Kroeker
a85ac71633
Merge pull request #100 from xianyi/develop
...
rebase
2020-10-15 18:54:20 +02:00
Martin Kroeker
4c25910da0
Merge pull request #2896 from martin-frbg/intrin-double
...
Add compiler flag for SSE4 where available
2020-10-15 11:12:35 +02:00
Martin Kroeker
9b9ee92d5f
Merge pull request #2897 from Qiyu8/usimd-double
...
Add double precision universal intrinsics for X86/ARM
2020-10-15 08:38:24 +02:00
Martin Kroeker
ae6ac83991
Revert "add double precision SSE"
2020-10-15 08:37:02 +02:00