Martin Kroeker
2e48d560ba
Fix compiler version check
2020-10-22 16:23:29 +02:00
Rajalakshmi Srinivasaraghavan
ad745c0bae
Optimize scopy/ccopy for POWER10
...
This patch makes use of new POWER10 vector pair instructions for
loads and stores. Also reorganized all variants of copy functions
to make use of same kernel.
2020-10-21 09:53:45 -05:00
İsmail Dönmez
4a1d00f589
Fix build with -Werror=return-type
...
dgemm_tcopy_16_skylakex.c CNAME function should return an int, add a
return 0 similar to other files.
2020-10-21 08:43:39 +02:00
Bart Oldeman
b073d759d0
x86_64: clobber all xmm registers after vzeroupper
...
As observed using GCC 10 using -march=native -ftree-vectorize
on Knights Landing, it is now smart enough to find clobbers inside
non-inlined static functions.
In particular, sgemv counted on a kernel to preserve the whole
%ymm2 register (since it was not in the clobber list), but the top
part was destroyed by vzeroupper. This caused many tests to fail.
This patch makes sure all xmm (and ymm/zmm by extension) registers
are listed as clobbered to avoid this happening, as most kernels
already did correctly in fact.
2020-10-20 02:16:47 +00:00
Martin Kroeker
dc6e44c3f8
Merge pull request #2916 from martin-frbg/issue2911
...
Clean up duplicate definitions in POWER8 kernels and fix power10 option passing
2020-10-19 23:33:31 +02:00
Martin Kroeker
a61c086408
Fix spurious trailing whitespace in comment
2020-10-19 09:12:12 +02:00
Bart Oldeman
03e781b766
sgemm_direct_skylakex: fix 75eeb26
regression.
...
The
`#if defined(SKYLAKEX) || defined (COOPERLAKE)`
from that commit was before #include "common.h" so caused the
compiled function to be empty, returning garbage results for
qualifying sgemm's on those architectures.
Closes #2914
2020-10-18 19:58:07 +00:00
Martin Kroeker
f1a4071d8c
Clean up STACKSIZE redefinition
2020-10-18 19:41:43 +02:00
Martin Kroeker
97cf10062f
Clean up STACKSIZE redefinition
2020-10-18 19:39:18 +02:00
Martin Kroeker
17e288e18d
Clean up STACKSIZE redefinition
2020-10-18 19:37:04 +02:00
Martin Kroeker
c1422f3e46
Clean up STACKSIZE redefinition
2020-10-18 19:31:01 +02:00
Martin Kroeker
d85b24e103
Clean up STACKSIZE redefinition
2020-10-18 19:29:45 +02:00
Martin Kroeker
df70667043
fix core list for sse/sse2
2020-10-16 09:55:48 +02:00
Martin Kroeker
f071d1207a
add sse2
2020-10-15 22:10:32 +02:00
Martin Kroeker
dc6cefd2f5
Expressly enable -msse for 32bit DYNAMIC_ARCH kernels
2020-10-15 20:16:15 +02:00
Martin Kroeker
c339c40c01
Silence a redefinition warning
2020-10-15 19:08:12 +02:00
Martin Kroeker
10379fc83b
Use ifdef instead of if
2020-10-15 19:05:37 +02:00
Martin Kroeker
4c25910da0
Merge pull request #2896 from martin-frbg/intrin-double
...
Add compiler flag for SSE4 where available
2020-10-15 11:12:35 +02:00
Martin Kroeker
ae6ac83991
Revert "add double precision SSE"
2020-10-15 08:37:02 +02:00
Qiyu8
4fac91ef37
adapt arm platform
2020-10-15 11:08:10 +08:00
Qiyu8
bfdf4b56da
Add double precision universal intrinsics for X86/ARM
2020-10-15 10:29:42 +08:00
Martin Kroeker
ebf0470fc2
add sse4.1 for DYNAMIC_ARCH kernels
2020-10-14 20:34:33 +02:00
Martin Kroeker
c9c3ae07af
Add double precision operations
2020-10-14 18:10:45 +02:00
Martin Kroeker
756802df61
Merge pull request #2890 from martin-frbg/s-d-sum
...
Revert special handling of Windows xNRM2 and enable C+intrinsics kern…
2020-10-14 09:02:03 +02:00
Rajalakshmi Srinivasaraghavan
0826d68f93
POWER10: Change the packing format for bfloat16
...
As the new MMA instructions need the inputs in 4x2 order for bfloat16,
changing the format in copy/packing code. This avoids permute instructions
in the gemm kernel inner loop.
2020-10-13 16:05:10 -05:00
Rajalakshmi Srinivasaraghavan
b5d30b390d
Fix build issues with bfloat16
...
This patch fixes compilation errors due to recent renaming from SH to SB
with BUILD_BFLOAT16.
2020-10-13 11:00:22 -05:00
Martin Kroeker
fecedc9c69
Add -mssse3
2020-10-13 11:55:41 +02:00
Martin Kroeker
0eacbca85f
Add Haswell and Zen to temporary sse3 whitelist
2020-10-13 11:42:39 +02:00
Martin Kroeker
6999086a2b
whitelist SANDYBRIDGE for SSE3
2020-10-13 10:32:19 +02:00
Martin Kroeker
8d2df7d066
Revert special handling of Windows xNRM2 and enable C+intrinsics kernel for SSUM/DSUM
2020-10-13 00:14:29 +02:00
Martin Kroeker
08929430cd
Merge pull request #2886 from martin-frbg/issue_2767
...
Rename "HALF" precision functions (sh prefix) to "BFLOAT16" with "sb" prefix
2020-10-13 00:04:35 +02:00
Martin Kroeker
0c84ffe05f
Merge pull request #2881 from mattip/fninit
...
add fninit to reset fpu registers before assembler routines
2020-10-12 23:50:41 +02:00
Matti Picus
403eb513a0
use emms instead, add WIN guards
2020-10-12 18:15:01 +03:00
Qiyu8
0ed1f07660
Optimize the performance of sum by using universal intrinsics
2020-10-12 19:48:53 +08:00
Martin Kroeker
3aecafad80
Change "HALF" and "sh" to "BFLOAT16" and "sb"
2020-10-12 00:00:55 +02:00
Martin Kroeker
756062afa5
Rename "HALF" and "sh" to "BFLOAT16" and "sb"
2020-10-11 23:56:17 +02:00
Martin Kroeker
2061f7fdff
Rename "HALF" and "sh" to "BFLOAT16" and "sb"
2020-10-11 23:54:53 +02:00
Martin Kroeker
dc8a1afa63
Rename "HALF" and "sh" to "BFLOAT16" and "sb"
2020-10-11 23:53:50 +02:00
Martin Kroeker
fd94236042
Rename "HALF" and "sh" to "BFLOAT16" and "sb"
2020-10-11 23:42:07 +02:00
Martin Kroeker
68ce719fac
Rename shdot_microk_cooperlake.c to sbdot_microk_cooperlake.c
2020-10-11 23:41:13 +02:00
Martin Kroeker
d7dd9b396c
Rename shdot.c to sbdot.c
2020-10-11 23:40:43 +02:00
Martin Kroeker
9ae80490e0
rename "HALF" and "sh" to "BFLOAT16" and "sb"
2020-10-11 23:39:42 +02:00
Martin Kroeker
d314d1f49f
Rename shgemm_kernel_power10.c to sbgemm_kernel_power10.c
2020-10-11 23:37:38 +02:00
Martin Kroeker
c589c3e2a1
Merge pull request #2882 from martin-frbg/issue2709
...
Use generic C for (D/Z)NRM2 on Windows x86_64
2020-10-11 22:22:30 +02:00
Martin Kroeker
ec638a82bf
Merge pull request #2852 from martin-frbg/issue2588-cmake
...
Support building only a subset of variable types
2020-10-11 22:21:33 +02:00
Martin Kroeker
6b6adf8a4a
Allow compiling only a subset of kernels for specific variable types
2020-10-11 14:52:09 +02:00
Martin Kroeker
ac653c94f3
Merge branch 'develop' into issue2588-cmake
2020-10-11 13:57:07 +02:00
Martin Kroeker
7a53128481
Add whitelist of DYNAMIC_ARCH kernels for which -msse3 needs to be enabled
2020-10-11 01:06:46 +02:00
Martin Kroeker
e1b7123bbe
Merge pull request #2867 from Qiyu8/usimd-floatdot
...
Optimize the performance of dot by using universal intrinsics in X86/ARM
2020-10-10 12:10:25 +02:00
Qiyu8
f32d34a015
add sse3 compiler flag
2020-10-10 10:36:15 +08:00