Martin Kroeker
dc6e44c3f8
Merge pull request #2916 from martin-frbg/issue2911
...
Clean up duplicate definitions in POWER8 kernels and fix power10 option passing
2020-10-19 23:33:31 +02:00
Martin Kroeker
a61c086408
Fix spurious trailing whitespace in comment
2020-10-19 09:12:12 +02:00
Bart Oldeman
03e781b766
sgemm_direct_skylakex: fix 75eeb26 regression.
...
The
`#if defined(SKYLAKEX) || defined (COOPERLAKE)`
from that commit was before #include "common.h" so caused the
compiled function to be empty, returning garbage results for
qualifying sgemm's on those architectures.
Closes #2914
2020-10-18 19:58:07 +00:00
Martin Kroeker
f1a4071d8c
Clean up STACKSIZE redefinition
2020-10-18 19:41:43 +02:00
Martin Kroeker
97cf10062f
Clean up STACKSIZE redefinition
2020-10-18 19:39:18 +02:00
Martin Kroeker
17e288e18d
Clean up STACKSIZE redefinition
2020-10-18 19:37:04 +02:00
Martin Kroeker
c1422f3e46
Clean up STACKSIZE redefinition
2020-10-18 19:31:01 +02:00
Martin Kroeker
d85b24e103
Clean up STACKSIZE redefinition
2020-10-18 19:29:45 +02:00
Martin Kroeker
df70667043
fix core list for sse/sse2
2020-10-16 09:55:48 +02:00
Martin Kroeker
f071d1207a
add sse2
2020-10-15 22:10:32 +02:00
Martin Kroeker
dc6cefd2f5
Expressly enable -msse for 32bit DYNAMIC_ARCH kernels
2020-10-15 20:16:15 +02:00
Martin Kroeker
c339c40c01
Silence a redefinition warning
2020-10-15 19:08:12 +02:00
Martin Kroeker
10379fc83b
Use ifdef instead of if
2020-10-15 19:05:37 +02:00
Martin Kroeker
4c25910da0
Merge pull request #2896 from martin-frbg/intrin-double
...
Add compiler flag for SSE4 where available
2020-10-15 11:12:35 +02:00
Martin Kroeker
ae6ac83991
Revert "add double precision SSE"
2020-10-15 08:37:02 +02:00
Qiyu8
4fac91ef37
adapt arm platform
2020-10-15 11:08:10 +08:00
Qiyu8
bfdf4b56da
Add double precision universal intrinsics for X86/ARM
2020-10-15 10:29:42 +08:00
Martin Kroeker
ebf0470fc2
add sse4.1 for DYNAMIC_ARCH kernels
2020-10-14 20:34:33 +02:00
Martin Kroeker
c9c3ae07af
Add double precision operations
2020-10-14 18:10:45 +02:00
Martin Kroeker
756802df61
Merge pull request #2890 from martin-frbg/s-d-sum
...
Revert special handling of Windows xNRM2 and enable C+intrinsics kern…
2020-10-14 09:02:03 +02:00
Rajalakshmi Srinivasaraghavan
0826d68f93
POWER10: Change the packing format for bfloat16
...
As the new MMA instructions need the inputs in 4x2 order for bfloat16,
changing the format in copy/packing code. This avoids permute instructions
in the gemm kernel inner loop.
2020-10-13 16:05:10 -05:00
Rajalakshmi Srinivasaraghavan
b5d30b390d
Fix build issues with bfloat16
...
This patch fixes compilation errors due to recent renaming from SH to SB
with BUILD_BFLOAT16.
2020-10-13 11:00:22 -05:00
Martin Kroeker
fecedc9c69
Add -mssse3
2020-10-13 11:55:41 +02:00
Martin Kroeker
0eacbca85f
Add Haswell and Zen to temporary sse3 whitelist
2020-10-13 11:42:39 +02:00
Martin Kroeker
6999086a2b
whitelist SANDYBRIDGE for SSE3
2020-10-13 10:32:19 +02:00
Martin Kroeker
8d2df7d066
Revert special handling of Windows xNRM2 and enable C+intrinsics kernel for SSUM/DSUM
2020-10-13 00:14:29 +02:00
Martin Kroeker
08929430cd
Merge pull request #2886 from martin-frbg/issue_2767
...
Rename "HALF" precision functions (sh prefix) to "BFLOAT16" with "sb" prefix
2020-10-13 00:04:35 +02:00
Martin Kroeker
0c84ffe05f
Merge pull request #2881 from mattip/fninit
...
add fninit to reset fpu registers before assembler routines
2020-10-12 23:50:41 +02:00
Matti Picus
403eb513a0
use emms instead, add WIN guards
2020-10-12 18:15:01 +03:00
Qiyu8
0ed1f07660
Optimize the performance of sum by using universal intrinsics
2020-10-12 19:48:53 +08:00
Martin Kroeker
3aecafad80
Change "HALF" and "sh" to "BFLOAT16" and "sb"
2020-10-12 00:00:55 +02:00
Martin Kroeker
756062afa5
Rename "HALF" and "sh" to "BFLOAT16" and "sb"
2020-10-11 23:56:17 +02:00
Martin Kroeker
2061f7fdff
Rename "HALF" and "sh" to "BFLOAT16" and "sb"
2020-10-11 23:54:53 +02:00
Martin Kroeker
dc8a1afa63
Rename "HALF" and "sh" to "BFLOAT16" and "sb"
2020-10-11 23:53:50 +02:00
Martin Kroeker
fd94236042
Rename "HALF" and "sh" to "BFLOAT16" and "sb"
2020-10-11 23:42:07 +02:00
Martin Kroeker
68ce719fac
Rename shdot_microk_cooperlake.c to sbdot_microk_cooperlake.c
2020-10-11 23:41:13 +02:00
Martin Kroeker
d7dd9b396c
Rename shdot.c to sbdot.c
2020-10-11 23:40:43 +02:00
Martin Kroeker
9ae80490e0
rename "HALF" and "sh" to "BFLOAT16" and "sb"
2020-10-11 23:39:42 +02:00
Martin Kroeker
d314d1f49f
Rename shgemm_kernel_power10.c to sbgemm_kernel_power10.c
2020-10-11 23:37:38 +02:00
Martin Kroeker
c589c3e2a1
Merge pull request #2882 from martin-frbg/issue2709
...
Use generic C for (D/Z)NRM2 on Windows x86_64
2020-10-11 22:22:30 +02:00
Martin Kroeker
ec638a82bf
Merge pull request #2852 from martin-frbg/issue2588-cmake
...
Support building only a subset of variable types
2020-10-11 22:21:33 +02:00
Martin Kroeker
6b6adf8a4a
Allow compiling only a subset of kernels for specific variable types
2020-10-11 14:52:09 +02:00
Martin Kroeker
ac653c94f3
Merge branch 'develop' into issue2588-cmake
2020-10-11 13:57:07 +02:00
Martin Kroeker
7a53128481
Add whitelist of DYNAMIC_ARCH kernels for which -msse3 needs to be enabled
2020-10-11 01:06:46 +02:00
Martin Kroeker
e1b7123bbe
Merge pull request #2867 from Qiyu8/usimd-floatdot
...
Optimize the performance of dot by using universal intrinsics in X86/ARM
2020-10-10 12:10:25 +02:00
Qiyu8
f32d34a015
add sse3 compiler flag
2020-10-10 10:36:15 +08:00
Martin Kroeker
7812486091
Use generic C for D/Z nrm2 kernels on Windows to work around fpu exception bug
2020-10-06 21:33:16 +02:00
Matti Picus
a5b164946c
add fninit to reset fpu registers before assembler routines
2020-10-05 22:13:25 +03:00
User User-User
d2333e7842
aarch64 fix std=c18 compilation
2020-10-03 18:00:34 +03:00
Qiyu8
60e6c68e38
Adapt ARM architect
2020-09-29 16:36:14 +08:00