Commit Graph

5122 Commits

Author SHA1 Message Date
Zhang Xianyi
d7ba7679b6 Merge branch 'develop' into risc-v 2020-10-16 23:27:38 +08:00
Martin Kroeker
9789375389 Merge pull request #2900 from martin-frbg/fixcmake_sse
Add compiler options for SSE to the cmake support files
2020-10-16 16:17:36 +02:00
Martin Kroeker
f64243ff57 Add compiler options for sse/sse2/ssse3/sse4.1 2020-10-16 10:47:06 +02:00
Martin Kroeker
786c0a3ce8 Add sse options for use of intrinsics with older compilers 2020-10-16 10:41:53 +02:00
Martin Kroeker
df70667043 fix core list for sse/sse2 2020-10-16 09:55:48 +02:00
Martin Kroeker
e6c5b13a18 Merge pull request #2898 from martin-frbg/morefixes
More pre-release fixes
2020-10-16 07:26:39 +02:00
Martin Kroeker
f071d1207a add sse2 2020-10-15 22:10:32 +02:00
Martin Kroeker
dc6cefd2f5 Expressly enable -msse for 32bit DYNAMIC_ARCH kernels 2020-10-15 20:16:15 +02:00
Martin Kroeker
c339c40c01 Silence a redefinition warning 2020-10-15 19:08:12 +02:00
Martin Kroeker
ac8af9cec6 Add -msse where supported, apparently required for older gcc 2020-10-15 19:06:45 +02:00
Martin Kroeker
10379fc83b Use ifdef instead of if 2020-10-15 19:05:37 +02:00
Martin Kroeker
a85ac71633 Merge pull request #100 from xianyi/develop
rebase
2020-10-15 18:54:20 +02:00
Martin Kroeker
4c25910da0 Merge pull request #2896 from martin-frbg/intrin-double
Add compiler flag for SSE4 where available
2020-10-15 11:12:35 +02:00
Martin Kroeker
9b9ee92d5f Merge pull request #2897 from Qiyu8/usimd-double
Add double precision universal intrinsics for X86/ARM
2020-10-15 08:38:24 +02:00
Martin Kroeker
ae6ac83991 Revert "add double precision SSE" 2020-10-15 08:37:02 +02:00
Qiyu8
4fac91ef37 adapt arm platform 2020-10-15 11:08:10 +08:00
Qiyu8
bfdf4b56da Add double precision universal intrinsics for X86/ARM 2020-10-15 10:29:42 +08:00
Martin Kroeker
ebf0470fc2 add sse4.1 for DYNAMIC_ARCH kernels 2020-10-14 20:34:33 +02:00
Martin Kroeker
ca160bb440 Add -msse4.1 when SSE4.1 is supported 2020-10-14 19:18:07 +02:00
Martin Kroeker
c9c3ae07af Add double precision operations 2020-10-14 18:10:45 +02:00
Martin Kroeker
a897bc3bd2 Merge pull request #99 from xianyi/develop
rebase
2020-10-14 18:09:20 +02:00
Martin Kroeker
756802df61 Merge pull request #2890 from martin-frbg/s-d-sum
Revert special handling of Windows xNRM2 and enable C+intrinsics kern…
2020-10-14 09:02:03 +02:00
Martin Kroeker
01492decf4 Merge pull request #2895 from martin-frbg/sb-tests
Fix remaining build errors related to bfloat16 and cmake
2020-10-14 09:01:16 +02:00
Martin Kroeker
bd0752444a Merge pull request #2894 from RajalakshmiSR/bf16_packing
POWER10: Change the packing format for bfloat16
2020-10-14 08:12:08 +02:00
Martin Kroeker
c1f4f5d4e7 Replace Makefile with simplified version again 2020-10-14 01:08:50 +02:00
Martin Kroeker
75e3a92df6 Add express -mavx and -msse options (and fix a stray = for cooperlake) 2020-10-14 01:01:58 +02:00
Martin Kroeker
2a329baa81 Add the BFLOAT16 functions to cmake builds 2020-10-13 23:21:38 +02:00
Rajalakshmi Srinivasaraghavan
0826d68f93 POWER10: Change the packing format for bfloat16
As the new MMA instructions need the inputs in 4x2 order for bfloat16,
changing the format in copy/packing code.  This avoids permute instructions
in the gemm kernel inner loop.
2020-10-13 16:05:10 -05:00
Martin Kroeker
4bb73c0171 Rename "HALF" type to "BFLOAT16" 2020-10-13 20:07:19 +02:00
Martin Kroeker
bc5c7f9578 Cleanup 2020-10-13 19:56:09 +02:00
Martin Kroeker
437b7fe261 sh prefix renamed to sb 2020-10-13 19:55:14 +02:00
Martin Kroeker
a0ada4bcb8 Merge pull request #98 from xianyi/develop
rebase
2020-10-13 18:50:30 +02:00
Martin Kroeker
602a0c7a69 Merge pull request #2892 from RajalakshmiSR/bf16_make
Fix build issues with bfloat16
2020-10-13 18:48:37 +02:00
Rajalakshmi Srinivasaraghavan
b5d30b390d Fix build issues with bfloat16
This patch fixes compilation errors due to recent renaming from SH to SB
with BUILD_BFLOAT16.
2020-10-13 11:00:22 -05:00
Martin Kroeker
137ae618db Fix typo 2020-10-13 15:02:17 +02:00
Martin Kroeker
9e3cff5cf2 Expressly enable -mavx2 on Zen, SkylakeX and Cooperlake as well 2020-10-13 14:41:25 +02:00
Martin Kroeker
d85b968424 Merge pull request #2891 from martin-frbg/fix-2886
Fix several bugs and omissions from the BFLOAT16 rename
2020-10-13 13:46:17 +02:00
Martin Kroeker
5f60a32cac Add -mssse3 if supported by the hardware 2020-10-13 11:57:04 +02:00
Martin Kroeker
fecedc9c69 Add -mssse3 2020-10-13 11:55:41 +02:00
Martin Kroeker
0eacbca85f Add Haswell and Zen to temporary sse3 whitelist 2020-10-13 11:42:39 +02:00
Martin Kroeker
6999086a2b whitelist SANDYBRIDGE for SSE3 2020-10-13 10:32:19 +02:00
Martin Kroeker
9dca578c79 Cleanup 2020-10-13 10:14:08 +02:00
Martin Kroeker
1e7eb7b7a9 Fix typos in currently unused sections 2020-10-13 09:17:15 +02:00
Martin Kroeker
84949754a0 Fix bfloat16 conditional 2020-10-13 09:11:36 +02:00
Martin Kroeker
2ae8785603 Add a POWER9 build with BFLOAT16 enabled 2020-10-13 09:07:50 +02:00
Martin Kroeker
e05af6575e Fix some overlooked "SHBLAS" entries 2020-10-13 09:05:04 +02:00
Martin Kroeker
c1643006ae Merge pull request #97 from xianyi/develop
rebase
2020-10-13 09:01:49 +02:00
Martin Kroeker
8d2df7d066 Revert special handling of Windows xNRM2 and enable C+intrinsics kernel for SSUM/DSUM 2020-10-13 00:14:29 +02:00
Martin Kroeker
08929430cd Merge pull request #2886 from martin-frbg/issue_2767
Rename "HALF" precision functions (sh prefix) to "BFLOAT16" with "sb" prefix
2020-10-13 00:04:35 +02:00
Martin Kroeker
0c84ffe05f Merge pull request #2881 from mattip/fninit
add fninit to reset fpu registers before assembler routines
2020-10-12 23:50:41 +02:00