Commit Graph

5390 Commits

Author SHA1 Message Date
Rajalakshmi Srinivasaraghavan
b5d30b390d Fix build issues with bfloat16
This patch fixes compilation errors due to recent renaming from SH to SB
with BUILD_BFLOAT16.
2020-10-13 11:00:22 -05:00
Martin Kroeker
137ae618db Fix typo 2020-10-13 15:02:17 +02:00
Martin Kroeker
9e3cff5cf2 Expressly enable -mavx2 on Zen, SkylakeX and Cooperlake as well 2020-10-13 14:41:25 +02:00
Martin Kroeker
d85b968424 Merge pull request #2891 from martin-frbg/fix-2886
Fix several bugs and omissions from the BFLOAT16 rename
2020-10-13 13:46:17 +02:00
Martin Kroeker
5f60a32cac Add -mssse3 if supported by the hardware 2020-10-13 11:57:04 +02:00
Martin Kroeker
fecedc9c69 Add -mssse3 2020-10-13 11:55:41 +02:00
Martin Kroeker
0eacbca85f Add Haswell and Zen to temporary sse3 whitelist 2020-10-13 11:42:39 +02:00
Martin Kroeker
6999086a2b whitelist SANDYBRIDGE for SSE3 2020-10-13 10:32:19 +02:00
Martin Kroeker
9dca578c79 Cleanup 2020-10-13 10:14:08 +02:00
Martin Kroeker
1e7eb7b7a9 Fix typos in currently unused sections 2020-10-13 09:17:15 +02:00
Martin Kroeker
84949754a0 Fix bfloat16 conditional 2020-10-13 09:11:36 +02:00
Martin Kroeker
2ae8785603 Add a POWER9 build with BFLOAT16 enabled 2020-10-13 09:07:50 +02:00
Martin Kroeker
e05af6575e Fix some overlooked "SHBLAS" entries 2020-10-13 09:05:04 +02:00
Martin Kroeker
c1643006ae Merge pull request #97 from xianyi/develop
rebase
2020-10-13 09:01:49 +02:00
Martin Kroeker
8d2df7d066 Revert special handling of Windows xNRM2 and enable C+intrinsics kernel for SSUM/DSUM 2020-10-13 00:14:29 +02:00
Martin Kroeker
08929430cd Merge pull request #2886 from martin-frbg/issue_2767
Rename "HALF" precision functions (sh prefix) to "BFLOAT16" with "sb" prefix
2020-10-13 00:04:35 +02:00
Martin Kroeker
0c84ffe05f Merge pull request #2881 from mattip/fninit
add fninit to reset fpu registers before assembler routines
2020-10-12 23:50:41 +02:00
Martin Kroeker
cb4274e3ad Merge pull request #2888 from Qiyu8/usimd-sum
Optimize the performance of sum by using universal intrinsics
2020-10-12 23:22:08 +02:00
Matti Picus
403eb513a0 use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
Martin Kroeker
cb839575ed Convert the prototypes of the unimplemented BFLOAT16 functions to the new naming scheme 2020-10-12 14:44:33 +02:00
Qiyu8
0ed1f07660 Optimize the performance of sum by using universal intrinsics 2020-10-12 19:48:53 +08:00
Martin Kroeker
bb74dd29db Restore -msse3 2020-10-12 00:42:05 +02:00
Martin Kroeker
629c497b6c common_sh.h renamed to common_sb.h 2020-10-12 00:27:11 +02:00
Martin Kroeker
2c552f1074 Change "HALF" and "sh" to "BFLOAT16" and "sb" 2020-10-12 00:11:31 +02:00
Martin Kroeker
7ae9e8960e Change "HALF" and "sh" to "BFLOAT16" and "sb" 2020-10-12 00:08:29 +02:00
Martin Kroeker
e3a29f6b58 Change "HALF" and "sh" to "BFLOAT16" and "sb" 2020-10-12 00:07:37 +02:00
Martin Kroeker
006c7f6671 Change "HALF" and "sh" to "BFLOAT16" and "sb" 2020-10-12 00:06:06 +02:00
Martin Kroeker
85154c2e18 Change "HALF" and "sh" to "BFLOAT16" and "sb" 2020-10-12 00:05:05 +02:00
Martin Kroeker
ae1ab5bfdf Change "HALF" and "sh" to "BFLOAT16" and "sb" 2020-10-12 00:03:21 +02:00
Martin Kroeker
052f31bc3c Change "HALF" and "sh" to "BFLOAT16" and "sb" 2020-10-12 00:02:16 +02:00
Martin Kroeker
3aecafad80 Change "HALF" and "sh" to "BFLOAT16" and "sb" 2020-10-12 00:00:55 +02:00
Martin Kroeker
756062afa5 Rename "HALF" and "sh" to "BFLOAT16" and "sb" 2020-10-11 23:56:17 +02:00
Martin Kroeker
2061f7fdff Rename "HALF" and "sh" to "BFLOAT16" and "sb" 2020-10-11 23:54:53 +02:00
Martin Kroeker
dc8a1afa63 Rename "HALF" and "sh" to "BFLOAT16" and "sb" 2020-10-11 23:53:50 +02:00
Martin Kroeker
32733ded04 Rename "HALF" and "sh" to "BFLOAT16" and "sb" 2020-10-11 23:52:45 +02:00
Martin Kroeker
3bc8e8c334 Rename "HALF" and "sh" to "BFLOAT16"and "sb" 2020-10-11 23:51:34 +02:00
Martin Kroeker
573508f0ee Rename common_sh.h to common_sb.h 2020-10-11 23:50:54 +02:00
Martin Kroeker
ca31c32693 Rename "HALF" and "sh" to "BFLOAT16" and "sb" 2020-10-11 23:49:22 +02:00
Martin Kroeker
5800758b43 Rename "HALF" and "sh" to "BFLOAT16" and "sb" 2020-10-11 23:44:38 +02:00
Martin Kroeker
924fd806d0 Rename "HALF" and "sh" to "BFLOAT16" and "sb" 2020-10-11 23:43:36 +02:00
Martin Kroeker
4db09c6cec Rename compare_sgemm_shgemm.c to compare_sgemm_sbgemm.c 2020-10-11 23:42:45 +02:00
Martin Kroeker
fd94236042 Rename "HALF" and "sh" to "BFLOAT16" and "sb" 2020-10-11 23:42:07 +02:00
Martin Kroeker
68ce719fac Rename shdot_microk_cooperlake.c to sbdot_microk_cooperlake.c 2020-10-11 23:41:13 +02:00
Martin Kroeker
d7dd9b396c Rename shdot.c to sbdot.c 2020-10-11 23:40:43 +02:00
Martin Kroeker
9ae80490e0 rename "HALF" and "sh" to "BFLOAT16" and "sb" 2020-10-11 23:39:42 +02:00
Martin Kroeker
d314d1f49f Rename shgemm_kernel_power10.c to sbgemm_kernel_power10.c 2020-10-11 23:37:38 +02:00
Martin Kroeker
f0883740e4 Merge pull request #96 from xianyi/develop
rebase
2020-10-11 23:34:36 +02:00
Martin Kroeker
1c0b03efb4 Merge branch 'develop' into develop 2020-10-11 23:34:14 +02:00
Martin Kroeker
c589c3e2a1 Merge pull request #2882 from martin-frbg/issue2709
Use generic C for (D/Z)NRM2 on Windows x86_64
2020-10-11 22:22:30 +02:00
Martin Kroeker
ec638a82bf Merge pull request #2852 from martin-frbg/issue2588-cmake
Support building only a subset of variable types
2020-10-11 22:21:33 +02:00