7452 Commits

Author SHA1 Message Date
Martin Kroeker c1f4f5d4e7
Replace Makefile with simplified version again 2020-10-14 01:08:50 +02:00
Martin Kroeker 75e3a92df6
Add express -mavx and -msse options (and fix a stray = for cooperlake) 2020-10-14 01:01:58 +02:00
Martin Kroeker 2a329baa81
Add the BFLOAT16 functions to cmake builds 2020-10-13 23:21:38 +02:00
Rajalakshmi Srinivasaraghavan 0826d68f93
POWER10: Change the packing format for bfloat16
As the new MMA instructions need the inputs in 4x2 order for bfloat16,
changing the format in copy/packing code. This avoids permute instructions
in the gemm kernel inner loop.
2020-10-13 16:05:10 -05:00
Martin Kroeker 4bb73c0171
Rename "HALF" type to "BFLOAT16" 2020-10-13 20:07:19 +02:00
Martin Kroeker bc5c7f9578
Cleanup 2020-10-13 19:56:09 +02:00
Martin Kroeker 437b7fe261
sh prefix renamed to sb 2020-10-13 19:55:14 +02:00
Martin Kroeker a0ada4bcb8
Merge pull request #98 from xianyi/develop
rebase
2020-10-13 18:50:30 +02:00
Martin Kroeker 602a0c7a69
Merge pull request #2892 from RajalakshmiSR/bf16_make
Fix build issues with bfloat16
2020-10-13 18:48:37 +02:00
Rajalakshmi Srinivasaraghavan b5d30b390d
Fix build issues with bfloat16
This patch fixes compilation errors due to recent renaming from SH to SB
with BUILD_BFLOAT16.
2020-10-13 11:00:22 -05:00
Martin Kroeker 137ae618db
Fix typo 2020-10-13 15:02:17 +02:00
Martin Kroeker 9e3cff5cf2
Expressly enable -mavx2 on Zen, SkylakeX and Cooperlake as well 2020-10-13 14:41:25 +02:00
Martin Kroeker d85b968424
Merge pull request #2891 from martin-frbg/fix-2886
Fix several bugs and omissions from the BFLOAT16 rename
2020-10-13 13:46:17 +02:00
Martin Kroeker 5f60a32cac
Add -mssse3 if supported by the hardware 2020-10-13 11:57:04 +02:00
Martin Kroeker fecedc9c69
Add -mssse3 2020-10-13 11:55:41 +02:00
Martin Kroeker 0eacbca85f
Add Haswell and Zen to temporary sse3 whitelist 2020-10-13 11:42:39 +02:00
Martin Kroeker 6999086a2b
whitelist SANDYBRIDGE for SSE3 2020-10-13 10:32:19 +02:00
Martin Kroeker 9dca578c79
Cleanup 2020-10-13 10:14:08 +02:00
Martin Kroeker 1e7eb7b7a9
Fix typos in currently unused sections 2020-10-13 09:17:15 +02:00
Martin Kroeker 84949754a0
Fix bfloat16 conditional 2020-10-13 09:11:36 +02:00
Martin Kroeker 2ae8785603
Add a POWER9 build with BFLOAT16 enabled 2020-10-13 09:07:50 +02:00
Martin Kroeker e05af6575e
Fix some overlooked "SHBLAS" entries 2020-10-13 09:05:04 +02:00
Martin Kroeker c1643006ae
Merge pull request #97 from xianyi/develop
rebase
2020-10-13 09:01:49 +02:00
Martin Kroeker 8d2df7d066
Revert special handling of Windows xNRM2 and enable C+intrinsics kernel for SSUM/DSUM 2020-10-13 00:14:29 +02:00
Martin Kroeker 08929430cd
Merge pull request #2886 from martin-frbg/issue_2767
Rename "HALF" precision functions (sh prefix) to "BFLOAT16" with "sb" prefix
2020-10-13 00:04:35 +02:00
Martin Kroeker 0c84ffe05f
Merge pull request #2881 from mattip/fninit
add fninit to reset fpu registers before assembler routines
2020-10-12 23:50:41 +02:00
Martin Kroeker cb4274e3ad
Merge pull request #2888 from Qiyu8/usimd-sum
Optimize the performance of sum by using universal intrinsics
2020-10-12 23:22:08 +02:00
Matti Picus 403eb513a0
use emms instead, add WIN guards 2020-10-12 18:15:01 +03:00
Martin Kroeker cb839575ed
Convert the prototypes of the unimplemented BFLOAT16 functions to the new naming scheme 2020-10-12 14:44:33 +02:00
Qiyu8 0ed1f07660
Optimize the performance of sum by using universal intrinsics 2020-10-12 19:48:53 +08:00
Martin Kroeker bb74dd29db
Restore -msse3 2020-10-12 00:42:05 +02:00
Martin Kroeker 629c497b6c
common_sh.h renamed to common_sb.h 2020-10-12 00:27:11 +02:00
Martin Kroeker 2c552f1074
Change "HALF" and "sh" to "BFLOAT16" and "sb" 2020-10-12 00:11:31 +02:00
Martin Kroeker 7ae9e8960e
Change "HALF" and "sh" to "BFLOAT16" and "sb" 2020-10-12 00:08:29 +02:00
Martin Kroeker e3a29f6b58
Change "HALF" and "sh" to "BFLOAT16" and "sb" 2020-10-12 00:07:37 +02:00
Martin Kroeker 006c7f6671
Change "HALF" and "sh" to "BFLOAT16" and "sb" 2020-10-12 00:06:06 +02:00
Martin Kroeker 85154c2e18
Change "HALF" and "sh" to "BFLOAT16" and "sb" 2020-10-12 00:05:05 +02:00
Martin Kroeker ae1ab5bfdf
Change "HALF" and "sh" to "BFLOAT16" and "sb" 2020-10-12 00:03:21 +02:00
Martin Kroeker 052f31bc3c
Change "HALF" and "sh" to "BFLOAT16" and "sb" 2020-10-12 00:02:16 +02:00
Martin Kroeker 3aecafad80
Change "HALF" and "sh" to "BFLOAT16" and "sb" 2020-10-12 00:00:55 +02:00
Martin Kroeker 756062afa5
Rename "HALF" and "sh" to "BFLOAT16" and "sb" 2020-10-11 23:56:17 +02:00
Martin Kroeker 2061f7fdff
Rename "HALF" and "sh" to "BFLOAT16" and "sb" 2020-10-11 23:54:53 +02:00
Martin Kroeker dc8a1afa63
Rename "HALF" and "sh" to "BFLOAT16" and "sb" 2020-10-11 23:53:50 +02:00
Martin Kroeker 32733ded04
Rename "HALF" and "sh" to "BFLOAT16" and "sb" 2020-10-11 23:52:45 +02:00
Martin Kroeker 3bc8e8c334
Rename "HALF" and "sh" to "BFLOAT16" and "sb" 2020-10-11 23:51:34 +02:00
Martin Kroeker 573508f0ee
Rename common_sh.h to common_sb.h 2020-10-11 23:50:54 +02:00
Martin Kroeker ca31c32693
Rename "HALF" and "sh" to "BFLOAT16" and "sb" 2020-10-11 23:49:22 +02:00
Martin Kroeker 5800758b43
Rename "HALF" and "sh" to "BFLOAT16" and "sb" 2020-10-11 23:44:38 +02:00
Martin Kroeker 924fd806d0
Rename "HALF" and "sh" to "BFLOAT16" and "sb" 2020-10-11 23:43:36 +02:00
Martin Kroeker 4db09c6cec
Rename compare_sgemm_shgemm.c to compare_sgemm_sbgemm.c 2020-10-11 23:42:45 +02:00