Commit Graph

  • d85b24e103 Clean up STACKSIZE redefinition Martin Kroeker 2020-10-18 19:29:45 +02:00
  • 7d6c85f9da Add compiler option -mmma for POWER10 Martin Kroeker 2020-10-18 19:27:51 +02:00
  • 2e7ee7c716 Fix naming of L2 cache size item reported for Vortex Martin Kroeker 2020-10-18 19:22:05 +02:00
  • efd47b0104 Merge pull request #2909 from isuruf/patch-1 Martin Kroeker 2020-10-18 19:16:08 +02:00
  • f5902ab0a1 Support cross-compiling for Apple Vortex Martin Kroeker 2020-10-18 19:10:58 +02:00
  • 1a0c185122 Support cross-compiling for Apple Vortex Martin Kroeker 2020-10-18 18:54:54 +02:00
  • 89eea6b455 Merge pull request #102 from xianyi/develop Martin Kroeker 2020-10-18 18:49:59 +02:00
  • a5c667b55c Need a space when redirecting to file Isuru Fernando 2020-10-18 09:40:31 -05:00
  • 0ac6102708 Update version string to 0.3.11.dev Martin Kroeker 2020-10-17 22:40:47 +02:00
  • 26a701f4ad Update version string to 0.3.11.dev Martin Kroeker 2020-10-17 22:40:06 +02:00
  • fcd0fa1a3a Merge pull request #2908 from xianyi/release-0.3.0 Martin Kroeker 2020-10-17 22:38:58 +02:00
  • 51c22612eb Merge pull request #2907 from xianyi/develop (tag: v0.3.11) Martin Kroeker 2020-10-17 22:14:12 +02:00
  • b8f689200e Update version number to 0.3.11 Martin Kroeker 2020-10-17 22:11:34 +02:00
  • fe9015b619 Update version for 0.3.11 release Martin Kroeker 2020-10-17 22:10:50 +02:00
  • f99b8c1502 Merge pull request #2906 from martin-frbg/changelog-0311 Martin Kroeker 2020-10-17 22:07:14 +02:00
  • 5381a18056 Update Changelog.txt with the 0.3.11 changes Martin Kroeker 2020-10-17 22:05:36 +02:00
  • e35576c6fc Merge pull request #2905 from martin-frbg/aocc-clang Martin Kroeker 2020-10-17 09:45:22 +02:00
  • f1bb85d378 Add AVX flags for clang/aocc as well Martin Kroeker 2020-10-16 20:52:15 +02:00
  • 25907e672b Merge pull request #101 from xianyi/develop Martin Kroeker 2020-10-16 20:48:58 +02:00
  • d7ba7679b6 Merge branch 'develop' into risc-v Zhang Xianyi 2020-10-16 23:27:38 +08:00
  • 9789375389 Merge pull request #2900 from martin-frbg/fixcmake_sse Martin Kroeker 2020-10-16 16:17:36 +02:00
  • f64243ff57 Add compiler options for sse/sse2/ssse3/sse4.1 Martin Kroeker 2020-10-16 10:47:06 +02:00
  • 786c0a3ce8 Add sse options for use of intrinsics with older compilers Martin Kroeker 2020-10-16 10:41:53 +02:00
  • df70667043 fix core list for sse/sse2 Martin Kroeker 2020-10-16 09:55:48 +02:00
  • e6c5b13a18 Merge pull request #2898 from martin-frbg/morefixes Martin Kroeker 2020-10-16 07:26:39 +02:00
  • f071d1207a add sse2 Martin Kroeker 2020-10-15 22:10:32 +02:00
  • dc6cefd2f5 Expressly enable -msse for 32bit DYNAMIC_ARCH kernels Martin Kroeker 2020-10-15 20:16:15 +02:00
  • c339c40c01 Silence a redefinition warning Martin Kroeker 2020-10-15 19:08:12 +02:00
  • ac8af9cec6 Add -msse where supported, apparently required for older gcc Martin Kroeker 2020-10-15 19:06:45 +02:00
  • 10379fc83b Use ifdef instead of if Martin Kroeker 2020-10-15 19:05:37 +02:00
  • a85ac71633 Merge pull request #100 from xianyi/develop Martin Kroeker 2020-10-15 18:54:20 +02:00
  • 4c25910da0 Merge pull request #2896 from martin-frbg/intrin-double Martin Kroeker 2020-10-15 11:12:35 +02:00
  • ef8e7d0279 Add the support for RISC-V Vector. damonyu 2020-10-15 16:05:37 +08:00
  • 9b9ee92d5f Merge pull request #2897 from Qiyu8/usimd-double Martin Kroeker 2020-10-15 08:38:24 +02:00
  • ae6ac83991 Revert "add double precision SSE" Martin Kroeker 2020-10-15 08:37:02 +02:00
  • 4fac91ef37 adapt arm platform Qiyu8 2020-10-15 11:08:10 +08:00
  • bfdf4b56da Add double precision universal intrinsics for X86/ARM Qiyu8 2020-10-15 10:29:42 +08:00
  • ebf0470fc2 add sse4.1 for DYNAMIC_ARCH kernels Martin Kroeker 2020-10-14 20:34:33 +02:00
  • ca160bb440 Add -msse4.1 when SSE4.1 is supported Martin Kroeker 2020-10-14 19:18:07 +02:00
  • c9c3ae07af Add double precision operations Martin Kroeker 2020-10-14 18:10:45 +02:00
  • a897bc3bd2 Merge pull request #99 from xianyi/develop Martin Kroeker 2020-10-14 18:09:20 +02:00
  • 756802df61 Merge pull request #2890 from martin-frbg/s-d-sum Martin Kroeker 2020-10-14 09:02:03 +02:00
  • 01492decf4 Merge pull request #2895 from martin-frbg/sb-tests Martin Kroeker 2020-10-14 09:01:16 +02:00
  • bd0752444a Merge pull request #2894 from RajalakshmiSR/bf16_packing Martin Kroeker 2020-10-14 08:12:08 +02:00
  • c1f4f5d4e7 Replace Makefile with simplified version again Martin Kroeker 2020-10-14 01:08:50 +02:00
  • 75e3a92df6 Add express -mavx and -msse options (and fix a stray = for cooperlake) Martin Kroeker 2020-10-14 01:01:58 +02:00
  • 2a329baa81 Add the BFLOAT16 functions to cmake builds Martin Kroeker 2020-10-13 23:21:38 +02:00
  • 0826d68f93 POWER10: Change the packing format for bfloat16 Rajalakshmi Srinivasaraghavan 2020-10-13 16:05:10 -05:00
  • 4bb73c0171 Rename "HALF" type to "BFLOAT16" Martin Kroeker 2020-10-13 20:07:19 +02:00
  • bc5c7f9578 Cleanup Martin Kroeker 2020-10-13 19:56:09 +02:00
  • 437b7fe261 sh prefix renamed to sb Martin Kroeker 2020-10-13 19:55:14 +02:00
  • a0ada4bcb8 Merge pull request #98 from xianyi/develop Martin Kroeker 2020-10-13 18:50:30 +02:00
  • 602a0c7a69 Merge pull request #2892 from RajalakshmiSR/bf16_make Martin Kroeker 2020-10-13 18:48:37 +02:00
  • b5d30b390d Fix build issues with bfloat16 Rajalakshmi Srinivasaraghavan 2020-10-13 11:00:22 -05:00
  • 137ae618db Fix typo Martin Kroeker 2020-10-13 15:02:17 +02:00
  • 9e3cff5cf2 Expressly enable -mavx2 on Zen, SkylakeX and Cooperlake as well Martin Kroeker 2020-10-13 14:41:25 +02:00
  • d85b968424 Merge pull request #2891 from martin-frbg/fix-2886 Martin Kroeker 2020-10-13 13:46:17 +02:00
  • 5f60a32cac Add -mssse3 if supported by the hardware Martin Kroeker 2020-10-13 11:57:04 +02:00
  • fecedc9c69 Add -mssse3 Martin Kroeker 2020-10-13 11:55:41 +02:00
  • 0eacbca85f Add Haswell and Zen to temporary sse3 whitelist Martin Kroeker 2020-10-13 11:42:39 +02:00
  • 6999086a2b whitelist SANDYBRIDGE for SSE3 Martin Kroeker 2020-10-13 10:32:19 +02:00
  • 9dca578c79 Cleanup Martin Kroeker 2020-10-13 10:14:08 +02:00
  • 1e7eb7b7a9 Fix typos in currently unused sections Martin Kroeker 2020-10-13 09:17:15 +02:00
  • 84949754a0 Fix bfloat16 conditional Martin Kroeker 2020-10-13 09:11:36 +02:00
  • 2ae8785603 Add a POWER9 build with BFLOAT16 enabled Martin Kroeker 2020-10-13 09:07:50 +02:00
  • e05af6575e Fix some overlooked "SHBLAS" entries Martin Kroeker 2020-10-13 09:05:04 +02:00
  • c1643006ae Merge pull request #97 from xianyi/develop Martin Kroeker 2020-10-13 09:01:49 +02:00
  • 8d2df7d066 Revert special handling of Windows xNRM2 and enable C+intrinsics kernel for SSUM/DSUM Martin Kroeker 2020-10-13 00:14:29 +02:00
  • 08929430cd Merge pull request #2886 from martin-frbg/issue_2767 Martin Kroeker 2020-10-13 00:04:35 +02:00
  • 0c84ffe05f Merge pull request #2881 from mattip/fninit Martin Kroeker 2020-10-12 23:50:41 +02:00
  • cb4274e3ad Merge pull request #2888 from Qiyu8/usimd-sum Martin Kroeker 2020-10-12 23:22:08 +02:00
  • 403eb513a0 use emms instead, add WIN guards Matti Picus 2020-10-12 18:15:01 +03:00
  • cb839575ed Convert the prototypes of the unimplemented BFLOAT16 functions to the new naming scheme Martin Kroeker 2020-10-12 14:44:33 +02:00
  • 0ed1f07660 Optimize the performance of sum by using universal intrinsics Qiyu8 2020-10-12 19:48:53 +08:00
  • bb74dd29db Restore -msse3 Martin Kroeker 2020-10-12 00:42:05 +02:00
  • 629c497b6c common_sh.h renamed to common_sb.h Martin Kroeker 2020-10-12 00:27:11 +02:00
  • 2c552f1074 Change "HALF" and "sh" to "BFLOAT16" and "sb" Martin Kroeker 2020-10-12 00:11:31 +02:00
  • 7ae9e8960e Change "HALF" and "sh" to "BFLOAT16" and "sb" Martin Kroeker 2020-10-12 00:08:29 +02:00
  • e3a29f6b58 Change "HALF" and "sh" to "BFLOAT16" and "sb" Martin Kroeker 2020-10-12 00:07:37 +02:00
  • 006c7f6671 Change "HALF" and "sh" to "BFLOAT16" and "sb" Martin Kroeker 2020-10-12 00:06:06 +02:00
  • 85154c2e18 Change "HALF" and "sh" to "BFLOAT16" and "sb" Martin Kroeker 2020-10-12 00:05:05 +02:00
  • ae1ab5bfdf Change "HALF" and "sh" to "BFLOAT16" and "sb" Martin Kroeker 2020-10-12 00:03:21 +02:00
  • 052f31bc3c Change "HALF" and "sh" to "BFLOAT16" and "sb" Martin Kroeker 2020-10-12 00:02:16 +02:00
  • 3aecafad80 Change "HALF" and "sh" to "BFLOAT16" and "sb" Martin Kroeker 2020-10-12 00:00:55 +02:00
  • 756062afa5 Rename "HALF" and "sh" to "BFLOAT16" and "sb" Martin Kroeker 2020-10-11 23:56:17 +02:00
  • 2061f7fdff Rename "HALF" and "sh" to "BFLOAT16" and "sb" Martin Kroeker 2020-10-11 23:54:53 +02:00
  • dc8a1afa63 Rename "HALF" and "sh" to "BFLOAT16" and "sb" Martin Kroeker 2020-10-11 23:53:50 +02:00
  • 32733ded04 Rename "HALF" and "sh" to "BFLOAT16" and "sb" Martin Kroeker 2020-10-11 23:52:45 +02:00
  • 3bc8e8c334 Rename "HALF" and "sh" to "BFLOAT16" and "sb" Martin Kroeker 2020-10-11 23:51:34 +02:00
  • 573508f0ee Rename common_sh.h to common_sb.h Martin Kroeker 2020-10-11 23:50:54 +02:00
  • ca31c32693 Rename "HALF" and "sh" to "BFLOAT16" and "sb" Martin Kroeker 2020-10-11 23:49:22 +02:00
  • 5800758b43 Rename "HALF" and "sh" to "BFLOAT16" and "sb" Martin Kroeker 2020-10-11 23:44:38 +02:00
  • 924fd806d0 Rename "HALF" and "sh" to "BFLOAT16" and "sb" Martin Kroeker 2020-10-11 23:43:36 +02:00
  • 4db09c6cec Rename compare_sgemm_shgemm.c to compare_sgemm_sbgemm.c Martin Kroeker 2020-10-11 23:42:45 +02:00
  • fd94236042 Rename "HALF" and "sh" to "BFLOAT16" and "sb" Martin Kroeker 2020-10-11 23:42:07 +02:00
  • 68ce719fac Rename shdot_microk_cooperlake.c to sbdot_microk_cooperlake.c Martin Kroeker 2020-10-11 23:41:13 +02:00
  • d7dd9b396c Rename shdot.c to sbdot.c Martin Kroeker 2020-10-11 23:40:43 +02:00
  • 9ae80490e0 rename "HALF" and "sh" to "BFLOAT16" and "sb" Martin Kroeker 2020-10-11 23:39:42 +02:00
  • d314d1f49f Rename shgemm_kernel_power10.c to sbgemm_kernel_power10.c Martin Kroeker 2020-10-11 23:37:38 +02:00
  • f0883740e4 Merge pull request #96 from xianyi/develop Martin Kroeker 2020-10-11 23:34:36 +02:00