Commit Graph

  • bb8c3f6861
    Add ld/binutils version check for POWER10 support Martin Kroeker 2020-10-20 01:04:20 +0200
  • ff65952e46
    Move HAVE_P10_SUPPORT to the build system Martin Kroeker 2020-10-20 00:55:41 +0200
  • 6208c9899e
    Merge pull request #104 from xianyi/develop Martin Kroeker 2020-10-20 00:52:08 +0200
  • 8e20ab21c8
    Merge pull request #2924 from martin-frbg/issue2920 Martin Kroeker 2020-10-19 23:33:45 +0200
  • dc6e44c3f8
    Merge pull request #2916 from martin-frbg/issue2911 Martin Kroeker 2020-10-19 23:33:31 +0200
  • 4ad33c46b0
    Add back symbols that got dropped when splitting by type Martin Kroeker 2020-10-19 20:37:52 +0200
  • fe2a922ada
    Add POWER10 compiler options to CCOMMON_OPT rather than COMMON_OPT Martin Kroeker 2020-10-19 17:43:53 +0200
  • 9cac379655
    Merge pull request #103 from xianyi/develop Martin Kroeker 2020-10-19 15:56:20 +0200
  • a61c086408
    Fix spurious trailing whitespace in comment Martin Kroeker 2020-10-19 09:12:12 +0200
  • 5b9ebe4f8a
    Merge pull request #2919 from isuruf/export Martin Kroeker 2020-10-19 08:14:27 +0200
  • 7eddaf0d6f
    Remove -mmma again (redundant with cpu=power10) and add override statements Martin Kroeker 2020-10-19 08:11:22 +0200
  • 14b1d33933
    Fix exporting some lapack and cblas Isuru Fernando 2020-10-18 21:42:32 -0500
  • 77669b019d
    Merge pull request #2915 from bartoldeman/no-empty_sgemm_direct_skylakex Martin Kroeker 2020-10-19 00:09:54 +0200
  • 5e8ddc9001
    Merge pull request #2913 from martin-frbg/issue2910 Martin Kroeker 2020-10-18 23:04:56 +0200
  • 03e781b766
    sgemm_direct_skylakex: fix 75eeb26 regression. Bart Oldeman 2020-10-18 19:50:38 +0000
  • f1a4071d8c
    Clean up STACKSIZE redefinition Martin Kroeker 2020-10-18 19:41:43 +0200
  • 97cf10062f
    Clean up STACKSIZE redefinition Martin Kroeker 2020-10-18 19:39:18 +0200
  • 17e288e18d
    Clean up STACKSIZE redefinition Martin Kroeker 2020-10-18 19:37:04 +0200
  • c1422f3e46
    Clean up STACKSIZE redefinition Martin Kroeker 2020-10-18 19:31:01 +0200
  • d85b24e103
    Clean up STACKSIZE redefinition Martin Kroeker 2020-10-18 19:29:45 +0200
  • 7d6c85f9da
    Add compiler option -mmma for POWER10 Martin Kroeker 2020-10-18 19:27:51 +0200
  • 2e7ee7c716
    Fix naming of L2 cache size item reported for Vortex Martin Kroeker 2020-10-18 19:22:05 +0200
  • efd47b0104
    Merge pull request #2909 from isuruf/patch-1 Martin Kroeker 2020-10-18 19:16:08 +0200
  • f5902ab0a1
    Support cross-compiling for Apple Vortex Martin Kroeker 2020-10-18 19:10:58 +0200
  • bf1f1c66b4
    VORTEX Isuru Fernando 2020-10-18 12:08:35 -0500
  • 1a0c185122
    Support cross-compiling for Apple Vortex Martin Kroeker 2020-10-18 18:54:54 +0200
  • 89eea6b455
    Merge pull request #102 from xianyi/develop Martin Kroeker 2020-10-18 18:49:59 +0200
  • a5c667b55c
    Need a space when redirecting to file Isuru Fernando 2020-10-18 09:40:31 -0500
  • 0ac6102708
    Update version string to 0.3.11.dev Martin Kroeker 2020-10-17 22:40:47 +0200
  • 26a701f4ad
    Update version string to 0.3.11.dev Martin Kroeker 2020-10-17 22:40:06 +0200
  • fcd0fa1a3a
    Merge pull request #2908 from xianyi/release-0.3.0 Martin Kroeker 2020-10-17 22:38:58 +0200
  • 51c22612eb
    Merge pull request #2907 from xianyi/develop v0.3.11 Martin Kroeker 2020-10-17 22:14:12 +0200
  • b8f689200e
    Update version number to 0.3.11 Martin Kroeker 2020-10-17 22:11:34 +0200
  • fe9015b619
    Update version for 0.3.11 release Martin Kroeker 2020-10-17 22:10:50 +0200
  • f99b8c1502
    Merge pull request #2906 from martin-frbg/changelog-0311 Martin Kroeker 2020-10-17 22:07:14 +0200
  • 5381a18056
    Update Changelog.txt with the 0.3.11 changes Martin Kroeker 2020-10-17 22:05:36 +0200
  • e35576c6fc
    Merge pull request #2905 from martin-frbg/aocc-clang Martin Kroeker 2020-10-17 09:45:22 +0200
  • f1bb85d378
    Add AVX flags for clang/aocc as well Martin Kroeker 2020-10-16 20:52:15 +0200
  • 25907e672b
    Merge pull request #101 from xianyi/develop Martin Kroeker 2020-10-16 20:48:58 +0200
  • d7ba7679b6
    Merge branch 'develop' into risc-v Zhang Xianyi 2020-10-16 23:27:38 +0800
  • 9789375389
    Merge pull request #2900 from martin-frbg/fixcmake_sse Martin Kroeker 2020-10-16 16:17:36 +0200
  • 0eda7ac2ce
    Merge 'origin/release-0.3.0' into develop to get the 0.3.10 tag mattip 2020-10-16 13:15:43 +0300
  • f64243ff57
    Add compiler options for sse/sse2/ssse3/sse4.1 Martin Kroeker 2020-10-16 10:47:06 +0200
  • 786c0a3ce8
    Add sse options for use of intrinsics with older compilers Martin Kroeker 2020-10-16 10:41:53 +0200
  • df70667043
    fix core list for sse/sse2 Martin Kroeker 2020-10-16 09:55:48 +0200
  • e6c5b13a18
    Merge pull request #2898 from martin-frbg/morefixes Martin Kroeker 2020-10-16 07:26:39 +0200
  • f071d1207a
    add sse2 Martin Kroeker 2020-10-15 22:10:32 +0200
  • dc6cefd2f5
    Expressly enable -msse for 32bit DYNAMIC_ARCH kernels Martin Kroeker 2020-10-15 20:16:15 +0200
  • c339c40c01
    Silence a redefinition warning Martin Kroeker 2020-10-15 19:08:12 +0200
  • ac8af9cec6
    Add -msse where supported, apparently required for older gcc Martin Kroeker 2020-10-15 19:06:45 +0200
  • 10379fc83b
    Use ifdef instead of if Martin Kroeker 2020-10-15 19:05:37 +0200
  • a85ac71633
    Merge pull request #100 from xianyi/develop Martin Kroeker 2020-10-15 18:54:20 +0200
  • 4c25910da0
    Merge pull request #2896 from martin-frbg/intrin-double Martin Kroeker 2020-10-15 11:12:35 +0200
  • ef8e7d0279
    Add the support for RISC-V Vector. damonyu 2020-10-15 16:05:37 +0800
  • 9b9ee92d5f
    Merge pull request #2897 from Qiyu8/usimd-double Martin Kroeker 2020-10-15 08:38:24 +0200
  • ae6ac83991
    Revert "add double precision SSE" Martin Kroeker 2020-10-15 08:37:02 +0200
  • 4fac91ef37
    adapt arm platform Qiyu8 2020-10-15 11:08:10 +0800
  • bfdf4b56da
    Add double precision universal intrinsics for X86/ARM Qiyu8 2020-10-15 10:29:42 +0800
  • ebf0470fc2
    add sse4.1 for DYNAMIC_ARCH kernels Martin Kroeker 2020-10-14 20:34:33 +0200
  • ca160bb440
    Add -msse4.1 when SSE4.1 is supported Martin Kroeker 2020-10-14 19:18:07 +0200
  • c9c3ae07af
    Add double precision operations Martin Kroeker 2020-10-14 18:10:45 +0200
  • a897bc3bd2
    Merge pull request #99 from xianyi/develop Martin Kroeker 2020-10-14 18:09:20 +0200
  • 756802df61
    Merge pull request #2890 from martin-frbg/s-d-sum Martin Kroeker 2020-10-14 09:02:03 +0200
  • 01492decf4
    Merge pull request #2895 from martin-frbg/sb-tests Martin Kroeker 2020-10-14 09:01:16 +0200
  • bd0752444a
    Merge pull request #2894 from RajalakshmiSR/bf16_packing Martin Kroeker 2020-10-14 08:12:08 +0200
  • c1f4f5d4e7
    Replace Makefile with simplified version again Martin Kroeker 2020-10-14 01:08:50 +0200
  • 75e3a92df6
    Add express -mavx and -msse options (and fix a stray = for cooperlake) Martin Kroeker 2020-10-14 01:01:58 +0200
  • 2a329baa81
    Add the BFLOAT16 functions to cmake builds Martin Kroeker 2020-10-13 23:21:38 +0200
  • 0826d68f93
    POWER10: Change the packing format for bfloat16 Rajalakshmi Srinivasaraghavan 2020-10-13 16:05:10 -0500
  • 4bb73c0171
    Rename "HALF" type to "BFLOAT16" Martin Kroeker 2020-10-13 20:07:19 +0200
  • bc5c7f9578
    Cleanup Martin Kroeker 2020-10-13 19:56:09 +0200
  • 437b7fe261
    sh prefix renamed to sb Martin Kroeker 2020-10-13 19:55:14 +0200
  • a0ada4bcb8
    Merge pull request #98 from xianyi/develop Martin Kroeker 2020-10-13 18:50:30 +0200
  • 602a0c7a69
    Merge pull request #2892 from RajalakshmiSR/bf16_make Martin Kroeker 2020-10-13 18:48:37 +0200
  • b5d30b390d
    Fix build issues with bfloat16 Rajalakshmi Srinivasaraghavan 2020-10-13 11:00:22 -0500
  • 137ae618db
    Fix typo Martin Kroeker 2020-10-13 15:02:17 +0200
  • 9e3cff5cf2
    Expressly enable -mavx2 on Zen, SkylakeX and Cooperlake as well Martin Kroeker 2020-10-13 14:41:25 +0200
  • d85b968424
    Merge pull request #2891 from martin-frbg/fix-2886 Martin Kroeker 2020-10-13 13:46:17 +0200
  • 5f60a32cac
    Add -mssse3 if supported by the hardware Martin Kroeker 2020-10-13 11:57:04 +0200
  • fecedc9c69
    Add -mssse3 Martin Kroeker 2020-10-13 11:55:41 +0200
  • 0eacbca85f
    Add Haswell and Zen to temporary sse3 whitelist Martin Kroeker 2020-10-13 11:42:39 +0200
  • 6999086a2b
    whitelist SANDYBRIDGE for SSE3 Martin Kroeker 2020-10-13 10:32:19 +0200
  • 9dca578c79
    Cleanup Martin Kroeker 2020-10-13 10:14:08 +0200
  • 1e7eb7b7a9
    Fix typos in currently unused sections Martin Kroeker 2020-10-13 09:17:15 +0200
  • 84949754a0
    Fix bfloat16 conditional Martin Kroeker 2020-10-13 09:11:36 +0200
  • 2ae8785603
    Add a POWER9 build with BFLOAT16 enabled Martin Kroeker 2020-10-13 09:07:50 +0200
  • e05af6575e
    Fix some overlooked "SHBLAS" entries Martin Kroeker 2020-10-13 09:05:04 +0200
  • c1643006ae
    Merge pull request #97 from xianyi/develop Martin Kroeker 2020-10-13 09:01:49 +0200
  • 8d2df7d066
    Revert special handling of Windows xNRM2 and enable C+intrinsics kernel for SSUM/DSUM Martin Kroeker 2020-10-13 00:14:29 +0200
  • 08929430cd
    Merge pull request #2886 from martin-frbg/issue_2767 Martin Kroeker 2020-10-13 00:04:35 +0200
  • 0c84ffe05f
    Merge pull request #2881 from mattip/fninit Martin Kroeker 2020-10-12 23:50:41 +0200
  • 36bd6ba6c7
    Use the new universal intrinsics for s/dSUM across all platforms, and generic C c/zSUM on Windows Martin Kroeker 2020-10-12 23:45:49 +0200
  • cb4274e3ad
    Merge pull request #2888 from Qiyu8/usimd-sum Martin Kroeker 2020-10-12 23:22:08 +0200
  • fac9afe645
    Reset the FPU stack on Windows to work around a bug in Windows10.19041 Martin Kroeker 2020-10-12 19:04:01 +0200
  • 403eb513a0
    use emms instead, add WIN guards Matti Picus 2020-10-12 18:15:01 +0300
  • cb839575ed
    Convert the prototypes of the unimplemented BFLOAT16 functions to the new naming scheme Martin Kroeker 2020-10-12 14:44:33 +0200
  • 0ed1f07660
    Optimize the performance of sum by using universal intrinsics Qiyu8 2020-10-12 19:48:53 +0800
  • 600054b0ac
    Use generic kernels for xSUM on Windows Martin Kroeker 2020-10-12 08:24:51 +0200
  • bb74dd29db
    Restore -msse3 Martin Kroeker 2020-10-12 00:42:05 +0200
  • 629c497b6c
    common_sh.h renamed to common_sb.h Martin Kroeker 2020-10-12 00:27:11 +0200