Commit Graph

  • ed9af2f7da
    Update KERNEL.HASWELL wjc404 2019-12-27 18:01:38 +0800
  • 5fd1edead9
    Create cgemm3m_kernel_8x4_haswell.c wjc404 2019-12-27 18:00:55 +0800
  • 26478eb0d0
    Merge pull request #2345 from wjc404/develop Martin Kroeker 2019-12-25 22:26:41 +0100
  • e9ed67ed7e
    LAPACK: avoid out-of-bound write in ?LANTR Vladimir Chalupecky 2019-12-19 12:17:14 +0100
  • eeecd623d8
    Update cgemm_kernel_8x2_haswell.c wjc404 2019-12-24 00:40:16 +0800
  • 3ce6bcdb5f
    Update CONTRIBUTORS.md wjc404 2019-12-24 00:30:16 +0800
  • 6fbe51072b
    Update CONTRIBUTORS.md wjc404 2019-12-24 00:24:40 +0800
  • 611445c7f8
    Update param.h wjc404 2019-12-23 23:44:55 +0800
  • 2cd9306bb5
    Update KERNEL.ZEN wjc404 2019-12-23 23:42:30 +0800
  • c418c81224
    Update KERNEL.HASWELL wjc404 2019-12-23 23:41:44 +0800
  • 025741f16a
    Fast Haswell CGEMM kernel wjc404 2019-12-23 23:40:03 +0800
  • 0ae49d2990
    Merge pull request #2344 from wjc404/develop Martin Kroeker 2019-12-21 12:16:55 +0100
  • 105e26e12a
    Adjust Haswell ZGEMM blocking parameters wjc404 2019-12-21 14:38:51 +0800
  • f41d52665d
    Fast Haswell ZGEMM kernel wjc404 2019-12-21 14:37:06 +0800
  • d573d24de7
    Fast Haswell ZGEMM kernel wjc404 2019-12-21 14:35:15 +0800
  • 31d6c2eb7d
    Merge pull request #2340 from Zeyiii/develop Martin Kroeker 2019-12-20 08:38:57 +0100
  • b7cc69ee62
    declare DGEMM_BETA in KERNEL.ARMV8 rather than the generic KERNEL w00421467 2019-12-20 10:11:50 +0800
  • aeef942c4f
    use arm neon instructions to optimize gemm beta operation w00421467 2019-12-17 10:00:13 +0800
  • 445ca2f418
    Merge pull request #2339 from Jehan/wip/Jehan/fix-timeout Martin Kroeker 2019-12-13 14:57:26 +0100
  • 13226e3101
    driver: more reasonable thread wait timeout on Windows. Jehan 2019-12-11 17:51:42 +0100
  • 1a6ea8ee6d
    Merge pull request #2338 from kavanabhat/aix_mod Martin Kroeker 2019-12-09 17:54:49 +0100
  • c6ecb195e6
    Merge pull request #2337 from martin-frbg/issue2336 Martin Kroeker 2019-12-07 09:38:06 +0100
  • b28db31429
    Support two-digit version numbers in gcc version check Martin Kroeker 2019-12-06 21:23:56 +0100
  • 6baa9b07d7
    AIX changes for Power8 Kavana Bhat 2019-12-06 04:33:32 -0600
  • a4896b5538
    Update DYNAMIC_ARCH support for ARM64 and PPC (#2332) Martin Kroeker 2019-12-04 11:06:03 +0100
  • 3938e59569
    AIX changes for Power8 Kavana Bhat 2019-12-04 00:23:46 -0600
  • 9d5079008f
    Merge pull request #2334 from martin-frbg/fix2228 Martin Kroeker 2019-12-03 22:23:52 +0100
  • 8be499114e
    remove spurious copypasta Martin Kroeker 2019-12-03 21:52:24 +0100
  • 8269288aac
    Add back the additions to ARM64 dynamic_core Martin Kroeker 2019-12-03 20:27:27 +0100
  • 80f219e128
    Update dynamic_arm64.c Martin Kroeker 2019-12-03 17:05:07 +0100
  • c240abaffb
    Fix typo Martin Kroeker 2019-12-03 10:07:12 +0100
  • 8ca7d5a4f7
    Add test for gcc >=9 Martin Kroeker 2019-12-03 09:41:48 +0100
  • ef752b9937
    Need at least gcc9 for tsv110 support Martin Kroeker 2019-12-03 09:41:06 +0100
  • 3518617f5b
    Add Intel Goldmont+ cpuid Martin Kroeker 2019-12-03 08:32:29 +0100
  • 715f4650d9
    Delete stray copy of dynamic.c from PR 2228 Martin Kroeker 2019-12-03 08:24:10 +0100
  • 10705183ce
    Merge pull request #20 from xianyi/develop Martin Kroeker 2019-12-03 08:22:40 +0100
  • 26799ccbf7
    Fix typos Martin Kroeker 2019-12-03 08:18:14 +0100
  • f2d787429c
    Update zgemm3m_kernel_4x4_haswell.c wjc404 2019-12-03 13:51:27 +0800
  • 4275c7df31
    Update cgemm3m_kernel_8x4_haswell.c wjc404 2019-12-03 13:49:41 +0800
  • 2458f9ec3a
    update Haswell GEMM3M parameters wjc404 2019-12-03 13:41:34 +0800
  • fa93ec9ad4
    AVX2 CGEMM3M & ZGEMM3M kernels wjc404 2019-12-03 13:40:14 +0800
  • ba3eba1804
    Add prototypes Martin Kroeker 2019-12-02 23:02:01 +0100
  • 84695e63c8
    Update list of ARM64 targets for DYNAMIC_ARCH and add PPC targets Martin Kroeker 2019-12-02 20:23:55 +0100
  • fab6361ba1
    Update cpu list Martin Kroeker 2019-12-02 20:22:36 +0100
  • 4432f96feb
    Update DYNAMIC_ARCH list of ARM64 targets Martin Kroeker 2019-12-02 20:21:13 +0100
  • c49a0740be
    Update zgemm3m_kernel_8x4_skylakex.c wjc404 2019-12-02 16:29:15 +0800
  • 7c52e0a567
    update avx512 zgemm3m kernel wjc404 2019-12-02 16:01:35 +0800
  • 87773b9be8
    AVX512 ZGEMM3M kernel wjc404 2019-12-02 15:56:34 +0800
  • 685fb38ba2
    adjust some thresholds to improve performance wjc404 2019-12-02 15:52:40 +0800
  • b1934ace2d
    adjust avx512 zgemm3m parameters wjc404 2019-12-02 15:50:08 +0800
  • 235599f17a
    Merge pull request #2329 from isuruf/patch-1 Martin Kroeker 2019-12-02 08:30:43 +0100
  • b863b32ac5
    Workaround an ICE in clang 9.0.0 Isuru Fernando 2019-12-01 11:55:49 -0600
  • dd04143d4a
    Merge pull request #2328 from martin-frbg/ppc9 Martin Kroeker 2019-11-30 12:23:57 +0100
  • f3a6164bff
    Merge pull request #2324 from antonblanchard/power9_segv Martin Kroeker 2019-11-30 00:03:42 +0100
  • dedd822d1a
    Fix caxpy/caxpyc naming in localentry Martin Kroeker 2019-11-29 23:56:57 +0100
  • 2181fb7047
    Fix caxpy/caxpyc naming in localentry Martin Kroeker 2019-11-29 23:54:15 +0100
  • a9b62c03f8
    Substitute precompiled gcc7 codes only when gcc is older than 9.x Martin Kroeker 2019-11-29 23:49:50 +0100
  • 97762234f9
    Add variable for gcc >=9 test Martin Kroeker 2019-11-29 23:47:23 +0100
  • 948d11fc51
    Merge pull request #19 from xianyi/develop Martin Kroeker 2019-11-29 23:44:09 +0100
  • c815b8fb85
    Merge pull request #2323 from wjc404/develop Martin Kroeker 2019-11-28 20:55:16 +0100
  • e20709e976
    Update param.h wjc404 2019-11-28 19:57:50 +0800
  • 934e601e93
    Update dgemm_kernel_4x8_skylakex_2.c wjc404 2019-11-28 19:56:35 +0800
  • a4c3668f99
    Merge pull request #2321 from martin-frbg/issue2319 Martin Kroeker 2019-11-28 09:30:24 +0100
  • 867232c6a4
    Merge pull request #2327 from martin-frbg/travisosx Martin Kroeker 2019-11-28 08:43:45 +0100
  • 5aaf70ef95
    Merge pull request #2326 from xianyi/revert-2325-travisosx Martin Kroeker 2019-11-28 00:17:19 +0100
  • ae2a0995cc
    Cleanup IOS build and disable FORTRAN on 32bit and ios builds for now Martin Kroeker 2019-11-28 00:15:36 +0100
  • 83dae28ae2
    Revert "Cleanup Travis IOS xbuild and disable FORTRAN on 32bit and ios builds for now" revert-2325-travisosx Martin Kroeker 2019-11-28 00:09:06 +0100
  • da986d2e83
    Merge pull request #2325 from martin-frbg/travisosx Martin Kroeker 2019-11-27 21:59:36 +0100
  • 6bc487de35
    Cleanup IOS build and disable FORTRAN on 32bit and ios builds for now Martin Kroeker 2019-11-27 15:10:57 +0100
  • cf2a8e410c
    Fix SEGV in cdot_power9 Anton Blanchard 2019-11-26 21:55:04 -0700
  • eb1e9c8c92
    some optimizations wjc404 2019-11-26 14:12:20 +0800
  • f95989cbc1
    Fix AVX512 capability test (always returning zero) Martin Kroeker 2019-11-23 22:38:07 +0100
  • f3065a0eed
    Fix race conditions in multithreaded GEMM3M Martin Kroeker 2019-11-23 19:54:56 +0100
  • 04226f1e97
    Add the cpuid of the business/rackmount version of z15 as well Martin Kroeker 2019-11-21 18:14:29 +0100
  • 0925ef70db
    Merge pull request #2316 from sharkcz/s390x Martin Kroeker 2019-11-21 18:03:00 +0100
  • 371e6f73d4
    Merge pull request #2317 from aarnez/develop Martin Kroeker 2019-11-21 17:59:21 +0100
  • 8fd0197232
    Correct Inline Assembly name mismatches Detrez 2019-11-21 14:19:26 +0100
  • d117dfd505
    Change bad usage of "asum" to "sum" in ZARCH versions of ?sum Andreas Arnez 2019-09-20 18:32:47 +0200
  • 883c39773a
    zarch: treat z15 as z14 instead of generic Dan Horák 2019-11-21 12:49:54 +0100
  • b09b5be0a4
    Merge pull request #2315 from ewanglong/develop Martin Kroeker 2019-11-21 05:06:44 +0100
  • bfb5fbdb4d
    revised fix windows compatible for #2313 Wang, Long 2019-11-21 10:19:40 +0800
  • 3da6d66da9
    Merge pull request #2314 from Jehan/wip/Jehan/fix-openblas-crash Martin Kroeker 2019-11-20 16:16:35 +0100
  • 08fa83aba2
    Merge pull request #2312 from martin-frbg/power8be Martin Kroeker 2019-11-20 15:12:06 +0100
  • 63d3ee8dfc
    Merge pull request #2313 from ewanglong/develop Martin Kroeker 2019-11-20 14:49:15 +0100
  • 1191db1a49
    For the sake of windows compatible, used "unsigned long long" to ensure 64-bit length Wang, Long 2019-11-20 21:30:16 +0800
  • 1f6071590d
    Fix usage of TerminateThread() causing critical section corruption. Jehan 2019-11-20 12:21:35 +0100
  • 0caf1434c9
    Fix the integer overflow issue for large matrix size Wang, Long 2019-11-20 11:50:37 +0800
  • 73128f3883
    Merge pull request #2310 from martin-frbg/ppc440 Martin Kroeker 2019-11-17 23:19:48 +0100
  • cad0d150db
    Define alternate kernels for big-endian POWER8 Martin Kroeker 2019-11-17 23:12:10 +0100
  • eba0aeb7cd
    Fix compilation for big-endian POWER8 Martin Kroeker 2019-11-17 22:58:32 +0100
  • 0c07c356c1
    Define alternate kernels for big-endian PPC440 Martin Kroeker 2019-11-17 19:25:08 +0100
  • 82b75f97e5
    Disable the old QCDOC qalloc by default and copy utility functions from memory.c Martin Kroeker 2019-11-17 19:22:04 +0100
  • 7887c45077
    Merge pull request #17 from xianyi/develop Martin Kroeker 2019-11-17 19:09:49 +0100
  • 3e67017ac8
    Merge pull request #2309 from martin-frbg/ppc970-be Martin Kroeker 2019-11-17 18:22:24 +0100
  • b3ac6ee222
    Define alternate kernels for big-endian PPC970 Martin Kroeker 2019-11-17 15:19:39 +0100
  • 6082e556cd
    Use "generic" S/CGEMM unroll M on big-endian PPC970 Martin Kroeker 2019-11-17 15:10:26 +0100
  • 92315173d5
    Merge pull request #2308 from martin-frbg/ctestfix Martin Kroeker 2019-11-15 08:33:17 +0100
  • 351d12b94e
    Fix potential spurious failure from uninitialized variable Martin Kroeker 2019-11-15 00:20:36 +0100
  • bf73aa141b
    Fix potential spurious failure from uninitialized variable Martin Kroeker 2019-11-15 00:19:24 +0100
  • 71e96163db
    Merge pull request #2305 from wjc404/develop Martin Kroeker 2019-11-12 07:38:37 +0100