Commit Graph

  • 0ce2aa3163
    Fix data type of rwork array Martin Kroeker 2020-09-02 23:41:51 +0200
  • 80794fe8fd
    Create KERNEL.SILICON Martin Kroeker 2020-09-02 22:56:58 +0200
  • 4a4d1ca6e0
    Add AppleSilicon cpu Martin Kroeker 2020-09-02 22:52:12 +0200
  • b37d17382a
    Add Apple Silicon Martin Kroeker 2020-09-02 22:48:49 +0200
  • 029fd01cfb
    Detect AppleSilicon cpu on OSX Martin Kroeker 2020-09-02 22:47:38 +0200
  • 9d1ea75aa0
    Merge pull request #80 from xianyi/develop Martin Kroeker 2020-09-02 22:16:41 +0200
  • 776d005f4c
    Merge pull request #2815 from mhillenibm/clang_s390x Martin Kroeker 2020-09-02 16:56:01 +0200
  • 2ee5b899ce
    s390x: enable S/DGEMM block with explicit loop unrolling + interleaving with clang Marius Hillenbrand 2020-09-01 16:16:53 +0200
  • 095f4e6964
    s390x: allow clang to emit fused multiply-adds (replicates gcc's default behavior) Marius Hillenbrand 2020-09-01 15:09:32 +0200
  • 87e5bbd887
    s390x: avoid variable-length arrays in struct for asm operands Marius Hillenbrand 2020-09-01 12:08:05 +0200
  • b9b3265ec8
    s390x: avoid inline assembly for vector loads for clang Marius Hillenbrand 2020-09-01 12:04:28 +0200
  • a1616a0b86
    s390x: replace nop with "nop 0" in inline assembly Marius Hillenbrand 2020-09-01 11:58:48 +0200
  • 60ef193258
    s390x: use "lghi" for immediate values to fix build with clang Marius Hillenbrand 2020-09-01 13:59:06 +0200
  • 18bfb6d6f7
    Merge pull request #2813 from martin-frbg/issue2804-2 Martin Kroeker 2020-09-01 23:39:46 +0200
  • e4900caa11
    Fix c_check misinterpreting arm64 in uname output to mean armv7 Martin Kroeker 2020-09-01 19:54:08 +0200
  • be9a20fb6f
    Accept uname output of arm64 as such Martin Kroeker 2020-09-01 17:45:41 +0200
  • 68b1713c30
    Merge pull request #2811 from martin-frbg/issue2806 Martin Kroeker 2020-09-01 17:19:14 +0200
  • 4074770d00
    Merge pull request #2797 from martin-frbg/relafixes1 Martin Kroeker 2020-09-01 16:04:03 +0200
  • 88bf71d02e
    Fix accidental deletion of Cooperlake entries with the preceding commit Martin Kroeker 2020-09-01 14:15:20 +0200
  • a76e56f912
    Report NO_AVX512 being set (as it is already done for NO_AVX, NO_AVX2) Martin Kroeker 2020-09-01 13:34:59 +0200
  • 47e75f0ac6
    Allow overriding the AVX512 check with a NO_AVX512 define Martin Kroeker 2020-09-01 12:09:25 +0200
  • b87a77da02
    Merge pull request #79 from xianyi/develop Martin Kroeker 2020-09-01 12:03:53 +0200
  • f42e84d46c
    Fix misnaming of LAPACK_?ggsvp function prototypes as LAPACKE_ (#2808) Martin Kroeker 2020-09-01 10:44:48 +0200
  • 0a4c5c4c44
    Merge pull request #2807 from martin-frbg/issue2804 Martin Kroeker 2020-08-31 23:44:56 +0200
  • a5f7626bd3
    missing comma Martin Kroeker 2020-08-31 23:29:20 +0200
  • 3aaa6a47b7
    fix argument lists of LAPACK_?ggsvp prototypes Martin Kroeker 2020-08-31 23:18:04 +0200
  • 16b805b25b
    Update lapack.h Martin Kroeker 2020-08-31 22:53:41 +0200
  • deb119992f
    Update lapack.h Martin Kroeker 2020-08-31 22:38:15 +0200
  • defb15e71b
    Update lapack.h Martin Kroeker 2020-08-31 21:36:21 +0200
  • cd4fbf1244
    Need to drop the LAPACKE matrix_layout parameter for LAPACK ?ggsvp as well Martin Kroeker 2020-08-31 21:18:45 +0200
  • 1a05523709
    Fix misnaming of LAPACK_?ggsvp function prototypes as LAPACKE_ Martin Kroeker 2020-08-31 20:08:35 +0200
  • 3210a42734
    Report cpu as ARMV8 instead of just giving up on non-Linux hosts Martin Kroeker 2020-08-31 20:03:21 +0200
  • 5feb087c05
    Handle Apple labeling armv8 as arm64 rather than aarch64 Martin Kroeker 2020-08-31 20:02:08 +0200
  • 448152cdd8
    define __AVX2__ to ensure the haswell code compiled with avx2 Gengxin Xie 2020-08-31 14:39:08 +0800
  • cb3c190a3a
    Implementation of dasum, sasum with AVX2 & AVX512 intrinsic Gengxin Xie 2020-08-21 14:44:36 +0800
  • 59e01b1aec
    Merge pull request #2799 from RajalakshmiSR/p10_ger Martin Kroeker 2020-08-28 22:52:11 +0200
  • 317ff27cda
    POWER10: Avoid setting accumulators to zero in gemm kernels Rajalakshmi Srinivasaraghavan 2020-08-28 10:42:54 -0500
  • 4130d1732e
    Refs #2587 fix small matrix c/zgemm bug. Xianyi Zhang 2020-08-28 22:36:36 +0800
  • 255b6dd0fa
    Merge branch 'develop' into small_matrices Xianyi Zhang 2020-08-28 21:38:58 +0800
  • 741d6c5cb8
    Refs #2587 Add small matrix optimization reference kernel for c/zgemm. Xianyi Zhang 2020-08-28 21:00:54 +0800
  • 514a3d7d63
    Merge pull request #2798 from kadler/aix-cpuid Martin Kroeker 2020-08-28 08:30:59 +0200
  • 085aae8bdb
    Fix compile error on AIX cpuid detection Kevin Adler 2020-08-27 23:08:33 -0500
  • 712ca43069
    Change a1b0 gemm to b0 gemm. Xianyi Zhang 2020-08-28 07:55:27 +0800
  • de63675717
    Add early returns and fix sign errors in workspace calculations Martin Kroeker 2020-08-27 11:25:18 +0200
  • d64cc2be81
    Add early returns Martin Kroeker 2020-08-27 11:22:50 +0200
  • c9b67141f0
    Add early returns Martin Kroeker 2020-08-27 11:20:31 +0200
  • 6797a3a1e0
    Add early returns Martin Kroeker 2020-08-27 11:15:12 +0200
  • 936966a42c
    Make ILAENV and xGETRF2 functions available Martin Kroeker 2020-08-27 10:59:08 +0200
  • 5c6c2cd4f6
    Merge pull request #2775 from Guobing-Chen/Fix_OMP_threads_specify Martin Kroeker 2020-08-24 20:18:09 +0200
  • e54be4ba1c
    Merge pull request #2792 from pkubaj/patch-1 Martin Kroeker 2020-08-24 08:03:39 +0200
  • 48a1364e10
    Add aliases for armv6, armv7 pkubaj 2020-08-23 18:50:19 +0000
  • 0c1c903f1e
    Fix OMP num specify issue Chen, Guobing 2020-08-12 03:28:25 +0800
  • a073fa870e
    Merge pull request #2791 from martin-frbg/issue2787 Martin Kroeker 2020-08-23 19:33:03 +0200
  • b2053239fc
    Fix missing dummy parameter (imag part of alpha) of zdot_thread_function Martin Kroeker 2020-08-23 15:08:16 +0200
  • b11bb6e728
    Merge pull request #2790 from martin-frbg/issue2789 Martin Kroeker 2020-08-23 14:42:35 +0200
  • 1840bc5b52
    Add OpenMP dependency to pkgconfig file if needed Martin Kroeker 2020-08-22 13:55:18 +0200
  • 7c0977c267
    Add OpenMP dependency to pkgconfig file if needed Martin Kroeker 2020-08-22 13:53:44 +0200
  • fb3d80c42a
    Merge pull request #78 from xianyi/develop Martin Kroeker 2020-08-22 13:52:29 +0200
  • 9ee21a0a39
    Merge pull request #2780 from Guobing-Chen/CPL_build_support Martin Kroeker 2020-08-20 19:54:29 +0200
  • 35557ec926
    Add R benchmarks at higher core counts Martin Kroeker 2020-08-20 16:42:27 +0200
  • bd3207b4b4
    Update system.cmake Martin Kroeker 2020-08-19 22:51:10 +0200
  • b8ebfc9335
    Update system.cmake Martin Kroeker 2020-08-19 22:30:19 +0200
  • 7c1986640b
    fallback from cooperlake to skylake if gcc<10 Martin Kroeker 2020-08-19 20:48:39 +0200
  • 71d33c952d
    Typo fix Martin Kroeker 2020-08-19 17:44:23 +0200
  • 6a3c074786
    -march=cooperlake requires gcc10 Martin Kroeker 2020-08-19 17:22:12 +0200
  • 430f741b30
    -march=cooperlake requires gcc10 Martin Kroeker 2020-08-19 17:17:53 +0200
  • 6f4dc7445d
    Fix typo Martin Kroeker 2020-08-19 16:36:55 +0200
  • 81fbe8d088
    -march=cooperlake only available in gcc >= 10 Martin Kroeker 2020-08-19 16:10:15 +0200
  • bb9cf766f5
    make march=cooperlake option conditional on gcc >= 10.1 Martin Kroeker 2020-08-19 15:06:30 +0200
  • 75eeb265d7
    [WIP] Refactor the driver code for direct SGEMM (#2782) Martin Kroeker 2020-08-19 14:51:09 +0200
  • 2c72972570
    Merge pull request #2785 from albertziegenhagel/always-generate-pkg-config Martin Kroeker 2020-08-19 14:42:58 +0200
  • 416ee26020
    revert the unrelated drone.io CI config change Martin Kroeker 2020-08-18 16:17:19 +0200
  • a7fc14c501
    Limit direct sgemm to x86_64 Martin Kroeker 2020-08-18 14:13:15 +0200
  • b86214e434
    Limit direct sgemm to x86_64 Martin Kroeker 2020-08-18 14:12:19 +0200
  • 1ba18212da
    Update common_s.h Martin Kroeker 2020-08-18 09:36:59 +0200
  • 6b731d917f
    Do not require pkg-config to generate the *.pc file Albert Ziegenhagel 2020-08-18 08:48:48 +0200
  • 5a0e9e8ded
    Update setparam-ref.c Martin Kroeker 2020-08-17 22:38:02 +0200
  • e46d761bca
    Update setparam-ref.c Martin Kroeker 2020-08-17 22:16:20 +0200
  • 6c279ef552
    Update setparam-ref.c Martin Kroeker 2020-08-17 21:55:54 +0200
  • 7996458ea1
    Update common_s.h Martin Kroeker 2020-08-17 20:06:59 +0200
  • 5dcf47cd97
    Merge pull request #2784 from martin-frbg/issue2783 Martin Kroeker 2020-08-17 19:06:13 +0200
  • 7fe38daee5
    use macros for sgemm_direct to support dynamic_arch naming via common_s.h Martin Kroeker 2020-08-17 18:56:05 +0200
  • af80849063
    Add sgemm_direct Martin Kroeker 2020-08-17 18:54:28 +0200
  • aa286e301b
    Add typedef for bfloat16 if needed Martin Kroeker 2020-08-17 15:32:14 +0200
  • 9f0ef9cdfc
    Merge pull request #77 from xianyi/develop Martin Kroeker 2020-08-17 15:28:15 +0200
  • 6bfc66663c
    revert Martin Kroeker 2020-08-17 15:20:41 +0200
  • a8c6fb9e1c
    revert Martin Kroeker 2020-08-17 15:20:16 +0200
  • 5ec8f716cf
    revert Martin Kroeker 2020-08-17 15:19:40 +0200
  • 54e02aaf11
    Update gemm.c Martin Kroeker 2020-08-16 20:45:20 +0200
  • a83cb3966d
    Refactor sgemm_direct Martin Kroeker 2020-08-16 19:01:43 +0200
  • 5a74bd45fd
    remove include as sgemm_direct is handled at the makefile level now Martin Kroeker 2020-08-16 09:20:44 +0200
  • 56d4d4f84b
    Move sgemm_direct_performant helper to separate file Martin Kroeker 2020-08-16 09:19:34 +0200
  • 2586b26e29
    Add direct_sgemm Martin Kroeker 2020-08-16 09:16:52 +0200
  • 86e3455d02
    Add sgemm_direct targets Martin Kroeker 2020-08-16 09:15:56 +0200
  • 774029af38
    move sgemm_direct function declarations Martin Kroeker 2020-08-16 09:13:39 +0200
  • 82f8a0aeba
    Update .drone.yml Martin Kroeker 2020-08-15 15:46:18 +0200
  • d57d503c15
    Update Makefile Martin Kroeker 2020-08-15 14:46:26 +0200
  • 37ac23e8a3
    Add simple MT sgemm precision test and INTERFACE64 build Martin Kroeker 2020-08-15 13:38:05 +0200
  • 6a93e3b2ba
    Add simple sgemm precision test Martin Kroeker 2020-08-15 13:33:52 +0200
  • 47ce1dd08f
    Update gemm64.cpp Martin Kroeker 2020-08-15 13:31:28 +0200