Commit Graph

  • 819e852ae7
    AVX512 CGEMM & ZGEMM kernels wjc404 2019-11-11 20:04:52 +0800
  • 4e466d739c
    Merge pull request #15 from xianyi/develop Martin Kroeker 2019-11-09 18:52:08 +0100
  • 4c6a457358
    Merge pull request #2300 from wjc404/develop Martin Kroeker 2019-11-06 07:27:33 +0100
  • 836c414e22
    optimizations of software prefetching wjc404 2019-11-05 13:36:56 +0800
  • d403eb3c2f
    Merge pull request #2302 from martin-frbg/ppc970 Martin Kroeker 2019-11-04 22:55:05 +0100
  • 3cd97f1a80
    Merge pull request #2301 from martin-frbg/ppc8be Martin Kroeker 2019-11-04 22:54:28 +0100
  • 9955f0996f
    Merge pull request #2294 from martin-frbg/ios-cleanup Martin Kroeker 2019-11-04 22:53:58 +0100
  • 430c11e135
    Add files via upload wjc404 2019-11-04 20:10:12 +0800
  • fbacd2605d
    optimizations via software prefetches wjc404 2019-11-04 19:37:19 +0800
  • 6fa89b06a1
    Use the two-operand form of DCBT on all PPC970 regardless of OS Martin Kroeker 2019-11-03 22:55:31 +0100
  • 68597002ea
    The assembly microkernel is not safe to use on ELFv1 Martin Kroeker 2019-11-03 22:42:46 +0100
  • d2a6285549
    The assembly microkernel is not safe to use on ELFv1 Martin Kroeker 2019-11-03 22:41:19 +0100
  • d999688d1a
    The assembly microkernel is not safe to use on ELFv1 Martin Kroeker 2019-11-03 22:39:06 +0100
  • 928fe1b28e
    The assembly microkernel is not safe to use on ELFv1 Martin Kroeker 2019-11-03 22:37:27 +0100
  • ccc28c6d60
    Merge pull request #13 from xianyi/develop Martin Kroeker 2019-11-03 22:33:31 +0100
  • ae43b75a6a
    Add files via upload wjc404 2019-11-02 10:09:19 +0800
  • 54fc06fd70
    Add files via upload wjc404 2019-11-02 10:06:13 +0800
  • 1df9a2013d
    new sgemm kernel for skylakex wjc404 2019-11-02 00:00:48 +0800
  • 274ff5cdb8
    update sgemm_q on skylakex cpus wjc404 2019-11-01 23:59:18 +0800
  • eb2eddf241
    Merge pull request #2296 from kdunee/develop Martin Kroeker 2019-10-28 13:24:18 +0100
  • 8691825944
    Fixed a minor cmake problem, occurring when DYNAMIC_CORE=ON and CMAKE_C_FLAGS was empty k.dunikowski 2019-10-28 08:51:05 +0100
  • 7dc8a76f60
    Merge pull request #2293 from martin-frbg/pr2288 Martin Kroeker 2019-10-25 23:46:39 +0200
  • df857551c0
    Remove special parameter set for obsolete IOS/ARMV8 workaround Martin Kroeker 2019-10-25 23:07:00 +0200
  • 85ccdce8c4
    Remove the IOS fallbacks to generic C kernels Martin Kroeker 2019-10-25 23:02:37 +0200
  • aeabe0a83f
    Fix regex to parse -R options with and without whitespace Martin Kroeker 2019-10-25 22:52:30 +0200
  • 1b90989662
    Add NetBSD to the xBSD conditionals Martin Kroeker 2019-10-25 12:52:49 +0200
  • e3e8b5cdca
    Add NetBSD Martin Kroeker 2019-10-25 12:51:06 +0200
  • 69b16a894d
    Merge pull request #2292 from martin-frbg/g95fixes Martin Kroeker 2019-10-25 10:35:17 +0200
  • 6782e5767d
    Merge pull request #2291 from martin-frbg/gensymbol Martin Kroeker 2019-10-25 10:34:50 +0200
  • 48f5a89f92
    Merge pull request #2282 from martin-frbg/issue2281 Martin Kroeker 2019-10-25 09:56:30 +0200
  • 4ae1610f37
    Merge pull request #2290 from martin-frbg/cpuidfixes Martin Kroeker 2019-10-24 22:52:15 +0200
  • 911c3e2f4b
    Improve support for g95 and non-GNU ld Martin Kroeker 2019-10-24 22:43:27 +0200
  • fab49e49e5
    Move most lapack 3.7/3.8 additions to the embedded_underscores list Martin Kroeker 2019-10-24 21:26:20 +0200
  • b687fba5bc
    Disable direct clock register access on IOS and Android Martin Kroeker 2019-10-24 21:18:17 +0200
  • 46a8c2519a
    Remove prototype of unused, unimplemented function (#2274) luzpaz 2019-10-24 12:56:53 -0400
  • e9437eebd2
    Restore Goldmont ID and improve QEMU support Martin Kroeker 2019-10-24 18:45:27 +0200
  • 3a39062cfc
    Merge pull request #12 from xianyi/develop Martin Kroeker 2019-10-24 18:40:13 +0200
  • 0394e1195e
    NetBSD fix gufe44 2019-10-22 08:44:39 +0200
  • eaa0be1313
    Merge pull request #2286 from wjc404/develop Martin Kroeker 2019-10-20 12:44:19 +0200
  • 6ff013bae0
    native support for icopy_4 wjc404 2019-10-19 03:54:44 +0800
  • 0d669e04bb
    Update dgemm_kernel_8x8_skylakex.c wjc404 2019-10-18 15:00:17 +0800
  • 17cdd9f9e1
    some correction wjc404 2019-10-18 14:58:07 +0800
  • 6bcb06fcb1
    make further changes to icopy_8 easier wjc404 2019-10-18 10:47:31 +0800
  • b7315f8401
    Add files via upload wjc404 2019-10-16 19:23:36 +0800
  • 9b19e9e1b0
    Update dgemm_kernel_8x8_skylakex.c wjc404 2019-10-16 10:14:51 +0800
  • 6bd67ddbab
    Update dgemm_kernel_8x8_skylakex.c wjc404 2019-10-16 03:20:08 +0800
  • 5da9484d93
    Add files via upload wjc404 2019-10-16 02:01:13 +0800
  • 844629af57
    Add files via upload wjc404 2019-10-16 02:00:34 +0800
  • 467c555344
    Remove beta-thread function per request luz.paz 2019-10-11 08:04:03 -0400
  • 2beaa82c05
    Merge pull request #2283 from martin-frbg/issue2176 Martin Kroeker 2019-10-09 22:06:09 +0200
  • e8a2aed2b9
    Support QEMU cpu calling itself 64bit AMD Athlon as well Martin Kroeker 2019-10-09 18:24:13 +0200
  • f262031685
    Support QEMU virtual cpu as CORE2 Martin Kroeker 2019-10-08 22:30:02 +0200
  • 5f6206fa2d
    Simplify OSX/IOS cross-compilation and add a CI test for it (#2279) Martin Kroeker 2019-10-08 20:13:14 +0200
  • f2cde2ccfb
    Update common_arm64.h Martin Kroeker 2019-10-08 20:12:08 +0200
  • ba7838d2e1
    Merge pull request #2280 from martin-frbg/iosfix Martin Kroeker 2019-10-08 10:25:25 +0200
  • a448884a63
    Remove automatic label postfixes from macro included only once Martin Kroeker 2019-10-08 08:37:50 +0200
  • 17609f88f1
    Merge pull request #11 from xianyi/develop Martin Kroeker 2019-10-08 08:32:52 +0200
  • 3a2df19db6
    Fix accidental duplication of jump instruction Martin Kroeker 2019-10-08 08:09:26 +0200
  • 12856eb8ef
    Fix PROLOGUE for OSX/IOS Martin Kroeker 2019-10-07 23:03:50 +0200
  • b6af6a5a5a
    Handle platforms that lack hwcap.h by falling back to ARMV8 Martin Kroeker 2019-10-07 20:34:28 +0200
  • 40eb3c22bf
    Update .travis.yml Martin Kroeker 2019-10-07 18:26:56 +0200
  • 6eac491783
    Update .travis.yml Martin Kroeker 2019-10-07 16:43:32 +0200
  • a1eb21fbb6
    Fix indentation Martin Kroeker 2019-10-07 15:22:40 +0200
  • 38ad1e4db8
    Add OSX/IOS cross-compilation test to Travis CI Martin Kroeker 2019-10-07 13:57:01 +0200
  • 76c2bf6c8a
    Add automatic fixups for OSX/IOS cross-compilation Martin Kroeker 2019-10-07 13:54:47 +0200
  • 5b65adc5ff
    Merge pull request #10 from xianyi/develop Martin Kroeker 2019-10-07 13:52:34 +0200
  • d2093a40d3
    Merge pull request #2277 from martin-frbg/issue2275 Martin Kroeker 2019-10-06 23:01:54 +0200
  • aa04b0925e
    Merge pull request #2276 from xianyi/revert-2272-thread-sqrt-of-negative Martin Kroeker 2019-10-06 11:12:44 +0200
  • 258ac56e0a
    Move 32bit OSX build back to xcode 8.3 but switch to gcc8 Martin Kroeker 2019-10-05 10:52:47 +0200
  • 56837e9d92
    Make local labels in macro compatible with the xcode assembler Martin Kroeker 2019-10-04 14:53:23 +0200
  • bb5413863f
    Rewrite ARM64 PROLOGUE to make it compatible with xcode/ios Martin Kroeker 2019-10-04 14:50:03 +0200
  • 32f5907fef
    Update 32bit macOS again to xcode 9.3 Martin Kroeker 2019-10-03 01:09:02 +0200
  • ac10236cc8
    Update the OSX BINARY=32 test to xcode9.2 Martin Kroeker 2019-10-02 22:35:34 +0200
  • 8617d75548
    Revert "Avoid taking root of negative number in symv_thread.c" revert-2272-thread-sqrt-of-negative Martin Kroeker 2019-10-01 23:50:41 +0200
  • 4cb4738f31
    Fix source typo luz.paz 2019-09-30 09:10:19 -0400
  • ec7ab144b4
    Fix various source comment typos luz.paz 2019-09-30 09:08:37 -0400
  • c07d78b9e9
    Merge pull request #2272 from seberg/thread-sqrt-of-negative Martin Kroeker 2019-09-30 11:27:29 +0200
  • 6355c25dde
    Avoid taking root of negative number in symv_thread.c Sebastian Berg 2019-09-29 22:03:12 -0700
  • 5e244d80f2
    Merge pull request #2271 from quickwritereader/strmm_fix Martin Kroeker 2019-09-29 13:53:45 +0200
  • ede5efebab
    trmm fix AbdelRauf 2019-09-29 02:27:50 +0000
  • 84908d60d2
    Merge pull request #2269 from martin-frbg/ppc-fixes Martin Kroeker 2019-09-27 09:52:19 +0200
  • 596a22325a
    Fix prologue of power9 assembly cdot(c) kernel to provide cdotc Martin Kroeker 2019-09-27 00:47:18 +0200
  • 7f58f3ad0e
    Fix mis-edits in the gcc-derived power8 caxpy kernel Martin Kroeker 2019-09-27 00:44:26 +0200
  • c0d570a357
    Merge pull request #7 from xianyi/develop Martin Kroeker 2019-09-27 00:42:32 +0200
  • 6b83079368
    Count cpu cores on ARMV8 and use that to pick the GEMM_PQ parameters (#2267) Martin Kroeker 2019-09-25 23:13:24 +0200
  • aeaf7129a1
    Count cpu cores on ARMV8 and use that to pick the GEMM_PQ parameters Martin Kroeker 2019-09-25 00:08:22 +0200
  • 673e5a0495
    Replace several POWER8/9 C kernels with their gcc7-generated assembly versions (#2263) Martin Kroeker 2019-09-22 22:35:22 +0200
  • 8c6638ba7c
    Disable POWER9 with old gcc versions Martin Kroeker 2019-09-21 23:58:20 +0200
  • 9f8da7e5ef
    Use PROLOGUE macro to ensure correct function name for DYNAMIC_ARCH Martin Kroeker 2019-09-21 23:57:21 +0200
  • ba9c0ab673
    Update Makefile.system Martin Kroeker 2019-09-21 14:11:36 +0200
  • 471e57ecd3
    Exclude POWER9 from DYNAMIC_ARCH when gcc versions is lower than 6 Martin Kroeker 2019-09-21 13:32:00 +0200
  • 0ae691bd0d
    Use prebuilt assembly for POWER9 cdot Martin Kroeker 2019-09-21 10:56:28 +0200
  • 15c5a013d0
    Add gcc7-generated assembly cdot for POWER9 Martin Kroeker 2019-09-21 10:54:38 +0200
  • 4427ffe8b2
    Handle CONJ define for caxpyc Martin Kroeker 2019-09-20 21:52:45 +0200
  • 0f942a0fd6
    Handle CONJ define for caxpyc Martin Kroeker 2019-09-20 16:13:13 +0200
  • bfa2cc7d64
    Restore ppc64 CI job and remove the travis_wait that caused the problem with it Martin Kroeker 2019-09-20 10:29:35 +0200
  • 9e7e5c1185
    Add gcc7-generated assembler version of caxpy for power8 Martin Kroeker 2019-09-19 22:18:43 +0200
  • b50f50168c
    Use gcc-generated assembly instead of the original C source Martin Kroeker 2019-09-19 22:17:11 +0200
  • 97e754a826
    Use gcc-generated assembly instead of original C sources Martin Kroeker 2019-09-19 22:15:04 +0200
  • 9ec741dbb4
    Add gcc7-generated assembly files for POWER8/9 isa/ica-min/max and POWER9 caxpy Martin Kroeker 2019-09-19 22:06:59 +0200