Commit Graph

  • 7aa1ff8ff6
    Fix build on FreeBSD/powerpc64le pkubaj 2021-01-01 21:19:57 +0000
  • d6c97cf010
    Merge pull request #3052 from ashwinyes/arm64_fix_nrm2 Martin Kroeker 2021-01-01 15:51:07 +0100
  • 1b2508362b arm64: Fix nrm2 for input vectors with Inf Ashwin Sekhar T K 2021-01-01 02:09:40 -0800
  • ca3f7bad1f Enable zhbmv smp implementation. [zhbmv_smp] Zhang Xianyi 2020-12-31 10:05:00 +0800
  • cd898af59f
    Merge pull request #3050 from aurel32/riscv64-openblas-supported Martin Kroeker 2020-12-29 21:59:40 +0100
  • 0a535e58d8 getarch.c: define OPENBLAS_SUPPORTED for riscv64 Aurelien Jarno 2020-12-29 12:06:39 +0000
  • 9ce9e295fe
    Merge pull request #3049 from martin-frbg/readme Martin Kroeker 2020-12-27 22:54:20 +0100
  • 9a38592c79
    Add pointers to the netlib documentation and Gilbert Strang's linear algebra primers Martin Kroeker 2020-12-27 21:55:08 +0100
  • 9b3965b08c
    Merge pull request #6 from xianyi/develop Martin Kroeker 2020-12-27 21:28:10 +0100
  • 531cb4f673
    Merge pull request #3035 from Joshua-Ashton/patch-1 Martin Kroeker 2020-12-27 21:26:52 +0100
  • 3559c5d7a2
    Merge pull request #3048 from martin-frbg/issue2998 Martin Kroeker 2020-12-21 13:30:08 +0100
  • 8631e2976a
    Temporarily revert to the old nrm2 kernels Martin Kroeker 2020-12-21 07:45:13 +0100
  • 2768bc1764
    Temporarily revert to the old nrm2 kernels Martin Kroeker 2020-12-21 07:42:51 +0100
  • 6f4698ee1f
    Temporarily revert to the old nrm2 kernel Martin Kroeker 2020-12-21 07:41:18 +0100
  • 85e5165e98
    Merge pull request #3046 from martin-frbg/nvidiasdk-ppc Martin Kroeker 2020-12-20 11:55:53 +0100
  • 17c16f2a71
    Implement builtin_cpu_is and limit cpu choices to P8 and P9 for NVIDIA compilers Martin Kroeker 2020-12-19 23:21:22 +0100
  • 91c3f86c2b
    NVIDIA compiler does not yet support POWER10 Martin Kroeker 2020-12-19 23:19:05 +0100
  • 75b1f3becc
    Limit POWERPC DYNAMIC_CORE list to P8 and P9 for NVIDIA compilers Martin Kroeker 2020-12-19 23:17:40 +0100
  • 07c5e549b2
    Merge pull request #3045 from martin-frbg/nvidiasdk Martin Kroeker 2020-12-19 23:14:02 +0100
  • 114eb159a4
    Disable FMA intrinsics in the srot kernel when the compiler is PGI/NVIDIA Martin Kroeker 2020-12-19 22:15:58 +0100
  • 005cce5507
    Amend SkylakeX options to support the NVIDIA compiler Martin Kroeker 2020-12-19 22:11:49 +0100
  • b859b6e79d
    Add nvfortran Martin Kroeker 2020-12-19 22:09:57 +0100
  • b212a2fb9f
    Add/modify "PGI" compiler options for NVIDIA SDK 20.11 Martin Kroeker 2020-12-19 22:08:37 +0100
  • e40416567a
    Add version printout for PGI/NVIDIA compiler Martin Kroeker 2020-12-19 22:06:56 +0100
  • b37e5fa2f8
    Merge pull request #5 from xianyi/develop Martin Kroeker 2020-12-19 20:11:06 +0100
  • 326469ef4a
    Merge pull request #3042 from martin-frbg/develop Martin Kroeker 2020-12-19 20:04:19 +0100
  • a3cac9cca0 Update sgemm kernel 1x4 for C910. Xianyi Zhang 2020-12-18 11:53:23 +0800
  • c73d8ee40d
    Conditionally add -mfma to compiler options where needed Martin Kroeker 2020-12-17 11:34:05 +0100
  • abef2ea770
    Move -fma option setting to kernel/Makefile.L1 Martin Kroeker 2020-12-17 11:32:27 +0100
  • b26e32c3af
    Merge pull request #3040 from martin-frbg/fixfcheck Martin Kroeker 2020-12-16 00:05:04 +0100
  • 7822eff936
    Merge pull request #3038 from martin-frbg/issue3037 Martin Kroeker 2020-12-16 00:04:45 +0100
  • 865676682d
    Add Intel Rocket Lake Martin Kroeker 2020-12-14 22:40:23 +0100
  • 0f7776af0b
    Add Intel Rocket Lake Martin Kroeker 2020-12-14 22:30:36 +0100
  • b03dc011be
    Fix undefined CC variable in clang check Martin Kroeker 2020-12-14 19:21:52 +0100
  • 77460ac255 Fix gemm_batch bug for SMALL_MATRIX_OPT=1. [small_matrices] Zhang Xianyi 2020-12-12 18:59:07 +0800
  • 88e6806e3f Init cblas_?gemm_batch implementation. Zhang Xianyi 2020-12-12 17:05:14 +0800
  • 00ce35336e
    Fix spurious removal of a trailing character from the hostarch string on x86_64 Martin Kroeker 2020-12-13 21:28:01 +0100
  • 723776ddf7
    Merge pull request #4 from xianyi/develop Martin Kroeker 2020-12-13 21:22:41 +0100
  • 5a77ec7f1c
    Merge pull request #3036 from RajalakshmiSR/p10copyalign Martin Kroeker 2020-12-13 21:21:34 +0100
  • 2fb11f873b POWER10: Improve copy performance Rajalakshmi Srinivasaraghavan 2020-12-13 10:41:45 -0600
  • ad63647446
    Define BLAS acronym in README Joshie 2020-12-13 09:06:14 +0000
  • 87315e8a8d
    Update version to 0.3.13.dev Martin Kroeker 2020-12-12 23:28:49 +0100
  • 9031ebd7d5
    Update version to 0.3.13.dev Martin Kroeker 2020-12-12 23:28:20 +0100
  • 12b41d5598
    Merge pull request #3034 from xianyi/release-0.3.0 Martin Kroeker 2020-12-12 23:27:40 +0100
  • d2b11c4777
    Merge pull request #3033 from xianyi/develop [v0.3.13] Martin Kroeker 2020-12-12 18:19:29 +0100
  • 7bc0e4a2e0
    Update version to 0.3.13 for release Martin Kroeker 2020-12-12 18:15:33 +0100
  • d3ec787f77
    Update version to 0.3.13 for release Martin Kroeker 2020-12-12 18:14:49 +0100
  • 2c309c235d
    Merge pull request #3031 from martin-frbg/changelog13 Martin Kroeker 2020-12-12 18:13:23 +0100
  • 3dec81200c
    Update Changelog.txt Martin Kroeker 2020-12-12 14:27:37 +0100
  • 737724607f
    Merge pull request #3030 from martin-frbg/fix2994 Martin Kroeker 2020-12-12 10:01:45 +0100
  • 77edf82c7f
    Update Changelog.txt for 0.3.13 Martin Kroeker 2020-12-12 01:25:20 +0100
  • 6232237dba
    Make fallback from P10 to P9 conditional on suitable compiler Martin Kroeker 2020-12-11 23:41:17 +0100
  • 7d81acc762
    Merge pull request #3 from xianyi/develop Martin Kroeker 2020-12-11 23:38:42 +0100
  • 18d8a67485
    Merge pull request #2994 from antonblanchard/power10-fixes Martin Kroeker 2020-12-11 23:37:30 +0100
  • 043128cbe5
    Merge pull request #3029 from RajalakshmiSR/axpyp10 Martin Kroeker 2020-12-10 22:49:28 +0100
  • 3331ca492d
    Merge pull request #3021 from austinpagan/trsm_p10 Martin Kroeker 2020-12-10 19:42:54 +0100
  • 346e30a46a POWER10: Improve axpy performance Rajalakshmi Srinivasaraghavan 2020-12-10 11:51:42 -0600
  • 83de62c20d
    Merge pull request #3026 from martin-frbg/revert747 Martin Kroeker 2020-12-10 16:29:41 +0100
  • 658da9a769
    Merge pull request #3027 from gxw-loongson/develop Martin Kroeker 2020-12-10 16:27:30 +0100
  • be24c66a7c Keep LOONGSON3A and LOONGSON3B for loongson gxw 2020-12-10 10:48:53 +0800
  • 4b548857d6 Add msa support for loongson gxw 2020-11-26 14:59:41 +0800
  • d71fe4ed4e
    Remove GEMM_DEFAULT_UNROLL_MN parameters for Haswell and ZEN (introduced in PR747) Martin Kroeker 2020-12-08 21:07:57 +0100
  • a554712439
    remove extra/intermediate size step for min_jj introduced in PR747 Martin Kroeker 2020-12-08 21:01:36 +0100
  • 5d26223f4a
    remove extra/intermediate size step of min_jj from PR747 Martin Kroeker 2020-12-08 20:59:56 +0100
  • 980ab349bc
    Merge pull request #2 from xianyi/develop Martin Kroeker 2020-12-08 20:53:35 +0100
  • d67babf345 Remove gcc unrecognized option '-msched-weight' when check msa gxw 2020-12-08 19:16:39 +0800
  • 7f11e33e8d
    Merge pull request #3025 from TiredNotTear/develop Martin Kroeker 2020-12-08 09:39:27 +0100
  • 7834c10e2f Add PingTouGe contribution credit. [ck860v] Xianyi Zhang 2020-12-07 16:55:05 +0800
  • 53e0837809
    Merge pull request #3022 from jinboson/develop Martin Kroeker 2020-12-07 08:09:11 +0100
  • ad38bd0e89 Fix failed cgemv and zgemv test case after using msa optimization Hao Chen 2020-12-07 10:18:51 +0800
  • 47b639cc9b Fix failed sswap and dswap case by using msa optimization Hao Chen 2020-12-07 10:04:00 +0800
  • 8fef5876d1
    Merge pull request #3024 from martin-frbg/sparc Martin Kroeker 2020-12-06 22:34:36 +0100
  • 6c7d557a16
    Fix compiler options for 32 and 64bit SPARC builds with SolarisStudio Martin Kroeker 2020-12-06 19:20:50 +0100
  • b660008c7e
    Work around DOT and SWAP test failures Martin Kroeker 2020-12-06 19:15:37 +0100
  • f8346603cf
    Fix compilation with SolarisStudio Martin Kroeker 2020-12-06 19:14:16 +0100
  • 93473174d6
    Fix utest build with SolarisStudio compilers Martin Kroeker 2020-12-06 19:12:56 +0100
  • b0b14f4e9b
    Change comments to C style for compatibility Martin Kroeker 2020-12-06 19:12:02 +0100
  • 3a1b1b7c8c
    Fix complex ABI for 32bit SolarisStudio builds Martin Kroeker 2020-12-06 19:08:43 +0100
  • da6d5d675c
    Fix hostarch detection for sparc Martin Kroeker 2020-12-06 19:07:45 +0100
  • 04fa17322c
    Fix build options for SolarisStudio compilers Martin Kroeker 2020-12-06 19:05:27 +0100
  • 3853014ea1
    Merge pull request #1 from xianyi/develop Martin Kroeker 2020-12-06 18:52:51 +0100
  • 65de6f5957 Fix test errors reported by cblas_cgemm & cblas_ctrmm Jin Bo 2020-12-05 15:06:12 +0800
  • 213c0e7abb Added special unrolled vectorized versions of "Solve" for specific sizes, in DTRSM and STRSM, to improve performance in Power9 and Power10. Gordon Fossum 2020-12-04 17:07:06 -0600
  • f21618684b
    Merge pull request #3018 from martin-frbg/issue3015 Martin Kroeker 2020-12-04 22:08:17 +0100
  • 441c08c9ff
    Merge pull request #3016 from xiegengxin/complex-asum Martin Kroeker 2020-12-04 22:07:16 +0100
  • 66302b3c06
    Merge pull request #3013 from martin-frbg/gcc46 Martin Kroeker 2020-12-04 08:54:11 +0100
  • 07e9a12349
    Merge pull request #3011 from cyyever/fix_link Martin Kroeker 2020-12-04 08:50:59 +0100
  • dd1adbdec4
    Merge pull request #3019 from RajalakshmiSR/dgemm_param Martin Kroeker 2020-12-04 08:49:28 +0100
  • a1eecccda2
    Update f_check Martin Kroeker 2020-12-03 23:43:17 +0100
  • 41fe6e864e POWER10: Update param.h Rajalakshmi Srinivasaraghavan 2020-12-03 14:40:11 -0600
  • 74b5850581
    Add libomp to the LAPACK(-test) dependencies in clang/gfortran builds Martin Kroeker 2020-12-03 21:28:10 +0100
  • da0c94c76f
    Avoid linking both GNU libgomp and LLVM libomp in clang/gfortran builds Martin Kroeker 2020-12-03 21:25:57 +0100
  • a6692dc129
    use gfortran-10 with xcode 12 Martin Kroeker 2020-12-03 14:32:21 +0100
  • 72a553f5bc
    Update .travis.yml Martin Kroeker 2020-12-03 09:17:27 +0100
  • dcbb3b5ef1
    fix misplaced lines Martin Kroeker 2020-12-02 23:13:13 +0100
  • 57456c248b
    fix gfortran requirement in osx interface64 test Martin Kroeker 2020-12-02 15:56:21 +0100
  • c361313564
    Disable deprecated 32bit xcode Martin Kroeker 2020-12-02 07:49:43 +0100
  • 0cb7a403b2 fix error declare function blas_level1_thread_with_return_value Gengxin Xie 2020-12-02 09:51:52 +0800
  • 77a538d4ba
    Update an overlooked instance of xcode 10.0 as well Martin Kroeker 2020-12-01 22:05:35 +0100
  • 9621062eba
    Update OSX xcode version to 11.5 Martin Kroeker 2020-12-01 12:23:30 +0100