Commit Graph

  • f36862603a
    Merge pull request #3101 from jake-arkinstall/issue-3100 Martin Kroeker 2021-02-11 15:42:18 +0100
  • 47691c031f
    Use Haswell optimizations for Zen as well Martin Kroeker 2021-02-11 09:26:15 +0100
  • ce7ddd8921
    Use Haswell optimizations for Zen as well Martin Kroeker 2021-02-11 09:25:36 +0100
  • 950c047b49
    Use Haswell optimizations for Zen as well Martin Kroeker 2021-02-11 09:24:51 +0100
  • 46509953a9
    Use Haswell optimizations for Zen as well Martin Kroeker 2021-02-11 09:24:16 +0100
  • db348dcff2
    Enable optimized srot/drot kernels from Haswell Martin Kroeker 2021-02-11 09:23:05 +0100
  • a33f471065
    Merge pull request #3102 from martin-frbg/issue3099 Martin Kroeker 2021-02-11 08:56:46 +0100
  • ece3ce581e
    Strip parenthesized (pkgversion) data from GCC version string to avoid misinterpretation Martin Kroeker 2021-02-10 14:22:59 +0100
  • 8189a98d85
    Merge pull request #12 from xianyi/develop Martin Kroeker 2021-02-10 14:17:24 +0100
  • d7a77091a3
    Addressed issue #3100, removing an unnecessary write to the include directory Jake Arkinstall 2021-02-10 12:11:17 +0000
  • 3e1e74fca6
    Merge pull request #3094 from xoviat/patch-1 Martin Kroeker 2021-02-02 13:36:17 +0100
  • 33b5670122
    Merge pull request #3096 from martin-frbg/fixclangcmake Martin Kroeker 2021-02-02 13:33:15 +0100
  • 95e19e2e23
    fix case in compiler name check Martin Kroeker 2021-02-02 10:53:46 +0100
  • 99ac042702
    remove spurious lines (probably editor malfunction) Martin Kroeker 2021-02-01 21:02:53 +0100
  • 774b9f8653
    handle AppleClang in Cooperlake support condition Martin Kroeker 2021-02-01 20:18:53 +0100
  • eb1d2344f7
    Fix compiler version check for Intel Cooperlake support (clang-cl does not accept -dumpversion) Martin Kroeker 2021-02-01 19:45:25 +0100
  • 6fa9860dbe
    appveyor: cleanup and add openmp run xoviat 2021-01-30 21:28:12 -0600
  • 0cc36770f1
    Merge pull request #3073 from xoviat/embedded Martin Kroeker 2021-01-31 18:02:41 +0100
  • 558cd543bf
    Merge pull request #3093 from martin-frbg/fix3064 Martin Kroeker 2021-01-30 22:21:28 +0100
  • bd906e3410
    fix copy-paste error in build rules for cblas_crotg and cblas_zrotg Martin Kroeker 2021-01-30 16:46:25 +0100
  • 35086cb501
    Merge pull request #3092 from RajalakshmiSR/cscal_p10 Martin Kroeker 2021-01-30 16:23:37 +0100
  • 2056ffc227
    Optimize cscal function for POWER10 Rajalakshmi Srinivasaraghavan 2021-01-29 13:51:43 -0600
  • 7745439312
    Merge pull request #3091 from martin-frbg/lapack477-2 Martin Kroeker 2021-01-29 13:37:23 +0100
  • c4b5abbe43
    fix data type Martin Kroeker 2021-01-29 10:45:36 +0100
  • f87842483e
    fix calculation of non-exceptional shift (from Reference-LAPACK PR 477) Martin Kroeker 2021-01-29 09:56:12 +0100
  • 3dbb32c734
    Merge pull request #11 from xianyi/develop Martin Kroeker 2021-01-29 09:52:21 +0100
  • 609ea80276
    enable testing xoviat 2021-01-27 16:39:52 -0600
  • 3dfecaaf7c
    require nofortran to be set on msvc xoviat 2021-01-27 16:39:15 -0600
  • 3165c915b6
    fix test helpers xoviat 2021-01-27 15:24:49 -0600
  • 457ccc42c9
    Merge branch 'develop' into msvc xoviat 2021-01-27 14:15:59 -0600
  • 00880c720a
    Merge pull request #3087 from martin-frbg/lapack477 Martin Kroeker 2021-01-27 19:11:55 +0100
  • 856bc36533
    Add exceptional shift to fix rare convergence problems Martin Kroeker 2021-01-27 13:41:45 +0100
  • fe71887b68
    Merge pull request #10 from xianyi/develop Martin Kroeker 2021-01-27 13:39:26 +0100
  • 10094bd885
    Merge pull request #3076 from martin-frbg/dyn-thunderx Martin Kroeker 2021-01-27 13:25:45 +0100
  • eea0c0f2ed
    Merge pull request #3085 from alexhenrie/memory_alloc Martin Kroeker 2021-01-26 20:11:42 +0100
  • 85be43e0df
    Merge pull request #3083 from martin-frbg/develop Martin Kroeker 2021-01-26 15:13:35 +0100
  • 0cb9e9fc8d
    Remove the VORTEX support bits again for now Martin Kroeker 2021-01-25 19:02:21 +0100
  • cb61d3b46b
    Add DYNAMIC_LIST support for ARM64 Martin Kroeker 2021-01-25 13:13:20 +0100
  • 113840da12
    Fix null pointer check in blas_memory_alloc Alex Henrie 2021-01-24 22:20:44 -0700
  • deb2e66bcc
    Add DYNAMIC_LIST support for ARM64 Martin Kroeker 2021-01-24 23:18:52 +0100
  • 9b2d69aa80
    Add DYNAMIC_LIST option for ARM64 Martin Kroeker 2021-01-24 23:18:01 +0100
  • e3ff4cdd23
    Merge pull request #9 from xianyi/develop Martin Kroeker 2021-01-24 23:14:45 +0100
  • 0745ba43a4
    Merge pull request #3082 from RajalakshmiSR/scalp10 Martin Kroeker 2021-01-24 19:03:40 +0100
  • 3ede843d50
    Optimize s/dscal function for POWER10 Rajalakshmi Srinivasaraghavan 2021-01-24 07:48:28 -0600
  • 2e8d6e8690
    add functions for embedded xoviat 2021-01-23 22:12:17 -0600
  • 69a5558203
    Merge pull request #3059 from Guobing-Chen/BF16_gemm Martin Kroeker 2021-01-23 19:08:05 +0100
  • d6905403e3
    Merge pull request #3068 from alexhenrie/scan-build Martin Kroeker 2021-01-23 19:06:29 +0100
  • c56f8b3787
    dedup User User-User 2021-01-22 10:03:09 +0200
  • 411926b572
    Merge pull request #3079 from RajalakshmiSR/rotp10 Martin Kroeker 2021-01-22 08:26:00 +0100
  • 439b93f6d2
    Optimize s/drot function for POWER10 Rajalakshmi Srinivasaraghavan 2021-01-21 13:24:45 -0600
  • d6cf67778c
    Merge pull request #3075 from martin-frbg/issue3074 Martin Kroeker 2021-01-21 08:51:30 +0100
  • b94dab5250
    patch to support power10 in builtin_cpu_is was backported to gcc 10.2, so allow that as well Martin Kroeker 2021-01-20 21:34:36 +0100
  • 6178974cd9
    Update .drone.yml Martin Kroeker 2021-01-20 20:21:27 +0100
  • 0b9e4d1278
    Add gcc10/arm64 DYNAMIC_ARCH build Martin Kroeker 2021-01-20 18:30:05 +0100
  • 63fa3c3f8f
    Require gcc 11 for builtin_cpu_is(power10) Martin Kroeker 2021-01-20 15:41:04 +0100
  • 3612d9a57a
    Merge pull request #8 from xianyi/develop Martin Kroeker 2021-01-20 15:38:30 +0100
  • b60de4447a
    add cortex-m platform xoviat 2021-01-19 08:57:44 -0600
  • 16dddb760e
    Merge pull request #3070 from RajalakshmiSR/cdot Martin Kroeker 2021-01-16 15:47:34 +0100
  • eff7c9166e
    Optimize cdot function for POWER10 Rajalakshmi Srinivasaraghavan 2021-01-15 13:40:34 -0600
  • f1bf2603e6
    Remove dead assignment to dflag in rotmg functions Alex Henrie 2021-01-14 19:40:32 -0700
  • 6f32991eae
    Don't define the mode variable when not needed in gemm functions Alex Henrie 2021-01-14 19:40:31 -0700
  • 202fc9e8ed
    Fix uninitialized argument value in dasum_k Alex Henrie 2021-01-14 19:40:31 -0700
  • e378b24487
    Merge pull request #3067 from albertziegenhagel/fix-generic-cmake Martin Kroeker 2021-01-14 21:35:19 +0100
  • 3628b22d49
    Merge pull request #3064 from martin-frbg/issue3063 Martin Kroeker 2021-01-14 16:47:59 +0100
  • af2b0d0205
    Merge pull request #3066 from martin-frbg/buffsizefix Martin Kroeker 2021-01-14 16:00:38 +0100
  • 4bf988959a
    Merge pull request #3062 from austinpagan/GemmPreferedSize3 Martin Kroeker 2021-01-14 15:59:53 +0100
  • a0e4fb3a28
    Merge pull request #3061 from martin-frbg/arm64-pgi Martin Kroeker 2021-01-14 15:59:21 +0100
  • 2c445be8ba
    Merge pull request #3051 from martin-frbg/rocketlake Martin Kroeker 2021-01-14 15:56:25 +0100
  • e3f4063683
    Fix building "generic" TRMM kernel with CMake Albert Ziegenhagel 2021-01-14 10:00:49 +0100
  • 6bbe6d5b92
    Make compile-time BUFFERSIZE setting actually reach the compiler/preprocessor Martin Kroeker 2021-01-13 22:36:04 +0100
  • 89ae305e11
    Workaround for cmake having its own C_COMPILER variable Martin Kroeker 2021-01-13 12:30:26 +0100
  • da8d7f09f1
    try to work around gcc update problems Martin Kroeker 2021-01-13 09:46:53 +0100
  • 25c986db5a
    Add prototypes for CBLAS_CROTG and CBLAS_ZROTG Martin Kroeker 2021-01-13 00:30:27 +0100
  • a8f249458d
    Build CBLAS interfaces for CROTG and ZROTG as well Martin Kroeker 2021-01-13 00:29:38 +0100
  • bc5b35367f
    restore Makefile after accidental overwrite Martin Kroeker 2021-01-13 00:28:43 +0100
  • 930aff2c2e
    Build CBLAS interfaces for CROTG and ZROTG as well Martin Kroeker 2021-01-13 00:27:42 +0100
  • ac3e2a3fdd
    Add CBLAS interfaces for csrot and zdrot Martin Kroeker 2021-01-12 23:22:00 +0100
  • 9ccb12b031
    Add prototypes for cblas_csrot and cblas_zdrot Martin Kroeker 2021-01-12 23:20:07 +0100
  • e18a2c22db
    Merge pull request #3060 from martin-frbg/dyn_arm64 Martin Kroeker 2021-01-12 23:02:05 +0100
  • b716c0ef01
    Add workaround for NVIDIA HPC Martin Kroeker 2021-01-12 16:51:35 +0100
  • 2efa3b70dc
    Add workaround for NVIDIA HPC Martin Kroeker 2021-01-12 16:49:39 +0100
  • 49959d4f1c
    Add workaround for NVIDIA HPC Martin Kroeker 2021-01-12 16:47:15 +0100
  • 0f27a03607
    Add workaround for NVIDIA HPC mishandling of the asm DOT kernels Martin Kroeker 2021-01-12 16:39:35 +0100
  • c2a8ebfe69
    Add workaround for NVIDIA HPC mishandling of the asm DOT kernels Martin Kroeker 2021-01-12 16:38:51 +0100
  • 43aac5bacc
    Support NVIDIA HPC compiler Martin Kroeker 2021-01-12 16:36:12 +0100
  • bff2b7c94d
    Support compilation with NVIDIA HPC compilers (which do not take gcc-style arch options) Martin Kroeker 2021-01-12 16:34:18 +0100
  • 2d45a262d9
    Support compilation with nvfortran Martin Kroeker 2021-01-12 16:32:29 +0100
  • ed652d8136
    Added definitions for GEMM_PREFERED_SIZE and SWITCH_RATIO to the POWER9 and POWER10 specific sections of param.h. Gordon Fossum 2021-01-11 21:13:53 -0500
  • 6fe0f1fab9
    Label get_cpu_ftr as volatile to keep gcc from rearranging the code Martin Kroeker 2021-01-11 19:05:29 +0100
  • f725ef29d7
    Loop the OpenMP test 20 times Martin Kroeker 2021-01-10 23:14:14 +0100
  • b0beb0b1ca
    Initial code for Cooperlake BF16 GEMM kernel Chen, Guobing 2021-01-11 02:15:21 +0800
  • 5bcc7bcb0b
    add include path for cblas.h Martin Kroeker 2021-01-10 18:55:14 +0100
  • 14381868b0
    Add another OpenMP test for EPYC and ARM server Martin Kroeker 2021-01-10 17:16:25 +0100
  • f3ad15df5a
    Add another OpenMP test variant Martin Kroeker 2021-01-10 17:14:25 +0100
  • 0930b2bab4
    Add another OpenMP test Martin Kroeker 2021-01-10 17:11:45 +0100
  • f88a337f93
    Create test_gemm_omp.cc Martin Kroeker 2021-01-10 17:11:00 +0100
  • 018dec8588
    Merge pull request #7 from xianyi/develop Martin Kroeker 2021-01-10 17:09:46 +0100
  • 5d6209e1f9
    Merge pull request #3055 from RajalakshmiSR/swapp10 Martin Kroeker 2021-01-09 00:11:44 +0100
  • 601b711c78
    Optimize swap function for POWER10 Rajalakshmi Srinivasaraghavan 2021-01-08 08:01:36 -0600
  • 78702753f2
    Merge pull request #3053 from pkubaj/patch-1 Martin Kroeker 2021-01-02 16:14:07 +0100