Commit Graph

  • b766c1e9bb Improve the performance of zasum and casum with AVX512 intrinsic Gengxin Xie 2020-12-01 16:49:26 +0800
  • 22574b474e Suppress -mfma as well for gcc 4.6 Martin Kroeker 2020-11-30 21:41:51 +0100
  • f662022994 Move the version check to avoid overwriting unprocessed compiler data Martin Kroeker 2020-11-30 17:24:27 +0100
  • 5e81e81478 Merge pull request #3014 from RajalakshmiSR/dgemvnp10 Martin Kroeker 2020-11-30 08:18:24 +0100
  • 7d46e31de1 POWER10: Optimize dgemv_n Rajalakshmi Srinivasaraghavan 2020-11-29 15:28:28 -0600
  • 62a2eb884f Add SSE flags for x86 Martin Kroeker 2020-11-29 15:33:07 +0100
  • 2e99e2699b Add workaround for gcc 4.6 miscompiling assembly kernels with -mavx Martin Kroeker 2020-11-29 15:32:17 +0100
  • 006b13299f Merge pull request #3012 from martin-frbg/restore-getarch Martin Kroeker 2020-11-29 13:27:47 +0100
  • ca17d3dc3d Restore RISCV entries accidentally trashed by my PR 3005 Martin Kroeker 2020-11-29 13:19:51 +0100
  • 52ed2741c5 Merge pull request #3010 from ggouaillardet/topic/fj_compilers Martin Kroeker 2020-11-29 11:36:43 +0100
  • 3b4c016110 link math lib on FreeBSD cyy 2020-11-29 17:17:07 +0800
  • 358100ec15 add Fujitsu compilers Gilles Gouaillardet 2020-11-29 13:57:57 +0900
  • 3e6d10612e Do not pass -mavx for gcc 4.6 Martin Kroeker 2020-11-28 23:22:26 +0100
  • 24c52ff340 Add -msse2 Martin Kroeker 2020-11-27 19:45:56 +0100
  • 6e84391430 remove DYNAMIC_ARCH restriction on -msse3 Martin Kroeker 2020-11-27 13:30:38 +0100
  • 30db556daa export NO_AVX2 Martin Kroeker 2020-11-27 10:39:04 +0100
  • c903518e89 Downgrade HASWELL/ZEN targets to SANDYBRIDGE if no AVX2 support Martin Kroeker 2020-11-27 10:07:53 +0100
  • ca793f6dba Make -mavx2 -mfma conditional on compiler support Martin Kroeker 2020-11-27 10:05:47 +0100
  • 18a5520a3e Add check for pre-AVX2 gcc versions on x86 Martin Kroeker 2020-11-27 10:04:45 +0100
  • 953c4ae1ac remove quiet to debug piledriver build failure Martin Kroeker 2020-11-23 17:07:24 +0100
  • 9fb80b9e49 try to update the ancient binutils in Ubuntu Precise for fma support Martin Kroeker 2020-11-23 14:57:36 +0100
  • 3788b6d156 Merge pull request #3005 from martin-frbg/ssefix Martin Kroeker 2020-11-23 08:35:32 +0100
  • bc5b1ddf0d Merge pull request #3004 from martin-frbg/bsd_getauxval Martin Kroeker 2020-11-23 08:35:12 +0100
  • 2f42d23104 Merge pull request #3002 from martin-frbg/issue3000 Martin Kroeker 2020-11-22 22:51:26 +0100
  • b72dd007dc Merge pull request #3001 from martin-frbg/issue2996 Martin Kroeker 2020-11-22 22:50:41 +0100
  • 11ebe5fa25 Avoid redefinition warning Martin Kroeker 2020-11-22 21:16:07 +0100
  • 01f01dae98 Add -msse if supported Martin Kroeker 2020-11-22 21:15:08 +0100
  • e7bf8ced6c Build fix for systems that do not support getauxval Martin Kroeker 2020-11-22 20:20:28 +0100
  • 5df09f8452 define inf if needed Martin Kroeker 2020-11-22 19:35:43 +0100
  • c38bb5d516 Add utest for NRM2 behaviour with an inf value in the input Martin Kroeker 2020-11-22 19:08:40 +0100
  • 0256294921 Fix syntax mixup Martin Kroeker 2020-11-22 17:41:44 +0100
  • 2b114c3f30 Restore proper Makefile Martin Kroeker 2020-11-22 17:16:22 +0100
  • 60e1fddca7 Ensure that the same (large) BUFFERSIZE is used for all cpus in DYNAMIC_ARCH builds Martin Kroeker 2020-11-22 16:48:22 +0100
  • ebb8788696 Use ifneq instead of ifdef for CROSS option Martin Kroeker 2020-11-22 16:33:34 +0100
  • 857afcc41d Use ifeq instead of ifdef for user-definable build options Martin Kroeker 2020-11-22 16:31:44 +0100
  • 5fa305172a Use ifeq instead of ifdef for user-definable options Martin Kroeker 2020-11-22 16:29:56 +0100
  • d3ff1f889f Convert ifndefs to ifneq Martin Kroeker 2020-11-22 16:27:17 +0100
  • 65eb7afaf4 Change ifndef CROSS to ifneq Martin Kroeker 2020-11-22 16:25:36 +0100
  • 8a6b17f97d Change ifndefs to ifneq Martin Kroeker 2020-11-22 16:19:31 +0100
  • 0f863f96e4 Merge pull request #112 from xianyi/develop Martin Kroeker 2020-11-22 16:17:19 +0100
  • 437702e0e1 Merge pull request #2965 from epsilon-0/develop Martin Kroeker 2020-11-22 12:25:33 +0100
  • f1bf040b25 Merge pull request #2988 from xiegengxin/smp-asum Martin Kroeker 2020-11-22 12:24:13 +0100
  • 613e3b2baf Merge pull request #2997 from Flamefire/reproduce_crash Martin Kroeker 2020-11-22 12:22:57 +0100
  • 05a0ea2340 Merge branch 'risc-v' into develop Xianyi Zhang 2020-11-22 16:05:32 +0800
  • 7037849498 Merge branch 'develop' into risc-v Xianyi Zhang 2020-11-22 16:04:50 +0800
  • c6c9c24d1b Update doc for C910. Xianyi Zhang 2020-11-22 16:02:19 +0800
  • bed01f47c4 Cast arguments of `_mm512_abs_pd` to `__m512` Mosè Giordano 2020-11-21 15:02:59 +0000
  • 6dd71af0c3 Merge pull request #2995 from Flamefire/fix_thread_buffer_init Martin Kroeker 2020-11-20 09:42:10 +0100
  • a05dc6e62b Add reproducer test for crash after fork Alexander Grund 2020-11-19 15:24:57 +0100
  • 60005eb47b Don't overwrite blas_thread_buffer if already set Alexander Grund 2020-11-19 14:39:00 +0100
  • 043f3d6faa POWER10: Use POWER9 as a fallback Anton Blanchard 2020-11-19 21:04:10 +1100
  • fdf71d66b3 POWER10: Fix ld version detection Anton Blanchard 2020-11-19 20:50:42 +1100
  • 8917203ebd Update common_thread.h Martin Kroeker 2020-11-17 20:57:16 +0100
  • 1592c1f708 Compare environment variables for NUM_THREADS against compile-time maximum Martin Kroeker 2020-11-17 19:21:12 +0100
  • 4639c9ae4e Update common_thread.h Martin Kroeker 2020-11-17 18:49:59 +0100
  • 8116299631 Handle runtime OMP thread count exceeding build-time NUM_THREADS Martin Kroeker 2020-11-17 18:18:35 +0100
  • 26ce2705f1 reduce num_threads Martin Kroeker 2020-11-17 17:56:10 +0100
  • cfe35efbfb activate testcase Martin Kroeker 2020-11-17 15:43:42 +0100
  • 906b236388 typo Martin Kroeker 2020-11-17 15:18:04 +0100
  • c8a32d0a93 Add alternative OpenMP thread safety test from old issue 602 Martin Kroeker 2020-11-17 14:47:51 +0100
  • 1748f40cbb Add testcase from issue 602 Martin Kroeker 2020-11-17 14:45:20 +0100
  • e607d8de14 Add C version of testcase from issue 602 Martin Kroeker 2020-11-17 14:43:26 +0100
  • c1f52d3589 Add original testcase from issue 602 Martin Kroeker 2020-11-17 14:42:15 +0100
  • eead529d38 Create test_dgemm_f90.f Martin Kroeker 2020-11-17 14:41:29 +0100
  • 4293b4b654 Create test_dgemm_omp.c Martin Kroeker 2020-11-17 00:02:49 +0100
  • 7e9cb39a25 Merge pull request #2981 from Qiyu8/fix-sum Martin Kroeker 2020-11-16 08:40:46 +0100
  • be075d53cf Merge pull request #2983 from Qiyu8/optimize-srot Martin Kroeker 2020-11-16 08:38:37 +0100
  • b00a0de132 remove the -mfma flag in when the host has AVX. Qiyu8 2020-11-16 09:14:56 +0800
  • 1425abc276 Reduce the default BUFFERSIZE for x86_64 to its 0.3.9 value Martin Kroeker 2020-11-15 19:39:18 +0100
  • d341a0fea0 Merge pull request #2989 from martin-frbg/cmake-fma Martin Kroeker 2020-11-13 12:35:09 +0100
  • ec4d77c47c Add -mfma for HAVE_FMA3 in the non-DYNAMIC_ARCH case as well Martin Kroeker 2020-11-13 09:16:34 +0100
  • 02699226d0 Merge pull request #111 from xianyi/develop Martin Kroeker 2020-11-13 09:14:23 +0100
  • d6e7e05bb3 Improve the performance of dasum and sasum when SMP is defined Gengxin Xie 2020-11-13 14:20:52 +0800
  • ae0b1dea19 modify system.cmake to enable fma flag Qiyu8 2020-11-13 10:20:24 +0800
  • e0dac6b53b fix the CI failure of target specific option mismatch Qiyu8 2020-11-12 20:31:03 +0800
  • e5c2ceb675 fix the CI failure of lack the head Qiyu8 2020-11-12 17:35:17 +0800
  • a87e537b8c modify macro Qiyu8 2020-11-11 15:53:48 +0800
  • 5bc0a7583f only FMA3 and vector larger than 128 have positive effects. Qiyu8 2020-11-11 15:18:01 +0800
  • 8c0b206d4c Optimize the performance of rot by using universal intrinsics Qiyu8 2020-11-11 14:33:12 +0800
  • 7c71f9448f Revert "Lazyly reinit threads after a fork in OMP mode" Jonathan Ringer 2020-11-10 17:00:28 -0800
  • c4c591ac5a fix sum optimize issues Qiyu8 2020-11-10 16:16:38 +0800
  • 1ea6cfefdb Refs #2899. Merge branch 'damonyu1989-openblas-open-910' into risc-v Xianyi Zhang 2020-11-10 09:38:43 +0800
  • fc35b72ae1 Refs #2899 Merge branch 'openblas-open-910' of git://github.com/damonyu1989/OpenBLAS into damonyu1989-openblas-open-910 Xianyi Zhang 2020-11-10 09:38:04 +0800
  • 913cc9a4ca Merge branch 'develop' into risc-v Xianyi Zhang 2020-11-10 09:18:25 +0800
  • c3b0b2d59b Update .drone.yml Martin Kroeker 2020-11-09 22:12:20 +0100
  • 3532fbcad6 Update .drone.yml Martin Kroeker 2020-11-09 20:16:45 +0100
  • 9f57b7b8af Update .drone.yml Martin Kroeker 2020-11-09 19:16:06 +0100
  • d735454a9a add package for add-apt-repository command Martin Kroeker 2020-11-09 18:27:19 +0100
  • d81513ab7a update repo address Martin Kroeker 2020-11-09 17:51:02 +0100
  • 045a349437 add toolchain-test repo for gcc10 Martin Kroeker 2020-11-09 17:29:35 +0100
  • f491291269 try to update the Epyc build to gcc10 Martin Kroeker 2020-11-09 17:20:42 +0100
  • ff16329cb7 Merge pull request #2972 from xiegengxin/rot-intrinsic Martin Kroeker 2020-11-08 22:43:00 +0100
  • 433637ccd8 Merge pull request #2980 from martin-frbg/fixgetarch Martin Kroeker 2020-11-08 17:39:05 +0100
  • ec088bf33a Fix missing AVX2 and FMA3 capabilities in FORCE_target mode Martin Kroeker 2020-11-08 13:15:40 +0100
  • 110c7a6de0 Merge pull request #2979 from RajalakshmiSR/dot_power10 Martin Kroeker 2020-11-08 10:19:34 +0100
  • d2faa1be4e Merge pull request #2978 from martin-frbg/fixdynfeatures Martin Kroeker 2020-11-08 10:19:17 +0100
  • 1c4cfdc139 Stay compatible with old gmake that did not support undefine Martin Kroeker 2020-11-08 00:12:55 +0100
  • f6a57d8f63 Update Makefile.system Martin Kroeker 2020-11-08 00:01:36 +0100
  • f4b7ba12b7 Update Makefile.system Martin Kroeker 2020-11-07 23:37:21 +0100
  • 6e364981a8 Optimize sdot/ddot for POWER10 Rajalakshmi Srinivasaraghavan 2020-11-07 15:21:58 -0600