Commit Graph

  • 317ff27cda POWER10: Avoid setting accumulators to zero in gemm kernels Rajalakshmi Srinivasaraghavan 2020-08-28 10:42:54 -05:00
  • 4130d1732e Refs #2587 fix small matrix c/zgemm bug. Xianyi Zhang 2020-08-28 22:36:36 +08:00
  • 255b6dd0fa Merge branch 'develop' into small_matrices Xianyi Zhang 2020-08-28 21:38:58 +08:00
  • 741d6c5cb8 Refs #2587 Add small matrix optimization reference kernel for c/zgemm. Xianyi Zhang 2020-08-28 21:00:54 +08:00
  • 514a3d7d63 Merge pull request #2798 from kadler/aix-cpuid Martin Kroeker 2020-08-28 08:30:59 +02:00
  • 085aae8bdb Fix compile error on AIX cpuid detection Kevin Adler 2020-08-27 23:08:33 -05:00
  • 712ca43069 Change a1b0 gemm to b0 gemm. Xianyi Zhang 2020-08-28 07:55:27 +08:00
  • de63675717 Add early returns and fix sign errors in workspace calculations Martin Kroeker 2020-08-27 11:25:18 +02:00
  • d64cc2be81 Add early returns Martin Kroeker 2020-08-27 11:22:50 +02:00
  • c9b67141f0 Add early returns Martin Kroeker 2020-08-27 11:20:31 +02:00
  • 6797a3a1e0 Add early returns Martin Kroeker 2020-08-27 11:15:12 +02:00
  • 936966a42c Make ILAENV and xGETRF2 functions available Martin Kroeker 2020-08-27 10:59:08 +02:00
  • 5c6c2cd4f6 Merge pull request #2775 from Guobing-Chen/Fix_OMP_threads_specify Martin Kroeker 2020-08-24 20:18:09 +02:00
  • e54be4ba1c Merge pull request #2792 from pkubaj/patch-1 Martin Kroeker 2020-08-24 08:03:39 +02:00
  • 48a1364e10 Add aliases for armv6, armv7 pkubaj 2020-08-23 18:50:19 +00:00
  • 0c1c903f1e Fix OMP num specify issue Chen, Guobing 2020-08-12 03:28:25 +08:00
  • a073fa870e Merge pull request #2791 from martin-frbg/issue2787 Martin Kroeker 2020-08-23 19:33:03 +02:00
  • b2053239fc Fix missing dummy parameter (imag part of alpha) of zdot_thread_function Martin Kroeker 2020-08-23 15:08:16 +02:00
  • b11bb6e728 Merge pull request #2790 from martin-frbg/issue2789 Martin Kroeker 2020-08-23 14:42:35 +02:00
  • 1840bc5b52 Add OpenMP dependency to pkgconfig file if needed Martin Kroeker 2020-08-22 13:55:18 +02:00
  • 7c0977c267 Add OpenMP dependency to pkgconfig file if needed Martin Kroeker 2020-08-22 13:53:44 +02:00
  • fb3d80c42a Merge pull request #78 from xianyi/develop Martin Kroeker 2020-08-22 13:52:29 +02:00
  • 9ee21a0a39 Merge pull request #2780 from Guobing-Chen/CPL_build_support Martin Kroeker 2020-08-20 19:54:29 +02:00
  • bd3207b4b4 Update system.cmake Martin Kroeker 2020-08-19 22:51:10 +02:00
  • b8ebfc9335 Update system.cmake Martin Kroeker 2020-08-19 22:30:19 +02:00
  • 7c1986640b fallback from cooperlake to skylake if gcc<10 Martin Kroeker 2020-08-19 20:48:39 +02:00
  • 71d33c952d Typo fix Martin Kroeker 2020-08-19 17:44:23 +02:00
  • 6a3c074786 -march=cooperlake requires gcc10 Martin Kroeker 2020-08-19 17:22:12 +02:00
  • 430f741b30 -march=cooperlake requires gcc10 Martin Kroeker 2020-08-19 17:17:53 +02:00
  • 6f4dc7445d Fix typo Martin Kroeker 2020-08-19 16:36:55 +02:00
  • 81fbe8d088 -march=cooperlake only available in gcc >= 10 Martin Kroeker 2020-08-19 16:10:15 +02:00
  • bb9cf766f5 make march=cooperlake option conditional on gcc >= 10.1 Martin Kroeker 2020-08-19 15:06:30 +02:00
  • 75eeb265d7 [WIP] Refactor the driver code for direct SGEMM (#2782) Martin Kroeker 2020-08-19 14:51:09 +02:00
  • 2c72972570 Merge pull request #2785 from albertziegenhagel/always-generate-pkg-config Martin Kroeker 2020-08-19 14:42:58 +02:00
  • 6b731d917f Do not require pkg-config to generate the *.pc file Albert Ziegenhagel 2020-08-18 08:48:48 +02:00
  • 5dcf47cd97 Merge pull request #2784 from martin-frbg/issue2783 Martin Kroeker 2020-08-17 19:06:13 +02:00
  • aa286e301b Add typedef for bfloat16 if needed Martin Kroeker 2020-08-17 15:32:14 +02:00
  • 9f0ef9cdfc Merge pull request #77 from xianyi/develop Martin Kroeker 2020-08-17 15:28:15 +02:00
  • 6bfc66663c revert Martin Kroeker 2020-08-17 15:20:41 +02:00
  • a8c6fb9e1c revert Martin Kroeker 2020-08-17 15:20:16 +02:00
  • 5ec8f716cf revert Martin Kroeker 2020-08-17 15:19:40 +02:00
  • 82f8a0aeba Update .drone.yml Martin Kroeker 2020-08-15 15:46:18 +02:00
  • d57d503c15 Update Makefile Martin Kroeker 2020-08-15 14:46:26 +02:00
  • 37ac23e8a3 Add simple MT sgemm precision test and INTERFACE64 build Martin Kroeker 2020-08-15 13:38:05 +02:00
  • 6a93e3b2ba Add simple sgemm precision test Martin Kroeker 2020-08-15 13:33:52 +02:00
  • 47ce1dd08f Update gemm64.cpp Martin Kroeker 2020-08-15 13:31:28 +02:00
  • f5fcc5baec Add trivial gemm test for multithread consistency Martin Kroeker 2020-08-15 13:30:29 +02:00
  • 597010a968 Fix incorrect argument to SLASET Martin Kroeker 2020-08-14 00:41:56 +02:00
  • d64f1ef26b Fix incorrect argument to SLASET Martin Kroeker 2020-08-14 00:40:24 +02:00
  • c62aad62e5 Fix incorrect calls to DLASET Martin Kroeker 2020-08-14 00:35:45 +02:00
  • e740c4873d Enable COOPERLAKE build target Chen, Guobing 2020-08-13 06:17:34 +08:00
  • efdd237a91 Add a dedicated POWER9 build to the Travis CI (#2774) Martin Kroeker 2020-08-12 23:08:38 +02:00
  • 4573cb2f43 Merge pull request #2765 from martin-frbg/issue2760 Martin Kroeker 2020-08-11 22:40:17 +02:00
  • 2a4bb797db Merge pull request #2773 from martin-frbg/issue2770 Martin Kroeker 2020-08-11 21:02:55 +02:00
  • cbbe38bb88 Merge pull request #2772 from mhillenibm/s390x_gemm_tuning Martin Kroeker 2020-08-11 18:14:09 +02:00
  • 619343278d Fix mishandling of NO_CBLAS=0 and NO_LAPACKE=0 Martin Kroeker 2020-08-11 13:40:40 +02:00
  • fee361ae64 fix another source of NO_CBLAS=0 surprise Martin Kroeker 2020-08-11 13:27:19 +02:00
  • 62f4c84f27 Merge pull request #76 from xianyi/develop Martin Kroeker 2020-08-11 13:25:12 +02:00
  • e115c97e05 s390x/SGEMM: adjust default P and Q to multiples of M Marius Hillenbrand 2020-08-11 12:55:59 +02:00
  • 07c334e7be s390x: Factor out small block sizes for SGEMM/DGEMM on z14 Marius Hillenbrand 2020-08-11 12:55:53 +02:00
  • e2828e30aa s390x: Optimize SGEMM/DGEMM blocks for z14 with explicit loop unrolling/interleaving Marius Hillenbrand 2020-08-11 12:55:42 +02:00
  • 7219c9cb87 Merge pull request #2764 from martin-frbg/lapacktests Martin Kroeker 2020-08-10 13:27:51 +02:00
  • c9d32674ea Add memory barrier to the blas_lock implementation for Linux Martin Kroeker 2020-08-09 19:17:04 +02:00
  • 64259d521a Fix use of unallocated array in workspace query and wrong type of argument to xSCAL Martin Kroeker 2020-08-09 13:02:27 +02:00
  • 6f5ca44c1a Expand TAU array as SGEMQR/DGEMQR read elements 2 and 3 Martin Kroeker 2020-08-09 12:59:20 +02:00
  • d28b3f2776 Create Jenkinsfile for OSUOSL PowerCI Martin Kroeker 2020-08-08 18:05:20 +02:00
  • ba3f7b3acf Merge pull request #2761 from RajalakshmiSR/Makefile_err Martin Kroeker 2020-08-08 12:20:04 +02:00
  • 475b5c95b9 Remove extra symbol in Makefile Rajalakshmi Srinivasaraghavan 2020-08-07 15:27:44 -05:00
  • cd60080d4a Merge pull request #2758 from martin-frbg/undef_shift Martin Kroeker 2020-08-03 23:30:26 +02:00
  • 4847bfdddd Merge pull request #2757 from martin-frbg/cmake64 Martin Kroeker 2020-08-02 23:05:21 +02:00
  • 81dcfdcf39 Multiply by 2 instead of left-shifting a potentially negative number Martin Kroeker 2020-08-02 18:29:56 +02:00
  • 0ef4b3f1f2 Multiply instead of doing a left shift of a potentially negative number Martin Kroeker 2020-08-02 18:27:40 +02:00
  • aa53a8a5cb Multiply by two instead of left-shifting one place Martin Kroeker 2020-08-02 18:25:09 +02:00
  • aa3a1e7d8c Multiply by two rather than left shift by one place Martin Kroeker 2020-08-02 18:22:31 +02:00
  • aaf1a17168 Apply current library name suffix Martin Kroeker 2020-08-02 17:58:33 +02:00
  • 53add6a80d Apply library name suffix to openblas if any Martin Kroeker 2020-08-02 17:57:12 +02:00
  • 9eb897cc01 Merge pull request #75 from xianyi/develop Martin Kroeker 2020-08-02 17:50:06 +02:00
  • 7cead56258 Merge pull request #2753 from martin-frbg/issue2751 Martin Kroeker 2020-08-02 15:32:46 +02:00
  • 6794ac3415 Add SYMBOLPREFIX and/or -SUFFIX to cblas.h if needed Martin Kroeker 2020-08-02 11:20:08 +02:00
  • ecf4b9e0fc Improve substitution rules for SYMBOLPREFIX and -SUFFIX addition Martin Kroeker 2020-08-01 17:06:03 +02:00
  • dfe5d09641 Merge pull request #2756 from martin-frbg/issue2755 Martin Kroeker 2020-08-01 15:19:02 +02:00
  • 60cd5e55fc Protect against inadvertent activation of USE_CUDA Martin Kroeker 2020-08-01 12:31:39 +02:00
  • da9e2a7ada Add SYMBOLPREFIX and/or SYMBOLSUFFIX to cblas prototypes Martin Kroeker 2020-07-31 16:03:33 +02:00
  • c88cbc5e0d Merge pull request #2752 from kadler/cpuid_aix Martin Kroeker 2020-07-31 12:52:24 +02:00
  • 589c74aed3 Use systemcfg APIs for CPU detection on AIX Kevin Adler 2020-07-30 20:52:16 -05:00
  • 104aa678b0 Fix inadvertent version number reversal to 0.3.9.dev caused by #2710 Martin Kroeker 2020-07-30 11:40:52 +02:00
  • c6b48e0394 Merge pull request #2749 from martin-frbg/make_ppc Martin Kroeker 2020-07-30 11:35:53 +02:00
  • 4927251298 Merge pull request #2750 from RajalakshmiSR/dgemv_p10 Martin Kroeker 2020-07-30 10:13:19 +02:00
  • f77b6a83f4 dgemv optimization for POWER10 Rajalakshmi Srinivasaraghavan 2020-07-29 18:59:32 -05:00
  • 39724e8128 Separate OpenMP handling and allow compilation of Power9 code with older gcc Martin Kroeker 2020-07-30 01:14:08 +02:00
  • 525db5401c Merge pull request #74 from xianyi/develop Martin Kroeker 2020-07-30 01:04:09 +02:00
  • cb097beba2 Merge pull request #2741 from martin-frbg/issue2739 Martin Kroeker 2020-07-29 10:01:14 +02:00
  • 7c02f4b1f7 Merge pull request #2744 from martin-frbg/issue2738 Martin Kroeker 2020-07-28 19:32:04 +02:00
  • 383262035d Merge pull request #2740 from RajalakshmiSR/clang-power Martin Kroeker 2020-07-28 18:15:25 +02:00
  • 5fa581c87e Put hint to use git develop rather than master branch in README Martin Kroeker 2020-07-28 14:22:41 +00:00
  • 12918358aa Add AMD Renoir/Matisse and preliminary support for Zen3 as Zen2 Martin Kroeker 2020-07-28 13:53:17 +00:00
  • 200f5c44cc Add AMD Renoir models and preliminary support for ZEN3 as ZEN2 Martin Kroeker 2020-07-28 13:45:23 +00:00
  • 64e2e4aaf3 missing braces Martin Kroeker 2020-07-27 20:19:22 +00:00
  • 921ec4e9e2 Adjust A53 SGEMM parameters to reflect move to 8x8 kernel Martin Kroeker 2020-07-27 19:54:46 +00:00
  • d557584b71 Fix compilation issues with clang on POWER Rajalakshmi Srinivasaraghavan 2020-07-27 14:11:07 -05:00