Commit Graph

  • 6bbe6d5b92 Make compile-time BUFFERSIZE setting actually reach the compiler/preprocessor Martin Kroeker 2021-01-13 22:36:04 +01:00
  • 89ae305e11 Workaround for cmake having its own C_COMPILER variable Martin Kroeker 2021-01-13 12:30:26 +01:00
  • da8d7f09f1 try to work around gcc update problems Martin Kroeker 2021-01-13 09:46:53 +01:00
  • 25c986db5a Add prototypes for CBLAS_CROTG and CBLAS_ZROTG Martin Kroeker 2021-01-13 00:30:27 +01:00
  • a8f249458d Build CBLAS interfaces for CROTG and ZROTG as well Martin Kroeker 2021-01-13 00:29:38 +01:00
  • bc5b35367f restore Makefile after accidental overwrite Martin Kroeker 2021-01-13 00:28:43 +01:00
  • 930aff2c2e Build CBLAS interfaces for CROTG and ZROTG as well Martin Kroeker 2021-01-13 00:27:42 +01:00
  • ac3e2a3fdd Add CBLAS interfaces for csrot and zdrot Martin Kroeker 2021-01-12 23:22:00 +01:00
  • 9ccb12b031 Add prototypes for cblas_csrot and cblas_zdrot Martin Kroeker 2021-01-12 23:20:07 +01:00
  • e18a2c22db Merge pull request #3060 from martin-frbg/dyn_arm64 Martin Kroeker 2021-01-12 23:02:05 +01:00
  • b716c0ef01 Add workaround for NVIDIA HPC Martin Kroeker 2021-01-12 16:51:35 +01:00
  • 2efa3b70dc Add workaround for NVIDIA HPC Martin Kroeker 2021-01-12 16:49:39 +01:00
  • 49959d4f1c Add workaround for NVIDIA HPC Martin Kroeker 2021-01-12 16:47:15 +01:00
  • 0f27a03607 Add workaround for NVIDIA HPC mishandling of the asm DOT kernels Martin Kroeker 2021-01-12 16:39:35 +01:00
  • c2a8ebfe69 Add workaround for NVIDIA HPC mishandling of the asm DOT kernels Martin Kroeker 2021-01-12 16:38:51 +01:00
  • 43aac5bacc Support NVIDIA HPC compiler Martin Kroeker 2021-01-12 16:36:12 +01:00
  • bff2b7c94d Support compilation with NVIDIA HPC compilers (which do not take gcc-style arch options) Martin Kroeker 2021-01-12 16:34:18 +01:00
  • 2d45a262d9 Support compilation with nvfortran Martin Kroeker 2021-01-12 16:32:29 +01:00
  • ed652d8136 Added definitions for GEMM_PREFERED_SIZE and SWITCH_RATIO to the POWER9 and POWER10 specific sections of param.h. Gordon Fossum 2021-01-11 21:13:53 -05:00
  • 6fe0f1fab9 Label get_cpu_ftr as volatile to keep gcc from rearranging the code Martin Kroeker 2021-01-11 19:05:29 +01:00
  • b0beb0b1ca Initial code for Cooperlake BF16 GEMM kernel Chen, Guobing 2021-01-11 02:15:21 +08:00
  • 018dec8588 Merge pull request #7 from xianyi/develop Martin Kroeker 2021-01-10 17:09:46 +01:00
  • 5d6209e1f9 Merge pull request #3055 from RajalakshmiSR/swapp10 Martin Kroeker 2021-01-09 00:11:44 +01:00
  • 601b711c78 Optimize swap function for POWER10 Rajalakshmi Srinivasaraghavan 2021-01-08 08:01:36 -06:00
  • 78702753f2 Merge pull request #3053 from pkubaj/patch-1 Martin Kroeker 2021-01-02 16:14:07 +01:00
  • 7aa1ff8ff6 Fix build on FreeBSD/powerpc64le pkubaj 2021-01-01 21:19:57 +00:00
  • d6c97cf010 Merge pull request #3052 from ashwinyes/arm64_fix_nrm2 Martin Kroeker 2021-01-01 15:51:07 +01:00
  • 1b2508362b arm64: Fix nrm2 for input vectors with Inf Ashwin Sekhar T K 2021-01-01 02:09:40 -08:00
  • ca3f7bad1f Enable zhbmv smp implementation. (zhbmv_smp) Zhang Xianyi 2020-12-31 10:05:00 +08:00
  • cd898af59f Merge pull request #3050 from aurel32/riscv64-openblas-supported Martin Kroeker 2020-12-29 21:59:40 +01:00
  • 0a535e58d8 getarch.c: define OPENBLAS_SUPPORTED for riscv64 Aurelien Jarno 2020-12-29 12:06:39 +00:00
  • 9ce9e295fe Merge pull request #3049 from martin-frbg/readme Martin Kroeker 2020-12-27 22:54:20 +01:00
  • 9a38592c79 Add pointers to the netlib documentation and Gilbert Strang's linear algebra primers Martin Kroeker 2020-12-27 21:55:08 +01:00
  • 9b3965b08c Merge pull request #6 from xianyi/develop Martin Kroeker 2020-12-27 21:28:10 +01:00
  • 531cb4f673 Merge pull request #3035 from Joshua-Ashton/patch-1 Martin Kroeker 2020-12-27 21:26:52 +01:00
  • 3559c5d7a2 Merge pull request #3048 from martin-frbg/issue2998 Martin Kroeker 2020-12-21 13:30:08 +01:00
  • 8631e2976a Temporarily revert to the old nrm2 kernels Martin Kroeker 2020-12-21 07:45:13 +01:00
  • 2768bc1764 Temporarily revert to the old nrm2 kernels Martin Kroeker 2020-12-21 07:42:51 +01:00
  • 6f4698ee1f Temporarily revert to the old nrm2 kernel Martin Kroeker 2020-12-21 07:41:18 +01:00
  • 85e5165e98 Merge pull request #3046 from martin-frbg/nvidiasdk-ppc Martin Kroeker 2020-12-20 11:55:53 +01:00
  • 17c16f2a71 Implement builtin_cpu_is and limit cpu choices to P8 and P9 for NVIDIA compilers Martin Kroeker 2020-12-19 23:21:22 +01:00
  • 91c3f86c2b NVIDIA compiler does not yet support POWER10 Martin Kroeker 2020-12-19 23:19:05 +01:00
  • 75b1f3becc Limit POWERPC DYNAMIC_CORE list to P8 and P9 for NVIDIA compilers Martin Kroeker 2020-12-19 23:17:40 +01:00
  • 07c5e549b2 Merge pull request #3045 from martin-frbg/nvidiasdk Martin Kroeker 2020-12-19 23:14:02 +01:00
  • 114eb159a4 Disable FMA intrinsics in the srot kernel when the compiler is PGI/NVIDIA Martin Kroeker 2020-12-19 22:15:58 +01:00
  • 005cce5507 Amend SkylakeX options to support the NVIDIA compiler Martin Kroeker 2020-12-19 22:11:49 +01:00
  • b859b6e79d Add nvfortran Martin Kroeker 2020-12-19 22:09:57 +01:00
  • b212a2fb9f Add/modify "PGI" compiler options for NVIDIA SDK 20.11 Martin Kroeker 2020-12-19 22:08:37 +01:00
  • e40416567a Add version printout for PGI/NVIDIA compiler Martin Kroeker 2020-12-19 22:06:56 +01:00
  • b37e5fa2f8 Merge pull request #5 from xianyi/develop Martin Kroeker 2020-12-19 20:11:06 +01:00
  • 326469ef4a Merge pull request #3042 from martin-frbg/develop Martin Kroeker 2020-12-19 20:04:19 +01:00
  • a3cac9cca0 Update sgemm kernel 1x4 for C910. Xianyi Zhang 2020-12-18 11:53:23 +08:00
  • c73d8ee40d Conditionally add -mfma to compiler options where needed Martin Kroeker 2020-12-17 11:34:05 +01:00
  • abef2ea770 Move -fma option setting to kernel/Makefile.L1 Martin Kroeker 2020-12-17 11:32:27 +01:00
  • b26e32c3af Merge pull request #3040 from martin-frbg/fixfcheck Martin Kroeker 2020-12-16 00:05:04 +01:00
  • 7822eff936 Merge pull request #3038 from martin-frbg/issue3037 Martin Kroeker 2020-12-16 00:04:45 +01:00
  • 865676682d Add Intel Rocket Lake Martin Kroeker 2020-12-14 22:40:23 +01:00
  • 0f7776af0b Add Intel Rocket Lake Martin Kroeker 2020-12-14 22:30:36 +01:00
  • b03dc011be Fix undefined CC variable in clang check Martin Kroeker 2020-12-14 19:21:52 +01:00
  • 77460ac255 Fix gemm_batch bug for SMALL_MATRIX_OPT=1. (small_matrices) Zhang Xianyi 2020-12-12 18:59:07 +08:00
  • 88e6806e3f Init cblas_?gemm_batch implementation. Zhang Xianyi 2020-12-12 17:05:14 +08:00
  • 00ce35336e Fix spurious removal of a trailing character from the hostarch string on x86_64 Martin Kroeker 2020-12-13 21:28:01 +01:00
  • 723776ddf7 Merge pull request #4 from xianyi/develop Martin Kroeker 2020-12-13 21:22:41 +01:00
  • 5a77ec7f1c Merge pull request #3036 from RajalakshmiSR/p10copyalign Martin Kroeker 2020-12-13 21:21:34 +01:00
  • 2fb11f873b POWER10: Improve copy performance Rajalakshmi Srinivasaraghavan 2020-12-13 10:41:45 -06:00
  • ad63647446 Define BLAS acronym in README Joshie 2020-12-13 09:06:14 +00:00
  • 87315e8a8d Update version to 0.3.13.dev Martin Kroeker 2020-12-12 23:28:49 +01:00
  • 9031ebd7d5 Update version to 0.3.13.dev Martin Kroeker 2020-12-12 23:28:20 +01:00
  • 12b41d5598 Merge pull request #3034 from xianyi/release-0.3.0 Martin Kroeker 2020-12-12 23:27:40 +01:00
  • d2b11c4777 Merge pull request #3033 from xianyi/develop (v0.3.13) Martin Kroeker 2020-12-12 18:19:29 +01:00
  • 7bc0e4a2e0 Update version to 0.3.13 for release Martin Kroeker 2020-12-12 18:15:33 +01:00
  • d3ec787f77 Update version to 0.3.13 for release Martin Kroeker 2020-12-12 18:14:49 +01:00
  • 2c309c235d Merge pull request #3031 from martin-frbg/changelog13 Martin Kroeker 2020-12-12 18:13:23 +01:00
  • 3dec81200c Update Changelog.txt Martin Kroeker 2020-12-12 14:27:37 +01:00
  • 737724607f Merge pull request #3030 from martin-frbg/fix2994 Martin Kroeker 2020-12-12 10:01:45 +01:00
  • 77edf82c7f Update Changelog.txt for 0.3.13 Martin Kroeker 2020-12-12 01:25:20 +01:00
  • 6232237dba Make fallback from P10 to P9 conditional on suitable compiler Martin Kroeker 2020-12-11 23:41:17 +01:00
  • 7d81acc762 Merge pull request #3 from xianyi/develop Martin Kroeker 2020-12-11 23:38:42 +01:00
  • 18d8a67485 Merge pull request #2994 from antonblanchard/power10-fixes Martin Kroeker 2020-12-11 23:37:30 +01:00
  • 043128cbe5 Merge pull request #3029 from RajalakshmiSR/axpyp10 Martin Kroeker 2020-12-10 22:49:28 +01:00
  • 3331ca492d Merge pull request #3021 from austinpagan/trsm_p10 Martin Kroeker 2020-12-10 19:42:54 +01:00
  • 346e30a46a POWER10: Improve axpy performance Rajalakshmi Srinivasaraghavan 2020-12-10 11:51:42 -06:00
  • 83de62c20d Merge pull request #3026 from martin-frbg/revert747 Martin Kroeker 2020-12-10 16:29:41 +01:00
  • 658da9a769 Merge pull request #3027 from gxw-loongson/develop Martin Kroeker 2020-12-10 16:27:30 +01:00
  • be24c66a7c Keep LOONGSON3A and LOONGSON3B for loongson gxw 2020-12-10 10:48:53 +08:00
  • 4b548857d6 Add msa support for loongson gxw 2020-11-26 14:59:41 +08:00
  • d71fe4ed4e Remove GEMM_DEFAULT_UNROLL_MN parameters for Haswell and ZEN (introduced in PR747) Martin Kroeker 2020-12-08 21:07:57 +01:00
  • a554712439 remove extra/intermediate size step for min_jj introduced in PR747 Martin Kroeker 2020-12-08 21:01:36 +01:00
  • 5d26223f4a remove extra/intermediate size step of min_jj from PR747 Martin Kroeker 2020-12-08 20:59:56 +01:00
  • 980ab349bc Merge pull request #2 from xianyi/develop Martin Kroeker 2020-12-08 20:53:35 +01:00
  • d67babf345 Remove gcc unrecognized option '-msched-weight' when check msa gxw 2020-12-08 19:16:39 +08:00
  • 7f11e33e8d Merge pull request #3025 from TiredNotTear/develop Martin Kroeker 2020-12-08 09:39:27 +01:00
  • 7834c10e2f Add PingTouGe contribution credit. (ck860v) Xianyi Zhang 2020-12-07 16:55:05 +08:00
  • 53e0837809 Merge pull request #3022 from jinboson/develop Martin Kroeker 2020-12-07 08:09:11 +01:00
  • ad38bd0e89 Fix failed cgemv and zgemv test case after using msa optimization Hao Chen 2020-12-07 10:18:51 +08:00
  • 47b639cc9b Fix failed sswap and dswap case by using msa optimization Hao Chen 2020-12-07 10:04:00 +08:00
  • 8fef5876d1 Merge pull request #3024 from martin-frbg/sparc Martin Kroeker 2020-12-06 22:34:36 +01:00
  • 6c7d557a16 Fix compiler options for 32 and 64bit SPARC builds with SolarisStudio Martin Kroeker 2020-12-06 19:20:50 +01:00
  • b660008c7e Work around DOT and SWAP test failures Martin Kroeker 2020-12-06 19:15:37 +01:00
  • f8346603cf Fix compilation with SolarisStudio Martin Kroeker 2020-12-06 19:14:16 +01:00