Commit Graph

  • 974acb39ff Merge pull request #3181 from RajalakshmiSR/dgemmp10vp Martin Kroeker 2021-04-14 22:43:02 +02:00
  • 2379abaa5e POWER10: Improve dgemm performance Rajalakshmi Srinivasaraghavan 2021-04-13 22:30:06 -05:00
  • 3caf781d7c Merge pull request #3179 from RajalakshmiSR/zgemvp10 Martin Kroeker 2021-04-11 10:01:09 +02:00
  • 55bb9f639a POWER10: Optimized zgemv Rajalakshmi Srinivasaraghavan 2021-04-10 19:00:24 -05:00
  • 0dba04bb58 Merge pull request #3178 from martin-frbg/fix2864 Martin Kroeker 2021-04-09 13:38:05 +02:00
  • e96f5e3c65 Fix implicit typing of new variable TWO Martin Kroeker 2021-04-09 10:04:15 +02:00
  • 558724e99f Fix implicit typing of new variable TWO Martin Kroeker 2021-04-09 10:03:31 +02:00
  • 067c96a873 Merge pull request #3177 from martin-frbg/issue3176 Martin Kroeker 2021-04-07 08:22:42 +02:00
  • 4b380c0b40 Merge pull request #3175 from LYP951018/develop Martin Kroeker 2021-04-07 08:22:28 +02:00
  • 2dfb24730d Use "old" compute(24) function with clang due to register limitations Martin Kroeker 2021-04-06 19:58:32 +02:00
  • 725432efaa pass NO_AVX512 macro def 刘雨培 2021-04-07 00:10:41 +08:00
  • a2216ef19f Merge pull request #3173 from martin-frbg/dyna-sse3 Martin Kroeker 2021-04-05 13:39:17 +02:00
  • 5332cbae18 Avoid adding host-specific cpuflags to the common part of DYNAMIC_ARCH builds Martin Kroeker 2021-04-04 23:12:17 +02:00
  • 209b026e46 Merge pull request #3172 from martin-frbg/lapack477-final Martin Kroeker 2021-04-04 20:19:09 +02:00
  • 1ae607beca Update Makefile.x86_64 Martin Kroeker 2021-04-04 12:31:22 +02:00
  • d393f1923f Fix spillover of host-specific build flags into the shared part of DYNAMIC_ARCH builds with gmake Martin Kroeker 2021-04-03 22:18:15 +02:00
  • 081d5ae971 Fix typo and potentially undefined variables Martin Kroeker 2021-04-03 22:11:14 +02:00
  • 0492f0f3f9 Merge pull request #22 from xianyi/develop Martin Kroeker 2021-04-03 21:58:36 +02:00
  • 147e0a75fd Merge pull request #3170 from CodesWithWolves/sgemm_tcopy_16-invalid-read Martin Kroeker 2021-04-03 19:49:47 +02:00
  • ee068af843 Merge pull request #3171 from RajalakshmiSR/BE_p10 Martin Kroeker 2021-04-01 21:20:24 +02:00
  • 2dbcddd83d POWER10: Adding check for little endian Rajalakshmi Srinivasaraghavan 2021-03-31 21:32:42 -05:00
  • d2bda3b56a Remove Unnecessary/Erroneous Reads In sgemm_tcopy_16.S COPY1x8 Macro CodesWithWolves 2021-03-31 15:38:07 -04:00
  • 903fd85c85 Merge pull request #3167 from xianyi/fix3126 Martin Kroeker 2021-03-27 12:40:42 +01:00
  • d57c681a6d Fix compilation on older OSX versions (branch: fix3126) Martin Kroeker 2021-03-26 22:29:29 +01:00
  • d7efe5857c Merge pull request #3165 from martin-frbg/azure-osx Martin Kroeker 2021-03-24 14:05:34 +01:00
  • 8fd694c18f Update .travis.yml Martin Kroeker 2021-03-24 10:36:29 +01:00
  • e69b0b1771 Update azure-pipelines.yml Martin Kroeker 2021-03-24 10:34:24 +01:00
  • 9dc0bfd617 Update azure-pipelines.yml Martin Kroeker 2021-03-24 08:54:30 +01:00
  • e6664ec2c9 Update azure-pipelines.yml Martin Kroeker 2021-03-24 08:41:48 +01:00
  • dbb33f412f Update azure-pipelines.yml Martin Kroeker 2021-03-24 08:30:48 +01:00
  • 70b89a6205 Add OSX build to Azure Martin Kroeker 2021-03-24 07:50:35 +01:00
  • 07b144855a Merge pull request #3164 from martin-frbg/travisosxomp Martin Kroeker 2021-03-24 06:56:10 +01:00
  • 292a0aed66 Fix xcode12 build and add OSX/OpenMP Martin Kroeker 2021-03-24 06:55:14 +01:00
  • 42f0201e21 Merge pull request #20 from xianyi/develop Martin Kroeker 2021-03-22 17:53:43 +01:00
  • 22db876d48 Merge pull request #3158 from austinpagan/Gemm.CZPQ Martin Kroeker 2021-03-19 20:53:21 +01:00
  • bdd6e3a153 Merge pull request #3157 from martin-frbg/issue3020-final Martin Kroeker 2021-03-19 15:23:12 +01:00
  • 7b8f580941 Merge pull request #3156 from martin-frbg/omatcopy_d Martin Kroeker 2021-03-19 15:22:48 +01:00
  • 198adea961 Changed default P/Q values for CGEMM and ZGEMM (Power10 only) Gordon Fossum 2021-03-19 10:05:23 -04:00
  • 86c5a0013f Add workaround for LAPACK testsuite failures with the NVIDIA HPC compiler Martin Kroeker 2021-03-19 11:47:58 +01:00
  • ef85c22474 Add workaround for LAPACK test failures with the NVIDIA HPC compiler Martin Kroeker 2021-03-19 11:46:25 +01:00
  • d3555d2e50 Add workaround for LAPACK test failures with the NVIDIA HPC compiler Martin Kroeker 2021-03-19 11:44:31 +01:00
  • c4b91bfcf1 Merge pull request #3155 from martin-frbg/issue3152 Martin Kroeker 2021-03-19 09:55:31 +01:00
  • 0f5e86a0d9 Remove premature entry for DOMATCOPY_RT Martin Kroeker 2021-03-18 21:53:50 +01:00
  • 7b294a99fd Move common.h back to the top of the file so that SKYLAKEX (from config.h) is defined in time Martin Kroeker 2021-03-18 21:28:19 +01:00
  • 1e4b2e98d9 Merge pull request #3154 from martin-frbg/issue3153 Martin Kroeker 2021-03-18 12:35:47 +01:00
  • 3fd6ccdf76 Include just the definition of BLASLONG rather than all of common.h Martin Kroeker 2021-03-18 07:50:19 +01:00
  • fa9a30b491 Merge pull request #19 from xianyi/develop Martin Kroeker 2021-03-18 07:47:03 +01:00
  • d90ca75a6c Update version to 0.3.14.dev Martin Kroeker 2021-03-17 21:14:42 +01:00
  • e107454454 Update version to 0.3.14.dev Martin Kroeker 2021-03-17 21:14:05 +01:00
  • d43962d013 Merge pull request #3151 from xianyi/release-0.3.0 Martin Kroeker 2021-03-17 21:13:25 +01:00
  • 2f6d35c3d4 Merge pull request #3150 from xianyi/develop (tag: v0.3.14) Martin Kroeker 2021-03-17 20:21:42 +01:00
  • 86de5f768b Update version to 0.3.14 for release Martin Kroeker 2021-03-17 20:20:34 +01:00
  • 2663e44724 Update version to 0.3.14 for release Martin Kroeker 2021-03-17 20:20:00 +01:00
  • 6f2900c164 Merge pull request #3149 from martin-frbg/changelog14 Martin Kroeker 2021-03-17 20:14:50 +01:00
  • 7888b5127c Update Changelog for 0.3.14 Martin Kroeker 2021-03-17 16:17:55 +01:00
  • 8808c291b9 Merge pull request #3148 from martin-frbg/issue3145 Martin Kroeker 2021-03-17 09:05:43 +01:00
  • 8cdf0825de Add workaround for older gcc on ppc64be not supporting casts in defines Martin Kroeker 2021-03-16 21:20:05 +01:00
  • 9e0dbe8e59 Merge pull request #18 from xianyi/develop Martin Kroeker 2021-03-16 21:09:45 +01:00
  • 52f99d3944 Merge pull request #3147 from martin-frbg/issue3146 Martin Kroeker 2021-03-16 20:25:42 +01:00
  • 186368ddc3 Fix compilation with CLANG Martin Kroeker 2021-03-16 16:52:57 +01:00
  • c0b94ae1df Merge pull request #3143 from martin-frbg/fix3088 Martin Kroeker 2021-03-14 23:12:55 +01:00
  • ddd86309a1 Merge pull request #3144 from xoviat/fix-test Martin Kroeker 2021-03-14 23:12:33 +01:00
  • e9d453b623 disable openmp xoviat 2021-03-14 16:34:02 -05:00
  • ecb4babcf4 remove inclusion of common.h again to avoid circular dependency Martin Kroeker 2021-03-14 17:36:51 +01:00
  • 34753eaebb Include common.h (and indirectly param.h) rather than just param.h to have BLASLONG available w/o circular dependencies Martin Kroeker 2021-03-14 17:28:43 +01:00
  • efa72a631b Merge pull request #17 from xianyi/develop Martin Kroeker 2021-03-14 17:20:49 +01:00
  • 30d835168a Merge pull request #3088 from xoviat/msvc Martin Kroeker 2021-03-14 17:14:28 +01:00
  • 8f6a744807 Merge pull request #3141 from martin-frbg/nagfor-2 Martin Kroeker 2021-03-13 23:04:53 +01:00
  • 6726771645 Support compilation with NAG fortran Martin Kroeker 2021-03-13 20:16:18 +01:00
  • a51cae6b2e Merge pull request #3140 from martin-frbg/issue3139 Martin Kroeker 2021-03-12 15:35:58 +01:00
  • d30b943251 Merge pull request #3138 from martin-frbg/nagfor Martin Kroeker 2021-03-12 12:46:19 +01:00
  • 0934568d9c Move includes under the ifdef for compilers w/o intrinsics support Martin Kroeker 2021-03-12 12:42:05 +01:00
  • 697e64bbb6 Fix syntax Martin Kroeker 2021-03-11 23:03:58 +01:00
  • bffb9b0e95 Merge pull request #3136 from austinpagan/Gemm.PQ Martin Kroeker 2021-03-11 15:17:48 +01:00
  • 6ae7af78a3 Support compilation with nagfor Martin Kroeker 2021-03-11 11:53:51 +01:00
  • 041a26fd79 Support compilation with nagfor Martin Kroeker 2021-03-11 11:52:29 +01:00
  • 3c356b1a1f Support compilation with the NAG Fortran compiler Martin Kroeker 2021-03-11 11:51:09 +01:00
  • b1215f2f8c Merge pull request #16 from xianyi/develop Martin Kroeker 2021-03-11 11:48:37 +01:00
  • 0b73041b16 Merge pull request #3137 from RajalakshmiSR/zscal_p10 Martin Kroeker 2021-03-11 07:18:05 +01:00
  • 9579bd47e5 Modifying a couple parameters in the "POWER10"-specific section of param.h, for performance enhancements for SGEMM and DGEMM. austinpagan 2021-03-10 18:19:12 -05:00
  • 09d47af2c0 Optimize zscal function for POWER10 Rajalakshmi Srinivasaraghavan 2021-03-10 17:15:33 -06:00
  • ef0238ba2b Merge pull request #3130 from martin-frbg/issue3128 Martin Kroeker 2021-03-06 19:15:53 +01:00
  • a9f6f7ad39 Remove spurious AVX512 requirement and add AVX2/FMA3 guard Martin Kroeker 2021-03-06 14:35:49 +01:00
  • 1d254d321b Merge pull request #3129 from RajalakshmiSR/asum_p10 Martin Kroeker 2021-03-06 09:13:59 +01:00
  • 41646ed006 Optimize s/dasum function for POWER10 Rajalakshmi Srinivasaraghavan 2021-03-05 16:22:36 -06:00
  • 3679781872 Merge pull request #3126 from martin-frbg/m1bench Martin Kroeker 2021-03-02 21:27:21 +01:00
  • 38dcf3454b Support timing Apple M1 Martin Kroeker 2021-03-02 17:50:55 +01:00
  • e34d57ca90 Merge pull request #3125 from martin-frbg/issue3123 Martin Kroeker 2021-03-02 09:58:40 +01:00
  • 20f492c298 Fix AMD AOCC compiler detection Martin Kroeker 2021-03-01 21:00:10 +01:00
  • c7c82be1c3 Merge pull request #3122 from martin-frbg/xeigtstz Martin Kroeker 2021-02-28 22:13:09 +01:00
  • 9564f688c4 Adjust build rules for ?chkee.F Martin Kroeker 2021-02-28 18:57:05 +01:00
  • 90c1776c86 Adjust build rules for ?chkee.F Martin Kroeker 2021-02-28 18:53:20 +01:00
  • 9cf861e8fa Add rewritten cchkee.F from Reference-LAPACK PR335 Martin Kroeker 2021-02-28 18:51:03 +01:00
  • 9b7b1da133 Add rewritten dchkee.F from Reference-LAPACK PR335 Martin Kroeker 2021-02-28 18:50:26 +01:00
  • a5ab891292 Add rewritten schkee.F from Reference-LAPACK PR335 Martin Kroeker 2021-02-28 18:49:50 +01:00
  • 90bb4ac821 Add rewritten zchkee.F from Reference-LAPACK PR335 Martin Kroeker 2021-02-28 18:49:10 +01:00
  • 23a0d1bc1f Delete zchkee.f Martin Kroeker 2021-02-28 18:47:06 +01:00
  • 0e96c378fd Delete schkee.f Martin Kroeker 2021-02-28 18:46:52 +01:00
  • ee16efff3c Delete dchkee.f Martin Kroeker 2021-02-28 18:46:38 +01:00
  • 0197519dd7 Delete cchkee.f Martin Kroeker 2021-02-28 18:46:08 +01:00