Commit Graph

  • 449e8ea443 Merge pull request #26 from xianyi/develop Martin Kroeker 2020-02-09 23:23:55 +01:00
  • 3bec250cf9 Increment version to 0.3.9.dev Martin Kroeker 2020-02-09 23:18:44 +01:00
  • f03dd23e90 Increment version to 0.3.9.dev Martin Kroeker 2020-02-09 23:18:07 +01:00
  • fb5eb47558 Merge pull request #2398 from xianyi/develop (tag: v0.3.8) Martin Kroeker 2020-02-09 23:16:28 +01:00
  • fa93d63365 Merge branch 'release-0.3.0' into develop Martin Kroeker 2020-02-09 23:16:06 +01:00
  • 90e6c66a57 Merge pull request #2397 from martin-frbg/038changes Martin Kroeker 2020-02-09 23:01:52 +01:00
  • 32d97330b3 Update with changes from 0.3.8 Martin Kroeker 2020-02-09 23:00:36 +01:00
  • 29eaf4b6d7 Merge pull request #25 from xianyi/develop Martin Kroeker 2020-02-09 22:48:15 +01:00
  • 47c1bf7f4d typo fixes Martin Kroeker 2020-02-09 01:06:40 +01:00
  • 2b55f0ad30 Merge pull request #2393 from martin-frbg/issue2388 Martin Kroeker 2020-02-09 01:00:33 +01:00
  • a5b32ab06c Merge pull request #2390 from martin-frbg/pgi Martin Kroeker 2020-02-09 00:13:40 +01:00
  • 50545b19d0 Update CPU and OS support and document DYNAMIC_ARCH option in README.md Martin Kroeker 2020-02-09 00:06:07 +01:00
  • b3cbd60d7a Remove PGI from list again as it is actually still not capable Martin Kroeker 2020-02-08 10:20:13 +01:00
  • 70199d1905 Merge pull request #2389 from Zeyiii/develop Martin Kroeker 2020-02-07 16:05:46 +01:00
  • cfe63d8cc2 Remove OpenMP libraries from link list Martin Kroeker 2020-02-07 16:03:51 +01:00
  • d55b10830f Remove OpenMP libraries from link list Martin Kroeker 2020-02-07 16:02:17 +01:00
  • c1c10cbb21 Merge pull request #2384 from wjc404/develop Martin Kroeker 2020-02-07 13:47:12 +01:00
  • 5989841524 Add PGI to avx512-supporting compilers Martin Kroeker 2020-02-07 13:01:31 +01:00
  • 68a43db358 Fix utest compilation with PGI Martin Kroeker 2020-02-07 10:15:18 +01:00
  • 9694037b23 Set SUFFIX in tempfile commands, fix bad architecture option for PGI compiler in avx512 test Martin Kroeker 2020-02-07 10:09:25 +01:00
  • 71faa1c1a7 Merge pull request #24 from xianyi/develop Martin Kroeker 2020-02-07 10:03:02 +01:00
  • 3447d04eaf Update dgemm_kernel_16x2_skylakex.c wjc404 2020-02-06 02:14:10 +00:00
  • 8b5cdcc64c Update sgemm_kernel_8x4_haswell.c wjc404 2020-02-06 01:47:46 +00:00
  • 4e00d96a78 Update dgemm_kernel_16x2_skylakex.c wjc404 2020-02-06 01:46:36 +00:00
  • ce9ea8f826 Fix another branch w00421467 2020-02-05 15:07:18 +08:00
  • 0b909203cb Fix bugs in benchmark of gemv w00421467 2020-02-05 14:53:37 +08:00
  • 096da2f51a Update dgemm_kernel_16x2_skylakex.c wjc404 2020-02-05 13:36:57 +08:00
  • 2f96a2c55b Update trmm_R.c wjc404 2020-02-05 10:15:02 +08:00
  • 833bd0f8ff Update trmm_L.c wjc404 2020-02-05 10:09:41 +08:00
  • 77b8f49556 Update level3_thread.c wjc404 2020-02-04 20:33:08 +08:00
  • 1c3e20ce48 Update level3.c wjc404 2020-02-04 20:30:23 +08:00
  • 83b6be7976 Update param.h wjc404 2020-02-04 19:55:26 +08:00
  • 081b188529 Update KERNEL.SKYLAKEX wjc404 2020-02-03 21:38:08 +08:00
  • f3f969f681 Update param.h wjc404 2020-02-03 21:34:12 +08:00
  • 8019e70211 AVX512 16x2 DGEMM kernel wjc404 2020-02-03 21:32:56 +08:00
  • 8d2a796f49 Merge pull request #2378 from martin-frbg/issue2377 Martin Kroeker 2020-01-30 17:07:19 +01:00
  • 8dc9fd4dfe Add -march option for AVX512 Martin Kroeker 2020-01-30 12:41:18 +01:00
  • abc67bdd74 Merge pull request #2375 from ewanglong/master Martin Kroeker 2020-01-30 10:27:29 +01:00
  • 1f62a82789 Merge pull request #2376 from wjc404/develop Martin Kroeker 2020-01-23 21:50:19 +01:00
  • e9fb8f62b1 Update level3_gemm3m_thread.c wjc404 2020-01-22 17:40:03 +00:00
  • fbf4f48f4a fix a few performance drop in some matrix size per data type Wang,Long 2020-01-22 15:07:50 +00:00
  • b9ad450295 Merge pull request #2373 from Qiyu8/optimize#gemmbeta Martin Kroeker 2020-01-21 15:05:38 +01:00
  • e011ad820a Merge pull request #2372 from martin-frbg/winexit Martin Kroeker 2020-01-21 14:56:45 +01:00
  • ff42e68652 Optimize genenal Gemm Beta Qiyu8 2020-01-20 11:49:42 +08:00
  • 23f322f997 Do not run any cleanup if the program is exiting anyway Martin Kroeker 2020-01-19 13:28:27 +01:00
  • 093d37de8d Merge pull request #2371 from martin-frbg/issue2370 Martin Kroeker 2020-01-18 20:39:34 +01:00
  • d65e9a2bbd Merge pull request #2253 from thrasibule/xerbla Martin Kroeker 2020-01-18 20:39:04 +01:00
  • 78100b8093 Free Windows thread memory with MEM_RELEASE rather than MEM_DECOMMIT Martin Kroeker 2020-01-18 15:06:39 +01:00
  • 70f45749b9 Merge pull request #2367 from wjc404/develop Martin Kroeker 2020-01-15 21:13:43 +01:00
  • e5dcdeb550 Update sgemm_direct_skylakex.c wjc404 2020-01-13 16:59:23 +08:00
  • 952cc2ba38 Update sgemm_kernel_16x4_skylakex_2.c wjc404 2020-01-13 16:58:54 +08:00
  • feaafbedd3 make skylakex sgemm code more friendly for readers wjc404 2020-01-13 16:28:41 +08:00
  • 1c67567008 improve skylakex paralleled sgemm performance wjc404 2020-01-13 16:26:03 +08:00
  • 4e979bf75b Merge pull request #2366 from martin-frbg/install390 Martin Kroeker 2020-01-13 09:00:21 +01:00
  • daa4310db5 Install new lapack.h Martin Kroeker 2020-01-12 22:00:50 +01:00
  • b8f3605132 Merge pull request #23 from xianyi/develop Martin Kroeker 2020-01-12 21:57:23 +01:00
  • b36018be6d Merge pull request #2365 from wjc404/develop Martin Kroeker 2020-01-09 23:23:09 +01:00
  • 3a100b2797 Update KERNEL.SKYLAKEX wjc404 2020-01-09 13:48:41 +08:00
  • 38742d5547 Merge pull request #2361 from wjc404/develop Martin Kroeker 2020-01-08 16:20:28 +01:00
  • bd4c032f52 Update sgemm_kernel_8x4_haswell.c wjc404 2020-01-07 11:22:46 +08:00
  • 9dc9b7b95e Update sgemm_kernel_8x4_haswell.c wjc404 2020-01-06 20:11:36 +08:00
  • 9f5cdc49d4 Update CONTRIBUTORS.md wjc404 2020-01-06 12:28:43 +08:00
  • b7b408a120 optimize AVX2 SGEMM wjc404 2020-01-06 12:16:09 +08:00
  • 92b10212de optimize AVX2 SGEMM wjc404 2020-01-06 12:11:21 +08:00
  • b73bf01378 optimize AVX2 SGEMM wjc404 2020-01-06 12:09:14 +08:00
  • eb3c9f1db9 optimize AVX2 SGEMM wjc404 2020-01-06 12:07:02 +08:00
  • fd2ff2714f Merge pull request #2359 from martin-frbg/lapack-pr330 Martin Kroeker 2020-01-03 15:03:30 +01:00
  • 2ea2bd99c7 Apply LAPACKE fix for eigenvector transposition in symmetric eigensolvers Martin Kroeker 2020-01-03 11:10:00 +01:00
  • fbb894948c Merge pull request #22 from xianyi/develop Martin Kroeker 2020-01-03 10:23:25 +01:00
  • e711659c90 Merge pull request #2358 from shengyang-3390/develop Martin Kroeker 2020-01-03 09:02:03 +01:00
  • 893e6e57c4 modified: ctest/din3 ctest/sin3 shengyang 2020-01-03 10:03:33 +08:00
  • 456ee2e1f0 Merge pull request #2357 from chenxuqiang/dgemm_beta_zero Martin Kroeker 2020-01-02 22:28:36 +01:00
  • 9998f8ed8b Merge pull request #2356 from shengyang-3390/develop Martin Kroeker 2020-01-02 22:27:44 +01:00
  • 80db5f11e1 update shengyang 2020-01-02 11:01:57 +08:00
  • 52de4cc8fd kernel/arm64/dgemm_beta.S: add beta == zero branch chenxuqiang 2020-01-01 21:50:45 -05:00
  • 44028581cc Merge pull request #2355 from Zeyiii/dev-zeyi2 Martin Kroeker 2020-01-01 22:14:16 +01:00
  • 86ab939936 Merge pull request #2354 from ZuoQ3/develop Martin Kroeker 2020-01-01 22:13:37 +01:00
  • 375b1875c8 [WIP] Update LAPACK to 3.9.0 (#2353) Martin Kroeker 2020-01-01 13:18:53 +01:00
  • 6c85cb1869 Merge pull request #2352 from wjc404/develop Martin Kroeker 2019-12-31 18:08:10 +01:00
  • 995768bbc5 Merge pull request #2351 from Zeyiii/develop Martin Kroeker 2019-12-31 18:07:37 +01:00
  • 96ad579428 add in runtime cpu detection for zarch (#2349) int_13h 2019-12-31 22:33:27 +05:30
  • 8d84403205 Use arm neon instructions to optimize ncopy operation shengyang 2019-12-31 17:06:35 +08:00
  • 8729db117c modified: ctest/din3 modified: ctest/sin3 shengyang 2019-12-31 15:59:52 +08:00
  • 0833a4846a Use arm neon instructions to optimize sgemm_beta operation w00421467 2019-12-31 10:31:07 +08:00
  • 50f7fc1401 [WIP] Use arm neon instructions to optimize tcopy operation zq 2019-12-31 10:21:23 +08:00
  • d1b53806be Merge remote-tracking branch 'pub/develop' into develop w00421467 2019-12-31 10:13:24 +08:00
  • a0f0a802fc Update zgemm3m_kernel_4x4_haswell.c wjc404 2019-12-30 17:33:42 +08:00
  • 700fe5b5ee Add files via upload wjc404 2019-12-30 17:18:59 +08:00
  • bb2729c855 Update CONTRIBUTORS.md wjc404 2019-12-30 16:11:37 +08:00
  • aae44d040d Update CONTRIBUTORS.md wjc404 2019-12-30 16:10:08 +08:00
  • 6362c34ee6 Update param.h wjc404 2019-12-30 16:08:19 +08:00
  • f60840c420 Update KERNEL.ZEN wjc404 2019-12-30 16:04:23 +08:00
  • 109e18cd96 Update KERNEL.HASWELL wjc404 2019-12-30 16:03:24 +08:00
  • ae1579be13 Create zgemm3m_kernel_4x4_haswell.c wjc404 2019-12-30 16:02:51 +08:00
  • 3ccf8885ac prefetching for dgemm_beta w00421467 2019-12-30 11:45:49 +08:00
  • 454847588e Update LAPACK to 3.9.0 Martin Kroeker 2019-12-29 21:27:18 +01:00
  • 0257f26488 Merge pull request #21 from xianyi/develop Martin Kroeker 2019-12-29 18:08:55 +01:00
  • c45b7aef14 Merge pull request #2348 from wjc404/develop Martin Kroeker 2019-12-28 20:07:56 +01:00
  • 312060d0d6 Update CONTRIBUTORS.md wjc404 2019-12-27 23:36:13 +08:00
  • cd765f094b Update cgemm3m_kernel_8x4_haswell.c wjc404 2019-12-27 18:23:29 +08:00