Commit Graph

  • edb423d772 align general register using to strmm_kernel_8x8 张丹枫 2020-05-20 21:52:49 +08:00
  • 0e6eb8c247 sgemm kernel use sgemm_kernel_8x8_cortexa53 zhangdanfeng 2020-05-18 16:51:33 +08:00
  • d475db29c6 optimized for cortex-a53 zhangdanfeng 2020-05-18 16:47:33 +08:00
  • 729ac6bd4a Merge pull request #2623 from mhillenibm/zarch_dgemm_z14 Martin Kroeker 2020-05-20 14:51:04 +02:00
  • 89fe17f20e s390x: Use new sgemm kernel also for DGEMM and DTRMM on Z14 Marius Hillenbrand 2020-05-19 14:56:34 +02:00
  • bdd795ed03 s390x/GEMM: replace 0-init with peeled first iteration Marius Hillenbrand 2020-05-19 14:30:44 +02:00
  • e1038ea836 Merge pull request #2622 from martin-frbg/issue2619 Martin Kroeker 2020-05-19 23:07:22 +02:00
  • 6baa9a778d Improve declaration of LAPACKE_get_nancheck Martin Kroeker 2020-05-19 17:59:31 +02:00
  • cf46c9f84e Merge pull request #2617 from martin-frbg/issue2616 Martin Kroeker 2020-05-18 13:23:58 +02:00
  • 55602fce56 Ignore spurious all-numeric library names derived from mishandled jobserver flags Martin Kroeker 2020-05-17 15:28:14 +02:00
  • 3d5e159e7a Ignore spurious all-numeric library names derived from mishandled jobserver flags Martin Kroeker 2020-05-17 15:26:57 +02:00
  • 2931feb575 Merge pull request #58 from xianyi/develop Martin Kroeker 2020-05-17 15:23:32 +02:00
  • 20245ded5f Merge pull request #2615 from mhillenibm/z14_alignment_hints Martin Kroeker 2020-05-14 21:06:34 +02:00
  • 2840432e49 s390x: improvise vector alignment hints for older compilers Marius Hillenbrand 2020-05-13 17:48:50 +02:00
  • ea78106c71 Merge pull request #2614 from mhillenibm/gemm_vec_z14 Martin Kroeker 2020-05-13 15:09:23 +02:00
  • cb9dc36dd5 Update CONTRIBUTORS.md Marius Hillenbrand 2020-05-12 16:14:00 +02:00
  • 1b0b4349a1 s390x/Z14: Change register blocking for SGEMM to 16x4 Marius Hillenbrand 2020-05-12 15:06:38 +02:00
  • 71b6eaf459 s390x: Use new sgemm kernel also for strmm on Z14 and newer Marius Hillenbrand 2020-05-12 14:40:30 +02:00
  • 43c0d4f312 s390x: Add vectorized sgemm kernel for Z14 and newer Marius Hillenbrand 2020-05-12 14:13:54 +02:00
  • d7c1677c20 Update CONTRIBUTORS.md, adding myself Marius Hillenbrand 2020-05-12 11:09:28 +02:00
  • 0dbe61a612 s390x: choose SIMD kernels at run-time based on OS and compiler support Marius Hillenbrand 2020-05-11 13:00:10 +02:00
  • 62cf391cbb s390x: only build kernels supported by gcc with dynamic arch support Marius Hillenbrand 2020-05-11 18:37:04 +02:00
  • 8c338616f9 s390x: gate dynamic arch detection on gcc version and add generic Marius Hillenbrand 2020-05-11 12:37:21 +02:00
  • f94c53ec0a Merge pull request #2612 from RajalakshmiSR/testshgemm Martin Kroeker 2020-05-12 08:34:02 +02:00
  • 8efba9b7c0 Improve shgemm test Rajalakshmi Srinivasaraghavan 2020-05-11 17:15:10 -05:00
  • 4fffa556d8 Merge pull request #2611 from RajalakshmiSR/bench_half Martin Kroeker 2020-05-11 21:08:41 +02:00
  • ce90e2bd3f Include shgemm in benchtest Rajalakshmi Srinivasaraghavan 2020-05-11 09:57:46 -05:00
  • 948b6712ba Merge pull request #2610 from martin-frbg/issue2552-3 Martin Kroeker 2020-05-10 13:10:31 +02:00
  • 2271c3506b Work around excessive LAPACK test failures on Skylake-X Martin Kroeker 2020-05-09 23:49:18 +02:00
  • db00b21445 Merge pull request #2609 from martin-frbg/issue2552-2 Martin Kroeker 2020-05-09 21:33:02 +02:00
  • 58d26b4448 Correct ifort options Martin Kroeker 2020-05-09 17:15:36 +02:00
  • 8e47d14053 Merge pull request #2608 from martin-frbg/issue2604 Martin Kroeker 2020-05-09 16:36:14 +02:00
  • cd10b35fe9 Handle trailing spaces and empty condition variables Martin Kroeker 2020-05-09 13:42:33 +02:00
  • 9472dd99cd Merge pull request #57 from xianyi/develop Martin Kroeker 2020-05-09 13:20:44 +02:00
  • 7181665452 Merge pull request #2605 from RajalakshmiSR/cmake-power Martin Kroeker 2020-05-09 11:29:28 +02:00
  • bd9ff820bc Fix cmake compilation issue - POWER9 Rajalakshmi Srinivasaraghavan 2020-05-08 20:31:56 -05:00
  • 63e45def70 Merge pull request #2603 from martin-frbg/issue2552 Martin Kroeker 2020-05-08 22:08:39 +02:00
  • ec0f228632 Add FFLAGS_DRV to the generated make.inc to fix lapack-test on x86_64 with icc/ifort Martin Kroeker 2020-05-08 18:06:12 +02:00
  • 90e2941c61 Merge pull request #56 from xianyi/develop Martin Kroeker 2020-05-07 22:43:48 +02:00
  • 10d5f3c87b Merge pull request #2602 from ashwinyes/thunderx2_develop Martin Kroeker 2020-05-07 22:06:41 +02:00
  • 8353cb245a ARM64: Improve DAXPY for ThunderX2 Ashwin Sekhar T K 2020-05-07 09:14:05 -07:00
  • ec2dd7b875 Merge pull request #2601 from martin-frbg/issue818 Martin Kroeker 2020-05-07 10:12:33 +02:00
  • 4e82eb9f8a Undefine ASMNAME/NAME/CNAME before defining them Martin Kroeker 2020-05-07 00:31:32 +02:00
  • 61300bb735 Merge pull request #55 from xianyi/develop Martin Kroeker 2020-05-07 00:27:14 +02:00
  • 33e9b12464 Merge pull request #2597 from martin-frbg/appleclang Martin Kroeker 2020-05-05 13:55:08 +02:00
  • 90dba9f716 Duplicate earlier Clang 9.0.0 workaround for corresponding Apple Clang version Martin Kroeker 2020-05-05 10:44:50 +02:00
  • 424d551e01 Merge pull request #53 from xianyi/develop Martin Kroeker 2020-05-01 15:18:46 +02:00
  • 596f5df9e8 Merge pull request #2591 from RajalakshmiSR/testhalf Martin Kroeker 2020-05-01 09:59:39 +02:00
  • 5dd14e3d48 Make building the bfloat16 functions conditional on option BUILD_HALF (#2590) Martin Kroeker 2020-05-01 09:58:30 +02:00
  • a54e35e780 Merge pull request #2586 from martin-frbg/miscfixes Martin Kroeker 2020-04-29 22:01:41 +02:00
  • 564b0d39ef Add test for shgemm Rajalakshmi Srinivasaraghavan 2020-04-29 13:40:34 -05:00
  • 5d58b11101 Merge pull request #52 from xianyi/develop Martin Kroeker 2020-04-29 14:36:15 +02:00
  • d394d4e677 Merge pull request #2585 from martin-frbg/mips64fix Martin Kroeker 2020-04-28 19:47:55 +02:00
  • 9d3a317abc Refs #2587 Fix typos. Xianyi Zhang 2020-04-29 00:19:19 +08:00
  • 92372c70fc Fix gemm interface bug for small matrix. Xianyi Zhang 2020-04-28 23:15:20 +08:00
  • 43bef4aaac Add alpha=1.0 beta=0.0 for small gemm. Xianyi Zhang 2020-04-28 22:35:36 +08:00
  • aae6af94bb Add small marix optimization kernel interface. Xianyi Zhang 2020-04-28 19:01:36 +08:00
  • f4248af26e Fix compiler warnings Martin Kroeker 2020-04-28 10:43:12 +02:00
  • 2d89603e9d Increase BUFFER_SIZE on mips64 to match SGEMM parameters Martin Kroeker 2020-04-28 10:40:40 +02:00
  • 26bc15258a Merge pull request #51 from xianyi/develop Martin Kroeker 2020-04-28 10:38:50 +02:00
  • 141998dce2 Merge pull request #2584 from martin-frbg/issue2583 Martin Kroeker 2020-04-28 10:35:12 +02:00
  • 3bd56846bb Silence a debug message Martin Kroeker 2020-04-27 16:27:09 +02:00
  • e7bbdfdf84 Have CMAKE parse conditional lines in KERNEL files Martin Kroeker 2020-04-27 15:20:03 +02:00
  • b6795db731 Merge pull request #2582 from martin-frbg/mips32fix Martin Kroeker 2020-04-27 09:18:34 +02:00
  • 5e0dbf8dfe Increase default BUFFER_SIZE to accomodate SGEMM parameters Martin Kroeker 2020-04-26 22:21:05 +02:00
  • 955d73127f Merge pull request #50 from xianyi/develop Martin Kroeker 2020-04-26 22:17:56 +02:00
  • a8c1bea7ae Merge pull request #2581 from martin-frbg/raji Martin Kroeker 2020-04-25 19:57:10 +02:00
  • e43b49e064 Drop the set -e from travis scripts Martin Kroeker 2020-04-25 16:18:54 +02:00
  • 3e28db7f38 Update CONTRIBUTORS.md Martin Kroeker 2020-04-25 13:51:44 +02:00
  • 4b69ee31af Merge pull request #2580 from martin-frbg/issue2538-3 Martin Kroeker 2020-04-25 00:28:18 +02:00
  • 03ff213c51 Increase POWER8 ZGEMM_R and use same R values for POWER9 Martin Kroeker 2020-04-24 21:46:54 +02:00
  • 299d1c8de0 Merge pull request #2578 from martin-frbg/issue2576 Martin Kroeker 2020-04-24 14:32:46 +02:00
  • 70869d571f Quote include paths for getarch to protect any embedded spaces Martin Kroeker 2020-04-24 10:30:44 +02:00
  • cba87222b2 Merge pull request #49 from xianyi/develop Martin Kroeker 2020-04-24 10:21:48 +02:00
  • f80dd2151e xcode 11.4.1 for homebrew ? Martin Kroeker 2020-04-23 14:31:09 +02:00
  • 4412ee1754 Switch homebrew build env to new xcode 11.4 Martin Kroeker 2020-04-23 10:54:46 +02:00
  • f6104b68c1 Merge pull request #2571 from martin-frbg/issue2299 Martin Kroeker 2020-04-22 18:27:13 +02:00
  • 84f2c71e93 Merge pull request #2573 from martin-frbg/issue2572 Martin Kroeker 2020-04-22 15:04:49 +02:00
  • 06208c8d01 Limit this fix to ELFv2 builds Martin Kroeker 2020-04-22 14:16:40 +02:00
  • c90b28dee6 Export ELF_VERSION for use in powerpc kernel configurations Martin Kroeker 2020-04-22 14:14:20 +02:00
  • 6275b43918 Avoid duplicate printout of byte order and report ELF_VERSION Martin Kroeker 2020-04-22 14:12:27 +02:00
  • 2db5178e2d enable cblas interfaces to GEMM3M in CMAKE builds Martin Kroeker 2020-04-22 11:01:28 +02:00
  • 57549f5c92 Merge pull request #2569 from martin-frbg/issue2472-2 Martin Kroeker 2020-04-21 20:26:53 +02:00
  • f5c4c28b98 Work around POWER8BE bugs on FreeBSD (ELFv2) Martin Kroeker 2020-04-21 17:17:17 +02:00
  • 239282d5e2 Use CMAKE_SHARED_LINKER_FLAGS to pass MSVC linker option Martin Kroeker 2020-04-20 22:30:51 +02:00
  • 568674477c Merge pull request #48 from xianyi/develop Martin Kroeker 2020-04-20 21:51:59 +02:00
  • fa42588e1f Merge pull request #2565 from martin-frbg/mips24k Martin Kroeker 2020-04-20 17:13:53 +02:00
  • 8a6d26458b Merge pull request #2559 from RajalakshmiSR/shgemm Martin Kroeker 2020-04-19 22:09:55 +02:00
  • db86f516b9 Merge pull request #2568 from martin-frbg/azure-win Martin Kroeker 2020-04-19 19:06:33 +02:00
  • aec353b5a7 Add a Windows/CL build to the Azure Ci configuration Martin Kroeker 2020-04-19 19:04:33 +02:00
  • c62fbefad4 Merge pull request #2567 from xianyi/revert-2566-azurewin Martin Kroeker 2020-04-19 19:01:58 +02:00
  • 04706e760d Revert "Add Windows build job on Azure CI (#2566)" (branch: revert-2566-azurewin) Martin Kroeker 2020-04-19 19:00:37 +02:00
  • e1e543b145 Add Windows build job on Azure CI (#2566) Martin Kroeker 2020-04-19 16:16:15 +02:00
  • e55ec82bb9 Delete KERNEL.1004K Martin Kroeker 2020-04-19 15:44:30 +02:00
  • 7353ea5afc Delete KERNEL.24K Martin Kroeker 2020-04-19 15:44:19 +02:00
  • 6a04efb122 Rename KERNEL files to include MIPS prefix Martin Kroeker 2020-04-19 15:43:54 +02:00
  • 5afb66812f Update getarch.c Martin Kroeker 2020-04-19 14:55:31 +02:00
  • 0d18f231fc Update getarch.c Martin Kroeker 2020-04-19 13:52:58 +02:00
  • 2f4a8e5bc4 Rename the FORCE entries for 24K and 1004K to include the MIPS prefix Martin Kroeker 2020-04-19 13:22:19 +02:00
  • 4f70512b97 Update kernel.cmake Martin Kroeker 2020-04-19 08:10:26 +02:00