Commit Graph

  • 7ec3c16d82 Remove beta from optimized functions. Chip Kerchner 2024-10-03 13:27:33 -05:00
  • 54afc24e4d Merge pull request #4906 from XiWeiGu/arm64_cmake_small_matrix_opt Martin Kroeker 2024-10-03 20:05:11 +02:00
  • b4495a8fb8 Merge branch 'develop' into arm64_cmake_small_matrix_opt Martin Kroeker 2024-10-03 20:04:52 +02:00
  • 68eefe60b9 Merge pull request #4915 from martin-frbg/issue4907 Martin Kroeker 2024-10-03 18:29:29 +02:00
  • 4f00f02567 Do not add -mabi flags for Loongson when the compiler is flang Martin Kroeker 2024-10-03 16:06:33 +02:00
  • f817f26062 Add simpler EPILOGUE for clang Martin Kroeker 2024-10-03 16:01:10 +02:00
  • a492181665 filter out Loongarch -mabi options for flang-new Martin Kroeker 2024-10-03 15:58:47 +02:00
  • f9249718ce deploy: de421b7764 martin-frbg 2024-10-03 13:54:31 +00:00
  • de421b7764 Merge pull request #4904 from XiWeiGu/la64_cross_cmake Martin Kroeker 2024-10-03 15:53:57 +02:00
  • c0e44259ee deploy: edaf5933c4 martin-frbg 2024-10-02 21:37:44 +00:00
  • edaf5933c4 Merge pull request #4913 from martin-frbg/issue4912 Martin Kroeker 2024-10-02 23:37:15 +02:00
  • 71131406ae Declare the input array in CBLAS_?GEADD as const Martin Kroeker 2024-10-02 18:32:48 +02:00
  • 7cc00f68c9 Remove more duplicate. Chip Kerchner 2024-10-01 11:23:32 -05:00
  • e238a68c03 Remove duplicate. Chip Kerchner 2024-10-01 11:06:23 -05:00
  • f10d47c4bb Merge pull request #4910 from martin-frbg/issue4908 Martin Kroeker 2024-10-01 17:49:12 +02:00
  • 32095b0cbb Remove parameter. Chip Kerchner 2024-10-01 09:32:42 -05:00
  • dd3cb940bd deploy: a1073f5eed martin-frbg 2024-10-01 13:29:53 +00:00
  • a1073f5eed Merge pull request #4900 from XiWeiGu/la64_core_rename Martin Kroeker 2024-10-01 15:29:16 +02:00
  • fa77561396 add openmp option to pkgconfig template Martin Kroeker 2024-10-01 13:32:45 +02:00
  • 176107d23a Add -fopenmp to cflags in pkgconfig file if set Martin Kroeker 2024-10-01 13:31:14 +02:00
  • 0228d36211 move -fopenmp to CFLAGS Martin Kroeker 2024-09-30 21:38:05 +02:00
  • 7087b0a7d0 ARM64: Enable SMALL_MATRIX_OPT when compiling with CMake gxw 2024-09-29 10:31:26 +08:00
  • 30af9278dc LoongArch64: Enable cmake cross-compilation gxw 2024-09-26 16:55:06 +08:00
  • 48698b2b1d LoongArch64: Rename core gxw 2024-09-18 17:20:43 +08:00
  • c8788208c8 Fixing block issue with transpose version. Chip Kerchner 2024-09-27 13:27:03 -05:00
  • d7c0d87cd1 Small changes. Chip Kerchner 2024-09-26 15:21:29 -05:00
  • eb6f3a05ef Common MMA code. Chip Kerchner 2024-09-26 09:28:56 -05:00
  • fb287d17fc Common code. Chip Kerchner 2024-09-25 16:31:36 -05:00
  • 8ab6245771 Small change. Chip Kerchner 2024-09-24 16:50:21 -05:00
  • df19375560 Almost final code for MMA. Chip Kerchner 2024-09-24 16:30:01 -05:00
  • 05aa63e738 More MMA BF16 GEMV code. Chip Kerchner 2024-09-24 12:54:02 -05:00
  • c9ce37d527 Force vector pairs in clang. Chip Kerchner 2024-09-23 08:43:58 -05:00
  • 89a12fa083 MMA BF16 GEMV code. Chip Kerchner 2024-09-23 06:32:14 -05:00
  • e9824ae798 deploy: 92f7a2dc3e martin-frbg 2024-09-19 12:15:38 +00:00
  • 92f7a2dc3e Merge pull request #4899 from martin-frbg/flangmtune Martin Kroeker 2024-09-19 14:15:06 +02:00
  • 969bb949b1 Strip any mtune option from FFLAGS if the compiler is flang-new Martin Kroeker 2024-09-19 11:10:28 +02:00
  • 30733e7d6c deploy: fca86e359c martin-frbg 2024-09-16 09:17:50 +00:00
  • fca86e359c Merge pull request #4887 from goplanid/develop Martin Kroeker 2024-09-16 11:17:19 +02:00
  • 7947970f9d Move common code. Chip Kerchner 2024-09-13 06:22:13 -05:00
  • 60c1519e01 Merge pull request #4896 from martin-frbg/update_azure_mac_hpc Martin Kroeker 2024-09-12 21:09:28 +02:00
  • c8313d9d80 Merge pull request #4895 from martin-frbg/update_homebrewjob Martin Kroeker 2024-09-12 21:09:10 +02:00
  • b588e922a1 Update oneAPI download location for Mac to final Martin Kroeker 2024-09-12 18:13:46 +02:00
  • 4178905fa7 Update version of upload-artifacts following deprecation Martin Kroeker 2024-09-12 16:39:20 +02:00
  • 70ea109d67 deploy: 5f70e245a2 martin-frbg 2024-09-12 13:10:29 +00:00
  • 5f70e245a2 Merge pull request #4894 from martin-frbg/issue4893 Martin Kroeker 2024-09-12 15:09:54 +02:00
  • 383e0b133e remove suppression of gcc14's incompatible pointer error Martin Kroeker 2024-09-11 22:21:09 +02:00
  • 869a169c57 Fix ZAXPYTEST prototype Martin Kroeker 2024-09-11 22:18:14 +02:00
  • 72216d28c2 Fix bug with inc_y adding results twice. Chip Kerchner 2024-09-11 08:47:32 -05:00
  • 2f142ee857 More common code. Chip Kerchner 2024-09-09 14:41:55 -05:00
  • 39fd29f1de Minor improvement and turn off BF16 GEMV forwarding by default. Chip Kerchner 2024-09-08 18:28:31 -05:00
  • 8541b25e1d Special case beta is one. Chip Kerchner 2024-09-06 14:48:48 -05:00
  • 76227e2948 Initial commit for vectorized BF16 GEMV. Added GEMM_GEMV_FORWARD_BF16 to enable using BF16 GEMV for one-dimensional matrices. Updated unit test to support inc_x != 1 or inc_y != 1 for GEMV. Chip Kerchner 2024-09-06 14:03:31 -05:00
  • 4894c54055 Improve TN case with further unrolling Deeksha Goplani 2024-09-02 22:22:49 +05:30
  • 4ee4873c2b deploy: 485027563e martin-frbg 2024-08-17 09:47:57 +00:00
  • 485027563e Merge pull request #4883 from ChipKerchner/fixSGEMMUnitTestZeroSize Martin Kroeker 2024-08-17 11:47:26 +02:00
  • 89702e1f4a Fix zero element GEMV test. Chip Kerchner 2024-08-16 11:37:39 -05:00
  • 77f85c7c00 GEMV tests don't like zero elements. Chip Kerchner 2024-08-16 11:15:32 -05:00
  • 868aa857bc Change malloc zero to return one byte and update the SBGEMM test to again use sizes of zero. Chip Kerchner 2024-08-16 10:28:10 -05:00
  • b1802f4dc8 Fix unit test to start at 1 instead of 0 - since malloc zero bytes fails on some systems. Chip Kerchner 2024-08-16 09:51:37 -05:00
  • f61930eb11 Merge pull request #4882 from martin-frbg/issue4805-3 Martin Kroeker 2024-08-16 11:24:51 +02:00
  • dfba3f8841 restore the pragma as it is reportedly still needed on 3C6000/gcc14.2 Martin Kroeker 2024-08-16 11:23:19 +02:00
  • 54b868f71e deploy: 7129a64d87 martin-frbg 2024-08-16 06:47:47 +00:00
  • 7129a64d87 Merge pull request #4881 from martin-frbg/issue4805-2 Martin Kroeker 2024-08-16 08:47:12 +02:00
  • 49080b631e remove optimizer pragma again Martin Kroeker 2024-08-15 22:15:27 +02:00
  • e05d98d00a expressly use fld.d/fst.d for floating point registers instead of LD/ST macros Martin Kroeker 2024-08-15 22:14:29 +02:00
  • 3ee9e9d8d0 Merge pull request #4879 from martin-frbg/issue4868-2 Martin Kroeker 2024-08-15 22:06:54 +02:00
  • dd71df8fab Merge pull request #4880 from ChipKerchner/betterPowerGEMVTail Martin Kroeker 2024-08-15 20:36:22 +02:00
  • a8d6b0219a Merge pull request #4877 from XiWeiGu/fixed_undefined_blas_set_parameter Martin Kroeker 2024-08-15 15:35:26 +02:00
  • d24b3cf393 properly fix buffer allocation and assignment Martin Kroeker 2024-08-15 15:32:58 +02:00
  • a0aeba631d Merge branch 'develop' into betterPowerGEMVTail Chip Kerchner 2024-08-15 08:00:00 -05:00
  • eba8615c11 Merge pull request #4876 from martin-frbg/granite Martin Kroeker 2024-08-15 13:50:54 +02:00
  • bc80e7f02d Merge pull request #4878 from martin-frbg/cirrus-androidndk Martin Kroeker 2024-08-15 13:50:09 +02:00
  • 94c9e0b7ad Update ndk version number Martin Kroeker 2024-08-15 11:30:23 +02:00
  • ed0321563a fix installation of NDK in armv7 crossbuild Martin Kroeker 2024-08-15 11:11:07 +02:00
  • fd033467ac Fixed the undefined reference to blas_set_parameter gxw 2024-08-15 16:48:48 +08:00
  • 1b8e40874e Add autodetection support for Intel Granite Rapids as Sapphire Rapids Martin Kroeker 2024-08-15 09:33:42 +02:00
  • cbfe72ca76 deploy: 4944148e66 martin-frbg 2024-08-15 07:32:47 +00:00
  • 4944148e66 Merge pull request #4875 from ChipKerchner/addGEMVtoBF16Test Martin Kroeker 2024-08-15 09:32:11 +02:00
  • a388c4b834 Merge pull request #4872 from chenx97/ls3a-fix-stack-fpr-len Martin Kroeker 2024-08-15 00:10:16 +02:00
  • f24b521709 Merge pull request #4787 from vlad0x00/patch-1 Martin Kroeker 2024-08-15 00:09:53 +02:00
  • 2d84ed7e76 Update README.md Vladimir Nikolić 2024-08-14 14:31:35 -07:00
  • 083faf7556 Merge branch 'develop' into betterPowerGEMVTail Chip Kerchner 2024-08-14 15:56:03 -05:00
  • c23897f585 Add GEMV testing to SBGEMx vs SGEMx testing. Chip Kerchner 2024-08-14 15:55:23 -05:00
  • 0d8ee96f1e Merge pull request #4874 from martin-frbg/issue4869 Martin Kroeker 2024-08-14 22:49:12 +02:00
  • b80671d896 Merge pull request #4871 from martin-frbg/issue4868 Martin Kroeker 2024-08-14 20:53:39 +02:00
  • 6452f7b46d Merge pull request #4873 from ChipKerchner/fixSBGEMMDefaults Martin Kroeker 2024-08-14 19:22:03 +02:00
  • 75472b830a Merge branch 'develop' into betterPowerGEMVTail Chip Kerchner 2024-08-14 10:52:46 -05:00
  • 9842a6cf2f deploy: ca7777de18 martin-frbg 2024-08-14 15:37:07 +00:00
  • ca7777de18 Merge pull request #4870 from chenx97/fix-recursive-make-var Martin Kroeker 2024-08-14 16:03:50 +02:00
  • f6469e21bc move gelqs and geqrs to lapack-deprecated Martin Kroeker 2024-08-14 16:00:43 +02:00
  • 31226740d6 Cleanup of SBGEMM unit test. Chip Kerchner 2024-08-14 08:10:25 -05:00
  • ef94b96530 Use ldc1 and sdc1 for the prologue and epilogue on LOONGSON3A Henry Chen 2024-08-13 14:53:37 +08:00
  • 23b5d66a86 Ensure a memory buffer has been allocated for each thread before invoking it Martin Kroeker 2024-08-14 10:35:44 +02:00
  • 20bdb65882 Fix recursive variable expansion in Makefiles for LOONGSON3A Henry Chen 2024-08-12 16:22:31 +08:00
  • b1737698db Fix DEFAULTS in SBGEMM for POWER10. Also, comparisons in the SBGEMM unit test can't be exact due to epsilon differences. Chip Kerchner 2024-08-13 07:01:21 -05:00
  • 62d1a3cf37 deploy: e5525036e7 martin-frbg 2024-08-13 05:20:43 +00:00
  • e5525036e7 Merge pull request #4865 from martin-frbg/issue4856 Martin Kroeker 2024-08-13 07:20:06 +02:00
  • fd52d09490 Merge pull request #4864 from martin-frbg/issue4862 Martin Kroeker 2024-08-13 00:16:45 +02:00
  • f332ecbf1d deploy: 35dd625adf martin-frbg 2024-08-12 20:06:18 +00:00
  • 35dd625adf Merge pull request #4859 from martin-frbg/cooper_sb Martin Kroeker 2024-08-12 22:05:43 +02:00