Commit Graph

  • 2f142ee857 More common code. Chip Kerchner 2024-09-09 14:41:55 -0500
  • 39fd29f1de Minor improvement and turn off BF16 GEMV forwarding by default. Chip Kerchner 2024-09-08 18:28:31 -0500
  • 8541b25e1d Special case beta is one. Chip Kerchner 2024-09-06 14:48:48 -0500
  • 76227e2948 Initial commit for vectorized BF16 GEMV. Added GEMM_GEMV_FORWARD_BF16 to enable using BF16 GEMV for one dimension matrices. Updated unit test to support inc_x != 1 or inc_y for GEMV. Chip Kerchner 2024-09-06 14:03:31 -0500
  • 4894c54055 Improve TN case with further unrolling Deeksha Goplani 2024-09-02 22:22:49 +0530
  • 060c863515 BLD: Add Windows build Mateusz Sokół 2024-08-25 18:10:15 +0000
  • 6ce99e3148 MAINT: Add a configuration for meson format Rohit Goswami 2024-08-17 16:37:15 -0500
  • 3f9ffecf86 MAINT: Fixup hardcoded build folder Rohit Goswami 2024-08-17 16:32:41 -0500
  • 4ee4873c2b deploy: 485027563e martin-frbg 2024-08-17 09:47:57 +0000
  • 485027563e Merge pull request #4883 from ChipKerchner/fixSGEMMUnitTestZeroSize Martin Kroeker 2024-08-17 11:47:26 +0200
  • 89702e1f4a Fix zero element GEMV test. Chip Kerchner 2024-08-16 11:37:39 -0500
  • 77f85c7c00 GEMV tests don't like zero elements. Chip Kerchner 2024-08-16 11:15:32 -0500
  • 868aa857bc Change malloc zero to return one byte and update the SBGEMM test to again use sizes of zero. Chip Kerchner 2024-08-16 10:28:10 -0500
  • b1802f4dc8 Fix unit test to start at 1 instead of 0 - since malloc zero bytes fails on some systems. Chip Kerchner 2024-08-16 09:51:37 -0500
  • f61930eb11 Merge pull request #4882 from martin-frbg/issue4805-3 Martin Kroeker 2024-08-16 11:24:51 +0200
  • dfba3f8841 restore the pragma as it is reportedly still needed on 3C6000/gcc14.2 Martin Kroeker 2024-08-16 11:23:19 +0200
  • 54b868f71e deploy: 7129a64d87 martin-frbg 2024-08-16 06:47:47 +0000
  • 7129a64d87 Merge pull request #4881 from martin-frbg/issue4805-2 Martin Kroeker 2024-08-16 08:47:12 +0200
  • 49080b631e remove optimizer pragma again Martin Kroeker 2024-08-15 22:15:27 +0200
  • e05d98d00a expressly use fld.d/fst.d for floating point registers instead of LD/ST macros Martin Kroeker 2024-08-15 22:14:29 +0200
  • 3ee9e9d8d0 Merge pull request #4879 from martin-frbg/issue4868-2 Martin Kroeker 2024-08-15 22:06:54 +0200
  • dd71df8fab Merge pull request #4880 from ChipKerchner/betterPowerGEMVTail Martin Kroeker 2024-08-15 20:36:22 +0200
  • a8d6b0219a Merge pull request #4877 from XiWeiGu/fixed_undefined_blas_set_parameter Martin Kroeker 2024-08-15 15:35:26 +0200
  • d24b3cf393 properly fix buffer allocation and assignment Martin Kroeker 2024-08-15 15:32:58 +0200
  • a0aeba631d Merge branch 'develop' into betterPowerGEMVTail Chip Kerchner 2024-08-15 08:00:00 -0500
  • eba8615c11 Merge pull request #4876 from martin-frbg/granite Martin Kroeker 2024-08-15 13:50:54 +0200
  • bc80e7f02d Merge pull request #4878 from martin-frbg/cirrus-androidndk Martin Kroeker 2024-08-15 13:50:09 +0200
  • 94c9e0b7ad Update ndk version number Martin Kroeker 2024-08-15 11:30:23 +0200
  • ed0321563a fix installation of NDK in armv7 crossbuild Martin Kroeker 2024-08-15 11:11:07 +0200
  • fd033467ac Fixed the undefined reference to blas_set_parameter gxw 2024-08-15 16:48:48 +0800
  • 1b8e40874e Add autodetection support for Intel Granite Rapids as Sapphire Rapids Martin Kroeker 2024-08-15 09:33:42 +0200
  • cbfe72ca76 deploy: 4944148e66 martin-frbg 2024-08-15 07:32:47 +0000
  • 4944148e66 Merge pull request #4875 from ChipKerchner/addGEMVtoBF16Test Martin Kroeker 2024-08-15 09:32:11 +0200
  • a388c4b834 Merge pull request #4872 from chenx97/ls3a-fix-stack-fpr-len Martin Kroeker 2024-08-15 00:10:16 +0200
  • f24b521709 Merge pull request #4787 from vlad0x00/patch-1 Martin Kroeker 2024-08-15 00:09:53 +0200
  • 2d84ed7e76 Update README.md Vladimir Nikolić 2024-08-14 14:31:35 -0700
  • 083faf7556 Merge branch 'develop' into betterPowerGEMVTail Chip Kerchner 2024-08-14 15:56:03 -0500
  • c23897f585 Add GEMV testing to SBGEMx vs SGEMx testing. Chip Kerchner 2024-08-14 15:55:23 -0500
  • 0d8ee96f1e Merge pull request #4874 from martin-frbg/issue4869 Martin Kroeker 2024-08-14 22:49:12 +0200
  • b80671d896 Merge pull request #4871 from martin-frbg/issue4868 Martin Kroeker 2024-08-14 20:53:39 +0200
  • 6452f7b46d Merge pull request #4873 from ChipKerchner/fixSBGEMMDefaults Martin Kroeker 2024-08-14 19:22:03 +0200
  • 75472b830a Merge branch 'develop' into betterPowerGEMVTail Chip Kerchner 2024-08-14 10:52:46 -0500
  • 9842a6cf2f deploy: ca7777de18 martin-frbg 2024-08-14 15:37:07 +0000
  • ca7777de18 Merge pull request #4870 from chenx97/fix-recursive-make-var Martin Kroeker 2024-08-14 16:03:50 +0200
  • f6469e21bc move gelqs and geqrs to lapack-deprecated Martin Kroeker 2024-08-14 16:00:43 +0200
  • 31226740d6 Cleanup of SBGEMM unit test. Chip Kerchner 2024-08-14 08:10:25 -0500
  • 0701835710 Merge pull request #24 from HaoZeke/sharedLib Rohit Goswami 2024-08-14 06:03:04 -0700
  • 04d9a533bb BLD: Use `both_libraries` to build libs Mateusz Sokół 2024-08-14 10:45:26 +0000
  • ef94b96530 Use ldc1 and sdc1 for the prologue and epilogue on LOONGSON3A Henry Chen 2024-08-13 14:53:37 +0800
  • 23b5d66a86 Ensure a memory buffer has been allocated for each thread before invoking it Martin Kroeker 2024-08-14 10:35:44 +0200
  • 20bdb65882 Fix recursive variable expansion in Makefiles for LOONGSON3A Henry Chen 2024-08-12 16:22:31 +0800
  • adea569542 BLD: Create OpenBLAS shared object Mateusz Sokół 2024-08-13 09:42:46 +0000
  • b1737698db Fix DEFAULTS in SBGEMM for POWER10. Also comparisons for SBGEMM unit test can be exactly due to epilison differences. Chip Kerchner 2024-08-13 07:01:21 -0500
  • 62d1a3cf37 deploy: e5525036e7 martin-frbg 2024-08-13 05:20:43 +0000
  • e5525036e7 Merge pull request #4865 from martin-frbg/issue4856 Martin Kroeker 2024-08-13 07:20:06 +0200
  • fd52d09490 Merge pull request #4864 from martin-frbg/issue4862 Martin Kroeker 2024-08-13 00:16:45 +0200
  • f332ecbf1d deploy: 35dd625adf martin-frbg 2024-08-12 20:06:18 +0000
  • 35dd625adf Merge pull request #4859 from martin-frbg/cooper_sb Martin Kroeker 2024-08-12 22:05:43 +0200
  • a48b117636 Update version information for 0.3.28 Martin Kroeker 2024-08-12 18:22:20 +0200
  • da6393ab91 set larger threshold for POWER10 Hong Bo Peng 2024-08-12 09:13:01 -0400
  • d8f740791a tweak threshold a little more to cover POWER10 fma Martin Kroeker 2024-08-12 14:50:49 +0200
  • 73e13b0273 flesh out HERK prototype Martin Kroeker 2024-08-12 14:45:40 +0200
  • 824306baab flesh out HERK prototype Martin Kroeker 2024-08-12 14:44:13 +0200
  • cf98f7afc4 Merge pull request #23 from HaoZeke/mesonDocs Rohit Goswami 2024-08-12 11:27:20 +0000
  • ff42a9f4fb DOC: Meson build docs Mateusz Sokół 2024-08-09 14:52:38 +0200
  • 05a72c7a71 Update azure-pipelines.yml Martin Kroeker 2024-08-11 10:42:17 +0200
  • 7ca835a82c address clang array overflow warning Martin Kroeker 2024-08-10 13:44:56 +0200
  • a87c4d26dd Merge pull request #4857 from nekopsykose/ppc Martin Kroeker 2024-08-10 00:15:28 +0200
  • 1265eee85c fix cmake typo for power10 cc version check psykose 2024-08-09 20:38:05 +0200
  • 6d31ff0b1e Merge pull request #17 from HaoZeke/multiArch Rohit Goswami 2024-08-09 08:22:53 +0000
  • f0e9e93a2b deploy: cb38d666da martin-frbg 2024-08-09 01:41:29 +0000
  • cd3945b998 Update version to 0.3.28.dev Martin Kroeker 2024-08-08 23:09:45 +0200
  • cbd321aecb Update versin to 0.3.28.dev Martin Kroeker 2024-08-08 23:08:52 +0200
  • cb38d666da Merge pull request #4855 from OpenMathLib/release-0.3.0 Martin Kroeker 2024-08-08 23:08:07 +0200
  • 5ef8b19646 Merge pull request #4854 from OpenMathLib/develop v0.3.28 release-0.3.0 Martin Kroeker 2024-08-08 22:41:46 +0200
  • 884a949a0d Merge branch 'release-0.3.0' into develop Martin Kroeker 2024-08-08 22:41:26 +0200
  • 116bc767d8 Update version to 0.3.28 Martin Kroeker 2024-08-08 22:23:02 +0200
  • 91d6722a3d Update version to 0.3.28 Martin Kroeker 2024-08-08 22:22:24 +0200
  • 2c8e001efe Merge pull request #4853 from martin-frbg/changelog0328 Martin Kroeker 2024-08-08 21:14:40 +0200
  • e33ee60651 deploy: 1c2bfea1bb martin-frbg 2024-08-08 17:17:44 +0000
  • 1c2bfea1bb Merge pull request #4852 from martin-frbg/fix4814 Martin Kroeker 2024-08-08 19:16:48 +0200
  • 1df95bb23a Update Changelog.txt for 0.3.28 Martin Kroeker 2024-08-08 18:51:25 +0200
  • 7878976236 disable forwarding from SBGEMM to SBGEMV for now Martin Kroeker 2024-08-08 18:03:38 +0200
  • af0e7f1c8a BLD: Support x86_64 and arm64 architectures Mateusz Sokół 2024-08-06 18:55:54 +0200
  • d92cc96978 Merge pull request #4851 from martin-frbg/test3m Martin Kroeker 2024-08-08 00:07:17 +0200
  • 76db713e79 fix invocation of GEMM3M tests Martin Kroeker 2024-08-07 21:37:20 +0200
  • deae7cf1ec Merge pull request #4850 from martin-frbg/generic_3m Martin Kroeker 2024-08-07 21:35:38 +0200
  • 46e331a917 remove the unworkable GEMM3M restriction from GENERIC again Martin Kroeker 2024-08-07 19:41:10 +0200
  • ccc23338d7 have the dummy GEMM3M kernel at least forward to regular GEMM Martin Kroeker 2024-08-07 19:39:02 +0200
  • b3e2d00d43 fix invocation of gemm3m tests Martin Kroeker 2024-08-07 16:51:53 +0200
  • fe0a69e308 even less invasive Harmen Stoppels 2024-08-07 16:43:45 +0200
  • f49371c1ba Set CMake 3.0 policies to NEW Harmen Stoppels 2024-08-07 16:40:11 +0200
  • 1ef9f24b39 Revert "require consistent minimal cmake version" Harmen Stoppels 2024-08-07 16:37:02 +0200
  • a24acffaef deploy: 753c7ebe17 martin-frbg 2024-08-07 12:10:30 +0000
  • 753c7ebe17 Merge pull request #4835 from martin-frbg/revertwin4359 Martin Kroeker 2024-08-07 14:09:32 +0200
  • 5b07ec643c require consistent minimal cmake version Harmen Stoppels 2024-08-07 09:43:47 +0200
  • b0ac6d8f10 deploy: 3b8d7dfdca martin-frbg 2024-08-06 22:15:41 +0000
  • 3b8d7dfdca Merge pull request #4846 from martin-frbg/lapack1025 Martin Kroeker 2024-08-07 00:04:37 +0200
  • 447d66a9e8 Assume cross-compilation if EMBEDDED was specified Martin Kroeker 2024-08-06 23:50:20 +0200
  • 797ae08dbe Add explanation of LAPACK_STRLEN Martin Kroeker 2024-08-06 21:38:00 +0200