update README

2025-09-04 18:39:13 +08:00 · 2025-09-04 18:39:13 +08:00 · 07a6c32971
parent 79e4fd6ab1
commit 07a6c32971
19 changed files with 94 additions and 48 deletions
--- a/.gitee/ISSUE_TEMPLATE/cp.yml
+++ b/.gitee/ISSUE_TEMPLATE/cp.yml
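@@ -0,0 +1,36 @@
+name: Contest problem (CP)
+description: Contest problem
+title: "[cp]: "
+labels: ["cp"]
+body:
+  - type: markdown
+    attributes:
+      value: |
+        Thank you for taking part in the contest and submitting the problem you want to work on. If two entries cover the same problem with the same performance, the one whose problem issue was created earlier takes priority.
+  - type: textarea
+    id: desired-solution
+    attributes:
+      label: Description of the problem you are submitting
+      description: Describe clearly and concisely what you intend to enter, in enough detail that others can understand the goal of the problem.
+    validations:
+      required: true
+  - type: textarea
+    id: alternatives
+    attributes:
+      label: Expected results for the problem
+      description: Describe clearly and concisely the expected result once the problem is complete; ideally, explain how to test and verify that the result is correct.
+    validations:
+      required: false
+  - type: textarea
+    id: additional-context
+    attributes:
+      label: Do you have any other relevant background information?
+      description: Add any other context or screenshots about the idea here.
+    validations:
+      required: false
+  - type: checkboxes
+    attributes:
+      label: Recommend this problem to other contestants (if not, assign the problem to yourself after creating it)
+      options:
+        - label: I would like to recommend this problem for someone else to complete.
+          required: false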
--- a/.gitee/ISSUE_TEMPLATE/feature.yml
+++ b/.gitee/ISSUE_TEMPLATE/feature.yml
@@ -1,4 +1,4 @@
-name: Idea (GPUKernelContest)
+name: Idea
 description: Propose an idea or suggestion for this contest
 title: "[idea]: "
 labels: ["idea"]
--- a/.gitee/ISSUE_TEMPLATE/task.yml
+++ b/.gitee/ISSUE_TEMPLATE/task.yml
@@ -1,4 +1,4 @@
-name: Task (GPUKernelContest)
+name: Task
 description: Propose a task for this contest, for later tracking and execution.
 title: "[task]: "
 labels: ["task"]
--- a/README.md
+++ b/README.md
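@@ -34,53 +34,68 @@

 ## 🚀 Quick Start

-This contest evaluates contestants' algorithm optimization skills in GPU parallel computing. To help contestants get started quickly, you may choose to implement high-performance versions of three core algorithms:
+This contest evaluates contestants' algorithm optimization skills in GPU parallel computing. To help contestants get started quickly, we provide reference high-performance versions of three core algorithms for contestants to keep optimizing:
 - **ReduceSum**: high-precision reduction sum
 - **SortPair**: stable key-value pair sort
 - **TopkPair**: key-value pair top-K selection

-### 📥 
+[Contest problem template for the three core algorithms](./cp_template/)
+
+### 📥 Preparing Your Contest Problem
+
+1. Click [Create a contest problem](https://gitee.com/ccf-ai-infra/GPUKernelContest/issues/new?template=cp.yml)
+2. Note the problem ID, for example [ICTN0N](https://gitee.com/ccf-ai-infra/GPUKernelContest/issues/ICTN0N)
+3. Fork the repository and initialize the contest environment (custom problems other than the three core algorithm optimization problems must provide a `run.sh` entry script so that CI can test and verify them automatically)
+   1. Copy the problem sample `cp_template` into the problem directory `ICTN0N`
+        ```
+        ├── S1 (note: name of the first contest season)
+        │   ├── ICTN0N (note: directory named after the problem ID, holding the PR for that problem)
+        │   │   ├── utils
+        │   │   ├── run.sh (note: entry point for automated CI testing)
+        │   │   └── ...
+        │   └── ...
+        └── S2 (note: name of the second contest season)
+            ├── problem directory 1
+            └── problem directory 2
+        ```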

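 ### Build and Test

+The contestant's problem directory includes build and test scripts to help contestants get familiar with the contest environment. The steps are as follows:
+
+```bash
+# NOTE: contestants must enter the directory they initialized for their own problem ID!
+cd GPUKernelContest/S1/ICTN0N
+```
+
+
 #### 1. Full Build and Run
 ```bash
 # Build and run all algorithm tests (default behavior)
 ./run.sh

-# Build all algorithms without running tests
-./run.sh --build-only
-
 # Build and run the tests for a single algorithm
 ./run.sh --run_reduce   # ReduceSum algorithm
 ./run.sh --run_sort     # SortPair algorithm
 ./run.sh --run_topk     # TopkPair algorithm
 ```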

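-#### 2. Build and Run Individually
+#### 2. Run Tests Manually
+
 ```bash
-# Build and run the ReduceSum algorithm (default behavior)
-./run_reduce_sum.sh
+# Build all algorithms without running tests
+./run.sh --build-only

-# Build the ReduceSum algorithm without running tests
-./run_reduce_sum.sh --build-only
-
-# Build and run the SortPair correctness tests
-./run_sort_pair.sh --run correctness
-
-# Build and run the TopkPair performance tests
-./run_topk_pair.sh --run performance
-```
-
-#### 3. Run Tests Manually
-```bash
+# Run each algorithm's tests individually
 ./build/test_reducesum [correctness|performance|all]
 ./build/test_sortpair [correctness|performance|all]
 ./build/test_topkpair [correctness|performance|all]
 ```

+For how to submit, see [How to Contribute](how-to-contribute.md)
+
 ### ✅ Entry Requirements:
- Submissions must run on MetaX's self-developed GPU, the **曦云 C500**.
+- Submissions must run on MACA software.
 - Submitted optimized code is reviewed by the organizers and **only counts as a valid submission once it has been merged into the official Gitee repository.**

 ### 📦 A Submission Includes: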
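@@ -92,7 +107,7 @@

 ## 📈 Scoring

-Each submission is scored according to the following rules:
+Each merged submission is scored according to the following rules:

 ### 🎯 Base Score (Level):
 | Level | Description | Points |
@@ -101,7 +116,7 @@
 | Level 2 | Fused optimization of 2~9 operators | 10 points |
 | Level 3 | Fused operator containing MMA (multi-dimensional matrix multiplication) | 50 points |
 | Level 4 | Complex fused operator for large-model inference | 50 points |
-| Each PR merged into the metax-maca open-source project repositories | - | 50 points |
+| Each PR merged into the MACA open-source project repositories (the problem issue must include a record of the merge, and the commit email must match the email used to enter the contest) | - | 50 points |

 ### ✨ Bonus Items:
 | Item | Points |
@@ -115,18 +130,19 @@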

 ---

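-## 🏅 Ranking Rules
+## 🏆 Ranking

- Contest duration: 2 months
- Ranking is by cumulative score; the top 12 are selected!
-
-If scores are tied:
-1. More submissions ranks higher
-2. Earlier submission time ranks higher
+1. Entries are sorted by judges' score from highest to lowest
+2. **Evaluation rule:** the top 12 are selected as the final winners
+3. If base scores are tied:
+  - More bonus items ranks higher
+  - More submissions ranks higher
+  - Earlier submission time ranks higher
+4. When one contestant has submissions for several problems in this contest, the scores for those problems are added up

 ---

-## 📚 Official Reference Project Repositories
+## 📚 Reference MACA Open-Source Project Repositories

 You can refer to the following project repositories to learn about operator development and the submission format:

@@ -137,12 +153,6 @@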

 ---

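-## 🖥️ Available Resources
-
- 曦云 **C500 GPU, 1/2 card**, issued by the organizers to registered students in the form of compute vouchers.
-
---
-
 ## 💡 Glossary

 - **Operator**: a basic computation module in a deep learning framework, such as matrix multiplication or convolution.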
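
The revised ranking rules in the README hunk above amount to a multi-key sort. A minimal Python sketch of one possible reading of those rules (base score first, then number of bonus items, then number of submissions, then earliest submission time) follows. The `Entry` record, its field names, and the sample data are hypothetical and do not come from the repository; how judges actually combine base and bonus scores is not specified in this commit.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Entry:
    """Hypothetical per-contestant record; field names are illustrative only."""
    name: str
    base_score: int            # cumulative base score across the contestant's problems
    bonus_items: int           # number of bonus items earned
    submissions: int           # number of merged submissions
    first_submitted: datetime  # time of the earliest submission


def rank(entries, top_n=12):
    """Sort by base score (high to low), then apply the three tie-breakers."""
    ordered = sorted(
        entries,
        key=lambda e: (-e.base_score, -e.bonus_items, -e.submissions, e.first_submitted),
    )
    return ordered[:top_n]


# Made-up sample data, only to exercise the tie-breakers.
entries = [
    Entry("a", 60, 1, 3, datetime(2025, 9, 2)),
    Entry("b", 60, 2, 1, datetime(2025, 9, 3)),
    Entry("c", 110, 0, 2, datetime(2025, 9, 4)),
    Entry("d", 60, 1, 3, datetime(2025, 9, 1)),
]
print([e.name for e in rank(entries)])
```

Negating the numeric keys keeps a single ascending sort while ranking larger values first; the timestamp is left as-is so that earlier submissions win the final tie-break.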
--- a/S1/ICTN0N/build/test_reducesum
+++ b/S1/ICTN0N/build/test_reducesum
--- a/S1/ICTN0N/build/test_sortpair
+++ b/S1/ICTN0N/build/test_sortpair
--- a/S1/ICTN0N/build/test_topkpair
+++ b/S1/ICTN0N/build/test_topkpair
--- a/S1/ICTN0N/reduce_sum_algorithm.maca
+++ b/S1/ICTN0N/reduce_sum_algorithm.maca
--- a/S1/ICTN0N/reduce_sum_performance.yaml
+++ b/S1/ICTN0N/reduce_sum_performance.yaml
@@ -1,5 +1,5 @@
 # ReduceSum algorithm performance test results
-# Generated at: 2025-09-03 22:34:18
+# Generated at: 2025-09-04 18:32:03

 algorithm: "ReduceSum"
 data_types:
@@ -9,18 +9,18 @@ formulas:
  throughput: "elements / time(s) / 1e9 (G/s)"
 performance_data:
  - data_size: 1000000
-    time_ms: 0.048717
-    throughput_gps: 20.526799
+    time_ms: 0.051046
+    throughput_gps: 19.590022
    data_type: "float"
  - data_size: 134217728
-    time_ms: 0.402560
-    throughput_gps: 333.410496
+    time_ms: 0.405018
+    throughput_gps: 331.387385
    data_type: "float"
  - data_size: 536870912
-    time_ms: 1.346586
-    throughput_gps: 398.690510
+    time_ms: 1.351834
+    throughput_gps: 397.142754
    data_type: "float"
  - data_size: 1073741824
-    time_ms: 2.639513
-    throughput_gps: 406.795353
+    time_ms: 2.618675
+    throughput_gps: 410.032451
    data_type: "float"
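
The `throughput` formula recorded in this file, `elements / time(s) / 1e9`, can be checked against the updated rows. A small Python sketch, with the values copied from the `+` lines above; the function name `throughput_gps` and the relative tolerance are assumptions made for illustration, and small differences are expected because `time_ms` is stored rounded.

```python
# (data_size, time_ms, reported throughput_gps) from the updated YAML rows.
ROWS = [
    (1_000_000, 0.051046, 19.590022),
    (134_217_728, 0.405018, 331.387385),
    (536_870_912, 1.351834, 397.142754),
    (1_073_741_824, 2.618675, 410.032451),
]


def throughput_gps(elements: int, time_ms: float) -> float:
    """Apply elements / time(s) / 1e9, converting milliseconds to seconds first."""
    return elements / (time_ms / 1000.0) / 1e9


def relative_error(computed: float, reported: float) -> float:
    """Relative deviation of the recomputed value from the reported one."""
    return abs(computed - reported) / reported


for size, ms, reported in ROWS:
    computed = throughput_gps(size, ms)
    print(f"{size:>12d}  computed={computed:.3f}  reported={reported:.3f}")
```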
--- a/S1/ICTN0N/run.sh
+++ b/S1/ICTN0N/run.sh
--- a/S1/ICTN0N/sort_pair_algorithm.maca
+++ b/S1/ICTN0N/sort_pair_algorithm.maca
--- a/S1/ICTN0N/sort_pair_performance.yaml
+++ b/S1/ICTN0N/sort_pair_performance.yaml
--- a/S1/ICTN0N/topk_pair_algorithm.maca
+++ b/S1/ICTN0N/topk_pair_algorithm.maca
--- a/S1/ICTN0N/topk_pair_performance.yaml
+++ b/S1/ICTN0N/topk_pair_performance.yaml
--- a/cp_template/competition_parallel_algorithms.md
+++ b/cp_template/competition_parallel_algorithms.md
--- a/cp_template/reduce_sum_algorithm.maca
+++ b/cp_template/reduce_sum_algorithm.maca
--- a/cp_template/run.sh
+++ b/cp_template/run.sh
--- a/cp_template/sort_pair_algorithm.maca
+++ b/cp_template/sort_pair_algorithm.maca
--- a/cp_template/topk_pair_algorithm.maca
+++ b/cp_template/topk_pair_algorithm.maca