trace_synthesis/summarize_design.md
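yuyr a84d51a101 1. Add the r1-generated synthesis strategy code and outputs;
2. Add tasks;
3. Add the analysis part, which groups and categorizes the strategies and then evaluates them.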
2025-04-17 17:40:15 +08:00


# Basic prompt structure
The prompt has the following basic structure, made up of six parts:
```
# Instruction
# Task
# Annotation description
## Part 1
## Part 2
## Part 3
## Part 4
...
# Playwright action
# Output format
# Example
```
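As a minimal sketch of how the six parts could be assembled into one prompt string: the function name `build_prompt` and its arguments are hypothetical and only illustrate the layout above, not the project's actual code.

```python
# Section titles in the order given by the structure above.
SECTION_ORDER = [
    "Instruction",
    "Task",
    "Annotation description",
    "Playwright action",
    "Output format",
    "Example",
]


def build_prompt(sections: dict[str, str], annotation_parts: list[str]) -> str:
    """Assemble the six-part prompt.

    `sections` maps a section title to its body text; the annotation
    section is built from `annotation_parts` instead (one entry per file).
    """
    chunks = []
    for title in SECTION_ORDER:
        chunks.append(f"# {title}")
        if title == "Annotation description":
            # One "## Part N" subsection per annotation file.
            for n, part in enumerate(annotation_parts, start=1):
                chunks.append(f"## Part {n}")
                chunks.append(part.strip())
        else:
            chunks.append(sections[title].strip())
    return "\n".join(chunks)
```

Keeping the section order in a single list means the layout is defined in one place if a part is added or reordered later.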
# Instruction
- Describes the role and the task requirements. This part is fixed and identical for every task. The instruction text is:
- You are an expert in cleaning process data descriptions. Given a task, you are provided with a set of annotation description
data for a certain visual LLM related to human user operation videos. Plus, you are provided with full trace of playwright action,
which includes action and url before and after the action.
- You need to analyze all the descriptive data and ultimately summarize a complete and reasonable user operation description that can accomplish the given task.
- For each strategy, give a clear list of the low level action sequence.
# Task
Extract the `intent` field from tasks/task_id.json. This is an example of the JSON content:
{
"sites": [
"gitlab"
],
"task_id": 103,
"require_login": true,
"storage_state": "./.auth/gitlab_state.json",
"start_url": "http://localhost:28084",
"geolocation": null,
"intent_template": "Display the list of issues in the {{repo}} repository that have labels related to {{label}}",
"instantiation_dict": {
"label": "questions",
"repo": "kkroening/ffmpeg-python"
},
"intent": "Display the list of issues in the kkroening/ffmpeg-python repository that have labels related to questions",
"require_reset": false,
"eval": {
"eval_types": [
"url_match"
],
"reference_answers": null,
"reference_url": "http://localhost:28084/kkroening/ffmpeg-python/-/issues/?label_name%5B%5D=question",
"program_html": [],
"url_note": "GOLD in PRED"
},
"intent_template_id": 349
}
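A minimal sketch of this extraction step, assuming one JSON file per task named after its id inside a `tasks` directory. The helper name `load_intent` is hypothetical.

```python
import json
from pathlib import Path


def load_intent(task_id: int, tasks_dir: str = "tasks") -> str:
    """Return the `intent` string stored in tasks/<task_id>.json."""
    task_path = Path(tasks_dir) / f"{task_id}.json"
    with task_path.open(encoding="utf-8") as f:
        task = json.load(f)
    # Only the natural-language intent goes into the "# Task" section.
    return task["intent"]
```

For the file shown above, `load_intent(103)` would return the sentence stored under `"intent"`.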
# Annotation description
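This section holds the action descriptions that the VLM recorded from the trace video. Specifically, read video/task_id.trace/*.txt; if there are several txt files, sort them alphabetically. Each file is one recorded action summary, and the summaries are concatenated in that order into the `## Part N` subsections shown in the structure above.
Taking part 1 (103.trace_recording_part1.txt) as an example: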
### Step-by-Step Actions in the Video Segment
#### 1. **Initial State**
- **Action:** The video begins with a view of a GitLab projects page. The page displays a list of projects under "Yours," with details such as project names, descriptions, and update times.
- **Page Changes:** No immediate changes occur as this is the starting point.
- **Possible Purpose:** The initial state sets the context for navigating through the user's projects on GitLab.
#### 2. **Hovering Over Projects**
- **Action:** I move the cursor over several project entries in the list.
- **Page Changes:** As the cursor hovers over each project, a tooltip appears, displaying the full URL of the project.
- **Possible Purpose:** Hovering over the projects likely aims to review the project URLs or gather more information about each project before selecting one.
#### 3. **Scrolling Down the Page**
- **Action:** I scroll down the page using the mouse wheel.
- **Page Changes:** The list of projects moves upward, revealing additional projects further down the list.
- **Possible Purpose:** Scrolling down is intended to view more projects that are not initially visible on the screen.
#### 4. **Hovering Over Additional Projects**
- **Action:** After scrolling, I continue to hover over different project entries.
- **Page Changes:** Similar to the previous hovering action, tooltips appear showing the full URLs of the newly visible projects.
- **Possible Purpose:** This continued hovering suggests an ongoing review of project details, possibly to locate a specific project or assess available options.
#### 5. **Stopping at a Specific Project**
- **Action:** I stop scrolling and hover over a project named "Byte Blazea11yproject.contributor.me / cloud-to-butt."
- **Page Changes:** The tooltip for this project appears, displaying its full URL.
- **Possible Purpose:** Pausing at this specific project indicates interest in it, perhaps for further interaction or selection.
#### 6. **Clicking on the Project**
- **Action:** I click on the project "Byte Blazea11yproject.contributor.me / cloud-to-butt."
- **Page Changes:** The page transitions from the projects list to the detailed view of the selected project. The new page shows the project's repository files, commit history, and other related information.
- **Possible Purpose:** Clicking on the project is to access its detailed page, likely for viewing its contents, making edits, or performing other project-specific actions.
### Summary
In this video segment, I begin by reviewing a list of projects on my GitLab page. I hover over various projects to see their URLs, scroll down to view more projects, and then select a specific project ("Byte Blazea11yproject.contributor.me / cloud-to-butt") by clicking on it. This sequence of actions suggests a focused effort to locate and access a particular project for further interaction.
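The loading step described at the start of this section (read video/task_id.trace/*.txt, sort alphabetically, keep one summary per file) could be sketched as follows. The function name `load_annotation_parts` and the `video_dir` parameter are assumptions made for illustration.

```python
from pathlib import Path


def load_annotation_parts(task_id: int, video_dir: str = "video") -> list[str]:
    """Read every .txt summary for a task, ordered alphabetically by file name."""
    trace_dir = Path(video_dir) / f"{task_id}.trace"
    # Sort on the file name only, so the order does not depend on the OS listing.
    txt_files = sorted(trace_dir.glob("*.txt"), key=lambda p: p.name)
    return [p.read_text(encoding="utf-8") for p in txt_files]
```

Plain alphabetical order places `part10` before `part2`, so a task with ten or more parts would need zero-padded file names or a numeric sort key.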
# Playwright action
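These are the fine-grained action instructions extracted from the trace.zip file recorded by Playwright. Each entry includes the action uid, the action's sequence index (starting from 0), the Playwright instruction, and the page URL before and after the action. They are read from trace_extract/task_id.trace.zip.content.json. This is an example: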
[{"action_uid": "action_0", "idx": 0, "action_repr": "frame.clickget_by_placeholder(\"Filter by name\")", "before": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/"}, "after": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/"}}, {"action_uid": "action_1", "idx": 1, "action_repr": "frame.clickget_by_placeholder(\"Filter by name\")", "before": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/?sort=name_asc&name=kkroening%2Fffmpeg-python&sort=name_asc"}, "after": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/?sort=name_asc&name=kkroening%2Fffmpeg-python&sort=name_asc"}}, {"action_uid": "action_2", "idx": 2, "action_repr": "frame.clickget_by_placeholder(\"Filter by name\")", "before": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/?sort=name_asc&name=kkroening%2Fffmpeg-python&sort=name_asc"}, "after": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/?sort=name_asc&name=kkroening%2Fffmpeg-python&sort=name_asc"}}, {"action_uid": "action_3", "idx": 3, "action_repr": "frame.clickget_by_placeholder(\"Filter by name\")", "before": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/?sort=name_asc&name=kkroening%2Fffmpeg-python&sort=name_asc"}, "after": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/?sort=name_asc&name=kkroening%2Fffmpeg-python&sort=name_asc"}}, {"action_uid": "action_4", "idx": 4, "action_repr": "frame.clickget_by_placeholder(\"Filter by name\")", "before": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/?sort=name_asc&name=kkroening%2Fffmpeg-python&sort=name_asc"}, "after": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/?sort=name_asc&name=kkroening%2Fffmpeg-python&sort=name_asc"}}, {"action_uid": "action_5", "idx": 5, "action_repr": "frame.pressget_by_placeholder(\"Filter by name\")ArrowRight", "before": {"url": 
"http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/?sort=name_asc&name=kkroening%2Fffmpeg-python&sort=name_asc"}, "after": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/?sort=name_asc&name=kkroening%2Fffmpeg-python&sort=name_asc"}}, {"action_uid": "action_6", "idx": 6, "action_repr": "frame.pressget_by_placeholder(\"Filter by name\")ArrowRight", "before": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/?sort=name_asc&name=kkroening%2Fffmpeython&sort=name_asc"}, "after": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/?sort=name_asc&name=kkroening%2Fffmpeython&sort=name_asc"}}, {"action_uid": "action_7", "idx": 7, "action_repr": "frame.pressget_by_placeholder(\"Filter by name\")ArrowRight", "before": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/?sort=name_asc&name=kkroening%2Fffmpeython&sort=name_asc"}, "after": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/?sort=name_asc&name=kkroening%2Fffmpeython&sort=name_asc"}}, {"action_uid": "action_8", "idx": 8, "action_repr": "frame.pressget_by_placeholder(\"Filter by name\")ArrowRight", "before": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/?sort=name_asc&name=kkroening%2Fffmpeython&sort=name_asc"}, "after": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/?sort=name_asc&name=kkroening%2Fffmpeython&sort=name_asc"}}, {"action_uid": "action_9", "idx": 9, "action_repr": "frame.pressget_by_placeholder(\"Filter by name\")ArrowRight", "before": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/?sort=name_asc&name=kkroening%2Fffmpeython&sort=name_asc"}, "after": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/?sort=name_asc&name=kkroening%2Fffmpeython&sort=name_asc"}}, {"action_uid": "action_10", "idx": 10, "action_repr": "frame.pressget_by_placeholder(\"Filter by name\")Enter", "before": {"url": "about:blank"}, "after": 
{"url": "about:blank"}}, {"action_uid": "action_11", "idx": 11, "action_repr": "frame.clickget_by_placeholder(\"Filter by name\")", "before": {"url": "about:blank"}, "after": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/?name=kkroening&sort=name_asc"}}, {"action_uid": "action_12", "idx": 12, "action_repr": "frame.clickget_by_placeholder(\"Filter by name\")", "before": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/?name=kkroening&sort=name_asc"}, "after": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/?name=kkroening&sort=name_asc"}}, {"action_uid": "text_Yours 23 Starred 3 Explore Topics", "idx": 13, "action_repr": "frame.clickget_by_text(\"Yours 23 Starred 3 Explore Topics\")", "before": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/?name=kkroening&sort=name_asc"}, "after": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/?name=kkroening&sort=name_asc"}}, {"action_uid": "action_14", "idx": 14, "action_repr": "frame.clickget_by_placeholder(\"Filter by name\")", "before": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/?name=kkroening&sort=name_asc"}, "after": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/?name=kkroening&sort=name_asc"}}, {"action_uid": "action_15", "idx": 15, "action_repr": "frame.clickget_by_placeholder(\"Filter by name\")", "before": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/?name=kkroening&sort=name_asc"}, "after": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/?name=kkroening&sort=name_asc"}}, {"action_uid": "action_16", "idx": 16, "action_repr": "frame.pressget_by_placeholder(\"Filter by name\")Control+c", "before": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/?name=kkroening&sort=name_asc"}, "after": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/?name=kkroening&sort=name_asc"}}, {"action_uid": 
"action_17", "idx": 17, "action_repr": "frame.clickget_by_placeholder(\"Search GitLab\")", "before": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/?name=kkroening&sort=name_asc"}, "after": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/?name=kkroening&sort=name_asc"}}, {"action_uid": "action_18", "idx": 18, "action_repr": "frame.pressget_by_placeholder(\"Search GitLab\")Enter", "before": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/?name=kkroening&sort=name_asc"}, "after": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/?name=kkroening&sort=name_asc"}}, {"action_uid": "link_Users 1", "idx": 19, "action_repr": "frame.clickget_by_role(\"link\", name=\"Users 1\")", "before": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/search?search=kkroening&nav_source=navbar"}, "after": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/search?search=kkroening&nav_source=navbar"}}, {"action_uid": "link_Karl Kroening @kkroening", "idx": 20, "action_repr": "frame.clickget_by_role(\"link\", name=\"Karl Kroening @kkroening\")", "before": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/search?scope=users&search=kkroening"}, "after": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/search?scope=users&search=kkroening"}}, {"action_uid": "link_Personal projects", "idx": 21, "action_repr": "frame.clickget_by_role(\"link\", name=\"Personal projects\")", "before": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/kkroening"}, "after": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/kkroening"}}, {"action_uid": "link_ffmpeg-python", "idx": 22, "action_repr": "frame.clickget_by_role(\"link\", name=\"ffmpeg-python\")", "before": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/kkroening/ffmpeg-python"}, "after": {"url": 
"http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/users/kkroening/projects"}}, {"action_uid": "action_23", "idx": 23, "action_repr": "frame.clicklocator(\"a\").filter(has_text=\"Issues 402\")", "before": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/kkroening/ffmpeg-python/-/issues"}, "after": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/kkroening/ffmpeg-python/-/issues"}}, {"action_uid": "filtered-search-token-segment", "idx": 24, "action_repr": "frame.clickget_by_test_id(\"filtered-search-token-segment\")", "before": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/kkroening/ffmpeg-python/-/issues"}, "after": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/kkroening/ffmpeg-python/-/issues"}}, {"action_uid": "menuitem_Label", "idx": 25, "action_repr": "frame.clickget_by_role(\"menuitem\", name=\"Label\")", "before": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/kkroening/ffmpeg-python/-/issues"}, "after": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/kkroening/ffmpeg-python/-/issues"}}, {"action_uid": "menuitem_= is", "idx": 26, "action_repr": "frame.clickget_by_role(\"menuitem\", name=\"= is\", exact=True)", "before": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/kkroening/ffmpeg-python/-/issues"}, "after": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/kkroening/ffmpeg-python/-/issues"}}, {"action_uid": "menuitem_question", "idx": 27, "action_repr": "frame.clickget_by_role(\"menuitem\", name=\"question\")", "before": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/kkroening/ffmpeg-python/-/issues"}, "after": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/kkroening/ffmpeg-python/-/issues"}}, {"action_uid": "search-button", "idx": 28, "action_repr": "frame.clickget_by_test_id(\"search-button\")", "before": {"url": 
"http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/kkroening/ffmpeg-python/-/issues"}, "after": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/kkroening/ffmpeg-python/-/issues"}}]
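A small sketch of reading this file, assuming the record layout shown in the example above (`idx`, `action_repr`, `before.url`, `after.url`). Both helper names are hypothetical. The second helper is an optional convenience for spotting the actions that changed the page URL.

```python
import json
from pathlib import Path


def load_playwright_actions(task_id: int, extract_dir: str = "trace_extract") -> list[dict]:
    """Load the extracted actions for a task, ordered by their idx field."""
    path = Path(extract_dir) / f"{task_id}.trace.zip.content.json"
    with path.open(encoding="utf-8") as f:
        actions = json.load(f)
    return sorted(actions, key=lambda a: a["idx"])


def url_changing_actions(actions: list[dict]) -> list[int]:
    """Return the idx of every action whose before/after URLs differ."""
    return [a["idx"] for a in actions if a["before"]["url"] != a["after"]["url"]]
```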
# Output format
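- First summarize the Objective of the whole task, then present the whole process in a three-level hierarchy: Strategy, SubStrategy, action.
- Next, give the observations and interesting findings from the whole operation flow. Finally, output the three-level process description strictly in JSON format.
- The final JSON output should be wrapped between ```{json}```. Each lowest-level action must include a description, the sequence index of the corresponding Playwright action, and the exact instruction text.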
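One way to check a model response against these requirements is sketched below. It assumes the final JSON sits in the last closed fenced block of the response and uses the key names from the Example section (`substrategies`, `actions`, `description`, `playwright_idx`, `playwright_instruction`). The function names are hypothetical.

```python
import json

# Built at runtime so this example can itself sit inside a fenced block.
FENCE = "`" * 3

REQUIRED_ACTION_KEYS = ("description", "playwright_idx", "playwright_instruction")


def extract_final_json(response: str):
    """Parse the last closed fenced block of a model response as JSON."""
    pieces = response.split(FENCE)
    # Odd-numbered pieces are fenced bodies; a trailing unclosed fence is skipped.
    closed = [pieces[i] for i in range(1, len(pieces) - 1, 2)]
    if not closed:
        raise ValueError("no fenced block found in response")
    # The first line of a fenced body is the info string (possibly empty).
    _info, _, body = closed[-1].partition("\n")
    return json.loads(body)


def validate_strategies(strategies: list) -> None:
    """Raise ValueError if any lowest-level action lacks a required key."""
    for strategy in strategies:
        for sub in strategy["substrategies"]:
            for action in sub["actions"]:
                missing = [k for k in REQUIRED_ACTION_KEYS if k not in action]
                if missing:
                    raise ValueError(f"action missing keys: {missing}")
```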
# Example
### Complete User Operation Description to Display Labeled Issues in kkroening/ffmpeg-python
**Objective:** Filter and display all issues labeled as "question" in the kkroening/ffmpeg-python repository.
---
#### **Strategy 1: Navigate to the Repository**
**Low-Level Action Sequence:**
1. **Search for the user "kkroening"**
- Click the global search bar (placeholder: "Search GitLab").
- Type "kkroening" and press `Enter`.
2. **Select the user from results**
- Click the "Users" tab in search results.
- Click on "Karl Kroening @kkroening" in the user list.
3. **Access the repository**
- Navigate to the "Personal projects" section.
- Click on the "ffmpeg-python" project.
---
#### **Strategy 2: Filter Issues by Label**
**Low-Level Action Sequence:**
1. **Open the Issues tab**
- Scroll to the left sidebar menu.
- Click the "Issues" tab (displaying the count, e.g., "Issues 402").
2. **Apply label filtering**
- Click the search/filter bar in the issues list.
- Select the "Label" dropdown from the filter options.
- Type or select "question" from the label dropdown.
- Click the search/apply button to confirm the filter.
---
#### **Final Observation**
The issues list will refresh to show only issues with the "question" label. The URL will reflect the filter:
`.../ffmpeg-python/-/issues/?label_name[]=question`.
---
### Key Observations from Playwright Trace
- The final URL after filtering:
`http://ec2-3-135-39-80.../ffmpeg-python/-/issues/?label_name%5B%5D=question`
confirms the "question" label filter is applied.
- Critical interactions include selecting the "Label" dropdown and explicitly choosing "question" to refine results.
### Final output
```
[{
  "strategy" : "Navigate to the Repository",
  "substrategies": [
    {
      "substrategy": "Search for the user \"kkroening\"",
      "actions" : [
        {
          "description": "Click the global search bar (placeholder: \"Search GitLab\").",
          "playwright_idx" : 17,
          "playwright_instruction" : "frame.clickget_by_placeholder(\"Search GitLab\")"
        }
        ...
      ]
    },
    {
      "substrategy": "Select the user from results",
      "actions" : [
        ...
      ]
    }
  ]
},
{
  "strategy" : "Filter Issues by Label",
  "substrategies" : [
    ...
  ]
}]
```
# Task list
Iterate over all files in the trace directory; for example, 4.trace.zip means that task_id = 4 exists.
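A minimal sketch of this enumeration, assuming the files follow the `<task_id>.trace.zip` naming pattern and live in a directory called `trace`. The helper name `list_task_ids` is hypothetical.

```python
from pathlib import Path

TRACE_SUFFIX = ".trace.zip"


def list_task_ids(trace_dir: str = "trace") -> list[int]:
    """Collect the numeric task ids of all <task_id>.trace.zip files."""
    ids = []
    for path in Path(trace_dir).iterdir():
        name = path.name
        if name.endswith(TRACE_SUFFIX):
            prefix = name[: -len(TRACE_SUFFIX)]
            # Skip files whose prefix is not a plain integer id.
            if prefix.isdigit():
                ids.append(int(prefix))
    return sorted(ids)
```

Sorting the ids numerically keeps the processing order stable across runs.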