trace_synthesis/summarize_design.md
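yuyr a84d51a101 1. Add code and output for generating synthesized strategies with r1;
2. Add tasks;
3. Add an analysis section that groups the strategies into categories and then evaluates them.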
2025-04-17 17:40:15 +08:00


Basic prompt structure

The prompt has a basic structure made up of six parts:

# Instruction

# Task 

# Annotation description
## Part 1 
## Part 2 
## Part 3
## Part 4 
...

# Playwright action

# Output format

# Example
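
The six-part layout above could be assembled with a small helper like the following. This is only a sketch: the function name `build_prompt` and its parameters are hypothetical and do not come from the repository; only the section headings are taken from the structure above.

```python
def build_prompt(instruction, task, annotation_parts,
                 playwright_actions, output_format, example):
    """Join the six prompt sections in the order listed above.

    All arguments are plain strings, except annotation_parts,
    which is a list of strings (one per annotation file).
    """
    sections = ["# Instruction", instruction, "",
                "# Task", task, "",
                "# Annotation description"]
    # Each annotation file becomes its own "## Part N" subsection.
    for number, part in enumerate(annotation_parts, start=1):
        sections += [f"## Part {number}", part]
    sections += ["",
                 "# Playwright action", playwright_actions, "",
                 "# Output format", output_format, "",
                 "# Example", example]
    return "\n".join(sections)
```

Keeping the headings in one place makes it easier to guarantee that every task's prompt has the same section order.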

Instruction

  • Describes the role and the task requirements. This part is fixed and is the same for every task. The instruction text follows:

  • You are an expert in cleaning process data descriptions. Given a task, you are provided with a set of annotation description data for a certain visual LLM related to human user operation videos. In addition, you are provided with the full trace of Playwright actions, which includes each action and the URL before and after the action.

  • You need to analyze all the descriptive data and ultimately summarize a complete and reasonable user operation description that can accomplish the given task.

  • For each strategy, give a clear list of the low level action sequence.

Task

Extract the intent field from tasks/task_id.json. The JSON content looks like this:

{
  "sites": ["gitlab"],
  "task_id": 103,
  "require_login": true,
  "storage_state": "./.auth/gitlab_state.json",
  "start_url": "http://localhost:28084",
  "geolocation": null,
  "intent_template": "Display the list of issues in the {{repo}} repository that have labels related to {{label}}",
  "instantiation_dict": {
    "label": "questions",
    "repo": "kkroening/ffmpeg-python"
  },
  "intent": "Display the list of issues in the kkroening/ffmpeg-python repository that have labels related to questions",
  "require_reset": false,
  "eval": {
    "eval_types": ["url_match"],
    "reference_answers": null,
    "reference_url": "http://localhost:28084/kkroening/ffmpeg-python/-/issues/?label_name%5B%5D=question",
    "program_html": [],
    "url_note": "GOLD in PRED"
  },
  "intent_template_id": 349
}
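
Reading the intent for one task might look like the sketch below. It assumes the file is named `<task_id>.json` inside a `tasks` directory, as the path above suggests; the function name `load_intent` is hypothetical.

```python
import json
from pathlib import Path


def load_intent(task_id, tasks_dir="tasks"):
    """Return the "intent" string from tasks/<task_id>.json."""
    path = Path(tasks_dir) / f"{task_id}.json"
    with path.open(encoding="utf-8") as f:
        task = json.load(f)
    # Only the intent goes into the "# Task" section of the prompt.
    return task["intent"]
```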

Annotation description

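These are the action descriptions that the VLM recorded from the trajectory video. Concretely, read video/task_id.trace/*.txt; if there are several txt files, sort them alphabetically. Each file holds one action summary, and all summaries are concatenated in that order (as Part 1, Part 2, ... in the prompt structure above).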
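
A minimal sketch of this step, assuming the directory is named `<task_id>.trace` under `video`; the function names are hypothetical.

```python
from pathlib import Path


def load_annotation_parts(task_id, video_dir="video"):
    """Read every .txt file in video/<task_id>.trace/, sorted by file name."""
    trace_dir = Path(video_dir) / f"{task_id}.trace"
    files = sorted(trace_dir.glob("*.txt"), key=lambda p: p.name)
    return [p.read_text(encoding="utf-8").strip() for p in files]


def format_annotation_parts(parts):
    """Concatenate the summaries as "## Part N" blocks, in order."""
    return "\n\n".join(
        f"## Part {number}\n{text}"
        for number, text in enumerate(parts, start=1)
    )
```

Note that a plain alphabetical sort places `part10` before `part2`, so a task with ten or more parts would need a numeric sort key instead.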

Taking part 1 (103.trace_recording_part1.txt) as an example:

Step-by-Step Actions in the Video Segment

1. Initial State

  • Action: The video begins with a view of a GitLab projects page. The page displays a list of projects under "Yours," with details such as project names, descriptions, and update times.
  • Page Changes: No immediate changes occur as this is the starting point.
  • Possible Purpose: The initial state sets the context for navigating through the user's projects on GitLab.

2. Hovering Over Projects

  • Action: I move the cursor over several project entries in the list.
  • Page Changes: As the cursor hovers over each project, a tooltip appears, displaying the full URL of the project.
  • Possible Purpose: Hovering over the projects likely aims to review the project URLs or gather more information about each project before selecting one.

3. Scrolling Down the Page

  • Action: I scroll down the page using the mouse wheel.
  • Page Changes: The list of projects moves upward, revealing additional projects further down the list.
  • Possible Purpose: Scrolling down is intended to view more projects that are not initially visible on the screen.

4. Hovering Over Additional Projects

  • Action: After scrolling, I continue to hover over different project entries.
  • Page Changes: Similar to the previous hovering action, tooltips appear showing the full URLs of the newly visible projects.
  • Possible Purpose: This continued hovering suggests an ongoing review of project details, possibly to locate a specific project or assess available options.

5. Stopping at a Specific Project

  • Action: I stop scrolling and hover over a project named "Byte Blazea11yproject.contributor.me / cloud-to-butt."
  • Page Changes: The tooltip for this project appears, displaying its full URL.
  • Possible Purpose: Pausing at this specific project indicates interest in it, perhaps for further interaction or selection.

6. Clicking on the Project

  • Action: I click on the project "Byte Blazea11yproject.contributor.me / cloud-to-butt."
  • Page Changes: The page transitions from the projects list to the detailed view of the selected project. The new page shows the project's repository files, commit history, and other related information.
  • Possible Purpose: Clicking on the project is to access its detailed page, likely for viewing its contents, making edits, or performing other project-specific actions.

Summary

In this video segment, I begin by reviewing a list of projects on my GitLab page. I hover over various projects to see their URLs, scroll down to view more projects, and then select a specific project ("Byte Blazea11yproject.contributor.me / cloud-to-butt") by clicking on it. This sequence of actions suggests a focused effort to locate and access a particular project for further interaction.

Playwright action

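These are the fine-grained action instructions extracted from the trace.zip file recorded by Playwright. Each entry contains the action uid, the action sequence index (starting from 0), the Playwright instruction, and the page URL before and after the action. They are read from trace_extract/task_id.trace.zip.content.json. Here is an example: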

[{"action_uid": "action_0", "idx": 0, "action_repr": "frame.clickget_by_placeholder("Filter by name")", "before": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/"}, "after": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/"}}, {"action_uid": "action_1", "idx": 1, "action_repr": "frame.clickget_by_placeholder("Filter by name")", "before": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/?sort=name_asc&name=kkroening%2Fffmpeg-python&sort=name_asc"}, "after": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/?sort=name_asc&name=kkroening%2Fffmpeg-python&sort=name_asc"}}, {"action_uid": "action_2", "idx": 2, "action_repr": "frame.clickget_by_placeholder("Filter by name")", "before": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/?sort=name_asc&name=kkroening%2Fffmpeg-python&sort=name_asc"}, "after": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/?sort=name_asc&name=kkroening%2Fffmpeg-python&sort=name_asc"}}, {"action_uid": "action_3", "idx": 3, "action_repr": "frame.clickget_by_placeholder("Filter by name")", "before": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/?sort=name_asc&name=kkroening%2Fffmpeg-python&sort=name_asc"}, "after": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/?sort=name_asc&name=kkroening%2Fffmpeg-python&sort=name_asc"}}, {"action_uid": "action_4", "idx": 4, "action_repr": "frame.clickget_by_placeholder("Filter by name")", "before": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/?sort=name_asc&name=kkroening%2Fffmpeg-python&sort=name_asc"}, "after": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/?sort=name_asc&name=kkroening%2Fffmpeg-python&sort=name_asc"}}, {"action_uid": "action_5", "idx": 5, "action_repr": "frame.pressget_by_placeholder("Filter by name")ArrowRight", "before": {"url": 
"http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/?sort=name_asc&name=kkroening%2Fffmpeg-python&sort=name_asc"}, "after": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/?sort=name_asc&name=kkroening%2Fffmpeg-python&sort=name_asc"}}, {"action_uid": "action_6", "idx": 6, "action_repr": "frame.pressget_by_placeholder("Filter by name")ArrowRight", "before": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/?sort=name_asc&name=kkroening%2Fffmpeython&sort=name_asc"}, "after": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/?sort=name_asc&name=kkroening%2Fffmpeython&sort=name_asc"}}, {"action_uid": "action_7", "idx": 7, "action_repr": "frame.pressget_by_placeholder("Filter by name")ArrowRight", "before": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/?sort=name_asc&name=kkroening%2Fffmpeython&sort=name_asc"}, "after": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/?sort=name_asc&name=kkroening%2Fffmpeython&sort=name_asc"}}, {"action_uid": "action_8", "idx": 8, "action_repr": "frame.pressget_by_placeholder("Filter by name")ArrowRight", "before": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/?sort=name_asc&name=kkroening%2Fffmpeython&sort=name_asc"}, "after": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/?sort=name_asc&name=kkroening%2Fffmpeython&sort=name_asc"}}, {"action_uid": "action_9", "idx": 9, "action_repr": "frame.pressget_by_placeholder("Filter by name")ArrowRight", "before": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/?sort=name_asc&name=kkroening%2Fffmpeython&sort=name_asc"}, "after": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/?sort=name_asc&name=kkroening%2Fffmpeython&sort=name_asc"}}, {"action_uid": "action_10", "idx": 10, "action_repr": "frame.pressget_by_placeholder("Filter by name")Enter", "before": {"url": "about:blank"}, "after": {"url": 
"about:blank"}}, {"action_uid": "action_11", "idx": 11, "action_repr": "frame.clickget_by_placeholder("Filter by name")", "before": {"url": "about:blank"}, "after": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/?name=kkroening&sort=name_asc"}}, {"action_uid": "action_12", "idx": 12, "action_repr": "frame.clickget_by_placeholder("Filter by name")", "before": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/?name=kkroening&sort=name_asc"}, "after": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/?name=kkroening&sort=name_asc"}}, {"action_uid": "text_Yours 23 Starred 3 Explore Topics", "idx": 13, "action_repr": "frame.clickget_by_text("Yours 23 Starred 3 Explore Topics")", "before": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/?name=kkroening&sort=name_asc"}, "after": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/?name=kkroening&sort=name_asc"}}, {"action_uid": "action_14", "idx": 14, "action_repr": "frame.clickget_by_placeholder("Filter by name")", "before": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/?name=kkroening&sort=name_asc"}, "after": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/?name=kkroening&sort=name_asc"}}, {"action_uid": "action_15", "idx": 15, "action_repr": "frame.clickget_by_placeholder("Filter by name")", "before": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/?name=kkroening&sort=name_asc"}, "after": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/?name=kkroening&sort=name_asc"}}, {"action_uid": "action_16", "idx": 16, "action_repr": "frame.pressget_by_placeholder("Filter by name")Control+c", "before": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/?name=kkroening&sort=name_asc"}, "after": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/?name=kkroening&sort=name_asc"}}, {"action_uid": "action_17", "idx": 
17, "action_repr": "frame.clickget_by_placeholder("Search GitLab")", "before": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/?name=kkroening&sort=name_asc"}, "after": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/?name=kkroening&sort=name_asc"}}, {"action_uid": "action_18", "idx": 18, "action_repr": "frame.pressget_by_placeholder("Search GitLab")Enter", "before": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/?name=kkroening&sort=name_asc"}, "after": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/?name=kkroening&sort=name_asc"}}, {"action_uid": "link_Users 1", "idx": 19, "action_repr": "frame.clickget_by_role("link", name="Users 1")", "before": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/search?search=kkroening&nav_source=navbar"}, "after": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/search?search=kkroening&nav_source=navbar"}}, {"action_uid": "link_Karl Kroening @kkroening", "idx": 20, "action_repr": "frame.clickget_by_role("link", name="Karl Kroening @kkroening")", "before": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/search?scope=users&search=kkroening"}, "after": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/search?scope=users&search=kkroening"}}, {"action_uid": "link_Personal projects", "idx": 21, "action_repr": "frame.clickget_by_role("link", name="Personal projects")", "before": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/kkroening"}, "after": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/kkroening"}}, {"action_uid": "link_ffmpeg-python", "idx": 22, "action_repr": "frame.clickget_by_role("link", name="ffmpeg-python")", "before": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/kkroening/ffmpeg-python"}, "after": {"url": 
"http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/users/kkroening/projects"}}, {"action_uid": "action_23", "idx": 23, "action_repr": "frame.clicklocator("a").filter(has_text="Issues 402")", "before": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/kkroening/ffmpeg-python/-/issues"}, "after": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/kkroening/ffmpeg-python/-/issues"}}, {"action_uid": "filtered-search-token-segment", "idx": 24, "action_repr": "frame.clickget_by_test_id("filtered-search-token-segment")", "before": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/kkroening/ffmpeg-python/-/issues"}, "after": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/kkroening/ffmpeg-python/-/issues"}}, {"action_uid": "menuitem_Label", "idx": 25, "action_repr": "frame.clickget_by_role("menuitem", name="Label")", "before": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/kkroening/ffmpeg-python/-/issues"}, "after": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/kkroening/ffmpeg-python/-/issues"}}, {"action_uid": "menuitem_= is", "idx": 26, "action_repr": "frame.clickget_by_role("menuitem", name="= is", exact=True)", "before": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/kkroening/ffmpeg-python/-/issues"}, "after": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/kkroening/ffmpeg-python/-/issues"}}, {"action_uid": "menuitem_question", "idx": 27, "action_repr": "frame.clickget_by_role("menuitem", name="question")", "before": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/kkroening/ffmpeg-python/-/issues"}, "after": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/kkroening/ffmpeg-python/-/issues"}}, {"action_uid": "search-button", "idx": 28, "action_repr": "frame.clickget_by_test_id("search-button")", "before": {"url": 
"http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/kkroening/ffmpeg-python/-/issues"}, "after": {"url": "http://ec2-3-135-39-80.us-east-2.compute.amazonaws.com:8023/kkroening/ffmpeg-python/-/issues"}}]

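A trace file of the shape shown above could be loaded as in the sketch below. The function names are hypothetical. The field names (`idx`, `before`, `after`, `url`) are taken from the example, and the file is assumed to be valid JSON on disk.

```python
import json
from pathlib import Path


def load_playwright_actions(task_id, extract_dir="trace_extract"):
    """Load trace_extract/<task_id>.trace.zip.content.json, ordered by idx."""
    path = Path(extract_dir) / f"{task_id}.trace.zip.content.json"
    with path.open(encoding="utf-8") as f:
        actions = json.load(f)
    return sorted(actions, key=lambda action: action["idx"])


def url_changing_actions(actions):
    """Keep only the actions whose before and after URLs differ."""
    return [a for a in actions if a["before"]["url"] != a["after"]["url"]]
```

Filtering for URL changes is one quick way to spot the navigation steps in a long trace, such as the one at idx 11 in the example.
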
Output format

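  • First summarize the Objective of the whole task, then present the whole process in a three-level Strategy-SubStrategy-action hierarchy.
  • Next give the observations and interesting findings made after the whole operation flow, and finally output the three-level process description strictly in JSON format.
  • The final JSON output should be wrapped in {json}. Each lowest-level action must include a description, the sequence index of the corresponding Playwright action, and the exact instruction content.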
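
Once the model's JSON has been parsed, the three-level hierarchy can be checked with a small validator such as the sketch below. The key names follow the "Final output" example in this document. The function `validate_strategies` itself is hypothetical, and it assumes every level is a JSON object (a Python dict).

```python
ACTION_KEYS = ("description", "playwright_idx", "playwright_instruction")


def validate_strategies(strategies):
    """Return a list of problems found in a parsed strategy list.

    An empty list means the Strategy-SubStrategy-action structure
    has all the expected keys.
    """
    if not isinstance(strategies, list):
        return ["top level must be a list of strategies"]
    errors = []
    for s_i, strategy in enumerate(strategies):
        if "strategy" not in strategy:
            errors.append(f"strategies[{s_i}]: missing 'strategy'")
        for b_i, sub in enumerate(strategy.get("substrategies", [])):
            where = f"strategies[{s_i}].substrategies[{b_i}]"
            if "substrategy" not in sub:
                errors.append(f"{where}: missing 'substrategy'")
            for a_i, action in enumerate(sub.get("actions", [])):
                for key in ACTION_KEYS:
                    if key not in action:
                        errors.append(f"{where}.actions[{a_i}]: missing '{key}'")
    return errors
```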

Example

Complete User Operation Description to Display Labeled Issues in kkroening/ffmpeg-python

Objective: Filter and display all issues labeled as "question" in the kkroening/ffmpeg-python repository.


Strategy 1: Navigate to the Repository

Low-Level Action Sequence:

  1. Search for the user "kkroening"
    • Click the global search bar (placeholder: "Search GitLab").
    • Type "kkroening" and press Enter.
  2. Select the user from results
    • Click the "Users" tab in search results.
    • Click on "Karl Kroening @kkroening" in the user list.
  3. Access the repository
    • Navigate to the "Personal projects" section.
    • Click on the "ffmpeg-python" project.

Strategy 2: Filter Issues by Label

Low-Level Action Sequence:

  1. Open the Issues tab
    • Scroll to the left sidebar menu.
    • Click the "Issues" tab (displaying the count, e.g., "Issues 402").
  2. Apply label filtering
    • Click the search/filter bar in the issues list.
    • Select the "Label" dropdown from the filter options.
    • Type or select "question" from the label dropdown.
    • Click the search/apply button to confirm the filter.

Final Observation

The issues list will refresh to show only issues with the "question" label. The URL will reflect the filter:
.../ffmpeg-python/-/issues/?label_name[]=question.


Key Observations from Playwright Trace

  • The final URL after filtering:
    http://ec2-3-135-39-80.../ffmpeg-python/-/issues/?label_name%5B%5D=question
    confirms the "question" label filter is applied.
  • Critical interactions include selecting the "Label" dropdown and explicitly choosing "question" to refine results.

Final output

[{
  "strategy" : "Navigate to the Repository",
  "substrategies": [
    {
      "substrategy": "Search for the user \"kkroening\"",
      "actions" : [
        {
          "description": "Click the global search bar (placeholder: \"Search GitLab\").",
          "playwright_idx" : 18,
          "playwright_instruction" : "frame.pressget_by_placeholder(\"Search GitLab\")Enter"
        }
        ...
      ]
    },
    {
      "substrategy": "Select the user from results",
      "actions" : [
        ...
      ]
    }
  ]
},
{
  "strategy" : "Filter Issues by Label",
  "substrategies" : [
    ...
  ]
}]

Task list

Iterate over all files in the trace directory. For example, 4.trace.zip means that task_id = 4 exists.
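
The task list could be built as in the sketch below. It assumes that every trace archive is named `<task_id>.trace.zip` with a numeric id, as in the `4.trace.zip` example; the function name `list_task_ids` is hypothetical.

```python
import re
from pathlib import Path

# Matches names such as "4.trace.zip" and captures the numeric task id.
TRACE_NAME = re.compile(r"^(\d+)\.trace\.zip$")


def list_task_ids(trace_dir="trace"):
    """Return the sorted task ids found in the trace directory."""
    task_ids = []
    for path in Path(trace_dir).iterdir():
        match = TRACE_NAME.match(path.name)
        if match:
            task_ids.append(int(match.group(1)))
    return sorted(task_ids)
```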