# crawlee/misc/temp_analysis/test_run copy.py
# 2025-04-23 12:14:50 +08:00

# modified version of the task proposer agent prompt from https://arxiv.org/pdf/2502.11357
# System prompt
cot_system_prompt = """
What does this webpage show? Imagine you are a real user on this webpage. Given the webpage
screenshot or OCR result, the parsed HTML/accessibility tree, and the task description, please provide
the first action towards completing that task.
Do the following step by step:
1. Given the webpage screenshot or OCR result and the parsed HTML/accessibility tree, generate the first action
towards completing that task (in natural language form).
2. Given the webpage screenshot or OCR result, the parsed HTML/accessibility tree, and the natural language
action, generate the grounded version of that action.
ACTION SPACE: Your action space is: ['click [element ID]', 'type [element ID] [content]',
'select [element ID] [content of option to select]', 'scroll [up]', 'scroll [down]', and 'stop'].
Action output should follow the syntax as given below:
click [element ID]: This action clicks on an element with a specific ID on the webpage.
type [element ID] [content]: Use this to type the content into the field with the given ID. By default, the
"Enter" key is pressed after typing. Both the content and the ID should be within square braces
as per the syntax.
select [element ID] [content of option to select]: Select an option from a dropdown menu. The
content of the option to select should be within square braces. When the accessibility tree contains
(select an option) tags, use the element ID of the select tag, not of the option, and give the
most likely option text as the content.
scroll [down]: Scroll the page down.
scroll [up]: Scroll the page up.
IMPORTANT:
To be successful, it is important to STRICTLY follow the rules below:
Action generation rules:
1. You should generate a single atomic action at each step.
2. The action should be an atomic action from the given vocabulary - click, type, select, scroll
(up or down), or stop.
3. The arguments to each action should be within square braces. For example, "click [127]",
"type [43] [content to type]", "scroll [up]", "scroll [down]".
4. The natural language form of action (corresponding to the field "action_in_natural_language")
should be consistent with the grounded version of the action (corresponding to the field
"grounded_action"). Do NOT add any additional information in the grounded action. For example, if a
particular element ID is specified in the grounded action, a description of that element must be
present in the natural language action.
5. If the type action is selected, the natural language form of action ("action_in_natural_language") should always specify the actual text to be typed.
6. You should issue a "stop" action if the current webpage asks to log in or for credit card
information.
7. To input text, there is NO need to click the textbox first; type the content directly. After typing,
the system automatically hits the ENTER key.
8. STRICTLY avoid repeating the same action (click/type) if the webpage remains unchanged;
an unchanged page suggests you may have selected the wrong web element.
9. Do NOT use quotation marks in the action generation.
OUTPUT FORMAT:
Please give a short analysis of the screenshot or OCR result and the parsed
HTML/accessibility tree, then put your answer within ``` ```, for example:
In summary, the first action towards completing the task is: ```{
"action_in_natural_language": <ACTION_IN_NATURAL_LANGUAGE>:str,
"grounded_action": <ACTION>:str}```
"""
# User prompt
cot_user_prompt = """
Website URL: {INIT_URL}
Parsed HTML/Accessibility Tree: {A11Y_TREE}
Screenshot OCR result: {SCREENSHOT}
Task description: {TASK_DESCRIPTION}
"""
import json

# Load the processed crawl results; each top-level key is a URL.
with open('path/processed_3_with_analysis_20250324163759.json', 'r', encoding='utf-8') as f:
    data = json.load(f)

# Each URL maps to a record whose "shortestPathsMeta" entries hold the chain texts to print.
for url, value in data.items():
    for shortest_path_meta in value['shortestPathsMeta']:
        for chain_text in shortest_path_meta['chainTexts']:
            print(chain_text)
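The loop binds `url` but never uses it. If each chain text is later paired with its page (for example, as a candidate `TASK_DESCRIPTION` for `cot_user_prompt`, which is an assumption and not something this script does), a generator that yields `(url, chain_text)` pairs keeps the traversal in one place. `iter_chain_texts` is a hypothetical helper that assumes the same JSON layout the loop reads.

```python
from typing import Dict, Iterator, Tuple


def iter_chain_texts(data: Dict[str, dict]) -> Iterator[Tuple[str, str]]:
    """Yield (url, chain_text) pairs, skipping records without the expected keys."""
    for url, value in data.items():
        # .get() with an empty default skips malformed records instead of raising KeyError.
        for path_meta in value.get("shortestPathsMeta", []):
            for chain_text in path_meta.get("chainTexts", []):
                yield url, chain_text
```

Skipping records that lack `shortestPathsMeta` or `chainTexts` is a design choice; if such records indicate a bug upstream, indexing with `[...]` as the original loop does, and letting the `KeyError` surface, may be preferable.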