trace_synthesis/tasks/165.json
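yuyr a84d51a101 1. Add r1-generated synthesized strategy code and outputs;
2. Add tasks;
3. Add an analysis section that summarizes and classifies the strategies, then evaluates them.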
2025-04-17 17:40:15 +08:00

{
  "sites": [
    "shopping"
  ],
  "task_id": 165,
  "require_login": true,
  "storage_state": "./.auth/shopping_state.json",
  "start_url": "http://localhost:28082/sandgrens-swedish-handmade-wooden-clog-sandal-copenhagen.html",
  "geolocation": null,
  "intent_template": "What are the main criticisms of this product? Please extract the relevant sentences.",
  "instantiation_dict": {},
  "intent": "What are the main criticisms of this product? Please extract the relevant sentences.",
  "require_reset": false,
  "eval": {
    "eval_types": [
      "string_match"
    ],
    "reference_answers": {
      "must_include": [
        "The 39 was too small. I am afraid the 40 will be too big",
        "I was very sad when the shoe rubbed up against my baby toe",
        "I had to return them because I knew in time it would tear up my feet",
        "The problem is that the strap is made of some really stiff leather and is painful to my heel",
        "The front is also uncomfortably tight",
        "The Dansko's were similar (not as bad) and loosened up over time"
      ]
    },
    "reference_url": "",
    "program_html": [],
    "string_note": "",
    "reference_answer_raw_annotation": "The 39 was too small. I am afraid the 40 will be too big. I was very sad when the shoe rubbed up against my baby toe. I had to return them because I knew in time it would tear up my feet. The problem is that the strap is made of some really stiff leather and is painful to my heel. The front is also uncomfortably tight. The Dansko's were similar (not as bad) and loosened up over time."
  },
  "intent_template_id": 136
}
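The `eval` block grades the agent's answer with `string_match` against the `must_include` phrases. Below is a minimal Python sketch of such a check. It assumes each phrase must appear as a case-insensitive substring of the answer after light normalization. `normalize`, `must_include_score`, and `load_must_include` are hypothetical helpers, and the real evaluation harness may normalize text differently.

```python
from __future__ import annotations

import json
import re

# A subset of the "must_include" phrases from task 165, for illustration.
MUST_INCLUDE = [
    "The 39 was too small. I am afraid the 40 will be too big",
    "The front is also uncomfortably tight",
    "The Dansko's were similar (not as bad) and loosened up over time",
]


def normalize(text: str) -> str:
    """Assumed normalization: lowercase, unify curly apostrophes, collapse whitespace."""
    text = text.lower().replace("\u2019", "'")
    return re.sub(r"\s+", " ", text).strip()


def must_include_score(prediction: str, must_include: list[str]) -> float:
    """Return 1.0 only if every required phrase occurs in the prediction, else 0.0."""
    pred = normalize(prediction)
    return 1.0 if all(normalize(p) in pred for p in must_include) else 0.0


def load_must_include(config_text: str) -> list[str]:
    """Extract the reference phrases from a task config such as 165.json."""
    config = json.loads(config_text)
    return config["eval"]["reference_answers"]["must_include"]


if __name__ == "__main__":
    answer = (
        "The 39 was too small. I am afraid the 40 will be too big. "
        "The front is also uncomfortably tight. "
        "The Dansko\u2019s were similar (not as bad) and loosened up over time."
    )
    print(must_include_score(answer, MUST_INCLUDE))  # 1.0
    print(must_include_score("The strap is stiff.", MUST_INCLUDE))  # 0.0
```

Under this scheme the score is all-or-nothing: an answer that quotes five of the six reference sentences still scores 0.0, so paraphrasing a criticism instead of extracting it verbatim fails the task.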