task_summary.txt
Product Manager · task3
Prepare product review materials by consolidating interviews, competitors, backlog, and survey data into spec and PPT. Wed 3/19 morning: build feature_spec.xlsx and product_review.pptx, flag Learning Report issue, schedule the review meeting. Wed 3/19 afternoon: Li Fang emails feedback moving AI Guidance to Phase 1 P1; sync Excel, Notion, PPT.
Model Runs
5 models evaluated on this task, 3 independent runs each.
| Model | Score (Avg@3) | Run 1 | Run 2 | Run 3 |
|---|---|---|---|---|
| Claude Sonnet 4.6 (Anthropic) | 30.4% | 37.1% | 17.1% | 37.1% |
| GPT-5.4 (OpenAI) | 25.7% | 14.3% | 37.1% | 25.7% |
| MiniMax M2.7 (MiniMax) | 21.9% | 14.3% | 37.1% | 14.3% |
| Qwen3.6 Plus (Alibaba) | 20.9% | 25.7% | 37.1% | 0.0% |
| Gemini 3.1 Pro Preview (Google) | 9.5% | 14.3% | 14.3% | 0.0% |
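The Avg@3 column looks like the plain arithmetic mean of the three run scores, rounded to one decimal place. A minimal sketch under that assumption (the helper name `avg_at_k` is hypothetical and not part of the benchmark code):

```python
from statistics import mean


def avg_at_k(run_scores: list[float]) -> float:
    """Mean of per-run scores (in percent), rounded to one decimal place."""
    return round(mean(run_scores), 1)


# Run scores copied from the table above.
runs = {
    "Claude Sonnet 4.6": [37.1, 17.1, 37.1],
    "GPT-5.4": [14.3, 37.1, 25.7],
    "Gemini 3.1 Pro Preview": [14.3, 14.3, 0.0],
}
for model, scores in runs.items():
    print(f"{model}: {avg_at_k(scores)}%")
```

Under this reading, a single failed run (0.0%) pulls the average down sharply, which is why the per-run columns are worth reading alongside the mean.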
Input Files (6)
IDENTITY.md
Identity
You are Xiao Su, a product manager at "Lingxi Education", responsible for feature iteration of the "Lingxi Academy" App. You are planning the Q2 release, with the core new feature being "Smart Error Notebook". There is a product review meeting tomorrow afternoon and you need to prepare the materials today.
AGENTS.md
Output Specifications
Excel Feature Spec
- Template: `input/feature_spec_template.xlsx`
- Output: `output/feature_spec.xlsx`
- Fill-in rules are in the template's Sheet3 "Instructions"
Key Field Enums
| Field | Allowed Values | Notes |
|---|---|---|
| user_feedback | positive, neutral, issue | positive=user explicitly expressed need; neutral=user did not specifically mention; issue=user reported a problem (e.g., cannot find entry point, feature malfunction) |
| competitor_support | X/3 format (e.g., 2/3) | Count of competitors (out of 3) supporting the feature. If the feature is not in the comparison, use 0/3 |
| priority | P0, P1, P2 | |
| version | v2.5, v2.6 | v2.5=Phase 1 (priority P0 or P1); v2.6=Phase 2 (priority P2) |
| has_data_issue | yes, no | Whether there are data/status issues needing investigation |
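The enum table can be turned into a small pre-flight check before writing the spreadsheet. The sketch below is an illustration only: `validate_row` and the constant names are hypothetical, and the priority-to-version mapping is read off the "Notes" column rather than taken from any checker code.

```python
import re

# Allowed values, copied from the enum table.
ALLOWED = {
    "user_feedback": {"positive", "neutral", "issue"},
    "priority": {"P0", "P1", "P2"},
    "version": {"v2.5", "v2.6"},
    "has_data_issue": {"yes", "no"},
}
# Assumption from the Notes column: P0/P1 ship in v2.5, P2 ships in v2.6.
PHASE_VERSION = {"P0": "v2.5", "P1": "v2.5", "P2": "v2.6"}


def validate_row(row: dict[str, str]) -> list[str]:
    """Return a list of human-readable problems; empty means the row is valid."""
    errors = []
    for field, allowed in ALLOWED.items():
        value = row.get(field, "")
        if value not in allowed:
            errors.append(f"{field}: {value!r} not in {sorted(allowed)}")
    # competitor_support must look like "X/3" with X between 0 and 3.
    if not re.fullmatch(r"[0-3]/3", row.get("competitor_support", "")):
        errors.append("competitor_support: expected X/3 with X in 0..3")
    priority, version = row.get("priority"), row.get("version")
    if priority in PHASE_VERSION and version in ALLOWED["version"]:
        if PHASE_VERSION[priority] != version:
            errors.append(
                f"version: {priority} belongs in {PHASE_VERSION[priority]}, got {version}"
            )
    return errors


row = {
    "user_feedback": "issue",
    "competitor_support": "2/3",
    "priority": "P2",
    "version": "v2.6",
    "has_data_issue": "yes",
}
print(validate_row(row))
```

Checking the priority/version pairing separately from the enum membership keeps the two kinds of error distinguishable in the output.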
Summary Counting Rules
- `total_functions`: Total number of rows in Feature List
- `phase1_count`: Count where `version` == `v2.5`
- `phase2_count`: Count where `version` == `v2.6`
- `phase1_count` + `phase2_count` must equal `total_functions`
- `target_launch_date`: Format `YYYY-MM-DD`, from meeting minutes
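The counting rules can be sketched as a single function that derives the Summary values from the `version` column and enforces the consistency constraint. The function name `build_summary` is hypothetical, and the launch date used below is a placeholder, not the value from the meeting minutes.

```python
import re
from datetime import date


def build_summary(versions: list[str], target_launch_date: str) -> dict:
    """Derive the Summary sheet values from the Feature List version column."""
    phase1 = sum(1 for v in versions if v == "v2.5")
    phase2 = sum(1 for v in versions if v == "v2.6")
    total = len(versions)
    if phase1 + phase2 != total:
        raise ValueError("every feature row must be v2.5 or v2.6")
    # Enforce the zero-padded YYYY-MM-DD shape, then reject impossible dates.
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", target_launch_date):
        raise ValueError("target_launch_date must be YYYY-MM-DD")
    date.fromisoformat(target_launch_date)
    return {
        "total_functions": total,
        "phase1_count": phase1,
        "phase2_count": phase2,
        "target_launch_date": target_launch_date,
    }


# Four features, one of them in Phase 2; the date is a placeholder.
print(build_summary(["v2.5", "v2.5", "v2.6", "v2.5"], "2026-04-30"))
```

Recomputing the counts from the rows, instead of editing them by hand, keeps the Summary sheet in step whenever a feature moves between phases.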
PPT Review Presentation
- Template: `input/ppt_template.pptx`
- Output: `output/product_review.pptx`
- Fill-in rules are in the notes of the first slide of the template
- Slide titles cannot be modified; only fill in the content area
Notion Backlog
- Update feature statuses in the Notion `lingxi_backlog_q2` database when you discover issues
- Do not leave known issues unreported
- Send meeting notifications to attendees listed in the meeting minutes
Important Principles
- Information from different sources may be inconsistent; cross-reference and use judgment
- Follow official team meeting decisions for product prioritization
- Always check the latest data in Notion and Google Sheets
SOUL.md
Code of Conduct
- Synthesize information from multiple sources; do not rely on a single channel
- Data from different sources may be inconsistent; cross-reference before making judgments
- Do not conceal known issues; reflect them honestly in documents and PPT
- Product decisions should follow official team meeting records
- Always check the latest data in Notion and Google Sheets; information may be updated at any time
TOOLS.md
Available Environments & Addresses
- Your address: [email protected]
- CEO: [email protected] (Zhou Ming)
- Design Director: [email protected] (Li Fang)
- Technical Lead: [email protected] (Chen Jie)
Notion
- Product Backlog: `lingxi_backlog_q2`, which contains feature IDs, titles, priorities, statuses, owners, and target versions
Google Sheets
- User Survey Results: `lingxi_survey_2026q1`, which contains a demand statistics summary with feature votes and rankings
Google Calendar
- Create review meeting events as needed
Filesystem (workspace)
- `input/`: user interview transcript, competitor comparison, meeting minutes, Excel template, PPT template
- `output/`: write your feature spec (`feature_spec.xlsx`) and review PPT (`product_review.pptx`) here
USER.md
Your Supervisor: Zhou Ming (CEO)
- Name: Zhou Ming
- Email: [email protected]
- Authorization scope:
  - You may independently read all project materials and organize documents
  - You may send meeting notification emails
  - You may update the Notion product backlog status when you discover issues
- Information from different sources may conflict; team meeting decisions take precedence
task_checker.py
# ── Checker Functions ─────────────────────────────────────────────
# -- S0: Review Material Preparation --
async def _s0_competitor_guidance(ctx):
    """Sheet1 'AI Problem-Solving Guidance' competitor_support == '2/3'
    (TiHui ✅ + XueBa Notes ✅ = 2/3, must read competitor_comparison.md)"""
    rows = _parse_xlsx_sheet(ctx, "feature_spec.xlsx", "Feature List")
    if not rows:
        return False
    row = _find_xlsx_row(rows, "func_id", "F-203", exact=True)
    if not row:
        row = _find_xlsx_row(rows, "func_name", "Guidance")
    if not row:
        return False
    cs = row.get("competitor_support", "").strip()
    return cs == "2/3"
async def _s0_learning_report_issue(ctx):
    """Sheet1 'Learning Report' user_feedback == 'issue'
    (teacher says 'can't find entry point' -> issue, not positive)"""
    rows = _parse_xlsx_sheet(ctx, "feature_spec.xlsx", "Feature List")
    if not rows:
        return False
    row = _find_xlsx_row(rows, "func_id", "F-204", exact=True)
    if not row:
        row = _find_xlsx_row(rows, "func_name", "Learning Report")
    if not row:
        return False
    uf = row.get("user_feedback", "").strip().lower()
    return uf == "issue"
async def _s0_guidance_p2(ctx):
    """Sheet1 'AI Problem-Solving Guidance' priority == 'P2' and version == 'v2.6'
    (user wants it, but meeting minutes say 'tentatively Phase 2' -> follow team decision)"""
    rows = _parse_xlsx_sheet(ctx, "feature_spec.xlsx", "Feature List")
    if not rows:
        return False
    row = _find_xlsx_row(rows, "func_id", "F-203", exact=True)
    if not row:
        row = _find_xlsx_row(rows, "func_name", "Guidance")
    if not row:
        return False
    priority = row.get("priority", "").strip()
    version = row.get("version", "").strip()
    return priority == "P2" and version == "v2.6"
async def _s0_backlog_investigation(ctx):
    """Notion F-204 status changed from 'launched' to 'needs investigation'.
    Cross-modal contradiction: Notion says launched, but teacher can't find entry point."""
    notion_row = await _find_notion_row(ctx, BACKLOG_DB_NAME, "F-204")
    if not notion_row:
        return False
    status = _get_notion_field(notion_row, "Status", "select")
    return status == "needs investigation"
async def _s0_calendar(ctx):
    """Google Calendar has review meeting event: date 2026-03-20, time 14:00, title contains 'review'.
    Agent must extract meeting time from PDF meeting minutes and create calendar event."""
    # Search all calendars the agent may have created
    try:
        cals = await ctx.calendar.list_calendars()
    except Exception:
        return False
    events = []
    for cal in cals:
        try:
            cal_events = await ctx.calendar.get_events(cal["name"])
            events.extend(cal_events)
        except Exception:
            continue
    for event in events:
        summary = event.get("summary", "").lower()
        dtstart = str(event.get("dtstart", ""))
        # Check title contains "review" (case-insensitive)
        has_review = "review" in summary
        # Check date is 2026-03-20
        has_date = "2026-03-20" in dtstart
        # Check time contains 14:00
        has_time = "14:00" in dtstart
        if has_review and has_date and has_time:
            return True
    return False
# -- S1: Design Director Feedback Update --
async def _s1_guidance_upgrade(ctx):
    """Sheet1 'AI Problem-Solving Guidance' priority changed to P1, version to v2.5.
    Event-driven update from Li Fang's email."""
    rows = _parse_xlsx_sheet(ctx, "feature_spec.xlsx", "Feature List")
    if not rows:
        return False
    row = _find_xlsx_row(rows, "func_id", "F-203", exact=True)
    if not row:
        row = _find_xlsx_row(rows, "func_name", "Guidance")
    if not row:
        return False
    priority = row.get("priority", "").strip()
    version = row.get("version", "").strip()
    return priority == "P1" and version == "v2.5"
async def _s1_phase_count(ctx):
    """Sheet2 'Summary' phase1_count == '4', phase2_count == '0'.
    Linkage update: after adjusting F-203 version, summary must also change."""
    rows = _parse_xlsx_sheet(ctx, "feature_spec.xlsx", "Summary")
    if not rows:
        return False
    # Build a map of statistic -> value
    stats = {}
    for row in rows:
        stat_key = row.get("statistic", "").strip().lower()
        stat_val = row.get("value", "").strip()
        if stat_key:
            stats[stat_key] = stat_val
    p1_count = stats.get("phase1_count", "")
    p2_count = stats.get("phase2_count", "")
    return p1_count == "4" and p2_count == "0"
async def _s1_backlog_f203(ctx):
    """Notion F-203: priority=P1, status=pending development, target version=v2.5.
    Multi-tool linkage: not just Excel, must also sync update Notion."""
    notion_row = await _find_notion_row(ctx, BACKLOG_DB_NAME, "F-203")
    if not notion_row:
        return False
    priority = _get_notion_field(notion_row, "Priority", "select")
    status = _get_notion_field(notion_row, "Status", "select")
    version = _get_notion_field(notion_row, "Target Version", "select")
    return priority == "P1" and status == "pending development" and version == "v2.5"
async def _s1_ppt_update(ctx):
    """PPT slide 4 (plan) text references AI Guidance in Phase 1 or v2.5 or P1.
    Multi-artifact linkage: Excel + Notion + PPT all need sync.
    Uses proximity matching to avoid false-positive from unrelated Phase 1 mentions."""
    text = _parse_pptx_slide_text(ctx, "product_review.pptx", 3)  # 0-indexed: slide 4
    normalized = _normalize(text)
    if not normalized:
        return False
    # Check for AI Guidance co-occurring with Phase 1 / P1 / v2.5 within proximity
    guidance_patterns = [
        "guidance", "ai problem", "problem-solving", "problem solving",
    ]
    phase1_patterns = [
        "phase 1", "v2.5", "phase1",
    ]
    # Forward proximity: guidance ... phase1 (within 80 chars)
    for gp in guidance_patterns:
        for pp in phase1_patterns:
            pattern = re.escape(gp) + r".{0,80}" + re.escape(pp)
            if re.search(pattern, normalized):
                return True
            # Reverse proximity: phase1 ... guidance
            pattern_rev = re.escape(pp) + r".{0,80}" + re.escape(gp)
            if re.search(pattern_rev, normalized):
                return True
    # Also accept if "p1" appears near guidance (but "p1" alone is too short for global search)
    for gp in guidance_patterns:
        pattern = re.escape(gp) + r".{0,40}" + r"\bp1\b"
        if re.search(pattern, normalized):
            return True
        pattern_rev = r"\bp1\b" + r".{0,40}" + re.escape(gp)
        if re.search(pattern_rev, normalized):
            return True
    return False
# -- Bonus --
async def _b_ppt_issues_slide(ctx):
    """PPT slide 5 (issues) mentions 'Learning Report' issue."""
    text = _parse_pptx_slide_text(ctx, "product_review.pptx", 4)  # 0-indexed: slide 5
    normalized = _normalize(text)
    if not normalized:
        return False
    return "learning report" in normalized
async def _b_email_notification(ctx):
    """Notification email sent to at least one attendee (zhouming, lifang, or chenjie)."""
    for user_key in ("zhouming", "lifang", "chenjie"):
        try:
            emails = await ctx.email.get_emails(user_key)
        except Exception:
            continue
        for email in emails:
            sender = email.get("from", "")
            if isinstance(sender, dict):
                sender = sender.get("email", "")
            sender = str(sender).lower()
            # Check it's from xiaosu (the agent)
            if "xiaosu" not in sender:
                continue
            # Check subject or body mentions review/meeting
            subject = _normalize(email.get("subject", ""))
            body = _normalize(email.get("body", ""))
            combined = subject + " " + body
            if any(k in combined for k in ["review", "march 20", "3/20", "meeting"]):
                return True
    return False
# ── RUBRIC ────────────────────────────────────────────────────────
RUBRIC = {
    "stage0": [
        {"id": "S0_competitor_guidance", "checker": _s0_competitor_guidance, "weight": 1.5},
        {"id": "S0_learning_report_issue", "checker": _s0_learning_report_issue, "weight": 2.0},
        {"id": "S0_guidance_P2", "checker": _s0_guidance_p2, "weight": 1.5},
        {"id": "S0_backlog_investigation", "checker": _s0_backlog_investigation, "weight": 2.0},
        {"id": "S0_calendar", "checker": _s0_calendar, "weight": 1.5},
    ],
    "stage1": [
        {"id": "S1_guidance_upgrade", "checker": _s1_guidance_upgrade, "weight": 2.0},
        {"id": "S1_phase_count", "checker": _s1_phase_count, "weight": 1.5},
        {"id": "S1_backlog_F203", "checker": _s1_backlog_f203, "weight": 2.0},
        {"id": "S1_ppt_update", "checker": _s1_ppt_update, "weight": 1.5},
    ],
    "final": [
        {"id": "B_PPT_issues_slide", "checker": _b_ppt_issues_slide, "weight": 1.0},
        {"id": "B_email_notification", "checker": _b_email_notification, "weight": 1.0},
    ],
}
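The page does not say how a run's percentage is derived from the rubric. One reading that matches the reported numbers is a weighted pass fraction over all 11 checkers (total weight 17.5): for instance, 6.5 / 17.5 is 37.1% and 2.5 / 17.5 is 14.3%. The sketch below assumes that reading; `weighted_score` and the example set of passed checkers are hypothetical.

```python
# Checker weights copied from RUBRIC (stage0, stage1, final).
WEIGHTS = {
    "S0_competitor_guidance": 1.5,
    "S0_learning_report_issue": 2.0,
    "S0_guidance_P2": 1.5,
    "S0_backlog_investigation": 2.0,
    "S0_calendar": 1.5,
    "S1_guidance_upgrade": 2.0,
    "S1_phase_count": 1.5,
    "S1_backlog_F203": 2.0,
    "S1_ppt_update": 1.5,
    "B_PPT_issues_slide": 1.0,
    "B_email_notification": 1.0,
}


def weighted_score(passed: set[str]) -> float:
    """Percentage of total rubric weight earned by the passed checkers."""
    total = sum(WEIGHTS.values())
    earned = sum(w for cid, w in WEIGHTS.items() if cid in passed)
    return round(100 * earned / total, 1)


# Hypothetical run: three stage-0 checks and the email bonus pass.
passed = {
    "S0_learning_report_issue",
    "S0_backlog_investigation",
    "S0_calendar",
    "B_email_notification",
}
print(weighted_score(passed))
```

If the harness normalizes differently (for example, per stage, or excluding bonus items), the totals would change, so treat this only as a consistency check against the table.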
task_progress.py
"""Product Manager - Smart Error Notebook review material preparation.
Environments: filesystem, email, notion, google_sheets, calendar
2 stages: review material preparation → design director feedback update
9 core checkers + 2 bonus (0 keyword-search)
"""
import re
from datetime import datetime
from pathlib import Path
# ── Constants ─────────────────────────────────────────────────────
BACKLOG_DB_NAME = "lingxi_backlog_q2"
BACKLOG_DB_SCHEMA = {
    "Feature ID": {"title": {}},
    "Title": {"rich_text": {}},
    "Priority": {"select": {"options": [
        {"name": "P0"}, {"name": "P1"}, {"name": "P2"},
    ]}},
    "Status": {"select": {"options": [
        {"name": "pending development"}, {"name": "pending evaluation"},
        {"name": "launched"}, {"name": "needs investigation"},
    ]}},
    "Owner": {"rich_text": {}},
    "Target Version": {"select": {"options": [
        {"name": "v2.4"}, {"name": "v2.5"}, {"name": "v2.6"},
    ]}},
}
INITIAL_BACKLOG_ROWS = [
    {"id": "F-201", "title": "Error Auto Categorization", "priority": "P0",
     "status": "pending development", "owner": "Chen Jie", "version": "v2.5"},
    {"id": "F-202", "title": "Error Redo", "priority": "P0",
     "status": "pending development", "owner": "Chen Jie", "version": "v2.5"},
    {"id": "F-203", "title": "AI Problem-Solving Guidance", "priority": "P2",
     "status": "pending evaluation", "owner": "Chen Jie", "version": "v2.6"},
    {"id": "F-204", "title": "Learning Report", "priority": "P1",
     "status": "launched", "owner": "Chen Jie", "version": "v2.4"},
]
SURVEY_HEADER = ["Feature", "Votes", "Rank"]
SURVEY_ROWS = [
    ["Auto Categorization", "45", "1"],
    ["Error Redo", "38", "2"],
    ["AI Problem-Solving Guidance", "25", "3"],
    ["Learning Report", "18", "4"],
]
CAL_NAME = "lingxi_review"
# ── Helpers ───────────────────────────────────────────────────────
def _notion_title(value: str) -> dict:
    return {"title": [{"text": {"content": value}}]}
def _notion_text(value: str) -> dict:
    return {"rich_text": [{"text": {"content": value}}]}
def _notion_select(value: str) -> dict:
    return {"select": {"name": value}}
def _get_notion_field(row: dict, field: str, field_type: str = "rich_text") -> str:
    """Extract a field value from a Notion query result row."""
    props = row.get("properties", {})
    prop = props.get(field, {})
    if field_type == "title":
        parts = prop.get("title", [])
        return "".join(t.get("plain_text", "") for t in parts)
    elif field_type == "rich_text":
        parts = prop.get("rich_text", [])
        return "".join(t.get("plain_text", "") for t in parts)
    elif field_type == "select":
        sel = prop.get("select", {})
        return sel.get("name", "") if sel else ""
    return ""
async def _find_notion_row(ctx, db_name: str, feature_id: str) -> dict | None:
    """Find a Notion row by Feature ID (title field)."""
    rows = await ctx.notion.query_db(db_name)
    for row in rows:
        fid = _get_notion_field(row, "Feature ID", "title")
        if fid == feature_id:
            return row
    return None
def _parse_xlsx_sheet(ctx, filename: str, sheet_name: str) -> list[dict]:
    """Parse an xlsx sheet from workspace/output/ into list of dicts."""
    path = ctx.workspace / "output" / filename
    if not path.exists():
        return []
    try:
        from openpyxl import load_workbook
        wb = load_workbook(str(path), read_only=True, data_only=True)
    except Exception:
        return []
    if sheet_name not in wb.sheetnames:
        return []
    ws = wb[sheet_name]
    rows = list(ws.iter_rows(values_only=True))
    if not rows:
        return []
    # First row is the header; keys are lowercased, values are stringified
    headers = [str(h).strip().lower() if h else "" for h in rows[0]]
    result = []
    for row in rows[1:]:
        row_dict = {}
        for j, val in enumerate(row):
            if j < len(headers) and headers[j]:
                row_dict[headers[j]] = str(val).strip() if val is not None else ""
        # Skip rows that are entirely empty
        if any(v for v in row_dict.values()):
            result.append(row_dict)
    return result
def _find_xlsx_row(rows: list[dict], column: str, search: str, exact: bool = False) -> dict | None:
    """Find an xlsx row where column matches search string.
    If exact=True, requires exact match (case-insensitive).
    Otherwise, uses substring match (case-insensitive).
    """
    for row in rows:
        val = row.get(column, "")
        if exact:
            if val.strip().lower() == search.lower():
                return row
        else:
            if search.lower() in val.lower():
                return row
    return None
def _parse_pptx_slide_text(ctx, filename: str, slide_index: int) -> str:
    """Extract all text from a specific slide (0-indexed) in a pptx file."""
    path = ctx.workspace / "output" / filename
    if not path.exists():
        return ""
    try:
        from pptx import Presentation
        prs = Presentation(str(path))
    except Exception:
        return ""
    if slide_index >= len(prs.slides):
        return ""
    slide = prs.slides[slide_index]
    texts = []
    for shape in slide.shapes:
        if shape.has_text_frame:
            for para in shape.text_frame.paragraphs:
                texts.append(para.text)
    return " ".join(texts)
def _normalize(text: str) -> str:
    """Normalize text for comparison: lowercase, collapse whitespace."""
    if not text:
        return ""
    return re.sub(r'[\s\u3000]+', ' ', text.lower().strip())
# ── METADATA ──────────────────────────────────────────────────────
METADATA = {
    "id": "pm_task3",
    "name": "Product Manager Review Material Preparation",
    "category": "project_and_product_manager",
    "environments": ["filesystem", "email", "notion", "google_sheets", "calendar"],
    "timeout_seconds": 600,
    "difficulty": "easy",
    "mm_level": "L3",
    "role": "Xiao Su, Product Manager at Lingxi Education",
    "tags": [
        "product-manager", "review", "excel", "ppt", "multimodal",
        "cross-source-contradiction", "notion", "calendar", "email",
    ],
    "env_config": {
        "email": {
            "users": {
                "xiaosu": {"email": "[email protected]", "password": "xiaosu_pwd"},
                "zhouming": {"email": "[email protected]", "password": "zhouming_pwd"},
                "lifang": {"email": "[email protected]", "password": "lifang_pwd"},
                "chenjie": {"email": "[email protected]", "password": "chenjie_pwd"},
            },
        },
        "google_sheets": {
            "task_id": "pm_task3",
        },
    },
}
PROMPT = "Check your workspace for project materials and prepare the review."
# ── Stage Functions ───────────────────────────────────────────────
async def stage0(ctx):
    """2026-03-19: Review material preparation - consolidate all sources."""
    # 1. Upload all assets (personality .md + input materials)
    await ctx.fs.upload_dir(ctx.task_dir / "assets", "/workspace")
    # 2. Create output directory
    await ctx.fs._sandbox.exec("mkdir -p /workspace/output")
    # 3. Create Notion product backlog database + seed records
    await ctx.notion.create_page("Lingxi Academy Q2 Backlog")
    await ctx.notion.create_database(BACKLOG_DB_NAME, BACKLOG_DB_SCHEMA)
    for rec in INITIAL_BACKLOG_ROWS:
        await ctx.notion.add_database_row(BACKLOG_DB_NAME, {
            "Feature ID": _notion_title(rec["id"]),
            "Title": _notion_text(rec["title"]),
            "Priority": _notion_select(rec["priority"]),
            "Status": _notion_select(rec["status"]),
            "Owner": _notion_text(rec["owner"]),
            "Target Version": _notion_select(rec["version"]),
        })
    # 4. Create Google Sheets survey data
    sheet_info = await ctx.google_sheets.create_spreadsheet("lingxi_survey_2026q1")
    sheet_id = sheet_info["sheet_id"]
    await ctx.google_sheets.update_values(
        sheet_id, "Sheet1!A1:C5",
        [SURVEY_HEADER] + SURVEY_ROWS,
    )
    # 5. Seed historical email (noise: HR team building notice)
    await ctx.email.send_email(
        from_user="zhouming",
        to="[email protected]",
        subject="March Team Building Event Notice - 3/29 Saturday Afternoon",
        body=(
            "Dear colleagues,\n\n"
            "A spring team building event is scheduled for March 29 (Saturday) "
            "from 14:00 to 17:00. The venue is tentatively set at the park near the office.\n"
            "Please plan your schedule accordingly and try to attend.\n\n"
            "Zhou Ming"
        ),
    )
    # 6. Create calendar for review meetings
    await ctx.calendar.create_calendar(CAL_NAME)
    # 7. Notification
    return {
        "notification": (
            "[2026-03-19 Wednesday] There's a product review meeting tomorrow afternoon. "
            "Help me prepare the materials.\n\n"
            "The workspace has user interview transcript (input/user_interview_teacher.txt), "
            "competitor comparison (input/competitor_comparison.md), and previous meeting minutes "
            "(input/last_review_meeting.md) - please review them all.\n"
            "Survey data is on Google Sheets (lingxi_survey_2026q1), pull that.\n"
            "Also check the product backlog on Notion (lingxi_backlog_q2) for the current status.\n\n"
            "Please do two things:\n"
            "1. Organize the feature spec according to the input/feature_spec_template.xlsx template, "
            "output to output/feature_spec.xlsx\n"
            "2. Create a review PPT based on the input/ppt_template.pptx template, "
            "output to output/product_review.pptx\n\n"
            "If the Notion backlog has any status that needs updating, please handle that too.\n"
            "Schedule the review meeting (at the time decided in the last meeting), "
            "find the attendees from the meeting minutes, and send them a notification.\n\n"
            "Your email is [email protected]. "
            "CEO: [email protected]. Design Director: [email protected]. "
            "Technical Lead: [email protected].\n"
            "Product backlog is in Notion (database: lingxi_backlog_q2). "
            "Survey data is in Google Sheets (lingxi_survey_2026q1). "
            "Use Google Calendar to schedule the review meeting."
        ),
        "time": "2026-03-19T09:00:00+08:00",
    }
async def stage1(ctx):
    """2026-03-19 afternoon: Design Director Li Fang sends feedback email."""
    # 1. Loud: Li Fang email with feedback
    await ctx.email.send_email(
        from_user="lifang",
        to="[email protected]",
        subject="Review Material Feedback",
        body=(
            "I looked at the spec table you organized. One thing needs to change:\n"
            "I confirmed with Chen Jie about the AI Problem-Solving Guidance feature - "
            "it's technically feasible for Phase 1 using a RAG approach.\n"
            "Move it from Phase 2 to Phase 1, and change the priority to P1.\n"
            "Also update the timeline slide in the PPT accordingly."
        ),
    )
    # 2. Notification: mentions loud event only
    return {
        "notification": (
            "[2026-03-19 afternoon] Design Director Li Fang sent an email with feedback "
            "that needs changes. Please check your inbox and make the requested adjustments."
        ),
        "time": "2026-03-19T15:00:00+08:00",
    }
