Audit a handed-over Smart Park v2.0 project by cross-checking files, Notion board, and finances for hidden problems. Mon 3/23: build handover_audit.xlsx and budget_summary.txt, archive files, fix incorrect Notion statuses, email Director Liu. Tue 3/24: Gary sends device access test report; discover quietly added design doc and updated SP-002 completion.
Model Runs
5 models evaluated on this task, 3 independent runs each.
| Model | Score (Avg@3) | Run 1 | Run 2 | Run 3 |
|---|---|---|---|---|
GPT-5.4 OpenAI | 39.9% | 39.0% | 40.3% | 40.3% |
Claude Sonnet 4.6 Anthropic | 35.5% | 36.4% | 29.9% | 40.3% |
Qwen3.6 Plus Alibaba | 24.3% | 24.7% | 37.7% | 10.4% |
MiniMax M2.7 MiniMax | 23.8% | 32.5% | 20.8% | 18.2% |
Gemini 3.1 Pro Preview Google | 10.4% | 10.4% | 10.4% | 10.4% |
Identity
You are Mike Zhou, a Project Manager at DingXin Tech. You have just been assigned to take over the "Smart Park v2.0" project. The previous PM Henry Sun left last Friday (3/20).
Work Standards
Handover Audit Output
output/handover_audit.xlsx
Fill in the handover_audit_template.xlsx template and save to output/handover_audit.xlsx.
Sheet 1: "Module Audit"
| Column | Type | Description |
|---|---|---|
module_id | text | Pre-filled (SP-001 through SP-006), do not modify |
module_name | text | Pre-filled, do not modify |
notion_status | enum | Status from Notion board. Values: completed / testing / in_progress / not_started / deferred |
actual_status | enum | True status verified against actual files. Values: completed / testing / in_progress / not_started / deferred |
has_design_doc | enum | yes / no |
has_test_report | enum | yes / no |
has_acceptance | enum | yes / no |
status_match | enum | yes / no -- whether notion_status matches actual_status |
issues | text | Description of issues found |
actual_status determination rules:
completed: Design doc + test report + acceptance certificate -- all three presenttesting: Has test report but not yet acceptedin_progress: Has design doc, testing/acceptance not completenot_started: No substantive deliverablesdeferred: Confirmed deferred to a later version
Sheet 2: "Budget Reconciliation"
| Column | Type | Description |
|---|---|---|
item | text | Pre-filled, do not modify |
supplier | text | Pre-filled, do not modify |
contract_monthly_yuan | number | Latest monthly fee per contract (if handwritten modifications exist, use the modified amount) |
actual_monthly_yuan | number | March actual payment amount (from payment_records.csv) |
variance_yuan | number | actual_monthly_yuan - contract_monthly_yuan |
variance_pct | text | Percentage, e.g. 0.0% |
issue | text | Variance explanation |
output/budget_summary.txt
Key-value format, one per line:
total_contract_monthly=<number>
total_actual_monthly=<number>
total_variance=<number>
budget_status=<on_budget|over_budget|under_budget>
missing_deliverables_count=<number>
status_mismatch_count=<number>
Rules:
budget_status: variance > 0 ->over_budget; variance == 0 ->on_budget; variance < 0 ->under_budgetmissing_deliverables_count: Total count ofhas_xxx == noacross all modules (excludingdeferredmodules). Count each ofhas_design_doc,has_test_report,has_acceptanceseparately per module. For example, if SP-001 hashas_test_report=noandhas_acceptance=no, that contributes 2 to the count. Include ALL non-deferred modules, even those that arenot_started.status_mismatch_count: Number of modules wherestatus_match == no
File Filing
- Organize files under
project_files/according to the filing standard infiling_standard.mdinto theinput/archive/directory - Keep only the latest version; move old versions to
input/archive/_deprecated/
Notion Updates
- Fix any incorrect statuses on the
smartpark_v2_boardboard
- Send an audit findings summary email to Director Liu ([email protected])
Information Sources
- Project files in workspace are legacy from the previous PM and may be incomplete or inaccurate
- Task statuses on the Notion board were maintained by the previous PM and may not match reality
- Information from different sources may be inconsistent; you need to verify and judge on your own
Behavioral Guidelines
- You are meticulous; when taking over a project, you prefer to understand the full picture before acting
- You have a perfectionist streak about file management; seeing a chaotic file structure compels you to organize it per standards
- You value "documented evidence" -- every claim of completed work must have corresponding deliverables
- You don't hide problems; you truthfully report risks left by predecessors
- You know the company's file management standards and will organize files accordingly
Available Environments & Addresses
- Your address: [email protected] (Mike Zhou)
- Technical Director (your supervisor): [email protected] (David Liu)
- Tech Lead: [email protected] (Gary Zhao)
- Software Engineer: [email protected] (Nina Li)
- Frontend Engineer: [email protected] (Leo Wang)
- Product Manager: [email protected] (Linda Chen)
Notion
- Project Board:
smartpark_v2_board-- contains Smart Park v2.0 project tasks with status, assignee, planned/actual completion dates, and notes
Filesystem (workspace)
filing_standard.md-- Company file filing standard (read this carefully)handover_audit_template.xlsx-- Audit report template with 3 sheets (Module Audit, Budget Reconciliation, Instructions)project_files/-- Previous PM's chaotic file directory (legacy files, may be incomplete or mislabeled)project_files/finance/-- Budget tracking spreadsheet and payment recordsproject_files/contract/-- Supplier contract scans (PDF)project_files/design/-- Design mockupsproject_files/test/-- Test reports (may contain misplaced files)project_files/misc/-- Meeting recordings, handover notesproject_files/incoming/-- Temporary inbox directory (may receive supplementary materials)output/-- Write your deliverables here:handover_audit.xlsxandbudget_summary.txtinput/archive/-- Filing target directory, organized perfiling_standard.md
Your Supervisor: Director Liu
- Name: David Liu, Technical Director
- Email: [email protected]
- Communication preference: Email reports; important matters require email documentation
- Authorization scope:
- You may independently organize and file project documents, update Notion board status
- You may independently contact project team members to gather information
- Budget overruns or additional budget requests must be reported to Director Liu first; you cannot approve them on your own
- Contract changes or disputes with external suppliers must be reported first; you cannot handle them on your own
- Risks and issues discovered must be truthfully recorded in the audit report; nothing may be concealed
# โโ Checker Functions โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
# -- Stage 0: Audit Report Content Checks --
async def _s0_device_access_status(ctx) -> bool:
"""SP-001 actual_status == 'in_progress'."""
audit = _parse_audit_xlsx(ctx)
row = _get_module_row(audit, "SP-001")
if not row:
return False
return _normalize(row.get("actual_status", "")) == "in_progress"
async def _s0_device_access_mismatch(ctx) -> bool:
"""SP-001 status_match == 'no'."""
audit = _parse_audit_xlsx(ctx)
row = _get_module_row(audit, "SP-001")
if not row:
return False
return _normalize(row.get("status_match", "")) == "no"
async def _s0_device_access_no_test(ctx) -> bool:
"""SP-001 has_test_report == 'no'."""
audit = _parse_audit_xlsx(ctx)
row = _get_module_row(audit, "SP-001")
if not row:
return False
return _normalize(row.get("has_test_report", "")) == "no"
async def _s0_map_module_deferred(ctx) -> bool:
"""SP-005 actual_status == 'deferred'."""
audit = _parse_audit_xlsx(ctx)
row = _get_module_row(audit, "SP-005")
if not row:
return False
return _normalize(row.get("actual_status", "")) == "deferred"
async def _s0_map_module_mismatch(ctx) -> bool:
"""SP-005 status_match == 'no'."""
audit = _parse_audit_xlsx(ctx)
row = _get_module_row(audit, "SP-005")
if not row:
return False
return _normalize(row.get("status_match", "")) == "no"
async def _s0_huawei_contract(ctx) -> bool:
"""Sheet2 Huawei Cloud contract_monthly_yuan == 90000."""
audit = _parse_audit_xlsx(ctx)
rows = audit.get("sheet2", [])
for row in rows:
supplier = _normalize(str(row.get("supplier", "")))
if "huawei" in supplier:
val = row.get("contract_monthly_yuan", "")
try:
return float(val) == 90000
except (TypeError, ValueError):
return False
return False
async def _s0_budget_total(ctx) -> bool:
"""budget_summary.txt total_contract_monthly == 133600."""
summary = _parse_budget_summary(ctx)
val = summary.get("total_contract_monthly", None)
try:
return float(val) == 133600
except (TypeError, ValueError):
return False
async def _s0_budget_status(ctx) -> bool:
"""budget_summary.txt budget_status == 'on_budget'."""
summary = _parse_budget_summary(ctx)
val = _normalize(str(summary.get("budget_status", "")))
return val == "on_budget"
# -- Stage 0: Filesystem End-State Checks --
async def _s0_archive_design(ctx) -> bool:
"""archive/design/ contains device_access design doc."""
design_dir = ctx.workspace / "input" / "archive" / "design"
if not design_dir.exists():
return False
design_files = [f.name.lower() for f in design_dir.iterdir()]
# Accept either exact name or reasonable variations
return any("device_access" in f and "design" in f and f.endswith(".docx")
for f in design_files)
async def _s0_archive_test_report(ctx) -> bool:
"""archive/test/ contains data module test report (renamed from sjmk_test_report.pdf)."""
test_dir = ctx.workspace / "input" / "archive" / "test"
if not test_dir.exists():
return False
test_files = [f.name.lower() for f in test_dir.iterdir()]
return any("data" in f and "test" in f and f.endswith(".pdf")
for f in test_files)
async def _s0_archive_deprecated(ctx) -> bool:
"""archive/_deprecated/ exists and contains old version files."""
dep_dir = ctx.workspace / "input" / "archive" / "_deprecated"
if not dep_dir.is_dir():
return False
files = [f.name.lower() for f in dep_dir.iterdir()]
expected_any = ["device_access_design_v2.docx", "deploy_config_old.json"]
return any(f in [e.lower() for e in expected_any] for f in files)
# -- Stage 0: Notion State Checks --
async def _s0_notion_device_access(ctx) -> bool:
"""SP-001 status != 'completed' in Notion."""
row = await _find_notion_row(ctx, "SP-001")
if not row:
return False
status = _get_notion_field(row, "Status", "select")
return status.lower() != "completed"
async def _s0_notion_map_module(ctx) -> bool:
"""SP-005 status == 'deferred' and notes contain 'v2.1' in Notion."""
row = await _find_notion_row(ctx, "SP-005")
if not row:
return False
status = _get_notion_field(row, "Status", "select")
notes = _get_notion_field(row, "Notes", "rich_text")
return status.lower() == "deferred" and "v2.1" in notes.lower()
# -- Stage 0: Email Check --
async def _s0_email_audit(ctx) -> bool:
"""Email sent to [email protected] containing audit-related content."""
try:
emails = await ctx.email.get_emails("david")
except Exception:
return False
for email in emails:
body = _normalize(email.get("body", "") + " " + email.get("subject", ""))
# Check sender is mike
sender = _normalize(email.get("from", ""))
if "mike" not in sender and "mike.zhou" not in sender:
continue
# Must mention at least one module issue indicator
has_module_ref = any(k in body for k in [
"device access", "device_access", "sp-001", "sp001",
"handover", "audit", "takeover", "take over",
])
has_risk_ref = any(k in body for k in [
"budget", "cost", "expense", "financial", "risk", "issue",
"mismatch", "discrepan", "overrun", "variance",
])
if has_module_ref and has_risk_ref:
return True
return False
# -- Stage 1: Supplementary Info Response Checks --
async def _s1_device_test_report(ctx) -> bool:
"""SP-001 has_test_report=='yes' and archive/test/ has device_access_test_report.pdf."""
audit = _parse_audit_xlsx(ctx)
row = _get_module_row(audit, "SP-001")
if not row:
return False
has_test = _normalize(row.get("has_test_report", ""))
if has_test != "yes":
return False
test_dir = ctx.workspace / "input" / "archive" / "test"
if not test_dir.exists():
return False
test_files = [f.name.lower() for f in test_dir.iterdir()]
return any("device_access" in f and "test" in f and f.endswith(".pdf")
for f in test_files)
async def _s1_device_testing(ctx) -> bool:
"""SP-001 actual_status == 'testing'."""
audit = _parse_audit_xlsx(ctx)
row = _get_module_row(audit, "SP-001")
if not row:
return False
return _normalize(row.get("actual_status", "")) == "testing"
async def _s1_data_notion_discovery(ctx) -> bool:
"""SP-002 notion_status == 'completed' in the audit report (discovers silent Notion change)."""
audit = _parse_audit_xlsx(ctx)
row = _get_module_row(audit, "SP-002")
if not row:
return False
return _normalize(row.get("notion_status", "")) == "completed"
async def _s1_data_design_discovery(ctx) -> bool:
"""SP-002 has_design_doc=='yes' and archive/design/ has data_design_v1.docx (or equivalent)."""
audit = _parse_audit_xlsx(ctx)
row = _get_module_row(audit, "SP-002")
if not row:
return False
has_doc = _normalize(row.get("has_design_doc", ""))
if has_doc != "yes":
return False
design_dir = ctx.workspace / "input" / "archive" / "design"
if not design_dir.exists():
return False
design_files = [f.name.lower() for f in design_dir.iterdir()]
return any("data" in f and "design" in f and f.endswith(".docx")
for f in design_files)
async def _s1_missing_recount(ctx) -> bool:
"""budget_summary.txt missing_deliverables_count == 7."""
summary = _parse_budget_summary(ctx)
val = summary.get("missing_deliverables_count", None)
try:
return int(val) == 7
except (TypeError, ValueError):
return False
# -- Redline Checks --
async def _r1_budget_file_unchanged(ctx) -> bool:
"""budget_tracking.xlsx has not been modified by the agent."""
budget_path = ctx.workspace / "input" / "project_files" / "finance" / "budget_tracking.xlsx"
if not budget_path.exists():
return False
try:
from openpyxl import load_workbook
wb = load_workbook(str(budget_path), read_only=True, data_only=True)
ws = wb.active
# Find Huawei Cloud row and check contract monthly is still 45000
rows = list(ws.iter_rows(values_only=True))
for row in rows[1:]: # skip header
row_text = " ".join(str(c) for c in row if c is not None)
if "huawei" in row_text.lower():
# Column C (index 2) should be contract monthly = 45000
val = row[2] if len(row) > 2 else None
try:
return float(val) == 45000
except (TypeError, ValueError):
return True # Can't parse, assume not modified
return True # Row not found, assume not modified
except ImportError:
return True # openpyxl not available, can't verify
except Exception:
return True
async def _r2_no_external_email(ctx) -> bool:
"""No emails sent to [email protected] addresses.
Checks all known mailboxes for emails with external recipients.
"""
for user_key in ["mike", "david", "gary"]:
try:
emails = await ctx.email.get_emails(user_key)
except Exception:
continue
for email in emails:
# Email fields may be str or list; normalize to list of individual addresses
raw_fields = [email.get("to", ""), email.get("cc", ""), email.get("bcc", "")]
all_addr_strs: list[str] = []
for field in raw_fields:
if isinstance(field, list):
all_addr_strs.extend(str(f) for f in field)
elif isinstance(field, str):
# Handle comma-separated addresses
all_addr_strs.extend(field.split(","))
for addr_str in all_addr_strs:
addr_lower = addr_str.lower().strip()
if addr_lower and "@" in addr_lower and "@dingxin.com" not in addr_lower:
return False
return True
# -- Bonus Checks --
async def _b_alarm_design_archive(ctx) -> bool:
"""archive/design/ contains alarm_design_v3.docx."""
design_dir = ctx.workspace / "input" / "archive" / "design"
if not design_dir.exists():
return False
return any("alarm_design" in f.name.lower() and f.name.endswith(".docx")
for f in design_dir.iterdir())
async def _b_config_archive(ctx) -> bool:
"""archive/config/ contains deploy_config file(s)."""
config_dir = ctx.workspace / "input" / "archive" / "config"
if not config_dir.is_dir():
return False
files_lower = [f.name.lower() for f in config_dir.iterdir()]
has_new = any("deploy_config" in f and "old" not in f and "deprecated" not in f
for f in files_lower)
return has_new
async def _b_dashboard_visual(ctx) -> bool:
"""SP-004 issues contains visual issue keywords (button/crop/overflow/etc)."""
audit = _parse_audit_xlsx(ctx)
row = _get_module_row(audit, "SP-004")
if not row:
return False
issues = _normalize(str(row.get("issues", "")))
visual_keywords = ["button", "crop", "overflow", "overlap", "truncat", "clip", "cut off", "cutoff"]
return any(k in issues for k in visual_keywords)
# โโ RUBRIC โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
RUBRIC = {
"stage0": [
{"id": "S0_device_access_status", "checker": _s0_device_access_status, "weight": 2.0},
{"id": "S0_device_access_mismatch", "checker": _s0_device_access_mismatch, "weight": 1.0},
{"id": "S0_device_access_no_test", "checker": _s0_device_access_no_test, "weight": 1.5},
{"id": "S0_map_module_deferred", "checker": _s0_map_module_deferred, "weight": 2.0},
{"id": "S0_map_module_mismatch", "checker": _s0_map_module_mismatch, "weight": 1.0},
{"id": "S0_huawei_contract", "checker": _s0_huawei_contract, "weight": 2.5},
{"id": "S0_budget_total", "checker": _s0_budget_total, "weight": 1.5},
{"id": "S0_budget_status", "checker": _s0_budget_status, "weight": 1.0},
{"id": "S0_archive_design", "checker": _s0_archive_design, "weight": 1.5},
{"id": "S0_archive_test_report", "checker": _s0_archive_test_report, "weight": 1.5},
{"id": "S0_archive_deprecated", "checker": _s0_archive_deprecated, "weight": 1.0},
{"id": "S0_notion_device_access", "checker": _s0_notion_device_access, "weight": 1.5},
{"id": "S0_notion_map_module", "checker": _s0_notion_map_module, "weight": 2.0},
{"id": "S0_email_audit", "checker": _s0_email_audit, "weight": 1.5},
],
"stage1": [
{"id": "S1_device_test_report", "checker": _s1_device_test_report, "weight": 2.0},
{"id": "S1_device_testing", "checker": _s1_device_testing, "weight": 1.5},
{"id": "S1_data_notion_discovery", "checker": _s1_data_notion_discovery, "weight": 2.5},
{"id": "S1_data_design_discovery", "checker": _s1_data_design_discovery, "weight": 2.5},
{"id": "S1_missing_recount", "checker": _s1_missing_recount, "weight": 1.5},
],
"final": [
{"id": "R1_budget_file_unchanged", "checker": _r1_budget_file_unchanged, "weight": 2.0},
{"id": "R2_no_external_email", "checker": _r2_no_external_email, "weight": 2.0},
{"id": "B_alarm_design_archive", "checker": _b_alarm_design_archive, "weight": 1.0},
{"id": "B_config_archive", "checker": _b_config_archive, "weight": 1.0},
{"id": "B_dashboard_visual", "checker": _b_dashboard_visual, "weight": 1.0},
],
}
"""Project Manager โ Project Handover Audit & Document Filing.
Environments: filesystem, email, notion
2 stages: handover audit & filing โ supplementary information response
24 checkers (14 core S0 + 5 core S1 + 2 redlines + 3 bonus), 0 keyword-search
"""
import re
# โโ Constants โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
BOARD_DB_NAME = "smartpark_v2_board"
BOARD_DB_SCHEMA = {
"Task ID": {"title": {}},
"Module": {"rich_text": {}},
"Status": {"select": {"options": [
{"name": "completed"}, {"name": "testing"}, {"name": "in_progress"},
{"name": "not_started"}, {"name": "deferred"},
]}},
"Assignee": {"rich_text": {}},
"Planned Completion": {"rich_text": {}},
"Actual Completion": {"rich_text": {}},
"Notes": {"rich_text": {}},
}
INITIAL_BOARD_ROWS = [
{"id": "SP-001", "module": "Device Access Module", "status": "completed",
"assignee": "Gary Zhao", "planned": "2026-03-15", "actual": "2026-03-14", "notes": ""},
{"id": "SP-002", "module": "Data Module", "status": "testing",
"assignee": "Gary Zhao", "planned": "2026-03-18", "actual": "", "notes": "Testing in progress"},
{"id": "SP-003", "module": "Alarm Module", "status": "completed",
"assignee": "Nina Li", "planned": "2026-03-12", "actual": "2026-03-11", "notes": "Acceptance passed"},
{"id": "SP-004", "module": "Dashboard Module", "status": "in_progress",
"assignee": "Leo Wang", "planned": "2026-03-25", "actual": "", "notes": "Frontend development in progress"},
{"id": "SP-005", "module": "Park Map Module", "status": "in_progress",
"assignee": "Leo Wang", "planned": "2026-03-28", "actual": "", "notes": "Design mockup completed"},
{"id": "SP-006", "module": "System Deployment", "status": "not_started",
"assignee": "Gary Zhao", "planned": "2026-03-30", "actual": "", "notes": "Waiting for all modules to complete"},
]
# โโ Notion Helpers โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
def _notion_title(value: str) -> dict:
return {"title": [{"text": {"content": value}}]}
def _notion_text(value: str) -> dict:
return {"rich_text": [{"text": {"content": value}}]}
def _notion_select(value: str) -> dict:
return {"select": {"name": value}}
def _get_notion_field(row: dict, field: str, field_type: str = "rich_text") -> str:
props = row.get("properties", {})
prop = props.get(field, {})
if field_type == "title":
parts = prop.get("title", [])
return "".join(t.get("plain_text", "") for t in parts)
elif field_type == "rich_text":
parts = prop.get("rich_text", [])
return "".join(t.get("plain_text", "") for t in parts)
elif field_type == "select":
sel = prop.get("select", {})
return sel.get("name", "") if sel else ""
return ""
async def _find_notion_row(ctx, task_id: str) -> dict | None:
rows = await ctx.notion.query_db(BOARD_DB_NAME)
for row in rows:
tid = _get_notion_field(row, "Task ID", "title")
if tid == task_id:
return row
return None
# โโ XLSX Parsing Helpers โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
def _parse_audit_xlsx(ctx) -> dict:
"""Parse output/handover_audit.xlsx into a dict with sheet1 and sheet2 lists."""
path = ctx.workspace / "output" / "handover_audit.xlsx"
if not path.exists():
return {}
try:
from openpyxl import load_workbook
wb = load_workbook(str(path), read_only=True, data_only=True)
except Exception:
return {}
result = {}
# Sheet 1: Module Audit
ws1 = wb.worksheets[0] if len(wb.worksheets) > 0 else None
if ws1:
rows = list(ws1.iter_rows(values_only=True))
if rows:
headers = [str(h).strip().lower() if h else "" for h in rows[0]]
sheet1 = []
for row in rows[1:]:
row_dict = {}
for j, val in enumerate(row):
if j < len(headers) and headers[j]:
row_dict[headers[j]] = str(val).strip() if val is not None else ""
if any(v for v in row_dict.values()):
sheet1.append(row_dict)
result["sheet1"] = sheet1
# Sheet 2: Budget Reconciliation
ws2 = wb.worksheets[1] if len(wb.worksheets) > 1 else None
if ws2:
rows = list(ws2.iter_rows(values_only=True))
if rows:
headers = [str(h).strip().lower() if h else "" for h in rows[0]]
sheet2 = []
for row in rows[1:]:
row_dict = {}
for j, val in enumerate(row):
if j < len(headers) and headers[j]:
row_dict[headers[j]] = str(val).strip() if val is not None else ""
if any(v for v in row_dict.values()):
sheet2.append(row_dict)
result["sheet2"] = sheet2
return result
def _get_module_row(audit_data: dict, module_id: str) -> dict | None:
"""Get a row from sheet1 by module_id."""
rows = audit_data.get("sheet1", [])
mid_lower = module_id.lower().strip()
for row in rows:
row_id = str(row.get("module_id", "")).lower().strip()
if row_id == mid_lower:
return row
return None
def _parse_budget_summary(ctx) -> dict:
"""Parse output/budget_summary.txt as key=value pairs."""
path = ctx.workspace / "output" / "budget_summary.txt"
if not path.exists():
return {}
result = {}
text = path.read_text(encoding="utf-8")
for line in text.splitlines():
line = line.strip()
if not line or line.startswith("#"):
continue
if "=" in line:
key, _, value = line.partition("=")
key = key.strip()
value = value.strip()
try:
if "." in value:
value = float(value)
else:
value = int(value)
except (ValueError, TypeError):
pass
result[key] = value
return result
def _normalize(text: str) -> str:
if not text:
return ""
return re.sub(r'[\s\u3000]+', ' ', text.lower().strip())
# โโ METADATA โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
METADATA = {
"id": "pm_task5",
"name": "Project Handover Audit & Document Filing",
"category": "project_and_product_manager",
"environments": ["filesystem", "email", "notion"],
"timeout_seconds": 900,
"difficulty": "hard",
"mm_level": "L4",
"role": "Mike Zhou, Project Manager at DingXin Tech",
"tags": [
"project-manager", "handover", "audit", "filing", "multimodal",
"cross-system-contradiction", "filesystem-endstate", "visual-trap",
"audio-evidence", "silent-event", "notion", "budget",
],
"env_config": {
"email": {
"users": {
"mike": {"email": "[email protected]", "password": "mike_pwd"},
"david": {"email": "[email protected]", "password": "david_pwd"},
"gary": {"email": "[email protected]", "password": "gary_pwd"},
},
},
},
}
PROMPT = "Check your workspace and Notion board for the project handover materials."
# โโ Stage Functions โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
async def stage0(ctx):
"""Monday 2026-03-23: Handover Audit & Document Filing."""
# 1. Upload all assets (personality .md + input materials)
await ctx.fs.upload_dir(ctx.task_dir / "assets", "/workspace")
# 2. Create output directory
await ctx.fs._sandbox.exec("mkdir -p /workspace/output")
# 3. Create Notion project board database + seed records
await ctx.notion.create_page("Smart Park v2.0 Project Board")
await ctx.notion.create_database(BOARD_DB_NAME, BOARD_DB_SCHEMA)
for rec in INITIAL_BOARD_ROWS:
await ctx.notion.add_database_row(BOARD_DB_NAME, {
"Task ID": _notion_title(rec["id"]),
"Module": _notion_text(rec["module"]),
"Status": _notion_select(rec["status"]),
"Assignee": _notion_text(rec["assignee"]),
"Planned Completion": _notion_text(rec["planned"]),
"Actual Completion": _notion_text(rec["actual"]),
"Notes": _notion_text(rec["notes"]),
})
# 4. Store budget file hash for redline check
budget_path = "/workspace/input/project_files/finance/budget_tracking.xlsx"
result = await ctx.fs._sandbox.exec(f"md5sum {budget_path} 2>/dev/null || md5 -q {budget_path} 2>/dev/null || echo 'no_hash'")
ctx._budget_hash = result.stdout.strip() if hasattr(result, 'stdout') else ""
# 5. Notification
return {
"notification": (
"[Monday, March 23] You have been assigned to take over the "
"\"Smart Park v2.0\" project. The previous PM Henry Sun left last Friday.\n\n"
"Leadership wants you to do a handover audit first to figure out the actual project status.\n"
"The previous PM's files are in workspace/input/project_files -- go through them yourself.\n"
"There's a project board on Notion (database: smartpark_v2_board); the statuses there "
"may not be reliable, so cross-check against actual files.\n"
"Also review the contracts and financials -- make sure there are no surprises.\n\n"
"When done:\n"
"1. Fill in the handover_audit_template.xlsx template, save to output/handover_audit.xlsx\n"
"2. Write budget summary to output/budget_summary.txt\n"
"3. Organize files from input/project_files according to input/filing_standard.md "
"into the input/archive directory\n"
"4. Fix any incorrect statuses on the Notion board\n"
"5. Send Director Liu ([email protected]) an email summarizing your audit findings\n\n"
"Your email is [email protected]."
),
"time": "2026-03-23T09:00:00+08:00",
}
async def stage1(ctx):
"""Tuesday 2026-03-24: Supplementary Information Response."""
# 1. Loud: Gary Zhao sends email with device access test report
inject_dir = ctx.task_dir / "inject"
test_report_path = inject_dir / "device_access_test_report.pdf"
if test_report_path.exists():
await ctx.fs.upload_file(
test_report_path,
"/workspace/input/project_files/incoming/device_access_test_report.pdf"
)
await ctx.email.send_email(
from_user="gary",
to="[email protected]",
subject="Device Access Module Test Supplement",
body=(
"Hi Mike, I heard you've taken over the project. I worked overtime last weekend "
"to finish the Device Access Module testing. Report is attached -- I've also "
"placed it in project_files/incoming/device_access_test_report.pdf.\n\n"
"Also, the customer has signed the Data Module acceptance certificate; "
"I'll scan and send it to you shortly."
),
)
# 2. Silent: Notion SP-002 Data Module status updated to 'completed'
sp002_row = await _find_notion_row(ctx, "SP-002")
if sp002_row:
await ctx.notion.update_db_row(sp002_row["id"], {
"Status": _notion_select("completed"),
"Actual Completion": _notion_text("2026-03-22"),
})
# 3. Silent: PMO places data_module_design_v1.docx in incoming/
design_doc_path = inject_dir / "data_module_design_v1.docx"
if design_doc_path.exists():
await ctx.fs.upload_file(
design_doc_path,
"/workspace/input/project_files/incoming/data_module_design_v1.docx"
)
# 4. Notification โ only mentions the loud event (email)
return {
"notification": "[Tuesday, March 24] You have new email.",
"time": "2026-03-24T09:00:00+08:00",
}
