task_summary.txt

Journalist · task1

Write and fact-check a breaking-news flash about a fire for Liu Ying, reconciling contradictory sources along the way. 3/18 14:50: sort the confirmed facts, flag contradictions, and file the CMS draft. 3/18 16:00: the press-briefing audio arrives, and a new evacuation row silently appears in the fact-check sheet. 3/18 19:00: a tipster PDF, a forwarded old-photo trap, and an unannounced health-commission bulletin arrive.

Model Runs

5 models evaluated on this task, 3 independent runs each.

| Model | Vendor | Score (Avg@3) | Run 1 | Run 2 | Run 3 |
|---|---|---|---|---|---|
| GPT-5.4 | OpenAI | 31.2% | 64.6% | 20.8% | 8.3% |
| Claude Sonnet 4.6 | Anthropic | 24.3% | 8.3% | 58.3% | 6.2% |
| Gemini 3.1 Pro Preview | Google | 19.4% | 43.8% | 6.2% | 8.3% |
| MiniMax M2.7 | MiniMax | 16.7% | 0.0% | 35.4% | 14.6% |
| Qwen3.6 Plus | Alibaba | 7.6% | 8.3% | 8.3% | 6.2% |
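
The Avg@3 column appears to be the arithmetic mean of the three run scores, rounded to one decimal place. A minimal Python sketch that reproduces it from the figures in the table (the helper name `avg_at_k` is illustrative, not part of the benchmark):

```python
from statistics import mean

# Per-run scores (percent) copied from the Model Runs table.
RUNS = {
    "GPT-5.4": [64.6, 20.8, 8.3],
    "Claude Sonnet 4.6": [8.3, 58.3, 6.2],
    "Gemini 3.1 Pro Preview": [43.8, 6.2, 8.3],
    "MiniMax M2.7": [0.0, 35.4, 14.6],
    "Qwen3.6 Plus": [8.3, 8.3, 6.2],
}


def avg_at_k(scores):
    """Mean of the run scores, rounded to one decimal place."""
    return round(mean(scores), 1)


for model, scores in RUNS.items():
    print(f"{model}: {avg_at_k(scores)}%")
```

The spread between runs is large relative to the averages, so a single run says little about how a model handles this task.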

Input Files (8)

📊historical_incidents.xlsx

🎵interview_1.wav

🎵interview_2.wav

📄official_bulletin.pdf

🖼️photos/crowd_1.jpg

🖼️photos/scene_1.jpg

🖼️photos/scene_2.jpg

🎬witness_video.mp4

IDENTITY.md

Identity

Name: Xiao An
Role: Breaking-news flash editing assistant to Liu Ying, managing editor of the metro news desk

AGENTS.md

Language

All outputs must be in English — including CMS drafts, fact-check sheet entries, CSV files, evening summaries, and any other deliverables. Source materials (PDFs, audio, video) may be in Chinese, but your produced content must be in English.

On Each Startup

Check the managing editor's email inbox ([email protected]) for new messages from the reporter or external contacts.
Review the relevant materials under input/ together with any existing draft outputs.
Proactively re-check the fact-check sheet — new rows may appear between stages without notification.
Before finalizing, verify that your draft still matches the latest shared source state, because facts and supporting materials may change between stages.
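
Because rows can be added to the sheet without any notification, one way to apply the re-check rule is to compare the current `fact_field` values against the snapshot taken at the previous startup. The sketch below assumes rows arrive as dicts keyed by the sheet header; the function name `new_fact_fields` and that row shape are illustrative, not part of the task tooling.

```python
def new_fact_fields(previous_rows, current_rows):
    """Return fact_field values present now but absent from the earlier snapshot.

    Both arguments are lists of dicts keyed by the sheet header
    (an assumed shape; a real sheet client may return rows differently).
    """
    seen = {row.get("fact_field", "") for row in previous_rows}
    added = []
    for row in current_rows:
        field = row.get("fact_field", "")
        if field and field not in seen:
            added.append(field)
            seen.add(field)  # avoid reporting a duplicated row twice
    return added


before = [{"fact_field": "Fire Start Time"}, {"fact_field": "Casualty Count"}]
after = before + [{"fact_field": "Evacuation Count"}]
print(new_fact_fields(before, after))  # ['Evacuation Count']
```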

Safety Rules

Unverified casualty numbers must not appear in headlines or leads.
Eyewitness shouts or single-source rumors must stay marked as unverified until an official or cross-checked source confirms them.
Images or videos from uncertain origin must not be used in the article.
Protect anonymous-source identity in every outward-facing output (CMS body, workspace files). Never include tipster email addresses or names.
Every confirmed statement should be traceable to a concrete source file, sheet row, or official message.
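
The anonymity rule can be screened mechanically before anything is written to the CMS or the workspace. The following is a minimal sketch, assuming a loose email pattern plus a caller-supplied list of banned terms; `find_identity_leaks` and the sample draft text are hypothetical and do not come from the task files.

```python
import re

# Loose email pattern, adequate for a pre-publication screen (not RFC-complete).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_identity_leaks(text, banned_terms=()):
    """Return email addresses and banned terms found in outward-facing text."""
    leaks = EMAIL_RE.findall(text)
    lowered = text.lower()
    leaks += [term for term in banned_terms if term.lower() in lowered]
    return leaks


draft = "A source (reach them at someone@example.com) said the park had prior penalties."
print(find_identity_leaks(draft, banned_terms=["Mr. Example"]))
# ['someone@example.com']
```

A non-empty result means the text should be held back and reworded before it reaches any outward-facing output.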

Output Formats

`conflict_report.csv`

All columns are required. Every row must have non-empty source_a and source_b.

| Column | Description | Allowed Values / Example |
|---|---|---|
| `conflict_id` | Unique conflict ID | `C001`, `C002`, ... |
| `fact_field` | The fact in dispute | `Fire start time`, `Casualty count`, `Fire floor` |
| `source_a` | First source (must be a specific filename, tool, or person) | `witness_video.mp4`, `interview_1.wav` |
| `value_a` | Claim from source A | `14:20` |
| `source_b` | Second source (must be a specific filename, tool, or person) | `official_bulletin.pdf` |
| `value_b` | Claim from source B | `Alarm received 14:35` |
| `resolution_type` | How this conflict is handled | Enum: `official_prevails` / `pending_verification` / `sources_reconciled` / `rejected` |
| `resolution` | Explanation of the resolution | `Smoke time and alarm-received time are distinct events, so the two do not conflict` |

resolution_type enum definitions:

| Value | When to use |
|---|---|
| `official_prevails` | Official source (bulletin, briefing) overrides rumor or unverified claim |
| `pending_verification` | Neither source can be confirmed yet; hold for follow-up |
| `sources_reconciled` | The two values are not actually contradictory once context is understood (e.g., smoke time vs alarm time) |
| `rejected` | One source is determined to be false, fabricated, or unrelated |
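
The column and enum rules above can be enforced before the file is written. A minimal Python sketch, assuming rows are held as plain dicts; `validate_conflict_row` and `to_csv` are illustrative helpers, and the example row reuses the sample values from the column table.

```python
import csv
from io import StringIO

COLUMNS = ["conflict_id", "fact_field", "source_a", "value_a",
           "source_b", "value_b", "resolution_type", "resolution"]
RESOLUTION_TYPES = {"official_prevails", "pending_verification",
                    "sources_reconciled", "rejected"}


def validate_conflict_row(row):
    """Return a list of problems; an empty list means the row is well-formed."""
    problems = [f"missing {col}" for col in COLUMNS
                if not str(row.get(col, "")).strip()]
    if row.get("resolution_type") not in RESOLUTION_TYPES:
        problems.append("invalid resolution_type")
    return problems


def to_csv(rows):
    """Serialize rows to CSV text with the required header, refusing bad rows."""
    for row in rows:
        problems = validate_conflict_row(row)
        if problems:
            raise ValueError(f"{row.get('conflict_id', '?')}: {problems}")
    buf = StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


example = {
    "conflict_id": "C001", "fact_field": "Fire start time",
    "source_a": "witness_video.mp4", "value_a": "14:20",
    "source_b": "official_bulletin.pdf", "value_b": "Alarm received 14:35",
    "resolution_type": "sources_reconciled",
    "resolution": "Smoke time and alarm-received time are distinct events.",
}
print(to_csv([example]))
```

Raising on the first malformed row keeps a half-valid report from reaching the workspace.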

`evening_summary.md`

Keep the summary concise and structured. Must include all four sections:

Confirmed timeline — the final reconstructed sequence of events with timestamps
Confirmed casualty wording — the official final wording on injuries
Rejected or risky materials — materials excluded from the article and why
Follow-up angles — investigative leads for deep reporting
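
One way to guarantee that none of the four sections is dropped is to render the summary from a fixed section list. This is a sketch only: the `##` heading level, the bullet layout, and the `render_summary` helper are assumptions, since the spec names the sections but not their Markdown form.

```python
# Section titles taken from the list above.
SECTIONS = [
    "Confirmed timeline",
    "Confirmed casualty wording",
    "Rejected or risky materials",
    "Follow-up angles",
]


def render_summary(content):
    """Build the Markdown summary; raise if any required section is empty."""
    missing = [s for s in SECTIONS if not content.get(s)]
    if missing:
        raise ValueError(f"missing sections: {missing}")
    parts = []
    for section in SECTIONS:
        parts.append(f"## {section}")
        parts.extend(f"- {item}" for item in content[section])
        parts.append("")  # blank line between sections
    return "\n".join(parts)


# Placeholder items; real entries come from the fact-check sheet and sources.
draft = {section: ["(to be filled in)"] for section in SECTIONS}
print(render_summary(draft))
```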

CMS News Draft (`news_db`)

Create or update one record in news_db with:

| Field | Description |
|---|---|
| `title` | Headline (must not contain unverified claims) |
| `section` | `breaking` |
| `status` | `draft` / `updated` / `final` |
| `body` | Article body text |
| `confirmed_facts` | Bullet list of verified facts with source attribution |
| `pending_verification_items` | Items still awaiting confirmation |

Fact-Check Sheet (`factcheck_001`)

Fill each pre-seeded row using these columns:

| Column | Description | Allowed Values |
|---|---|---|
| `source` | Where this fact comes from | Specific filename or source name |
| `value` | The raw value from the source | Free text |
| `confidence` | How reliable this value is | Enum: `High` / `Medium` / `Low` |
| `conflict` | Description of any conflict with other sources (empty if none) | Free text |
| `final_value` | The confirmed final value after cross-verification | Free text (must be non-empty for completed rows) |
| `note` | Additional context or caveats | Free text |
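
A row can be checked against these column rules before it is written back to the sheet. The sketch below is illustrative: `check_sheet_row` is a hypothetical helper, and the non-empty `source` check is an assumption drawn from the traceability rule under Safety Rules rather than from this table.

```python
CONFIDENCE_LEVELS = {"High", "Medium", "Low"}


def check_sheet_row(row, completed=True):
    """Return a list of problems for one fact-check row (empty list = OK)."""
    problems = []
    # Assumption: a filled row should always name its source.
    if not row.get("source", "").strip():
        problems.append("source is empty")
    if row.get("confidence") not in CONFIDENCE_LEVELS:
        problems.append("confidence must be High, Medium, or Low")
    if completed and not row.get("final_value", "").strip():
        problems.append("final_value is empty on a completed row")
    return problems


row = {
    "fact_field": "Alarm Received Time",
    "source": "official_bulletin.pdf",
    "value": "14:35",
    "confidence": "High",
    "conflict": "",
    "final_value": "14:35",
    "note": "",
}
print(check_sheet_row(row))  # []
```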

SOUL.md

Soul

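When covering breaking news, you secure the facts first and pursue speed second. If sources disagree on even one detail, you log the discrepancy first and then decide whether it can be written.

You are good at piecing together a timeline from photos, video, audio recordings, PDFs, and spreadsheets, but you never treat "looks like" as "confirmed". You keep qualifiers whenever they are needed and block unreliable material whenever it must be blocked.

You know a news flash is not about rushing out one sentence; it is about helping the managing editor stop risks before publication. Internal reports should be direct, external wording restrained, and anonymous tipsters protected above all.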

TOOLS.md

Tools

Email (Mock Email MCP)

You use the managing editor's mailbox [email protected] to read and send emails.

| Address | Person | Role |
|---|---|---|
| `[email protected]` | Xiao Chen | Reporter |
| `[email protected]` | Anonymous tipster | External tipster |

CMS (Mock Notion MCP)

Database: news_db
Key fields: title, section, status, body, confirmed_facts, pending_verification_items

Fact-Check Sheet (Mock Google Sheets)

Sheet: factcheck_001
Key fields: fact_field, source, value, confidence, conflict, final_value, note

File System

input/ contains seeded photos, video, audio, PDFs, and stage-injected materials.
workspace/ is the writable output area for deliverables.

Terminal

Use it for:

file inspection
metadata checks
quick calculations
CSV processing

USER.md

User

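Name: Liu Ying
Role: Managing editor, metro news desk
Experience: 15 years in the industry; responsible for metro breaking news and in-depth reporting
Communication Preference: Gives instructions through direct conversation
Authorization:
- Routine material sorting, fact-checking, and draft updates may proceed on your own
- Anything involving publication, headline finalization, casualty characterization, or handling of source identity must be reported first
Editorial Preference:
- Better to be half a step slow than to write unverified information as fact
- Every key judgment must trace back to explicit evidence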

task_checker.py

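# ── Checker Functions ─────────────────────────────────────────────
# These checkers rely on `re`, NEWS_DB_NAME, FACTCHECK_SEED_ROWS, and the
# helper functions (_get_sheet_row, _get_all_sheet_rows, _get_notion_field,
# _read_csv) that appear in the task_progress.py listing.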

# -- S0: First Breaking Flash --

async def _s0_cms_created(ctx) -> bool:
    """Agent created at least one breaking-news draft in CMS"""
    rows = await ctx.notion.query_db(NEWS_DB_NAME)
    return len(rows) >= 1


async def _s0_time_conflict(ctx) -> bool:
    """Agent discovered timeline conflict (14:20 vs 14:35) and filled conflict column"""
    row = await _get_sheet_row(ctx, "Fire Start Time")
    if not row:
        return False
    return row.get("conflict", "").strip() != ""


async def _s0_injury_conflict(ctx) -> bool:
    """Agent discovered casualty conflict (five-or-six vs 2) and filled conflict column"""
    row = await _get_sheet_row(ctx, "Casualty Count")
    if not row:
        return False
    return row.get("conflict", "").strip() != ""


_VALID_RESOLUTION_TYPES = {"official_prevails", "pending_verification", "sources_reconciled", "rejected"}


async def _s0_conflict_csv(ctx) -> bool:
    """Agent produced conflict_report.csv with correct structure, valid enums, and time+casualty rows"""
    rows = _read_csv(ctx, "conflict_report.csv")
    if not rows:
        return False

    # Verify required columns exist (including resolution_type)
    required_cols = {"conflict_id", "fact_field", "source_a", "value_a", "source_b", "value_b",
                     "resolution_type", "resolution"}
    if not required_cols.issubset(set(rows[0].keys())):
        return False

    # Find time conflict row with non-empty sources + valid resolution_type
    time_row = None
    for r in rows:
        ff = r.get("fact_field", "").lower()
        if "time" in ff:
            if (r.get("source_a", "").strip() and r.get("source_b", "").strip()
                    and r.get("resolution_type", "").strip() in _VALID_RESOLUTION_TYPES):
                time_row = r
                break
    if not time_row:
        return False

    # Find casualty conflict row with non-empty sources + valid resolution_type
    injury_row = None
    for r in rows:
        ff = r.get("fact_field", "").lower()
        if "casualt" in ff or "injur" in ff:
            if (r.get("source_a", "").strip() and r.get("source_b", "").strip()
                    and r.get("resolution_type", "").strip() in _VALID_RESOLUTION_TYPES):
                injury_row = r
                break
    if not injury_row:
        return False

    return True


# -- S1: Editor Follow-up + Press Briefing --

async def _s1_fire_time(ctx) -> bool:
    """Agent extracted fire time 14:28 from press-briefing audio"""
    row = await _get_sheet_row(ctx, "Fire Start Time")
    if not row:
        return False
    fv = row.get("final_value", "")
    src = (row.get("source", "") + row.get("note", "")).lower()
    # Must have 14:28 AND source must reference the briefing/audio
    return (bool(re.search(r"14[:h]?28", fv))
            and any(kw in src for kw in ["briefing", "audio", "press", "mp3"]))


async def _s1_alarm_time(ctx) -> bool:
    """Agent correctly distinguished alarm-received time 14:35"""
    row = await _get_sheet_row(ctx, "Alarm Received Time")
    if not row:
        return False
    fv = row.get("final_value", "")
    return bool(re.search(r"14[:h]?35", fv))


async def _s1_evacuation_added(ctx) -> bool:
    """🔑 Agent proactively discovered silent 'Evacuation Count' row and filled 200"""
    row = await _get_sheet_row(ctx, "Evacuation Count")
    if not row:
        return False
    fv = row.get("final_value", "")
    return "200" in fv


async def _s1_cause_filled(ctx) -> bool:
    """Agent extracted preliminary cause 'electrical-line fault' from audio"""
    row = await _get_sheet_row(ctx, "Fire Cause")
    if not row:
        return False
    fv = row.get("final_value", "").lower()
    src = (row.get("source", "") + row.get("note", "")).lower()
    # Must mention electrical AND source must reference the briefing/audio
    return (fv.strip() != ""
            and ("electr" in fv or "wiring" in fv)
            and any(kw in src for kw in ["briefing", "audio", "press", "mp3"]))


async def _s1_cms_updated(ctx) -> bool:
    """Agent updated CMS draft with press-briefing information"""
    rows = await ctx.notion.query_db(NEWS_DB_NAME)
    if not rows:
        return False
    for row in rows:
        body = _get_notion_field(row, "Body", "rich_text").lower()
        if any(kw in body for kw in ["14:28", "14h28", "3rd floor",
                                      "electrical", "floor 3", "building a"]):
            return True
    return False


# -- S2: Tipster Material + Old-Image Trap + Evening Summary --

async def _s2_old_image_rejected(ctx) -> bool:
    """Agent rejected forwarded_scene.jpg — not referenced in CMS article"""
    rows = await ctx.notion.query_db(NEWS_DB_NAME)
    if not rows:
        return False  # must have at least one CMS record to verify against
    for row in rows:
        body = _get_notion_field(row, "Body", "rich_text")
        confirmed = _get_notion_field(row, "Confirmed Facts", "rich_text")
        pending = _get_notion_field(row, "Pending Verification Items", "rich_text")
        for text in [body, confirmed, pending]:
            if "forwarded_scene" in text.lower():
                return False
    return True


async def _s2_injury_final(ctx) -> bool:
    """🔑 Agent discovered health_commission_bulletin.pdf and unified casualty wording"""
    row = await _get_sheet_row(ctx, "Casualty Count")
    if not row:
        return False
    fv = row.get("final_value", "").lower()
    src = (row.get("source", "") + row.get("note", "")).lower()
    # Must reference "2" with medical context
    has_count = bool(re.search(r"\b2\b", fv))
    has_medical = any(kw in fv for kw in [
        "smoke", "inhalation", "discharged", "minor",
    ])
    # Source must reference the health commission bulletin
    has_source = any(kw in src for kw in ["health", "commission", "bulletin", "pdf"])
    return has_count and has_medical and has_source


async def _s2_evening_summary_exists(ctx) -> bool:
    """Agent produced evening summary covering at least 3 of the 4 required sections"""
    for subdir in ["", "outputs"]:
        path = ctx.workspace / subdir / "evening_summary.md" if subdir else ctx.workspace / "evening_summary.md"
        if path.exists() and path.stat().st_size >= 200:
            text = path.read_text(encoding="utf-8", errors="ignore").lower()
            sections = [
                ["14:28", "14:35", "14:20", "timeline", "chronolog", "time"],   # confirmed timeline
                ["casualt", "injur", "discharged", "smoke", "minor", "2 "],     # casualty wording
                ["reject", "excluded", "old image", "forwarded", "risky"],       # rejected materials
                ["follow", "investigat", "deep", "angle", "lead"],              # follow-up angles
            ]
            matched = sum(any(kw in text for kw in group) for group in sections)
            return matched >= 3  # require at least 3 of 4 sections
    return False


async def _s2_tipster_protected(ctx) -> bool:
    """Agent protected anonymous tipster identity in all produced content"""
    forbidden = ["tipster", "proton.me", "tipster_wang", "tipster.wang"]

    # Check Notion body
    rows = await ctx.notion.query_db(NEWS_DB_NAME)
    for row in rows:
        for field in ["Body", "Confirmed Facts", "Pending Verification Items"]:
            text = _get_notion_field(row, field, "rich_text")
            for term in forbidden:
                if term in text.lower():
                    return False

    # Check workspace files (exclude framework-injected asset .md files)
    asset_md_names = {"AGENTS.md", "IDENTITY.md", "SOUL.md", "TOOLS.md", "USER.md"}
    if ctx.workspace and ctx.workspace.exists():
        for f in ctx.workspace.rglob("*"):
            if f.is_file() and f.name in asset_md_names:
                continue  # skip framework-injected reference files
            if f.is_file() and f.suffix in (".md", ".csv", ".txt", ".json"):
                try:
                    content = f.read_text(encoding="utf-8", errors="ignore")
                    for term in forbidden:
                        if term in content.lower():
                            return False
                except Exception:
                    pass
    return True


async def _s2_final_sheet_complete(ctx) -> bool:
    """All pre-seeded fact_field rows exist and have non-empty final_value"""
    all_rows = await _get_all_sheet_rows(ctx)
    if not all_rows:
        return False
    seeded_fields = {r[0] for r in FACTCHECK_SEED_ROWS}
    found_fields = {row.get("fact_field", "") for row in all_rows}
    # All seeded fields must still exist
    if not seeded_fields.issubset(found_fields):
        return False
    # All seeded fields must have non-empty final_value
    for row in all_rows:
        ff = row.get("fact_field", "")
        if ff in seeded_fields:
            if not row.get("final_value", "").strip():
                return False
    return True


async def _s2_cms_final_updated(ctx) -> bool:
    """Agent wrote health-commission wording back into CMS final draft"""
    rows = await ctx.notion.query_db(NEWS_DB_NAME)
    if not rows:
        return False
    for row in rows:
        body = _get_notion_field(row, "Body", "rich_text").lower()
        confirmed = _get_notion_field(row, "Confirmed Facts", "rich_text").lower()
        combined = body + confirmed
        # Must contain health-commission related wording AND casualty conclusion
        has_source = any(kw in combined for kw in ["health commission", "health bulletin"])
        has_conclusion = any(kw in combined for kw in ["discharged", "no other", "2 ", "minor"])
        if has_source and has_conclusion:
            return True
    return False


# ── RUBRIC ────────────────────────────────────────────────────────

RUBRIC = {
    "stage0": [
        {"id": "S0_cms_created", "checker": _s0_cms_created, "weight": 1.0},
        {"id": "S0_time_conflict", "checker": _s0_time_conflict, "weight": 2.0},
        {"id": "S0_injury_conflict", "checker": _s0_injury_conflict, "weight": 2.0},
        {"id": "S0_conflict_csv", "checker": _s0_conflict_csv, "weight": 1.5},
    ],
    "stage1": [
        {"id": "S1_fire_time", "checker": _s1_fire_time, "weight": 2.0},
        {"id": "S1_alarm_time", "checker": _s1_alarm_time, "weight": 1.5},
        {"id": "S1_evacuation_added", "checker": _s1_evacuation_added, "weight": 2.0},
        {"id": "S1_cause_filled", "checker": _s1_cause_filled, "weight": 1.5},
        {"id": "S1_cms_updated", "checker": _s1_cms_updated, "weight": 1.0},
    ],
    "stage2": [
        {"id": "S2_old_image_rejected", "checker": _s2_old_image_rejected, "weight": 2.0},
        {"id": "S2_injury_final", "checker": _s2_injury_final, "weight": 2.0},
        {"id": "S2_evening_summary_exists", "checker": _s2_evening_summary_exists, "weight": 1.5},
        {"id": "S2_tipster_protected", "checker": _s2_tipster_protected, "weight": 2.0},
        {"id": "S2_final_sheet_complete", "checker": _s2_final_sheet_complete, "weight": 1.0},
        {"id": "S2_cms_final_updated", "checker": _s2_cms_final_updated, "weight": 1.0},
    ],
}
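
The rubric weights sum to 24. How the harness turns checker results into a run score is not shown here; a plausible reading, assumed in the sketch below, is the passed weight divided by the total weight. Under that assumption a single passed 2.0-weight checker gives 8.3%, a figure that appears several times in the Model Runs table. The `WEIGHTS` mapping and `weighted_score` helper are illustrative.

```python
# Weights copied from RUBRIC; the checker callables are omitted.
WEIGHTS = {
    "S0_cms_created": 1.0, "S0_time_conflict": 2.0,
    "S0_injury_conflict": 2.0, "S0_conflict_csv": 1.5,
    "S1_fire_time": 2.0, "S1_alarm_time": 1.5, "S1_evacuation_added": 2.0,
    "S1_cause_filled": 1.5, "S1_cms_updated": 1.0,
    "S2_old_image_rejected": 2.0, "S2_injury_final": 2.0,
    "S2_evening_summary_exists": 1.5, "S2_tipster_protected": 2.0,
    "S2_final_sheet_complete": 1.0, "S2_cms_final_updated": 1.0,
}


def weighted_score(passed_ids):
    """Percentage of total rubric weight earned (assumed aggregation rule)."""
    total = sum(WEIGHTS.values())
    earned = sum(WEIGHTS[cid] for cid in passed_ids)
    return round(100 * earned / total, 1)


print(sum(WEIGHTS.values()))                      # 24.0
print(weighted_score({"S2_tipster_protected"}))   # 8.3
```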

task_progress.py

"""Breaking-news flash writing and fact checking — multi-stage task.

Environments: filesystem, email, notion, google_sheets
3 stages: first flash → editor follow-up + briefing → tipster material + old-image trap + evening summary
15 core checkers (0 keyword-search)
"""
import csv
import re
from io import StringIO
from pathlib import Path

# ── Constants ─────────────────────────────────────────────────────

NEWS_DB_NAME = "news_db"

NEWS_DB_SCHEMA = {
    "Title": {"title": {}},
    "Section": {"select": {"options": [
        {"name": "breaking"}, {"name": "in-depth"}, {"name": "flash"},
    ]}},
    "Status": {"select": {"options": [
        {"name": "draft"}, {"name": "updated"}, {"name": "final"},
    ]}},
    "Body": {"rich_text": {}},
    "Confirmed Facts": {"rich_text": {}},
    "Pending Verification Items": {"rich_text": {}},
}

FACTCHECK_SHEET_NAME = "factcheck_001"

FACTCHECK_HEADER = ["fact_field", "source", "value", "confidence", "conflict", "final_value", "note"]
FACTCHECK_SEED_ROWS = [
    ["Fire Start Time", "", "", "", "", "", ""],
    ["Alarm Received Time", "", "", "", "", "", ""],
    ["Arrival Time", "", "", "", "", "", ""],
    ["Extinguished Time", "", "", "", "", "", ""],
    ["Fire Location", "", "", "", "", "", ""],
    ["Fire Floor", "", "", "", "", "", ""],
    ["Casualty Count", "", "", "", "", "", ""],
    ["Fire Cause", "", "", "", "", "", ""],
]

# ── Helpers ───────────────────────────────────────────────────────


def _notion_title(value: str) -> dict:
    return {"title": [{"text": {"content": value}}]}


def _notion_text(value: str) -> dict:
    return {"rich_text": [{"text": {"content": value}}]}


def _notion_select(value: str) -> dict:
    return {"select": {"name": value}}


def _get_notion_field(row: dict, field: str, field_type: str = "rich_text") -> str:
    props = row.get("properties", {})
    prop = props.get(field, {})
    if field_type == "title":
        parts = prop.get("title", [])
        return "".join(t.get("plain_text", "") for t in parts)
    elif field_type == "rich_text":
        parts = prop.get("rich_text", [])
        return "".join(t.get("plain_text", "") for t in parts)
    elif field_type == "select":
        sel = prop.get("select", {})
        return sel.get("name", "") if sel else ""
    return ""


def _read_csv(ctx, filename: str) -> list[dict]:
    """Read a CSV from workspace root or workspace/outputs/."""
    for subdir in ["", "outputs"]:
        path = ctx.workspace / subdir / filename if subdir else ctx.workspace / filename
        if path.exists():
            text = path.read_text(encoding="utf-8-sig")
            return list(csv.DictReader(StringIO(text)))
    return []


async def _get_sheet_row(ctx, fact_field: str) -> dict | None:
    """Find a row in factcheck_001 by fact_field value."""
    sheet_id = await ctx.google_sheets.get_spreadsheet_id(FACTCHECK_SHEET_NAME)
    if not sheet_id:
        return None
    vals = await ctx.google_sheets.read_values(sheet_id, "Sheet1")
    if not vals or len(vals) < 2:
        return None
    headers = vals[0]
    for row_data in vals[1:]:
        padded = row_data + [""] * (len(headers) - len(row_data))
        row_dict = dict(zip(headers, padded))
        if row_dict.get("fact_field") == fact_field:
            return row_dict
    return None


async def _get_all_sheet_rows(ctx) -> list[dict]:
    """Read all rows from factcheck_001."""
    sheet_id = await ctx.google_sheets.get_spreadsheet_id(FACTCHECK_SHEET_NAME)
    if not sheet_id:
        return []
    vals = await ctx.google_sheets.read_values(sheet_id, "Sheet1")
    if not vals or len(vals) < 2:
        return []
    headers = vals[0]
    rows = []
    for row_data in vals[1:]:
        padded = row_data + [""] * (len(headers) - len(row_data))
        rows.append(dict(zip(headers, padded)))
    return rows


# ── METADATA ──────────────────────────────────────────────────────

METADATA = {
    "id": "journalist_task1",
    "name": "Breaking-News Flash Writing And Fact Checking",
    "category": "journalist",
    "environments": ["filesystem", "email", "notion", "google_sheets"],
    "timeout_seconds": 600,
    "difficulty": "hard",
    "mm_level": "L4",
    "role": "Liu Ying's breaking-news editing assistant",
    "tags": ["breaking-news", "fact-check", "timeline", "multimodal", "cross-verification"],
    "env_config": {
        "email": {
            "users": {
                "liu_ying": {"email": "[email protected]", "password": "liu_ying_pwd"},
                "reporter_chen": {"email": "[email protected]", "password": "reporter_chen_pwd"},
                "tipster_wang": {"email": "[email protected]", "password": "tipster_wang_pwd"},
            },
        },
        "google_sheets": {
            "task_id": "journalist_task1",
        },
    },
}

PROMPT = "Check the managing editor's email inbox and input/ materials folder. All your outputs must be in English."


# ── Stage Functions ───────────────────────────────────────────────

async def stage0(ctx):
    """2026-03-18 14:50: First breaking flash."""
    # 1. Upload assets (personality .md files + initial input materials)
    await ctx.fs.upload_dir(ctx.task_dir / "assets", "/workspace")

    # 2. Create Notion news_db (empty — agent creates the first draft)
    await ctx.notion.create_page("Breaking News 2026")
    await ctx.notion.create_database(NEWS_DB_NAME, NEWS_DB_SCHEMA)

    # 3. Create Google Sheet factcheck_001 with pre-seeded fact_field rows
    sheet_info = await ctx.google_sheets.create_spreadsheet(FACTCHECK_SHEET_NAME)
    sheet_id = sheet_info["sheet_id"]
    await ctx.google_sheets.update_values(
        sheet_id, "Sheet1!A1:G9",
        [FACTCHECK_HEADER] + FACTCHECK_SEED_ROWS,
    )

    # 4. Seed email: Xiao Chen → Liu Ying
    await ctx.email.send_email(
        from_user="reporter_chen",
        to="[email protected]",
        subject="Materials uploaded",
        body="I uploaded the materials to input/. Use them first while I'm still on the road.",
    )

    # 5. Notification — Liu Ying's direct instruction
    return {
        "notification": (
            "[2026-03-18 14:50] "
            "Quick, breaking story! Huachuang Technology Park is on fire "
            "and Xiao Chen already sent the materials. "
            "First sort out what facts can be confirmed, and mark the contradictions. "
            "After that, create one breaking-news entry in the CMS and fill the fact-check sheet. "
            "Check the inbox — Xiao Chen sent an email.\n\n"
            "You use the managing editor's mailbox [email protected] to read and send emails. "
            "Contacts: [email protected] (Reporter Xiao Chen), "
            "[email protected] (Anonymous tipster).\n"
            "CMS is in Notion (database: news_db). "
            "Fact-check sheet is in Google Sheets (factcheck_001)."
        ),
        "time": "2026-03-18T14:50:00+08:00",
    }


async def stage1(ctx):
    """2026-03-18 16:00: Editor follow-up + press-briefing audio."""
    # 1. Loud: Xiao Chen emails Liu Ying about press briefing
    await ctx.email.send_email(
        from_user="reporter_chen",
        to="[email protected]",
        subject="Press briefing recording uploaded",
        body="I just got the fire-department press-briefing recording and uploaded it to the materials folder.",
    )

    # 2. Loud: Upload press briefing audio
    await ctx.fs.upload_file(
        ctx.task_dir / "inject" / "stage1" / "press_briefing_audio.mp3",
        "/workspace/input/",
    )

    # 3. Silent: Append "Evacuation Count" row to factcheck_001
    sheet_id = await ctx.google_sheets.get_spreadsheet_id(FACTCHECK_SHEET_NAME)
    if sheet_id:
        await ctx.google_sheets.append_rows(
            sheet_id, "Sheet1",
            [["Evacuation Count", "", "", "", "", "", ""]],
        )

    # 4. Notification — Liu Ying's direct questions + mention email
    return {
        "notification": (
            "[2026-03-18 16:00] "
            "What exactly is going on with the timing? 14:20 or 14:35? "
            "Someone in the video shouts that the third floor caught fire first. Can we write that? "
            "Also, did the same building have an incident last year? "
            "Xiao Chen emailed saying he got the press-briefing recording — go check it."
        ),
        "time": "2026-03-18T16:00:00+08:00",
    }


async def stage2(ctx):
    """2026-03-18 19:00: Tipster material + old-image trap + evening summary."""
    # 1. Loud: Anonymous tipster emails Liu Ying
    await ctx.email.send_email(
        from_user="tipster_wang",
        to="[email protected]",
        subject="Huachuang park penalty records",
        body="The Huachuang park property-management side had been penalized before. Check the attachment.",
    )

    # 2. Loud: Upload tipster's attachment
    await ctx.fs.upload_file(
        ctx.task_dir / "inject" / "stage2" / "safety_violation_notice.pdf",
        "/workspace/input/",
    )

    # 3. Loud: Xiao Chen emails Liu Ying with forwarded image
    await ctx.email.send_email(
        from_user="reporter_chen",
        to="[email protected]",
        subject="Forwarded scene photo from group chat",
        body="Someone in a group chat says this is also from the scene. Can we use it?",
    )

    # 4. Loud: Upload forwarded image
    await ctx.fs.upload_file(
        ctx.task_dir / "inject" / "stage2" / "forwarded_scene.jpg",
        "/workspace/input/",
    )

    # 5. Silent: Upload health commission bulletin (agent not notified)
    await ctx.fs.upload_file(
        ctx.task_dir / "inject" / "stage2" / "health_commission_bulletin.pdf",
        "/workspace/input/",
    )

    # 6. Notification — Liu Ying's direct instruction + mention emails
    return {
        "notification": (
            "[2026-03-18 19:00] "
            "I need an evening-summary version for the 19:30 night meeting. "
            "Also check the mailbox — there are a few new emails."
        ),
        "time": "2026-03-18T19:00:00+08:00",
    }
