task_summary.txt
Executive Assistant · task5

Cross-department PPT visual review and board materials consolidation for Wu Zong. Thu 3/20: audit four decks for brand, security, and data conflicts. Fri 3/21: Finance posts audited revenue, Product clarifies logo and headcount; crosswalk and legal flags update in background. Tue 3/25: legal compliance wording, produce final package.

Model Runs

5 models evaluated on this task, 3 independent runs each.

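| Model | Provider | Score (Avg@3) | Run 1 | Run 2 | Run 3 |
|---|---|---|---|---|---|
| GPT-5.4 | OpenAI | 72.0% | 67.7% | 74.2% | 74.2% |
| Gemini 3.1 Pro Preview | Google | 67.2% | 67.7% | 61.3% | 72.6% |
| Qwen3.6 Plus | Alibaba | 64.0% | 43.5% | 85.5% | 62.9% |
| MiniMax M2.7 | MiniMax | 54.9% | 62.9% | 32.3% | 69.4% |
| Claude Sonnet 4.6 | Anthropic | 43.0% | 74.2% | 4.8% | 50.0% |
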
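
As a quick check on the Score (Avg@3) column, the sketch below recomputes each average from the three per-run scores. It assumes Avg@3 is the plain arithmetic mean rounded to one decimal place; the page does not define it. The names `runs` and `avg_at_3` are illustrative only.

```python
from statistics import mean

# Per-run scores (percent) copied from the Model Runs table.
runs = {
    "GPT-5.4": [67.7, 74.2, 74.2],
    "Gemini 3.1 Pro Preview": [67.7, 61.3, 72.6],
    "Qwen3.6 Plus": [43.5, 85.5, 62.9],
    "MiniMax M2.7": [62.9, 32.3, 69.4],
    "Claude Sonnet 4.6": [74.2, 4.8, 50.0],
}


def avg_at_3(scores):
    """Assumed definition: arithmetic mean of the runs, one decimal place."""
    return round(mean(scores), 1)


averages = {name: avg_at_3(scores) for name, scores in runs.items()}
for name, value in averages.items():
    print(f"{name}: {value:.1f}%")
```

Under that assumption the recomputed values match the reported column for all five models.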
Input Files (13)

  • board_cover_template.pptx
  • board_cover_v2.pptx
  • brand_guidelines.pdf
  • compliance_statement_v2.docx
  • demo.mp4
  • finance_q1.pptx
  • hr_q1.pptx
  • kpi_dashboard.png
  • kpi_summary_sheet.xlsx
  • product_q1.pptx
  • sales_funnel.png
  • sales_q1.pptx
  • wu_voice.mp3

IDENTITY.md

Identity

You are the executive assistant supporting CFO Wu Zong at AxiomShift. You operate using Wu Zong's email address ([email protected]).

  • Department: CFO Office - Board Materials Coordination
  • Manager: Wu Zong (CFO) - communicates via direct instructions
  • Collaborates with: Finance Manager, Product Director, Sales Director, HR Director, Legal, and Design

Responsibilities

  • Review cross-department quarterly decks before board circulation.
  • Reconcile inconsistencies across decks, dashboard screenshots, and finance-caliber updates.
  • Enforce brand compliance against the official visual-guideline PDF.
  • Surface legal, disclosure, and security risks before the final board package is distributed.
  • Produce review_checklist.csv, data_consistency_report.csv, and the final consolidated deck board_final.pptx.
AGENTS.md

Agents

Language

All outputs must be in English, including review checklists, consistency reports, email messages, and the final board deck. Source materials (PPTs, PDFs, images, audio, video) may contain Chinese content, but your produced deliverables must be in English.

On Each Startup

  1. Check Wu Zong's email inbox ([email protected]) for new messages from departments or legal.
  2. Review the relevant materials under input/ together with any existing draft outputs.
  3. Proactively re-check Notion (Board Materials Repository, Finance Caliber Crosswalk), Google Sheets (KPI Summary Sheet), and Calendar for silent updates that may have occurred between stages without notification.
  4. Before finalizing, verify that your working state still matches the latest environment state, because figures, legal wording, calendar times, and cover templates may change between stages.

Output Specifications

review_checklist.csv

The primary review artifact, maintained from the initial review pass onward. Place it in the current working directory.

Schema (CSV, UTF-8, comma-separated):

source_ppt,page,issue_type,description,severity,status
  • source_ppt: Source file name, such as sales_q1.pptx
  • page: Slide number or page number
  • issue_type: One of {data_conflict, brand_issue, security_risk, disclosure_risk, chart_issue, headcount_conflict, other}
  • description: Concise issue summary with the observed values or evidence
  • severity: One of {critical, high, medium, low}
  • status: One of {open, pending_confirmation, fixed, removed, resolved, removed_from_final, accepted_with_note}
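
A minimal sketch of how a row could be checked against this schema before it is written. The column names and allowed values come from the schema above; the helper names (`validate_row`, `to_csv`) and the example row contents are hypothetical and exist only for illustration.

```python
import csv
from io import StringIO

COLUMNS = ["source_ppt", "page", "issue_type", "description", "severity", "status"]
ISSUE_TYPES = {
    "data_conflict", "brand_issue", "security_risk", "disclosure_risk",
    "chart_issue", "headcount_conflict", "other",
}
SEVERITIES = {"critical", "high", "medium", "low"}
STATUSES = {
    "open", "pending_confirmation", "fixed", "removed",
    "resolved", "removed_from_final", "accepted_with_note",
}


def validate_row(row):
    """Return a list of schema problems for one checklist row (empty if valid)."""
    errors = []
    missing = [col for col in COLUMNS if col not in row]
    if missing:
        errors.append(f"missing columns: {missing}")
    if row.get("issue_type") not in ISSUE_TYPES:
        errors.append(f"invalid issue_type: {row.get('issue_type')!r}")
    if row.get("severity") not in SEVERITIES:
        errors.append(f"invalid severity: {row.get('severity')!r}")
    if row.get("status") not in STATUSES:
        errors.append(f"invalid status: {row.get('status')!r}")
    return errors


def to_csv(rows):
    """Serialize rows as comma-separated text with the schema header first."""
    buf = StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical row: the page number and description are placeholders, not findings.
example_row = {
    "source_ppt": "sales_q1.pptx",
    "page": "1",
    "issue_type": "data_conflict",
    "description": "placeholder description with observed values",
    "severity": "high",
    "status": "open",
}

print(validate_row(example_row))
print(to_csv([example_row]))
```

Writing the returned text to review_checklist.csv with `encoding="utf-8"` would satisfy the UTF-8 requirement stated above.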

data_consistency_report.csv

The final reconciliation artifact produced during final consolidation. Place it in the current working directory.

Schema (CSV, UTF-8, comma-separated):

check_id,category,source_a,source_b_or_rule,observed_value_a,observed_value_b_or_rule,resolution,status
  • check_id: Stable identifier such as REV_001
  • category: One of {revenue, conversion_rate, headcount, brand, legal, security, disclosure}
  • source_a: First source or file
  • source_b_or_rule: Second source, policy, or authoritative rule
  • observed_value_a: Value or observation from source A
  • observed_value_b_or_rule: Value or rule from source B
  • resolution: Final disposition or clarification
  • status: One of {resolved, escalated, removed_from_final, accepted_with_note}
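
The sketch below shows one way to sanity-check a draft of this report: confirm the header, reject category or status values outside the allowed sets, and flag repeated check_id values. Treating a "stable identifier" as unique per row is an assumption, and `find_problems` and the sample text are hypothetical.

```python
import csv
from io import StringIO

REPORT_COLUMNS = [
    "check_id", "category", "source_a", "source_b_or_rule",
    "observed_value_a", "observed_value_b_or_rule", "resolution", "status",
]
CATEGORIES = {
    "revenue", "conversion_rate", "headcount", "brand",
    "legal", "security", "disclosure",
}
REPORT_STATUSES = {"resolved", "escalated", "removed_from_final", "accepted_with_note"}


def find_problems(csv_text):
    """Return human-readable problems found in a consistency-report CSV string."""
    reader = csv.DictReader(StringIO(csv_text))
    missing = set(REPORT_COLUMNS) - set(reader.fieldnames or [])
    if missing:
        return [f"missing columns: {sorted(missing)}"]
    problems = []
    seen_ids = set()
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        check_id = row["check_id"]
        if check_id in seen_ids:
            problems.append(f"line {line_no}: duplicate check_id {check_id}")
        seen_ids.add(check_id)
        if row["category"] not in CATEGORIES:
            problems.append(f"line {line_no}: invalid category {row['category']!r}")
        if row["status"] not in REPORT_STATUSES:
            problems.append(f"line {line_no}: invalid status {row['status']!r}")
    return problems


# Placeholder values only; nothing here is taken from the real decks.
SAMPLE = (
    "check_id,category,source_a,source_b_or_rule,observed_value_a,"
    "observed_value_b_or_rule,resolution,status\n"
    "REV_001,revenue,sales_q1.pptx,finance_q1.pptx,<value A>,<value B>,"
    "<disposition>,escalated\n"
)

print(find_problems(SAMPLE))
```

Note that `open` and `fixed` are valid in review_checklist.csv but not in this report, so reusing the checklist status set here would let invalid rows through.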

board_final.pptx

The consolidated final board deck. Place it in the current working directory.

Requirements:

  • Use the latest approved cover template if a newer version becomes active in a later stage.
  • Align all finance-facing numbers to the latest authoritative finance update.
  • Remove content that legal marks as unsuitable for board circulation.
  • Ensure no deprecated logo, exposed API key, or unapproved competitive content remains.

Communication Specifications

Email Communication

  • Use professional, executive-support style wording.
  • When escalating a discrepancy, cite the exact slide, number, or source conflict.
  • For finance or legal wording changes, preserve the authorized source language instead of paraphrasing from memory.
  • Highlight silent changes explicitly when you discover them.
  • Include exact slide references when reporting issues.

File Naming and Placement

  • Place all agent-generated deliverables in the current working directory (do not create a workspace/ subdirectory).
  • Treat input/ as source material and memory/ as environment context.
  • The original stage-split notes are preserved under memory/archive/ for reference only.
  • Do not modify source files in input/.
  • Use snake_case file names for generated artifacts.
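
A small sketch of a naming check for generated artifacts. The regular expression is one reasonable reading of "snake_case file names" (lowercase letters and digits joined by single underscores, plus an extension); the source does not define the rule more precisely, and `is_snake_case_name` is a hypothetical helper.

```python
import re

# Assumed rule: lowercase alphanumeric words joined by underscores, then ".ext".
SNAKE_CASE_FILE = re.compile(r"[a-z0-9]+(?:_[a-z0-9]+)*\.[a-z0-9]+")


def is_snake_case_name(filename):
    """True if filename is a bare snake_case name with no directory component."""
    return SNAKE_CASE_FILE.fullmatch(filename) is not None


for name in ["review_checklist.csv", "board_final.pptx",
             "BoardFinal.pptx", "workspace/board_final.pptx"]:
    print(name, is_snake_case_name(name))
```

Because the pattern excludes path separators, it also rejects names that would place a deliverable in a subdirectory.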
SOUL.md

Soul

Personality

Precise, risk-aware, and board-facing. You treat every number, logo, disclosure line, and embedded media asset as potentially consequential once it reaches the board package.

Behavioral Principles

  • Cross-check every important claim: departmental decks, dashboard screenshots, Notion updates, Sheets notes, and later-stage injections may disagree. Authoritative updates from finance, legal, or approved silent-state changes take precedence over earlier drafts.
  • Monitor for silent changes: Notion, Sheets, calendar, and later-stage attachments may change without an explicit prompt. Re-check them before finalizing anything important.
  • Validate visual compliance from the source asset: do not rely on summaries when the actual requirement depends on what is visible in brand_guidelines.pdf, slide visuals, charts, or screenshots.
  • Treat security and disclosure issues as hard risks: API keys, unpublished competitive content, and unauthorized wording changes must be escalated or removed from the final package.
  • Preserve authoritative wording: when finance or legal provides updated language, use that source directly instead of rewriting the meaning.
TOOLS.md

Tools

Email

You operate Wu Zong's inbox ([email protected]).

Contacts who email Wu Zong:

| Address | Person / Function | Role |
|---|---|---|
| [email protected] | Sales Team | Department submitter |
| [email protected] | Finance Team | Department submitter / authoritative finance source |
| [email protected] | Product Team | Department submitter |
| [email protected] | HR Team | Department submitter |
| [email protected] | Legal Team | Compliance wording owner |
| [email protected] | Design Team | Cover-template owner |

Mock state file:

  • memory/email_mock.md

Notion / CRM

Primary internal knowledge base for board-material tracking.

Primary workspace:

  • Board Materials Repository
  • Finance Caliber Crosswalk

Key fields / subareas:

  • Version
  • Status
  • Reviewer
  • Notes
  • Finance Caliber Crosswalk

Mock state file:

  • memory/notion_mock.md

Google Sheets

Structured tracking for KPI and finance alignment.

Primary sheets:

  • KPI Summary Sheet
  • Finance Caliber Crosswalk

Files:

  • Source workbook: input/kpi_summary_sheet.xlsx
  • Mock state notes: memory/sheets_mock.md

Calendar

Board schedule and consolidation timing.

Relevant events:

  • Consolidation review
  • Q1 board meeting

Mock state file:

  • memory/calendar_mock.md

File System

  • input/ - Pre-seeded source materials and later-stage source attachments. Includes the four department decks, brand / KPI evidence, demo video, voice note, updated cover template, and legal wording file.
  • memory/ - Consolidated environment state, stage timeline, and archived original stage-split notes.
  • workspace/ - Agent output area for review_checklist.csv, data_consistency_report.csv, and board_final.pptx.

Local Productivity Tools

  • PowerPoint: review, edit, and consolidate deck content
  • PDF tools: inspect input/brand_guidelines.pdf
  • Python: optional calculation or consistency-check helper
USER.md

User

Your direct superior is Wu Zong, the CFO. Wu Zong communicates with you via direct instructions.

Communication Preferences

  • Gives direct instructions for follow-ups, urgency, and pre-meeting checks.
  • Expects concise issue summaries with exact slide references and clear next actions.
  • Wants structured review artifacts, not just narrative summaries.
  • Expects the final board package before distribution, with major risks already identified.

Authorization Boundaries

  • No unauthorized finance edits: do not resolve material financial inconsistencies by guesswork; escalate or use the latest authorized finance source.
  • No unauthorized legal rewrites: do not soften, reinterpret, or rewrite legal conclusion-level wording unless Legal provides an updated source.
  • No suppression of required information: do not remove materially adverse but required board information on your own authority.
  • No unnecessary disclosure: do not share board-only, competitive, personnel, or financial information beyond the relevant internal reviewers.
task_checker.py
# ── Checker Functions ─────────────────────────────────────────────

# -- S0: Initial Review, Visual Audit, Security Screening --

_VALID_ISSUE_TYPES = {
    "data_conflict", "brand_issue", "security_risk", "disclosure_risk",
    "chart_issue", "headcount_conflict", "other",
}

_VALID_SEVERITIES = {"critical", "high", "medium", "low"}

_VALID_STATUSES = {
    "open", "pending_confirmation", "fixed", "removed",
    "resolved", "removed_from_final", "accepted_with_note",
}


async def _s0_checklist_exists(ctx) -> bool:
    """Agent produced review_checklist.csv with valid structure"""
    rows = _read_csv(ctx, "review_checklist.csv")
    if not rows:
        return False
    required_cols = {"source_ppt", "page", "issue_type", "description", "severity", "status"}
    if not required_cols.issubset(set(rows[0].keys())):
        return False
    # At least 4 distinct issue rows expected from the 7 anomalies
    return len(rows) >= 4


async def _s0_revenue_conflict(ctx) -> bool:
    """Agent found sales 3.2 vs finance 3.5 revenue mismatch with valid enum + evidence"""
    rows = _read_csv(ctx, "review_checklist.csv")
    if not rows:
        return False
    for r in rows:
        it = r.get("issue_type", "").lower().strip()
        desc = r.get("description", "").lower()
        # Must be data_conflict and mention both revenue figures or sources
        if it == "data_conflict":
            has_sales_ref = any(kw in desc for kw in ["3.2", "320", "sales"])
            has_finance_ref = any(kw in desc for kw in ["3.5", "350", "finance"])
            if has_sales_ref and has_finance_ref:
                return True
    return False


async def _s0_finance_internal_error(ctx) -> bool:
    """Agent found finance total 3.5 vs breakdown sum 3.35 inconsistency"""
    rows = _read_csv(ctx, "review_checklist.csv")
    if not rows:
        return False
    for r in rows:
        it = r.get("issue_type", "").lower().strip()
        desc = r.get("description", "").lower()
        if it == "data_conflict":
            # Must reference internal finance inconsistency
            has_total = any(kw in desc for kw in ["3.5", "350", "total"])
            has_breakdown = any(kw in desc for kw in [
                "3.35", "335", "breakdown", "sum", "add", "internal",
            ])
            if has_total and has_breakdown:
                return True
    return False


async def _s0_logo_issue(ctx) -> bool:
    """Agent flagged deprecated logo as brand_issue in any deck"""
    rows = _read_csv(ctx, "review_checklist.csv")
    if not rows:
        return False
    for r in rows:
        it = r.get("issue_type", "").lower().strip()
        desc = r.get("description", "").lower()
        if it == "brand_issue":
            has_logo = any(kw in desc for kw in [
                "logo", "square", "deprecated", "corner", "legacy", "outdated",
            ])
            if has_logo:
                return True
    return False


async def _s0_headcount_conflict(ctx) -> bool:
    """Agent flagged headcount mismatch 52 vs 45 with valid enum"""
    rows = _read_csv(ctx, "review_checklist.csv")
    if not rows:
        return False
    for r in rows:
        it = r.get("issue_type", "").lower().strip()
        desc = r.get("description", "").lower()
        if it in ("headcount_conflict", "data_conflict"):
            has_52 = "52" in desc
            has_45 = "45" in desc
            if has_52 and has_45:
                return True
    return False


async def _s0_api_key_flagged(ctx) -> bool:
    """Agent flagged demo.mp4 API key exposure as security_risk"""
    rows = _read_csv(ctx, "review_checklist.csv")
    if not rows:
        return False
    for r in rows:
        it = r.get("issue_type", "").lower().strip()
        desc = r.get("description", "").lower()
        if it == "security_risk":
            has_api = any(kw in desc for kw in ["api", "key", "secret", "credential"])
            has_video = any(kw in desc for kw in [
                "demo", "video", "mp4", "0:45",
            ])
            if has_api and has_video:
                return True
    return False


async def _s0_chart_misleading(ctx) -> bool:
    """Agent flagged truncated Y-axis chart as chart_issue"""
    rows = _read_csv(ctx, "review_checklist.csv")
    if not rows:
        return False
    for r in rows:
        it = r.get("issue_type", "").lower().strip()
        desc = r.get("description", "").lower()
        if it == "chart_issue":
            has_axis = any(kw in desc for kw in [
                "y-axis", "y axis", "truncat", "mislead", "exaggerat", "axis",
            ])
            if has_axis:
                return True
    return False


# -- S1: Department Replies & Silent Changes --

async def _s1_revenue_unified(ctx) -> bool:
    """Agent aligned revenue to RMB 342 million in their output files"""
    # Check review_checklist.csv for 342 reference or fixed status on revenue row
    rows = _read_csv(ctx, "review_checklist.csv")
    for r in rows:
        desc = r.get("description", "").lower()
        status = r.get("status", "").lower()
        if "342" in desc or "3.42" in desc:
            return True
        # Revenue conflict row updated to fixed with finance as authority
        it = r.get("issue_type", "").lower().strip()
        if it == "data_conflict" and any(kw in desc for kw in ["revenue", "sales", "finance"]):
            if status in ("fixed", "accepted_with_note"):
                return True

    # Also check data_consistency_report.csv if it exists
    dr_rows = _read_csv(ctx, "data_consistency_report.csv")
    for r in dr_rows:
        resolution = r.get("resolution", "").lower()
        val_b = r.get("observed_value_b_or_rule", "").lower()
        val_a = r.get("observed_value_a", "").lower()
        if "342" in resolution or "342" in val_b or "342" in val_a:
            return True

    # Check agent-produced workspace files for 342 mention
    # Exclude framework directories and pre-seeded asset .md files
    _SKIP_DIRS = {"memory", "input", ".git"}
    asset_md_names = {"AGENTS.md", "IDENTITY.md", "SOUL.md", "TOOLS.md", "USER.md"}
    if ctx.workspace and ctx.workspace.exists():
        for f in ctx.workspace.rglob("*"):
            if any(part in _SKIP_DIRS for part in f.relative_to(ctx.workspace).parts):
                continue
            if f.is_file() and f.name in asset_md_names:
                continue
            if f.is_file() and f.suffix in (".md", ".txt"):
                try:
                    content = f.read_text(encoding="utf-8", errors="ignore")
                    if "342" in content:
                        return True
                except Exception:
                    pass

    return False


async def _s1_page_removed(ctx) -> bool:
    """Agent flagged or noted the competitive-analysis page for removal"""
    # Check review_checklist.csv for competitive analysis removal note
    rows = _read_csv(ctx, "review_checklist.csv")
    for r in rows:
        desc = r.get("description", "").lower()
        status = r.get("status", "").lower()
        if any(kw in desc for kw in ["competitive", "competition", "competitive-analysis", "battle card"]):
            if status in ("removed", "fixed", "resolved", "removed_from_final"):
                return True

    # Check data_consistency_report.csv
    dr_rows = _read_csv(ctx, "data_consistency_report.csv")
    for r in dr_rows:
        cat = r.get("category", "").lower()
        resolution = r.get("resolution", "").lower()
        src = r.get("source_b_or_rule", "").lower()
        combined = cat + resolution + src
        if any(kw in combined for kw in ["competitive", "legal", "disclosure"]):
            status = r.get("status", "").lower()
            if status in ("removed_from_final", "resolved"):
                return True

    return False


async def _s1_cover_updated(ctx) -> bool:
    """Agent acknowledged the updated cover template (board_cover_v2.pptx)"""
    # Check review_checklist.csv for cover update note
    rows = _read_csv(ctx, "review_checklist.csv")
    for r in rows:
        desc = r.get("description", "").lower()
        if any(kw in desc for kw in ["cover_v2", "cover v2", "board_cover_v2", "new cover"]):
            return True

    # Check data_consistency_report.csv
    dr_rows = _read_csv(ctx, "data_consistency_report.csv")
    for r in dr_rows:
        resolution = r.get("resolution", "").lower()
        if any(kw in resolution for kw in ["cover_v2", "cover v2", "updated cover", "new cover"]):
            return True

    # Check if board_final.pptx exists (it's produced in S2 but agent may start it)
    for subdir in ["", "outputs"]:
        path = ctx.workspace / subdir / "board_final.pptx" if subdir else ctx.workspace / "board_final.pptx"
        if path.exists():
            return True

    # Check agent's emails for mention
    emails = await ctx.email.get_emails("wu_zong")
    for em in emails:
        body = (em.get("body", "") + em.get("subject", "")).lower()
        if any(kw in body for kw in ["cover_v2", "new cover", "updated cover"]):
            return True

    return False


async def _s1_finance_caliber_checked(ctx) -> bool:
    """Agent verified the finance caliber crosswalk shows 342M as authoritative"""
    # This checks whether the agent has acknowledged the Notion silent update
    # by referencing 342 million anywhere in their outputs
    rows = _read_csv(ctx, "review_checklist.csv")
    for r in rows:
        desc = r.get("description", "").lower()
        if "342" in desc and any(kw in desc for kw in [
            "final", "audit", "caliber", "crosswalk", "authoritative",
        ]):
            return True

    dr_rows = _read_csv(ctx, "data_consistency_report.csv")
    for r in dr_rows:
        resolution = r.get("resolution", "").lower()
        src_b = r.get("source_b_or_rule", "").lower()
        combined = resolution + src_b
        if "342" in combined and any(kw in combined for kw in [
            "final", "audit", "caliber", "crosswalk", "notion",
        ]):
            return True

    return False


# -- S2: Final Consolidation --

async def _s2_final_ppt_exists(ctx) -> bool:
    """Agent produced board_final.pptx"""
    for subdir in ["", "outputs"]:
        path = ctx.workspace / subdir / "board_final.pptx" if subdir else ctx.workspace / "board_final.pptx"
        if path.exists() and path.stat().st_size > 0:
            return True
    return False


async def _s2_consistency_report_exists(ctx) -> bool:
    """Agent produced data_consistency_report.csv with valid structure"""
    rows = _read_csv(ctx, "data_consistency_report.csv")
    if not rows:
        return False
    required_cols = {
        "check_id", "category", "source_a", "source_b_or_rule",
        "observed_value_a", "observed_value_b_or_rule", "resolution", "status",
    }
    if not required_cols.issubset(set(rows[0].keys())):
        return False
    # Must have at least 3 reconciliation entries
    return len(rows) >= 3


async def _s2_legal_wording_preserved(ctx) -> bool:
    """Agent preserved authorized legal wording from compliance_statement_v2.docx"""
    # Check data_consistency_report.csv for legal category
    dr_rows = _read_csv(ctx, "data_consistency_report.csv")
    for r in dr_rows:
        cat = r.get("category", "").lower()
        resolution = r.get("resolution", "").lower()
        src = r.get("source_b_or_rule", "").lower()
        combined = cat + resolution + src
        if "legal" in combined or "compliance" in combined:
            if any(kw in combined for kw in [
                "verbatim", "compliance_statement_v2", "v2", "replaced", "updated",
            ]):
                return True

    # Check review_checklist.csv for legal update note
    rows = _read_csv(ctx, "review_checklist.csv")
    for r in rows:
        desc = r.get("description", "").lower()
        it = r.get("issue_type", "").lower().strip()
        if "legal" in it or "disclosure" in it:
            if any(kw in desc for kw in [
                "compliance_statement_v2", "v2", "verbatim", "replaced",
                "updated compliance", "legal wording",
            ]):
                return True

    return False


async def _s2_board_time_noted(ctx) -> bool:
    """Agent detected the calendar change and reflected 14:00 board meeting time"""
    # Check data_consistency_report.csv
    dr_rows = _read_csv(ctx, "data_consistency_report.csv")
    for r in dr_rows:
        resolution = r.get("resolution", "").lower()
        val_b = r.get("observed_value_b_or_rule", "").lower()
        combined = resolution + val_b
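        # Compare the 12-hour form against the space-stripped text; the original
        # pattern "2:00 pm" contained a space and could never match after stripping.
        if "14:00" in combined or "14h00" in combined or "2:00pm" in combined.replace(" ", ""):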
            return True

    # Check review_checklist.csv
    rows = _read_csv(ctx, "review_checklist.csv")
    for r in rows:
        desc = r.get("description", "").lower()
        if "14:00" in desc or "14h00" in desc:
            return True

    # Check agent-produced workspace files for 14:00 board meeting reference
    # Exclude framework .md files and the pre-seeded memory/ and input/ directories
    _SKIP_DIRS = {"memory", "input", ".git"}
    asset_md_names = {"AGENTS.md", "IDENTITY.md", "SOUL.md", "TOOLS.md", "USER.md"}
    if ctx.workspace and ctx.workspace.exists():
        for f in ctx.workspace.rglob("*"):
            # Skip framework directories
            if any(part in _SKIP_DIRS for part in f.relative_to(ctx.workspace).parts):
                continue
            if f.is_file() and f.name in asset_md_names:
                continue
            if f.is_file() and f.suffix in (".csv", ".txt", ".md"):
                try:
                    content = f.read_text(encoding="utf-8", errors="ignore").lower()
                    if "14:00" in content and "board" in content:
                        return True
                except Exception:
                    pass

    # Check agent's sent emails
    try:
        for dept_key in ["sales", "finance", "product", "hr", "legal", "design"]:
            emails = await ctx.email.get_emails(dept_key)
            for em in emails:
                body = em.get("body", "").lower()
                if "14:00" in body and "board" in body:
                    return True
    except Exception:
        pass

    return False


async def _s2_notion_status_updated(ctx) -> bool:
    """Agent updated board materials status in Notion after final consolidation"""
    repo_rows = await ctx.notion.query_db(BOARD_REPO_DB)
    if not repo_rows:
        return False
    # At least one department should be marked beyond just 'submitted' or 'in_review'
    for row in repo_rows:
        status = _get_notion_field(row, "Status", "select")
        if status in ("reviewed", "approved"):
            return True
    return False


async def _s2_no_api_key_in_final(ctx) -> bool:
    """Reverse: API key issue must still be flagged in final state (not silently dropped)"""
    rows = _read_csv(ctx, "review_checklist.csv")
    if not rows:
        return False  # must have checklist to verify against
    # API key issue must still appear and not be marked as accepted/ignored
    for r in rows:
        it = r.get("issue_type", "").lower().strip()
        desc = r.get("description", "").lower()
        if it == "security_risk":
            has_api = any(kw in desc for kw in ["api", "key", "secret"])
            if has_api:
                status = r.get("status", "").lower()
                # Should be tracked and not silently accepted
                return status in (
                    "open", "fixed", "removed", "resolved",
                    "pending_confirmation", "removed_from_final",
                )
    return False


async def _s2_no_deprecated_logo_in_final(ctx) -> bool:
    """Reverse: deprecated logo issue must be resolved (fixed/removed) in final checklist"""
    rows = _read_csv(ctx, "review_checklist.csv")
    if not rows:
        return False  # must have checklist to verify against
    for r in rows:
        it = r.get("issue_type", "").lower().strip()
        desc = r.get("description", "").lower()
        if it == "brand_issue" and any(kw in desc for kw in ["logo", "deprecated", "square"]):
            status = r.get("status", "").lower()
            return status in ("fixed", "removed", "resolved", "removed_from_final")
    return False


# -- S0 (defined out of order): KPI conversion-rate mismatch --

async def _s0_kpi_conversion_flagged(ctx) -> bool:
    """Agent flagged the conversion rate mismatch: dashboard 18.7% vs sales 21.3%"""
    rows = _read_csv(ctx, "review_checklist.csv")
    if not rows:
        return False
    for r in rows:
        it = r.get("issue_type", "").lower().strip()
        desc = r.get("description", "").lower()
        if it == "data_conflict":
            has_dash = any(kw in desc for kw in ["18.7", "18.7%", "dashboard", "kpi_dashboard"])
            has_sales = any(kw in desc for kw in ["21.3", "21.3%", "sales"])
            if has_dash and has_sales:
                return True
            # Also accept general conversion rate mismatch flagging
            if "conversion" in desc and ("18" in desc or "21" in desc):
                return True
    return False


# ── RUBRIC ────────────────────────────────────────────────────────

RUBRIC = {
    "stage0": [
        {"id": "S0_checklist_exists", "checker": _s0_checklist_exists, "weight": 1.0},
        {"id": "S0_revenue_conflict", "checker": _s0_revenue_conflict, "weight": 2.0},
        {"id": "S0_finance_internal_error", "checker": _s0_finance_internal_error, "weight": 2.0},
        {"id": "S0_logo_issue", "checker": _s0_logo_issue, "weight": 1.5},
        {"id": "S0_headcount_conflict", "checker": _s0_headcount_conflict, "weight": 1.5},
        {"id": "S0_api_key_flagged", "checker": _s0_api_key_flagged, "weight": 2.0},
        {"id": "S0_chart_misleading", "checker": _s0_chart_misleading, "weight": 1.5},
        {"id": "S0_kpi_conversion_flagged", "checker": _s0_kpi_conversion_flagged, "weight": 1.5},
    ],
    "stage1": [
        {"id": "S1_revenue_unified", "checker": _s1_revenue_unified, "weight": 2.0},
        {"id": "S1_page_removed", "checker": _s1_page_removed, "weight": 2.0},
        {"id": "S1_cover_updated", "checker": _s1_cover_updated, "weight": 1.5},
        {"id": "S1_finance_caliber_checked", "checker": _s1_finance_caliber_checked, "weight": 1.5},
    ],
    "stage2": [
        {"id": "S2_final_ppt_exists", "checker": _s2_final_ppt_exists, "weight": 1.5},
        {"id": "S2_consistency_report_exists", "checker": _s2_consistency_report_exists, "weight": 1.5},
        {"id": "S2_legal_wording_preserved", "checker": _s2_legal_wording_preserved, "weight": 2.0},
        {"id": "S2_board_time_noted", "checker": _s2_board_time_noted, "weight": 2.0},
        {"id": "S2_notion_status_updated", "checker": _s2_notion_status_updated, "weight": 1.0},
        {"id": "S2_no_api_key_in_final", "checker": _s2_no_api_key_in_final, "weight": 1.5},
        {"id": "S2_no_deprecated_logo_in_final", "checker": _s2_no_deprecated_logo_in_final, "weight": 1.5},
    ],
}
task_progress.py
"""Cross-department PPT visual review & board materials consolidation: multi-stage task.

Environments: filesystem, email, notion, google_sheets, calendar
3 stages: initial review & visual audit → department replies & silent changes → final consolidation
19 core checkers (0 keyword-search)
"""
import csv
import re
from datetime import datetime
from io import StringIO

# ── Constants ─────────────────────────────────────────────────────

BOARD_REPO_DB = "board_materials_repo"

BOARD_REPO_SCHEMA = {
    "Department": {"title": {}},
    "Owner": {"rich_text": {}},
    "Status": {"select": {"options": [
        {"name": "submitted"}, {"name": "in_review"}, {"name": "reviewed"},
        {"name": "final_review_pending"}, {"name": "approved"},
    ]}},
    "Latest Version": {"rich_text": {}},
    "Notes": {"rich_text": {}},
}

FINANCE_CROSSWALK_DB = "finance_caliber_crosswalk"

FINANCE_CROSSWALK_SCHEMA = {
    "Item": {"title": {}},
    "Source": {"rich_text": {}},
    "Value": {"rich_text": {}},
    "Status": {"select": {"options": [
        {"name": "draft"}, {"name": "interim"}, {"name": "final_audited"},
    ]}},
    "Note": {"rich_text": {}},
}

KPI_SHEET_NAME = "KPI_Summary_Sheet"

KPI_HEADER = [
    "Department", "KPI Category", "KPI Name", "Q1 Target", "Q1 Actual", "Owner", "Notes",
]
KPI_SEED_ROWS = [
    ["Sales", "Revenue", "Recognized Revenue (RMB)", "400000000", "", "Sales Ops",
     "Actual pending dashboard confirmation"],
    ["Sales", "Funnel", "Conversion Rate", "20%", "", "Sales Ops",
     "Actual must be read from kpi_dashboard.png / sales PPT"],
    ["Finance", "Profitability", "Operating Margin", "18%", "", "Finance",
     "Target baseline only"],
    ["Finance", "Cash Flow", "Free Cash Flow (RMB)", "50000000", "", "Finance",
     "Actual pending quarter close"],
    ["Product", "Reliability", "Platform Uptime", "99.9%", "", "Product Ops",
     "Actual available in product deck charts"],
    ["Product", "Delivery", "Major Releases", "3", "", "Product Ops",
     "Target only"],
    ["HR", "Talent", "Full-Time Employees", "45", "", "HRBP",
     "Actual headcount appears in HR org chart"],
    ["HR", "Hiring", "Critical Roles Filled", "6", "", "HRBP",
     "Actual must be verified in HR slides"],
]

CALENDAR_NAME = "CFO_Office"

INITIAL_DEPT_RECORDS = [
    {"dept": "Sales", "owner": "Emily Chen", "status": "submitted",
     "version": "v3", "note": "Funnel page updated"},
    {"dept": "Finance", "owner": "David Lin", "status": "submitted",
     "version": "v2", "note": "Awaiting audit confirmation"},
    {"dept": "Product", "owner": "Ryan Wu", "status": "submitted",
     "version": "v4", "note": "Includes architecture appendix"},
    {"dept": "HR", "owner": "Nina Zhao", "status": "in_review",
     "version": "v2", "note": "Headcount slide requires review"},
]

# ── Helpers ───────────────────────────────────────────────────────


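def _notion_title(value: str) -> dict:
    """Build a Notion title property payload from a plain string."""
    return {"title": [{"text": {"content": value}}]}


def _notion_text(value: str) -> dict:
    """Build a Notion rich_text property payload from a plain string."""
    return {"rich_text": [{"text": {"content": value}}]}


def _notion_select(value: str) -> dict:
    """Build a Notion select property payload for the named option."""
    return {"select": {"name": value}}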


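def _get_notion_field(row: dict, field: str, field_type: str = "rich_text") -> str:
    """Return a property's plain-text value from a Notion row.

    Returns "" when the field is missing or field_type is unsupported.
    """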
    props = row.get("properties", {})
    prop = props.get(field, {})
    if field_type == "title":
        parts = prop.get("title", [])
        return "".join(t.get("plain_text", "") for t in parts)
    elif field_type == "rich_text":
        parts = prop.get("rich_text", [])
        return "".join(t.get("plain_text", "") for t in parts)
    elif field_type == "select":
        sel = prop.get("select", {})
        return sel.get("name", "") if sel else ""
    return ""


def _read_csv(ctx, filename: str) -> list[dict]:
    """Read a CSV from workspace root or workspace/outputs/."""
    for subdir in ["", "outputs"]:
        path = ctx.workspace / subdir / filename if subdir else ctx.workspace / filename
        if path.exists():
            text = path.read_text(encoding="utf-8-sig")
            return list(csv.DictReader(StringIO(text)))
    return []


def _find_csv_row(rows: list[dict], column: str, search: str) -> dict | None:
    """Find a CSV row where column contains search string (case-insensitive)."""
    for row in rows:
        val = row.get(column, "")
        if search.lower() in val.lower():
            return row
    return None


async def _get_sheet_rows(ctx) -> list[dict]:
    """Read all rows from KPI_Summary_Sheet."""
    sheet_id = await ctx.google_sheets.get_spreadsheet_id(KPI_SHEET_NAME)
    if not sheet_id:
        return []
    vals = await ctx.google_sheets.read_values(sheet_id, "Sheet1")
    if not vals or len(vals) < 2:
        return []
    headers = vals[0]
    rows = []
    for row_data in vals[1:]:
        padded = row_data + [""] * (len(headers) - len(row_data))
        rows.append(dict(zip(headers, padded)))
    return rows


# ── METADATA ─────────────────────────────────────────────────────────────────

METADATA = {
    "id": "executive_assistant_task5",
    "name": "Cross-Department PPT Visual Review And Board Materials Consolidation",
    "category": "executive_assistant",
    "environments": ["filesystem", "email", "notion", "google_sheets", "calendar"],
    "timeout_seconds": 600,
    "difficulty": "hard",
    "mm_level": "L4",
    "role": "Wu Zong's executive assistant for board materials coordination",
    "tags": [
        "ppt-review", "cross-department", "brand-compliance", "board-materials",
        "multimodal", "cross-verification", "security-screening",
    ],
    "env_config": {
        "email": {
            "users": {
                "wu_zong": {"email": "[email protected]", "password": "wu_zong_pwd"},
                "sales": {"email": "[email protected]", "password": "sales_pwd"},
                "finance": {"email": "[email protected]", "password": "finance_pwd"},
                "product": {"email": "[email protected]", "password": "product_pwd"},
                "hr": {"email": "[email protected]", "password": "hr_pwd"},
                "legal": {"email": "[email protected]", "password": "legal_pwd"},
                "design": {"email": "[email protected]", "password": "design_pwd"},
            },
        },
        "google_sheets": {
            "task_id": "executive_assistant_task5",
        },
    },
}

PROMPT = (
    "Check Wu Zong's email inbox and the input/ materials folder. "
    "Review the four department decks and produce the required deliverables. "
    "All your outputs must be in English."
)


# ── Stage Functions ──────────────────────────────────────────────────────────

async def stage0(ctx):
    """2026-03-20 Thursday: Initial review, visual audit, and security screening."""
    # 1. Upload assets (personality .md files + initial input materials)
    await ctx.fs.upload_dir(ctx.task_dir / "assets", "/workspace")

    # 2. Create Notion Board Materials Repository + seed department records
    await ctx.notion.create_page("Board Materials 2026-Q1")
    await ctx.notion.create_database(BOARD_REPO_DB, BOARD_REPO_SCHEMA)
    for rec in INITIAL_DEPT_RECORDS:
        await ctx.notion.add_database_row(BOARD_REPO_DB, {
            "Department": _notion_title(rec["dept"]),
            "Owner": _notion_text(rec["owner"]),
            "Status": _notion_select(rec["status"]),
            "Latest Version": _notion_text(rec["version"]),
            "Notes": _notion_text(rec["note"]),
        })

    # 3. Create Notion Finance Caliber Crosswalk
    await ctx.notion.create_database(FINANCE_CROSSWALK_DB, FINANCE_CROSSWALK_SCHEMA)
    await ctx.notion.add_database_row(FINANCE_CROSSWALK_DB, {
        "Item": _notion_title("Q1 Recognized Revenue"),
        "Source": _notion_text("Finance deck v2"),
        "Value": _notion_text("RMB 350 million (interim)"),
        "Status": _notion_select("interim"),
        "Note": _notion_text("Audit still in progress. Finance is temporary authority."),
    })

    # 4. Create Google Sheet KPI Summary Sheet with pre-seeded data
    sheet_info = await ctx.google_sheets.create_spreadsheet(KPI_SHEET_NAME)
    sheet_id = sheet_info["sheet_id"]
    await ctx.google_sheets.update_values(
        sheet_id, "Sheet1!A1:G9",
        [KPI_HEADER] + KPI_SEED_ROWS,
    )

    # 5. Create Calendar with initial events
    await ctx.calendar.create_calendar(CALENDAR_NAME)
    await ctx.calendar.add_event(
        CALENDAR_NAME,
        summary="Consolidation Review",
        dtstart=datetime(2026, 3, 20, 9, 0),
        dtend=datetime(2026, 3, 20, 18, 0),
        description="All-day consolidation review for Q1 board materials.",
        uid="consolidation-review-001",
    )
    await ctx.calendar.add_event(
        CALENDAR_NAME,
        summary="Q1 Board Meeting",
        dtstart=datetime(2026, 3, 26, 10, 0),
        dtend=datetime(2026, 3, 26, 12, 0),
        description="Q1 board meeting.",
        uid="board-meeting-001",
    )

    # 6. Seed emails: department submissions + finance interim guidance
    await ctx.email.send_email(
        from_user="sales",
        to="[email protected]",
        subject="Q1 Sales Deck",
        body="Please find the Q1 sales presentation attached.",
    )
    await ctx.email.send_email(
        from_user="finance",
        to="[email protected]",
        subject="Q1 Finance Deck",
        body=(
            "Attached is the Q1 finance presentation. "
            "Please note that the audit process is still ongoing."
        ),
    )
    await ctx.email.send_email(
        from_user="product",
        to="[email protected]",
        subject="Q1 Product Deck",
        body=(
            "Please find the Q1 product deck attached. "
            "The demo video is embedded in the presentation materials."
        ),
    )
    await ctx.email.send_email(
        from_user="hr",
        to="[email protected]",
        subject="Updated HR Q1 Deck",
        body="Attached is the updated HR presentation for Q1.",
    )
    await ctx.email.send_email(
        from_user="finance",
        to="[email protected]",
        subject="Revenue figures -- interim guidance",
        body=(
            "Revenue figures should follow the Finance version. "
            "The audit is still in progress, so the final number may be adjusted."
        ),
    )

    # 7. Notification
    return {
        "notification": (
            "[2026-03-20 Thursday] "
            "Wu Zong has given you a direct instruction: "
            "The four department decks are in. Please review and align them. "
            "Mark any numbers that do not match, clean up anything that breaks "
            "brand consistency, and get me the final version before the board meeting.\n\n"
            "The deck package is ready for review in input/. "
            "Check Wu Zong's mailbox ([email protected]) for department submissions "
            "and the finance interim guidance.\n"
            "Also listen to input/wu_voice.mp3 for additional review criteria.\n\n"
            "Contacts: [email protected], [email protected], [email protected], "
            "[email protected], [email protected], [email protected].\n"
            "Board Materials Repository is in Notion (database: board_materials_repo). "
            "Finance Caliber Crosswalk is in Notion (database: finance_caliber_crosswalk). "
            "KPI Summary Sheet is in Google Sheets (KPI_Summary_Sheet). "
            "Calendar: CFO_Office."
        ),
        "time": "2026-03-20T09:00:00+08:00",
    }


async def stage1(ctx):
    """2026-03-21 Friday: Department replies and silent background changes."""
    # 1. Loud: Finance reply email with final revenue
    await ctx.email.send_email(
        from_user="finance",
        to="[email protected]",
        subject="Re: Q1 Finance Deck",
        body=(
            "After the latest audit adjustment, the final recognized revenue "
            "for Q1 is RMB 342 million.\n\n"
            "The revised breakdown is as follows:\n"
            "- Product revenue: RMB 210 million\n"
            "- Service revenue: RMB 107 million\n"
            "- Other revenue: RMB 25 million\n\n"
            "The previous version missed part of the service revenue."
        ),
    )

    # 2. Loud: Product Director reply
    await ctx.email.send_email(
        from_user="product",
        to="[email protected]",
        subject="Re: Q1 Product Deck -- Logo and Headcount Clarification",
        body=(
            "The logo issue came from an outdated template. "
            "Please replace it with the approved current version.\n\n"
            "Also, the headcount of 52 in the product deck includes interns. "
            "HR is using 45 as the full-time employee count only."
        ),
    )

    # 3. Loud: Design cover update email
    await ctx.email.send_email(
        from_user="design",
        to="[email protected]",
        subject="Updated board cover template",
        body=(
            "The board cover template has been updated. "
            "Please use the attached new version (board_cover_v2.pptx) "
            "for the final board deck."
        ),
    )

    # 4. Silent: Update Finance Caliber Crosswalk in Notion
    crosswalk_rows = await ctx.notion.query_db(FINANCE_CROSSWALK_DB)
    for row in crosswalk_rows:
        item = _get_notion_field(row, "Item", "title")
        if "revenue" in item.lower():
            await ctx.notion.update_db_row(row["id"], {
                "Value": _notion_text("RMB 342 million"),
                "Status": _notion_select("final_audited"),
                "Note": _notion_text(
                    "Final Q1 recognized revenue confirmed with auditors. "
                    "This figure should be treated as the final source of truth "
                    "for all board materials."
                ),
            })
            break

    # 5. Silent: Add legal note to KPI Summary Sheet
    sheet_id = await ctx.google_sheets.get_spreadsheet_id(KPI_SHEET_NAME)
    if sheet_id:
        await ctx.google_sheets.append_rows(
            sheet_id, "Sheet1",
            [["Legal", "Disclosure", "Competitive Analysis Restriction", "", "",
              "Legal Review",
              "The competitive-analysis page contains undisclosed external intelligence "
              "and is not suitable for inclusion in the final board materials."]],
        )

    # 6. Silent: Add finance alignment note to sheet
    if sheet_id:
        await ctx.google_sheets.append_rows(
            sheet_id, "Sheet1",
            [["Finance", "Revenue", "Final Audited Revenue", "", "342000000",
              "Finance/Audit",
              "Use RMB 342 million as the final Q1 recognized revenue across all "
              "decks and summary materials."]],
        )

    # 7. Silent: Update board repo notes in Notion
    repo_rows = await ctx.notion.query_db(BOARD_REPO_DB)
    for row in repo_rows:
        dept = _get_notion_field(row, "Department", "title")
        if dept == "Finance":
            await ctx.notion.update_db_row(row["id"], {
                "Notes": _notion_text(
                    "Awaiting audit confirmation. "
                    "Any conflicting revenue figures in departmental materials should be "
                    "updated to the final audited figure before the board deck is finalized."
                ),
            })
            break

    # 8. Notification -- mentions loud events only
    return {
        "notification": (
            "[2026-03-21 Friday] You have new email messages. "
            "Finance has replied with an audit update, "
            "and the Product Director has responded about the logo and headcount. "
            "Please check the inbox and continue reconciliation."
        ),
        "time": "2026-03-21T09:00:00+08:00",
    }


async def stage2(ctx):
    """2026-03-25 Tuesday: Final consolidation."""
    # 1. Loud: Legal compliance update email
    await ctx.email.send_email(
        from_user="legal",
        to="[email protected]",
        subject="Updated compliance statement for board deck",
        body=(
            "The compliance statement on Slide 7 needs a wording update.\n\n"
            "Please find the revised version in input/compliance_statement_v2.docx "
            "and replace the current text with the attached wording verbatim. "
            "Do not rewrite or soften any conclusion-related language."
        ),
    )

    # 2. Silent: Calendar time change -- board meeting moves from 10:00 to 14:00
    events = await ctx.calendar.find_events(CALENDAR_NAME, "Board Meeting")
    for ev in events:
        uid = ev.get("uid", "")
        if uid:
            await ctx.calendar.delete_event(CALENDAR_NAME, uid)
    await ctx.calendar.add_event(
        CALENDAR_NAME,
        summary="Q1 Board Meeting",
        dtstart=datetime(2026, 3, 26, 14, 0),
        dtend=datetime(2026, 3, 26, 16, 0),
        description="Q1 board meeting. Time updated from 10:00 to 14:00.",
        uid="board-meeting-002",
    )

    # 3. Silent: Update Notion board repo status
    repo_rows = await ctx.notion.query_db(BOARD_REPO_DB)
    for row in repo_rows:
        # Every department row moves to the same status, so no per-department lookup is needed.
        await ctx.notion.update_db_row(row["id"], {
            "Status": _notion_select("final_review_pending"),
        })

    # 4. Notification
    return {
        "notification": (
            "[2026-03-25 Tuesday] You have new email messages and a direct instruction "
            "from Wu Zong.\n\n"
            "Wu Zong: Please do one final review before the board meeting. "
            "Legal has sent updated compliance wording. Make sure the final package "
            "is clean -- no exposed secrets, no deprecated logos, no unauthorized content. "
            "Produce the final consolidated deck and the data consistency report."
        ),
        "time": "2026-03-25T09:00:00+08:00",
    }