task_summary.txt

HR · task3

Investigate three interview-compliance incidents and coordinate with Legal, managers, and candidates over the week. Mon 3/25: review recordings, scorecards, and policy; classify violations; alert the HR Manager. Tue 3/26: answer Candidate A's complaint, send Legal the fertility-question evidence, and coach the Interview C interviewer. Fri 3/29: file the weekly report and follow up on open items.

Model Runs

5 models evaluated on this task, 3 independent runs each.

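| Model | Provider | Score (Avg@3) | Run 1 | Run 2 | Run 3 |
|---|---|---|---|---|---|
| Qwen3.6 Plus | Alibaba | 57.5% | 38.7% | 45.2% | 88.7% |
| GPT-5.4 | OpenAI | 37.6% | 38.7% | 38.7% | 35.5% |
| Claude Sonnet 4.6 | Anthropic | 36.6% | 38.7% | 32.3% | 38.7% |
| Gemini 3.1 Pro Preview | Google | 27.9% | 43.5% | 0.0% | 40.3% |
| MiniMax M2.7 | MiniMax | 25.8% | 24.2% | 35.5% | 17.7% |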
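
The page does not define Avg@3, but the figures are consistent with a plain mean of the three run scores rounded to one decimal place. A minimal sketch under that assumption (the function name `avg_at_3` is illustrative, not part of the benchmark):

```python
# Per-run scores (percent) copied from the Model Runs table.
runs = {
    "Qwen3.6 Plus": [38.7, 45.2, 88.7],
    "GPT-5.4": [38.7, 38.7, 35.5],
    "Claude Sonnet 4.6": [38.7, 32.3, 38.7],
    "Gemini 3.1 Pro Preview": [43.5, 0.0, 40.3],
    "MiniMax M2.7": [24.2, 35.5, 17.7],
}


def avg_at_3(scores):
    """Assumed definition: mean of the run scores, rounded to one decimal."""
    return round(sum(scores) / len(scores), 1)


averages = {model: avg_at_3(scores) for model, scores in runs.items()}
for model, avg in averages.items():
    print(f"{model}: {avg}%")
```

The spread across runs matters more than the mean here: Qwen3.6 Plus leads mainly because of one 88.7% run.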
Input Files (10)

  • complaint_email_screenshot.png
  • interview_A_transcript.txt
  • interview_B_transcript.txt
  • interview_C_transcript.txt
  • interview_policy.pdf
  • interview_schedule.csv
  • score_sheet.csv
  • score_sheet.xlsx
  • scoring_system_log.txt
  • weekly_interviews_all.csv
IDENTITY.md

IDENTITY

You are Zhou Ting (周婷), HR Operations Specialist at Xinghai Technology, reporting to HR Manager Wu Lei and working closely with Legal Counsel Chen Lvshi.

Role Overview

  • Position: HR Operations Specialist
  • Department: Human Resources
  • Reporting Line: HR Manager Wu Lei ([email protected])
  • Key Collaborations: Legal Counsel Chen Lvshi ([email protected])

Primary Responsibilities

  • Monitor interview recordings, score sheets, and process compliance
  • Identify interviewer violations, scoring errors, and process deviations
  • Handle candidate complaints and coordinate legal risk assessments
  • Update ATS exception tracking and sync with HR Manager and Legal
  • Generate weekly interview quality reports

Core Expertise

  • Interview process compliance and company policy interpretation
  • Multi-modal evidence analysis (video transcripts, score sheets, policy documents, screenshots)
  • Risk assessment and violation categorization
  • Stakeholder communication across candidates, interviewers, HR, and legal teams
AGENTS.md

AGENTS - Output Specifications

Output File Specifications

All structured outputs go to /workspace/outputs/.

interview_exception_triage.csv (Stage 0)

Initial exception assessment. One row per interview.

| Column | Type | Allowed Values |
|---|---|---|
| interview_id | string | INT_2024_03_25_A, INT_2024_03_25_B, INT_2024_03_25_C |
| candidate | string | candidate name |
| violation_type | enum | score_conflict, fertility, process_deviation, discrimination, none |
| risk_level | enum | high, medium, low |
| legal_escalation | enum | yes, no |
| note | string | brief description of the finding |
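
The triage spec above can be checked mechanically before the file is written. The following is a sketch only: `validate_row` and `render_csv` are hypothetical helper names, and the example row uses placeholder text for the candidate and note rather than real findings.

```python
import csv
import io

# Column order and enum values copied from the triage spec.
COLUMNS = ["interview_id", "candidate", "violation_type",
           "risk_level", "legal_escalation", "note"]
ALLOWED = {
    "interview_id": {"INT_2024_03_25_A", "INT_2024_03_25_B",
                     "INT_2024_03_25_C"},
    "violation_type": {"score_conflict", "fertility", "process_deviation",
                       "discrimination", "none"},
    "risk_level": {"high", "medium", "low"},
    "legal_escalation": {"yes", "no"},
}


def validate_row(row):
    """Return a list of problems; an empty list means the row fits the spec."""
    problems = [f"missing column: {c}" for c in COLUMNS if c not in row]
    for col, allowed in ALLOWED.items():
        if col in row and row[col] not in allowed:
            problems.append(f"{col}={row[col]!r} is not an allowed value")
    return problems


def render_csv(rows):
    """Validate every row, then return the CSV text with a header line."""
    for row in rows:
        problems = validate_row(row)
        if problems:
            raise ValueError("; ".join(problems))
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Placeholder row: candidate and note are illustrative text only.
example = {
    "interview_id": "INT_2024_03_25_B",
    "candidate": "<candidate name>",
    "violation_type": "fertility",
    "risk_level": "high",
    "legal_escalation": "yes",
    "note": "<brief description of the finding>",
}
print(render_csv([example]))
```

Rejecting unknown enum values early avoids relying on the checker's free-text normalization to rescue a mislabeled row.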

weekly_summary.csv (Stage 2)

Weekly metrics report. One row per metric.

| Column | Type | Description |
|---|---|---|
| metric | string | metric name (e.g., total_interviews, exception_count, exception_rate_pct) |
| value | string | metric value |

Required metrics:

  • total_interviews: total interviews this week (23)
  • exception_count: number of exceptions (3)
  • exception_rate_pct: exception rate as percentage (13.0)
  • score_conflict_count: count of score conflict violations
  • fertility_count: count of fertility question violations
  • process_deviation_count: count of process deviation violations
  • corrected_count: interviews with status=corrected
  • legal_pending_count: interviews with status=legal_pending
  • closed_count: interviews with status=closed
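
The required metrics follow from the three exception records and the weekly total; the rate is 3 / 23 × 100 ≈ 13.0. A sketch of that derivation, assuming the Stage 2 statuses from the Status Workflow (A and C closed, B legal_pending) and one violation of each type; `build_summary` is a hypothetical helper:

```python
from collections import Counter

TOTAL_INTERVIEWS = 23  # stated in the spec for weekly_interviews_all.csv

# Assumed end-of-week state, taken from the Stage 2 status workflow.
records = [
    {"id": "INT_2024_03_25_A", "violation_type": "score_conflict",
     "status": "closed"},
    {"id": "INT_2024_03_25_B", "violation_type": "fertility",
     "status": "legal_pending"},
    {"id": "INT_2024_03_25_C", "violation_type": "process_deviation",
     "status": "closed"},
]


def build_summary(records, total):
    """Return (metric, value) pairs in the order the spec lists them."""
    by_type = Counter(r["violation_type"] for r in records)
    by_status = Counter(r["status"] for r in records)
    count = len(records)
    return [
        ("total_interviews", str(total)),
        ("exception_count", str(count)),
        ("exception_rate_pct", str(round(count / total * 100, 1))),
        ("score_conflict_count", str(by_type["score_conflict"])),
        ("fertility_count", str(by_type["fertility"])),
        ("process_deviation_count", str(by_type["process_deviation"])),
        ("corrected_count", str(by_status["corrected"])),
        ("legal_pending_count", str(by_status["legal_pending"])),
        ("closed_count", str(by_status["closed"])),
    ]


summary = build_summary(records, TOTAL_INTERVIEWS)
for metric, value in summary:
    print(f"{metric},{value}")
```

Under these assumptions corrected_count is 0 at week's end, because Interview A has already moved from corrected to closed.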

ATS (Notion) Update Requirements

Database: interview_exception_2024

Fields

| Field | Type | Allowed Values |
|---|---|---|
| Interview ID | title | INT_2024_03_25_A, INT_2024_03_25_B, INT_2024_03_25_C |
| Candidate | text | candidate name |
| Interviewer | text | interviewer name |
| Violation Type | select | score_conflict, fertility, process_deviation, discrimination, none |
| Risk Level | select | high, medium, low |
| Status | select | open, investigating, corrected, coached, legal_review, legal_pending, closed |
| Legal Escalation Required | select | yes, no |
| Root Cause | text | investigation findings |
| Notes | text | additional notes, timestamps, evidence |

Status Workflow

  • Stage 0: open (initial triage)
  • Stage 1: A → corrected, B → legal_review, C → coached
  • Stage 2: A → closed, B → legal_pending, C → closed
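
The workflow above amounts to one fixed status path per interview. A small sketch of that lookup; the names `WORKFLOW`, `status_at`, and `next_status` are illustrative and not part of the ATS:

```python
# Expected status per interview at Stage 0, 1, and 2, from the workflow list.
WORKFLOW = {
    "A": ["open", "corrected", "closed"],
    "B": ["open", "legal_review", "legal_pending"],
    "C": ["open", "coached", "closed"],
}


def status_at(interview, stage):
    """Status the record should hold once the given stage is complete."""
    return WORKFLOW[interview][stage]


def next_status(interview, current):
    """Next status on the interview's path, or None at the end of the path."""
    path = WORKFLOW[interview]
    position = path.index(current)
    return path[position + 1] if position + 1 < len(path) else None


for key in WORKFLOW:
    print(key, [status_at(key, stage) for stage in range(3)])
```

Interview B never reaches closed on this path, which matches the rule that it stays open until Legal responds.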

Email Communication Standards

Subject Format

[Interview Exception] {Brief Description} - {Interview ID}
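
A sketch of a helper that fills in this subject format; `exception_subject` is a hypothetical name and the description text is only an example:

```python
def exception_subject(description: str, interview_id: str) -> str:
    """Build a subject line in the '[Interview Exception] ... - ID' format."""
    return f"[Interview Exception] {description.strip()} - {interview_id}"


subject = exception_subject("Score Conflict", "INT_2024_03_25_A")
print(subject)
```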

Templates

  • HR Manager alerts: Concise summary + triage list + key evidence timestamps
  • Candidate responses: Apologetic, fact-based, legally safe; no admission of subjective fault
  • Legal coordination: Structured evidence package with specific questions and policy references
  • Interviewer coaching: Constructive feedback with policy section citations
SOUL.md

SOUL

Core Principles

Compliance-First Mindset

  • Always reference official policies before making determinations
  • Proactively identify legal and reputational risks
  • Rely on concrete evidence rather than assumptions

Empathetic Communication

  • Balance company interests with fair candidate treatment
  • Provide clear information while avoiding premature conclusions
  • Maintain professional, respectful tone with all stakeholders

Analytical Precision

  • Examine all available evidence before drawing conclusions
  • Cross-reference findings across multiple sources and modalities
  • Look beyond surface symptoms to identify root causes

Proactive Initiative

  • Consider implications and next steps before they become urgent
  • Track and follow up on all open items
  • Suggest process improvements based on identified patterns

Behavioral Guidelines

  1. Verify policy compliance before making any determinations
  2. Document all findings with evidence and timestamps
  3. Escalate high-risk violations promptly to legal counsel
  4. Maintain candidate confidentiality with internal stakeholders
  5. Use systematic investigation methods β€” do not jump to conclusions
  6. Balance thoroughness with efficiency
  7. Communicate findings clearly to all audiences
  8. Follow up on all commitments and open items
TOOLS.md

TOOLS

Available Environments

Email (Mock Email MCP)

| Account | Email | Role |
|---|---|---|
| Your account | [email protected] | HR Operations Specialist |
| HR Manager | [email protected] | Wu Lei |
| Legal Counsel | [email protected] | Chen Lvshi |
| Candidate A | [email protected] | For complaint resolution |
| Interviewer C | [email protected] | Wang Engineer (for coaching) |

ATS β€” Notion (Mock Notion MCP)

  • Database: interview_exception_2024
  • Operations: Create records, update status, add notes, track legal escalation

File System

  • input/: read-only materials (transcripts, score sheets, policy PDF, complaint screenshot, interview schedule)
  • outputs/: your deliverables (CSVs, reports, evidence packages)

Input Materials in input/

| File | Description |
|---|---|
| interview_A_transcript.txt | Interview A full transcript |
| interview_B_transcript.txt | Interview B full transcript - contains policy violation |
| interview_C_transcript.txt | Interview C full transcript - shows overtime |
| score_sheet.csv | Interview scores (also available as score_sheet.xlsx) |
| scoring_system_log.txt | Scoring system audit log - shows score entry timestamps and overrides |
| interview_schedule.csv | Schedule with durations and consent flags |
| weekly_interviews_all.csv | Full week's interview log (23 interviews, 3 flagged + 20 normal) |
| interview_policy.pdf | Company interview policy (§2.3, §4.1, §5.1, §5.2) |
| complaint_email_screenshot.png | Candidate A's complaint email screenshot |
USER.md

USER

Your Manager: HR Manager Wu Lei (吴磊)

Communication Style

  • Results-oriented: focuses on actionable outcomes and clear next steps
  • Risk-aware: prioritizes legal compliance and company reputation
  • Prefers brief status updates with clear escalation indicators

Authorization Boundaries

  • Full authority: compliance monitoring, initial assessments, interviewer coaching
  • Escalation required: high-risk violations (discrimination, forbidden questions) → Legal
  • Limited authority: cannot make final decisions on candidate compensation or legal settlements

Current Priorities

  • Ensure interview compliance with company policies and legal requirements
  • Maintain positive candidate experience while protecting company interests
  • Strengthen HR-Legal coordination for risk mitigation

Legal Counsel: Chen Lvshi (陈律师)

Working Relationship

  • Regular coordination on compliance matters and risk assessments
  • Expects structured evidence packages with specific legal questions
  • 24-hour response expectation for high-priority legal matters
  • All consultations must include supporting documentation

Legal Guidance Areas

  • Discrimination and harassment assessment
  • Candidate complaint risk evaluation and response strategy
  • Policy compliance interpretation
  • Documentation standards for potential legal proceedings
task_checker.py
# ── Checker Functions ─────────────────────────────────────────────

# ---------- S0: Initial Investigation & Triage ----------


async def _s0_triage_csv_has_3_rows(ctx):
    """interview_exception_triage.csv has 3 rows covering interviews A, B, C"""
    rows = _read_csv(ctx, "interview_exception_triage.csv")
    if len(rows) < 3:
        return False
    ids_found = set()
    for r in rows:
        iid = r.get("interview_id", "").upper()
        for suffix in ("_A", "_B", "_C"):
            if suffix in iid:
                ids_found.add(suffix)
    return len(ids_found) >= 3


async def _s0_violation_a_score_conflict(ctx):
    """Triage row A: normalized violation_type=score_conflict, risk_level=medium or high"""
    rows = _read_csv(ctx, "interview_exception_triage.csv")
    row = _find_csv_row(rows, "interview_id", "_A")
    if not row:
        return False
    canonical = _normalize_violation(row.get("violation_type", ""))
    if canonical != "score_conflict":
        return False
    # Accept both medium (pure score mismatch) and high (complaint elevates risk)
    return row.get("risk_level", "").lower() in ("medium", "high")


async def _s0_violation_b_fertility(ctx):
    """Triage row B: normalized violation_type=fertility, risk=high, legal_escalation=yes"""
    rows = _read_csv(ctx, "interview_exception_triage.csv")
    row = _find_csv_row(rows, "interview_id", "_B")
    if not row:
        return False
    canonical = _normalize_violation(row.get("violation_type", ""))
    if canonical != "fertility":
        return False
    if row.get("risk_level", "").lower() != "high":
        return False
    legal = row.get("legal_escalation", "").lower()
    return legal in ("yes", "true")


async def _s0_violation_c_process(ctx):
    """Triage row C: normalized violation_type=process_deviation"""
    rows = _read_csv(ctx, "interview_exception_triage.csv")
    row = _find_csv_row(rows, "interview_id", "_C")
    if not row:
        return False
    canonical = _normalize_violation(row.get("violation_type", ""))
    return canonical == "process_deviation"


async def _s0_ats_3_records_created(ctx):
    """Notion ATS has at least 3 exception records"""
    rows = await ctx.notion.query_db(EXCEPTION_DB_NAME)
    return len(rows) >= 3


async def _s0_ats_b_legal_escalation_only(ctx):
    """Notion: only Interview B has Legal Escalation Required=yes"""
    rows = await ctx.notion.query_db(EXCEPTION_DB_NAME)
    b_escalated = False
    others_escalated = False
    for row in rows:
        iid = _get_notion_field(row, "Interview ID", "title").upper()
        legal = _get_notion_field(
            row, "Legal Escalation Required", "select",
        ).lower()
        if "_B" in iid:
            b_escalated = legal == "yes"
        elif "_A" in iid or "_C" in iid:
            if legal == "yes":
                others_escalated = True
    return b_escalated and not others_escalated


async def _s0_manager_alert_sent(ctx):
    """Wu Lei received ≥1 email from agent (S0 alert)"""
    emails = await ctx.email.get_emails("wulei")
    return len(emails) >= 1


async def _s0_csv_notion_consistency(ctx):
    """Cross-env: normalized triage CSV violation types match Notion for ≥2 of 3"""
    csv_rows = _read_csv(ctx, "interview_exception_triage.csv")
    if len(csv_rows) < 3:
        return False
    notion_rows = await ctx.notion.query_db(EXCEPTION_DB_NAME)
    if len(notion_rows) < 3:
        return False
    # Build map: interview suffix → normalized Notion violation type
    notion_map = {}
    for row in notion_rows:
        iid = _get_notion_field(row, "Interview ID", "title").upper()
        vtype = _get_notion_field(row, "Violation Type", "select")
        for suffix in ("_A", "_B", "_C"):
            if suffix in iid:
                notion_map[suffix] = _normalize_violation(vtype)
    # Check CSV rows: normalize and compare
    matches = 0
    for r in csv_rows:
        iid = r.get("interview_id", "").upper()
        csv_canonical = _normalize_violation(r.get("violation_type", ""))
        for suffix in ("_A", "_B", "_C"):
            if suffix in iid and suffix in notion_map:
                if csv_canonical and csv_canonical == notion_map[suffix]:
                    matches += 1
    return matches >= 2


# ---------- S1: Complaint Response & Coordination ----------


async def _s1_ats_a_root_cause_entry_error(ctx):
    """Notion row A: root_cause has cause token + evidence token, status=corrected"""
    row = await _find_notion_row(ctx, "INT_2024_03_25_A")
    if not row:
        return False
    root_cause = _get_notion_field(row, "Root Cause", "rich_text").lower()
    if len(root_cause) < 10:
        return False
    # Require one cause-category token
    cause_tokens = (
        "entry", "error", "input", "typo", "录入", "data",
        "mistake", "bug", "system", "incorrect",
    )
    has_cause = any(kw in root_cause for kw in cause_tokens)
    # Require one evidence token referencing the score anomaly
    evidence_tokens = (
        "override", "18", "1", "auto", "manual", "手动",
        "改写", "score", "total", "mismatch", "conflict",
    )
    has_evidence = any(kw in root_cause for kw in evidence_tokens)
    if not (has_cause and has_evidence):
        return False
    status = _get_notion_field(row, "Status", "select").lower()
    return status in ("corrected", "closed")


async def _s1_candidate_response_sent(ctx):
    """Candidate A received ≥1 email (investigation response)"""
    emails = await ctx.email.get_emails("candidate_a")
    return len(emails) >= 1


async def _s1_legal_packet_sent(ctx):
    """Legal Counsel received ≥1 email mentioning Interview B"""
    emails = await ctx.email.get_emails("chen_lvshi")
    if len(emails) < 1:
        return False
    # At least one email should reference Interview B or fertility
    for e in emails:
        content = (e.get("subject", "") + " " + e.get("body", "")).lower()
        if any(kw in content for kw in (
            "_b", "interview b", "fertility", "forbidden",
            "李总监", "§2.3", "08:23",
        )):
            return True
    return False


async def _s1_coaching_email_sent(ctx):
    """Interview C interviewer (Wang Engineer) received ≥1 coaching email"""
    emails = await ctx.email.get_emails("wang_engineer")
    return len(emails) >= 1


async def _s1_ats_statuses_updated(ctx):
    """Notion: A=corrected, B=legal_review, C=coached"""
    rows = await ctx.notion.query_db(EXCEPTION_DB_NAME)
    statuses = {}
    for row in rows:
        iid = _get_notion_field(row, "Interview ID", "title").upper()
        status = _get_notion_field(row, "Status", "select").lower()
        if "_A" in iid:
            statuses["A"] = status
        elif "_B" in iid:
            statuses["B"] = status
        elif "_C" in iid:
            statuses["C"] = status
    if len(statuses) < 3:
        return False
    a_ok = statuses.get("A", "") in ("corrected", "closed")
    b_ok = statuses.get("B", "") in ("legal_review", "legal_pending",
                                      "investigating")
    c_ok = statuses.get("C", "") in ("coached", "closed", "corrected")
    return a_ok and b_ok and c_ok


async def _s1_no_discrimination_label_a(ctx):
    """REVERSE: Interview A must NOT be labelled discrimination (it is a data entry error)"""
    row = await _find_notion_row(ctx, "INT_2024_03_25_A")
    if not row:
        return False  # agent must at least create the record
    vtype = _get_notion_field(row, "Violation Type", "select").lower()
    return vtype != "discrimination"


async def _s1_manager_cc_on_response(ctx):
    """Wu Lei received ≥2 emails (S0 alert + S1 update / CC on candidate reply)"""
    emails = await ctx.email.get_emails("wulei")
    return len(emails) >= 2


# ---------- S2: Weekly Summary & Follow-up ----------


async def _s2_weekly_summary_csv(ctx):
    """weekly_summary.csv has required metrics with plausible values"""
    rows = _read_csv(ctx, "weekly_summary.csv")
    if not rows:
        return False
    metrics = {}
    for r in rows:
        key = r.get("metric", "").lower().replace(" ", "_")
        metrics[key] = r.get("value", "")
    # Require at least 4 distinct metric rows
    if len(metrics) < 4:
        return False
    # Must have total interviews
    has_total = any("total" in k and "interview" in k for k in metrics)
    if not has_total:
        return False
    # Must have exception count or rate
    has_exception = any("exception" in k for k in metrics)
    has_rate = any("rate" in k for k in metrics)
    if not (has_exception or has_rate):
        return False
    # Cross-verify total interviews == 23 (grounded by weekly_interviews_all.csv)
    for k, v in metrics.items():
        if "total" in k and "interview" in k:
            try:
                total = int(v)
                if total != 23:
                    return False
            except ValueError:
                return False
    # Cross-verify exception rate (should be ~13%, accept 5%-25%)
    for k, v in metrics.items():
        if "rate" in k:
            try:
                rate = float(v.replace("%", ""))
                if rate < 5 or rate > 25:
                    return False
            except ValueError:
                pass
    # Cross-verify against Notion: exception count must equal the number of exception records
    notion_rows = await ctx.notion.query_db(EXCEPTION_DB_NAME)
    notion_count = len(notion_rows)
    for k, v in metrics.items():
        if "exception" in k and "count" in k:
            try:
                csv_count = int(v)
                if csv_count != notion_count:
                    return False
            except ValueError:
                pass
    return True


async def _s2_report_email_sent(ctx):
    """Wu Lei received ≥3 emails (S0 alert + S1 CC + S2 weekly report)"""
    emails = await ctx.email.get_emails("wulei")
    return len(emails) >= 3


async def _s2_ats_final_statuses(ctx):
    """Notion: A=closed, B=legal_pending (not closed), C=closed"""
    rows = await ctx.notion.query_db(EXCEPTION_DB_NAME)
    statuses = {}
    for row in rows:
        iid = _get_notion_field(row, "Interview ID", "title").upper()
        status = _get_notion_field(row, "Status", "select").lower()
        if "_A" in iid:
            statuses["A"] = status
        elif "_B" in iid:
            statuses["B"] = status
        elif "_C" in iid:
            statuses["C"] = status
    a_ok = statuses.get("A", "") == "closed"
    b_ok = statuses.get("B", "") in ("legal_pending", "legal_review")
    c_ok = statuses.get("C", "") == "closed"
    return a_ok and b_ok and c_ok


async def _s2_legal_followup_sent(ctx):
    """Legal Counsel received ≥2 emails; at least one contains a follow-up keyword"""
    emails = await ctx.email.get_emails("chen_lvshi")
    if len(emails) < 2:
        return False
    # At least one email should be a follow-up/reminder
    for e in emails:
        content = (e.get("subject", "") + " " + e.get("body", "")).lower()
        if any(kw in content for kw in (
            "follow", "reminder", "status", "update", "pending",
            "weekly", "report",
        )):
            return True
    return False


async def _s2_candidate_followup_sent(ctx):
    """Candidate A received ≥2 emails (S1 response + S2 satisfaction follow-up)"""
    emails = await ctx.email.get_emails("candidate_a")
    return len(emails) >= 2


async def _s2_no_self_closed_b(ctx):
    """REVERSE: Interview B must NOT be closed/corrected - still awaiting legal"""
    row = await _find_notion_row(ctx, "INT_2024_03_25_B")
    if not row:
        return False  # agent must at least have the record
    status = _get_notion_field(row, "Status", "select").lower()
    return status not in ("closed", "corrected", "coached")


# ── RUBRIC ────────────────────────────────────────────────────────

RUBRIC = {
    "stage0": [
        {"id": "S0_triage_csv_has_3_rows",
         "checker": _s0_triage_csv_has_3_rows, "weight": 1.0},
        {"id": "S0_violation_a_score_conflict",
         "checker": _s0_violation_a_score_conflict, "weight": 1.5},
        {"id": "S0_violation_b_fertility",
         "checker": _s0_violation_b_fertility, "weight": 2.0},
        {"id": "S0_violation_c_process",
         "checker": _s0_violation_c_process, "weight": 1.0},
        {"id": "S0_ats_3_records_created",
         "checker": _s0_ats_3_records_created, "weight": 1.0},
        {"id": "S0_ats_b_legal_escalation_only",
         "checker": _s0_ats_b_legal_escalation_only, "weight": 2.0},
        {"id": "S0_manager_alert_sent",
         "checker": _s0_manager_alert_sent, "weight": 1.0},
        {"id": "S0_csv_notion_consistency",
         "checker": _s0_csv_notion_consistency, "weight": 1.5},
    ],
    "stage1": [
        {"id": "S1_ats_a_root_cause_entry_error",
         "checker": _s1_ats_a_root_cause_entry_error, "weight": 2.0},
        {"id": "S1_candidate_response_sent",
         "checker": _s1_candidate_response_sent, "weight": 1.0},
        {"id": "S1_legal_packet_sent",
         "checker": _s1_legal_packet_sent, "weight": 1.5},
        {"id": "S1_coaching_email_sent",
         "checker": _s1_coaching_email_sent, "weight": 1.0},
        {"id": "S1_ats_statuses_updated",
         "checker": _s1_ats_statuses_updated, "weight": 1.5},
        {"id": "S1_no_discrimination_label_a",
         "checker": _s1_no_discrimination_label_a, "weight": 2.0},
        {"id": "S1_manager_cc_on_response",
         "checker": _s1_manager_cc_on_response, "weight": 1.0},
    ],
    "stage2": [
        {"id": "S2_weekly_summary_csv",
         "checker": _s2_weekly_summary_csv, "weight": 1.5},
        {"id": "S2_report_email_sent",
         "checker": _s2_report_email_sent, "weight": 1.0},
        {"id": "S2_ats_final_statuses",
         "checker": _s2_ats_final_statuses, "weight": 1.5},
        {"id": "S2_legal_followup_sent",
         "checker": _s2_legal_followup_sent, "weight": 2.0},
        {"id": "S2_candidate_followup_sent",
         "checker": _s2_candidate_followup_sent, "weight": 2.0},
    ],
    "final": [
        {"id": "S2_no_self_closed_b",
         "checker": _s2_no_self_closed_b, "weight": 2.0},
    ],
}
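
The page does not show how the harness turns RUBRIC results into a run score. One plausible reading, offered here as an assumption, is the weighted fraction of checkers that pass: the weights above sum to 31, and the run scores reported on this page are consistent with that (for example, 38.7% is 12 of 31 weight points). In the sketch below, `weighted_score` and the set of passed checkers are hypothetical.

```python
# Checker weights copied from RUBRIC (stage0, stage1, stage2, final).
WEIGHTS = {
    "S0_triage_csv_has_3_rows": 1.0,
    "S0_violation_a_score_conflict": 1.5,
    "S0_violation_b_fertility": 2.0,
    "S0_violation_c_process": 1.0,
    "S0_ats_3_records_created": 1.0,
    "S0_ats_b_legal_escalation_only": 2.0,
    "S0_manager_alert_sent": 1.0,
    "S0_csv_notion_consistency": 1.5,
    "S1_ats_a_root_cause_entry_error": 2.0,
    "S1_candidate_response_sent": 1.0,
    "S1_legal_packet_sent": 1.5,
    "S1_coaching_email_sent": 1.0,
    "S1_ats_statuses_updated": 1.5,
    "S1_no_discrimination_label_a": 2.0,
    "S1_manager_cc_on_response": 1.0,
    "S2_weekly_summary_csv": 1.5,
    "S2_report_email_sent": 1.0,
    "S2_ats_final_statuses": 1.5,
    "S2_legal_followup_sent": 2.0,
    "S2_candidate_followup_sent": 2.0,
    "S2_no_self_closed_b": 2.0,
}
TOTAL_WEIGHT = sum(WEIGHTS.values())


def weighted_score(passed_ids):
    """Assumed formula: passed weight / total weight, as a percentage."""
    passed = sum(WEIGHTS[cid] for cid in passed_ids)
    return round(passed / TOTAL_WEIGHT * 100, 1)


# Hypothetical run: every Stage 0 checker passes, plus one Stage 1 checker.
hypothetical_pass = {cid for cid in WEIGHTS if cid.startswith("S0_")}
hypothetical_pass.add("S1_candidate_response_sent")
print(TOTAL_WEIGHT, weighted_score(hypothetical_pass))
```

If this reading is right, Stage 0 alone is worth 11 of 31 points, so a run that triages correctly and then stalls still lands near 35%.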
task_progress.py
"""Interview compliance violation & resolution - multi-environment multi-stage task.

Environments: filesystem, email, notion
3 stages: investigation & triage → complaint response & coordination → weekly summary & follow-up
21 core checkers (0 keyword-search)
"""

import csv
from io import StringIO

# ── Constants ─────────────────────────────────────────────────────

EXCEPTION_DB_NAME = "interview_exception_2024"

EXCEPTION_DB_SCHEMA = {
    "Interview ID": {"title": {}},
    "Candidate": {"rich_text": {}},
    "Interviewer": {"rich_text": {}},
    "Violation Type": {"select": {"options": [
        {"name": "score_conflict"}, {"name": "fertility"},
        {"name": "process_deviation"}, {"name": "discrimination"},
        {"name": "harassment"}, {"name": "none"},
    ]}},
    "Risk Level": {"select": {"options": [
        {"name": "high"}, {"name": "medium"}, {"name": "low"},
    ]}},
    "Status": {"select": {"options": [
        {"name": "open"}, {"name": "investigating"},
        {"name": "corrected"}, {"name": "coached"},
        {"name": "legal_review"}, {"name": "legal_pending"},
        {"name": "closed"},
    ]}},
    "Legal Escalation Required": {"select": {"options": [
        {"name": "yes"}, {"name": "no"},
    ]}},
    "Root Cause": {"rich_text": {}},
    "Notes": {"rich_text": {}},
}


# ── Helpers ───────────────────────────────────────────────────────


def _notion_title(value: str) -> dict:
    return {"title": [{"text": {"content": value}}]}


def _notion_text(value: str) -> dict:
    return {"rich_text": [{"text": {"content": value}}]}


def _notion_select(value: str) -> dict:
    return {"select": {"name": value}}


def _read_csv(ctx, filename: str) -> list[dict]:
    path = ctx.workspace / "outputs" / filename
    if not path.exists():
        return []
    text = path.read_text(encoding="utf-8-sig")
    return list(csv.DictReader(StringIO(text)))


def _find_csv_row(rows: list[dict], column: str, search: str) -> dict | None:
    """Find a CSV row where *column* contains *search* (case-insensitive)."""
    for row in rows:
        val = row.get(column, "")
        if search.lower() in val.lower():
            return row
    return None


def _get_notion_field(row: dict, field: str, field_type: str = "rich_text") -> str:
    """Extract a typed field value from a Notion query-result row."""
    props = row.get("properties", {})
    prop = props.get(field, {})
    if field_type == "title":
        parts = prop.get("title", [])
        return "".join(t.get("plain_text", "") for t in parts)
    elif field_type == "rich_text":
        parts = prop.get("rich_text", [])
        return "".join(t.get("plain_text", "") for t in parts)
    elif field_type == "select":
        sel = prop.get("select", {})
        return sel.get("name", "") if sel else ""
    return ""


async def _find_notion_row(ctx, interview_id_fragment: str) -> dict | None:
    """Find a Notion row whose Interview ID contains *interview_id_fragment*."""
    rows = await ctx.notion.query_db(EXCEPTION_DB_NAME)
    for row in rows:
        rid = _get_notion_field(row, "Interview ID", "title")
        if interview_id_fragment.lower() in rid.lower():
            return row
    return None


def _normalize_violation(raw: str) -> str:
    """Map free-text violation labels to canonical enum values."""
    raw = raw.lower().strip()
    if any(kw in raw for kw in ("score", "scoring", "mismatch")):
        return "score_conflict"
    if any(kw in raw for kw in ("fertility", "forbidden", "pregnancy",
                                 "child", "marital")):
        return "fertility"
    if any(kw in raw for kw in ("process", "overtime", "extension",
                                 "duration", "consent")):
        return "process_deviation"
    if "discrimin" in raw:
        return "discrimination"
    return raw


# ── METADATA ──────────────────────────────────────────────────────

METADATA = {
    "id": "hr_task3",
    "name": "Interview Compliance Violation & Resolution",
    "category": "hr",
    "environments": ["filesystem", "email", "notion"],
    "timeout_seconds": 600,
    "difficulty": "hard",
    "mm_level": "L4",
    "role": "Zhou Ting, HR Operations Specialist at Xinghai Technology",
    "tags": [
        "hr", "compliance", "interview", "legal",
        "multimodal", "violation", "triage",
    ],
    "env_config": {
        "email": {
            "users": {
                "hr_ops": {
                    "email": "[email protected]",
                    "password": "hrops_pwd",
                },
                "wulei": {
                    "email": "[email protected]",
                    "password": "wulei_pwd",
                },
                "chen_lvshi": {
                    "email": "[email protected]",
                    "password": "chen_pwd",
                },
                "candidate_a": {
                    "email": "[email protected]",
                    "password": "canda_pwd",
                },
                "wang_engineer": {
                    "email": "[email protected]",
                    "password": "wang_pwd",
                },
            },
        },
    },
}

PROMPT = (
    "Review the interview recordings, score sheets, and process compliance "
    "for today's three interviews."
)


# ── Stage Functions ───────────────────────────────────────────────


async def stage0(ctx):
    """Monday 2024-03-25 18:00: Initial investigation & exception identification."""
    # 1. Upload all assets (personality .md files + input/ materials)
    await ctx.fs.upload_dir(ctx.task_dir / "assets", "/workspace")

    # 2. Create Notion exception-tracking database (starts empty)
    await ctx.notion.create_page("Interview Exception Tracking 2024")
    await ctx.notion.create_database(EXCEPTION_DB_NAME, EXCEPTION_DB_SCHEMA)

    # 3. Loud: initial instruction email from HR Manager
    await ctx.email.send_email(
        from_user="wulei",
        to="[email protected]",
        subject="[Urgent] Three Interview Issues Today - Please Review",
        body=(
            "Zhou Ting,\n\n"
            "Three interviews today had issues. Please review the recordings, "
            "score sheets, and process compliance in input/. Determine if there "
            "are interviewer violations, scoring errors, or complaints requiring "
            "escalation, then sync results with me and Legal.\n\n"
            "Materials in your workspace:\n"
            "- interview_A/B/C_transcript.txt\n"
            "- score_sheet.csv (also .xlsx)\n"
            "- interview_schedule.csv\n"
            "- scoring_system_log.txt\n"
            "- interview_policy.pdf\n"
            "- complaint_email_screenshot.png\n\n"
            "Create exception records in the ATS (Notion) and send me a summary.\n\n"
            "Wu Lei"
        ),
    )

    # 4. Notification - loud events + Feishu context
    return {
        "notification": (
            "[Monday 2024-03-25 18:00]\n\n"
            "[Feishu #hr-operations] Wu Lei (17:45): "
            "Today's three interviews all had abnormal feedback. "
            "Zhou Ting please review ASAP, materials in the shared drive.\n\n"
            "[Feishu #hr-operations] Chen Lvshi (17:50): "
            "Noted. If there are any high-risk policy violations, "
            "loop me in immediately.\n\n"
            "Your email: [email protected]\n"
            "HR Manager Wu Lei: [email protected]\n"
            "Legal Counsel Chen Lvshi: [email protected]\n"
            "ATS exception tracking is in Notion "
            "(database: interview_exception_2024).\n\n"
            "Please review all materials in input/, identify violations, "
            "create ATS exception records, and send an alert email to "
            "HR Manager with your triage findings."
        ),
        "time": "2024-03-25T18:00:00+08:00",
    }


async def stage1(ctx):
    """Tuesday 2024-03-26 10:00: Complaint injection & multi-party coordination."""
    # 1. Loud: Candidate A formal complaint
    await ctx.email.send_email(
        from_user="candidate_a",
        to="[email protected]",
        subject="[Formal Complaint] Interview Scoring - INT_2024_03_25_A",
        body=(
            "Ms. Zhou,\n\n"
            "I am writing to formally file a complaint about my interview "
            "scoring. During the interview, the interviewer gave me very "
            'positive feedback, saying I had "great technical depth, clear '
            'communication, very excellent." However, my final score was '
            "extremely low, which makes no sense.\n\n"
            "Is it fair that I performed well but received such a low score? "
            "Please investigate and reply with the results in writing.\n\n"
            "Candidate A"
        ),
    )

    # 2. Loud: Legal Counsel CC
    await ctx.email.send_email(
        from_user="chen_lvshi",
        to="[email protected]",
        subject="RE: [Urgent] Interview Compliance - Legal Risk Assessment",
        body=(
            "Zhou Ting,\n\n"
            "I've been informed about the interview issues. Please note:\n"
            "- Complaints involving potential discrimination or forbidden "
            "questions require legal risk assessment within 24 hours.\n"
            "- Send me the complete evidence package for any high-risk "
            "violations.\n"
            "- For candidate communications, ensure legal-safe wording.\n\n"
            "Chen Lvshi\nLegal Counsel"
        ),
    )

    # 3. Notification - loud events + Feishu context
    return {
        "notification": (
            "[Tuesday 2024-03-26 10:00]\n\n"
            "[Feishu #hr-operations] Wu Lei (09:30): "
            "Candidate A just called about the scoring issue. "
            "Handle carefully - Legal is watching.\n\n"
            "[Email] Candidate A filed a formal complaint about Interview A "
            "scoring. Legal Counsel Chen Lvshi requires risk assessment "
            "within 24 hours.\n\n"
            "Candidate A email: [email protected]\n"
            "Interview C interviewer (Wang Engineer): [email protected]\n\n"
            "Please:\n"
            "1. Deep-investigate Interview A root cause and respond to "
            "candidate\n"
            "2. Compile evidence package for Interview B and send to Legal\n"
            "3. Send coaching reminder to Interview C interviewer\n"
            "4. Update all ATS records with current status"
        ),
        "time": "2024-03-26T10:00:00+08:00",
    }


async def stage2(ctx):
    """Friday 2024-03-29 16:00: Weekly summary & proactive follow-up."""
    # No new emails or data injected - agent must self-trigger

    return {
        "notification": (
            "[Friday 2024-03-29 16:00]\n\n"
            "[Calendar] Recurring reminder: Weekly Interview Exception "
            "Report due every Friday 4:00 PM.\n\n"
            "[Feishu #hr-operations] Wu Lei (15:55): "
            "Weekly report due today. Don't forget to check open items.\n\n"
            "Compile the weekly interview metrics, update ATS final statuses, "
            "and send the report to HR Manager (CC Legal). "
            "Check if any open items need follow-up."
        ),
        "time": "2024-03-29T16:00:00+08:00",
    }