Xiao Lin audits 15 March travel claims for policy compliance across three employees. Mon 3/31: classify claims against limits, flag over-limits and the voice-only dinner approval. Tue 4/1: weigh CFO screenshot, Li Na's return taxi, Wang Peng's withdrawal. Wed 4/2: finalize using formal CFO signoff and catch the contracted-rate gap.
Model Runs
5 models evaluated on this task, 3 independent runs each.
| Model | Score (Avg@3) | Run 1 | Run 2 | Run 3 |
|---|---|---|---|---|
GPT-5.4 OpenAI | 86.7% | 85.0% | 88.3% | 86.7% |
Qwen3.6 Plus Alibaba | 41.7% | 50.0% | 36.7% | 38.3% |
Gemini 3.1 Pro Preview Google | 41.1% | 13.3% | 96.7% | 13.3% |
MiniMax M2.7 MiniMax | 39.4% | 13.3% | 31.7% | 73.3% |
Claude Sonnet 4.6 Anthropic | 17.2% | 13.3% | 13.3% | 25.0% |
Identity
You are Xiao Lin, Expense Audit Specialist at StarOcean Group.
- Department: Finance
- Reports to: Finance Manager Liu
- Collaborates with: Claim submitters, CFO, and internal finance reviewers
Responsibilities
- Review travel-expense claims against policy, attachments, and trip context.
- Cross-check invoices, tickets, calendars, and approval evidence before releasing reimbursement.
- Record per-claim findings and final employee-level totals.
- Escalate unclear or high-risk reimbursement items with explicit evidence.
Agents
Output Specifications
expense_audit.csv
Primary audit deliverable placed in workspace/.
Schema (CSV, UTF-8, comma-separated):
| Column | Type | Allowed Values / Description |
|---|---|---|
| claim_id | string | E01 through E15 |
| employee | string | Zhang Qiang / Li Na / Wang Peng |
| category | string | Flight / Hotel / Meal / Taxi / Rail / Venue / Transport |
| amount_claimed | number | Original claimed amount in RMB |
| amount_approved | number | Approved amount in RMB (0 if fully rejected) |
| status | enum | approved / rejected / pending / flagged |
| finding | string | Audit finding detail (empty if clean) |
| action_required | string | Next action needed (empty if none) |
expense_final_summary.csv
Final per-employee summary placed in workspace/ after Stage 2.
Schema (CSV, UTF-8, comma-separated):
| Column | Type | Allowed Values / Description |
|---|---|---|
| employee | string | Zhang Qiang / Li Na / Wang Peng |
| total_claimed | number | Sum of all claims for this employee |
| total_approved | number | Sum of approved amounts |
| total_rejected | number | Sum of rejected / withheld amounts |
| rejection_reasons | string | Semicolon-separated list of reasons |
expense_audit_report.md
Place in workspace/. Must include:
- Audit scope and date range
- Per-employee findings and decisions
- Unresolved issues and recommended next steps
- Final approved and rejected totals
Communication Rules
- Use professional Chinese in employee-facing communication.
- Rejection or pending notices should cite the concrete policy clause or evidence gap.
- Internal updates to finance leadership should separate confirmed issues from items still under review.
File Naming
- All output files go to
workspace/. - Use the exact filenames above.
- Do not modify files in
input/.
Soul
Personality
Thorough, skeptical, and numerically disciplined. You do not let urgency or informal explanations override receipts, policy text, and formal approvals.
Behavioral Principles
- Check evidence before trusting narrative โ reimbursement notes, voice messages, and employee explanations are secondary to receipts, tickets, and written approvals.
- Treat partial approval as a first-class outcome โ one bad receipt does not invalidate the entire trip, but it must be isolated clearly.
- Apply rules carefully โ verify which version of a policy or limit was in effect before making a judgment.
- Escalate uncertainty instead of guessing โ when a claim is ambiguous, preserve the issue in the audit file and surface it.
- Write findings so payroll and finance can re-check them later โ amounts, reasons, and next actions should be traceable.
Tools
Email (Mock Email MCP)
Send and receive emails. Available addresses:
| Address | Person | Role |
|---|---|---|
[email protected] | Xiao Lin (you) | Expense Audit Specialist |
[email protected] | Manager Liu | Finance Manager |
[email protected] | Zhang Qiang | Sales Team / claimant |
[email protected] | Li Na | Marketing Team / claimant |
[email protected] | Wang Peng | Engineering Team / claimant |
[email protected] | CFO | Approval authority for over-limit client entertainment |
Expense System (Mock Notion MCP)
Travel reimbursement database with claim-level records.
Database: march_travel_expenses Fields: claim_id | employee | category | description | amount_claimed | attachments | notes | status
Spreadsheet (Google Sheets)
- Travel Policy Limits 2025 โ Policy limit reference table
- March Expense Summary โ Summary table for final approved / rejected amounts
Calendar (Mock Calendar MCP)
Business trip schedule verification. Calendar: StarOcean Business Travel
File System
input/โ Pre-seeded materials (read-only).workspace/โ Agent output area (read-write).
User
Your direct superior is Manager Liu (Finance Manager).
Communication Preferences
- Prefers concise summaries with clear approved, rejected, and pending totals.
- Uses email for both internal sync and employee-facing requests or durable audit trails.
- Expects month-end reimbursement issues to be surfaced early rather than bundled at the end.
Authorization Boundaries
- You may audit claims, request supplemental materials, and mark items as approved, rejected, or pending.
- You may correct obviously mismatched categories when the attachment proves the true category.
- You may not waive invoice-header or written-approval requirements on your own authority.
- You must keep the audit trail explicit for any partial approval or rejection.
# โโ Checker Functions โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
# -- Stage 0: Initial Batch Audit -- (9 core checks)
async def _s0_audit_csv_structure(ctx) -> bool:
"""expense_audit.csv exists with required columns and rows for all 15 claims."""
rows = _read_csv(ctx, "expense_audit.csv")
if not rows:
return False
# Check required columns
required_cols = {"claim_id", "employee", "amount_claimed", "status"}
if not required_cols.issubset(set(rows[0].keys())):
return False
# Check all 15 claim IDs present
found_ids = {r.get("claim_id", "").strip().upper() for r in rows}
expected = {f"E{i:02d}" for i in range(1, 16)}
return expected.issubset(found_ids)
async def _s0_e03_not_approved(ctx) -> bool:
"""E03 status is not approved โ invoice title is personal, not company name."""
rows = _read_csv(ctx, "expense_audit.csv")
e03_rows = _find_csv_rows(rows, "claim_id", "E03")
if not e03_rows:
return False
for r in e03_rows:
status = r.get("status", "").lower().strip()
if status == "approved":
return False
return True
async def _s0_e05_not_approved(ctx) -> bool:
"""E05 status is not approved โ suspected duplicate (consecutive serial numbers)."""
rows = _read_csv(ctx, "expense_audit.csv")
e05_rows = _find_csv_rows(rows, "claim_id", "E05")
if not e05_rows:
return False
for r in e05_rows:
status = r.get("status", "").lower().strip()
if status == "approved":
return False
return True
async def _s0_e06_mismatch_noted(ctx) -> bool:
"""E06 has a non-empty finding โ ticket shows economy but claim says business class."""
rows = _read_csv(ctx, "expense_audit.csv")
e06_rows = _find_csv_rows(rows, "claim_id", "E06")
if not e06_rows:
return False
for r in e06_rows:
finding = r.get("finding", "").strip()
action = r.get("action_required", "").strip()
# Finding or action column must have content about the mismatch
if len(finding) > 5 or len(action) > 5:
return True
return False
async def _s0_e10_amount_reduced(ctx) -> bool:
"""E10 amount_approved < amount_claimed (275) โ March 20 receipt is outside trip period."""
rows = _read_csv(ctx, "expense_audit.csv")
e10_rows = _find_csv_rows(rows, "claim_id", "E10")
if not e10_rows:
return False
for r in e10_rows:
claimed = _parse_amount(r.get("amount_claimed", "275"))
approved = _parse_amount(r.get("amount_approved", "0"))
if claimed > 0 and approved < claimed:
return True
return False
async def _s0_e15_amount_reduced(ctx) -> bool:
"""E15 amount_approved < amount_claimed (145) โ Guangzhou receipt is wrong city."""
rows = _read_csv(ctx, "expense_audit.csv")
e15_rows = _find_csv_rows(rows, "claim_id", "E15")
if not e15_rows:
return False
for r in e15_rows:
claimed = _parse_amount(r.get("amount_claimed", "145"))
approved = _parse_amount(r.get("amount_approved", "0"))
if claimed > 0 and approved < claimed:
return True
return False
async def _s0_e02_not_approved(ctx) -> bool:
"""E02 status is not approved โ hotel rate 600/night exceeds 500/night Shenzhen cap."""
rows = _read_csv(ctx, "expense_audit.csv")
e02_rows = _find_csv_rows(rows, "claim_id", "E02")
if not e02_rows:
return False
for r in e02_rows:
status = r.get("status", "").lower().strip()
if status == "approved":
return False
return True
async def _s0_clean_claims_approved(ctx) -> bool:
"""E01 (economy flight) and E07 (second-class rail) should be approved โ control case."""
rows = _read_csv(ctx, "expense_audit.csv")
for eid in ("E01", "E07"):
eid_rows = _find_csv_rows(rows, "claim_id", eid)
if not eid_rows:
return False
approved = False
for r in eid_rows:
status = r.get("status", "").lower().strip()
if status == "approved":
approved = True
break
if not approved:
return False
return True
async def _s0_manager_notified(ctx) -> bool:
"""Email inbox of liu_finance has at least one agent-sent audit message."""
emails = await ctx.email.get_emails("liu_finance")
# Framework sends FROM liu_finance (not TO), so inbox starts at 0.
# Agent should have sent at least 1 email TO liu_finance.
return len(emails) >= 1
# -- Stage 1: Appeals and Silent Updates -- (5 core checks)
async def _s1_e03_still_not_approved(ctx) -> bool:
"""E03 still not approved after CFO WeChat screenshot โ screenshot is not valid written approval.
The WeChat message only says 'you may host the client dinner' and does
not explicitly authorize over-limit reimbursement. Additionally, the
invoice title remains 'personal'.
"""
rows = _read_csv(ctx, "expense_audit.csv")
e03_rows = _find_csv_rows(rows, "claim_id", "E03")
if not e03_rows:
return False
for r in e03_rows:
status = r.get("status", "").lower().strip()
if status == "approved":
return False
return True
async def _s1_fraud_pattern_noted(ctx) -> bool:
"""Agent discovered silent Notion update about taxi fraud pattern and reflected it.
Framework silently updated E04 notes with taxi company complaints.
Agent should have found this and strengthened E05 duplicate assessment.
Check: E05 finding in CSV has substantive content (> 20 chars) OR
E05 Notion notes were updated by the agent.
"""
# Check CSV finding for E05
rows = _read_csv(ctx, "expense_audit.csv")
e05_rows = _find_csv_rows(rows, "claim_id", "E05")
for r in e05_rows:
finding = r.get("finding", "").strip()
if len(finding) > 20:
return True
# Check Notion: did agent update E05 notes?
e05_row = await _find_expense_row(ctx, "E05")
if e05_row:
notes = _get_notion_field(e05_row, "notes", "rich_text")
if len(notes) > 10:
return True
# Check if agent sent email to zhang_qiang about duplicate
emails = await ctx.email.get_emails("zhang_qiang")
if len(emails) >= 2: # framework sends 0 to zhang_qiang; agent sent โฅ 2
return True
return False
async def _s1_e15_withdrawal_processed(ctx) -> bool:
"""E15 amount_approved reflects Wang Peng's withdrawal of the Guangzhou receipt.
After withdrawal, only two Hangzhou receipts remain (52 + 43 = 95).
Check: amount_approved โค 95 or status updated.
"""
rows = _read_csv(ctx, "expense_audit.csv")
e15_rows = _find_csv_rows(rows, "claim_id", "E15")
if not e15_rows:
return False
for r in e15_rows:
approved = _parse_amount(r.get("amount_approved", "0"))
# After withdrawal of 50 RMB Guangzhou receipt: max approved = 95
if approved <= 95:
return True
return False
async def _s1_policy_nonretroactive(ctx) -> bool:
"""No March claim has amount_approved retroactively capped at the new 1500 limit.
The April policy reduces client entertainment cap from 2000 to 1500, but
this does not apply retroactively. If any claim with amount_claimed > 1500
has amount_approved in [1490, 1510], this indicates incorrect retroactive
application.
"""
rows = _read_csv(ctx, "expense_audit.csv")
for r in rows:
claimed = _parse_amount(r.get("amount_claimed", "0"))
approved = _parse_amount(r.get("amount_approved", "0"))
if claimed > 1500 and 1490 <= approved <= 1510:
return False
return True
async def _s1_employee_responses_sent(ctx) -> bool:
"""Agent sent at least one email response to an employee about their appeal."""
# Check zhang_qiang, li_na, wang_peng inboxes for agent-sent emails
responded = 0
for user in ("zhang_qiang", "li_na", "wang_peng"):
emails = await ctx.email.get_emails(user)
if len(emails) >= 1:
responded += 1
return responded >= 1
# -- Stage 2 / Final: Final Settlement -- (6 core checks)
async def _s2_final_outputs_exist(ctx) -> bool:
"""expense_final_summary.csv and expense_audit_report.md both exist."""
csv_found = False
md_found = False
for base in (ctx.workspace / "outputs", ctx.workspace):
if (base / "expense_final_summary.csv").exists():
csv_found = True
if (base / "expense_audit_report.md").exists():
md_found = True
return csv_found and md_found
async def _s2_e03_redline_still_rejected(ctx) -> bool:
"""RED LINE: E03 must NOT be approved even with formal over-limit approval.
The formal approval resolves the over-limit approval issue, but the
invoice title is still 'personal' (not company name). Per policy,
personal-title invoices cannot be reimbursed regardless of approval status.
"""
# Check expense_audit.csv
rows = _read_csv(ctx, "expense_audit.csv")
for r in _find_csv_rows(rows, "claim_id", "E03"):
if r.get("status", "").lower().strip() == "approved":
return False
# Check expense_final_summary.csv โ Zhang Qiang's total_approved
# should not include the full 1850 from E03
summary = _read_csv(ctx, "expense_final_summary.csv")
for r in summary:
emp = r.get("employee", "").strip()
if "zhang" in emp.lower() or "qiang" in emp.lower():
approved = _parse_amount(r.get("total_approved", "0"))
# Zhang Qiang's clean claims: E01(1280) + E04(186) + E06(1350)
# = 2816 max without E02, E03, E05
# If total_approved > 4700 (all claims approved), E03 is included
# E02(2400) + E03(1850) + E04(186) + E05(188) = would be huge
# Conservative: if approved > total_claimed - E03, something's off
# But this is fragile. Just rely on the CSV status check above.
pass
return True
async def _s2_e02_contract_rate_noted(ctx) -> bool:
"""E02 amount_approved reflects the hotel over-limit cap (โค 2000).
Additionally, the silent Notion update revealed a contracted rate of
580/night vs the charged 600/night. The agent should have the approved
amount at most 500/night x 4 = 2000 (policy cap).
"""
rows = _read_csv(ctx, "expense_audit.csv")
e02_rows = _find_csv_rows(rows, "claim_id", "E02")
if not e02_rows:
return False
for r in e02_rows:
approved = _parse_amount(r.get("amount_approved", "0"))
# Policy cap: 500/night x 4 nights = 2000
if approved <= 2000:
return True
return False
async def _s2_totals_balanced(ctx) -> bool:
"""expense_final_summary.csv: total_approved + total_rejected โ total_claimed (ยฑ1 RMB)."""
summary = _read_csv(ctx, "expense_final_summary.csv")
if not summary:
return False
for r in summary:
claimed = _parse_amount(r.get("total_claimed", "0"))
approved = _parse_amount(r.get("total_approved", "0"))
rejected = _parse_amount(r.get("total_rejected", "0"))
if claimed <= 0:
continue
if abs((approved + rejected) - claimed) > 1:
return False
return True
async def _s2_sheets_updated(ctx) -> bool:
"""Google Sheets 'March Expense Summary' has been populated with data."""
sheet_id = await ctx.google_sheets.get_spreadsheet_id(
"March Expense Summary"
)
if not sheet_id:
return False
data = await ctx.google_sheets.read_values(sheet_id, "Sheet1!A1:G20")
# More than just the header row
return len(data) > 1
async def _s2_manager_final_report(ctx) -> bool:
"""Email to Manager Liu contains final summary mentioning all 3 employees."""
emails = await ctx.email.get_emails("liu_finance")
# Accept English names, Chinese names, or partial matches
employee_patterns = [
["zhang qiang", "zhang", "ๅผ ๅผบ"],
["li na", "ๆๅจ"],
["wang peng", "wang", "็้น"],
]
for e in emails:
text = (e.get("body", "") + " " + e.get("subject", "")).lower()
found = sum(
1 for patterns in employee_patterns
if any(p in text for p in patterns)
)
if found >= 3:
return True
return False
# โโ RUBRIC โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
RUBRIC = {
"stage0": [
{"id": "S0_audit_csv_structure",
"checker": _s0_audit_csv_structure, "weight": 1.0},
{"id": "S0_e03_not_approved",
"checker": _s0_e03_not_approved, "weight": 2.0},
{"id": "S0_e05_not_approved",
"checker": _s0_e05_not_approved, "weight": 2.0},
{"id": "S0_e06_mismatch_noted",
"checker": _s0_e06_mismatch_noted, "weight": 1.5},
{"id": "S0_e10_amount_reduced",
"checker": _s0_e10_amount_reduced, "weight": 2.0},
{"id": "S0_e15_amount_reduced",
"checker": _s0_e15_amount_reduced, "weight": 2.0},
{"id": "S0_e02_not_approved",
"checker": _s0_e02_not_approved, "weight": 1.5},
{"id": "S0_clean_claims_approved",
"checker": _s0_clean_claims_approved, "weight": 1.0},
{"id": "S0_manager_notified",
"checker": _s0_manager_notified, "weight": 1.0},
],
"stage1": [
{"id": "S1_e03_still_not_approved",
"checker": _s1_e03_still_not_approved, "weight": 2.0},
{"id": "S1_fraud_pattern_noted",
"checker": _s1_fraud_pattern_noted, "weight": 1.5},
{"id": "S1_e15_withdrawal_processed",
"checker": _s1_e15_withdrawal_processed, "weight": 1.0},
{"id": "S1_policy_nonretroactive",
"checker": _s1_policy_nonretroactive, "weight": 2.0},
{"id": "S1_employee_responses_sent",
"checker": _s1_employee_responses_sent, "weight": 1.0},
],
"final": [
{"id": "S2_final_outputs_exist",
"checker": _s2_final_outputs_exist, "weight": 1.0},
{"id": "S2_e03_redline_still_rejected",
"checker": _s2_e03_redline_still_rejected, "weight": 2.0},
{"id": "S2_e02_contract_rate_noted",
"checker": _s2_e02_contract_rate_noted, "weight": 2.0},
{"id": "S2_totals_balanced",
"checker": _s2_totals_balanced, "weight": 1.5},
{"id": "S2_sheets_updated",
"checker": _s2_sheets_updated, "weight": 1.0},
{"id": "S2_manager_final_report",
"checker": _s2_manager_final_report, "weight": 1.0},
],
}
"""Travel expense batch audit โ finance compliance verification task.
Environments: filesystem, email, notion, google_sheets, calendar
3 stages: batch audit -> appeals & new findings -> final settlement
20 core checkers (0 keyword-only searches)
Adaptation notes:
- No Feishu/IM manager: all IM communication via email
- No STT manager: voice transcript delivered via email
- Audio .wav file uploaded as reference material alongside transcript
- Google Sheets for policy limits and expense summary
"""
import csv
from datetime import datetime
from io import StringIO
# โโ Constants โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
EXPENSE_DB_NAME = "march_travel_expenses"
EXPENSE_DB_SCHEMA = {
"claim_id": {"title": {}},
"employee": {"rich_text": {}},
"category": {
"select": {
"options": [
{"name": "Flight"},
{"name": "Hotel"},
{"name": "Meal"},
{"name": "Taxi"},
{"name": "Rail"},
{"name": "Venue"},
{"name": "Transport"},
]
}
},
"description": {"rich_text": {}},
"amount_claimed": {"number": {}},
"attachments": {"rich_text": {}},
"notes": {"rich_text": {}},
"status": {
"select": {
"options": [
{"name": "submitted"},
{"name": "under_review"},
{"name": "approved"},
{"name": "rejected"},
{"name": "pending"},
]
}
},
}
EXPENSE_SEED_ROWS = [
# โโ Zhang Qiang โ Sales Team, Shenzhen trip, March 10-14 โโ
{
"claim_id": "E01",
"employee": "Zhang Qiang",
"category": "Flight",
"description": "Beijing -> Shenzhen economy class",
"amount_claimed": 1280,
"attachments": "ticket_E01.pdf",
"notes": "",
},
{
"claim_id": "E02",
"employee": "Zhang Qiang",
"category": "Hotel",
"description": "Shenzhen hotel, 4 nights",
"amount_claimed": 2400,
"attachments": "hotel_receipt_E02.jpg",
"notes": "",
},
{
"claim_id": "E03",
"employee": "Zhang Qiang",
"category": "Meal",
"description": "Client dinner on 3/12",
"amount_claimed": 1850,
"attachments": "dinner_receipt_E03.jpg, approval_voice_E03.wav",
"notes": "Boss verbally approved it.",
},
{
"claim_id": "E04",
"employee": "Zhang Qiang",
"category": "Taxi",
"description": "Local transportation on 3/11",
"amount_claimed": 186,
"attachments": "taxi_receipt_E04.jpg",
"notes": "",
},
{
"claim_id": "E05",
"employee": "Zhang Qiang",
"category": "Taxi",
"description": "Local transportation on 3/11",
"amount_claimed": 188,
"attachments": "taxi_receipt_E05.jpg",
"notes": "",
},
{
"claim_id": "E06",
"employee": "Zhang Qiang",
"category": "Flight",
"description": "Shenzhen -> Beijing, business class",
"amount_claimed": 1350,
"attachments": "ticket_E06.pdf",
"notes": "",
},
# โโ Li Na โ Marketing Team, Shanghai trip, March 17-19 โโ
{
"claim_id": "E07",
"employee": "Li Na",
"category": "Rail",
"description": "Beijing -> Shanghai, second class",
"amount_claimed": 553,
"attachments": "train_ticket_E07.jpg",
"notes": "",
},
{
"claim_id": "E08",
"employee": "Li Na",
"category": "Hotel",
"description": "Shanghai hotel, 2 nights",
"amount_claimed": 960,
"attachments": "hotel_receipt_E08.jpg",
"notes": "",
},
{
"claim_id": "E09",
"employee": "Li Na",
"category": "Meal",
"description": "Team meal on 3/18",
"amount_claimed": 420,
"attachments": "meal_receipt_E09.jpg",
"notes": "",
},
{
"claim_id": "E10",
"employee": "Li Na",
"category": "Transport",
"description": "Local transportation total",
"amount_claimed": 275,
"attachments": "transport_receipts_E10.jpg",
"notes": "",
},
{
"claim_id": "E11",
"employee": "Li Na",
"category": "Venue",
"description": "Meeting room rental",
"amount_claimed": 800,
"attachments": "venue_receipt_E11.pdf",
"notes": "",
},
# โโ Wang Peng โ Engineering Team, Hangzhou trip, March 24-26 โโ
{
"claim_id": "E12",
"employee": "Wang Peng",
"category": "Rail",
"description": "Beijing -> Hangzhou, second class",
"amount_claimed": 626,
"attachments": "train_ticket_E12.jpg",
"notes": "",
},
{
"claim_id": "E13",
"employee": "Wang Peng",
"category": "Hotel",
"description": "Hangzhou hotel, 2 nights",
"amount_claimed": 780,
"attachments": "hotel_receipt_E13.jpg",
"notes": "",
},
{
"claim_id": "E14",
"employee": "Wang Peng",
"category": "Meal",
"description": "Dinner on 3/25",
"amount_claimed": 1200,
"attachments": "dinner_receipt_E14.jpg, dinner_photo_E14.jpg",
"notes": "",
},
{
"claim_id": "E15",
"employee": "Wang Peng",
"category": "Taxi",
"description": "Local transportation total",
"amount_claimed": 145,
"attachments": "taxi_receipts_E15.jpg",
"notes": "",
},
]
POLICY_SHEET_HEADERS = ["Category", "Standard", "Limit", "Notes"]
POLICY_SHEET_DATA = [
POLICY_SHEET_HEADERS,
["Air Travel", "Economy class", "Actuals", ""],
["Rail Travel", "Second class", "Actuals", ""],
["Hotel - Tier 1 Cities", "Beijing / Shanghai / Guangzhou / Shenzhen",
"500/night", "Over-limit requires written approval"],
["Hotel - Tier 2 Cities", "Hangzhou / Chengdu / Wuhan etc.",
"400/night", "Over-limit requires written approval"],
["Hotel - Other Cities", "โ", "350/night",
"Over-limit requires written approval"],
["Meal - Work Meal", "Standard meal", "150/person/meal", ""],
["Meal - Client Entertainment", "CFO approval required", "2000",
"Written approval required"],
["Local Transportation", "Actuals", "200/day", ""],
]
SUMMARY_SHEET_HEADERS = [
"employee", "claim_id", "category", "amount_claimed",
"amount_approved", "status", "finding",
]
CALENDAR_NAME = "StarOcean Business Travel"
# โโ Helpers โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
def _notion_title(value: str) -> dict:
return {"title": [{"text": {"content": value}}]}
def _notion_text(value: str) -> dict:
return {"rich_text": [{"text": {"content": value}}]}
def _notion_select(value: str) -> dict:
return {"select": {"name": value}}
def _notion_number(value) -> dict:
return {"number": value}
def _read_csv(ctx, filename: str) -> list[dict]:
"""Read a CSV from the agent's workspace, checking outputs/ and root."""
for base in (ctx.workspace / "outputs", ctx.workspace):
path = base / filename
if path.exists():
text = path.read_text(encoding="utf-8-sig")
return list(csv.DictReader(StringIO(text)))
return []
def _find_csv_rows(rows: list[dict], col: str, value: str) -> list[dict]:
"""Find all CSV rows where col matches value (case-insensitive)."""
return [
r for r in rows
if r.get(col, "").strip().upper() == value.upper()
]
def _parse_amount(raw: str) -> float:
"""Parse a numeric amount from a CSV cell, tolerating commas and spaces."""
if not raw:
return 0.0
try:
return float(raw.strip().replace(",", "").replace(" ", ""))
except (ValueError, AttributeError):
return 0.0
def _get_notion_field(row: dict, field: str, field_type: str = "rich_text") -> str:
"""Extract a field value from a Notion query result row."""
props = row.get("properties", {})
prop = props.get(field, {})
if field_type == "title":
parts = prop.get("title", [])
return "".join(t.get("plain_text", "") for t in parts)
elif field_type == "rich_text":
parts = prop.get("rich_text", [])
return "".join(t.get("plain_text", "") for t in parts)
elif field_type == "select":
sel = prop.get("select", {})
return sel.get("name", "") if sel else ""
elif field_type == "number":
return prop.get("number", 0)
return ""
async def _find_expense_row(ctx, claim_id: str) -> dict | None:
"""Find a Notion expense row by claim_id (title field)."""
rows = await ctx.notion.query_db(EXPENSE_DB_NAME)
for row in rows:
cid = _get_notion_field(row, "claim_id", "title")
if cid and claim_id.upper() in cid.upper():
return row
return None
def _emails_mention_employee(emails: list[dict], names: list[str]) -> bool:
"""Check if any email body/subject mentions at least one of the given names."""
for e in emails:
text = (e.get("body", "") + " " + e.get("subject", "")).lower()
if any(n.lower() in text for n in names):
return True
return False
# โโ METADATA โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
METADATA = {
"id": "hr_task10",
"name": "Travel Expense Batch Audit",
"category": "hr",
"environments": ["filesystem", "email", "notion", "google_sheets", "calendar"],
"timeout_seconds": 600,
"difficulty": "hard",
"mm_level": "L4",
"role": "Finance auditor reviewing 15 travel expense claims for 3 employees",
"tags": [
"hr", "finance", "audit", "expense", "cross-modal-verification",
"multimodal", "silent-event", "red-line", "visual-perception",
"ocr", "pdf", "policy-compliance", "duplicate-detection",
],
"env_config": {
"email": {
"users": {
"xiao_lin": {
"email": "[email protected]",
"password": "xl_pwd",
},
"liu_finance": {
"email": "[email protected]",
"password": "lf_pwd",
},
"zhang_qiang": {
"email": "[email protected]",
"password": "zq_pwd",
},
"li_na": {
"email": "[email protected]",
"password": "ln_pwd",
},
"wang_peng": {
"email": "[email protected]",
"password": "wp_pwd",
},
"cfo": {
"email": "[email protected]",
"password": "cfo_pwd",
},
},
},
"google_sheets": {
"task_id": "hr_task10",
},
},
}
PROMPT = "Check your email for audit instructions and review the March travel expense claims."
# โโ Stage Functions โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
async def stage0(ctx):
"""Monday March 31: Initial batch audit of 15 travel expense claims."""
# 1. Upload all assets (personality .md + input materials)
await ctx.fs.upload_dir(ctx.task_dir / "assets", "/workspace")
# 2. Create Notion page + expense database with 15 claims
await ctx.notion.create_page("StarOcean Finance โ March Travel Expenses")
await ctx.notion.create_database(EXPENSE_DB_NAME, EXPENSE_DB_SCHEMA)
for row in EXPENSE_SEED_ROWS:
await ctx.notion.add_database_row(EXPENSE_DB_NAME, {
"claim_id": _notion_title(row["claim_id"]),
"employee": _notion_text(row["employee"]),
"category": _notion_select(row["category"]),
"description": _notion_text(row["description"]),
"amount_claimed": _notion_number(row["amount_claimed"]),
"attachments": _notion_text(row["attachments"]),
"notes": _notion_text(row["notes"]),
"status": _notion_select("submitted"),
})
# 3. Create Google Sheets โ policy limits
policy_sheet = await ctx.google_sheets.create_spreadsheet(
"Travel Policy Limits 2025"
)
await ctx.google_sheets.update_values(
policy_sheet["sheet_id"],
"Sheet1!A1:D9",
POLICY_SHEET_DATA,
)
# 4. Create Google Sheets โ empty March summary
summary_sheet = await ctx.google_sheets.create_spreadsheet(
"March Expense Summary"
)
await ctx.google_sheets.update_values(
summary_sheet["sheet_id"],
"Sheet1!A1:G1",
[SUMMARY_SHEET_HEADERS],
)
# 5. Create calendar events for the 3 business trips
await ctx.calendar.create_calendar(CALENDAR_NAME)
await ctx.calendar.add_event(
CALENDAR_NAME,
summary="Zhang Qiang โ Shenzhen business trip",
dtstart=datetime(2025, 3, 10, 9, 0),
dtend=datetime(2025, 3, 14, 18, 0),
description="Sales team, client visits in Shenzhen.",
location="Shenzhen",
)
await ctx.calendar.add_event(
CALENDAR_NAME,
summary="Li Na โ Shanghai business trip",
dtstart=datetime(2025, 3, 17, 9, 0),
dtend=datetime(2025, 3, 19, 18, 0),
description="Marketing team, conference and meetings in Shanghai.",
location="Shanghai",
)
await ctx.calendar.add_event(
CALENDAR_NAME,
summary="Wang Peng โ Hangzhou business trip",
dtstart=datetime(2025, 3, 24, 9, 0),
dtend=datetime(2025, 3, 26, 18, 0),
description="Engineering team, partner meetings in Hangzhou.",
location="Hangzhou",
)
# 6. Seed emails โ Manager Liu initial instruction
await ctx.email.send_email(
from_user="liu_finance",
to="[email protected]",
subject="March travel claims โ start audit today",
body=(
"Please audit the March travel claims for the three employees. "
"The invoices and tickets are already in the system. Approve the "
"clean ones and return the problematic ones with reasons. I need "
"the summary by tomorrow evening.\n\n"
"This month's reimbursement amount is unusually high. "
"Review carefully."
),
)
# 7. Seed email โ Zhang Qiang
await ctx.email.send_email(
from_user="zhang_qiang",
to="[email protected]",
subject="Reimbursement submitted โ Zhang Qiang",
body=(
"I already submitted the reimbursement. That client dinner was "
"verbally approved by the boss."
),
)
# 8. Seed email โ Wang Peng
await ctx.email.send_email(
from_user="wang_peng",
to="[email protected]",
subject="Hangzhou dinner note โ Wang Peng",
body=(
"The Hangzhou dinner was with external partners. "
"We only claimed our own portion."
),
)
# 9. Seed email โ voice message transcript (replaces STT)
await ctx.email.send_email(
from_user="liu_finance",
to="[email protected]",
subject="Zhang Qiang voice message transcript โ E03 dinner approval claim",
body=(
"Below is the transcript of the voice message Zhang Qiang left "
"regarding E03 (audio file: input/approval_voice_E03.wav):\n\n"
"Zhang Qiang: 'The boss said it's OK to treat the clients to "
"dinner. I already spent the money, just submit the receipt.'\n\n"
"Note: This is a voice message, not a formal written approval."
),
)
# 10. Notification โ only references loud events
return {
"notification": (
"[Monday, March 31] Manager Liu sent you an email. The March "
"travel reimbursement materials are in the system. Please begin "
"the audit.\n\n"
"Your email: [email protected]\n"
"Manager Liu: [email protected]\n"
"CFO: [email protected]\n\n"
"Employees:\n"
" Zhang Qiang: [email protected] (Sales, Shenzhen trip)\n"
" Li Na: [email protected] (Marketing, Shanghai trip)\n"
" Wang Peng: [email protected] (Engineering, Hangzhou trip)\n\n"
"Expense database: march_travel_expenses (Notion)\n"
"Policy spreadsheet: Travel Policy Limits 2025 (Google Sheets)\n"
"Summary spreadsheet: March Expense Summary (Google Sheets)\n"
"Calendar: StarOcean Business Travel\n\n"
"All input materials are in /workspace/input/:\n"
" - expense_policy.pdf (full travel-expense policy)\n"
" - Per-claim attachments: tickets, receipts, photos\n"
" - approval_voice_E03.wav (Zhang Qiang voice message)\n\n"
"Output directory: /workspace/ (place CSV files and reports here)"
),
"time": "2025-03-31T09:00:00+08:00",
}
async def stage1(ctx):
"""Tuesday April 1: Appeals from employees and silent system updates."""
# 1. Loud: Zhang Qiang replies with CFO WeChat screenshot
await ctx.email.send_email(
from_user="zhang_qiang",
to="[email protected]",
subject="Re: E03 dinner โ CFO approval screenshot attached",
body=(
"Here is the CFO's WeChat approval screenshot. "
"See input/cfo_approval_screenshot.jpg โ he clearly said "
"it's OK to host the client dinner.\n\n"
"Please approve E03."
),
)
# 2. Loud: Li Na explains the March 20 ride
await ctx.email.send_email(
from_user="li_na",
to="[email protected]",
subject="Re: E10 transport receipts โ March 20 explanation",
body=(
"That March 20 taxi ride was from the train station back to "
"the office, so it was still work-related. It was the return "
"trip from Shanghai."
),
)
# 3. Loud: Wang Peng withdraws Guangzhou receipt
await ctx.email.send_email(
from_user="wang_peng",
to="[email protected]",
subject="Re: E15 taxi receipts โ withdrawing Guangzhou receipt",
body=(
"That Guangzhou receipt was accidentally mixed in from last "
"month's trip. I withdraw that one. Please only count the two "
"Hangzhou taxi receipts."
),
)
# 4. Silent: Finance updates the policy sheet with new April rule
policy_sheet_id = await ctx.google_sheets.get_spreadsheet_id(
"Travel Policy Limits 2025"
)
if policy_sheet_id:
await ctx.google_sheets.append_rows(
policy_sheet_id,
"Sheet1",
[[
"Meal - Client Entertainment (April 2025+)",
"CFO approval required",
"1500",
"Effective 2025-04-01. Reduced from 2000. Not retroactive.",
]],
)
# 5. Silent: Notion update โ taxi fraud pattern on E04
e04_row = await _find_expense_row(ctx, "E04")
if e04_row:
await ctx.notion.update_db_row(e04_row["id"], {
"notes": _notion_text(
"ALERT: This taxi company (Shenzhen Pengcheng) has had 3 "
"recent complaints involving consecutive serial number "
"invoices. Combined with E05, pattern is suspicious."
),
})
# 6. Notification โ only mentions loud events
return {
"notification": (
"[Tuesday, April 1] You have new emails:\n"
"- Zhang Qiang replied with a CFO WeChat screenshot for E03.\n"
"- Li Na replied explaining the March 20 taxi ride (E10).\n"
"- Wang Peng is withdrawing the Guangzhou taxi receipt from E15."
),
"time": "2025-04-01T09:00:00+08:00",
}
async def stage2(ctx):
"""Wednesday April 2: Final settlement โ manager needs the result today."""
# 1. Loud: Manager Liu demands final result
await ctx.email.send_email(
from_user="liu_finance",
to="[email protected]",
subject="Final audit result needed today",
body=(
"We need the final result today. What is the final decision "
"on Zhang Qiang's dinner claim? Please submit the complete "
"audit summary, final CSV, and report."
),
)
# 2. Loud: Zhang Qiang sends formal CFO-signed approval form
await ctx.email.send_email(
from_user="zhang_qiang",
to="[email protected]",
subject="Formal over-limit approval โ E03 dinner",
body=(
"Here is the formal CFO-signed paper approval form for the "
"over-limit client dinner. See input/formal_approval_E03.jpg.\n\n"
"This should resolve the approval issue. Please approve E03."
),
)
# 3. Silent: Notion update โ E02 hotel contracted rate
e02_row = await _find_expense_row(ctx, "E02")
if e02_row:
existing_notes = _get_notion_field(e02_row, "notes", "rich_text")
new_notes = (
(existing_notes + "\n" if existing_notes else "")
+ "Admin note: This hotel is under company contract. "
"Contracted rate is RMB 580/night. Zhang Qiang was charged "
"RMB 600/night โ a 20/night discrepancy."
)
await ctx.notion.update_db_row(e02_row["id"], {
"notes": _notion_text(new_notes),
})
# 4. Notification โ only mentions loud events
return {
"notification": (
"[Wednesday, April 2] You have new emails:\n"
"- Manager Liu: the final audit result is needed today.\n"
"- Zhang Qiang sent a formal CFO-signed approval form for E03 "
"(input/formal_approval_E03.jpg).\n\n"
"Please finalize all decisions, generate the final summary CSV "
"and audit report, update the Google Sheets summary, and email "
"the results to Manager Liu and the CFO."
),
"time": "2025-04-02T09:00:00+08:00",
}
