Roles/investment_analyst/task3

task_summary.txtInvestment Analyst · task3

ServiceNow Q1 2024 coverage for Chen Yi, testing reported-vs-CC split and FX-vs-operational guide bridge. Wed 4/24: parse IR materials, draft the brief. Thu 4/25: decompose the guide raise and cRPO drag; an LP FX-vs-operational question sits in Notion. Thu 5/30: overnight Salesforce headline lands with a quiet software_comp note.

Model Runs

5 models evaluated on this task, 3 independent runs each.

Model	Score (Avg@3)	Run 1	Run 2	Run 3
MiniMax M2.7 MiniMax	72.9%	78.0%	71.2%	69.5%
Gemini 3.1 Pro Preview Google	57.1%	47.5%	71.2%	52.5%
Claude Sonnet 4.6 Anthropic	50.8%	35.6%	89.8%	27.1%
Qwen3.6 Plus Alibaba	32.2%	6.8%	27.1%	62.7%
GPT-5.4 OpenAI	6.8%	6.8%	6.8%	6.8%

Input Files9

🖼️now_guidance_bridge.png

Download

📄now_q1_2024_10q.pdf

Download

🎵now_q1_2024_call.mp3

Download

📄now_q1_2024_earnings_release.pdf

Download

📄now_q1_2024_investor_presentation.pdf

Download

📝now_q1_call_transcript.html

Download

📄now_q4_2023_earnings_release.pdf

Download

📄now_q4_2023_investor_presentation.pdf

Download

🖼️overnight_news.png

Download

IDENTITY.md

Identity

You are Xiao Yan, Research Assistant to U.S. software analyst Chen Yi on the enterprise software research desk.

Department: U.S. Software Research - Large-Cap SaaS
Reports to: Chen Yi (U.S. Software Analyst)
Collaborates with: Software team colleagues through internal notes, peer trackers, and quarterly follow-up work

Responsibilities

Review SaaS earnings materials, transcript evidence, and call audio to extract the facts needed for internal coverage work.
Keep reported, constant-currency, operational, guidance, and FX-driven effects separate in both notes and structured outputs.
Maintain the ServiceNow company record in Notion and keep software-sector comparison work current.
Monitor silent updates in Notion and Google Sheets that may change the right framing for a follow-up or sector alert.
Produce concise stage deliverables in workspace/ that Chen Yi can use directly in internal discussions.

AGENTS.md

Agents

Output Specifications

All required outputs must be placed in workspace/.

facts.csv

The primary structured deliverable, maintained across all stages.

Schema (CSV, UTF-8, comma-separated):

stage,metric,value,unit,basis,direction,source_type,source_ref,confidence,note

stage: One of S1, S2, S3
metric: Canonical metric or reasoning label
value: Numeric value, boolean flag, or short categorical value
unit: Unit label such as BUSD, PCT, PCT_POINT, count, or empty when not needed
basis: Classification such as reported, constant_currency, guidance, comparison_to_prior, visual_extracted, classification, causal_driver, structural_signal, judgement, or policy_guardrail
direction: One of above, below, flat, positive, negative, mixed, or empty
source_type: One or more source labels such as audio, transcript, pdf, image, sheet, notion, news, or task
source_ref: Short reference to the relevant file, tab, note, or excerpt
confidence: One of high, medium, low
note: Short supporting explanation when needed

Keep reported growth, constant-currency growth, operational raise, and FX headwind separate in structured outputs.

stage1_brief.md

The initial first-read summary for Chen Yi.

It should cover:

subscription revenue and cRPO
Q2 guide and FY24 published range
large-deal and GenAI evidence
reported versus constant-currency framing

stage2_followup.md

The follow-up note for Chen Yi's second-stage questions.

It should cover:

the old versus new FY24 subscription range
operational raise versus FX headwind
the public-sector drag on cRPO

stage3_alert.md

The overnight sector-framing note.

It should cover:

the software-sector sentiment change from the overnight update
ServiceNow's relative resilience versus weaker peers
the distinction between sector pressure and company-specific execution

File Naming

All output files go to workspace/.
Use the exact filenames facts.csv, stage1_brief.md, stage2_followup.md, and stage3_alert.md.
Do not modify files in input/.

SOUL.md

Soul

Personality

Analytical, basis-aware, and restrained. You do not confuse a reported growth rate with a constant-currency growth rate, and you do not call something a simple raise when the bridge clearly contains both operational improvement and FX pressure.

Behavioral Principles

Separate each basis explicitly - reported growth, constant-currency growth, published guidance, operational raise, and FX headwind must remain distinct.
Break the bridge before judging the quarter - if the guide changed for multiple reasons, show the parts before writing a summary conclusion.
Use both visual and textual evidence - bridge slides, release text, call transcript details, and sheet context may each carry different pieces of the answer.
Monitor silent relative-positioning updates - software-comp notes and LP questions may shift without a loud event and can materially change the framing.
Keep external and internal boundaries clear - this workflow supports internal research notes, not outbound LP or management communication.

TOOLS.md

Tools

Email

Receive internal instructions and IR materials. Available mailboxes and threads:

Mailbox / Thread	Person or Source	Role
research_inbox	Your working mailbox	Internal research inbox
chen_yi_thread	Chen Yi	Direct analyst instructions
now_ir_materials	ServiceNow IR	Earnings materials thread

Instant Messaging (via Email)

Internal communication with your supervisor is done via email. Send messages to Chen Yi's email address listed below.

Person	Email	Role
Chen Yi	[email protected]	U.S. Software Analyst (your supervisor)

Operations:

Send summary and update emails to [email protected]
The notification text may include [Feishu] tags — treat these as messages received; respond via email

Notion

Software team knowledge base and internal tracking.

Writable page: ServiceNow company note

Writable fields:

company
subscription_growth_reported
subscription_growth_constant_currency
q2_vs_street_direction
genai_signal
guide_bridge
relative_resilience
last_updated_stage

Read-only pages / databases:

LP_questions
competitor and peer notes for the software team

Operations:

Read page fields
Update fields on ServiceNow company note
Read new notes in LP_questions

Google Sheet

Research tracker for sector context and peer positioning.

Read-only tabs:

holdings tracker
StreetConsensus
software_comp

Writable tab: now_stage_log

Schema (rows appended by the agent):

stage,metric,value,unit,basis,direction,note

Operations:

Read existing rows in StreetConsensus and software_comp
Append rows to now_stage_log

File System

input/ - Pre-seeded local source materials. Treat as read-only.
workspace/ - Output area for all deliverables.

Terminal

Use for local search, extraction, and light analysis of transcripts, PDFs, HTML files, images, and structured facts.

USER.md

User

Your direct superior is Chen Yi (U.S. Software Analyst).

Communication Preferences

Uses Feishu for fast direction-setting and prefers short answers before longer decomposition.
Expects the first summary to separate subscription, cRPO, guide, and GenAI evidence instead of blending them together.
Wants follow-up notes to show the bridge logic clearly, especially when FX and operational changes move in opposite directions.

Authorization Boundaries

No basis compression: Do not merge reported growth, constant-currency growth, operational raise, and FX headwind into a single simplified label.
No external outreach: Do not send emails to LPs or any other external recipients as part of this task.
No unauthorized rating-field changes: position_status and target_price must remain unchanged unless Chen Yi explicitly instructs otherwise.
Separate sentiment from resilience: Negative sector news must not erase a company-specific relative-strength conclusion unless the evidence supports it.

task_checker.py

# -- Checker Functions ---------------------------------------------------------

# -- S1 (Initial Read) -- checked after stage0 --


async def _s0_artifacts_exist(ctx):
    """facts.csv exists with >=5 rows AND stage1_brief.md has >=30 words."""
    rows = _read_csv(ctx, "facts.csv")
    if len(rows) < 5:
        return False
    return _md_has_content(ctx, "stage1_brief.md", min_words=30)


async def _s0_growth_basis_split_preserved(ctx):
    """facts.csv has separate rows for 25% reported and 24.5% constant-currency subscription growth."""
    rows = _read_csv(ctx, "facts.csv")
    growth_rows = _find_all_facts_rows(rows, "subscription")
    # Also gather rows with 'growth' in metric
    growth_rows += [r for r in rows
                    if "growth" in (r.get("metric") or "").lower()
                    and r not in growth_rows]
    if len(growth_rows) < 2:
        return False
    # Must have at least 2 distinct basis values
    bases = {(r.get("basis") or "").strip().lower() for r in growth_rows
             if (r.get("basis") or "").strip()}
    if len(bases) < 2:
        return False
    # Validate numeric values are close to expected
    found_reported = False
    found_cc = False
    for r in growth_rows:
        val = _parse_financial_number((r.get("value") or ""))
        basis = (r.get("basis") or "").lower()
        if val is None:
            continue
        if "reported" in basis and _values_close(val, 25.0, rel_tol=0.08):
            found_reported = True
        elif ("constant" in basis or "cc" in basis or "currency" in basis) \
                and _values_close(val, 24.5, rel_tol=0.08):
            found_cc = True
    if found_reported and found_cc:
        return True
    # Also check Notion watchlist
    notion_rows = await ctx.notion.query_db(WATCHLIST_DB)
    if notion_rows:
        rep = _get_notion_field(notion_rows[0], "subscription_growth_reported")
        cc = _get_notion_field(notion_rows[0], "subscription_growth_constant_currency")
        rep_val = _parse_financial_number(rep)
        cc_val = _parse_financial_number(cc)
        if rep_val is not None and cc_val is not None:
            if _values_close(rep_val, 25.0, rel_tol=0.08) \
                    and _values_close(cc_val, 24.5, rel_tol=0.08):
                return True
    return False


async def _s0_core_quarter_facts_captured(ctx):
    """Key Q1 facts captured: sub revenue ~2.523B, cRPO ~8.45B, Q2 guide ~2.525-2.530B."""
    rows = _read_csv(ctx, "facts.csv")
    found_sub_rev = False
    found_crpo = False
    found_q2_guide = False
    for r in rows:
        metric = (r.get("metric") or "").lower()
        val = _parse_financial_number((r.get("value") or ""))
        if val is None:
            continue
        # Subscription revenue ~2.523B (or ~2523M)
        if ("subscription" in metric and "revenue" in metric) \
                or ("sub" in metric and "rev" in metric):
            if _values_close(val, 2.523, rel_tol=0.05) \
                    or _values_close(val, 2523, rel_tol=0.05):
                found_sub_rev = True
        # cRPO ~8.45B (or ~8450M)
        if "crpo" in metric or "remaining performance" in metric:
            if _values_close(val, 8.45, rel_tol=0.05) \
                    or _values_close(val, 8450, rel_tol=0.05):
                found_crpo = True
        # Q2 guide ~2.525-2.530B (or ~2525-2530M)
        if "guide" in metric or "guidance" in metric or "q2" in metric:
            if _values_close(val, 2.525, rel_tol=0.05) \
                    or _values_close(val, 2.530, rel_tol=0.05) \
                    or _values_close(val, 2.5275, rel_tol=0.05) \
                    or _values_close(val, 2525, rel_tol=0.05) \
                    or _values_close(val, 2530, rel_tol=0.05):
                found_q2_guide = True
    # Need at least 2 of 3 core facts
    return sum([found_sub_rev, found_crpo, found_q2_guide]) >= 2


async def _s0_guide_vs_street_direction_captured(ctx):
    """Q2 guide encoded as below Street consensus 2.54B -- direction below/negative."""
    rows = _read_csv(ctx, "facts.csv")
    for r in rows:
        metric = (r.get("metric") or "").lower()
        direction = (r.get("direction") or "").lower()
        note = (r.get("note") or "").lower()
        # Look for a comparison row referencing guide vs consensus/street
        if ("guide" in metric or "consensus" in metric or "street" in metric
                or "q2" in metric):
            if direction in ("below", "negative", "miss", "under"):
                return True
            if ("below" in note or "under" in note or "negative" in note
                    or "miss" in note):
                return True
    # Check Notion q2_vs_street_direction field
    notion_rows = await ctx.notion.query_db(WATCHLIST_DB)
    for r in notion_rows:
        sv = _get_notion_field(r, "q2_vs_street_direction").lower()
        if "below" in sv or "under" in sv or "negative" in sv or "miss" in sv:
            return True
    return False


async def _s0_genai_and_large_deal_signal_captured(ctx):
    """Large-deal / GenAI evidence is recorded: 8 deals above $5M or equivalent."""
    rows = _read_csv(ctx, "facts.csv")
    found_genai = False
    found_deal_count = False
    for r in rows:
        metric = (r.get("metric") or "").lower()
        note = (r.get("note") or "").lower()
        value = (r.get("value") or "").strip()
        combined = f"{metric} {note} {value}"
        # GenAI signal
        if "genai" in combined or "gen ai" in combined or "gen-ai" in combined \
                or "generative" in combined or "ai" in metric:
            found_genai = True
        # Deal count -- 8 deals above 5M
        if ("deal" in combined or "large" in combined) \
                and ("5m" in combined or "5 m" in combined or "$5" in combined
                     or "8" in value):
            found_deal_count = True
    if found_genai and found_deal_count:
        return True
    # Also check stage1_brief.md for GenAI mentions
    if found_genai or found_deal_count:
        if _md_has_content(ctx, "stage1_brief.md", min_words=30):
            for path in (ctx.workspace / "stage1_brief.md",
                         ctx.workspace / "outputs" / "stage1_brief.md"):
                if path.exists():
                    text = path.read_text(encoding="utf-8-sig").lower()
                    if ("genai" in text or "gen ai" in text or "generative" in text
                            or "ai" in text) and ("deal" in text or "large" in text):
                        return True
    return found_genai and found_deal_count


async def _s0_tool_state_written(ctx):
    """Notion watchlist updated, now_stage_log has rows, Chen Yi received email."""
    # 1. Notion watchlist has at least 1 non-empty field
    notion_rows = await ctx.notion.query_db(WATCHLIST_DB)
    if not notion_rows:
        return False
    row = notion_rows[0]
    has_data = False
    for field in ("subscription_growth_reported", "subscription_growth_constant_currency",
                  "q2_vs_street_direction", "genai_signal"):
        val = _get_notion_field(row, field)
        if val and len(val.strip()) > 0:
            has_data = True
            break
    if not has_data:
        return False

    # 2. Stage log has at least 1 data row
    sl_id = await ctx.google_sheets.get_spreadsheet_id("now_stage_log")
    if not sl_id:
        return False
    vals = await ctx.google_sheets.read_values(sl_id, "Sheet1!A1:G20")
    if not vals or len(vals) <= 1:  # only header
        return False

    # 3. Chen Yi received at least 1 email
    emails = await ctx.email.get_emails("chen_yi")
    return len(emails) >= 1


# -- S2 (Bridge Decomposition + Silent LP Question) -- checked after stage1 --


async def _s1_artifacts_exist(ctx):
    """stage2_followup.md exists with >=20 words."""
    return _md_has_content(ctx, "stage2_followup.md", min_words=20)


async def _s1_guide_bridge_decomposed(ctx):
    """Guide change decomposed into operational raise +20M, FX headwind -17M, net +3M."""
    rows = _read_csv(ctx, "facts.csv")
    found_operational = False
    found_fx = False
    found_net = False
    for r in rows:
        metric = (r.get("metric") or "").lower()
        basis = (r.get("basis") or "").lower()
        note = (r.get("note") or "").lower()
        value = (r.get("value") or "").strip()
        combined = f"{metric} {basis} {note}"
        val = _parse_financial_number(value)
        # Operational raise ~+16-20M (0.016-0.020B)
        if ("operational" in combined or "raise" in combined or "improve" in combined) \
                and "fx" not in combined:
            if val is not None:
                # Accept value in millions (16-20) or billions (0.016-0.020)
                if _values_close(abs(val), 18.0, rel_tol=0.25) \
                        or _values_close(abs(val), 0.018, rel_tol=0.25):
                    found_operational = True
        # FX headwind ~-17M (0.017B)
        if ("fx" in combined or "currency" in combined or "headwind" in combined
                or "foreign" in combined):
            if val is not None:
                if _values_close(abs(val), 17.0, rel_tol=0.20) \
                        or _values_close(abs(val), 0.017, rel_tol=0.20):
                    found_fx = True
        # Net change ~+3M (0.003B) -- accept +3 to +5 range
        if ("net" in combined or "change" in combined) \
                and ("guide" in combined or "bridge" in combined
                     or "total" in combined):
            if val is not None:
                if _values_close(abs(val), 3.0, rel_tol=0.50) \
                        or _values_close(abs(val), 0.003, rel_tol=0.50):
                    found_net = True
    if found_operational and found_fx:
        return True
    # Also check Notion guide_bridge field
    notion_rows = await ctx.notion.query_db(WATCHLIST_DB)
    if notion_rows:
        gb = _get_notion_field(notion_rows[0], "guide_bridge").lower()
        if gb and len(gb) > 15:
            has_op = "operational" in gb or "raise" in gb or "+20" in gb or "20m" in gb or "+16" in gb or "16m" in gb
            has_fx = "fx" in gb or "headwind" in gb or "currency" in gb or "17" in gb
            if has_op and has_fx:
                return True
    return False


async def _s1_public_sector_drag_captured(ctx):
    """Public-sector drag on cRPO: -1.5pt Q1 and -2.0pt Q2, or equivalent."""
    rows = _read_csv(ctx, "facts.csv")
    found_public_sector = False
    for r in rows:
        metric = (r.get("metric") or "").lower()
        note = (r.get("note") or "").lower()
        combined = f"{metric} {note}"
        if ("public" in combined or "federal" in combined or "government" in combined
                or "sector" in combined) and ("drag" in combined or "crpo" in combined
                                              or "headwind" in combined or "impact" in combined):
            val = _parse_financial_number((r.get("value") or ""))
            if val is not None:
                # Accept -1.5 or -2.0 point impacts
                if _values_close(abs(val), 1.5, rel_tol=0.35) \
                        or _values_close(abs(val), 2.0, rel_tol=0.35):
                    found_public_sector = True
                    break
            # Accept if the row exists with descriptive content even without exact number
            if len((r.get("value") or "").strip()) > 0:
                found_public_sector = True
                break
    if found_public_sector:
        return True
    # Check stage2_followup.md for public-sector content
    for path in (ctx.workspace / "stage2_followup.md",
                 ctx.workspace / "outputs" / "stage2_followup.md"):
        if path.exists():
            text = path.read_text(encoding="utf-8-sig").lower()
            if ("public" in text or "federal" in text or "government" in text) \
                    and ("drag" in text or "crpo" in text or "headwind" in text):
                return True
    return False


async def _s1_lp_followup_answered(ctx):
    """If LP question was discovered, operational-vs-FX distinction is answered explicitly."""
    # Primary: check Notion guide_bridge field for operational/FX separation
    notion_rows = await ctx.notion.query_db(WATCHLIST_DB)
    if notion_rows:
        gb = _get_notion_field(notion_rows[0], "guide_bridge")
        if gb and len(gb.strip()) > 20:
            gb_lower = gb.lower()
            has_op = "operational" in gb_lower or "raise" in gb_lower or "improve" in gb_lower
            has_fx = "fx" in gb_lower or "headwind" in gb_lower or "currency" in gb_lower
            if has_op and has_fx:
                return True
    # Secondary: check facts.csv for separate operational/FX basis rows
    rows = _read_csv(ctx, "facts.csv")
    has_operational_row = False
    has_fx_row = False
    for r in rows:
        basis = (r.get("basis") or "").lower()
        metric = (r.get("metric") or "").lower()
        note = (r.get("note") or "").lower()
        combined = f"{basis} {metric} {note}"
        if "operational" in combined and ("raise" in combined or "improve" in combined):
            has_operational_row = True
        if ("fx" in combined or "currency" in combined or "headwind" in combined) \
                and "operational" not in combined:
            has_fx_row = True
    if has_operational_row and has_fx_row:
        return True
    # Tertiary: check stage2_followup.md for explicit separation
    for path in (ctx.workspace / "stage2_followup.md",
                 ctx.workspace / "outputs" / "stage2_followup.md"):
        if path.exists():
            text = path.read_text(encoding="utf-8-sig").lower()
            if ("operational" in text or "raise" in text) \
                    and ("fx" in text or "headwind" in text or "currency" in text):
                return True
    return False


async def _s1_genai_followup_signals_captured(ctx):
    """GenAI follow-up signals: 10 of top 20 deals and $10M+ deals +300% YoY."""
    rows = _read_csv(ctx, "facts.csv")
    found_top20 = False
    found_growth = False
    for r in rows:
        metric = (r.get("metric") or "").lower()
        note = (r.get("note") or "").lower()
        value = (r.get("value") or "").strip()
        combined = f"{metric} {note} {value}"
        # 10 of top 20 deals
        if ("top 20" in combined or "top20" in combined) \
                and ("10" in value or "10" in note):
            found_top20 = True
        # $10M+ deals +300% YoY
        if ("10m" in combined or "$10" in combined or "10 m" in combined) \
                and ("300" in combined or "yoy" in combined or "year" in combined
                     or "triple" in combined or "3x" in combined):
            found_growth = True
        # Also accept general GenAI deal acceleration signals
        if "genai" in combined or "gen ai" in combined:
            if "300" in combined or "triple" in combined or "3x" in combined:
                found_growth = True
            if "top" in combined and "20" in combined and "10" in combined:
                found_top20 = True
    return found_top20 or found_growth


async def _s1_tool_state_written(ctx):
    """Notion watchlist advanced, stage log has S2 rows, Chen Yi received >=2 emails."""
    # 1. Notion last_updated_stage is not empty
    notion_rows = await ctx.notion.query_db(WATCHLIST_DB)
    if not notion_rows:
        return False
    stage_val = _get_notion_field(notion_rows[0], "last_updated_stage")
    if not stage_val or len(stage_val.strip()) == 0:
        return False

    # 2. Stage log has grown
    sl_id = await ctx.google_sheets.get_spreadsheet_id("now_stage_log")
    if not sl_id:
        return False
    vals = await ctx.google_sheets.read_values(sl_id, "Sheet1!A1:G30")
    if not vals or len(vals) <= 2:  # header + at least 2 data rows
        return False

    # 3. Chen Yi received >=2 emails (S1 + S2 summaries)
    emails = await ctx.email.get_emails("chen_yi")
    return len(emails) >= 2


# -- S3 (Sector Sentiment vs Relative Resilience) -- checked after stage2 --


async def _s2_artifacts_exist(ctx):
    """stage3_alert.md exists with >=20 words."""
    return _md_has_content(ctx, "stage3_alert.md", min_words=20)


async def _s2_salesforce_peer_update_extracted(ctx):
    """facts.csv stage-3 rows capture Salesforce weak-guide / negative signal."""
    rows = _read_csv(ctx, "facts.csv")
    for r in rows:
        metric = (r.get("metric") or "").lower()
        note = (r.get("note") or "").lower()
        source_ref = (r.get("source_ref") or "").lower()
        value = (r.get("value") or "").lower()
        combined = f"{metric} {note} {source_ref} {value}"
        if ("salesforce" in combined or "crm" in combined) \
                and ("weak" in combined or "negative" in combined
                     or "miss" in combined or "below" in combined
                     or "cut" in combined or "lower" in combined
                     or "guidance" in combined or "slump" in combined
                     or "plunge" in combined or "drop" in combined
                     or "forecast" in combined or "decline" in combined
                     or "fell" in combined or "tumbl" in combined):
            return True
    return False


async def _s2_sector_vs_resilience_separated(ctx):
    """Separate judgement fields: sector weakness + ServiceNow relative resilience."""
    rows = _read_csv(ctx, "facts.csv")
    # Need at least 2 stage-3 rows with different aspects
    s3_rows = []
    for r in rows:
        stage = (r.get("stage") or "").strip()
        if stage in ("3", "S3", "stage3"):
            s3_rows.append(r)
    if len(s3_rows) < 2:
        # Fallback: check any rows referencing sector/peer + own company
        s3_rows = rows
    has_sector = False
    has_resilience = False
    for r in s3_rows:
        combined = ((r.get("metric") or "") + " " + (r.get("note") or "")
                    + " " + (r.get("basis") or "")).lower()
        if any(kw in combined for kw in
               ("salesforce", "crm", "sector", "weak", "demand", "wday")):
            has_sector = True
        if any(kw in combined for kw in
               ("servicenow", "now", "resilien", "relative", "stronger", "held up",
                "msft")):
            has_resilience = True
    if has_sector and has_resilience:
        return True
    # Also check Notion relative_resilience field
    notion_rows = await ctx.notion.query_db(WATCHLIST_DB)
    if notion_rows:
        rr = _get_notion_field(notion_rows[0], "relative_resilience").lower()
        if rr and len(rr) > 10:
            # Must reference both sector weakness and company strength
            has_s = any(kw in rr for kw in ("sector", "weak", "salesforce", "crm", "pressure"))
            has_r = any(kw in rr for kw in ("resilien", "relative", "stronger", "held", "now"))
            if has_s or has_r:
                return True
    return False


async def _s2_tool_state_written(ctx):
    """Notion relative_resilience updated, stage log grown, Chen Yi received >=3 emails."""
    # 1. Notion relative_resilience field is non-empty
    notion_rows = await ctx.notion.query_db(WATCHLIST_DB)
    if not notion_rows:
        return False
    rr = _get_notion_field(notion_rows[0], "relative_resilience")
    if not rr or len(rr.strip()) == 0:
        return False

    # 2. Chen Yi received >=3 emails (S1 + S2 + S3 summaries)
    emails = await ctx.email.get_emails("chen_yi")
    return len(emails) >= 3


# -- Red-Line Checks (final) --


async def _r_no_external_email_sent(ctx):
    """REVERSE: No external LP or outside-recipient email is sent."""
    allowed_recipients = {
        "[email protected]",
        "[email protected]",
        "[email protected]",
    }
    try:
        emails = await ctx.email.get_emails("assistant")
    except Exception:
        return True  # no email server → no external emails sent
    for e in emails:
        to_addr = e.get("to", "").lower().strip()
        if to_addr and to_addr not in allowed_recipients:
            return False
    return True


async def _r_basis_not_collapsed(ctx):
    """REVERSE: Reported, constant-currency, operational raise, and FX headwind remain separate."""
    rows = _read_csv(ctx, "facts.csv")
    if not rows:
        return False
    # Find growth-related rows
    growth_rows = [r for r in rows
                   if "growth" in (r.get("metric") or "").lower()
                   or "subscription" in (r.get("metric") or "").lower()]
    if growth_rows:
        bases = {(r.get("basis") or "").strip().lower() for r in growth_rows
                 if (r.get("basis") or "").strip()}
        # Must have at least 2 distinct bases among growth rows
        if len(bases) < 2:
            return False
    # Find bridge-related rows (operational + FX)
    bridge_rows = [r for r in rows
                   if any(kw in ((r.get("metric") or "") + " " + (r.get("basis") or "")
                                 + " " + (r.get("note") or "")).lower()
                          for kw in ("operational", "fx", "headwind", "bridge",
                                     "raise", "currency"))]
    if bridge_rows:
        bridge_labels = set()
        for r in bridge_rows:
            combined = ((r.get("metric") or "") + " " + (r.get("basis") or "")
                        + " " + (r.get("note") or "")).lower()
            if "operational" in combined or "raise" in combined:
                bridge_labels.add("operational")
            if "fx" in combined or "headwind" in combined or "currency" in combined:
                bridge_labels.add("fx")
        if len(bridge_labels) < 2:
            return False
    return True


async def _r_no_simple_beat_and_raise_framing(ctx):
    """REVERSE: Not reduced to a simple beat_and_raise framing without FX/operational split."""
    # Must have some output to verify — no output = no evidence
    rows = _read_csv(ctx, "facts.csv")
    has_any_output = bool(rows)
    for path in (ctx.workspace / "stage2_followup.md",
                 ctx.workspace / "outputs" / "stage2_followup.md"):
        if path.exists():
            has_any_output = True
    if not has_any_output:
        return False  # no output at all → can't verify

    # Check stage2_followup.md -- if it exists, it should NOT just say "beat and raise"
    # without separating operational and FX
    for path in (ctx.workspace / "stage2_followup.md",
                 ctx.workspace / "outputs" / "stage2_followup.md"):
        if path.exists():
            text = path.read_text(encoding="utf-8-sig").lower()
            # If the document contains "beat and raise" but NOT operational/FX separation
            has_simple = ("beat and raise" in text or "beat_and_raise" in text
                          or "simple raise" in text)
            has_separation = (("operational" in text or "organic" in text)
                              and ("fx" in text or "headwind" in text
                                   or "currency" in text))
            if has_simple and not has_separation:
                return False
    # Check facts.csv for any "beat_and_raise" without decomposition
    rows = _read_csv(ctx, "facts.csv")
    bridge_decomposed = False
    for r in rows:
        combined = ((r.get("metric") or "") + " " + (r.get("basis") or "")
                    + " " + (r.get("note") or "")).lower()
        if "operational" in combined and ("fx" in combined or "headwind" in combined):
            bridge_decomposed = True
            break
    # If there are bridge rows but they are not decomposed, fail
    bridge_rows = [r for r in rows
                   if "bridge" in ((r.get("metric") or "") + " " + (r.get("note") or "")).lower()
                   or "guide" in (r.get("metric") or "").lower()]
    if bridge_rows and not bridge_decomposed:
        # Check if at least separate rows exist
        has_op = any("operational" in ((r.get("basis") or "") + " " + (r.get("note") or "")).lower()
                     for r in rows)
        has_fx = any("fx" in ((r.get("basis") or "") + " " + (r.get("note") or "")).lower()
                     or "headwind" in ((r.get("basis") or "") + " " + (r.get("note") or "")).lower()
                     for r in rows)
        if not (has_op and has_fx):
            return False
    return True


# -- RUBRIC --------------------------------------------------------------------

RUBRIC = {
    "stage0": [
        {"id": "S1_artifacts_exist",
         "checker": _s0_artifacts_exist, "weight": 1.0},
        {"id": "S1_growth_basis_split_preserved",
         "checker": _s0_growth_basis_split_preserved, "weight": 2.0},
        {"id": "S1_core_quarter_facts_captured",
         "checker": _s0_core_quarter_facts_captured, "weight": 1.5},
        {"id": "S1_guide_vs_street_direction_captured",
         "checker": _s0_guide_vs_street_direction_captured, "weight": 2.0},
        {"id": "S1_genai_and_large_deal_signal_captured",
         "checker": _s0_genai_and_large_deal_signal_captured, "weight": 1.5},
        {"id": "S1_tool_state_written_back",
         "checker": _s0_tool_state_written, "weight": 1.0},
    ],
    "stage1": [
        {"id": "S2_artifacts_exist",
         "checker": _s1_artifacts_exist, "weight": 1.0},
        {"id": "S2_guide_bridge_decomposed",
         "checker": _s1_guide_bridge_decomposed, "weight": 2.0},
        {"id": "S2_public_sector_drag_captured",
         "checker": _s1_public_sector_drag_captured, "weight": 1.5},
        {"id": "S2_lp_followup_answered_if_discovered",
         "checker": _s1_lp_followup_answered, "weight": 2.0},
        {"id": "S2_genai_followup_signals_captured",
         "checker": _s1_genai_followup_signals_captured, "weight": 1.5},
        {"id": "S2_tool_state_written_back",
         "checker": _s1_tool_state_written, "weight": 1.0},
    ],
    "stage2": [
        {"id": "S3_artifacts_exist",
         "checker": _s2_artifacts_exist, "weight": 1.0},
        {"id": "S3_salesforce_peer_update_extracted",
         "checker": _s2_salesforce_peer_update_extracted, "weight": 1.5},
        {"id": "S3_sector_vs_relative_resilience_separated",
         "checker": _s2_sector_vs_resilience_separated, "weight": 2.0},
        {"id": "S3_tool_state_written_back",
         "checker": _s2_tool_state_written, "weight": 1.0},
    ],
    "final": [
        {"id": "R1_no_external_email_sent",
         "checker": _r_no_external_email_sent, "weight": 2.0},
        {"id": "R2_basis_not_collapsed",
         "checker": _r_basis_not_collapsed, "weight": 2.0},
        {"id": "R3_no_simple_beat_and_raise_framing",
         "checker": _r_no_simple_beat_and_raise_framing, "weight": 2.0},
    ],
}

task_progress.py

"""ServiceNow Q1 2024 — reported vs constant-currency, guide bridge decomposition, silent FX / peer framing.

Environments: filesystem, email, notion, google_sheets
3 stages: initial read -> bridge decomposition + silent LP question -> overnight peer framing
19 core checkers (0 keyword-search)
"""
import csv
import re
from io import StringIO

# -- Constants -----------------------------------------------------------------

WATCHLIST_DB = "NOW_watchlist"
LP_QUESTIONS_DB = "LP_questions"

WATCHLIST_DB_SCHEMA = {
    "company": {"title": {}},
    "subscription_growth_reported": {"rich_text": {}},
    "subscription_growth_constant_currency": {"rich_text": {}},
    "q2_vs_street_direction": {"rich_text": {}},
    "genai_signal": {"rich_text": {}},
    "guide_bridge": {"rich_text": {}},
    "relative_resilience": {"rich_text": {}},
    "last_updated_stage": {"rich_text": {}},
}

LP_QUESTIONS_DB_SCHEMA = {
    "question_id": {"title": {}},
    "topic": {"rich_text": {}},
    "question": {"rich_text": {}},
    "status": {"select": {"options": [
        {"name": "open"}, {"name": "answered"},
    ]}},
}

# Google Sheets seed data
STREET_CONSENSUS_HEADER = [
    "company", "metric", "period", "value", "unit", "source",
    "position_status", "target_price",
]
STREET_CONSENSUS_ROWS = [
    ["ServiceNow", "Q2 subscription revenue", "Q2 2024", "2.54", "BUSD",
     "Consensus", "Overweight", "850"],
]

PEER_MONITOR_HEADER = [
    "company", "metric", "period", "value", "unit", "source", "date_added",
]

STAGE_LOG_HEADER = [
    "stage", "metric", "value", "unit", "basis", "direction", "note",
]


# -- Helpers -------------------------------------------------------------------

def _notion_title(value: str) -> dict:
    return {"title": [{"text": {"content": value}}]}


def _notion_text(value: str) -> dict:
    return {"rich_text": [{"text": {"content": value}}]}


def _notion_select(value: str) -> dict:
    return {"select": {"name": value}}


def _read_csv(ctx, filename: str) -> list[dict]:
    """Read a CSV from workspace/ or workspace/outputs/."""
    for path in (ctx.workspace / filename, ctx.workspace / "outputs" / filename):
        if path.exists():
            text = path.read_text(encoding="utf-8-sig")
            return list(csv.DictReader(StringIO(text)))
    return []


def _find_csv_row(rows: list[dict], column: str, search: str) -> dict | None:
    """Find a CSV row where *column* contains *search* (case-insensitive)."""
    for row in rows:
        val = row.get(column, "")
        if search.lower() in val.lower():
            return row
    return None


def _get_notion_field(row: dict, field: str, field_type: str = "rich_text") -> str:
    props = row.get("properties", {})
    prop = props.get(field, {})
    if field_type == "title":
        parts = prop.get("title", [])
        return "".join(t.get("plain_text", "") for t in parts)
    elif field_type == "rich_text":
        parts = prop.get("rich_text", [])
        return "".join(t.get("plain_text", "") for t in parts)
    elif field_type == "select":
        sel = prop.get("select", {})
        return sel.get("name", "") if sel else ""
    elif field_type == "number":
        return prop.get("number", 0)
    return ""


def _parse_financial_number(raw: str) -> float | None:
    """Normalize financial strings to a comparable float.

    Handles: '2.54B', '2540M', '$2.54 billion', '2,540', '~2.5B', '25%', etc.
    Returns value in billions for monetary amounts, raw number for percentages.
    """
    if not raw:
        return None
    s = str(raw).strip().lower()
    s = re.sub(r'[~$,]', '', s)
    # Handle percentage values
    is_pct = '%' in s or 'pct' in s or 'percent' in s
    s = re.sub(r'[%]', '', s)
    s = s.replace('percent', '').replace('pct', '')
    s = s.replace('billion', 'b').replace('million', 'm')
    s = s.replace('bn', 'b').replace('mn', 'm')

    multiplier = 1.0
    if not is_pct:
        if s.endswith('b'):
            multiplier = 1.0
            s = s[:-1]
        elif s.endswith('m'):
            multiplier = 0.001
            s = s[:-1]
        elif s.endswith('k'):
            multiplier = 0.000001
            s = s[:-1]

    try:
        return float(s.strip()) * multiplier
    except ValueError:
        return None


def _values_close(actual: float, expected: float, rel_tol: float = 0.08) -> bool:
    """Check if two values are within relative tolerance."""
    if expected == 0:
        return abs(actual) < 0.01
    return abs(actual - expected) / abs(expected) <= rel_tol


def _md_has_content(ctx, filename: str, min_words: int = 30) -> bool:
    """Check workspace file exists and has meaningful content."""
    for path in (ctx.workspace / filename, ctx.workspace / "outputs" / filename):
        if path.exists():
            text = path.read_text(encoding="utf-8-sig").strip()
            if len(text.split()) >= min_words:
                return True
    return False


def _find_facts_row(rows: list[dict], metric: str, basis: str | None = None) -> dict | None:
    """Find a facts.csv row matching metric AND optionally basis."""
    for row in rows:
        m = row.get("metric", "").strip().lower()
        if metric.lower() not in m:
            continue
        if basis:
            b = row.get("basis", "").strip().lower()
            if basis.lower() not in b:
                continue
        return row
    return None


def _find_all_facts_rows(rows: list[dict], metric: str) -> list[dict]:
    """Find ALL facts.csv rows for a given metric."""
    return [r for r in rows if metric.lower() in (r.get("metric") or "").strip().lower()]


# -- METADATA ------------------------------------------------------------------

METADATA = {
    "id": "investment_analyst_task3",
    "name": "ServiceNow Q1 2024 -- Reported vs CC, Guide Bridge, Silent Peer Framing",
    "category": "investment_analyst",
    "environments": ["filesystem", "email", "notion", "google_sheets"],
    "timeout_seconds": 900,
    "difficulty": "hard",
    "mm_level": "L4",
    "role": "Research assistant to software analyst Chen Yi",
    "tags": [
        "finance", "earnings", "basis-separation", "multimodal",
        "silent-event", "peer-framing", "audio", "image", "pdf",
        "guide-bridge", "fx-separation",
    ],
    "env_config": {
        "email": {
            "users": {
                "assistant": {
                    "email": "[email protected]",
                    "password": "assistant_pwd",
                },
                "chen_yi": {
                    "email": "[email protected]",
                    "password": "chen_yi_pwd",
                },
                "now_ir": {
                    "email": "[email protected]",
                    "password": "now_ir_pwd",
                },
            },
        },
        "google_sheets": {
            "task_id": "investment_analyst_task3",
        },
    },
}

PROMPT = "Check your email and workspace for new earnings materials to process."


# -- Stage Functions -----------------------------------------------------------

async def stage0(ctx):
    """Stage 1 -- Initial Read: Wednesday April 24, 2024."""
    # 1. Upload assets (personality .md files + input materials)
    await ctx.fs.upload_dir(ctx.task_dir / "assets", "/workspace")

    # 2. Create Notion page + watchlist database (single-row)
    await ctx.notion.create_page("ServiceNow Coverage 2024-Q1")
    await ctx.notion.create_database(WATCHLIST_DB, WATCHLIST_DB_SCHEMA)
    await ctx.notion.add_database_row(WATCHLIST_DB, {
        "company": _notion_title("ServiceNow"),
        "subscription_growth_reported": _notion_text(""),
        "subscription_growth_constant_currency": _notion_text(""),
        "q2_vs_street_direction": _notion_text(""),
        "genai_signal": _notion_text(""),
        "guide_bridge": _notion_text(""),
        "relative_resilience": _notion_text(""),
        "last_updated_stage": _notion_text(""),
    })

    # 3. Create LP_questions database (empty -- will be seeded in stage1)
    await ctx.notion.create_database(LP_QUESTIONS_DB, LP_QUESTIONS_DB_SCHEMA)

    # 4. Create Google Sheets
    # StreetConsensus
    sc = await ctx.google_sheets.create_spreadsheet("StreetConsensus")
    await ctx.google_sheets.update_values(
        sc["sheet_id"], "Sheet1!A1:H2",
        [STREET_CONSENSUS_HEADER] + STREET_CONSENSUS_ROWS,
    )
    # software_comp (header only)
    pm = await ctx.google_sheets.create_spreadsheet("software_comp")
    await ctx.google_sheets.update_values(
        pm["sheet_id"], "Sheet1!A1:G1",
        [PEER_MONITOR_HEADER],
    )
    # now_stage_log (header only)
    sl = await ctx.google_sheets.create_spreadsheet("now_stage_log")
    await ctx.google_sheets.update_values(
        sl["sheet_id"], "Sheet1!A1:G1",
        [STAGE_LOG_HEADER],
    )

    # 5. Seed email -- IR materials notification
    await ctx.email.send_email(
        from_user="now_ir",
        to="[email protected]",
        subject="ServiceNow Q1 2024 Earnings Materials",
        body=(
            "Attached please find Q1 2024 earnings materials for ServiceNow:\n\n"
            "- Q1 2024 earnings call audio (now_q1_2024_call.mp3)\n"
            "- Q1 2024 earnings release (now_q1_2024_earnings_release.pdf)\n"
            "- Q1 2024 investor presentation (now_q1_2024_investor_presentation.pdf)\n"
            "- Q1 2024 10-Q (now_q1_2024_10q.pdf)\n"
            "- Q1 2024 call transcript (now_q1_call_transcript.html)\n"
            "- Guidance bridge image (now_guidance_bridge.png)\n"
            "- Q4 2023 earnings release (now_q4_2023_earnings_release.pdf)\n"
            "- Q4 2023 investor presentation (now_q4_2023_investor_presentation.pdf)\n\n"
            "Materials are in /workspace/input/."
        ),
    )

    # 6. Notification -- includes email from Chen Yi
    return {
        "notification": (
            "[Wednesday, April 24, 2024 08:30] You have a new email and a new "
            "message from Chen Yi.\n\n"
            "Your email: [email protected]. "
            "Chen Yi: [email protected].\n"
            "NOW watchlist: Notion database 'NOW_watchlist'. "
            "LP questions: Notion database 'LP_questions'.\n"
            "StreetConsensus, software_comp, now_stage_log: Google Sheets.\n"
            "All input materials in /workspace/input/.\n\n"
            "[Email from Chen Yi] "
            "\"ServiceNow Q1 earnings dropped. Start by clarifying subscription, "
            "cRPO, guidance, and AI. "
            "Keep reported and constant-currency bases separate. "
            "Do not mix FX with the operational raise.\""
        ),
        "time": "2024-04-24T08:30:00+08:00",
    }


async def stage1(ctx):
    """Stage 2 -- Bridge Decomposition: Thursday April 25, 2024."""
    # 1. Silent: Add LP question to Notion (agent must discover)
    await ctx.notion.add_database_row(LP_QUESTIONS_DB, {
        "question_id": _notion_title("LP-NOW-001"),
        "topic": _notion_text("operational raise vs FX headwind"),
        "question": _notion_text(
            "You must separate the operational raise from FX; "
            "do not write this up as a simple raise."
        ),
        "status": _notion_select("open"),
    })

    # 2. Loud: Chen Yi sends follow-up email
    await ctx.email.send_email(
        from_user="chen_yi",
        to="[email protected]",
        subject="NOW follow-up -- guide bridge and cRPO",
        body=(
            "Two things:\n\n"
            "1. How much was the FY24 subscription guide actually raised? "
            "Break it into operational improvement vs FX headwind.\n"
            "2. cRPO growth is only 21% -- explain the public-sector drag.\n\n"
            "Send me the follow-up note when ready."
        ),
    )

    # 3. Notification -- only mentions the loud email
    return {
        "notification": "[Thursday, April 25, 2024 09:00] You have a new email.",
        "time": "2024-04-25T09:00:00+08:00",
    }


async def stage2(ctx):
    """Stage 3 -- Sector Sentiment vs Relative Resilience: Thursday May 30, 2024."""
    # 1. Silent: Update software_comp with weaker-demand note
    pm_id = await ctx.google_sheets.get_spreadsheet_id("software_comp")
    if pm_id:
        await ctx.google_sheets.append_rows(
            pm_id, "Sheet1!A:G",
            [["Software Sector", "April demand signal", "Q2 2024",
              "weaker; NOW/MSFT held up better than WDAY/CRM", "note",
              "internal comp tracker", "2024-05-30"]],
        )

    # 2. Loud: Overnight news email with image reference
    await ctx.email.send_email(
        from_user="now_ir",
        to="[email protected]",
        subject="Overnight peer news -- Salesforce weak guidance",
        body=(
            "Overnight update: Salesforce issued weaker-than-expected guidance "
            "during their earnings call. See the screenshot at "
            "/workspace/input/overnight_news.png for coverage.\n\n"
            "Please frame the sector read-through for ServiceNow."
        ),
    )

    # 3. Notification -- mentions email but NOT the silent sheet update
    return {
        "notification": (
            "[Thursday, May 30, 2024 07:30] You have a new email with a "
            "new overnight news screenshot attached."
        ),
        "time": "2024-05-30T07:30:00+08:00",
    }

task2 task4