Tiered NPC dialogue, cost-capped
Three tiers of NPC speech — line bank, templated, real LLM. How the bartender knows your last match without sinking the AI bill.
The bartender mentioning your 18-and-4 round by name is the moment Open Hours stops being "another competitive shooter" and starts being something the genre hasn't really done. It's also the feature most likely to bankrupt the project if I build it wrong.
Real LLM dialogue at scale is expensive. A naïve implementation — every NPC interaction triggers a Claude call — means a single active player chewing through dollars an hour. Multiply by 50 concurrent in soft alpha, by 200 in public alpha, by 1000+ at launch, and you're looking at an AI bill that grows linearly with active users in the most punishing way. That's not a business. That's a charity.
The fix is tiered inference. Three tiers, used in cost order, with hard caps. Most of the time the player gets a hand-written or templated line. Occasionally — when something noteworthy happens — they get a real LLM call. The system makes the choice; the player just experiences "the bartender said something specific about my last match."
Three tiers, used in cost order. Most lines are free. Real inference fires only when it matters.
Capture pending
The three tiers
Tier 1 — Ambient barks ($0)
A line bank of about 200 pre-written lines per NPC. When a player walks past, when they stand idle near an NPC, when nothing in particular has happened — pick one at random with mild deduping so they don't repeat the same line twice in a session.
These cost nothing to deliver, take a couple of days to write, and account for ~80% of the speech the player hears in a normal session. They're flavor. Done well, the player stops noticing they're pre-written.
```csharp
// Tier 1: pure local lookup, no network, no cost
public string PickAmbientLine(NpcId npc, PlayerContext ctx) {
    var bank = LineBanks[npc].Ambient;
    // Mild dedupe: skip lines the player has already heard this session
    var notRecent = bank.Where(l => !ctx.RecentLines.Contains(l.Id));
    // RandomOrDefault falls back to the full bank if every line is recent
    return notRecent.RandomOrDefault(bank).Text;
}
```

Tier 2 — Templates ($0)
The NPC's first click in a session, or a return visit after a meaningful event, gets a templated line. These are pre-written templates with variables injected from the player's actual data:
- "Same chest piece for {weeks} weeks. Brave. Or comfortable."
- "Heard you went {kills} and {deaths} last round. {result_remark}."
- "You bought the {cosmetic}. Wore it {wear_count} times. {wear_remark}."
The template engine is small — basic conditionals, variable substitution, and a curated set of "remark" fragments per branch. Still $0 per render. The variability comes from your actual match data being injected, so two different players hear different specifics from the same template.
Templates account for the next ~15% of speech. They're where most of the "wait, the NPC knows things about me" moments come from, and they're the highest-leverage tier in the whole system.
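To make the Tier 2 mechanics concrete, here's a minimal sketch of a template renderer in the Edge Function style: variable substitution from real player data, plus a curated "remark" bank for the branchy fragments. The names (`remarkBank`, `renderTemplate`) and the single-branch remark logic are illustrative assumptions, not the shipped engine.

```typescript
// Tier 2 sketch: $0 per render, variability comes from the player's data.
type PlayerData = Record<string, string | number>;

// Curated remark fragments, one small function per branch point.
// (Assumed structure; the real engine has a fuller conditional set.)
const remarkBank: Record<string, (d: PlayerData) => string> = {
  result_remark: (d) =>
    Number(d.kills) > Number(d.deaths) ? "Not bad at all" : "Rough one",
};

function renderTemplate(template: string, data: PlayerData): string {
  return template.replace(/\{(\w+)\}/g, (match, key) => {
    if (key in data) return String(data[key]);          // direct variable
    if (key in remarkBank) return remarkBank[key](data); // curated fragment
    return match;                                        // leave unknowns visible
  });
}
```

Two players hitting the same template with different match data hear different lines, which is where the "the NPC knows things about me" effect comes from at zero inference cost.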
Tier 3 — Real LLM (~2¢ per call)
The remaining 5% — the moments that have to be specific, narrative, and unique — go to a real Anthropic API call. The bartender on first contact. The bartender after a notable match (something the player will remember, like a perfect game or a near-comeback). Weekly events. Story beats.
The system prompt for each NPC is designed once, versioned, and includes the persona, the worldview, the speech style, and a constrained set of inputs. The user-message portion is structured: recent match, recent loadout, recent NPC interactions, current state. Output is a short paragraph. That's it.
```
POST /v1/messages (Anthropic, claude-haiku-4-5-20251001)

System: You are the bartender at a competitive arena town.
Skill matters. Gear is a lever. Cosmetics are flex. Be warm,
specific, observant. Reference the player's last match if it
was notable. Keep it under 60 words. Never break character.

User: Player {display_name} just returned from Arena.
Last match: 18 kills, 4 deaths, MVP, 7-minute round.
Loadout: legendary chest, blue jacket they wore once.
Last bartender exchange: 6 days ago, joked about their losing streak.
```
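On the server, assembling that request is a pure function of the player's structured context. A sketch of the builder, assuming a `MatchContext` shape that's illustrative rather than the real schema; the request body fields (`model`, `max_tokens`, `system`, `messages`) follow Anthropic's public Messages API:

```typescript
// Tier 3 sketch: structured player context -> Messages API request body.
// MatchContext field names are assumptions for illustration.
interface MatchContext {
  displayName: string;
  kills: number;
  deaths: number;
  mvp: boolean;
  loadout: string;
  lastExchange: string;
}

// The persona prompt is authored once and versioned, not generated.
const BARTENDER_SYSTEM =
  "You are the bartender at a competitive arena town. " +
  "Skill matters. Gear is a lever. Cosmetics are flex. Be warm, " +
  "specific, observant. Reference the player's last match if it " +
  "was notable. Keep it under 60 words. Never break character.";

function buildTier3Request(ctx: MatchContext) {
  return {
    model: "claude-haiku-4-5-20251001",
    max_tokens: 150, // roughly 60 words of output
    system: BARTENDER_SYSTEM,
    messages: [
      {
        role: "user",
        content:
          `Player ${ctx.displayName} just returned from Arena. ` +
          `Last match: ${ctx.kills} kills, ${ctx.deaths} deaths` +
          `${ctx.mvp ? ", MVP" : ""}. Loadout: ${ctx.loadout}. ` +
          `Last bartender exchange: ${ctx.lastExchange}.`,
      },
    ],
  };
}
```

The Edge Function POSTs this body to the Anthropic API with the server-held key in the request headers; nothing about the call shape ever reaches the client.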
Tier 3 in action: the bartender comments on a notable round, by name, by score, by playstyle.
Capture pending
The cost ceiling
None of this matters without enforcement. The architecture has three hard limits that fire in order:
- Per-prompt cache. Identical prompt within 60 seconds returns the cached response. Stops a double-click on an NPC from being two API calls.
- Per-player rate limit. Maximum 5 Tier 3 calls per player per hour, hard-coded into the Edge Function. The 6th request gets a Tier 2 template instead and the player never knows.
- Daily cost ceiling. Per-player and global. We monitor inference cost in a Supabase `inference_log` table. If a player's daily spend exceeds $0.10, they fall back to Tier 2 for the rest of the day. If global daily spend exceeds the runway, Tier 3 disables for everyone until the next reset.
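The ordering of those three checks is the whole design, so here's an in-memory sketch of the gate (real state lives in Supabase and the Edge Function; the maps and the `gateTier3` name are illustrative). The thresholds are the ones from the list above: 60-second cache, 5 calls per player per hour, $0.10 per player per day at ~2¢ a call.

```typescript
// Guardrail sketch: the checks fire in cost order, cheapest first.
const CACHE_TTL_MS = 60_000;
const MAX_CALLS_PER_HOUR = 5;
const DAILY_CAP_USD = 0.10;
const COST_PER_CALL_USD = 0.02;

const cache = new Map<string, { reply: string; at: number }>();
const hourlyCalls = new Map<string, number>(); // reset hourly in production
const dailySpend = new Map<string, number>();  // reset daily in production

type Decision =
  | { tier: 3; cached: boolean }
  | { tier: 2; reason: string };

function gateTier3(playerId: string, promptHash: string, now: number): Decision {
  // 1. Per-prompt cache: identical prompt within 60s is free.
  const hit = cache.get(promptHash);
  if (hit && now - hit.at < CACHE_TTL_MS) return { tier: 3, cached: true };

  // 2. Per-player rate limit: the 6th call this hour silently gets Tier 2.
  const calls = hourlyCalls.get(playerId) ?? 0;
  if (calls >= MAX_CALLS_PER_HOUR) return { tier: 2, reason: "rate_limit" };

  // 3. Per-player daily cost ceiling (global ceiling works the same way).
  const spend = dailySpend.get(playerId) ?? 0;
  if (spend + COST_PER_CALL_USD > DAILY_CAP_USD)
    return { tier: 2, reason: "daily_cap" };

  // Allowed: record the call and its cost. In the real flow the cached
  // reply is stored after the API responds, not before.
  hourlyCalls.set(playerId, calls + 1);
  dailySpend.set(playerId, spend + COST_PER_CALL_USD);
  cache.set(promptHash, { reply: "", at: now });
  return { tier: 3, cached: false };
}
```

Note that a blocked request never errors: it degrades to a Tier 2 template, so the failure mode is invisible to the player.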
These aren't features the player sees. They're guardrails the system enforces so the project survives contact with real usage patterns. The player just sees consistent NPC behavior; the cost stays bounded.
Why server-side only? Anthropic API keys never touch the client. Every NPC inference is a server call from the match server (or from a Supabase Edge Function for town-side interactions). This isn't optional — a leaked client key means anyone in the world can drain your account. The architecture from last episode makes this trivial: the server is already where authority lives.
The model choice
Currently using claude-haiku-4-5-20251001 for all Tier 3 calls. Haiku is the right cost/quality point for short, persona-driven, in-character generation. Sonnet would be overkill for 60-word bartender lines and 4× the cost. We'd reconsider for premium dialogue moments — the Earth Beacon endgame, season finales — but for steady-state town chatter, Haiku reads convincingly in-character and stays under the cost ceiling.
Caching at the prompt level matters more than model selection. If two players hit the bartender after similar-shaped matches, the prompt hash collides and they share a cached response. We're not chasing 100% uniqueness per player; we're chasing "feels like the bartender knows you" with bounded cost. Those two goals are not the same goal.
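One way to make those collisions likely is to bucket the match context into coarse shapes before hashing, so exact kill counts don't fragment the cache. This is a sketch of that design choice, not the shipped keying scheme; the bucket boundaries and function names are invented for illustration.

```typescript
// Cache-key sketch: drop exact numbers, keep the shape of the match.
function kdBucket(kills: number, deaths: number): string {
  const kd = kills / Math.max(deaths, 1); // avoid divide-by-zero on 0 deaths
  if (kd >= 3) return "dominant";
  if (kd >= 1.5) return "winning";
  if (kd >= 0.8) return "even";
  return "rough";
}

function promptKey(npc: string, kills: number, deaths: number, mvp: boolean): string {
  // Two players with similar-shaped matches produce the same key,
  // so they share one cached Tier 3 response.
  return `${npc}:${kdBucket(kills, deaths)}:${mvp ? "mvp" : "no-mvp"}`;
}
```

The trade-off is explicit: coarser buckets mean more cache hits and lower cost, at the price of less per-player specificity in the cached line, which is why the Tier 3 prompt keeps the specifics and the key keeps only the shape.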
Why this is the differentiator
The pillars from EP 01 say "the town is the draw." That only works if the town responds to you. Static NPCs with random lines don't do it — players figure that out in two sessions and the town becomes a vending machine again. Real reactivity is the moat.
But real reactivity that costs $5 per active hour is a moat with a price tag the project can't pay. The tiered architecture is what lets us have both: NPCs that genuinely know what just happened to you, at a per-player cost that scales with the business, not against it.
That's all three Week 1 devlogs out. Next batch covers the actual Week 1 build progress — what shipped, what got cut, what surprised. Get on the list to catch them.
No drip campaigns, no marketing fluff. Just the next real thing.