# LLM Engine Upgrade — Forward-Looking Spec
Status: 📝 Planning Notes (pre-consolidation)
Authoritative source for: multi-model, multi-provider, per-user keys, auth gating, usage capping.
## 1. Current Architecture (as of 2026-05)
```
openai.ts (singleton client, single OPENAI_API_KEY from env)
├── runLLM()        → openai.responses.create (free-text / "template" execution)
└── runStructured() → openai.responses.parse  (zod-schema / "structured" execution)
    └── fallback → runLLM() (when parse fails)

modelRegistry.ts (5 static profiles mapped to model strings)
├── compat   → "gpt-4.1"      (default — preserves existing behaviour)
├── frontier → "gpt-5.5"      (complex reasoning / coding / orchestration)
├── balanced → "gpt-5.4"      (affordable quality)
├── fast     → "gpt-5.4-mini" (low-latency structured prompts)
└── cheap    → "gpt-5.4-nano" (drafts / classification)
```
## 2. Desired End State
```
        ┌──────────────────────────────┐
        │  ModelRouter (new)           │
        │  resolves: provider + model  │
        │  + API key for each request  │
        └──────────────┬───────────────┘
                       │
       ┌───────────────┼──────────────────────┐
       ▼               ▼                      ▼
openai-provider.ts   anthropic-provider.ts  google-provider.ts
  (OpenAI SDK)         (Anthropic SDK)        (Google AI SDK)
       │               │                      │
       └───────────────┴──────────┬───────────┘
                                  │
                 ┌────────────────▼─────────────┐
                 │  RateLimitInterceptor        │
                 │  UsageTelemetry              │
                 │  AuthGate (free tier cap)    │
                 └──────────────────────────────┘
```
## 3. Multi-Model Selection
### 3.1 Resolution Chain (per request)
1. User-level override (`user_settings.apiKey` + `user_settings.model`)
2. Mode-level default (`MODE_REGISTRY[mode].modelProfile`)
3. Surface-level pin (`assistantSurfaceRegistry[surface].modelProfile`)
4. System default (`DEFAULT_ASSISTANT_MODEL_PROFILE = "compat"`)
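The chain above can be sketched as a nullish-coalescing fallthrough. This is a minimal illustration: `ResolutionInput`, `resolveModelProfile`, and the field names are assumptions for the sketch, not the real types.

```typescript
type AssistantModelProfile =
  | "compat" | "frontier" | "balanced" | "fast" | "cheap" | "user-chosen";

const DEFAULT_ASSISTANT_MODEL_PROFILE: AssistantModelProfile = "compat";

// Hypothetical input shape: each level of the chain, already looked up.
type ResolutionInput = {
  userOverride?: AssistantModelProfile | null;  // user_settings.model
  modeDefault?: AssistantModelProfile | null;   // MODE_REGISTRY[mode].modelProfile
  surfacePin?: AssistantModelProfile | null;    // assistantSurfaceRegistry[surface].modelProfile
};

// Walks the chain top-down: user override wins, then mode, then surface, then default.
function resolveModelProfile(input: ResolutionInput): AssistantModelProfile {
  return (
    input.userOverride ??
    input.modeDefault ??
    input.surfacePin ??
    DEFAULT_ASSISTANT_MODEL_PROFILE
  );
}
```

Note the ordering choice: a user-level override beats a mode default, which beats a surface pin, so an explicit user preference is never silently downgraded by surface configuration.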
### 3.2 Model Profiles — expandable
Keep the profile abstraction so surfaces reference intent rather than concrete model names.
```ts
// Updated modelRegistry.ts
export type AssistantModelProfile =
  | "compat"       // gpt-4.1 — legacy preservation
  | "frontier"     // gpt-5.5 — reasoning / orchestration
  | "balanced"     // gpt-5.4 — quality + cost balance
  | "fast"         // gpt-5.4-mini — low-latency
  | "cheap"        // gpt-5.4-nano — drafts
  | "user-chosen"; // resolved from UserSettings.modelPreference

export type ModelProvider = "openai" | "anthropic" | "google" | "groq";
```
### 3.3 Provider-Level Config
```ts
export type ProviderConfig = {
  provider: ModelProvider;
  apiKeySource: "env" | "user" | "system";
  models: { profile: AssistantModelProfile; modelId: string }[];
};
```
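As an illustration, here is a `ProviderConfig` entry for OpenAI plus a hypothetical `modelIdForProfile` lookup helper. Types are restated so the sketch is self-contained; the helper is an assumption, not part of the spec.

```typescript
type ModelProvider = "openai" | "anthropic" | "google" | "groq";
type AssistantModelProfile =
  | "compat" | "frontier" | "balanced" | "fast" | "cheap" | "user-chosen";

type ProviderConfig = {
  provider: ModelProvider;
  apiKeySource: "env" | "user" | "system";
  models: { profile: AssistantModelProfile; modelId: string }[];
};

// Example registry entry for the OpenAI provider (model IDs from section 1).
const openaiConfig: ProviderConfig = {
  provider: "openai",
  apiKeySource: "env",
  models: [
    { profile: "compat", modelId: "gpt-4.1" },
    { profile: "frontier", modelId: "gpt-5.5" },
    { profile: "fast", modelId: "gpt-5.4-mini" },
  ],
};

// Hypothetical helper: map an abstract profile to a concrete model ID.
function modelIdForProfile(
  config: ProviderConfig,
  profile: AssistantModelProfile
): string | undefined {
  return config.models.find(m => m.profile === profile)?.modelId;
}
```

Returning `undefined` for an unlisted profile lets the router fall through to another provider rather than throwing.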
## 4. Per-User API Key Override
### 4.1 Data Model (Prisma)
```prisma
model UserSettings {
  id              String   @id @default(cuid())
  userId          String   @unique
  openaiApiKey    String?  // encrypted at rest
  anthropicApiKey String?  // encrypted at rest
  googleApiKey    String?  // encrypted at rest
  preferredModel  String?  // model profile or raw model ID
  aiEnabled       Boolean  @default(true)
  usageLimit      Int?     // max requests per period (null = unlimited)
  usagePeriod     String?  // "daily" | "monthly" | null
  usageCount      Int      @default(0)
  usageResetAt    DateTime?
}
```
### 4.2 Key Resolution in `getOpenAIForUser()`
```ts
export async function getOpenAIForUser(userId?: string | null): Promise<OpenAI> {
  if (userId) {
    const settings = await prisma.userSettings.findUnique({ where: { userId } });
    if (settings?.openaiApiKey) {
      return new OpenAI({ apiKey: decrypt(settings.openaiApiKey) });
    }
  }
  // fall back to system key
  return new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
}
```
### 4.3 UI for User Settings
Routes and components to create:
- `/settings/ai` — API key management, model preference, enable/disable AI
- `UserAISettingsPanel` component — embeddable in any settings page
- `ApiKeyField` component — masked input with a "test connection" button
- `ModelSelector` component — dropdown of available models for the user's provider
## 5. Dynamic Model List for Selectors
### 5.1 User-Configurable Model Presets
Users should be able to define a list of "favorite" models that appear in a quick-selector:
```ts
// Stored in UserSettings.favoriteModels: string[]
// Each entry is a model ID like "gpt-5.4-mini" or "claude-sonnet-5"
```
### 5.2 Surface-Bound Model Pinning
Each assistant surface can optionally pin a model profile:
```ts
// assistantSurfaceRegistry entry:
trips: {
  key: "trips",
  defaultMode: "coaching",
  modelProfile: "fast",             // ← NEW: pin to fast model
  structuredModelProfile: "compat", // ← NEW: pin structured calls to compat
  // ...existing fields
}
```
## 6. Auth Gating & Usage Limiting
### 6.1 Gating Tiers
| Tier | Who | AI Access | Model | Rate Limit |
|---|---|---|---|---|
| 0 | Unauthenticated | ❌ None | — | — |
| 1 | Unverified email | ⚠️ Sample (3 prompts) | cheap only | 3 total, then locked |
| 2 | Verified Google/Email | ✅ Full | user's choice | configured per plan |
| 3 | Own API key set | ✅ Unlimited | user's key | no platform limit |
### 6.2 Enforcement Layer
A middleware or interceptor wrapping every LLM call:
```ts
export async function withAIGuard<T>(
  userId: string | null,
  fn: () => Promise<T>
): Promise<T> {
  const tier = await resolveUserTier(userId);
  if (tier === "blocked") {
    throw new AIAccessError("AI features require a verified account.");
  }
  if (tier === "sample") {
    const used = await getUsageCount(userId);
    if (used >= 3) {
      throw new AIAccessError("Free sample used. Sign in with a verified account for full access.");
    }
    await incrementUsage(userId);
  }
  return fn();
}
```
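One possible shape for `resolveUserTier()`, mapping the tier table in 6.1 onto the guard outcomes. The `TierInput` flags are an assumed input shape for the sketch; the real function would read session state and `UserSettings`.

```typescript
// Assumed input: facts about the caller, already loaded from session + DB.
type TierInput = {
  authenticated: boolean; // any signed-in identity
  emailVerified: boolean; // Google / verified email
  hasOwnApiKey: boolean;  // user supplied their own provider key
};

type GuardTier = "blocked" | "sample" | "full" | "unlimited";

function resolveUserTier(input: TierInput): GuardTier {
  if (!input.authenticated) return "blocked"; // Tier 0: no AI access
  if (input.hasOwnApiKey) return "unlimited"; // Tier 3: user's key, no platform limit
  if (!input.emailVerified) return "sample";  // Tier 1: 3 prompts, cheap model only
  return "full";                              // Tier 2: per-plan limits apply
}
```

Checking `hasOwnApiKey` before `emailVerified` encodes the table's intent that bringing your own key lifts platform limits regardless of plan.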
### 6.3 Public Site Kill Switch
```tsx
// In app/lib/ai/featureFlags.ts
export const AI_FEATURES_ENABLED = process.env.AI_FEATURES_ENABLED !== "false";

// Wrap all AI surfaces:
if (!AI_FEATURES_ENABLED) {
  return <AIDisabledBanner />;
}
```
### 6.4 UI Surfaces to Gate
| Surface | Current | Future |
|---|---|---|
| `/assistant` assistant panel | visible to all | require verified user |
| `/trips/[tripId]` AI chat | visible to all | gate behind auth |
| `/ideas` | visible to all | gate behind auth |
| `/study-guide/mlv` | visible to all | gate behind auth |
| `/concept-cards/new` | visible to all | gate behind auth |
| Floating dock AI toggle | visible to all | hide when not authorized |
## 7. Multi-Provider Integration
### 7.1 Abstract Provider Interface
```ts
export interface AIProvider {
  name: ModelProvider;
  runLLM(params: LLMParams): Promise<{ output: string }>;
  runStructured(params: StructuredParams): Promise<{ output_text: string; raw: unknown; fallback: boolean }>;
  isAvailable(userId?: string): Promise<boolean>;
  listModels(): string[];
}
```
### 7.2 Provider Implementations
| Provider | SDK | Models | When to use |
|---|---|---|---|
| OpenAI | openai | gpt-4.1, gpt-4.1-mini, gpt-5.4 series, gpt-5.5 | Default / primary |
| Anthropic | @anthropic-ai/sdk | claude-sonnet-5, claude-haiku-5 | Long context, safety-critical |
| Google | @google/generative-ai | gemini-2.5-pro, gemini-2.5-flash | Vision, multimodal |
| Groq | groq-sdk | llama-4, mixtral | Ultra-low latency inference |
### 7.3 Routing Between Providers
Use `MODE_REGISTRY` or surface config to pick the provider per use case:
```ts
// Example: coaching mode uses OpenAI frontier for orchestration
// but Claude for content generation when context > 32K tokens
coaching: {
  // ...existing mode fields
  providers: {
    orchestration: { provider: "openai", profile: "frontier" },
    content: { provider: "anthropic", profile: "balanced" },
  },
}
```
## 8. Rate Limiting & Telemetry
### 8.1 Usage Tracking
```prisma
model AIDailyUsage {
  id        String   @id @default(cuid())
  userId    String
  date      DateTime @db.Date
  count     Int      @default(0)
  tokensIn  Int      @default(0)
  tokensOut Int      @default(0)
  cost      Float    @default(0)

  @@unique([userId, date])
}
```
### 8.2 Interceptor Pattern
```ts
export async function trackUsage(
  userId: string,
  model: string,
  tokensIn: number,
  tokensOut: number
): Promise<void> {
  // Prisma DateTime columns expect a Date, not a bare "YYYY-MM-DD" string.
  const today = new Date(new Date().toISOString().split("T")[0]);
  await prisma.aIDailyUsage.upsert({
    where: { userId_date: { userId, date: today } },
    create: { userId, date: today, count: 1, tokensIn, tokensOut },
    update: {
      count: { increment: 1 },
      tokensIn: { increment: tokensIn },
      tokensOut: { increment: tokensOut },
    },
  });
}
```
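Phase E also calls for cost estimation. A hedged sketch of how the `cost` column could be filled per call; the per-million-token prices below are placeholders for illustration, not real pricing, and should come from configuration.

```typescript
// Placeholder price table: USD per million tokens (input / output).
// These numbers are illustrative only.
const PRICE_PER_MTOK: Record<string, { in: number; out: number }> = {
  "gpt-5.4-mini": { in: 0.15, out: 0.6 },
  "gpt-5.4": { in: 1.0, out: 4.0 },
};

function estimateCost(model: string, tokensIn: number, tokensOut: number): number {
  const price = PRICE_PER_MTOK[model];
  if (!price) return 0; // unknown model: record zero rather than guessing
  return (tokensIn * price.in + tokensOut * price.out) / 1_000_000;
}
```

The `cost` increment would then ride along in the same `upsert` as `tokensIn`/`tokensOut`, keeping usage and spend in one row per user per day.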
## 9. Phased Rollout
### Phase A — Consolidation (now → next sprint)
- Unify 5 prompt paths → 1 router
- Remove dead code (`legacy/`, duplicate API routes)
- Make coaching route through `resolveCoachingSpecialization().pipeline`
- Wire `IdeaService` as structured entity generator for coaching sub-pipelines
### Phase B — Auth Gating (next)
- Add `UserSettings` model + migration
- Add `/settings/ai` route + components
- Implement `withAIGuard()` interceptor
- Gate all AI surfaces
### Phase C — Multi-Model (next + 1)
- Expand `modelRegistry.ts` with provider-aware profiles
- Implement per-user key override in `openai.ts`
- Add `ModelSelector` component
- Wire per-surface model pinning
### Phase D — Multi-Provider (future)
- Abstract `AIProvider` interface
- Implement Anthropic provider
- Implement Google provider
- Dynamic provider routing per mode/surface
### Phase E — Telemetry & Rate Limiting (future)
- Add `AIDailyUsage` tracking
- Implement cost estimation
- Usage dashboard for admin
## 9b. User-Configurable Personas
Users need to create their own personas — character roles, department specialists, domain experts — that extend the existing built-in set. The built-in set covers the base ~80% of domain needs (the 20/80 rule); user-created personas fill the long tail.
### 9b.1 Current Persona Infrastructure
Two parallel systems exist today:
| Source | Storage | Scope | Used by |
|---|---|---|---|
| `app/assistant/actions/personas.ts` | Static `PERSONAS` object (4 hardcoded: maestro, general, developer, writer) | Global (code) | `planMlv.ts`, `fetchMlvContext` |
| `prisma.aIPersona` table | Database rows (key, name, role, systemPrompt) | Global (DB) | `runDevLayerAction()`, `runAIPrompt()`, AssistantPanel persona picker |
Both are admin-controlled — users cannot add to either.
### 9b.2 Target Architecture
```
UserPersona (Prisma model) — user-owned, soft-merged with built-ins at runtime
        ↓
PersonaResolver (new) — merges built-in + user personas into a single catalogue
        ↓
AssistantPanel / Prompt Router — user selects from the merged list
```
### 9b.3 Prisma Model
```prisma
model UserPersona {
  id           String   @id @default(cuid())
  userId       String
  key          String   // user-scoped slug, e.g. "support-agent"
  name         String
  role         String?  // short descriptor, e.g. "Customer Support Specialist"
  systemPrompt String   // the full system prompt for this persona
  avatar       String?  // emoji or URL
  tags         String[] // for filtering: ["department", "character", "domain", ...]
  isPublic     Boolean  @default(false) // share with team?
  createdAt    DateTime @default(now())
  updatedAt    DateTime @updatedAt

  @@unique([userId, key])
}
```
### 9b.4 Persona Resolution
```ts
// app/assistant/personas/personaResolver.ts
export async function resolvePersonaCatalogue(userId?: string | null) {
  // 1. Built-in personas from DB (AIPersona table)
  const builtIn = await prisma.aIPersona.findMany();

  // 2. User-created personas
  const userPersonas: UserPersona[] = userId
    ? await prisma.userPersona.findMany({ where: { userId } })
    : [];

  // 3. Merge — user personas keyed as "user:key" to avoid collisions
  const catalogue = [
    ...builtIn.map(p => ({ ...p, source: "system" as const })),
    ...userPersonas.map(p => ({ ...p, key: `user:${p.key}`, source: "user" as const })),
  ];
  return catalogue;
}
```
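Callers of the merged catalogue will need to map a selected key back to its source when loading the persona. A small helper sketch, assuming the `user:` prefix convention above (`parsePersonaKey` is hypothetical, not in the codebase):

```typescript
// Splits a catalogue key back into source + raw key.
// "user:support-agent" → user persona "support-agent"; anything else → system.
function parsePersonaKey(key: string): { source: "system" | "user"; rawKey: string } {
  return key.startsWith("user:")
    ? { source: "user", rawKey: key.slice("user:".length) }
    : { source: "system", rawKey: key };
}
```

The picker can then route a selection to `prisma.aIPersona` or `prisma.userPersona` without ambiguity, since system keys never carry the prefix.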
### 9b.5 UI Surfaces
| Surface | What to show | Action |
|---|---|---|
| `/settings/ai` persona tab | List of user's personas, create/edit/delete | CRUD on `UserPersona` |
| AssistantPanel persona picker | Merged list (system + `user:`-prefixed) | Select persona → loads systemPrompt |
| `/personas/new` route | Form: name, role, system prompt, tags, avatar | Create `UserPersona` |
| `/personas/[key]/edit` route | Edit form | Update `UserPersona` |
| Department view | Filter by tags: ["department", "support"] | Scope selection for team leads |
### 9b.6 Departments & Specialists (Tag-Driven)
Tags let users organize personas by modality:
```ts
type PersonaModality =
  | "character"   // e.g. "Maestro de Ritmos", historical figures
  | "department"  // e.g. "Support Agent", "Sales Rep", "Operations"
  | "domain"      // e.g. "Trip Planning", "Content Writing", "Code Review"
  | "specialist"; // e.g. "SEO Auditor", "Legal Reviewer", "Translator"
```
Departments can have default personas assigned by workspace admins, but individual users can override or supplement.
### 9b.7 Prompt Injection
When a user selects a persona, its systemPrompt is injected into the prompt chain:
```
MODE_REGISTRY[mode].description       (mode-level instructions)
        ↓
resolveCoachingSpecialization()       (route-level context)
        ↓
Persona.systemPrompt                  (persona-level character/role)
        ↓
User's current prompt                 (user input)
```
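The chain above could be flattened into a messages array along these lines. A sketch only: `PersonaLike`, `buildMessages`, and the join strategy are assumptions, not the real prompt assembly code.

```typescript
type PersonaLike = { systemPrompt: string };
type ChatMessage = { role: "system" | "user"; content: string };

// Stacks the three instruction layers into one system message,
// skipping any layer that is absent, then appends the user's prompt.
function buildMessages(
  modeDescription: string,      // MODE_REGISTRY[mode].description
  specializationContext: string, // resolveCoachingSpecialization() output
  persona: PersonaLike | null,   // selected persona, if any
  userPrompt: string
): ChatMessage[] {
  const system = [modeDescription, specializationContext, persona?.systemPrompt]
    .filter(Boolean)
    .join("\n\n");
  return [
    { role: "system", content: system },
    { role: "user", content: userPrompt },
  ];
}
```

Keeping the layers in a fixed order means a persona can colour the voice without being able to override mode-level execution instructions that come first.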
### 9b.8 Built-In Persona Set (20/80 Baseline)
The existing 4 should be expanded to cover the core app domains:
| Key | Name | Domain |
|---|---|---|
| general | General Assistant | Default chat |
| maestro | Maestro de Ritmos | Cash-flow / strategy |
| developer | Developer | Code generation |
| writer | Creative Writer | Content / copy |
| trip-planner | Trip Planner | Trip itinerary |
| document-assistant | Document Assistant | Document editing |
| support-agent | Support Agent | Customer support |
| operations | Operations Manager | Back-office ops |
These stay in `prisma.aIPersona` (admin-managed). Users extend from here with `UserPersona`.
### 9b.9 Backward Compatibility
- Existing `PERSONAS` static object can be deprecated once `prisma.aIPersona` covers the same 4
- Existing `prisma.aIPersona` queries continue working for system personas
- `UserPersona` table is additive — no migration needed for existing data
- Persona picker in `AssistantPanel` gets a "User personas" divider
### 9b.10 Persona as Full Context (Not Just a Prompt Drop-In)
A persona is not merely a `systemPrompt` string — it is a deep characterization that defines the AI's entire contextual worldview for that interaction:
```prisma
// Expanded UserPersona with context fields
model UserPersona {
  // ...existing fields...
  systemPrompt  String   // core instruction
  context       String?  // world-building: backstory, setting, lore, constraints
  knowledge     String[] // pinned domain knowledge: ["andalusian-geography", "hiking-trails-granada"]
  tools         String[] // allowed tool keys: ["trip-planner", "document-search", "maps"]
  voice         String?  // tone/style: "poetic", "technical", "warm", "direct"
  constraints   String[] // hard rules: ["never-book-without-confirmation", "no-medical-advice"]
  sampleQueries String[] // example prompts to help user understand the persona
}
```
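A hedged sketch of how the expanded fields might be flattened into a single system prompt at resolution time. The labels, ordering, and `renderPersonaPrompt` name are assumptions for illustration.

```typescript
// Mirrors the expanded model's fields; only the ones that affect the prompt.
type ExpandedPersona = {
  systemPrompt: string;
  context?: string | null;     // world-building / backstory
  voice?: string | null;       // tone/style
  constraints?: string[];      // hard rules
};

// Flattens persona fields into one prompt string, labelling each layer
// so the model can distinguish character context from hard rules.
function renderPersonaPrompt(p: ExpandedPersona): string {
  const parts = [p.systemPrompt];
  if (p.context) parts.push(`Context: ${p.context}`);
  if (p.voice) parts.push(`Voice: ${p.voice}`);
  if (p.constraints?.length) {
    parts.push(`Hard constraints:\n${p.constraints.map(c => `- ${c}`).join("\n")}`);
  }
  return parts.join("\n\n");
}
```

Putting constraints last and labelling them explicitly keeps the "never do X" rules from being diluted by the character prose above them.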
This means a single persona can be:
| Use Case | persona.name | persona.context (excerpt) |
|---|---|---|
| 🎭 Theatre | "Flamenco Poet" | "You are a gitano elder in Sacromonte who speaks in verse. You know every cave, every zambra, every family line. You never give direct answers — only stories." |
| 💼 High-end Ops | "Operations Director" | "You run a 5-star DMC in Andalusia. Your vocabulary is P&L, yield, margin, partner SLA, contingency. You audit every plan for operational risk." |
| 🥾 Nature Lover | "Sierra Nevada Guide" | "You have walked every trail in the Alpujarras for 30 years. You know which wildflowers bloom in April, which refugios are open, and where to see ibex at dawn." |
| 👩💻 Developer | "Solutions Architect" | "You design systems for scale. Your default question is 'what happens at 10x traffic?'. You prefer Prisma, Next.js, and serverless." |
The context field enables rich character immersion — the persona is not instructing the AI, it's inhabiting a role. This is critical for:
- Theatre mode: multiple personas conversing with each other, each with distinct voice and knowledge
- Business operations: persona-as-department with specific vocabulary, constraints, and authority
- Domain depth: persona carries domain knowledge without cluttering the user's prompt
### 9b.11 Multi-Persona Sessions (Theatre Mode)
A single conversation thread can involve multiple personas:
```
        ┌──────────────────────────────┐
        │     Session Orchestrator     │
        │   (tracks which persona is   │
        │  active, manages turn-taking)│
        └───────┬──────────────┬───────┘
                │              │
      ┌─────────▼──────┐ ┌─────▼──────────┐
      │   Persona A    │ │   Persona B    │
      │ (Flamenco Poet)│ │ (Ops Director) │
      │  systemPrompt  │ │  systemPrompt  │
      │  + context     │ │  + context     │
      │  + voice       │ │  + voice       │
      │  + constraints │ │  + constraints │
      └────────────────┘ └────────────────┘
```
Per-prompt persona override: the user can switch the active persona mid-conversation or tag a specific message as being answered by a different persona. The existing AssistantPanel persona picker already supports this pattern — it is per-prompt, not per-session.
Turn-taking protocol (future):
```
User: "Plan a luxury hiking trip for 4 VIP clients"
  → Ops Director responds with itinerary structure
User: "Now describe the sunrise hike in poetic terms"
  → Flamenco Poet responds with lyrical description
```
The mode selector in AssistantPanel (general / coaching) and the persona selector work independently:
- Mode controls execution style (template chat vs structured pipeline)
- Persona controls voice, context, domain knowledge, and constraints
- A coaching-mode prompt can use any persona; a general-mode prompt can use any persona
### 9b.12 User Configuration Interface
The user manages personas from within their settings, not from admin:
| Route | Purpose | Key fields |
|---|---|---|
| `/settings/ai/personas` | List all user personas | name, role, tags, preview |
| `/settings/ai/personas/new` | Create persona | name, role, systemPrompt, context, voice, tags |
| `/settings/ai/personas/[key]/edit` | Edit persona | full editor with context builder |
| `/settings/ai/personas/[key]/preview` | Test the persona with a sample prompt | live LLM preview |
The context builder UI should guide users through defining:
- Role & setting — "Who are you? Where are you? What is your world?"
- Voice & tone — "Poetic? Technical? Warm? Direct?"
- Knowledge domain — "What do you know deeply? What do you not know?"
- Hard constraints — "What will you never do or say?"
- Sample interactions — "Show 3 example conversations"
This replaces the current single-textarea `systemPrompt` with a structured form that generates the full persona context.
### 9b.13 Existing Front-End Integration
The AssistantPanel already has the two selectors this depends on:
```
┌───────────────────────────────────────┐
│ Mode picker:    [general] [coaching]  │ ← execution style
│ Persona picker: [persona1] [persona2] │ ← voice/context
└───────────────────────────────────────┘
```
These work independently per-prompt today. The upgrade is:
- Source the persona list from `PersonaResolver` (system + user merged) instead of `prisma.aIPersona` alone
- Add the full context fields (context, voice, constraints) to the persona resolution chain
- Support per-message persona override in the thread/message model (store active persona per message, not per thread)
- Add a "Theatre Mode" toggle that enables multi-persona turn-taking within a single thread
## 10. Backward Compatibility
- `gpt-4.1` remains the `compat` profile default throughout all phases
- Existing `OPENAI_API_KEY` env var continues working
- All existing `runLLM()` and `runStructured()` calls keep working without changes
- New features (per-user keys, multi-provider) are opt-in at every level
- Kill switch (`AI_FEATURES_ENABLED=false`) disables all AI without code changes