# LLM Engine Upgrade — Forward-Looking Spec
Status: 📝 Planning Notes (pre-consolidation)
Authoritative source for: multi-model, multi-provider, per-user keys, auth gating, usage capping.
## 1. Current Architecture (as of 2026-05)
```
openai.ts (singleton client, single OPENAI_API_KEY from env)
├── runLLM()        → openai.responses.create (free-text / "template" execution)
└── runStructured() → openai.responses.parse  (zod-schema / "structured" execution)
    └── fallback → runLLM() (when parse fails)

modelRegistry.ts (5 static profiles mapped to model strings)
├── compat   → "gpt-4.1"      (default — preserves existing behaviour)
├── frontier → "gpt-5.5"      (complex reasoning / coding / orchestration)
├── balanced → "gpt-5.4"      (affordable quality)
├── fast     → "gpt-5.4-mini" (low-latency structured prompts)
└── cheap    → "gpt-5.4-nano" (drafts / classification)
```
## 2. Desired End State
```
        ┌──────────────────────────────┐
        │  ModelRouter (new)           │
        │  resolves: provider + model  │
        │  + API key for each request  │
        └──────────────┬───────────────┘
                       │
       ┌───────────────┼──────────────────────┐
       ▼               ▼                      ▼
openai-provider.ts   anthropic-provider.ts  google-provider.ts
  (OpenAI SDK)         (Anthropic SDK)        (Google AI SDK)
       │               │                      │
       └───────────────┴──────────┬───────────┘
                                  │
                 ┌────────────────▼─────────────┐
                 │  RateLimitInterceptor        │
                 │  UsageTelemetry              │
                 │  AuthGate (free tier cap)    │
                 └──────────────────────────────┘
```
## 3. Multi-Model Selection
### 3.1 Resolution Chain (per request)
1. User-level override (`user_settings.apiKey` + `user_settings.model`)
2. Mode-level default (`MODE_REGISTRY[mode].modelProfile`)
3. Surface-level pin (`assistantSurfaceRegistry[surface].modelProfile`)
4. System default (`DEFAULT_ASSISTANT_MODEL_PROFILE = "compat"`)
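The chain above can be sketched as a nullish-coalescing fallthrough. This is a minimal illustration: `ResolutionInput`, `resolveModelProfile`, and the field names are assumptions for the sketch, not the real types.

```typescript
type AssistantModelProfile =
  | "compat" | "frontier" | "balanced" | "fast" | "cheap" | "user-chosen";

const DEFAULT_ASSISTANT_MODEL_PROFILE: AssistantModelProfile = "compat";

// Hypothetical input shape: each level of the chain, already looked up.
type ResolutionInput = {
  userOverride?: AssistantModelProfile | null;  // user_settings.model
  modeDefault?: AssistantModelProfile | null;   // MODE_REGISTRY[mode].modelProfile
  surfacePin?: AssistantModelProfile | null;    // assistantSurfaceRegistry[surface].modelProfile
};

// Walks the chain top-down: user override wins, then mode, then surface, then default.
function resolveModelProfile(input: ResolutionInput): AssistantModelProfile {
  return (
    input.userOverride ??
    input.modeDefault ??
    input.surfacePin ??
    DEFAULT_ASSISTANT_MODEL_PROFILE
  );
}
```

Note the ordering choice: a user-level override beats a mode default, which beats a surface pin, so an explicit user preference is never silently downgraded by surface configuration.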
### 3.2 Model Profiles — expandable
Keep the profile abstraction so surfaces reference intent rather than concrete model names.
```ts
// Updated modelRegistry.ts
export type AssistantModelProfile =
  | "compat"       // gpt-4.1 — legacy preservation
  | "frontier"     // gpt-5.5 — reasoning / orchestration
  | "balanced"     // gpt-5.4 — quality + cost balance
  | "fast"         // gpt-5.4-mini — low-latency
  | "cheap"        // gpt-5.4-nano — drafts
  | "user-chosen"; // resolved from UserSettings.modelPreference

export type ModelProvider = "openai" | "anthropic" | "google" | "groq";
```
### 3.3 Provider-Level Config
```ts
export type ProviderConfig = {
  provider: ModelProvider;
  apiKeySource: "env" | "user" | "system";
  models: { profile: AssistantModelProfile; modelId: string }[];
};
```
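As an illustration, here is a `ProviderConfig` entry for OpenAI plus a hypothetical `modelIdForProfile` lookup helper. Types are restated so the sketch is self-contained; the helper is an assumption, not part of the spec.

```typescript
type ModelProvider = "openai" | "anthropic" | "google" | "groq";
type AssistantModelProfile =
  | "compat" | "frontier" | "balanced" | "fast" | "cheap" | "user-chosen";

type ProviderConfig = {
  provider: ModelProvider;
  apiKeySource: "env" | "user" | "system";
  models: { profile: AssistantModelProfile; modelId: string }[];
};

// Example registry entry for the OpenAI provider (model IDs from section 1).
const openaiConfig: ProviderConfig = {
  provider: "openai",
  apiKeySource: "env",
  models: [
    { profile: "compat", modelId: "gpt-4.1" },
    { profile: "frontier", modelId: "gpt-5.5" },
    { profile: "fast", modelId: "gpt-5.4-mini" },
  ],
};

// Hypothetical helper: map an abstract profile to a concrete model ID.
function modelIdForProfile(
  config: ProviderConfig,
  profile: AssistantModelProfile
): string | undefined {
  return config.models.find(m => m.profile === profile)?.modelId;
}
```

Returning `undefined` for an unlisted profile lets the router fall through to another provider rather than throwing.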
## 4. Per-User API Key Override
### 4.1 Data Model (Prisma)
```prisma
model UserSettings {
  id              String   @id @default(cuid())
  userId          String   @unique
  openaiApiKey    String?  // encrypted at rest
  anthropicApiKey String?  // encrypted at rest
  googleApiKey    String?  // encrypted at rest
  preferredModel  String?  // model profile or raw model ID
  aiEnabled       Boolean  @default(true)
  usageLimit      Int?     // max requests per period (null = unlimited)
  usagePeriod     String?  // "daily" | "monthly" | null
  usageCount      Int      @default(0)
  usageResetAt    DateTime?
}
```
### 4.2 Key Resolution in `getOpenAIForUser()`
```ts
export async function getOpenAIForUser(userId?: string | null): Promise<OpenAI> {
  if (userId) {
    const settings = await prisma.userSettings.findUnique({ where: { userId } });
    if (settings?.openaiApiKey) {
      return new OpenAI({ apiKey: decrypt(settings.openaiApiKey) });
    }
  }
  // fall back to system key
  return new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
}
```
### 4.3 UI for User Settings
Routes and components to create:
- `/settings/ai` — API key management, model preference, enable/disable AI
- `UserAISettingsPanel` component — embeddable in any settings page
- `ApiKeyField` component — masked input with a "test connection" button
- `ModelSelector` component — dropdown of available models for the user's provider
## 5. Dynamic Model List for Selectors
### 5.1 User-Configurable Model Presets
Users should be able to define a list of "favorite" models that appear in a quick-selector:
```ts
// Stored in UserSettings.favoriteModels: string[]
// Each entry is a model ID like "gpt-5.4-mini" or "claude-sonnet-5"
```
### 5.2 Surface-Bound Model Pinning
Each assistant surface can optionally pin a model profile:
```ts
// assistantSurfaceRegistry entry:
trips: {
  key: "trips",
  defaultMode: "coaching",
  modelProfile: "fast",             // ← NEW: pin to fast model
  structuredModelProfile: "compat", // ← NEW: pin structured calls to compat
  // ...existing fields
}
```
## 6. Auth Gating & Usage Limiting
### 6.1 Gating Tiers
| Tier | Who | AI Access | Model | Rate Limit |
|---|---|---|---|---|
| 0 | Unauthenticated | ❌ None | — | — |
| 1 | Unverified email | ⚠️ Sample (3 prompts) | cheap only | 3 total, then locked |
| 2 | Verified Google/Email | ✅ Full | user's choice | configured per plan |
| 3 | Own API key set | ✅ Unlimited | user's key | no platform limit |
### 6.2 Enforcement Layer
A middleware or interceptor wrapping every LLM call:
```ts
export async function withAIGuard<T>(
  userId: string | null,
  fn: () => Promise<T>
): Promise<T> {
  const tier = await resolveUserTier(userId);
  if (tier === "blocked") {
    throw new AIAccessError("AI features require a verified account.");
  }
  if (tier === "sample") {
    const used = await getUsageCount(userId);
    if (used >= 3) {
      throw new AIAccessError("Free sample used. Sign in with a verified account for full access.");
    }
    await incrementUsage(userId);
  }
  return fn();
}
```
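One possible shape for `resolveUserTier()`, mapping the tier table in 6.1 onto the guard outcomes. The `TierInput` flags are an assumed input shape for the sketch; the real function would read session state and `UserSettings`.

```typescript
// Assumed input: facts about the caller, already loaded from session + DB.
type TierInput = {
  authenticated: boolean; // any signed-in identity
  emailVerified: boolean; // Google / verified email
  hasOwnApiKey: boolean;  // user supplied their own provider key
};

type GuardTier = "blocked" | "sample" | "full" | "unlimited";

function resolveUserTier(input: TierInput): GuardTier {
  if (!input.authenticated) return "blocked"; // Tier 0: no AI access
  if (input.hasOwnApiKey) return "unlimited"; // Tier 3: user's key, no platform limit
  if (!input.emailVerified) return "sample";  // Tier 1: 3 prompts, cheap model only
  return "full";                              // Tier 2: per-plan limits apply
}
```

Checking `hasOwnApiKey` before `emailVerified` encodes the table's intent that bringing your own key lifts platform limits regardless of plan.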
### 6.3 Public Site Kill Switch
```tsx
// In app/lib/ai/featureFlags.ts
export const AI_FEATURES_ENABLED = process.env.AI_FEATURES_ENABLED !== "false";

// Wrap all AI surfaces:
if (!AI_FEATURES_ENABLED) {
  return <AIDisabledBanner />;
}
```
### 6.4 UI Surfaces to Gate
| Surface | Current | Future |
|---|---|---|
| `/assistant` assistant panel | visible to all | require verified user |
| `/trips/[tripId]` AI chat | visible to all | gate behind auth |
| `/ideas` | visible to all | gate behind auth |
| `/study-guide/mlv` | visible to all | gate behind auth |
| `/concept-cards/new` | visible to all | gate behind auth |
| Floating dock AI toggle | visible to all | hide when not authorized |
## 7. Multi-Provider Integration
### 7.1 Abstract Provider Interface
```ts
export interface AIProvider {
  name: ModelProvider;
  runLLM(params: LLMParams): Promise<{ output: string }>;
  runStructured(params: StructuredParams): Promise<{ output_text: string; raw: unknown; fallback: boolean }>;
  isAvailable(userId?: string): Promise<boolean>;
  listModels(): string[];
}
```
### 7.2 Provider Implementations
| Provider | SDK | Models | When to use |
|---|---|---|---|
| OpenAI | openai | gpt-4.1, gpt-4.1-mini, gpt-5.4 series, gpt-5.5 | Default / primary |
| Anthropic | @anthropic-ai/sdk | claude-sonnet-5, claude-haiku-5 | Long context, safety-critical |
| Google | @google/generative-ai | gemini-2.5-pro, gemini-2.5-flash | Vision, multimodal |
| Groq | groq-sdk | llama-4, mixtral | Ultra-low latency inference |
### 7.3 Routing Between Providers
Use `MODE_REGISTRY` or surface config to pick the provider per use case:
```ts
// Example: coaching mode uses OpenAI frontier for orchestration
// but Claude for content generation when context > 32K tokens
coaching: {
  // ...existing mode fields
  providers: {
    orchestration: { provider: "openai", profile: "frontier" },
    content: { provider: "anthropic", profile: "balanced" },
  },
}
```
## 8. Rate Limiting & Telemetry
### 8.1 Usage Tracking
```prisma
model AIDailyUsage {
  id        String   @id @default(cuid())
  userId    String
  date      DateTime @db.Date
  count     Int      @default(0)
  tokensIn  Int      @default(0)
  tokensOut Int      @default(0)
  cost      Float    @default(0)

  @@unique([userId, date])
}
```
### 8.2 Interceptor Pattern
```ts
export async function trackUsage(
  userId: string,
  model: string,
  tokensIn: number,
  tokensOut: number
): Promise<void> {
  // Prisma DateTime columns expect a Date, not a bare "YYYY-MM-DD" string.
  const today = new Date(new Date().toISOString().split("T")[0]);
  await prisma.aIDailyUsage.upsert({
    where: { userId_date: { userId, date: today } },
    create: { userId, date: today, count: 1, tokensIn, tokensOut },
    update: {
      count: { increment: 1 },
      tokensIn: { increment: tokensIn },
      tokensOut: { increment: tokensOut },
    },
  });
}
```
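Phase E also calls for cost estimation. A hedged sketch of how the `cost` column could be filled per call; the per-million-token prices below are placeholders for illustration, not real pricing, and should come from configuration.

```typescript
// Placeholder price table: USD per million tokens (input / output).
// These numbers are illustrative only.
const PRICE_PER_MTOK: Record<string, { in: number; out: number }> = {
  "gpt-5.4-mini": { in: 0.15, out: 0.6 },
  "gpt-5.4": { in: 1.0, out: 4.0 },
};

function estimateCost(model: string, tokensIn: number, tokensOut: number): number {
  const price = PRICE_PER_MTOK[model];
  if (!price) return 0; // unknown model: record zero rather than guessing
  return (tokensIn * price.in + tokensOut * price.out) / 1_000_000;
}
```

The `cost` increment would then ride along in the same `upsert` as `tokensIn`/`tokensOut`, keeping usage and spend in one row per user per day.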
## 9. Phased Rollout
### Phase A — Consolidation (now → next sprint)
- Unify 5 prompt paths → 1 router
- Remove dead code (`legacy/`, duplicate API routes)
- Make coaching route through `resolveCoachingSpecialization().pipeline`
- Wire `IdeaService` as structured entity generator for coaching sub-pipelines
### Phase B — Auth Gating (next)
- Add `UserSettings` model + migration
- Add `/settings/ai` route + components
- Implement `withAIGuard()` interceptor
- Gate all AI surfaces
### Phase C — Multi-Model (next + 1)
- Expand `modelRegistry.ts` with provider-aware profiles
- Implement per-user key override in `openai.ts`
- Add `ModelSelector` component
- Wire per-surface model pinning
### Phase D — Multi-Provider (future)
- Abstract `AIProvider` interface
- Implement Anthropic provider
- Implement Google provider
- Dynamic provider routing per mode/surface
### Phase E — Telemetry & Rate Limiting (future)
- Add `AIDailyUsage` tracking
- Implement cost estimation
- Usage dashboard for admin
## 9b. User-Configurable Personas
Users need to create their own personas — character roles, department specialists, domain experts — that extend the existing built-in set. The built-in set covers the base ~80% of domain needs (the 20/80 rule); user-created personas fill the long tail.
### 9b.1 Current Persona Infrastructure
Two parallel systems exist today:
| Source | Storage | Scope | Used by |
|---|---|---|---|
| `app/assistant/actions/personas.ts` | Static `PERSONAS` object (4 hardcoded: maestro, general, developer, writer) | Global (code) | `planMlv.ts`, `fetchMlvContext` |
| `prisma.aIPersona` table | Database rows (key, name, role, systemPrompt) | Global (DB) | `runDevLayerAction()`, `runAIPrompt()`, AssistantPanel persona picker |
Both are admin-controlled — users cannot add to either.
### 9b.2 Target Architecture
```
UserPersona (Prisma model) — user-owned, soft-merged with built-ins at runtime
        ↓
PersonaResolver (new) — merges built-in + user personas into a single catalogue
        ↓
AssistantPanel / Prompt Router — user selects from the merged list
```
### 9b.3 Prisma Model
```prisma
model UserPersona {
  id           String   @id @default(cuid())
  userId       String
  key          String   // user-scoped slug, e.g. "support-agent"
  name         String
  role         String?  // short descriptor, e.g. "Customer Support Specialist"
  systemPrompt String   // the full system prompt for this persona
  avatar       String?  // emoji or URL
  tags         String[] // for filtering: ["department", "character", "domain", ...]
  isPublic     Boolean  @default(false) // share with team?
  createdAt    DateTime @default(now())
  updatedAt    DateTime @updatedAt

  @@unique([userId, key])
}
```
### 9b.4 Persona Resolution
```ts
// app/assistant/personas/personaResolver.ts
export async function resolvePersonaCatalogue(userId?: string | null) {
  // 1. Built-in personas from DB (AIPersona table)
  const builtIn = await prisma.aIPersona.findMany();

  // 2. User-created personas
  const userPersonas: UserPersona[] = userId
    ? await prisma.userPersona.findMany({ where: { userId } })
    : [];

  // 3. Merge — user personas keyed as "user:key" to avoid collisions
  const catalogue = [
    ...builtIn.map(p => ({ ...p, source: "system" as const })),
    ...userPersonas.map(p => ({ ...p, key: `user:${p.key}`, source: "user" as const })),
  ];
  return catalogue;
}
```
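Callers of the merged catalogue will need to map a selected key back to its source when loading the persona. A small helper sketch, assuming the `user:` prefix convention above (`parsePersonaKey` is hypothetical, not in the codebase):

```typescript
// Splits a catalogue key back into source + raw key.
// "user:support-agent" → user persona "support-agent"; anything else → system.
function parsePersonaKey(key: string): { source: "system" | "user"; rawKey: string } {
  return key.startsWith("user:")
    ? { source: "user", rawKey: key.slice("user:".length) }
    : { source: "system", rawKey: key };
}
```

The picker can then route a selection to `prisma.aIPersona` or `prisma.userPersona` without ambiguity, since system keys never carry the prefix.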
### 9b.5 UI Surfaces
| Surface | What to show | Action |
|---|---|---|
| `/settings/ai` persona tab | List of user's personas, create/edit/delete | CRUD on `UserPersona` |
| AssistantPanel persona picker | Merged list (system + `user:`-prefixed) | Select persona → loads systemPrompt |
| `/personas/new` route | Form: name, role, system prompt, tags, avatar | Create `UserPersona` |
| `/personas/[key]/edit` route | Edit form | Update `UserPersona` |
| Department view | Filter by tags: ["department", "support"] | Scope selection for team leads |
### 9b.6 Departments & Specialists (Tag-Driven)
Tags let users organize personas by modality:
```ts
type PersonaModality =
  | "character"   // e.g. "Maestro de Ritmos", historical figures
  | "department"  // e.g. "Support Agent", "Sales Rep", "Operations"
  | "domain"      // e.g. "Trip Planning", "Content Writing", "Code Review"
  | "specialist"; // e.g. "SEO Auditor", "Legal Reviewer", "Translator"
```
Departments can have default personas assigned by workspace admins, but individual users can override or supplement.
### 9b.7 Prompt Injection
When a user selects a persona, its systemPrompt is injected into the prompt chain:
```
MODE_REGISTRY[mode].description       (mode-level instructions)
        ↓
resolveCoachingSpecialization()       (route-level context)
        ↓
Persona.systemPrompt                  (persona-level character/role)
        ↓
User's current prompt                 (user input)
```
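The chain above could be flattened into a messages array along these lines. A sketch only: `PersonaLike`, `buildMessages`, and the join strategy are assumptions, not the real prompt assembly code.

```typescript
type PersonaLike = { systemPrompt: string };
type ChatMessage = { role: "system" | "user"; content: string };

// Stacks the three instruction layers into one system message,
// skipping any layer that is absent, then appends the user's prompt.
function buildMessages(
  modeDescription: string,      // MODE_REGISTRY[mode].description
  specializationContext: string, // resolveCoachingSpecialization() output
  persona: PersonaLike | null,   // selected persona, if any
  userPrompt: string
): ChatMessage[] {
  const system = [modeDescription, specializationContext, persona?.systemPrompt]
    .filter(Boolean)
    .join("\n\n");
  return [
    { role: "system", content: system },
    { role: "user", content: userPrompt },
  ];
}
```

Keeping the layers in a fixed order means a persona can colour the voice without being able to override mode-level execution instructions that come first.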
### 9b.8 Built-In Persona Set (20/80 Baseline)
The existing 4 should be expanded to cover the core app domains:
| Key | Name | Domain |
|---|---|---|
| general | General Assistant | Default chat |
| maestro | Maestro de Ritmos | Cash-flow / strategy |
| developer | Developer | Code generation |
| writer | Creative Writer | Content / copy |
| trip-planner | Trip Planner | Trip itinerary |
| document-assistant | Document Assistant | Document editing |
| support-agent | Support Agent | Customer support |
| operations | Operations Manager | Back-office ops |
These stay in `prisma.aIPersona` (admin-managed). Users extend from here with `UserPersona`.
### 9b.9 Backward Compatibility
- Existing `PERSONAS` static object can be deprecated once `prisma.aIPersona` covers the same 4
- Existing `prisma.aIPersona` queries continue working for system personas
- `UserPersona` table is additive — no migration needed for existing data
- Persona picker in `AssistantPanel` gets a "User personas" divider
### 9b.10 Persona as Full Context (Not Just a Prompt Drop-In)
A persona is not merely a `systemPrompt` string — it is a deep characterization that defines the AI's entire contextual worldview for that interaction:
```prisma
// Expanded UserPersona with context fields
model UserPersona {
  // ...existing fields...
  systemPrompt  String   // core instruction
  context       String?  // world-building: backstory, setting, lore, constraints
  knowledge     String[] // pinned domain knowledge: ["andalusian-geography", "hiking-trails-granada"]
  tools         String[] // allowed tool keys: ["trip-planner", "document-search", "maps"]
  voice         String?  // tone/style: "poetic", "technical", "warm", "direct"
  constraints   String[] // hard rules: ["never-book-without-confirmation", "no-medical-advice"]
  sampleQueries String[] // example prompts to help user understand the persona
}
```
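A hedged sketch of how the expanded fields might be flattened into a single system prompt at resolution time. The labels, ordering, and `renderPersonaPrompt` name are assumptions for illustration.

```typescript
// Mirrors the expanded model's fields; only the ones that affect the prompt.
type ExpandedPersona = {
  systemPrompt: string;
  context?: string | null;     // world-building / backstory
  voice?: string | null;       // tone/style
  constraints?: string[];      // hard rules
};

// Flattens persona fields into one prompt string, labelling each layer
// so the model can distinguish character context from hard rules.
function renderPersonaPrompt(p: ExpandedPersona): string {
  const parts = [p.systemPrompt];
  if (p.context) parts.push(`Context: ${p.context}`);
  if (p.voice) parts.push(`Voice: ${p.voice}`);
  if (p.constraints?.length) {
    parts.push(`Hard constraints:\n${p.constraints.map(c => `- ${c}`).join("\n")}`);
  }
  return parts.join("\n\n");
}
```

Putting constraints last and labelling them explicitly keeps the "never do X" rules from being diluted by the character prose above them.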
This means a single persona can be:
| Use Case | persona.name | persona.context (excerpt) |
|---|---|---|
| 🎭 Theatre | "Flamenco Poet" | "You are a gitano elder in Sacromonte who speaks in verse. You know every cave, every zambra, every family line. You never give direct answers — only stories." |
| 💼 High-end Ops | "Operations Director" | "You run a 5-star DMC in Andalusia. Your vocabulary is P&L, yield, margin, partner SLA, contingency. You audit every plan for operational risk." |
| 🥾 Nature Lover | "Sierra Nevada Guide" | "You have walked every trail in the Alpujarras for 30 years. You know which wildflowers bloom in April, which refugios are open, and where to see ibex at dawn." |
| 👩💻 Developer | "Solutions Architect" | "You design systems for scale. Your default question is 'what happens at 10x traffic?'. You prefer Prisma, Next.js, and serverless." |
The context field enables rich character immersion — the persona is not instructing the AI, it's inhabiting a role. This is critical for:
- Theatre mode: multiple personas conversing with each other, each with distinct voice and knowledge
- Business operations: persona-as-department with specific vocabulary, constraints, and authority
- Domain depth: persona carries domain knowledge without cluttering the user's prompt
### 9b.11 Multi-Persona Sessions (Theatre Mode)
A single conversation thread can involve multiple personas:
```
        ┌──────────────────────────────┐
        │     Session Orchestrator     │
        │   (tracks which persona is   │
        │  active, manages turn-taking)│
        └───────┬──────────────┬───────┘
                │              │
      ┌─────────▼──────┐ ┌─────▼──────────┐
      │   Persona A    │ │   Persona B    │
      │ (Flamenco Poet)│ │ (Ops Director) │
      │  systemPrompt  │ │  systemPrompt  │
      │  + context     │ │  + context     │
      │  + voice       │ │  + voice       │
      │  + constraints │ │  + constraints │
      └────────────────┘ └────────────────┘
```
Per-prompt persona override: the user can switch the active persona mid-conversation or tag a specific message as being answered by a different persona. The existing AssistantPanel persona picker already supports this pattern — it is per-prompt, not per-session.
Turn-taking protocol (future):
```
User: "Plan a luxury hiking trip for 4 VIP clients"
  → Ops Director responds with itinerary structure
User: "Now describe the sunrise hike in poetic terms"
  → Flamenco Poet responds with lyrical description
```
The mode selector in AssistantPanel (general / coaching) and the persona selector work independently:
- Mode controls execution style (template chat vs structured pipeline)
- Persona controls voice, context, domain knowledge, and constraints
- A coaching-mode prompt can use any persona; a general-mode prompt can use any persona
### 9b.12 User Configuration Interface
The user manages personas from within their settings, not from admin:
| Route | Purpose | Key fields |
|---|---|---|
| `/settings/ai/personas` | List all user personas | name, role, tags, preview |
| `/settings/ai/personas/new` | Create persona | name, role, systemPrompt, context, voice, tags |
| `/settings/ai/personas/[key]/edit` | Edit persona | full editor with context builder |
| `/settings/ai/personas/[key]/preview` | Test the persona with a sample prompt | live LLM preview |
The context builder UI should guide users through defining:
- Role & setting — "Who are you? Where are you? What is your world?"
- Voice & tone — "Poetic? Technical? Warm? Direct?"
- Knowledge domain — "What do you know deeply? What do you not know?"
- Hard constraints — "What will you never do or say?"
- Sample interactions — "Show 3 example conversations"
This replaces the current single-textarea `systemPrompt` with a structured form that generates the full persona context.
### 9b.13 Existing Front-End Integration
The AssistantPanel already has the two selectors this depends on:
```
┌───────────────────────────────────────┐
│ Mode picker:    [general] [coaching]  │ ← execution style
│ Persona picker: [persona1] [persona2] │ ← voice/context
└───────────────────────────────────────┘
```
These work independently per-prompt today. The upgrade is:
- Source the persona list from `PersonaResolver` (system + user merged) instead of `prisma.aIPersona` alone
- Add the full context fields (context, voice, constraints) to the persona resolution chain
- Support per-message persona override in the thread/message model (store active persona per message, not per thread)
- Add a "Theatre Mode" toggle that enables multi-persona turn-taking within a single thread
## 10. Backward Compatibility
- `gpt-4.1` remains the `compat` profile default throughout all phases
- Existing `OPENAI_API_KEY` env var continues working
- All existing `runLLM()` and `runStructured()` calls keep working without changes
- New features (per-user keys, multi-provider) are opt-in at every level
- Kill switch (`AI_FEATURES_ENABLED=false`) disables all AI without code changes