# LLM Engine Upgrade — Forward-Looking Spec

Status: 📝 Planning Notes (pre-consolidation). Authoritative source for: multi-model, multi-provider, per-user keys, auth gating, and usage capping.


## 1. Current Architecture (as of 2026-05)

```
openai.ts  (singleton client, single OPENAI_API_KEY from env)
    ├── runLLM()        → openai.responses.create   (free-text / "template" execution)
    └── runStructured() → openai.responses.parse    (zod-schema / "structured" execution)
                            └── fallback → runLLM() (when parse fails)

modelRegistry.ts  (5 static profiles mapped to model strings)
    compat   → "gpt-4.1"      (default — preserves existing behaviour)
    frontier → "gpt-5.5"      (complex reasoning / coding / orchestration)
    balanced → "gpt-5.4"      (affordable quality)
    fast     → "gpt-5.4-mini" (low-latency structured prompts)
    cheap    → "gpt-5.4-nano" (drafts / classification)
```

## 2. Desired End State

```
                          ┌──────────────────────────────┐
                          │       ModelRouter (new)      │
                          │  resolves: provider + model  │
                          │  + API key for each request  │
                          └──────────────┬───────────────┘
                                         │
                ┌────────────────────────┼────────────────────────┐
                ▼                        ▼                        ▼
       openai-provider.ts      anthropic-provider.ts      google-provider.ts
       (OpenAI SDK)            (Anthropic SDK)            (Google AI SDK)
                │                        │                        │
                └────────────────────────┴────────────────────────┘
                                         │
                          ┌──────────────▼───────────────┐
                          │  RateLimitInterceptor        │
                          │  UsageTelemetry              │
                          │  AuthGate (free tier cap)    │
                          └──────────────────────────────┘
```

## 3. Multi-Model Selection

### 3.1 Resolution Chain (per request)

1. User-level override (`user_settings.apiKey` + `user_settings.model`)
2. Mode-level default (`MODE_REGISTRY[mode].modelProfile ??`)
3. Surface-level pin (`assistantSurfaceRegistry[surface].modelProfile ??`)
4. System default (`DEFAULT_ASSISTANT_MODEL_PROFILE = "compat"`)
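
The four-step chain above reduces to a nullish-coalescing lookup. A minimal sketch — input field names mirror this doc's registries and settings, not a final API:

```typescript
// Hypothetical sketch of the resolution chain above.
// Field names follow this doc; real signatures may differ.
type AssistantModelProfile = "compat" | "frontier" | "balanced" | "fast" | "cheap";

const DEFAULT_ASSISTANT_MODEL_PROFILE: AssistantModelProfile = "compat"; // step 4

interface ResolutionInput {
  userProfile?: AssistantModelProfile;    // step 1: user_settings.model
  modeProfile?: AssistantModelProfile;    // step 2: MODE_REGISTRY[mode].modelProfile
  surfaceProfile?: AssistantModelProfile; // step 3: assistantSurfaceRegistry[surface].modelProfile
}

export function resolveModelProfile(input: ResolutionInput): AssistantModelProfile {
  return (
    input.userProfile ??
    input.modeProfile ??
    input.surfaceProfile ??
    DEFAULT_ASSISTANT_MODEL_PROFILE
  );
}
```

With no overrides set, the function falls through to `"compat"`, preserving today's behaviour.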
### 3.2 Model Profiles — Expandable

Keep the profile abstraction so surfaces reference intent, not concrete model names.

```ts
// Updated modelRegistry.ts
export type AssistantModelProfile =
  | "compat"        // gpt-4.1 — legacy preservation
  | "frontier"      // gpt-5.5 — reasoning / orchestration
  | "balanced"      // gpt-5.4 — quality + cost balance
  | "fast"          // gpt-5.4-mini — low-latency
  | "cheap"         // gpt-5.4-nano — drafts
  | "user-chosen";  // resolved from UserSettings.modelPreference

export type ModelProvider = "openai" | "anthropic" | "google" | "groq";
```

### 3.3 Provider-Level Config

```ts
export type ProviderConfig = {
  provider: ModelProvider;
  apiKeySource: "env" | "user" | "system";
  models: { profile: AssistantModelProfile; modelId: string }[];
};
```
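
As an illustration of how the shape could be populated, here is a sample `ProviderConfig` value plus a lookup helper. The model IDs come from this doc's registry; the helper name is an assumption, not existing code:

```typescript
// Illustrative ProviderConfig instance; types repeat this doc's definitions.
type AssistantModelProfile = "compat" | "frontier" | "balanced" | "fast" | "cheap" | "user-chosen";
type ModelProvider = "openai" | "anthropic" | "google" | "groq";

type ProviderConfig = {
  provider: ModelProvider;
  apiKeySource: "env" | "user" | "system";
  models: { profile: AssistantModelProfile; modelId: string }[];
};

export const openaiConfig: ProviderConfig = {
  provider: "openai",
  apiKeySource: "env",
  models: [
    { profile: "compat", modelId: "gpt-4.1" },
    { profile: "frontier", modelId: "gpt-5.5" },
    { profile: "fast", modelId: "gpt-5.4-mini" },
  ],
};

// Look up the model ID a provider offers for a profile, if any.
export function modelIdFor(cfg: ProviderConfig, profile: AssistantModelProfile): string | undefined {
  return cfg.models.find(m => m.profile === profile)?.modelId;
}
```

Returning `undefined` for an unlisted profile lets the router fall through to another provider or to the system default.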

## 4. Per-User API Key Override

### 4.1 Data Model (Prisma)

```prisma
model UserSettings {
  id              String  @id @default(cuid())
  userId          String  @unique
  openaiApiKey    String? // encrypted at rest
  anthropicApiKey String? // encrypted at rest
  googleApiKey    String? // encrypted at rest
  preferredModel  String? // model profile or raw model ID
  aiEnabled       Boolean @default(true)
  usageLimit      Int?    // max requests per period (null = unlimited)
  usagePeriod     String? // "daily" | "monthly" | null
  usageCount      Int     @default(0)
  usageResetAt    DateTime?
}
```
### 4.2 Key Resolution in `getOpenAIForUser()`

```ts
import OpenAI from "openai";
import { prisma } from "@/lib/prisma";   // adjust to the project's prisma client path
import { decrypt } from "@/lib/crypto";  // helper that decrypts keys stored encrypted at rest

export async function getOpenAIForUser(userId?: string | null): Promise<OpenAI> {
  if (userId) {
    const settings = await prisma.userSettings.findUnique({ where: { userId } });
    if (settings?.openaiApiKey) {
      return new OpenAI({ apiKey: decrypt(settings.openaiApiKey) });
    }
  }
  // fall back to the system key from the environment
  return new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
}
```
### 4.3 UI for User Settings

Routes and components to create:

- `/settings/ai` — API key management, model preference, enable/disable AI
- UserAISettingsPanel component — embeddable in any settings page
- ApiKeyField component — masked input with a "test connection" button
- ModelSelector component — dropdown of available models for the user's provider

## 5. Dynamic Model List for Selectors

### 5.1 User-Configurable Model Presets

Users should be able to define a list of "favorite" models that appear in a quick-selector:

```ts
// Stored in UserSettings.favoriteModels: string[]
// Each entry is a model ID like "gpt-5.4-mini" or "claude-sonnet-5"
```
### 5.2 Surface-Bound Model Pinning

Each assistant surface can optionally pin a model profile:

```ts
// assistantSurfaceRegistry entry:
trips: {
  key: "trips",
  defaultMode: "coaching",
  modelProfile: "fast",             // ← NEW: pin to fast model
  structuredModelProfile: "compat", // ← NEW: pin structured calls to compat
  // ...existing fields
},
```

## 6. Auth Gating & Usage Limiting

### 6.1 Gating Tiers

| Tier | Who | AI Access | Model | Rate Limit |
| --- | --- | --- | --- | --- |
| 0 | Unauthenticated | ❌ None | — | — |
| 1 | Unverified email | ⚠️ Sample (3 prompts) | cheap only | 3 total, then locked |
| 2 | Verified Google/Email | ✅ Full | user's choice | configured per plan |
| 3 | Own API key set | ✅ Unlimited | user's key | no platform limit |
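
The tier table maps onto a small pure function. A sketch under assumed inputs — the flags (`authenticated`, `emailVerified`, `hasOwnApiKey`) are assumptions about what the auth/session layer can provide:

```typescript
// Sketch of tier resolution from the table above.
type Tier = "blocked" | "sample" | "full" | "unlimited";

interface TierInput {
  authenticated: boolean;
  emailVerified: boolean;
  hasOwnApiKey: boolean;
}

export function resolveTier(u: TierInput): Tier {
  if (!u.authenticated) return "blocked"; // Tier 0: no AI access
  if (u.hasOwnApiKey) return "unlimited"; // Tier 3: user's own key, no platform limit
  if (u.emailVerified) return "full";     // Tier 2: rate limit configured per plan
  return "sample";                        // Tier 1: 3 prompts, cheap model only
}
```

Note the precedence: an own API key outranks plan limits, mirroring the "no platform limit" row.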
### 6.2 Enforcement Layer

A middleware or interceptor wrapping every LLM call:

```ts
export async function withAIGuard<T>(
  userId: string | null,
  fn: () => Promise<T>
): Promise<T> {
  const tier = await resolveUserTier(userId);

  if (tier === "blocked") {
    throw new AIAccessError("AI features require a verified account.");
  }

  if (tier === "sample") {
    const used = await getUsageCount(userId);
    if (used >= 3) {
      throw new AIAccessError("Free sample used. Sign in with a verified account for full access.");
    }
    await incrementUsage(userId);
  }

  return fn();
}
```
### 6.3 Public Site Kill Switch

```tsx
// In app/lib/ai/featureFlags.ts
export const AI_FEATURES_ENABLED = process.env.AI_FEATURES_ENABLED !== "false";

// Wrap all AI surfaces:
if (!AI_FEATURES_ENABLED) {
  return <AIDisabledBanner />;
}
```
### 6.4 UI Surfaces to Gate

| Surface | Current | Future |
| --- | --- | --- |
| `/assistant` assistant panel | visible to all | require verified user |
| `/trips/[tripId]` AI chat | visible to all | gate behind auth |
| `/ideas` | visible to all | gate behind auth |
| `/study-guide/mlv` | visible to all | gate behind auth |
| `/concept-cards/new` | visible to all | gate behind auth |
| Floating dock AI toggle | visible to all | hide when not authorized |

## 7. Multi-Provider Integration

### 7.1 Abstract Provider Interface

```ts
export interface AIProvider {
  name: ModelProvider;
  runLLM(params: LLMParams): Promise<{ output: string }>;
  runStructured(params: StructuredParams): Promise<{ output_text: string; raw: unknown; fallback: boolean }>;
  isAvailable(userId?: string): Promise<boolean>;
  listModels(): string[];
}
```
### 7.2 Provider Implementations

| Provider | SDK | Models | When to use |
| --- | --- | --- | --- |
| OpenAI | `openai` | gpt-4.1, gpt-4.1-mini, gpt-5.4 series, gpt-5.5 | Default / primary |
| Anthropic | `@anthropic-ai/sdk` | claude-sonnet-5, claude-haiku-5 | Long context, safety-critical |
| Google | `@google/generative-ai` | gemini-2.5-pro, gemini-2.5-flash | Vision, multimodal |
| Groq | `groq-sdk` | llama-4, mixtral | Ultra-low latency inference |
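
The table above can also be encoded as data so the router can answer "which provider serves this model?". A sketch — the registry shape and `providerForModel` helper are assumptions, with package and model names copied from the table as planning placeholders:

```typescript
// The provider table above, encoded as data.
type ModelProvider = "openai" | "anthropic" | "google" | "groq";

interface ProviderEntry {
  sdkPackage: string;  // npm package from the table
  models: string[];    // model IDs from the table (planning placeholders)
  whenToUse: string;
}

export const PROVIDER_TABLE: Record<ModelProvider, ProviderEntry> = {
  openai:    { sdkPackage: "openai",                models: ["gpt-4.1", "gpt-4.1-mini", "gpt-5.4", "gpt-5.5"], whenToUse: "Default / primary" },
  anthropic: { sdkPackage: "@anthropic-ai/sdk",     models: ["claude-sonnet-5", "claude-haiku-5"],             whenToUse: "Long context, safety-critical" },
  google:    { sdkPackage: "@google/generative-ai", models: ["gemini-2.5-pro", "gemini-2.5-flash"],            whenToUse: "Vision, multimodal" },
  groq:      { sdkPackage: "groq-sdk",              models: ["llama-4", "mixtral"],                            whenToUse: "Ultra-low latency inference" },
};

// Find which provider serves a given model ID.
export function providerForModel(modelId: string): ModelProvider | undefined {
  return (Object.keys(PROVIDER_TABLE) as ModelProvider[])
    .find(p => PROVIDER_TABLE[p].models.includes(modelId));
}
```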
### 7.3 Routing Between Providers

Use MODE_REGISTRY or surface config to pick a provider per use case:

```ts
// Example: coaching mode uses OpenAI frontier for orchestration
// but Claude for content generation when context > 32K tokens
coaching: {
  // ...existing mode fields,
  providers: {
    orchestration: { provider: "openai", profile: "frontier" },
    content: { provider: "anthropic", profile: "balanced" },
  },
},
```

## 8. Rate Limiting & Telemetry

### 8.1 Usage Tracking

```prisma
model AIDailyUsage {
  id        String   @id @default(cuid())
  userId    String
  date      DateTime @db.Date
  count     Int      @default(0)
  tokensIn  Int      @default(0)
  tokensOut Int      @default(0)
  cost      Float    @default(0)

  @@unique([userId, date])
}
```
### 8.2 Interceptor Pattern

```ts
export async function trackUsage(
  userId: string,
  model: string,
  tokensIn: number,
  tokensOut: number
): Promise<void> {
  // Normalize to midnight UTC so @@unique([userId, date]) yields one row per user per day
  const today = new Date(new Date().toISOString().split("T")[0]);
  await prisma.aIDailyUsage.upsert({
    where: { userId_date: { userId, date: today } },
    create: { userId, date: today, count: 1, tokensIn, tokensOut },
    update: {
      count: { increment: 1 },
      tokensIn: { increment: tokensIn },
      tokensOut: { increment: tokensOut },
    },
  });
}
```
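
The `cost` column above is left at its default; Phase E calls for cost estimation. A hedged sketch of how the per-call cost could be derived from token counts — the per-million-token prices here are illustrative placeholders, not real rates, and the price table would live in config:

```typescript
// Hypothetical cost estimator for the `cost` column tracked above.
// Prices are made-up placeholders per million tokens, not real rates.
const PRICE_PER_MTOK: Record<string, { in: number; out: number }> = {
  "gpt-5.5": { in: 10, out: 30 },
  "gpt-5.4-mini": { in: 0.4, out: 1.6 },
};

export function estimateCostUSD(model: string, tokensIn: number, tokensOut: number): number {
  const p = PRICE_PER_MTOK[model];
  if (!p) return 0; // unknown model: record zero rather than guess
  return (tokensIn * p.in + tokensOut * p.out) / 1_000_000;
}
```

The estimate would be added to the `trackUsage` upsert as `cost: { increment: estimateCostUSD(model, tokensIn, tokensOut) }`.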

## 9. Phased Rollout

### Phase A — Consolidation (now → next sprint)

- Unify 5 prompt paths → 1 router
- Remove dead code (`legacy/`, duplicate API routes)
- Make coaching route through `resolveCoachingSpecialization().pipeline`
- Wire IdeaService as structured entity generator for coaching sub-pipelines

### Phase B — Auth Gating (next)

- Add UserSettings model + migration
- Add `/settings/ai` route + components
- Implement `withAIGuard()` interceptor
- Gate all AI surfaces

### Phase C — Multi-Model (next + 1)

- Expand `modelRegistry.ts` with provider-aware profiles
- Implement per-user key override in `openai.ts`
- Add ModelSelector component
- Wire per-surface model pinning

### Phase D — Multi-Provider (future)

- Abstract AIProvider interface
- Implement Anthropic provider
- Implement Google provider
- Dynamic provider routing per mode/surface

### Phase E — Telemetry & Rate Limiting (future)

- Add AIDailyUsage tracking
- Implement cost estimation
- Usage dashboard for admin

## 9b. User-Configurable Personas

Users need to create their own personas — character roles, department specialists, domain experts — that extend the built-in set. The built-in set covers the common 80% of domain needs (the 20/80 baseline); user-created personas fill the long tail.

### 9b.1 Current Persona Infrastructure

Two parallel systems exist today:

| Source | Storage | Scope | Used by |
| --- | --- | --- | --- |
| `app/assistant/actions/personas.ts` | Static `PERSONAS` object (4 hardcoded: maestro, general, developer, writer) | Global code | `planMlv.ts`, `fetchMlvContext` |
| `prisma.aIPersona` table | Database rows (key, name, role, systemPrompt) | Global DB | `runDevLayerAction()`, `runAIPrompt()`, AssistantPanel persona picker |

Both are admin-controlled — users cannot add to either.

### 9b.2 Target Architecture

```
UserPersona (Prisma model) — user-owned, soft-merged with built-ins at runtime
    ↓
PersonaResolver (new) — merges built-in + user personas into a single catalogue
    ↓
AssistantPanel / Prompt Router — user selects from merged list
```
### 9b.3 Prisma Model

```prisma
model UserPersona {
  id           String   @id @default(cuid())
  userId       String
  key          String   // user-scoped slug, e.g. "support-agent"
  name         String
  role         String?  // short descriptor, e.g. "Customer Support Specialist"
  systemPrompt String   // the full system prompt for this persona
  avatar       String?  // emoji or URL
  tags         String[] // for filtering: ["department", "character", "domain", ...]
  isPublic     Boolean  @default(false) // share with team?
  createdAt    DateTime @default(now())
  updatedAt    DateTime @updatedAt

  @@unique([userId, key])
}
```
### 9b.4 Persona Resolution

```ts
// app/assistant/personas/personaResolver.ts

export async function resolvePersonaCatalogue(userId?: string | null) {
  // 1. Built-in personas from DB (AIPersona table)
  const builtIn = await prisma.aIPersona.findMany();

  // 2. User-created personas
  const userPersonas: UserPersona[] = userId
    ? await prisma.userPersona.findMany({ where: { userId } })
    : [];

  // 3. Merge — user personas keyed as "user:key" to avoid collisions
  const catalogue = [
    ...builtIn.map(p => ({ ...p, source: "system" as const })),
    ...userPersonas.map(p => ({ ...p, key: `user:${p.key}`, source: "user" as const })),
  ];

  return catalogue;
}
```
### 9b.5 UI Surfaces

| Surface | What to show | Action |
| --- | --- | --- |
| `/settings/ai` persona tab | List of user's personas, create/edit/delete | CRUD on UserPersona |
| AssistantPanel persona picker | Merged list (system + `user:`-prefixed) | Select persona → loads systemPrompt |
| `/personas/new` route | Form: name, role, system prompt, tags, avatar | Create UserPersona |
| `/personas/[key]/edit` route | Edit form | Update UserPersona |
| Department view | Filter by tags: ["department", "support"] | Scope selection for team leads |
### 9b.6 Departments & Specialists (Tag-Driven)

Tags let users organize personas by modality:

```ts
type PersonaModality =
  | "character"      // e.g. "Maestro de Ritmos", historical figures
  | "department"     // e.g. "Support Agent", "Sales Rep", "Operations"
  | "domain"         // e.g. "Trip Planning", "Content Writing", "Code Review"
  | "specialist";    // e.g. "SEO Auditor", "Legal Reviewer", "Translator"
```

Departments can have default personas assigned by workspace admins, but individual users can override or supplement.

### 9b.7 Prompt Injection

When a user selects a persona, its systemPrompt is injected into the prompt chain:

```
MODE_REGISTRY[mode].description      (mode-level instructions)
    ↓
resolveCoachingSpecialization()      (route-level context)
    ↓
Persona.systemPrompt                 (persona-level character/role)
    ↓
User's current prompt                (user input)
```
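
The layering above can be sketched as a small composer that joins whichever layers are present. The interface field names mirror this doc; the `composeSystemPrompt` helper itself is a hypothetical illustration:

```typescript
// Sketch of the four-layer prompt assembly above.
interface PromptLayers {
  modeDescription: string;        // MODE_REGISTRY[mode].description
  specializationContext?: string; // resolveCoachingSpecialization() output, if any
  personaSystemPrompt?: string;   // Persona.systemPrompt, if a persona is selected
  userPrompt: string;             // the user's current prompt (sent separately)
}

export function composeSystemPrompt(l: PromptLayers): string {
  // Join only the layers that are present, in chain order.
  return [l.modeDescription, l.specializationContext, l.personaSystemPrompt]
    .filter((s): s is string => Boolean(s))
    .join("\n\n");
}
```

Keeping the user prompt out of the system string preserves the usual system/user message split.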
### 9b.8 Built-In Persona Set (20/80 Baseline)

The existing 4 should be expanded to cover the core app domains:

| Key | Name | Domain |
| --- | --- | --- |
| general | General Assistant | Default chat |
| maestro | Maestro de Ritmos | Cash-flow / strategy |
| developer | Developer | Code generation |
| writer | Creative Writer | Content / copy |
| trip-planner | Trip Planner | Trip itinerary |
| document-assistant | Document Assistant | Document editing |
| support-agent | Support Agent | Customer support |
| operations | Operations Manager | Back-office ops |

These stay in prisma.aIPersona (admin-managed). Users extend from here with UserPersona.

### 9b.9 Backward Compatibility

- The existing static `PERSONAS` object can be deprecated once `prisma.aIPersona` covers the same 4
- Existing `prisma.aIPersona` queries continue working for system personas
- The UserPersona table is additive — no migration needed for existing data
- The persona picker in AssistantPanel gets a "User personas" divider
### 9b.10 Persona as Full Context (Not Just a Prompt Drop-In)

A persona is not merely a systemPrompt string — it is a deep characterization that defines the AI's entire contextual worldview for that interaction:

```prisma
// Expanded UserPersona with context fields
model UserPersona {
  // ...existing fields...
  systemPrompt   String   // core instruction
  context        String?  // world-building: backstory, setting, lore, constraints
  knowledge      String[] // pinned domain knowledge: ["andalusian-geography", "hiking-trails-granada"]
  tools          String[] // allowed tool keys: ["trip-planner", "document-search", "maps"]
  voice          String?  // tone/style: "poetic", "technical", "warm", "direct"
  constraints    String[] // hard rules: ["never-book-without-confirmation", "no-medical-advice"]
  sampleQueries  String[] // example prompts to help user understand the persona
}
```

This means a single persona can be:

| Use Case | persona.name | persona.context (excerpt) |
| --- | --- | --- |
| 🎭 Theatre | "Flamenco Poet" | "You are a gitano elder in Sacromonte who speaks in verse. You know every cave, every zambra, every family line. You never give direct answers — only stories." |
| 💼 High-end Ops | "Operations Director" | "You run a 5-star DMC in Andalusia. Your vocabulary is P&L, yield, margin, partner SLA, contingency. You audit every plan for operational risk." |
| 🥾 Nature Lover | "Sierra Nevada Guide" | "You have walked every trail in the Alpujarras for 30 years. You know which wildflowers bloom in April, which refugios are open, and where to see ibex at dawn." |
| 👩‍💻 Developer | "Solutions Architect" | "You design systems for scale. Your default question is 'what happens at 10x traffic?'. You prefer Prisma, Next.js, and serverless." |

The context field enables rich character immersion — the AI is not just following an instruction, it is inhabiting a role. This is critical for:

- Theatre mode: multiple personas conversing with each other, each with distinct voice and knowledge
- Business operations: persona-as-department with specific vocabulary, constraints, and authority
- Domain depth: the persona carries domain knowledge without cluttering the user's prompt
### 9b.11 Multi-Persona Sessions (Theatre Mode)

A single conversation thread can involve multiple personas:

```
                 ┌───────────────────────────────┐
                 │      Session Orchestrator     │
                 │  (tracks which persona is     │
                 │   active, manages turn-taking)│
                 └──────┬───────────────┬────────┘
                        │               │
              ┌─────────▼──────┐  ┌─────▼───────────┐
              │ Persona A      │  │ Persona B       │
              │ (Flamenco Poet)│  │ (Ops Director)  │
              │ systemPrompt   │  │ systemPrompt    │
              │ + context      │  │ + context       │
              │ + voice        │  │ + voice         │
              │ + constraints  │  │ + constraints   │
              └────────────────┘  └─────────────────┘
```

Per-prompt persona override: the user can switch the active persona mid-conversation or tag a specific message as being answered by a different persona. The existing AssistantPanel persona picker already supports this pattern — it is per-prompt, not per-session.

Turn-taking protocol (future):

```
User: "Plan a luxury hiking trip for 4 VIP clients"
  → Ops Director responds with itinerary structure
User: "Now describe the sunrise hike in poetic terms"
  → Flamenco Poet responds with lyrical description
```
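
The turn-taking flow above can be sketched as a tiny orchestrator that tracks the active persona and records it per message. `SessionOrchestrator` and its fields are a hypothetical illustration; only the `Persona` fields come from this doc's model:

```typescript
// Minimal sketch of per-message persona routing for theatre mode.
interface Persona {
  key: string;
  name: string;
  systemPrompt: string; // plus context/voice/constraints in the full model
}

interface SessionMessage {
  personaKey: string; // persona stored per message, not per thread
  text: string;
}

export class SessionOrchestrator {
  private active: string;
  readonly log: SessionMessage[] = [];

  constructor(private personas: Map<string, Persona>, initialKey: string) {
    this.active = initialKey;
  }

  // Per-prompt override: switch the active persona mid-conversation.
  switchTo(key: string): void {
    if (!this.personas.has(key)) throw new Error(`unknown persona: ${key}`);
    this.active = key;
  }

  record(text: string): SessionMessage {
    const msg = { personaKey: this.active, text };
    this.log.push(msg);
    return msg;
  }
}
```

Storing `personaKey` on each message is what makes the transcript replayable with each turn attributed to the right voice.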

The mode selector in AssistantPanel (general / coaching) and the persona selector work independently:

- Mode controls execution style (template chat vs structured pipeline)
- Persona controls voice, context, domain knowledge, and constraints
- A coaching-mode prompt can use any persona; a general-mode prompt can use any persona
### 9b.12 User Configuration Interface

The user manages personas from within their settings, not from admin:

| Route | Purpose | Key fields |
| --- | --- | --- |
| `/settings/ai/personas` | List all user personas | name, role, tags, preview |
| `/settings/ai/personas/new` | Create persona | name, role, systemPrompt, context, voice, tags |
| `/settings/ai/personas/[key]/edit` | Edit persona | full editor with context builder |
| `/settings/ai/personas/[key]/preview` | Test the persona with a sample prompt | live LLM preview |

The context builder UI should guide users through defining:

  1. Role & setting — "Who are you? Where are you? What is your world?"
  2. Voice & tone — "Poetic? Technical? Warm? Direct?"
  3. Knowledge domain — "What do you know deeply? What do you not know?"
  4. Hard constraints — "What will you never do or say?"
  5. Sample interactions — "Show 3 example conversations"

This replaces the current single-textarea systemPrompt with a structured form that generates the full persona context.
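
As an illustration of that generation step, here is a hedged sketch of how the structured form fields could be assembled into a full persona prompt. The `PersonaForm` shape and `buildPersonaPrompt` helper are assumptions; the section wording is a placeholder template, not final copy:

```typescript
// Sketch: assemble a full system prompt from the structured form above.
interface PersonaForm {
  roleAndSetting: string;  // step 1: "Who are you? Where are you?"
  voice?: string;          // step 2: tone/style, e.g. "warm"
  knowledge?: string[];    // step 3: pinned domain knowledge keys
  constraints?: string[];  // step 4: hard rules
}

export function buildPersonaPrompt(f: PersonaForm): string {
  const parts = [f.roleAndSetting];
  if (f.voice) parts.push(`Voice and tone: ${f.voice}.`);
  if (f.knowledge?.length) parts.push(`You know deeply: ${f.knowledge.join(", ")}.`);
  if (f.constraints?.length) parts.push(`Hard rules: ${f.constraints.join("; ")}.`);
  return parts.join("\n");
}
```

The builder keeps each form section a separate line, so editing one field regenerates only that part of the prompt.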

### 9b.13 Existing Front-End Integration

The AssistantPanel already has the two selectors this depends on:

```
┌────────────────────────────────────────┐
│ Mode picker:    [general] [coaching]   │  ← execution style
│ Persona picker: [persona1] [persona2]  │  ← voice/context
└────────────────────────────────────────┘
```

These work independently per-prompt today. The upgrade is:

  1. Source the persona list from PersonaResolver (system + user merged) instead of prisma.aIPersona alone
  2. Add the full context fields (context, voice, constraints) to the persona resolution chain
  3. Support per-message persona override in the thread/message model (store active persona per message, not per thread)
  4. Add a "Theatre Mode" toggle that enables multi-persona turn-taking within a single thread

## 10. Backward Compatibility

- gpt-4.1 remains the compat profile default throughout all phases
- The existing `OPENAI_API_KEY` env var continues working
- All existing `runLLM()` and `runStructured()` calls keep working without changes
- New features (per-user keys, multi-provider) are opt-in at every level
- The kill switch (`AI_FEATURES_ENABLED=false`) disables all AI without code changes