# AI Chat
Multi-model AI conversations with memory, tools, and deep research.
The BroomVA chat at broomva.tech/chat is a production-grade AI conversation interface built on Next.js 16 with the Vercel AI SDK v6. It supports multiple model providers, persistent memory, tool integration, and rich content rendering.
## Model support
The chat supports models from multiple providers through a unified interface:
| Provider | Models | Notes |
|---|---|---|
| Anthropic | Claude Opus 4, Sonnet 4, Haiku | Primary provider, full tool use support, extended thinking |
| OpenAI | GPT-4o, GPT-4o-mini, o1, o3 | Function calling, vision, reasoning models |
| Google | Gemini 2.5 Pro, Flash | Multimodal, long context (1M+ tokens) |
| OpenRouter | 100+ models | Access to Llama, Mistral, DeepSeek, and community models |
| Ollama | Any local model | Self-hosted, no API key needed, zero credit cost |
Model selection is available from the dropdown at the top of the chat interface. Different models have different capabilities and costs -- see Billing for credit consumption rates.
Free-tier users have access to community models through OpenRouter and self-hosted models through Ollama. Pro and Team plans unlock access to Claude, GPT-4o, Gemini, and all premium models.
### Model routing
The platform uses the AI SDK v6 `customProvider` abstraction to route model requests. Each provider is configured with its own API keys and rate limits. The chat application calls `streamText()` with the selected model and automatically handles provider-specific differences (tool call formats, system prompt placement, streaming protocols).
When a model is unavailable or returns an error, the UI surfaces the error inline in the conversation rather than failing silently. Users can retry with a different model without losing their message.
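As a rough illustration of the routing behavior described above (hypothetical names and model ids, not the actual BroomVA code), a resolver can map each selected model id to the provider that serves it, defaulting unknown ids to OpenRouter's community catalog:

```typescript
// Illustrative sketch: map a selected model id to its provider.
type Provider = "anthropic" | "openai" | "google" | "openrouter" | "ollama";

// Hypothetical model-id keys for illustration only.
const MODEL_PROVIDERS: Record<string, Provider> = {
  "claude-opus-4": "anthropic",
  "gpt-4o": "openai",
  "gemini-2.5-pro": "google",
  "llama3-local": "ollama",
};

// Ids not in the table are assumed to be community models via OpenRouter.
function resolveProvider(modelId: string): Provider {
  return MODEL_PROVIDERS[modelId] ?? "openrouter";
}
```

In the real application this mapping lives inside the `customProvider` configuration, and provider errors are surfaced inline so the user can retry with a different model.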
## Conversations
Every conversation is automatically persisted to your account via Drizzle ORM and a PostgreSQL database. The sidebar shows your conversation history with:
- Search -- full-text search across all conversations
- Projects -- organize conversations into project folders
- Sharing -- generate public links for individual conversations
- Branching -- edit a previous message and explore an alternative path without losing the original thread
### Conversation lifecycle
When you send a message, the following sequence occurs:
1. The message is persisted to the database with a `chatId` (generated if new)
2. The full message history for that `chatId` is loaded
3. If the memory vault is enabled, relevant memories are injected as system context
4. The message array is sent to the selected model provider via `streamText()`
5. The streaming response is relayed to the client via Server-Sent Events
6. On completion, the assistant message and usage metadata are persisted
Each message records token counts (input and output), the model used, and a timestamp. This data feeds into the usage analytics visible in the console.
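The message-assembly step of this lifecycle can be sketched as follows (hypothetical types and names; the real code uses the AI SDK's message format and a database-backed history):

```typescript
// Minimal message shape for the sketch.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Build the array sent to the model: optional memory context first,
// then the loaded history, then the new user message.
function buildModelInput(
  history: ChatMessage[],
  userMessage: string,
  memories: string[], // relevant memory-vault entries; empty if disabled
): ChatMessage[] {
  const messages: ChatMessage[] = [];
  if (memories.length > 0) {
    // Memories are injected as system context before the model call.
    messages.push({
      role: "system",
      content: `Known context from previous conversations:\n- ${memories.join("\n- ")}`,
    });
  }
  return [...messages, ...history, { role: "user", content: userMessage }];
}
```

When memory is disabled the memories array is simply empty, so no extra system message is prepended.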
## Rich content rendering
The chat renders AI responses with full formatting support:
- Markdown -- headings, lists, bold, italic, links
- Code blocks -- syntax highlighting for 50+ languages via Shiki, with copy-to-clipboard
- Mathematics -- LaTeX rendering for inline (`$...$`) and display (`$$...$$`) math
- Diagrams -- Mermaid diagram rendering for flowcharts, sequence diagrams, and more
- Tables -- GFM-style tables with proper alignment
- Images -- inline image rendering and generation
All rendering is handled by the Streamdown library, which processes streaming markdown tokens in real time as the model generates its response. This means code blocks are syntax-highlighted as they stream in, not after the response completes.
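As a toy illustration of why streaming-aware rendering matters (this is not Streamdown's actual implementation), a renderer needs to know whether the text received so far ends inside an open code fence, so it can highlight the partial block as deltas arrive:

```typescript
// Count fence-opening lines seen so far; an odd count means the stream
// is currently inside an unclosed fenced code block.
function insideCodeFence(markdownSoFar: string): boolean {
  const fences = markdownSoFar.match(/^```/gm) ?? [];
  return fences.length % 2 === 1;
}
```

A streaming renderer uses this kind of state to apply syntax highlighting to an in-progress block instead of waiting for the closing fence.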
## Memory vault
The memory vault gives the AI persistent context across conversations. When enabled, the platform maintains a knowledge graph backed by the Lago persistence substrate:
- Automatic extraction -- key facts, preferences, and decisions are extracted from conversations
- Cross-session recall -- the AI can reference information from previous conversations
- User control -- you can view, edit, and delete stored memories from the settings panel
Memory is scoped to your user account and, optionally, to your organization. Organization-level memories are shared across all members.
### How memory works
Lago stores memories as events in its append-only journal. Each memory event contains:
- The extracted fact or preference
- A relevance score
- The source conversation ID
- A timestamp
When a new conversation starts, the system queries the knowledge index for memories relevant to the current context. These are injected as system context before the first model call, giving the AI awareness of prior interactions.
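The event fields and relevance-based selection described above can be sketched as follows (hypothetical types and thresholds; the real knowledge index query is more sophisticated than a score sort):

```typescript
// Hypothetical shape of a memory event in the append-only journal.
interface MemoryEvent {
  fact: string;          // the extracted fact or preference
  relevance: number;     // relevance score, assumed 0..1 here
  sourceChatId: string;  // the conversation the fact came from
  timestamp: number;     // Unix milliseconds
}

// Pick the top-k memories above a relevance threshold for injection
// as system context before the first model call.
function selectMemories(
  events: MemoryEvent[],
  k = 5,
  minRelevance = 0.5,
): MemoryEvent[] {
  return events
    .filter((e) => e.relevance >= minRelevance)
    .sort((a, b) => b.relevance - a.relevance)
    .slice(0, k);
}
```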
Memory is opt-in. You can enable or disable it at any time from the settings panel. Disabling memory does not delete existing memories -- it only stops the system from reading or writing them.
## Deep research mode
Deep research mode enables the AI to perform multi-step investigation using web search and document analysis. When activated, the AI will:
- Break your question into sub-queries
- Search the web using Tavily for relevant sources
- Read and analyze the retrieved documents
- Synthesize findings into a comprehensive answer with citations
This is useful for questions that require current information or cross-referencing multiple sources. Deep research consumes more credits than standard chat because it involves multiple model calls and tool invocations per query.
### Research flow
The deep research pipeline uses the AI SDK's tool-calling interface to orchestrate a multi-step process:
```
User question
  → Query decomposition (1 model call)
  → Web search per sub-query (Tavily API)
  → Document retrieval and analysis (1 model call per source)
  → Synthesis with citations (1 model call)
  → Rendered response with source links
```

Each step is visible in the UI as a tool-call event, so you can see exactly what the AI is searching for and reading.
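The model-call arithmetic implied by this pipeline can be sketched as follows (illustrative only, not the platform's actual billing logic), which is why deep research consumes more credits than a single-call chat turn:

```typescript
// Model calls per deep-research query, following the pipeline above.
function researchModelCalls(numSources: number): number {
  const decomposition = 1;              // split the question into sub-queries
  const perSourceAnalysis = numSources; // 1 call per retrieved document
  const synthesis = 1;                  // final cited answer
  return decomposition + perSourceAnalysis + synthesis;
}
```

Web searches hit the Tavily API rather than a model, so they add tool invocations but not model calls.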
## MCP tool integration
The chat supports the Model Context Protocol (MCP) for connecting external tools and services. MCP allows the AI to:
- Read and write files in connected repositories
- Query databases and APIs
- Execute code in sandboxed environments
- Interact with external services (Slack, GitHub, Linear, etc.)
MCP servers can be configured per-user or per-organization from the settings panel. The platform uses the `@ai-sdk/mcp` adapter to bridge MCP tools into the AI SDK's tool-calling interface.
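The bridging idea can be sketched as follows (hypothetical types, not the adapter's real API; real MCP tool execution is asynchronous, kept synchronous here for brevity): each external tool carries a name, a description the model sees, and an execute function the model can invoke by name.

```typescript
// Hypothetical bridged-tool shape for the sketch.
interface BridgedTool {
  name: string;
  description: string;
  execute: (args: Record<string, unknown>) => unknown;
}

// Index tools by name so the model's tool calls can be dispatched.
function bridgeTools(tools: BridgedTool[]): Map<string, BridgedTool> {
  return new Map(tools.map((t) => [t.name, t]));
}

// Dispatch a tool call, returning a result or an error the UI can render.
function invokeTool(
  tools: Map<string, BridgedTool>,
  name: string,
  args: Record<string, unknown>,
) {
  const tool = tools.get(name);
  if (!tool) return { ok: false as const, error: `unknown tool: ${name}` };
  try {
    return { ok: true as const, result: tool.execute(args) };
  } catch (err) {
    return { ok: false as const, error: String(err) };
  }
}
```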
### Tool execution in the UI
When the AI invokes a tool, the chat UI renders:
- A tool-call card showing the tool name, arguments, and a loading state
- A tool-result card showing the returned data
- The AI's follow-up response that incorporates the tool result
Tool calls are streamed in real time using the AI SDK's UiPart event format. The stream emits `tool-call` and `tool-result` events alongside `text-delta` events, allowing the UI to render tools and text interleaved.
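The interleaving can be sketched with simplified event shapes (hypothetical; the AI SDK's actual part types carry more fields): the UI folds a flat event stream into alternating text runs and tool cards.

```typescript
// Simplified stream events: text deltas interleaved with tool activity.
type StreamEvent =
  | { type: "text-delta"; delta: string }
  | { type: "tool-call"; toolName: string; args: unknown }
  | { type: "tool-result"; toolName: string; result: unknown };

// Fold events into renderable parts: consecutive deltas merge into one
// text run; tool events each become their own card.
function foldEvents(
  events: StreamEvent[],
): Array<{ kind: "text" | "tool"; value: string }> {
  const parts: Array<{ kind: "text" | "tool"; value: string }> = [];
  for (const e of events) {
    if (e.type === "text-delta") {
      const last = parts[parts.length - 1];
      if (last?.kind === "text") last.value += e.delta;
      else parts.push({ kind: "text", value: e.delta });
    } else {
      parts.push({ kind: "tool", value: e.toolName });
    }
  }
  return parts;
}
```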
## Attachments
You can upload files directly into the conversation:
- Images -- PNG, JPG, WebP, GIF (analyzed by vision-capable models)
- Documents -- PDF, TXT, CSV, MD (content extracted and included in context)
- Code files -- any text-based file (syntax-highlighted in the message)
Files are stored in Vercel Blob storage and referenced by the AI during the conversation. Image attachments are automatically compressed on the client side before upload.
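A minimal sketch of the attachment-type gate implied by the list above (illustrative allow-lists mirroring the documented types, not the platform's actual validation code):

```typescript
// Supported image MIME types and document extensions from the docs above.
const IMAGE_TYPES = new Set(["image/png", "image/jpeg", "image/webp", "image/gif"]);
const DOC_EXTENSIONS = new Set([".pdf", ".txt", ".csv", ".md"]);

function attachmentKind(
  filename: string,
  mimeType: string,
): "image" | "document" | "code" {
  if (IMAGE_TYPES.has(mimeType)) return "image";
  const dot = filename.lastIndexOf(".");
  const ext = dot === -1 ? "" : filename.slice(dot).toLowerCase();
  if (DOC_EXTENSIONS.has(ext)) return "document";
  return "code"; // any other text-based file is treated as a code file
}
```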
## Settings
Chat settings are available from the gear icon in the sidebar:
- Default model -- set your preferred model for new conversations
- System prompt -- customize the default system instructions
- Memory -- enable/disable the memory vault
- Theme -- the interface uses the Arcan Glass design system with dark mode as the default