LLM Integration

LLMUtils Overview

The LLMUtils class in src/utils/llm provides a unified interface for interacting with different LLM providers, supporting both the OpenAI and OpenRouter APIs.

// Initialize LLMUtils
import { LLMUtils } from "../utils/llm";
const llmUtils = new LLMUtils();

// Environment variables needed
OPENAI_API_KEY="your-openai-api-key"
OPENROUTER_API_KEY="your-openrouter-api-key"
APP_URL="http://localhost:3000"  // Required for OpenRouter

Text Generation

Generate text responses using different LLM models:

// Basic text generation
const response = await llmUtils.getTextFromLLM(
  prompt,
  "anthropic/claude-3-sonnet"
);

// Streaming responses
await llmUtils.getTextFromLLMStream(
  prompt,
  "anthropic/claude-3-sonnet",
  (token) => {
    // Handle each token as it arrives
    console.log(token);
  }
);
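
If you need the complete text once the stream finishes, you can accumulate tokens in the callback. This is a usage sketch built on the getTextFromLLMStream signature shown above; prompt and llmUtils are the same variables as in the earlier examples.

// Accumulate streamed tokens into a single string
let fullResponse = "";
await llmUtils.getTextFromLLMStream(
  prompt,
  "anthropic/claude-3-sonnet",
  (token) => {
    fullResponse += token;
    process.stdout.write(token);  // render incrementally as tokens arrive
  }
);
console.log(`\nReceived ${fullResponse.length} characters`);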

Structured Output

Get structured JSON responses using Zod schemas for type safety:

import { z } from "zod";
import { LLMSize } from "../types";

// Define your schema
const analysisSchema = z.object({
  sentiment: z.string(),
  topics: z.array(z.string()),
  confidence: z.number(),
  summary: z.string()
});

// Get structured response
const analysis = await llmUtils.getObjectFromLLM(
  prompt,
  analysisSchema,
  LLMSize.LARGE
);

// Type-safe access to fields
console.log(analysis.sentiment);
console.log(analysis.topics);
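
Because analysisSchema is an ordinary Zod schema, you can also derive a compile-time type from it with z.infer and reuse that type elsewhere in your codebase. This is standard Zod usage rather than an LLMUtils feature.

// Derive a TypeScript type from the schema
type Analysis = z.infer<typeof analysisSchema>;

// Reuse the type wherever the analysis result is consumed
function logAnalysis(result: Analysis): void {
  console.log(`Sentiment: ${result.sentiment} (confidence: ${result.confidence})`);
  console.log(`Topics: ${result.topics.join(", ")}`);
}

logAnalysis(analysis);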

Boolean Decisions

Get simple true/false decisions from the LLM:

// Get boolean response
const shouldRespond = await llmUtils.getBooleanFromLLM(
  "Should the agent respond to this message?",
  LLMSize.SMALL
);

if (shouldRespond) {
  // Handle response
}

Image Analysis

Process images and get text descriptions or structured analysis:

// Get image descriptions
const description = await llmUtils.getImageDescriptions(imageUrls);

// Analyze images with text context
const response = await llmUtils.getTextWithImageFromLLM(
  prompt,
  imageUrls,
  "anthropic/claude-3-sonnet"
);

// Get structured output from images
const analysis = await llmUtils.getObjectFromLLMWithImages(
  prompt,
  analysisSchema,
  imageUrls,
  LLMSize.LARGE
);
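
The imageUrls argument in the calls above is assumed to be an array of image URL strings; the URLs below are placeholders for illustration only.

// Placeholder image URLs for the examples above
const imageUrls: string[] = [
  "https://example.com/screenshot-1.png",
  "https://example.com/screenshot-2.png"
];

const description = await llmUtils.getImageDescriptions(imageUrls);
console.log(description);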

Model Selection

LLMSize.SMALL

  • Uses gpt-4o-mini
  • Faster response times
  • Lower cost per request
  • Good for simple decisions

LLMSize.LARGE

  • Uses gpt-4o
  • Better reasoning
  • More nuanced responses
  • Complex analysis tasks (see the usage sketch after this list)
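
As a rough guide, the sketch below combines calls documented earlier: a cheap SMALL-model gate ahead of a LARGE-model structured analysis. The gating question is illustrative, and prompt and analysisSchema are the same placeholders used above.

// Use the small model as an inexpensive gate...
const isRelevant = await llmUtils.getBooleanFromLLM(
  "Is this message asking for technical help?",  // illustrative gating question
  LLMSize.SMALL
);

if (isRelevant) {
  // ...and reserve the large model for the heavier structured analysis
  const analysis = await llmUtils.getObjectFromLLM(
    prompt,
    analysisSchema,
    LLMSize.LARGE
  );
  console.log(analysis.summary);
}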

Best Practices

  • Use structured output for predictable responses
  • Stream responses for better user experience
  • Choose appropriate model size for the task
  • Handle API errors gracefully
  • Monitor token usage and costs
  • Cache responses when possible (a combined error-handling and caching sketch follows this list)
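
A minimal sketch of the error-handling and caching points, assuming getTextFromLLM rejects with a normal Error on API failures and that a simple in-memory Map is an acceptable cache for your use case:

const responseCache = new Map<string, string>();

async function getCachedText(prompt: string, model: string): Promise<string | null> {
  const cacheKey = `${model}:${prompt}`;

  // Serve repeated prompts from the in-memory cache
  const cached = responseCache.get(cacheKey);
  if (cached !== undefined) return cached;

  try {
    const response = await llmUtils.getTextFromLLM(prompt, model);
    responseCache.set(cacheKey, response);
    return response;
  } catch (error) {
    // Degrade gracefully instead of crashing the caller
    console.error("LLM request failed:", error);
    return null;
  }
}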