Extract structured data from LLM responses with Pydantic validation, automatically retry failed extractions, parse complex JSON with type safety, and stream partial results using Instructor, a battle-tested library for structured output.
Skill metadata¶
| Field | Value |
|---|---|
| Source | Optional; install with `hermes skills install official/mlops/instructor` |
| Path | optional-skills/mlops/instructor |
| Version | 1.0.0 |
| Author | Orchestra Research |
| License | MIT |
| Dependencies | instructor, pydantic, openai, anthropic |
| Tags | Prompt Engineering, Instructor, Structured Output, Pydantic, Data Extraction, JSON Parsing, Type Safety, Validation, Streaming, OpenAI, Anthropic |
Reference: full SKILL.md¶
info Below is the full skill description that Hermes loads when the skill is activated. These are the instructions the agent sees while the skill is active.
Instructor: Structured LLM Outputs¶
When to Use This Skill¶
Use Instructor when you need to:

- Reliably extract structured data from LLM responses
- Automatically validate outputs against Pydantic schemas
- Retry failed extractions with automatic error handling
- Parse complex JSON with type safety and validation
- Stream partial results for real-time processing
- Support multiple LLM providers with a single API
GitHub Stars: 15,000+ | Battle-tested: 100,000+ developers
Installation¶
[code]
# Base installation
pip install instructor
# With specific providers
pip install "instructor[anthropic]" # Anthropic Claude
pip install "instructor[openai]" # OpenAI
pip install "instructor[all]" # All providers
[/code]
Quick Start¶
Basic Example: Extract User Data¶
[code]
import instructor
from pydantic import BaseModel
from anthropic import Anthropic

# Define output structure
class User(BaseModel):
    name: str
    age: int
    email: str

# Create instructor client
client = instructor.from_anthropic(Anthropic())

# Extract structured data
user = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "John Doe is 30 years old. His email is john@example.com"
    }],
    response_model=User
)

print(user.name)   # "John Doe"
print(user.age)    # 30
print(user.email)  # "john@example.com"
[/code]
With OpenAI¶
[code]
from openai import OpenAI

client = instructor.from_openai(OpenAI())

user = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=User,
    messages=[{"role": "user", "content": "Extract: Alice, 25, alice@email.com"}]
)
[/code]
Core Concepts¶
1\. Response Models (Pydantic)¶
Response models define the structure and validation rules for LLM outputs.
Basic Model¶
[code]
from pydantic import BaseModel, Field

class Article(BaseModel):
    title: str = Field(description="Article title")
    author: str = Field(description="Author name")
    word_count: int = Field(description="Number of words", gt=0)
    tags: list[str] = Field(description="List of relevant tags")

article = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "Analyze this article: [article text]"
    }],
    response_model=Article
)
[/code]

Benefits:

- Type safety through Python type hints
- Automatic validation (word_count > 0)
- Self-documenting via Field descriptions
- IDE autocompletion support
Nested Models¶
[code]
class Address(BaseModel):
    street: str
    city: str
    country: str

class Person(BaseModel):
    name: str
    age: int
    address: Address  # Nested model

person = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "John lives at 123 Main St, Boston, USA"
    }],
    response_model=Person
)

print(person.address.city)  # "Boston"
[/code]
Optional Fields¶
[code]
from typing import Optional

class Product(BaseModel):
    name: str
    price: float
    discount: Optional[float] = None  # Optional
    description: str = Field(default="No description")  # Default value

# LLM doesn't need to provide discount or description
[/code]
Enums for Constraints¶
[code]
from enum import Enum

class Sentiment(str, Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"

class Review(BaseModel):
    text: str
    sentiment: Sentiment  # Only these 3 values allowed

review = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "This product is amazing!"
    }],
    response_model=Review
)

print(review.sentiment)  # Sentiment.POSITIVE
[/code]
2\. Validation¶
Pydantic automatically validates LLM outputs. If validation fails, Instructor retries the request.
Built-in Validators¶
[code]
from pydantic import Field, EmailStr, HttpUrl

class Contact(BaseModel):
    name: str = Field(min_length=2, max_length=100)
    age: int = Field(ge=0, le=120)  # 0 <= age <= 120
    email: EmailStr   # Validates email format
    website: HttpUrl  # Validates URL format

# If LLM provides invalid data, Instructor retries automatically
[/code]
Custom Validators¶
[code]
import re

from pydantic import field_validator

class Event(BaseModel):
    name: str
    date: str
    attendees: int

    @field_validator('date')
    def validate_date(cls, v):
        """Ensure date is in YYYY-MM-DD format."""
        # fullmatch rejects trailing garbage that re.match would accept
        if not re.fullmatch(r'\d{4}-\d{2}-\d{2}', v):
            raise ValueError('Date must be YYYY-MM-DD format')
        return v

    @field_validator('attendees')
    def validate_attendees(cls, v):
        """Ensure a positive attendee count."""
        if v < 1:
            raise ValueError('Must have at least 1 attendee')
        return v
[/code]
Model-Level Validation¶
[code]
from datetime import datetime

from pydantic import model_validator

class DateRange(BaseModel):
    start_date: str
    end_date: str

    @model_validator(mode='after')
    def check_dates(self):
        """Ensure end_date is after start_date."""
        start = datetime.strptime(self.start_date, '%Y-%m-%d')
        end = datetime.strptime(self.end_date, '%Y-%m-%d')
        if end < start:
            raise ValueError('end_date must be after start_date')
        return self
[/code]
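Validators like these are plain Python and run locally, so they can be unit-tested without any model or LLM call. A stdlib-only sketch of the same cross-field check (the `check_date_range` helper is illustrative, not part of Pydantic or Instructor):

```python
from datetime import datetime

def check_date_range(start_date: str, end_date: str) -> None:
    """Raise ValueError if end_date precedes start_date (YYYY-MM-DD strings)."""
    start = datetime.strptime(start_date, "%Y-%m-%d")
    end = datetime.strptime(end_date, "%Y-%m-%d")
    if end < start:
        raise ValueError("end_date must be after start_date")

# A valid range passes silently; an inverted range raises
check_date_range("2024-01-01", "2024-06-30")
try:
    check_date_range("2024-06-30", "2024-01-01")
except ValueError as e:
    print(e)  # end_date must be after start_date
```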
3\. Automatic Retrying¶
When validation fails, Instructor automatically retries, feeding the error message back to the LLM.
[code]
# Retries up to 3 times if validation fails
user = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "Extract user from: John, age unknown"
    }],
    response_model=User,
    max_retries=3  # Default is 3
)

# If age can't be extracted, Instructor tells the LLM:
# "Validation error: age - field required"
# LLM tries again with better extraction
[/code]

How it works:

1. The LLM generates an output
2. Pydantic validates it
3. If invalid, the error message is sent back to the LLM
4. The LLM tries again with the error feedback
5. This repeats up to max_retries times
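The feedback loop can be sketched in plain Python. This is a hypothetical, stdlib-only simulation (the `ask_llm` and `parse` stubs stand in for the real LLM call and Pydantic validation), not Instructor's actual implementation:

```python
def retry_with_feedback(ask_llm, parse, prompt: str, max_retries: int = 3):
    """Simulate Instructor's retry loop: on a validation error,
    re-ask the LLM with the error message appended to the prompt."""
    for _ in range(max_retries):
        raw = ask_llm(prompt)
        try:
            return parse(raw)  # stands in for Pydantic validation
        except ValueError as err:
            prompt += f"\nValidation error: {err}. Please fix your output."
    raise RuntimeError("Failed after retries")

# Stub LLM: answers badly at first, then correctly once it sees feedback
def fake_llm(prompt: str) -> str:
    return "30" if "Validation error" in prompt else "unknown"

def parse_age(raw: str) -> int:
    if not raw.isdigit():
        raise ValueError("age must be an integer")
    return int(raw)

print(retry_with_feedback(fake_llm, parse_age, "Extract age"))  # 30
```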
4\. Streaming¶
Stream partial results for real-time processing.
Streaming Partial Objects¶
[code]
from instructor import Partial

class Story(BaseModel):
    title: str
    content: str
    tags: list[str]

# Stream partial updates as the LLM generates
for partial_story in client.messages.create_partial(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "Write a short sci-fi story"
    }],
    response_model=Story
):
    print(f"Title: {partial_story.title}")
    print(f"Content so far: {partial_story.content[:100]}...")
    # Update UI in real-time
[/code]
Streaming Iterables¶
[code]
class Task(BaseModel):
    title: str
    priority: str

# Stream list items as they're generated
tasks = client.messages.create_iterable(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "Generate 10 project tasks"
    }],
    response_model=Task
)

for task in tasks:
    print(f"- {task.title} ({task.priority})")
    # Process each task as it arrives
[/code]
Provider Configuration¶
Anthropic Claude¶
[code]
import instructor
from anthropic import Anthropic

client = instructor.from_anthropic(
    Anthropic(api_key="your-api-key")
)

# Use with Claude models
response = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[...],
    response_model=YourModel
)
[/code]
OpenAI¶
[code]
from openai import OpenAI

client = instructor.from_openai(
    OpenAI(api_key="your-api-key")
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=YourModel,
    messages=[...]
)
[/code]
Local Models (Ollama)¶
[code]
from openai import OpenAI

# Point to local Ollama server
client = instructor.from_openai(
    OpenAI(
        base_url="http://localhost:11434/v1",
        api_key="ollama"  # Required but ignored
    ),
    mode=instructor.Mode.JSON
)

response = client.chat.completions.create(
    model="llama3.1",
    response_model=YourModel,
    messages=[...]
)
[/code]
Common Patterns¶
Pattern 1: Data Extraction from Text¶
[code]
class CompanyInfo(BaseModel):
    name: str
    founded_year: int
    industry: str
    employees: int
    headquarters: str

text = """
Tesla, Inc. was founded in 2003. It operates in the automotive and energy
industry with approximately 140,000 employees. The company is headquartered
in Austin, Texas.
"""

company = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": f"Extract company information from: {text}"
    }],
    response_model=CompanyInfo
)
[/code]
Pattern 2: Classification¶
[code]
class Category(str, Enum):
    TECHNOLOGY = "technology"
    FINANCE = "finance"
    HEALTHCARE = "healthcare"
    EDUCATION = "education"
    OTHER = "other"

class ArticleClassification(BaseModel):
    category: Category
    confidence: float = Field(ge=0.0, le=1.0)
    keywords: list[str]

classification = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "Classify this article: [article text]"
    }],
    response_model=ArticleClassification
)
[/code]
Pattern 3: Multi-Entity Extraction¶
[code]
class Person(BaseModel):
    name: str
    role: str

class Organization(BaseModel):
    name: str
    industry: str

class Entities(BaseModel):
    people: list[Person]
    organizations: list[Organization]
    locations: list[str]

text = "Tim Cook, CEO of Apple, announced at the event in Cupertino..."

entities = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": f"Extract all entities from: {text}"
    }],
    response_model=Entities
)

for person in entities.people:
    print(f"{person.name} - {person.role}")
[/code]
Pattern 4: Structured Analysis¶
[code]
class SentimentAnalysis(BaseModel):
    overall_sentiment: Sentiment
    positive_aspects: list[str]
    negative_aspects: list[str]
    suggestions: list[str]
    score: float = Field(ge=-1.0, le=1.0)

review = "The product works well but setup was confusing..."

analysis = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": f"Analyze this review: {review}"
    }],
    response_model=SentimentAnalysis
)
[/code]
Pattern 5: Batch Processing¶
[code]
def extract_person(text: str) -> Person:
    return client.messages.create(
        model="claude-sonnet-4-5-20250929",
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": f"Extract person from: {text}"
        }],
        response_model=Person
    )

texts = [
    "John Doe is a 30-year-old engineer",
    "Jane Smith, 25, works in marketing",
    "Bob Johnson, age 40, software developer"
]

people = [extract_person(text) for text in texts]
[/code]
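The list comprehension above processes texts sequentially; since each extraction is an I/O-bound API call, a thread pool parallelizes it naturally. A stdlib-only sketch, where the `extract_person` stub stands in for the real client call:

```python
from concurrent.futures import ThreadPoolExecutor

def extract_person(text: str) -> dict:
    # Stub standing in for the client.messages.create(...) call;
    # a real version would return a validated Person model.
    name = text.split(",")[0].split(" is ")[0].strip()
    return {"name": name}

texts = [
    "John Doe is a 30-year-old engineer",
    "Jane Smith, 25, works in marketing",
]

# executor.map preserves input order even though calls run concurrently
with ThreadPoolExecutor(max_workers=4) as executor:
    people = list(executor.map(extract_person, texts))

print([p["name"] for p in people])  # ['John Doe', 'Jane Smith']
```

Keep `max_workers` modest to stay within provider rate limits.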
Advanced Features¶
Union Types¶
[code]
from typing import Union

class TextContent(BaseModel):
    type: str = "text"
    content: str

class ImageContent(BaseModel):
    type: str = "image"
    url: HttpUrl
    caption: str

class Post(BaseModel):
    title: str
    content: Union[TextContent, ImageContent]  # Either type

# LLM chooses appropriate type based on content
[/code]
Dynamic Models¶
[code]
from pydantic import create_model

# Create model at runtime
DynamicUser = create_model(
    'User',
    name=(str, ...),
    age=(int, Field(ge=0)),
    email=(EmailStr, ...)
)

user = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[...],
    response_model=DynamicUser
)
[/code]
Custom Modes¶
[code]
# For providers without native structured outputs
client = instructor.from_anthropic(
    Anthropic(),
    mode=instructor.Mode.JSON  # JSON mode
)

# Available modes:
# - Mode.ANTHROPIC_TOOLS (recommended for Claude)
# - Mode.JSON (fallback)
# - Mode.TOOLS (OpenAI tools)
[/code]
Context Management¶
[code]
# Single-use client
with instructor.from_anthropic(Anthropic()) as client:
    result = client.messages.create(
        model="claude-sonnet-4-5-20250929",
        max_tokens=1024,
        messages=[...],
        response_model=YourModel
    )
# Client closed automatically
[/code]
Error Handling¶
Handling Validation Errors¶
[code]
from pydantic import ValidationError

try:
    user = client.messages.create(
        model="claude-sonnet-4-5-20250929",
        max_tokens=1024,
        messages=[...],
        response_model=User,
        max_retries=3
    )
except ValidationError as e:
    print(f"Failed after retries: {e}")
    # Handle gracefully
except Exception as e:
    print(f"API error: {e}")
[/code]
Custom Error Messages¶
[code]
class ValidatedUser(BaseModel):
    name: str = Field(description="Full name, 2-100 characters")
    age: int = Field(description="Age between 0 and 120", ge=0, le=120)
    email: EmailStr = Field(description="Valid email address")

    class Config:
        # Schema examples steer the LLM toward the expected shape
        json_schema_extra = {
            "examples": [
                {
                    "name": "John Doe",
                    "age": 30,
                    "email": "john@example.com"
                }
            ]
        }
[/code]
Best Practices¶
1\. Clear Field Descriptions¶
[code]
# ❌ Bad: Vague
class Product(BaseModel):
    name: str
    price: float

# ✅ Good: Descriptive
class Product(BaseModel):
    name: str = Field(description="Product name from the text")
    price: float = Field(description="Price in USD, without currency symbol")
[/code]
2\. Use Appropriate Validation¶
[code]
# ✅ Good: Constrain values
class Rating(BaseModel):
    score: int = Field(ge=1, le=5, description="Rating from 1 to 5 stars")
    review: str = Field(min_length=10, description="Review text, at least 10 chars")
[/code]
3\. Provide Examples in Prompts¶
[code]
messages = [{
    "role": "user",
    "content": """Extract person info from: "John, 30, engineer"

Example format:
{
    "name": "John Doe",
    "age": 30,
    "occupation": "engineer"
}"""
}]
[/code]
4\. Use Enums for Fixed Categories¶
[code]
# ✅ Good: Enum ensures valid values
class Status(str, Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"

class Application(BaseModel):
    status: Status  # LLM must choose from enum
[/code]
5\. Handle Missing Data Gracefully¶
[code]
class PartialData(BaseModel):
    required_field: str
    optional_field: Optional[str] = None
    default_field: str = "default_value"

# LLM only needs to provide required_field
[/code]
Comparison to Alternatives¶
| Capability | Instructor | Manual JSON | LangChain | DSPy |
|---|---|---|---|---|
| Type safety | ✅ Yes | ❌ No | ⚠️ Partial | ✅ Yes |
| Auto-validation | ✅ Yes | ❌ No | ❌ No | ⚠️ Limited |
| Auto-retry | ✅ Yes | ❌ No | ❌ No | ✅ Yes |
| Streaming | ✅ Yes | ❌ No | ✅ Yes | ❌ No |
| Multi-provider | ✅ Yes | ⚠️ Manual | ✅ Yes | ✅ Yes |
| Learning curve | Low | Low | Medium | High |

When to choose Instructor:

- You need structured, validated outputs
- You want type safety and IDE support
- You require automatic retries
- You are building data extraction systems
When to choose alternatives:

- DSPy: you need prompt optimization
- LangChain: you are building complex chains
- Manual: simple one-off extractions
Resources¶
- Documentation: https://python.useinstructor.com
- GitHub: https://github.com/jxnl/instructor (15k+ stars)
- Cookbook: https://python.useinstructor.com/examples
- Discord: community support available
See Also¶
- references/validation.md - Advanced validation patterns
- references/providers.md - Provider-specific configuration
- references/examples.md - Real-world examples