Extract structured data from LLM responses with Pydantic validation, automatically retry failed extractions, parse complex JSON with type safety, and stream partial results using Instructor, a battle-tested library for structured output.

Skill metadata¶

|---|---
Source| Optional: install with hermes skills install official/mlops/instructor
Path| optional-skills/mlops/instructor
Version| 1.0.0
Author| Orchestra Research
License| MIT
Dependencies| instructor, pydantic, openai, anthropic
Tags| Prompt Engineering, Instructor, Structured Output, Pydantic, Data Extraction, JSON Parsing, Type Safety, Validation, Streaming, OpenAI, Anthropic

Reference: full SKILL.md¶

Info: The full skill definition that Hermes loads on activation is reproduced below. These are the instructions the agent sees while the skill is active.

Instructor: Structured LLM Outputs¶

When to Use This Skill¶

Use Instructor when you need to:

* Reliably extract structured data from LLM responses
* Automatically validate outputs against Pydantic schemas
* Retry failed extractions with automatic error handling
* Parse complex JSON with type safety and validation
* Stream partial results for real-time processing
* Support multiple LLM providers through a single API

GitHub Stars: 15,000+ | Battle-tested: 100,000+ developers

Installation¶

[code] # Base installation
pip install instructor

# With specific providers  
pip install "instructor[anthropic]"  # Anthropic Claude  
pip install "instructor[openai]"     # OpenAI  
pip install "instructor[all]"        # All providers

[/code]

Quick Start¶

Basic Example: Extract User Data¶

[code] import instructor
from pydantic import BaseModel
from anthropic import Anthropic

# Define output structure  
class User(BaseModel):  
    name: str  
    age: int  
    email: str

# Create instructor client  
client = instructor.from_anthropic(Anthropic())

# Extract structured data  
user = client.messages.create(  
    model="claude-sonnet-4-5-20250929",  
    max_tokens=1024,  
    messages=[{  
        "role": "user",  
        "content": "John Doe is 30 years old. His email is john@example.com"  
    }],  
    response_model=User  
)

print(user.name)   # "John Doe"  
print(user.age)    # 30  
print(user.email)  # "john@example.com"

[/code]

With OpenAI¶

[code] from openai import OpenAI

client = instructor.from_openai(OpenAI())

user = client.chat.completions.create(  
    model="gpt-4o-mini",  
    response_model=User,  
    messages=[{"role": "user", "content": "Extract: Alice, 25, alice@email.com"}]  
)

[/code]

Core Concepts¶

1\. Response Models (Pydantic)¶

Response models define the structure and validation rules for LLM outputs.

Basic Model¶

[code] from pydantic import BaseModel, Field

class Article(BaseModel):  
    title: str = Field(description="Article title")  
    author: str = Field(description="Author name")  
    word_count: int = Field(description="Number of words", gt=0)  
    tags: list[str] = Field(description="List of relevant tags")

article = client.messages.create(  
    model="claude-sonnet-4-5-20250929",  
    max_tokens=1024,  
    messages=[{  
        "role": "user",  
        "content": "Analyze this article: [article text]"  
    }],  
    response_model=Article  
)

[/code]

Benefits:
* Type safety through Python type hints
* Automatic validation (word_count > 0)
* Self-documenting via Field descriptions
* IDE autocompletion support
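The validation benefit can be checked locally, without calling an LLM. The sketch below is an illustration rather than part of the skill itself: it re-declares the `Article` model and passes hand-written dictionaries (stand-ins for model output) through Pydantic's `model_validate`.

```python
from pydantic import BaseModel, Field, ValidationError


class Article(BaseModel):
    title: str = Field(description="Article title")
    author: str = Field(description="Author name")
    word_count: int = Field(description="Number of words", gt=0)
    tags: list[str] = Field(description="List of relevant tags")


# Hand-written payloads standing in for LLM output (illustrative data only).
good = {"title": "Intro", "author": "A. Writer", "word_count": 850, "tags": ["ml"]}
bad = {**good, "word_count": 0}  # violates the gt=0 constraint

article = Article.model_validate(good)
print(article.word_count)

try:
    Article.model_validate(bad)
    rejected = False
except ValidationError as exc:
    rejected = True
    # Each error entry records the location of the offending field.
    failed_fields = [err["loc"][0] for err in exc.errors()]
    print(failed_fields)
```

The same error details are what a retry mechanism can hand back to the model as feedback.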

Nested Models¶

[code] class Address(BaseModel):
    street: str
    city: str
    country: str

class Person(BaseModel):  
    name: str  
    age: int  
    address: Address  # Nested model

person = client.messages.create(  
    model="claude-sonnet-4-5-20250929",  
    max_tokens=1024,  
    messages=[{  
        "role": "user",  
        "content": "John lives at 123 Main St, Boston, USA"  
    }],  
    response_model=Person  
)

print(person.address.city)  # "Boston"

[/code]

Optional Fields¶

[code] from typing import Optional

class Product(BaseModel):  
    name: str  
    price: float  
    discount: Optional[float] = None  # Optional  
    description: str = Field(default="No description")  # Default value

# LLM doesn't need to provide discount or description

[/code]
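To see what "doesn't need to provide" means in practice, the sketch below validates a hand-written payload that omits both optional fields. The payload is illustrative data, not real model output; `model_fields_set` is Pydantic's record of which fields were actually supplied.

```python
from typing import Optional

from pydantic import BaseModel, Field


class Product(BaseModel):
    name: str
    price: float
    discount: Optional[float] = None
    description: str = Field(default="No description")


# Only the required fields are present in this stand-in payload.
product = Product.model_validate({"name": "Widget", "price": 9.99})

print(product.discount)      # falls back to None
print(product.description)   # falls back to the declared default
# Distinguishes fields the payload supplied from fields filled by defaults.
print(product.model_fields_set)
```

Checking `model_fields_set` is one way to tell "the model said nothing" apart from "the model explicitly returned the default value".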

Enums for Constraints¶

[code] from enum import Enum

class Sentiment(str, Enum):  
    POSITIVE = "positive"  
    NEGATIVE = "negative"  
    NEUTRAL = "neutral"

class Review(BaseModel):  
    text: str  
    sentiment: Sentiment  # Only these 3 values allowed

review = client.messages.create(  
    model="claude-sonnet-4-5-20250929",  
    max_tokens=1024,  
    messages=[{  
        "role": "user",  
        "content": "This product is amazing!"  
    }],  
    response_model=Review  
)

print(review.sentiment)  # Sentiment.POSITIVE

[/code]
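The enum constraint can also be exercised offline. This sketch feeds two hand-written payloads (illustrative stand-ins for model output) to the same `Review` model: a valid string is coerced to the enum member, and a value outside the enum is rejected.

```python
from enum import Enum

from pydantic import BaseModel, ValidationError


class Sentiment(str, Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"


class Review(BaseModel):
    text: str
    sentiment: Sentiment


# A plain string that matches an enum value is converted to the member.
ok = Review.model_validate({"text": "Great!", "sentiment": "positive"})
print(ok.sentiment is Sentiment.POSITIVE)

# "mixed" is not one of the three allowed values, so validation fails.
try:
    Review.model_validate({"text": "Meh", "sentiment": "mixed"})
    accepted = True
except ValidationError:
    accepted = False
print(accepted)
```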

2\. Validation¶

Pydantic validates LLM outputs automatically. If validation fails, Instructor retries the request.

Built-in Validators¶

[code] from pydantic import Field, EmailStr, HttpUrl

class Contact(BaseModel):  
    name: str = Field(min_length=2, max_length=100)  
    age: int = Field(ge=0, le=120)  # 0 <= age <= 120  
    email: EmailStr  # Validates email format  
    website: HttpUrl  # Validates URL format

# If LLM provides invalid data, Instructor retries automatically

[/code]

Custom Validators¶

[code] from pydantic import field_validator

class Event(BaseModel):  
    name: str  
    date: str  
    attendees: int

    @field_validator('date')
    @classmethod
    def validate_date(cls, v):
        """Ensure date is in YYYY-MM-DD format."""
        import re
        # fullmatch rejects trailing characters that re.match would let through
        if not re.fullmatch(r'\d{4}-\d{2}-\d{2}', v):
            raise ValueError('Date must be YYYY-MM-DD format')
        return v

    @field_validator('attendees')
    @classmethod
    def validate_attendees(cls, v):
        """Ensure positive attendees."""
        if v < 1:
            raise ValueError('Must have at least 1 attendee')
        return v

[/code]

Model-Level Validation¶

[code] from pydantic import model_validator

class DateRange(BaseModel):  
    start_date: str  
    end_date: str

    @model_validator(mode='after')  
    def check_dates(self):  
        """Ensure end_date is not before start_date."""
        from datetime import datetime
        start = datetime.strptime(self.start_date, '%Y-%m-%d')
        end = datetime.strptime(self.end_date, '%Y-%m-%d')

        if end < start:
            raise ValueError('end_date must not be before start_date')
        return self

[/code]

3\. Automatic Retrying¶

Instructor automatically retries when validation fails, passing the validation error back to the LLM as feedback.

[code] # Retries up to 3 times if validation fails
user = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "Extract user from: John, age unknown"
    }],
    response_model=User,
    max_retries=3  # Default is 3
)

# If age can't be extracted, Instructor tells the LLM:  
# "Validation error: age - field required"  
# LLM tries again with better extraction

[/code]

How it works:
1. The LLM generates output
2. Pydantic validates it
3. If validation fails, the error message is sent back to the LLM
4. The LLM tries again using the error feedback
5. This repeats up to max_retries times
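The steps above can be sketched as a plain loop. This is a simplified illustration, not Instructor's actual implementation: `extract_with_retries` and `fake_llm` are hypothetical names, and the "model" is a scripted stub that answers badly once and then correctly, so the example runs offline.

```python
from typing import Callable

from pydantic import BaseModel, ValidationError


class User(BaseModel):
    name: str
    age: int


def extract_with_retries(
    call_llm: Callable[[list[dict]], str],
    messages: list[dict],
    max_retries: int = 3,
) -> User:
    """Illustrative retry loop; the real library differs in its details."""
    history = list(messages)
    last_error = None
    for _ in range(max_retries):
        raw = call_llm(history)                   # 1. the LLM generates output
        try:
            return User.model_validate_json(raw)  # 2. Pydantic validates it
        except ValidationError as exc:
            last_error = exc
            # 3-4. feed the error back so the next attempt can correct itself
            history.append({"role": "assistant", "content": raw})
            history.append({"role": "user", "content": f"Validation error: {exc}"})
    # 5. give up once the attempt budget is spent
    raise RuntimeError(f"Gave up after {max_retries} attempts") from last_error


# Scripted stand-in for a model: first reply is invalid, second is valid.
replies = iter(['{"name": "John", "age": "unknown"}', '{"name": "John", "age": 30}'])
calls = []  # records the conversation length seen on each attempt


def fake_llm(history: list[dict]) -> str:
    calls.append(len(history))
    return next(replies)


user = extract_with_retries(fake_llm, [{"role": "user", "content": "Extract user"}])
print(user, calls)
```

The second attempt sees a longer conversation because the failed reply and its validation error were appended before retrying.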

4\. Streaming¶

Stream partial results for real-time processing.

Streaming Partial Objects¶

[code] from instructor import Partial

class Story(BaseModel):  
    title: str  
    content: str  
    tags: list[str]

# Stream partial updates as LLM generates  
for partial_story in client.messages.create_partial(  
    model="claude-sonnet-4-5-20250929",  
    max_tokens=1024,  
    messages=[{  
        "role": "user",  
        "content": "Write a short sci-fi story"  
    }],  
    response_model=Story  
):  
    print(f"Title: {partial_story.title}")  
    print(f"Content so far: {partial_story.content[:100]}...")  
    # Update UI in real-time

[/code]

Streaming Iterables¶

[code] class Task(BaseModel):
    title: str
    priority: str

# Stream list items as they're generated  
tasks = client.messages.create_iterable(  
    model="claude-sonnet-4-5-20250929",  
    max_tokens=1024,  
    messages=[{  
        "role": "user",  
        "content": "Generate 10 project tasks"  
    }],  
    response_model=Task  
)

for task in tasks:  
    print(f"- {task.title} ({task.priority})")  
    # Process each task as it arrives

[/code]

Provider Configuration¶

Anthropic Claude¶

[code] import instructor
from anthropic import Anthropic

client = instructor.from_anthropic(  
    Anthropic(api_key="your-api-key")  
)

# Use with Claude models  
response = client.messages.create(  
    model="claude-sonnet-4-5-20250929",  
    max_tokens=1024,  
    messages=[...],  
    response_model=YourModel  
)

[/code]

OpenAI¶

[code] from openai import OpenAI

client = instructor.from_openai(  
    OpenAI(api_key="your-api-key")  
)

response = client.chat.completions.create(  
    model="gpt-4o-mini",  
    response_model=YourModel,  
    messages=[...]  
)

[/code]

Local Models (Ollama)¶

[code] from openai import OpenAI

# Point to local Ollama server  
client = instructor.from_openai(  
    OpenAI(  
        base_url="http://localhost:11434/v1",  
        api_key="ollama"  # Required but ignored  
    ),  
    mode=instructor.Mode.JSON  
)

response = client.chat.completions.create(  
    model="llama3.1",  
    response_model=YourModel,  
    messages=[...]  
)

[/code]

Common Patterns¶

Pattern 1: Data Extraction from Text¶

[code] class CompanyInfo(BaseModel):
    name: str
    founded_year: int
    industry: str
    employees: int
    headquarters: str

text = """  
Tesla, Inc. was founded in 2003. It operates in the automotive and energy  
industry with approximately 140,000 employees. The company is headquartered  
in Austin, Texas.  
"""

company = client.messages.create(  
    model="claude-sonnet-4-5-20250929",  
    max_tokens=1024,  
    messages=[{  
        "role": "user",  
        "content": f"Extract company information from: {text}"  
    }],  
    response_model=CompanyInfo  
)

[/code]

Pattern 2: Classification¶

[code] class Category(str, Enum):
    TECHNOLOGY = "technology"
    FINANCE = "finance"
    HEALTHCARE = "healthcare"
    EDUCATION = "education"
    OTHER = "other"

class ArticleClassification(BaseModel):  
    category: Category  
    confidence: float = Field(ge=0.0, le=1.0)  
    keywords: list[str]

classification = client.messages.create(  
    model="claude-sonnet-4-5-20250929",  
    max_tokens=1024,  
    messages=[{  
        "role": "user",  
        "content": "Classify this article: [article text]"  
    }],  
    response_model=ArticleClassification  
)

[/code]

Pattern 3: Multi-Entity Extraction¶

[code] class Person(BaseModel):
    name: str
    role: str

class Organization(BaseModel):  
    name: str  
    industry: str

class Entities(BaseModel):  
    people: list[Person]  
    organizations: list[Organization]  
    locations: list[str]

text = "Tim Cook, CEO of Apple, announced at the event in Cupertino..."

entities = client.messages.create(  
    model="claude-sonnet-4-5-20250929",  
    max_tokens=1024,  
    messages=[{  
        "role": "user",  
        "content": f"Extract all entities from: {text}"  
    }],  
    response_model=Entities  
)

for person in entities.people:  
    print(f"{person.name} - {person.role}")

[/code]

Pattern 4: Structured Analysis¶

[code] class SentimentAnalysis(BaseModel):
    overall_sentiment: Sentiment
    positive_aspects: list[str]
    negative_aspects: list[str]
    suggestions: list[str]
    score: float = Field(ge=-1.0, le=1.0)

review = "The product works well but setup was confusing..."

analysis = client.messages.create(  
    model="claude-sonnet-4-5-20250929",  
    max_tokens=1024,  
    messages=[{  
        "role": "user",  
        "content": f"Analyze this review: {review}"  
    }],  
    response_model=SentimentAnalysis  
)

[/code]

Pattern 5: Batch Processing¶

[code] def extract_person(text: str) -> Person:
    return client.messages.create(
        model="claude-sonnet-4-5-20250929",
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": f"Extract person from: {text}"
        }],
        response_model=Person
    )

texts = [  
    "John Doe is a 30-year-old engineer",  
    "Jane Smith, 25, works in marketing",  
    "Bob Johnson, age 40, software developer"  
]

people = [extract_person(text) for text in texts]

[/code]
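A list comprehension stops at the first exception, so one bad input loses the whole batch. The sketch below shows one possible way to isolate failures per item. `extract_all` is a hypothetical helper, and `stub_extract` is a stand-in for a function like `extract_person` so that the example runs without an API call.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def extract_all(
    texts: list[str],
    extract: Callable[[str], T],
) -> tuple[list[T], list[tuple[str, Exception]]]:
    """Run `extract` on every text, collecting failures instead of aborting."""
    results, failures = [], []
    for text in texts:
        try:
            results.append(extract(text))
        except Exception as exc:  # broad on purpose: keep the batch going
            failures.append((text, exc))
    return results, failures


def stub_extract(text: str) -> dict:
    """Offline stand-in for an LLM-backed extractor (illustrative only)."""
    if not text.strip():
        raise ValueError("empty input")
    name = text.split(",")[0].split(" is ")[0]
    return {"name": name}


texts = [
    "John Doe is a 30-year-old engineer",
    "",
    "Jane Smith, 25, works in marketing",
]
results, failures = extract_all(texts, stub_extract)
print(results)
print([(text, type(exc).__name__) for text, exc in failures])
```

Returning the failed inputs alongside the results makes it straightforward to log them or retry them separately.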

Advanced Features¶

Union Types¶

[code] from typing import Union

class TextContent(BaseModel):  
    type: str = "text"  
    content: str

class ImageContent(BaseModel):  
    type: str = "image"  
    url: HttpUrl  
    caption: str

class Post(BaseModel):  
    title: str  
    content: Union[TextContent, ImageContent]  # Either type

# LLM chooses appropriate type based on content

[/code]
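With `type: str`, nothing stops a payload from claiming `"type": "image"` while matching the text shape. A stricter variant, sketched here as a suggestion rather than as the skill's own approach, uses `Literal` tags and Pydantic's discriminated unions so the `type` field alone decides which model is validated. The payload is hand-written illustrative data.

```python
from typing import Literal, Union

from pydantic import BaseModel, Field, HttpUrl


class TextContent(BaseModel):
    type: Literal["text"] = "text"
    content: str


class ImageContent(BaseModel):
    type: Literal["image"] = "image"
    url: HttpUrl
    caption: str


class Post(BaseModel):
    title: str
    # The "type" tag selects which member of the union is validated.
    content: Union[TextContent, ImageContent] = Field(discriminator="type")


post = Post.model_validate({
    "title": "Sunset",
    "content": {
        "type": "image",
        "url": "https://example.com/sunset.png",
        "caption": "Evening light",
    },
})
print(type(post.content).__name__)
```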

Dynamic Models¶

[code] from pydantic import create_model

# Create model at runtime  
DynamicUser = create_model(  
    'User',  
    name=(str, ...),  
    age=(int, Field(ge=0)),  
    email=(EmailStr, ...)  
)

user = client.messages.create(  
    model="claude-sonnet-4-5-20250929",  
    max_tokens=1024,  
    messages=[...],  
    response_model=DynamicUser  
)

[/code]

Custom Modes¶

[code] # For providers without native structured outputs
client = instructor.from_anthropic(
    Anthropic(),
    mode=instructor.Mode.ANTHROPIC_JSON  # JSON mode for Anthropic clients
)

# Available modes:  
# - Mode.ANTHROPIC_TOOLS (recommended for Claude)  
# - Mode.ANTHROPIC_JSON (JSON fallback for Claude)  
# - Mode.JSON (fallback for OpenAI-compatible clients)  
# - Mode.TOOLS (OpenAI tools)

[/code]

Context Management¶

[code] # Single-use client
with instructor.from_anthropic(Anthropic()) as client:
    result = client.messages.create(
        model="claude-sonnet-4-5-20250929",
        max_tokens=1024,
        messages=[...],
        response_model=YourModel
    )
# Client closed automatically

[/code]

Error Handling¶

Handling Validation Errors¶

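[code] from instructor.exceptions import InstructorRetryException
from pydantic import ValidationError

try:  
    user = client.messages.create(  
        model="claude-sonnet-4-5-20250929",  
        max_tokens=1024,  
        messages=[...],  
        response_model=User,  
        max_retries=3  
    )  
except (InstructorRetryException, ValidationError) as e:  
    # Once retries are exhausted, Instructor wraps the last validation
    # failure in InstructorRetryException
    print(f"Failed after retries: {e}")  
    # Handle gracefully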

except Exception as e:  
    print(f"API error: {e}")

[/code]

Custom Error Messages¶

[code] from pydantic import ConfigDict

class ValidatedUser(BaseModel):
    name: str = Field(description="Full name, 2-100 characters")
    age: int = Field(description="Age between 0 and 120", ge=0, le=120)
    email: EmailStr = Field(description="Valid email address")

    # Schema examples travel with the JSON schema; together with the
    # field descriptions they guide the LLM toward valid output
    model_config = ConfigDict(
        json_schema_extra={
            "examples": [
                {
                    "name": "John Doe",
                    "age": 30,
                    "email": "john@example.com"
                }
            ]
        }
    )

[/code]

Best Practices¶

1\. Clear Field Descriptions¶

[code] # ❌ Bad: Vague
class Product(BaseModel):
    name: str
    price: float

# ✅ Good: Descriptive  
class Product(BaseModel):  
    name: str = Field(description="Product name from the text")  
    price: float = Field(description="Price in USD, without currency symbol")

[/code]

2\. Use Appropriate Validation¶

[code] # ✅ Good: Constrain values
class Rating(BaseModel):
score: int = Field(ge=1, le=5, description="Rating from 1 to 5 stars")
review: str = Field(min_length=10, description="Review text, at least 10 chars")

[/code]

3\. Provide Examples in Prompts¶

[code] messages = [{
    "role": "user",
    "content": """Extract person info from: "John, 30, engineer"

Example format:  
{  
  "name": "John Doe",  
  "age": 30,  
  "occupation": "engineer"  
}"""  
}]

[/code]

4\. Use Enums for Fixed Categories¶

[code] # ✅ Good: Enum ensures valid values
class Status(str, Enum):
PENDING = "pending"
APPROVED = "approved"
REJECTED = "rejected"

class Application(BaseModel):  
    status: Status  # LLM must choose from enum

[/code]

5\. Handle Missing Data Gracefully¶

[code] class PartialData(BaseModel):
    required_field: str
    optional_field: Optional[str] = None
    default_field: str = "default_value"

# LLM only needs to provide required_field

[/code]

Comparison to Alternatives¶

Feature| Instructor| Manual JSON| LangChain| DSPy
|---|---|---|---|---
Type safety| ✅ Yes| ❌ No| ⚠️ Partial| ✅ Yes
Auto-validation| ✅ Yes| ❌ No| ❌ No| ⚠️ Limited
Auto-retry| ✅ Yes| ❌ No| ❌ No| ✅ Yes
Streaming| ✅ Yes| ❌ No| ✅ Yes| ❌ No
Multi-provider| ✅ Yes| ⚠️ Manual| ✅ Yes| ✅ Yes
Learning curve| Low| Low| Medium| High

When to choose Instructor:
* You need structured, validated outputs
* You want type safety and IDE support
* You require automatic retries
* You are building data extraction systems

When to choose alternatives:
* DSPy: you need prompt optimization
* LangChain: you are building complex chains
* Manual: simple one-off extractions
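For comparison, a hand-rolled one-off extraction can be as small as the standard-library sketch below. It is an illustration under stated assumptions: `raw_reply` is a hard-coded stand-in for model text, and `parse_person` is a hypothetical helper name.

```python
import json


def parse_person(raw_reply: str) -> dict:
    """Pull the outermost JSON object out of a reply and check it by hand."""
    start, end = raw_reply.find("{"), raw_reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in reply")
    data = json.loads(raw_reply[start : end + 1])
    # Each check below is something a Pydantic model would declare once.
    if not isinstance(data.get("name"), str):
        raise ValueError("name must be a string")
    age = data.get("age")
    if not isinstance(age, int) or isinstance(age, bool):
        raise ValueError("age must be an integer")
    return data


# Stand-in for model text that wraps the JSON in conversational filler.
raw_reply = 'Sure! Here is the data: {"name": "John Doe", "age": 30}'
person = parse_person(raw_reply)
print(person["name"], person["age"])
```

This works for a quick script, but the type checks, error messages, and retry logic all have to be written and maintained by hand as the schema grows.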

Resources¶

Documentation: https://python.useinstructor.com
GitHub: https://github.com/jxnl/instructor (15k+ stars)
Cookbook: https://python.useinstructor.com/examples
Discord: community support available
