Pydantic

Data validation library built on Python type hints. Standard in FastAPI and LangChain. v2 (current) has a Rust core (pydantic-core) and is roughly 5-50x faster than v1.

pip install pydantic   # v2 by default

BaseModel — Core

from pydantic import BaseModel, Field, EmailStr
from typing import Optional
from datetime import datetime

class User(BaseModel):
    id: int
    name: str
    email: EmailStr                     # validates email format; needs pip install "pydantic[email]"
    age: int = Field(gt=0, lt=150)      # validation constraints
    bio: Optional[str] = None           # optional, default None
    created_at: datetime = Field(default_factory=datetime.now)
    tags: list[str] = []

# Instantiation — validates immediately, raises ValidationError on failure
user = User(id=1, name="Alice", email="alice@example.com", age=30)

# Serialization
user.model_dump()                    # dict
user.model_dump_json()               # JSON string
user.model_dump(exclude={"bio"})     # exclude fields
user.model_dump(include={"id", "name"})

# Deserialization
data = {"id": 1, "name": "Alice", "email": "alice@example.com", "age": 30}
user = User.model_validate(data)     # from dict
user = User.model_validate_json('{"id": 1, ...}')  # from JSON string
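
A small round-trip sketch for the serialization and deserialization calls above. The Event model and its fields are hypothetical and exist only for illustration; mode="json" and exclude_none are standard model_dump options.

```python
from datetime import datetime
from typing import Optional

from pydantic import BaseModel


class Event(BaseModel):  # hypothetical model, for illustration only
    id: int
    name: str
    note: Optional[str] = None
    starts_at: datetime


# ISO 8601 strings are parsed into datetime objects during validation.
event = Event(id=1, name="launch", starts_at="2026-04-16T09:30:00")

# Default (python mode) keeps rich types such as datetime.
as_python = event.model_dump()

# mode="json" converts values to JSON-compatible types (datetime -> str);
# exclude_none drops fields whose value is None.
as_json_safe = event.model_dump(mode="json", exclude_none=True)

# A JSON round trip yields an equal model.
restored = Event.model_validate_json(event.model_dump_json())
```

Use mode="json" when the dict is handed to something that cannot serialize datetime itself (for example json.dumps).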

Field — Validation & Metadata

from pydantic import BaseModel, Field

class Product(BaseModel):
    name: str = Field(min_length=1, max_length=100)
    price: float = Field(gt=0, description="Price in USD")
    quantity: int = Field(ge=0, le=10000, default=0)
    sku: str = Field(pattern=r'^[A-Z]{3}-\d{4}$')   # regex
    tags: list[str] = Field(default_factory=list, max_length=10)

Common constraints: gt, ge, lt, le, min_length, max_length, pattern. Field also sets defaults (default, default_factory) and metadata such as description and alias.
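
To see the constraints fire, here is a minimal sketch that reuses the Product fields from above (minus tags) and feeds it input that breaks every rule. Pydantic collects all failures into a single ValidationError instead of stopping at the first one.

```python
from pydantic import BaseModel, Field, ValidationError


class Product(BaseModel):
    name: str = Field(min_length=1, max_length=100)
    price: float = Field(gt=0, description="Price in USD")
    quantity: int = Field(ge=0, le=10000, default=0)
    sku: str = Field(pattern=r"^[A-Z]{3}-\d{4}$")


# Valid input: quantity falls back to its default.
ok = Product(name="Widget", price=9.99, sku="ABC-1234")

try:
    # Every field violates its constraint.
    Product(name="", price=-1, quantity=20000, sku="abc-12")
    failed_fields = []
except ValidationError as exc:
    # Each error's "loc" is a tuple path; its first element is the field name.
    failed_fields = sorted(err["loc"][0] for err in exc.errors())
```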


Validators

from pydantic import BaseModel, field_validator, model_validator

class Order(BaseModel):
    items: list[str]
    total: float
    discount: float = 0.0

    @field_validator("discount")
    @classmethod
    def discount_valid(cls, v):
        if v < 0 or v > 1:
            raise ValueError("discount must be between 0 and 1")
        return v

    @model_validator(mode="after")
    def total_positive_after_discount(self):
        if self.total * (1 - self.discount) < 0:
            raise ValueError("discounted total cannot be negative")
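        return self

  • @field_validator: validates a single field. In the default mode="after" it runs after Pydantic's own type validation, so v already has the annotated type; with mode="before" it receives the raw input.
  • @model_validator(mode="after"): runs after all fields are validated and set; has access to self and must return it.
  • @model_validator(mode="before"): a classmethod that runs on the raw input (usually a dict) before any field validation.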
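
The difference between the two field-validator modes can be sketched as follows. The Article model and both validator names are hypothetical; the point is only which value each validator receives.

```python
from pydantic import BaseModel, field_validator


class Article(BaseModel):  # hypothetical model, for illustration only
    title: str
    tags: list[str] = []

    @field_validator("tags", mode="before")
    @classmethod
    def split_csv(cls, v):
        # Runs on the raw input, before list[str] validation,
        # so a comma-separated string can be turned into a list first.
        if isinstance(v, str):
            return [part.strip() for part in v.split(",") if part.strip()]
        return v

    @field_validator("title")  # default mode="after": v is already a str
    @classmethod
    def title_not_blank(cls, v: str) -> str:
        if not v.strip():
            raise ValueError("title must not be blank")
        return v.strip()


article = Article(title="  Pydantic notes ", tags="python, validation,")
```

Without the mode="before" validator, passing a plain string for tags would fail list validation.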

Nested Models

class Address(BaseModel):
    street: str
    city: str
    country: str = "India"

class Company(BaseModel):
    name: str
    address: Address
    employees: list[User] = []

company = Company(
    name="Acme",
    address={"street": "123 Main St", "city": "Pune"},  # dict → Address auto-coerced
)
print(company.address.city)  # "Pune"
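
Nesting also shows up in serialization and in error locations. A minimal sketch, using a trimmed Company without the employees field so it stays self-contained:

```python
from pydantic import BaseModel, ValidationError


class Address(BaseModel):
    street: str
    city: str
    country: str = "India"


class Company(BaseModel):
    name: str
    address: Address


company = Company(name="Acme", address={"street": "123 Main St", "city": "Pune"})

# Nested models are dumped as nested dicts.
dumped = company.model_dump()

try:
    Company(name="Acme", address={"street": "123 Main St"})  # city is missing
    error_loc = None
except ValidationError as exc:
    # "loc" is the path from the outer model into the nested one.
    error_loc = exc.errors()[0]["loc"]
```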

Config — Model Behavior

class Settings(BaseModel):
    model_config = {
        "str_strip_whitespace": True,       # strip leading/trailing whitespace
        "str_to_lower": True,               # convert strings to lowercase
        "extra": "forbid",                  # raise if extra fields passed
        "frozen": True,                     # make model immutable (hashable)
        "populate_by_name": True,           # allow field name OR alias
        "from_attributes": True,            # allow ORM model → Pydantic (SQLAlchemy)
    }
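
A quick sketch of what some of these options do at runtime. The Credentials model is hypothetical; ConfigDict is the typed equivalent of the plain dict used above.

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class Credentials(BaseModel):  # hypothetical model, for illustration only
    model_config = ConfigDict(
        str_strip_whitespace=True,
        str_to_lower=True,
        extra="forbid",
        frozen=True,
    )

    username: str


# Whitespace is stripped and the string is lowercased during validation.
creds = Credentials(username="  Alice ")

try:
    Credentials(username="bob", role="admin")  # "role" is not a declared field
    extra_rejected = False
except ValidationError:
    extra_rejected = True

try:
    creds.username = "carol"  # frozen models reject assignment
    mutation_rejected = False
except ValidationError:
    mutation_rejected = True

# frozen=True also generates __hash__, so instances can be dict keys or set members.
creds_hash = hash(creds)
```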

Aliases — API Field Names

class APIResponse(BaseModel):
    user_id: int = Field(alias="userId")          # JSON uses camelCase
    created_at: datetime = Field(alias="createdAt")

    model_config = {"populate_by_name": True}     # allow both "user_id" and "userId"

data = {"userId": 1, "createdAt": "2026-04-16T00:00:00"}
r = APIResponse.model_validate(data)
print(r.user_id)  # 1

# Serialize with alias
r.model_dump(by_alias=True)  # {"userId": 1, "createdAt": ...}
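
When every field follows the same naming rule, writing each alias by hand gets repetitive. One alternative, sketched here, is an alias generator: to_camel from pydantic.alias_generators derives the camelCase alias from each snake_case field name.

```python
from datetime import datetime

from pydantic import BaseModel, ConfigDict
from pydantic.alias_generators import to_camel


class APIResponse(BaseModel):
    # Aliases are generated per field: user_id -> userId, created_at -> createdAt.
    model_config = ConfigDict(alias_generator=to_camel, populate_by_name=True)

    user_id: int
    created_at: datetime


# Validate from camelCase input (by alias) ...
r = APIResponse.model_validate({"userId": 1, "createdAt": "2026-04-16T00:00:00"})

# ... or construct with the Python field names, allowed by populate_by_name.
same = APIResponse(user_id=1, created_at="2026-04-16T00:00:00")

# Serialize back to camelCase keys.
payload = r.model_dump(by_alias=True, mode="json")
```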

Discriminated Unions

from typing import Literal, Union, Annotated
from pydantic import BaseModel, Discriminator

class Cat(BaseModel):
    type: Literal["cat"]
    meows: bool

class Dog(BaseModel):
    type: Literal["dog"]
    barks: bool

class Pet(BaseModel):
    animal: Annotated[Union[Cat, Dog], Discriminator("type")]

pet = Pet.model_validate({"animal": {"type": "cat", "meows": True}})
isinstance(pet.animal, Cat)  # True
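
The union does not have to live inside a wrapper model. A sketch using TypeAdapter to validate the discriminated union directly; AnyPet and pet_adapter are hypothetical names, and Field(discriminator="type") is used here as an equivalent spelling of Discriminator("type").

```python
from typing import Annotated, Literal, Union

from pydantic import BaseModel, Field, TypeAdapter, ValidationError


class Cat(BaseModel):
    type: Literal["cat"]
    meows: bool


class Dog(BaseModel):
    type: Literal["dog"]
    barks: bool


# The "type" field decides which model is used; the others are never tried.
AnyPet = Annotated[Union[Cat, Dog], Field(discriminator="type")]
pet_adapter = TypeAdapter(AnyPet)

dog = pet_adapter.validate_python({"type": "dog", "barks": True})

try:
    pet_adapter.validate_python({"type": "fish", "swims": True})
    error_types = []
except ValidationError as exc:
    # An unknown tag yields a single error about the discriminator,
    # not one error per union member.
    error_types = [err["type"] for err in exc.errors()]
```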

Pydantic Settings — Environment Variables

from pydantic_settings import BaseSettings  # pip install pydantic-settings

class Settings(BaseSettings):
    database_url: str
    redis_url: str = "redis://localhost:6379"
    debug: bool = False
    api_key: str

    model_config = {"env_file": ".env", "env_prefix": "APP_"}

settings = Settings()  # reads from environment / .env file
# APP_DATABASE_URL=... in .env

Pydantic in LangChain

LangChain uses Pydantic for structured outputs:

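from typing import Literal

from langchain_core.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field

class Sentiment(BaseModel):
    label: Literal["positive", "negative", "neutral"]
    score: float = Field(ge=0.0, le=1.0)
    reasoning: str

parser = PydanticOutputParser(pydantic_object=Sentiment)
# `prompt` and `llm` are defined elsewhere; the prompt should include
# parser.get_format_instructions() so the model emits matching JSON.
chain = prompt | llm | parser
# Inside an async function:
result: Sentiment = await chain.ainvoke({"text": "This product is great!"})
print(result.label)   # e.g. "positive"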

ValidationError — Handling

from pydantic import ValidationError

try:
    user = User(id="not-an-int", name="", email="invalid")   # age omitted
except ValidationError as e:
    print(e.error_count())   # 3 errors
    for err in e.errors():
        print(err["loc"], err["msg"], err["type"])
    # ('id',) 'Input should be a valid integer, unable to parse string as an integer' 'int_parsing'
    # ('email',) 'value is not a valid email address: ...' 'value_error'
    # ('age',) 'Field required' 'missing'
    # name="" passes: User.name has no min_length constraint
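
For API responses it is often handy to flatten the error list into a field-to-message mapping. A minimal sketch, with a hypothetical Signup model and a hypothetical field_errors helper (neither is part of Pydantic):

```python
from pydantic import BaseModel, Field, ValidationError


class Signup(BaseModel):  # hypothetical model, for illustration only
    username: str = Field(min_length=3)
    age: int = Field(gt=0, lt=150)


def field_errors(exc: ValidationError) -> dict[str, str]:
    """Map dotted field paths to messages (hypothetical helper)."""
    return {
        ".".join(str(part) for part in err["loc"]): err["msg"]
        for err in exc.errors()
    }


try:
    Signup(username="al", age="abc")  # too short; not an integer
    errors = {}
except ValidationError as exc:
    errors = field_errors(exc)
```

If a field can fail more than one check, a dict[str, list[str]] would keep every message instead of the last one.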

Interview Talking Points

  • "Pydantic v2 is Rust-backed — validation is 5-50x faster than v1. Critical for high-throughput APIs."
  • "FastAPI uses Pydantic for both request validation AND response serialization — one model definition, two uses."
  • "I use model_config = {'extra': 'forbid'} in API request models — any unexpected field raises an error, catching API misuse early."

Related

  • [[Python/Libraries/FastAPI]] — primary use of Pydantic
  • [[Python/Libraries/Asyncio]] — async integration
  • [[Python/Language Core/Object Oriented Programming]] — Pydantic extends Pythonic OOP
  • [[AI & ML/Langchain]] — LangChain uses Pydantic for structured outputs