Pydantic
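Data validation and serialization library driven by Python type hints; the standard modeling layer in FastAPI and LangChain. v2 (current) runs its validation core (pydantic-core) in Rust and is roughly 5-50x faster than v1, depending on the workload.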
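```bash
pip install pydantic   # v2 by default
```
`EmailStr` (used below) also needs the optional email-validator package: `pip install "pydantic[email]"`.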
BaseModel — Core
```python
from pydantic import BaseModel, Field, EmailStr
from typing import Optional
from datetime import datetime

class User(BaseModel):
    id: int
    name: str
    email: EmailStr                  # validates email format (requires pydantic[email])
    age: int = Field(gt=0, lt=150)   # validation constraints: 0 < age < 150
    bio: Optional[str] = None        # optional, default None
    created_at: datetime = Field(default_factory=datetime.now)  # fresh timestamp per instance
    tags: list[str] = []

# Instantiation validates immediately and raises ValidationError on failure
user = User(id=1, name="Alice", email="alice@example.com", age=30)

# Serialization
user.model_dump()                        # dict
user.model_dump_json()                   # JSON string
user.model_dump(exclude={"bio"})         # exclude fields
user.model_dump(include={"id", "name"})  # keep only these fields

# Deserialization
data = {"id": 1, "name": "Alice", "email": "alice@example.com", "age": 30}
user = User.model_validate(data)                   # from dict
user = User.model_validate_json('{"id": 1, ...}')  # from JSON string (payload elided)
```
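In the default (lax) mode, validation also coerces compatible input, and `model_dump_json()` output round-trips through `model_validate_json()`. A minimal sketch with a hypothetical `Account` model (no `EmailStr`, so only core `pydantic` is needed):

```python
from datetime import datetime

from pydantic import BaseModel, Field


class Account(BaseModel):  # hypothetical model for illustration
    id: int
    name: str
    age: int = Field(gt=0, lt=150)
    created_at: datetime


# Lax mode: numeric strings are coerced to int, the ISO string to datetime
acct = Account(id="7", name="Alice", age="30", created_at="2026-04-16T09:30:00")
print(type(acct.id), acct.age, acct.created_at.year)

# Round trip: serialize to a JSON string, then parse and validate it again
payload = acct.model_dump_json()
restored = Account.model_validate_json(payload)
print(restored == acct)  # models with equal field values compare equal

# mode="json" returns a dict of JSON-safe types (datetime becomes an ISO string)
print(acct.model_dump(mode="json")["created_at"])
```

Strict mode (`strict=True` on a field or in the model config) turns this coercion off when inputs must already have the exact type.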
Field — Validation & Metadata
```python
from pydantic import BaseModel, Field

class Product(BaseModel):
    name: str = Field(min_length=1, max_length=100)
    price: float = Field(gt=0, description="Price in USD")
    quantity: int = Field(ge=0, le=10000, default=0)
    sku: str = Field(pattern=r'^[A-Z]{3}-\d{4}$')  # regex, e.g. "ABC-1234"
    tags: list[str] = Field(default_factory=list, max_length=10)  # at most 10 items
```
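Common constraints: gt, ge, lt, le (numbers); min_length, max_length (strings and collections); pattern (strings). default and default_factory are not constraints: they supply the value when the field is omitted.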
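Each violated constraint is reported with its own error `type` code, which is handy when deciding which rule fired. A sketch using a trimmed copy of the `Product` model above (the `quantity` field is dropped for brevity):

```python
from pydantic import BaseModel, Field, ValidationError


class Product(BaseModel):
    name: str = Field(min_length=1, max_length=100)
    price: float = Field(gt=0, description="Price in USD")
    sku: str = Field(pattern=r"^[A-Z]{3}-\d{4}$")
    tags: list[str] = Field(default_factory=list, max_length=10)


ok = Product(name="Widget", price=9.99, sku="ABC-1234")
print(ok.tags)  # default_factory produced a fresh empty list

try:
    Product(name="", price=0, sku="abc-12", tags=["x"] * 11)
except ValidationError as e:
    # All failures are collected: one entry per failing field
    codes = {err["loc"][0]: err["type"] for err in e.errors()}
print(codes)
```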
Validators
```python
from pydantic import BaseModel, field_validator, model_validator

class Order(BaseModel):
    items: list[str]
    total: float
    discount: float = 0.0

    @field_validator("discount")
    @classmethod
    def discount_valid(cls, v: float) -> float:
        # A ValueError raised here surfaces to the caller as a ValidationError
        if v < 0 or v > 1:
            raise ValueError("discount must be between 0 and 1")
        return v

    @model_validator(mode="after")
    def total_positive_after_discount(self) -> "Order":
        # After-validators see the fully built instance and must return it
        if self.total * (1 - self.discount) < 0:
            raise ValueError("discounted total cannot be negative")
        return self
```
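- `@field_validator`: validates a single field. In its default `mode="after"` it receives the value after Pydantic's own type validation; `mode="before"` receives the raw input.
- `@model_validator(mode="after")`: runs after all fields are validated, has access to `self`, and must return `self`.
- `@model_validator(mode="before")`: runs on the raw input (usually a dict) before any field validation; written as a classmethod.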
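The before-mode hooks are useful for normalizing loose input before type checks run. A sketch with a hypothetical `Cart` model: a `mode="before"` field validator splits a comma-separated string into a list, and a `mode="before"` model validator maps a legacy key (`discount_pct`, an assumed name for illustration) onto the real `discount` field.

```python
from typing import Any

from pydantic import BaseModel, field_validator, model_validator


class Cart(BaseModel):  # hypothetical model for illustration
    items: list[str]
    discount: float = 0.0

    @field_validator("items", mode="before")
    @classmethod
    def split_items(cls, v: Any) -> Any:
        # Runs on the raw value, before Pydantic checks it is a list[str]
        if isinstance(v, str):
            return [part.strip() for part in v.split(",") if part.strip()]
        return v

    @model_validator(mode="before")
    @classmethod
    def legacy_percent(cls, data: Any) -> Any:
        # Runs on the raw input (usually a dict) before any field validation
        if isinstance(data, dict) and "discount_pct" in data:
            data = dict(data)  # copy so the caller's dict is not mutated
            data["discount"] = data.pop("discount_pct") / 100
        return data


cart = Cart.model_validate({"items": "apple, pear,", "discount_pct": 15})
print(cart.items, cart.discount)
```

Already well-formed input passes through both hooks unchanged, so `Cart(items=["a"])` still works.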
Nested Models
```python
from pydantic import BaseModel

class Address(BaseModel):
    street: str
    city: str
    country: str = "India"

class Company(BaseModel):
    name: str
    address: Address
    employees: list[User] = []  # User from the BaseModel section above

company = Company(
    name="Acme",
    address={"street": "123 Main St", "city": "Pune"},  # dict is validated into an Address
)
print(company.address.city)  # "Pune"
```
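Nested models serialize recursively, and an error inside a nested model reports its full path in `loc`. A sketch reusing the `Address` shape above with a simplified `Company` (the `employees` field is dropped so the example does not depend on `User`):

```python
from pydantic import BaseModel, ValidationError


class Address(BaseModel):
    street: str
    city: str
    country: str = "India"


class Company(BaseModel):
    name: str
    address: Address


company = Company(name="Acme", address={"street": "123 Main St", "city": "Pune"})
# model_dump() turns nested models into nested dicts
print(company.model_dump())

try:
    Company(name="Acme", address={"street": "123 Main St"})  # city missing
except ValidationError as e:
    err = e.errors()[0]
print(err["loc"], err["type"])  # path points into the nested model
```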
Config — Model Behavior
```python
from pydantic import BaseModel

class Settings(BaseModel):
    model_config = {
        "str_strip_whitespace": True,  # strip leading/trailing whitespace
        "str_to_lower": True,          # convert strings to lowercase
        "extra": "forbid",             # raise if extra fields are passed
        "frozen": True,                # make instances immutable (and hashable)
        "populate_by_name": True,      # accept the field name OR its alias
        "from_attributes": True,       # allow model_validate(orm_obj), e.g. SQLAlchemy rows
    }
```
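A quick check of three of those options (`str_strip_whitespace`, `extra="forbid"`, `frozen`), sketched with a hypothetical `Tag` model and `ConfigDict`, Pydantic's typed equivalent of the plain dict above:

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class Tag(BaseModel):  # hypothetical model for illustration
    model_config = ConfigDict(str_strip_whitespace=True, extra="forbid", frozen=True)
    name: str


tag = Tag(name="  python  ")
print(repr(tag.name))  # surrounding whitespace stripped

# extra="forbid": unknown keys are rejected instead of silently ignored
try:
    Tag(name="x", colour="red")
except ValidationError as e:
    extra_error = e.errors()[0]["type"]

# frozen=True: attribute assignment raises, and instances become hashable
try:
    tag.name = "rust"
except ValidationError as e:
    frozen_error = e.errors()[0]["type"]

print(extra_error, frozen_error, hash(tag) == hash(Tag(name="python")))
```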
Aliases — API Field Names
```python
from datetime import datetime
from pydantic import BaseModel, Field

class APIResponse(BaseModel):
    user_id: int = Field(alias="userId")           # JSON uses camelCase
    created_at: datetime = Field(alias="createdAt")
    model_config = {"populate_by_name": True}      # accept both "user_id" and "userId"

data = {"userId": 1, "createdAt": "2026-04-16T00:00:00"}
r = APIResponse.model_validate(data)
print(r.user_id)  # 1

# Serialize with aliases
r.model_dump(by_alias=True)  # {"userId": 1, "createdAt": ...}
```
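With many fields, an alias generator avoids writing `alias=` on each one. A sketch using `to_camel` from `pydantic.alias_generators` (Pydantic v2); the `Profile` model is hypothetical:

```python
from pydantic import BaseModel, ConfigDict
from pydantic.alias_generators import to_camel


class Profile(BaseModel):  # hypothetical model for illustration
    model_config = ConfigDict(alias_generator=to_camel, populate_by_name=True)
    user_id: int
    display_name: str


p = Profile.model_validate({"userId": 1, "displayName": "Alice"})  # camelCase in
q = Profile(user_id=2, display_name="Bob")  # field names also accepted (populate_by_name)
print(p.model_dump(by_alias=True))  # camelCase out
print(q.model_dump())               # field names by default
```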
Discriminated Unions
```python
from typing import Literal, Union, Annotated
from pydantic import BaseModel, Discriminator

class Cat(BaseModel):
    type: Literal["cat"]
    meows: bool

class Dog(BaseModel):
    type: Literal["dog"]
    barks: bool

class Pet(BaseModel):
    # The "type" value selects the member model; Field(discriminator="type") is equivalent
    animal: Annotated[Union[Cat, Dog], Discriminator("type")]

pet = Pet.model_validate({"animal": {"type": "cat", "meows": True}})
isinstance(pet.animal, Cat)  # True
```
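Because Pydantic reads the discriminator first and validates against only the matching model, an unknown tag yields one targeted error rather than one error per union member. A sketch using the `Field(discriminator=...)` spelling and `TypeAdapter` to validate a bare list of pets, which is not itself a `BaseModel`:

```python
from typing import Annotated, Literal, Union

from pydantic import BaseModel, Field, TypeAdapter, ValidationError


class Cat(BaseModel):
    type: Literal["cat"]
    meows: bool


class Dog(BaseModel):
    type: Literal["dog"]
    barks: bool


PetUnion = Annotated[Union[Cat, Dog], Field(discriminator="type")]

# TypeAdapter validates arbitrary types, here a list of tagged-union members
pets = TypeAdapter(list[PetUnion]).validate_python(
    [{"type": "cat", "meows": True}, {"type": "dog", "barks": False}]
)
print([type(p).__name__ for p in pets])

try:
    TypeAdapter(PetUnion).validate_python({"type": "fish", "swims": True})
except ValidationError as e:
    errors = e.errors()
print(len(errors), errors[0]["type"])  # a single error about the unknown tag
```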
Pydantic Settings — Environment Variables
```python
from pydantic_settings import BaseSettings  # pip install pydantic-settings

class Settings(BaseSettings):
    database_url: str
    redis_url: str = "redis://localhost:6379"
    debug: bool = False
    api_key: str
    model_config = {"env_file": ".env", "env_prefix": "APP_"}

settings = Settings()  # reads from the environment and the .env file
# Expects APP_DATABASE_URL=... and APP_API_KEY=... (required, no defaults);
# real environment variables take precedence over values in .env
```
Pydantic in LangChain
LangChain uses Pydantic for structured outputs:
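```python
from typing import Literal
from pydantic import BaseModel, Field
from langchain_core.output_parsers import PydanticOutputParser

class Sentiment(BaseModel):
    label: Literal["positive", "negative", "neutral"]
    score: float = Field(ge=0.0, le=1.0)
    reasoning: str

parser = PydanticOutputParser(pydantic_object=Sentiment)
# `prompt` and `llm` are assumed to be defined elsewhere; the prompt should include
# parser.get_format_instructions() so the model knows the expected JSON shape
chain = prompt | llm | parser

# Inside an async function:
result: Sentiment = await chain.ainvoke({"text": "This product is great!"})
print(result.label)  # e.g. "positive"
```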
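Underneath, a parser like this relies on two Pydantic features: the model's JSON Schema (which `get_format_instructions()` builds its prompt text from) and JSON validation of the model's reply. A LangChain-free sketch of both steps, where `llm_reply` is a hard-coded stand-in for real model output:

```python
import json
from typing import Literal

from pydantic import BaseModel, Field, ValidationError


class Sentiment(BaseModel):
    label: Literal["positive", "negative", "neutral"]
    score: float = Field(ge=0.0, le=1.0)
    reasoning: str


# 1. JSON Schema describing the expected output; ge/le become minimum/maximum
schema = Sentiment.model_json_schema()
print(json.dumps(schema["properties"]["score"]))

# 2. Validate the raw text a model returned (hypothetical reply)
llm_reply = '{"label": "positive", "score": 0.92, "reasoning": "Praises the product."}'
result = Sentiment.model_validate_json(llm_reply)
print(result.label, result.score)

# A reply that breaks the schema fails loudly instead of flowing downstream
try:
    Sentiment.model_validate_json('{"label": "great", "score": 1.5, "reasoning": ""}')
except ValidationError as e:
    bad_fields = [err["loc"][0] for err in e.errors()]
print(bad_fields)
```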
ValidationError — Handling
```python
from pydantic import ValidationError

try:
    user = User(id="not-an-int", name="", email="invalid")  # User from the BaseModel section
except ValidationError as e:
    print(e.error_count())  # 3 errors
    for err in e.errors():
        print(err["loc"], err["msg"], err["type"])
    # ('id',) Input should be a valid integer, unable to parse string as an integer int_parsing
    # ('email',) value is not a valid email address: ... value_error
    # ('age',) Field required missing
    # name="" passes: `name: str` has no min_length constraint
```
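In API handlers, `e.errors()` is often flattened into a field-to-message map for the client. A sketch with a hypothetical `Signup` model and a hypothetical helper `errors_by_field` (not part of Pydantic); `include_url=False` drops the documentation links Pydantic attaches to each error:

```python
from pydantic import BaseModel, Field, ValidationError


class Signup(BaseModel):  # hypothetical model for illustration
    username: str = Field(min_length=3)
    age: int = Field(ge=13)


def errors_by_field(exc: ValidationError) -> dict[str, str]:
    """Hypothetical helper: map dotted field paths to their first error message."""
    out: dict[str, str] = {}
    for err in exc.errors(include_url=False):
        path = ".".join(str(part) for part in err["loc"])
        out.setdefault(path, err["msg"])
    return out


try:
    Signup(username="al", age="twelve")
except ValidationError as e:
    print(e.error_count())
    report = errors_by_field(e)
print(sorted(report))
```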
Interview Talking Points
- "Pydantic v2 is Rust-backed — validation is 5-50x faster than v1. Critical for high-throughput APIs."
- "FastAPI uses Pydantic for both request validation AND response serialization — one model definition, two uses."
- "I use `model_config = {'extra': 'forbid'}` in API request models, so any unexpected field raises an error and API misuse is caught early."
Related
- [[Python/Libraries/FastAPI]] — primary use of Pydantic
- [[Python/Libraries/Asyncio]] — async integration
- [[Python/Language Core/Object Oriented Programming]] — Pydantic extends Pythonic OOP
- [[AI & ML/Langchain]] — LangChain uses Pydantic for structured outputs