Daffy¶
Validate pandas and Polars DataFrames with Python decorators.
Daffy catches missing columns, wrong data types, and invalid values at runtime — before they cause errors downstream in your data pipeline. Just add decorators to your functions.
Also supports Modin and PyArrow DataFrames.
Lightweight
Column & dtype validation with minimal overhead
Value Constraints
Nullability, uniqueness, range checks
Row Validation
Deep validation with Pydantic models
Multi-Backend
Works with pandas, Polars, Modin, PyArrow
Quick Example¶
from daffy import df_in, df_out
@df_in(columns=["price", "bedrooms", "location"])
@df_out(columns=["price_per_room", "price_category"])
def analyze_housing(houses_df):
# Transform raw housing data into price analysis
return analyzed_df
If a column is missing, has wrong dtype, or violates a constraint — Daffy fails fast with a clear error message at the function boundary.
Installation¶
Works with whatever DataFrame library you already have installed. Python 3.10–3.14.
Why Daffy?¶
Most DataFrame validation tools are schema-first (define schemas separately) or pipeline-wide (run suites over datasets). Daffy is decorator-first: validate inputs and outputs where transformations happen.
| ✓ | Non-intrusive — Just add decorators — no refactoring, no custom DataFrame types, no schema files |
| ✓ | Easy to adopt — Add in 30 seconds, remove just as fast if needed |
| ✓ | In-process — No external stores, orchestrators, or infrastructure |
| ✓ | Pay for what you use — Column validation is essentially free; opt into row validation when needed |
Next Steps¶
Getting Started
Quick introduction to Daffy's core features
Usage Guide
Core validation features for everyday use
Recipes & Patterns
Real-world examples and best practices
API Reference
Decorator signatures and parameters