API Reference¶

@df_in¶

Validates DataFrame parameters passed to a function.

@df_in(
    name: str | None = None,
    columns: list[str] | dict[str, str | dict] | None = None,
    strict: bool | None = None,
    lazy: bool | None = None,
    composite_unique: list[list[str]] | None = None,
    row_validator: type[BaseModel] | None = None,
    min_rows: int | None = None,
    max_rows: int | None = None,
    exact_rows: int | None = None,
    allow_empty: bool | None = None
)

Parameters:

Parameter	Type	Description
`name`	`str` or `None`	Name of the parameter to validate. If not specified, validates the first DataFrame parameter.
`columns`	`list` or `dict` or `None`	Column specification. See Column Specifications.
`strict`	`bool` or `None`	If `True`, raises error for unexpected columns. Defaults to project config or `False`.
`lazy`	`bool` or `None`	If `True`, collects all validation errors before raising. Defaults to project config or `False`.
`composite_unique`	`list[list[str]]` or `None`	Column combinations that must be unique (for example `[['first_name', 'last_name']]`).
`row_validator`	`type[BaseModel]` or `None`	Pydantic model for row-level validation.
`min_rows`	`int` or `None`	Minimum number of rows required.
`max_rows`	`int` or `None`	Maximum number of rows allowed.
`exact_rows`	`int` or `None`	Exact number of rows required.
`allow_empty`	`bool` or `None`	Whether empty DataFrames are allowed. Defaults to project config or `True`.

Examples:

# Column list
@df_in(["a", "b", "c"])

# With dtypes
@df_in({"a": "int64", "b": "object"})

# With constraints
@df_in({"price": {"dtype": "float64", "nullable": False, "checks": {"gt": 0}}})

# Multiple DataFrames — use name to target specific parameters
@df_in(name="orders", columns=["id", "total"])
@df_in(name="customers", columns=["id", "name"])

# Row validation
@df_in(row_validator=OrderModel)

# Shape/lazy controls
@df_in({"email": {"nullable": False}}, lazy=True, min_rows=1, allow_empty=False)

@df_out¶

Validates the DataFrame returned by a function.

@df_out(
    columns: list[str] | dict[str, str | dict] | None = None,
    strict: bool | None = None,
    lazy: bool | None = None,
    composite_unique: list[list[str]] | None = None,
    row_validator: type[BaseModel] | None = None,
    min_rows: int | None = None,
    max_rows: int | None = None,
    exact_rows: int | None = None,
    allow_empty: bool | None = None
)

Parameters:

Parameter	Type	Description
`columns`	`list` or `dict` or `None`	Column specification. See Column Specifications.
`strict`	`bool` or `None`	If `True`, raises error for unexpected columns. Defaults to project config or `False`.
`lazy`	`bool` or `None`	If `True`, collects all validation errors before raising. Defaults to project config or `False`.
`composite_unique`	`list[list[str]]` or `None`	Column combinations that must be unique (for example `[['country', 'city']]`).
`row_validator`	`type[BaseModel]` or `None`	Pydantic model for row-level validation.
`min_rows`	`int` or `None`	Minimum number of rows required.
`max_rows`	`int` or `None`	Maximum number of rows allowed.
`exact_rows`	`int` or `None`	Exact number of rows required.
`allow_empty`	`bool` or `None`	Whether empty DataFrames are allowed. Defaults to project config or `True`.

Examples:

@df_out(["result", "score"])

@df_out({"score": "float64"}, strict=True)

@df_out(row_validator=ResultModel)

@df_out(min_rows=10, max_rows=100, lazy=True)

@df_log¶

Logs DataFrame structure when entering and exiting a function.

@df_log(
    level: int = logging.DEBUG,
    include_dtypes: bool = False
)

Parameters:

Parameter	Type	Description
`level`	`int`	Logging level for emitted messages. Default: `logging.DEBUG`.
`include_dtypes`	`bool`	If `True`, includes column dtypes in log output. Default: `False`.

Examples:

@df_log(level=logging.INFO)
def process(df):
    return df
# Logs: Function process parameters contained a DataFrame: columns: ['a', 'b']
# Logs: Function process returned a DataFrame: columns: ['a', 'b']

@df_log(include_dtypes=True)
def process(df):
    return df
# Logs: ... columns: ['a', 'b'] with dtypes ['int64', 'object']

Column Specifications¶

Columns can be specified in several formats:

List Format¶

Simple list of required column names:

columns=["a", "b", "c"]

Dict with Dtypes¶

Map column names to expected dtypes:

columns={"a": "int64", "b": "object", "c": "float64"}

Rich Column Spec¶

Full control over column validation:

columns={
    "column_name": {
        "dtype": "float64",      # Expected dtype (optional)
        "nullable": False,       # Allow null values? Default: True
        "unique": True,          # Require unique values? Default: False
        "required": True,        # Is column required? Default: True
        "checks": {              # Value checks (optional)
            "gt": 0,
            "lt": 100
        }
    }
}

Regex Patterns¶

Match multiple columns with regex:

columns=["id", "r/feature_\\d+/"]  # Matches feature_1, feature_2, etc.

columns={"r/score_\\d+/": {"dtype": "float64", "checks": {"between": (0, 100)}}}

Value Checks¶

Available built-in checks for the checks parameter:

Check	Argument	Description
`gt`	`number`	Greater than
`ge`	`number`	Greater than or equal
`lt`	`number`	Less than
`le`	`number`	Less than or equal
`eq`	`value`	Equal to
`ne`	`value`	Not equal to
`between`	`(lo, hi)`	Value in range (inclusive)
`isin`	`list`	Value in set
`notin`	`list`	Value not in set
`notnull`	`True`	No null values
`str_regex`	`pattern`	String matches regex
`str_startswith`	`str`	String starts with prefix
`str_endswith`	`str`	String ends with suffix
`str_contains`	`str`	String contains substring
`str_length`	`(lo, hi)`	String length in range

Custom checks are also supported by passing a callable as the check value.

Examples:

columns={
    "price": {"checks": {"gt": 0, "lt": 10000}},
    "score": {"checks": {"between": (0, 100)}},
    "status": {"checks": {"isin": ["active", "pending"]}},
    "email": {"checks": {"str_regex": r"^[^@]+@[^@]+\.[^@]+$"}}
}

Configuration¶

Configure Daffy in pyproject.toml:

[tool.daffy]
strict = false                 # Default strict mode (default: false)
row_validation_max_errors = 5  # Max errors shown in row validation (default: 5)
checks_max_samples = 5         # Max sample values in check errors (default: 5)

Configuration is read from the pyproject.toml in the current working directory or any parent directory.