Changelog¶

All notable changes to this project will be documented in this file.

2.8.0¶

Changed¶

Significant runtime performance improvements to minimize decorator overhead
Eliminated repetitive Python introspection (inspect.signature) from the execution path by resolving parameters during decoration
Vectorized validation checks (nullable and unique) across multiple columns simultaneously using Narwhals expression API (.select())
Avoided redundant dataframe wrappers when running value checks
Optimized the validation builder for faster pipeline setup and eliminated unused dtype resolution overhead
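
The idea behind resolving parameters during decoration can be pictured with a minimal sketch. This is not Daffy's implementation: `check_param`, `row_count` and the `signature_calls` counter are hypothetical names, used only to show `inspect.signature` running once per decorated function instead of once per call.

```python
import functools
import inspect

signature_calls = 0  # hypothetical counter, only here to make the effect visible


def check_param(name):
    """Hypothetical decorator: locate parameter `name` once, at decoration time."""

    def decorator(func):
        global signature_calls
        signature_calls += 1
        # Introspection happens here, once per decorated function.
        param_names = list(inspect.signature(func).parameters)
        index = param_names.index(name)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # The call path only does a dict lookup or a tuple index.
            value = kwargs[name] if name in kwargs else args[index]
            if value is None:
                raise ValueError(f"parameter {name!r} must not be None")
            return func(*args, **kwargs)

        return wrapper

    return decorator


@check_param("df")
def row_count(prefix, df):
    return f"{prefix}: {len(df)}"


# Three calls, but the signature was inspected only once.
for _ in range(3):
    row_count("rows", [1, 2, 3])
```

The trade-off is a slightly more expensive decoration step in exchange for a cheaper wrapper, which matters when a decorated function is called many times.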

2.7.0¶

Added¶

@df_in(["col1", "col2"]) shorthand: pass columns as the first positional argument
Opt-in strict_specs mode ([tool.daffy] strict_specs = true): raises TypeError on invalid column spec keys or types instead of silently ignoring them
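
As a rough illustration of what a strict spec check might look like, the sketch below rejects unknown keys in a column spec. The helper `validate_spec` is hypothetical, and the set of known keys is only an assumption assembled from keys mentioned elsewhere in this changelog; Daffy's real list and error text may differ.

```python
# Keys mentioned elsewhere in this changelog; the real list may differ.
KNOWN_SPEC_KEYS = {"dtype", "nullable", "unique", "required", "checks"}


def validate_spec(column, spec, strict_specs=False):
    """Hypothetical helper: return unknown keys, or raise TypeError when strict."""
    unknown = sorted(set(spec) - KNOWN_SPEC_KEYS)
    if unknown and strict_specs:
        raise TypeError(f"Invalid spec keys for column {column!r}: {unknown}")
    return unknown


# A typo such as "nulable" is silently tolerated by default in this sketch...
ignored = validate_spec("price", {"dtype": "float64", "nulable": False})

# ...and turned into an error when strict mode is on.
try:
    validate_spec("price", {"dtype": "float64", "nulable": False}, strict_specs=True)
except TypeError as exc:
    strict_error = str(exc)
```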

Changed¶

Config discovery now searches pyproject.toml from the current directory up through parent directories
Configuration caching is now keyed by current working directory for deterministic behavior across cwd changes
Unnamed df_in now selects the first DataFrame-like argument instead of always taking the first positional argument
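
The upward search for pyproject.toml can be sketched with pathlib alone. The function name `find_pyproject` is hypothetical and this is only one plausible way to walk parent directories, not necessarily how Daffy does it.

```python
import tempfile
from pathlib import Path


def find_pyproject(start):
    """Return the nearest pyproject.toml at or above `start`, or None."""
    start = Path(start).resolve()
    for directory in (start, *start.parents):
        candidate = directory / "pyproject.toml"
        if candidate.is_file():
            return candidate
    return None


# Demonstration in a throwaway directory tree: the file lives two levels up.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    (root / "pyproject.toml").write_text("[tool.daffy]\nstrict = true\n", encoding="utf-8")
    nested = root / "src" / "pkg"
    nested.mkdir(parents=True)
    found_in_root = find_pyproject(nested) == root / "pyproject.toml"
```

Because the result depends on the starting directory, caching it per working directory (as the second item describes) keeps lookups deterministic when a process changes its cwd.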

2.6.1¶

Changed¶

Optional dependency detection now recognizes all advertised backends (Pandas, Polars, Modin, PyArrow)
Early import failure now triggers only when none of the supported DataFrame libraries are installed
Updated optional dependency tests and isolation docs/scripts to align with the expanded backend detection contract
Synced API docs and README examples with current df_in/df_out/df_log signatures and built-in check names

Added¶

Declared pydantic optional dependency extra so pip install 'daffy[pydantic]' matches runtime install guidance
Added docs contract tests to detect signature/example drift in README.md and docs/api.md

2.6.0¶

Added¶

Support for pandas 3.x (requires Python 3.11+)

2.5.1¶

Internal Improvements¶

Simplified config loading by using loops over key lists instead of repetitive if-statements
Added _get_int_config helper to deduplicate integer config getters
Extracted _log_dataframe helper from duplicate logging functions
Replaced repeated expand+add blocks in pipeline builder with data-driven loop

2.5.0¶

Internal Improvements¶

Major architecture refactoring: Introduced validation pipeline with protocol-based validators
New ValidationPipeline orchestrates all validation steps
New ValidationContext provides single narwhals conversion point for efficiency
Validators follow a clean protocol (Validator, SkippableValidator)
Pipeline builder assembles validators from column specs
Removed dead code and unused helper functions
Removed unused type aliases (ColumnsList, ColumnsDict)
Replaced skylos with vulture for dead code detection in CI
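
The pipeline architecture described in this release can be pictured roughly as follows. The method name `validate`, its signature, and the classes `RequiredColumns`, `NoExtraColumns`, `SketchPipeline` and `build_pipeline` are assumptions made for illustration; only the general shape (validators behind a protocol, assembled by a builder, run by a pipeline) comes from the notes above.

```python
from typing import Protocol


class Validator(Protocol):
    """Assumed protocol shape: a validator returns a list of error strings."""

    def validate(self, columns: list) -> list: ...


class RequiredColumns:
    def __init__(self, required):
        self.required = required

    def validate(self, columns):
        missing = [name for name in self.required if name not in columns]
        return [f"Missing columns: {missing}"] if missing else []


class NoExtraColumns:
    def __init__(self, allowed):
        self.allowed = allowed

    def validate(self, columns):
        extra = [name for name in columns if name not in self.allowed]
        return [f"Unexpected columns: {extra}"] if extra else []


class SketchPipeline:
    """Runs every validator in order and gathers their errors."""

    def __init__(self, validators):
        self.validators = validators

    def run(self, columns):
        errors = []
        for validator in self.validators:
            errors.extend(validator.validate(columns))
        return errors


def build_pipeline(required, strict=False):
    """Hypothetical builder: assemble validators from a simple spec."""
    validators = [RequiredColumns(required)]
    if strict:
        validators.append(NoExtraColumns(required))
    return SketchPipeline(validators)
```

Structural typing means a new validator only needs a matching `validate` method; it does not have to inherit from anything.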

2.4.1¶

Internal Improvements¶

Enabled ALL Ruff linting rules with selective ignores for comprehensive code quality checks
Refactored validate_dataframe to reduce cyclomatic complexity
Migrated from os.path to pathlib in configuration module
Performance optimization: extracted try/except from validation loop (PERF203)

2.4.0¶

New Features¶

Shape validation: enforce row count constraints on DataFrames
min_rows: require a minimum number of rows
max_rows: limit the maximum number of rows
exact_rows: require an exact number of rows
allow_empty: control whether empty DataFrames (0 rows) are allowed
New allow_empty configuration option in pyproject.toml (default: true)
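
The four row-count constraints combine naturally into a single check. The sketch below is a plain-Python illustration of the semantics described above; `shape_errors` and its messages are hypothetical and do not reproduce Daffy's wording.

```python
def shape_errors(n_rows, min_rows=None, max_rows=None, exact_rows=None, allow_empty=True):
    """Hypothetical helper: list every row-count constraint that `n_rows` violates."""
    errors = []
    if n_rows == 0 and not allow_empty:
        errors.append("DataFrame is empty")
    if min_rows is not None and n_rows < min_rows:
        errors.append(f"expected at least {min_rows} rows, got {n_rows}")
    if max_rows is not None and n_rows > max_rows:
        errors.append(f"expected at most {max_rows} rows, got {n_rows}")
    if exact_rows is not None and n_rows != exact_rows:
        errors.append(f"expected exactly {exact_rows} rows, got {n_rows}")
    return errors
```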

Internal Improvements¶

Added property-based tests with Hypothesis for mathematical invariants
Added boundary-value and default-parameter tests for improved test coverage

2.3.1¶

Bug Fixes¶

Fixed incomplete DataFrameType union that only included Pandas and Polars, missing PyArrow and Modin. Now uses narwhals' IntoDataFrame Protocol which correctly accepts all supported DataFrame types.

2.3.0¶

Breaking Changes¶

Dropped Python 3.9 support - Python 3.10 is now the minimum required version (Python 3.9 reached end-of-life in October 2025)

Internal Improvements¶

Modernized type annotations using Python 3.10 syntax (X | Y instead of Union[X, Y], explicit TypeAlias)
Updated ruff target version to Python 3.10

2.2.0¶

New Features¶

Custom check functions: use lambdas for validation logic not covered by built-in checks
The function receives a Narwhals Series and returns a boolean Series (True = valid)
Example: {"price": {"checks": {"no_outliers": lambda s: s < s.mean() * 10}}}
The dictionary key becomes the check name in error messages
Lazy validation: collect all errors before raising with lazy=True parameter
Use @df_in(columns=..., lazy=True) or @df_out(columns=..., lazy=True)
Configurable via [tool.daffy] lazy = true in pyproject.toml
Useful for debugging DataFrames with multiple issues
Composite uniqueness: validate that column combinations are unique
Use composite_unique=[["col1", "col2"]] to ensure column combinations are unique
Works alongside single-column unique=True validation
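
The difference between eager and lazy validation can be shown without any DataFrame library. In this sketch `run_checks` and `sample_checks` are hypothetical; the point is only that `lazy=True` gathers every failure before raising, while the default stops at the first one.

```python
def run_checks(checks, lazy=False):
    """Run (message, predicate) pairs; raise on the first failure or on all of them."""
    errors = []
    for message, passed in checks:
        if passed():
            continue
        if not lazy:
            raise AssertionError(message)
        errors.append(message)
    if errors:
        raise AssertionError("; ".join(errors))


sample_checks = [
    ("missing column 'a'", lambda: False),
    ("dtype of 'c' is int64", lambda: True),
    ("null values in 'b'", lambda: False),
]

try:
    run_checks(sample_checks)
except AssertionError as exc:
    eager_message = str(exc)  # only the first problem

try:
    run_checks(sample_checks, lazy=True)
except AssertionError as exc:
    lazy_message = str(exc)  # every problem, joined together
```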

Bug Fixes¶

Fixed value checks (gt, lt, ge, le, between, eq, ne, etc.) not working with PyArrow tables

Internal Improvements¶

Config validation: pyproject.toml settings now validated for correct types (prevents silent bugs like strict = "false" being treated as truthy)
Regex pattern caching: compiled patterns are now cached, improving performance when validating many DataFrames with the same column patterns
Empty regex patterns (r//) now raise a clear error instead of matching everything
Improved error handling for custom check functions
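
The `strict = "false"` pitfall mentioned in the first item is easy to reproduce: any non-empty string is truthy in Python. A type-checking getter avoids it. `get_bool_config` is a hypothetical name and the error text is illustrative only.

```python
def get_bool_config(config, key, default=False):
    """Hypothetical getter: reject non-boolean values instead of coercing them."""
    value = config.get(key, default)
    if not isinstance(value, bool):
        raise TypeError(f"{key} must be a boolean, got {type(value).__name__}")
    return value


# Naive coercion silently turns the string "false" into True.
naive = bool("false")

# The validating getter refuses the string outright.
try:
    get_bool_config({"strict": "false"}, "strict")
except TypeError as exc:
    config_error = str(exc)
```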

2.1.0¶

New Features¶

Added new value checks: notin, str_startswith, str_endswith, str_contains, str_length

2.0.2¶

Documentation¶

Released MkDocs documentation at daffy.readthedocs.io
Added missing attr_list, md_in_html, and pymdownx.emoji markdown extensions
Improved SEO: updated README title/intro, added PyPI keywords, added GitHub topics

2.0.1¶

Documentation¶

Simplified installation instructions - Daffy detects and works with whatever DataFrame library you have installed
Updated PyPI keywords to include modin, pyarrow, and narwhals
Added note that examples work with all supported backends (Pandas, Polars, Modin, PyArrow)

2.0.0¶

Major Refactoring¶

Migrated to Narwhals - Major internal refactoring to use Narwhals as a unified DataFrame abstraction layer
Narwhals is a lightweight compatibility layer used by Plotly, Altair, Bokeh, and Marimo
Now supports Modin and PyArrow in addition to Pandas and Polars
All existing functionality for Pandas and Polars remains unchanged
Public API (df_in, df_out, df_log) is fully backwards compatible

Changed¶

df_log with include_dtypes=True now shows unified Narwhals dtype representation (e.g., [String, Int64]) for both Pandas and Polars DataFrames
Previously Pandas showed ['object', 'int64'] while Polars showed [String, Int64]
This provides consistent logging output across all DataFrame libraries

Dependencies¶

Added narwhals>=2.14.0 as a required dependency

Removed¶

row_validation_convert_nans configuration option - NaN-to-None conversion is no longer needed. Pydantic handles NaN values correctly: accepts them for Optional[float] fields and correctly fails for constraints like Field(gt=0)

1.4.0¶

New Features¶

Vectorized value checks - Fast validation of column values using vectorized DataFrame operations
Comparison checks: gt, ge, lt, le (greater than, greater or equal, less than, less or equal)
Range check: between for inclusive range validation
Equality checks: eq, ne for exact value matching
Set membership: isin for validating values against a list
Null check: notnull for ensuring no null values
Regex matching: str_regex for pattern validation on string columns
Use rich column spec format: {"price": {"checks": {"gt": 0, "lt": 10000}}}
Combine multiple checks: {"age": {"checks": {"ge": 0, "le": 120}}}
Works with other validations: {"price": {"dtype": "float64", "nullable": False, "checks": {"gt": 0}}}
Supports regex column patterns: {"r/score_\\d+/": {"checks": {"between": (0, 100)}}}
Checks are skipped for missing optional columns
Error messages include sample failing values for debugging
Configurable via checks_max_samples in pyproject.toml
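
To make the check names concrete, here is a plain-Python illustration of their semantics, including the capped list of failing samples. It is deliberately not vectorized and is not Daffy's code: `CHECKS`, `failing_samples` and the `max_samples` parameter are hypothetical stand-ins.

```python
import operator

# A subset of the check names listed above, mapped to plain predicates.
CHECKS = {
    "gt": operator.gt,
    "ge": operator.ge,
    "lt": operator.lt,
    "le": operator.le,
    "eq": operator.eq,
    "ne": operator.ne,
    "between": lambda value, bounds: bounds[0] <= value <= bounds[1],  # inclusive
    "isin": lambda value, allowed: value in allowed,
}


def failing_samples(values, checks, max_samples=5):
    """Return {check name: first few failing values} for every check that fails."""
    failures = {}
    for name, argument in checks.items():
        bad = [value for value in values if not CHECKS[name](value, argument)]
        if bad:
            failures[name] = bad[:max_samples]
    return failures


prices = [5, -1, 20000, 0]
price_failures = failing_samples(prices, {"gt": 0, "lt": 10000})
```

A vectorized implementation would evaluate each check as one column-level expression rather than looping over values, which is where the speed comes from.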

1.3.0¶

New Features¶

Optional columns - Mark columns as optional with required=False
Use rich column spec format: {"column": {"required": False}}
Optional columns are not required to exist in the DataFrame
If present, all other validations (dtype, nullable, unique) still apply
Works with regex patterns: {"r/extra_\\d+/": {"required": False}}
Default behavior unchanged (required=True, columns must exist)

1.2.0¶

New Features¶

Uniqueness column validation - Validate that columns contain only unique values
Use rich column spec format: {"column": {"unique": True}}
Combines with dtype and nullable: {"column": {"dtype": "int64", "unique": True, "nullable": False}}
Works with regex patterns: {"r/ID_\\d+/": {"unique": True}}
Default behavior unchanged (unique=False, duplicates allowed)

1.1.0¶

New Features¶

Nullable column validation - Validate that columns contain no null values
Use rich column spec format: {"column": {"nullable": False}}
Combines with dtype validation: {"column": {"dtype": "float64", "nullable": False}}
Works with regex patterns: {"r/Price_\\d+/": {"nullable": False}}
Default behavior unchanged (nullable=True, nulls allowed)

1.0.0¶

Stable Release¶

Daffy 1.0.0 marks the first stable release. The public API (df_in, df_out, df_log) is now considered stable and follows semantic versioning.

Changed¶

Development status upgraded from Beta to Production/Stable
Updated documentation to reflect current tooling (uv, ruff, pyrefly)

Fixed¶

Improved error handling for invalid regex patterns in column specifications
Better error messages when parameter extraction fails

Internal¶

Extracted duplicate row validation logic into shared helper function
Added docstrings to public-facing utility functions

API Stability¶

As of 1.0.0, Daffy follows semantic versioning:

Major versions (2.0, 3.0) may contain breaking changes
Minor versions (1.1, 1.2) add features without breaking changes
Patch versions (1.0.1, 1.0.2) contain bug fixes only

0.19.0¶

Performance Improvements¶

Early termination for row validation - Dramatically faster when validation errors exist
Stops scanning after collecting max_errors (default behavior)
71-124x speedup when errors are present (1.2ms vs 140ms for 100k rows with errors)
No overhead for valid data (maintains 767K rows/sec throughput)
Can be disabled with early_termination=False parameter for exact error counts
Error messages now indicate when scanning stopped early: "stopped scanning early (at least N more row(s) with errors)"

New Features¶

Added early_termination parameter to validate_dataframe_rows()
Default: True (stops after max_errors for performance)
Set to False to scan entire DataFrame and get exact error count
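
The effect of early termination can be sketched as a loop that stops once enough errors have been seen. `validate_rows` here is a simplified, hypothetical stand-in for the real function: it works on any iterable and returns plain counters rather than formatted errors.

```python
def validate_rows(rows, is_valid, max_errors=5, early_termination=True):
    """Return (reported row indices, errors counted, rows scanned)."""
    reported = []
    error_count = 0
    scanned = 0
    for index, row in enumerate(rows):
        scanned += 1
        if is_valid(row):
            continue
        error_count += 1
        if len(reported) < max_errors:
            reported.append(index)
        if early_termination and error_count >= max_errors:
            break  # enough errors collected, skip the remaining rows
    return reported, error_count, scanned


def is_even(value):
    return value % 2 == 0


# Every odd number counts as an invalid row.
fast = validate_rows(range(100), is_even, max_errors=3)
exact = validate_rows(range(100), is_even, max_errors=3, early_termination=False)
```

With early termination the scan ends at the third bad row; without it, all rows are visited and the error count is exact, at the cost of a full pass.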

0.18.0¶

Major Performance Improvements¶

2x faster row validation - Optimized DataFrame conversion and validation pipeline
767K rows/sec on simple validation (was 400K)
165K rows/sec on complex bioinformatics data (32 columns, 5% missing values)
Changed from batch TypeAdapter validation to optimized row-by-row with fast DataFrame iteration
Use itertuples() for efficient row access while preserving None values

Critical Bug Fix¶

Fixed NaN handling for Optional fields - NaN values in Optional fields are now properly converted to None
Previous implementation failed validation on legitimate missing data
Now correctly handles nullable numeric fields in pandas DataFrames
Converts numeric columns with NaN to object dtype to preserve None values
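
The NaN-to-None conversion can be illustrated on a single row represented as a dict. `nan_to_none` is a hypothetical helper; the fix described above operated on pandas columns, which this sketch does not attempt to reproduce.

```python
import math


def nan_to_none(record):
    """Replace float NaN values with None so Optional fields see a missing value."""
    return {
        key: None if isinstance(value, float) and math.isnan(value) else value
        for key, value in record.items()
    }


row = {"age": float("nan"), "name": "a", "score": 1.5}
cleaned = nan_to_none(row)
```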

New Features¶

Bioinformatics benchmark (scripts/benchmark_bioinformatics.py) - Realistic validation testing
32-column feature store schema modeling cancer research data
Gene expression, clinical measurements, patient demographics, mutations, outcomes
Mixed types: floats, ints, strings, bools, Optional fields, Literal enums
Missing data patterns (~5%) typical in real-world datasets
Cross-field validation (e.g., disease-free survival ≤ follow-up time)
Performance benchmarking suite - Compare Daffy against competing libraries
Test multiple scenarios (simple, medium complexity)
Multiple dataset sizes (1k, 10k, 100k rows)
Compare pandas vs polars implementations

Internal Improvements¶

Removed unused internal functions (_pandas_to_records_fast, _iterate_dataframe_with_index)
Simplified error formatting (removed dead code branches)
Improved test coverage to 98.34%
Better handling of edge cases in validation error reporting

0.17.1¶

Improve package discoverability on PyPI and GitHub
Add polars, pydantic, decorator to PyPI keywords
Add Framework::Pydantic, Testing, and Typing classifiers
Add changelog and issues URLs to package metadata
Set GitHub topics for better search visibility

0.17.0¶

Add optional row-level validation using Pydantic models (requires Pydantic >= 2.4.0)
New row_validator parameter for @df_in and @df_out decorators
Validates actual data values, not just column structure
Batch validation for optimal performance (10-100x faster than row-by-row)
Informative error messages showing which rows failed and why
Configuration via pyproject.toml: row_validation_max_errors and row_validation_convert_nans
Works with both Pandas and Polars DataFrames

0.16.1¶

Internal refactoring: extracted DataFrame type handling to dedicated module for better code organization and maintainability

0.16.0¶

Removed Pandas and Polars from required dependencies. Daffy will not pull in Polars if your project just uses Pandas and vice versa. All combinations are dynamically supported and require no changes from existing users.

Testing & CI¶

Added comprehensive CI testing for all dependency combinations
New test suite validates optional dependency behavior
Manual testing script for developers (scripts/test_isolated_deps.py)
Updated CI to test pandas-only, polars-only, both, and none scenarios

0.15.0¶

Exception messages now include function names to improve debugging
Input validation: "Missing columns: ['Col'] in function 'my_func' parameter 'param'. Got columns: ['Other']"
Return value validation messages now clearly state "return value" instead of just showing function name
Output validation: "Missing columns: ['Col'] in function 'my_func' return value. Got columns: ['Other']"
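
The two message shapes quoted above differ only in how the target is described. A small sketch that reproduces them, with `missing_columns_message` as a hypothetical helper name:

```python
def missing_columns_message(missing, got, func_name, param_name=None):
    """Build the message for an input parameter, or for the return value if no name is given."""
    target = f"parameter '{param_name}'" if param_name else "return value"
    return f"Missing columns: {missing} in function '{func_name}' {target}. Got columns: {got}"


input_message = missing_columns_message(["Col"], ["Other"], "my_func", "param")
output_message = missing_columns_message(["Col"], ["Other"], "my_func")
```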

0.14.2¶

Internal code quality improvements

0.14.1¶

Internal code quality improvements

0.14.0¶

Improve df_in error messages to include parameter names

0.13.2¶

Updated URLs for PyPI site compatibility

0.13.1¶

Update documentation
Update dependencies

0.13.0¶

Fix type annotation issues with decorator parameters that could cause type errors in strict type checking
Use Sequence instead of List for better type variance compatibility
Add test case that validates type compatibility
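
The variance issue is a general property of Python's typing rules rather than anything specific to Daffy. In the sketch below, `count_columns` is a hypothetical function: because `Sequence` is covariant, a `List[str]` is accepted where `Sequence[Union[str, int]]` is expected, whereas a `List[Union[str, int]]` parameter would reject it under strict type checking.

```python
from typing import List, Sequence, Union


def count_columns(columns: Sequence[Union[str, int]]) -> int:
    """Accept any read-only sequence of column keys."""
    return len(columns)


names: List[str] = ["a", "b"]

# Fine for type checkers: Sequence[str] is a subtype of Sequence[Union[str, int]].
n_from_list = count_columns(names)

# Tuples work as well, which a List[...] annotation would not allow.
n_from_tuple = count_columns(("a", 1, "c"))

# Had the parameter been List[Union[str, int]], passing `names` would be flagged,
# because List is invariant: the callee could append an int to a List[str].
```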

0.12.0¶

Add support for regex patterns used with column dtype validation

0.11.0¶

Update function parameter types for better type safety
Fix missing return statement in df_log decorator
Added stricter mypy type checking settings

0.10.1¶

Built and published with uv. No functional changes.

0.10.0¶

Add support for regex patterns in column name validation
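
Later entries in this changelog write regex column keys as `r/.../`. A minimal sketch of how such a key could be expanded against a DataFrame's column names follows. `matching_columns` is hypothetical, and the use of a full match (rather than a prefix or substring match) is an assumption, not a statement about Daffy's behavior.

```python
import re


def matching_columns(spec_key, columns):
    """Expand an r/.../ key into the columns it matches; plain keys match by equality."""
    if spec_key.startswith("r/") and spec_key.endswith("/") and len(spec_key) > 3:
        pattern = re.compile(spec_key[2:-1])
        # Assumption: the whole column name must match the pattern.
        return [name for name in columns if pattern.fullmatch(name)]
    return [name for name in columns if name == spec_key]


columns = ["score_1", "score_22", "score_x", "name"]
score_columns = matching_columns("r/score_\\d+/", columns)
name_columns = matching_columns("name", columns)
```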

0.9.4¶

Fix strict flag loading when the tool config is missing

0.9.3¶

Add configuration system to set default strict mode in pyproject.toml
Improve logging when multiple columns are missing

0.9.2¶

Add explicit __all__ export for functions to make Mypy happy

0.9.0¶

Add marker (py.typed) to tell Mypy that the library has type annotations
Fix bug when using strict parameter and no name parameter in @df_in

0.8.0¶

Support Polars DataFrames

0.7.0¶

Support Pandas 2.x
Drop support for Python 3.7 and 3.8
Also build and test with Python 3.12

0.6.0¶

Make column checks for multiple function parameters also work with positional arguments (thanks @latvanii)

0.5.0¶

Added strict parameter for @df_in and @df_out

0.4.2¶

Added docstrings for the decorators
Fix import of @df_log

0.4.1¶

Add include_dtypes parameter for @df_log.
Fix handling of empty signature with @df_in.

0.4.0¶

Added @df_log for logging.
Improved assertion messages.

0.3.0¶

Added type hints.

0.2.1¶

Added PyPI classifiers.

0.2.0¶

Fixed decorator usage.
Added functools.wraps.
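
For readers unfamiliar with it, `functools.wraps` copies the wrapped function's metadata onto the wrapper, so decorated functions keep their name and docstring. The decorator `passthrough` and the function `load_data` below are hypothetical examples, not Daffy code.

```python
import functools


def passthrough(func):
    """A do-nothing decorator that preserves the metadata of `func`."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)

    return wrapper


@passthrough
def load_data():
    """Load the data."""
    return [1, 2]


# Without functools.wraps these would report "wrapper" and None.
decorated_name = load_data.__name__
decorated_doc = load_data.__doc__
```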

0.1.0¶

Initial release.