New Features
- Outcome weight impact diagnostics
- Added paired outcome-weight impact tests (y*w0 vs y*w1) with confidence intervals.
- Exposed in BalanceDFOutcomes, Sample.diagnostics(), and the CLI via --weights_impact_on_outcome_method.
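To illustrate what a paired comparison of y*w0 against y*w1 looks like, here is a minimal standard-library sketch. The function name paired_weight_impact is hypothetical, the interval uses a normal approximation rather than whatever the library actually computes, and the weights are assumed to already be on a comparable scale (for example, normalized to mean 1).

```python
import math
from statistics import NormalDist, mean, stdev


def paired_weight_impact(y, w0, w1, conf_level=0.95):
    """Paired comparison of y*w0 vs y*w1 (hypothetical helper, normal approximation)."""
    if not (len(y) == len(w0) == len(w1)) or len(y) < 2:
        raise ValueError("y, w0 and w1 must have the same length (at least 2)")
    # Row-wise difference between the two weighted outcome columns.
    diffs = [yi * b - yi * a for yi, a, b in zip(y, w0, w1)]
    mean_diff = mean(diffs)
    se = stdev(diffs) / math.sqrt(len(diffs))
    if se == 0:
        raise ValueError("paired differences have zero variance")
    z = mean_diff / se
    normal = NormalDist()
    p_value = 2 * (1 - normal.cdf(abs(z)))
    half_width = normal.inv_cdf(0.5 + conf_level / 2) * se
    return {
        "mean_diff": mean_diff,
        "z": z,
        "p_value": p_value,
        "ci": (mean_diff - half_width, mean_diff + half_width),
    }


result = paired_weight_impact(
    y=[1.0, 2.0, 3.0, 4.0],
    w0=[1.0, 1.0, 1.0, 1.0],
    w1=[2.0, 1.0, 1.0, 0.0],
)
```

A confidence interval that excludes zero suggests the two weight sets lead to different weighted outcomes; for small samples a t-based interval would be the more cautious choice.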
- Pandas 3 support
- Updated compatibility and tests for pandas 3.x
- Categorical distribution metrics without one-hot encoding
- KLD/EMD/CVMD/KS on BalanceDF.covars() now operate on raw categorical variables (with NA indicators) instead of one-hot encoded columns.
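As a rough illustration of computing a divergence on raw categories instead of one-hot columns, the sketch below derives weighted category shares and a KL divergence using only the standard library. The helper names, the treatment of missing values as their own level, and the eps floor for categories absent from the reference are all assumptions for this example, not the library's implementation.

```python
import math
from collections import defaultdict


def weighted_category_shares(values, weights, na_label="__NA__"):
    """Weighted share of each category; None is counted as its own level."""
    totals = defaultdict(float)
    for value, weight in zip(values, weights):
        key = na_label if value is None else value
        totals[key] += weight
    total = sum(totals.values())
    if total <= 0:
        raise ValueError("weights must include at least one positive value")
    return {key: t / total for key, t in totals.items()}


def categorical_kld(p, q, eps=1e-12):
    """KL divergence D(p || q) over the union of categories in p and q."""
    divergence = 0.0
    for key in set(p) | set(q):
        p_k = p.get(key, 0.0)
        if p_k > 0:
            # eps avoids division by zero when q lacks a category seen in p.
            divergence += p_k * math.log(p_k / max(q.get(key, 0.0), eps))
    return divergence


sample_shares = weighted_category_shares(["a", "a", "b", None], [1, 1, 1, 1])
target_shares = weighted_category_shares(["a", "b", "b", None], [1, 1, 1, 1])
kld = categorical_kld(sample_shares, target_shares)
```

Working on the categories directly yields one divergence per variable, rather than one per dummy column.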
- Misc
- Raw-covariate adjustment for custom models
- Sample.adjust() now supports fitting models on raw covariates (without a model matrix) for IPW via use_model_matrix=False. String, object, and boolean columns are converted to pandas Categorical dtype, allowing sklearn estimators with native categorical support (e.g., HistGradientBoostingClassifier with categorical_features="from_dtype") to handle them correctly. Requires scikit-learn >= 1.4 when categorical columns are present.
- Validate weights include positive values
- Added a guard in weight diagnostics to error when all weights are zero.
- Support configurable ID column candidates
- Sample.from_frame() and guess_id_column() now accept candidate ID column names when auto-detecting the ID column.
- Formula support for BalanceDF model matrices
- BalanceDF.model_matrix() now accepts a formula argument to build custom model matrices without precomputing them manually.
Bug Fixes
- Removed deprecated setup build
- Replaced deprecated setup.py with pyproject.toml build in CI to avoid build failure.
- Hardened ID column candidate validation
- guess_id_column() now ignores duplicate candidate names and validates that candidates are non-empty strings.
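The candidate handling described here (non-empty string checks, then de-duplication) can be sketched in plain Python. The function pick_id_column below is a hypothetical stand-in, not the library's guess_id_column(); the default candidate "id" and the choice to raise when zero or several candidates match are assumptions made for this example.

```python
def pick_id_column(columns, candidates=None):
    """Pick the ID column from `columns` using a list of candidate names."""
    if candidates is None:
        candidates = ["id"]  # assumed default, for illustration only

    cleaned = []
    for name in candidates:
        # Candidates must be non-empty strings.
        if not isinstance(name, str) or not name.strip():
            raise ValueError(f"invalid ID column candidate: {name!r}")
        # Duplicate candidate names are ignored, keeping first-seen order.
        if name not in cleaned:
            cleaned.append(name)

    matches = [name for name in cleaned if name in columns]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one ID column, found: {matches}")
    return matches[0]


id_column = pick_id_column(["user_id", "age"], ["user_id", "user_id", "id"])
```

Validating before de-duplicating means a malformed candidate is reported even when a valid one would otherwise have matched.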
- Hardened pandas 3 compatibility paths
- Updated string/NA handling and discrete checks for pandas 3 dtypes, and refreshed tests to accept string-backed dtypes.
Packaging & Tests
- Pandas 3.x compatibility
- Expanded the pandas dependency range to allow pandas 3.x releases.
- Direct util imports in tests
- Refactored util test modules to import helpers directly from their modules instead of via balance_util.
Breaking Changes
- Require positive weights for weight diagnostics that normalize or aggregate
- design_effect, nonparametric_skew, prop_above_and_below, and weighted_median_breakdown_point now raise a ValueError when all weights are zero.
- Migration: ensure your weights include at least one positive value before calling these diagnostics, or catch the ValueError if all-zero weights are possible in your workflow.
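Both migration options can be sketched as follows. Everything here is illustrative: kish_design_effect is a stand-in that uses the standard Kish formula and imitates the new error behavior, and has_positive_weight and safe_diagnostic are hypothetical helpers, not part of the library.

```python
def kish_design_effect(weights):
    """Stand-in diagnostic (Kish design effect) that rejects all-zero weights."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must include at least one positive value")
    return len(weights) * sum(w * w for w in weights) / (total * total)


def has_positive_weight(weights):
    """Option 1: check up front that at least one weight is positive."""
    return any(w > 0 for w in weights)


def safe_diagnostic(diagnostic, weights, default=None):
    """Option 2: call the diagnostic and fall back when it raises ValueError."""
    try:
        return diagnostic(weights)
    except ValueError:
        return default


weights = [0.0, 0.0, 0.0]
checked = kish_design_effect(weights) if has_positive_weight(weights) else None
caught = safe_diagnostic(kish_design_effect, weights)
```

The up-front check is the narrower of the two, since catching ValueError can also hide unrelated input problems raised by the same call.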
Contributors
@neuralsorcerer, @talgalili (with code/methodological review by @talsarig)
Full Changelog: 0.15.0...0.16.0