New Features
- Propensity modeling beyond static logistic regression
  - `ipw()` now accepts any sklearn classifier via the `model` argument,
    enabling models such as random forests and gradient boosting while
    preserving all existing trimming and diagnostic features. Dense-only
    estimators and models without linear coefficients are fully supported.
    Propensity probabilities are stabilized to avoid numerical issues.
  - Logistic regression can now be customized by passing a configured
    `sklearn.linear_model.LogisticRegression` instance through the `model`
    argument. The CLI also accepts `--ipw_logistic_regression_kwargs` JSON to
    build that estimator directly for command-line workflows.
- Covariate diagnostics
  - Added KL divergence calculations for covariate comparisons (numeric and
    one-hot categorical), exposed via `BalanceDF.kld()` alongside
    linked-sample aggregation support.
- Weighting Methods
  - `rake()` and `poststratify()` now honour `weight_trimming_mean_ratio` and
    `weight_trimming_percentile`, trimming and renormalising weights through
    the enhanced `trim_weights(..., target_sum_weights=...)` API so the
    documented parameters work as expected (#147).
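The following is a minimal sketch of what accepting "any sklearn classifier" can mean for `ipw()`. It is not balance's implementation. It shows how a classifier that exposes the sklearn-style `fit`/`predict_proba` pair can be turned into inverse-propensity weights. The following are all assumptions made for illustration:

- the function name `ipw_weights`
- the clipping bound `eps`
- the odds-style weight `(1 - p) / p`
- the final rescaling to the target size

Any estimator with those two methods, such as `sklearn.ensemble.RandomForestClassifier()`, could be passed as `model`. A tiny stand-in classifier keeps the example dependency-free.

```python
from typing import Protocol, Sequence


class ProbabilisticClassifier(Protocol):
    """Structural type for any sklearn-style classifier (assumed interface)."""

    def fit(self, X, y): ...

    def predict_proba(self, X): ...


def ipw_weights(
    model: ProbabilisticClassifier,
    sample_X: Sequence[Sequence[float]],
    target_X: Sequence[Sequence[float]],
    eps: float = 1e-6,
) -> list[float]:
    """Hypothetical helper: fit `model` to tell sample rows (label 1) from
    target rows (label 0) and return one weight per sample row."""
    X = list(sample_X) + list(target_X)
    y = [1] * len(sample_X) + [0] * len(target_X)
    model.fit(X, y)
    weights = []
    for probs in model.predict_proba(sample_X):
        # Column 1 is P(row comes from the sample | x) because classes are [0, 1].
        p = min(max(float(probs[1]), eps), 1.0 - eps)  # clip for stability
        weights.append((1.0 - p) / p)  # odds of belonging to the target
    # Assumed convention: rescale so the weights sum to the target size.
    scale = len(target_X) / sum(weights)
    return [w * scale for w in weights]


class ThresholdClassifier:
    """Stand-in for an sklearn estimator: P(sample) depends on feature 0."""

    def fit(self, X, y):
        return self

    def predict_proba(self, X):
        return [[0.2, 0.8] if row[0] == 0 else [0.8, 0.2] for row in X]


weights = ipw_weights(ThresholdClassifier(), [[0.0], [1.0]], [[0.0], [1.0], [1.0]])
```

Clipping the predicted probability away from 0 and 1 is one simple way to stabilize propensities. Without it, a very confident classifier (for example, a fully grown tree) can output `p = 0` or `p = 1`, which gives infinite or zero weights.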
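The new KL divergence diagnostic compares how a covariate is distributed in the (weighted) sample versus the target. For a categorical covariate, the weighted means of its one-hot columns are just the category proportions. A minimal sketch of that case is therefore the discrete formula `KL(P || Q) = sum_k P(k) * log(P(k) / Q(k))`.

This is an illustration only, built on these assumptions:

- the function names
- the direction `KL(sample || target)`
- the `eps` floor for categories missing from the target

It does not reproduce how `BalanceDF.kld()` treats numeric covariates or linked samples.

```python
import math
from collections import defaultdict
from typing import Hashable, Optional, Sequence


def weighted_proportions(values: Sequence[Hashable], weights: Sequence[float]) -> dict:
    """Share of the total weight that falls in each category."""
    totals: dict = defaultdict(float)
    for value, weight in zip(values, weights):
        totals[value] += weight
    grand_total = sum(totals.values())
    return {k: t / grand_total for k, t in totals.items()}


def categorical_kld(
    sample: Sequence[Hashable],
    target: Sequence[Hashable],
    sample_weights: Optional[Sequence[float]] = None,
    target_weights: Optional[Sequence[float]] = None,
    eps: float = 1e-12,
) -> float:
    """Hypothetical helper: KL(P_sample || P_target) for one categorical covariate."""
    if sample_weights is None:
        sample_weights = [1.0] * len(sample)
    if target_weights is None:
        target_weights = [1.0] * len(target)
    p = weighted_proportions(sample, sample_weights)
    q = weighted_proportions(target, target_weights)
    # Categories absent from the target get a tiny floor instead of log(x / 0).
    return sum(pk * math.log(pk / max(q.get(k, 0.0), eps)) for k, pk in p.items() if pk > 0)
```

Identical proportions give a divergence of 0. Sample weights enter only through the category shares, so reweighting the sample directly changes the diagnostic.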
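The trimming change for `rake()` and `poststratify()` can be pictured as two steps:

1. Cap the large weights.
2. Rescale so the weights sum to a requested total, which is the role these notes give to `target_sum_weights`.

The sketch below assumes mean-ratio trimming means clipping at `mean_ratio * mean(weights)`. The function name is hypothetical. It is not balance's `trim_weights()`, which also supports percentile-based trimming.

```python
def trim_and_renormalize(
    weights: list[float], mean_ratio: float, target_sum: float
) -> list[float]:
    """Hypothetical helper: clip weights at mean_ratio * mean, then rescale."""
    mean = sum(weights) / len(weights)
    cap = mean_ratio * mean
    clipped = [min(w, cap) for w in weights]
    # Renormalise so the trimmed weights add up to the requested total.
    scale = target_sum / sum(clipped)
    return [w * scale for w in clipped]


trimmed = trim_and_renormalize([1.0, 1.0, 1.0, 9.0], mean_ratio=2.0, target_sum=12.0)
```

Rescaling after clipping can push the largest weight back above the original cap: here the cap is 6.0, but the rescaled weight is 8.0. Trimming and renormalisation are therefore sometimes iterated. These notes do not say whether balance does so.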
Documentation
- Added a comprehensive post-stratification tutorial notebook
  (`balance_quickstart_poststratify.ipynb`) (#141, #142, #143).
- Expanded the `poststratify` docstring with clear examples and improved
  documentation of the statistical methods (#141).
- Added project badges to the README for build status, Python version
  support, and release tracking (#145).
- Added an IPW quickstart tutorial (`balance_quickstart.ipynb`) showcasing
  default logistic regression and custom sklearn classifier usage.
- Shortened the welcome message displayed when importing the package.
Code Quality & Refactoring
- Raking algorithm refactor
  - Removed the `ipfn` dependency and replaced it with a vectorized NumPy
    implementation (`_run_ipf_numpy`) of iterative proportional fitting,
    resulting in significant performance improvements and eliminating an
    external dependency (#135).
- IPW method refactoring
  - Reduced the Cyclomatic Complexity Number (CCN) by extracting repeated
    code patterns into reusable helper functions: `_compute_deviance()`,
    `_compute_proportion_deviance()`, and `_convert_to_dense_array()`.
  - Removed the manual ASMD improvement calculation in favour of the existing
    `compute_asmd_improvement()` from `weighted_comparisons_stats.py`.
- Type safety improvements
  - Migrated 32 Python files from `# pyre-unsafe` to `# pyre-strict` mode,
    covering core modules, statistics, weighting methods, datasets, and test
    files.
  - Modernized type hints to PEP 604 syntax (`X | Y` instead of
    `Union[X, Y]`) across 11 files for improved readability and Python 3.10+
    alignment.
  - Type alias definitions in `typing.py` retain `Union` syntax for Python
    3.9 compatibility.
  - Enhanced plotting function type safety with `TypedDict` definitions and
    proper type narrowing.
  - Replaced assert-based type narrowing with a `_verify_value_type()` helper
    for better error messages and pyre-strict compliance.
- Renamed the `Balance*DF` classes to `BalanceDF*`
  - `BalanceCovarsDF` to `BalanceDFCovars`
  - `BalanceOutcomesDF` to `BalanceDFOutcomes`
  - `BalanceWeightsDF` to `BalanceDFWeights`
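Raking repeatedly rescales a table until its margins match the targets. Each rescaling step is a single array operation, which is what makes a NumPy implementation fast.

Below is a minimal two-variable sketch of iterative proportional fitting. It is not balance's `_run_ipf_numpy`, whose signature is not given in these notes. The function name `ipf_2d` is hypothetical, and the sketch assumes two things:

- the seed table is strictly positive;
- the row and column targets have equal totals.

```python
import numpy as np


def ipf_2d(seed, row_targets, col_targets, max_iter=100, tol=1e-10):
    """Hypothetical helper: fit a 2D table to row and column margins via IPF."""
    table = np.asarray(seed, dtype=float).copy()  # never mutate the caller's seed
    row_targets = np.asarray(row_targets, dtype=float)
    col_targets = np.asarray(col_targets, dtype=float)
    for _ in range(max_iter):
        # Scale every row at once so the row sums match row_targets.
        table *= (row_targets / table.sum(axis=1))[:, None]
        # Scale every column at once; column sums now match exactly.
        table *= col_targets / table.sum(axis=0)
        # Stop once the column step no longer disturbs the row margins.
        if np.allclose(table.sum(axis=1), row_targets, atol=tol):
            break
    return table


fitted = ipf_2d(np.ones((2, 2)), [30, 70], [40, 60])
```

With a uniform seed, one pass suffices and the result is `[[12, 18], [28, 42]]`. Seeds with interaction structure generally need several passes before the row margins settle.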
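These notes replace assert-based type narrowing with a `_verify_value_type()` helper but do not show its signature. A minimal sketch of the general pattern uses a hypothetical name and signature. The function checks `isinstance`, then either raises a descriptive `TypeError` or returns the value typed as the expected class.

Unlike `assert isinstance(...)`, this check is not stripped when Python runs with `-O`. Its error message also names both the expected and the actual type.

```python
from typing import TypeVar

T = TypeVar("T")


def verify_value_type(value: object, expected_type: type[T]) -> T:
    """Hypothetical helper: return `value` narrowed to `expected_type` or raise."""
    if not isinstance(value, expected_type):
        raise TypeError(
            f"Expected {expected_type.__name__}, got {type(value).__name__}: {value!r}"
        )
    # After the isinstance check, type checkers treat `value` as T.
    return value


threshold = verify_value_type(0.5, float)
```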
Bug Fixes
- Utility Functions
  - Fixed `quantize()` to preserve column ordering and to raise proper
    `TypeError` exceptions (#133).
- Statistical Functions
  - Fixed division by zero in `asmd_improvement()` when `asmd_mean_before` is
    zero; it now returns `0.0` (0% improvement).
- CLI & Infrastructure
  - Replaced the deprecated `argparse.FileType` with `pathlib.Path` (#134).
- Weight Trimming
  - Fixed `trim_weights()` to consistently return a `pd.Series` with
    `dtype=np.float64` and to preserve the original index across both
    trimming methods.
  - Fixed a percentile-based winsorization edge case: `_validate_limit()` now
    automatically adjusts limits to prevent floating-point precision issues
    (#144).
  - Enhanced the documentation for `trim_weights()` and `_validate_limit()`
    with clearer examples and explanations.
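The `asmd_improvement()` fix guards the relative-improvement ratio against a zero baseline. The sketch below assumes improvement is computed as `(before - after) / before`. That formula and the standalone function `relative_asmd_improvement` are illustrative assumptions, not taken from balance's source.

```python
def relative_asmd_improvement(asmd_mean_before: float, asmd_mean_after: float) -> float:
    """Hypothetical helper: share of the mean ASMD removed by weighting."""
    if asmd_mean_before == 0:
        # Nothing to improve: report 0% instead of dividing by zero.
        return 0.0
    return (asmd_mean_before - asmd_mean_after) / asmd_mean_before
```

Under this formula, a negative result would mean that weighting made the mean ASMD worse.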
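Switching the CLI from `argparse.FileType` to `pathlib.Path` means argparse only converts the argument to a path. It no longer opens the file during parsing. Below is a minimal sketch of the pattern. The flag names `--input_file` and `--output_file` are assumptions for illustration, not a statement of balance's CLI options.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Path-typed arguments (sketch)")
    # type=Path converts the string but opens nothing, so the caller decides
    # when to open the file and can close it with a `with` block.
    parser.add_argument("--input_file", type=Path, required=True)
    parser.add_argument("--output_file", type=Path, required=True)
    return parser


args = build_parser().parse_args(["--input_file", "in.csv", "--output_file", "out.csv"])
```

Reading then happens explicitly, for example `with args.input_file.open() as f: ...`, and the handle is closed when the block exits. By contrast, `FileType` placed an already-open file object on the namespace and left closing it to the caller.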
Tests
- Enhanced test coverage for weight trimming with
  `test_trim_weights_return_type_consistency` and 11 comprehensive tests for
  `_validate_limit()` covering edge cases, error conditions, and boundary
  conditions.
Contributors
@neuralsorcerer, @talgalili, @wesleytlee
Full Changelog: 0.12.1...0.13.0