Breaking Changes
- CLI: unmentioned columns now go to `ignore_columns` instead of `outcome_columns`
  - Previously, when `--outcome_columns` was not explicitly set, all columns that
    were not the id, weight, or a covariate were automatically classified as
    outcome columns. Now those columns are placed into `ignore_columns` instead.
  - Columns that are explicitly mentioned (the id column, weight column,
    covariate columns, and outcome columns) are not ignored.
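The new default amounts to a simple partition of the column names. The sketch below only illustrates that rule; `classify_columns` is a hypothetical helper (not part of balance's API) that takes the explicitly configured roles and returns the outcome and ignored columns under the new behavior.

```python
from typing import List, Optional, Sequence, Tuple


def classify_columns(
    all_columns: Sequence[str],
    id_column: str,
    weight_column: Optional[str],
    covariate_columns: Sequence[str],
    outcome_columns: Optional[Sequence[str]] = None,
) -> Tuple[List[str], List[str]]:
    """Hypothetical sketch: return (outcome_columns, ignore_columns).

    Columns not mentioned in any role go to ignore_columns instead of
    being treated as outcomes.
    """
    outcomes = list(outcome_columns or [])
    mentioned = {id_column, *covariate_columns, *outcomes}
    if weight_column is not None:
        mentioned.add(weight_column)
    # Keep the input column order for the ignored columns.
    ignored = [c for c in all_columns if c not in mentioned]
    return outcomes, ignored


cols = ["id", "weight", "age", "gender", "income", "notes"]
print(classify_columns(cols, "id", "weight", ["age", "gender"]))
# → ([], ['income', 'notes'])
print(classify_columns(cols, "id", "weight", ["age", "gender"], ["income"]))
# → (['income'], ['notes'])
```

Under the previous default, `income` and `notes` in the first call would both have been classified as outcome columns.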
New Features
- ASCII comparative histogram and plot improvements
  - Added `ascii_comparative_hist` for comparing multiple distributions against a
    baseline using inline visual indicators (█, ▒, ▐, ░).
  - Comparative ASCII plots now order datasets as population → adjusted → sample.
  - `ascii_plot_dist` accepts a new `comparative` keyword (default `True`) to
    toggle between comparative and grouped-bar histograms for numeric variables.
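A fixed display order like population → adjusted → sample can be expressed as a sort with an explicit priority. This is a hedged sketch, not balance's implementation: the `order_datasets` helper is hypothetical, and it assumes datasets are identified by the labels `"population"`, `"adjusted"`, and `"sample"`, with any other label placed last in its original order.

```python
from typing import Dict, List

# Assumed display priority: population first, then adjusted, then sample.
_PRIORITY: Dict[str, int] = {"population": 0, "adjusted": 1, "sample": 2}


def order_datasets(names: List[str]) -> List[str]:
    """Hypothetical sketch: sort labels as population -> adjusted -> sample.

    Unknown labels get the lowest priority; sorted() is stable, so they
    keep their original relative order.
    """
    return sorted(names, key=lambda n: _PRIORITY.get(n, len(_PRIORITY)))


print(order_datasets(["sample", "adjusted", "population"]))
# → ['population', 'adjusted', 'sample']
```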
Code Quality & Refactoring
- Moved dataset loading implementations out of `balance.datasets.__init__`
  - Refactored `load_sim_data`, `load_cbps_data`, and `load_data` into
    `balance.datasets.loading_data` and re-exported them from
    `balance.datasets` to preserve the public API while keeping module
    responsibilities focused.
Documentation
- ASCII plot documentation and tutorial examples
  - Added rendered text-plot examples to ASCII plot docstrings and documented
    `library="balance"` support. Updated `balance_quickstart.ipynb` with
    adjusted vs. unadjusted ASCII plot examples.
- Improved `keep_columns` documentation
  - Updated docstrings for `has_keep_columns()`, `keep_columns()`, and the
    `--keep_columns` argument to clarify that keep columns control which columns
    appear in the final output CSV. Keep columns that are not id, weight,
    covariate, or outcome columns are placed into `ignore_columns` during
    processing but are still retained and available in the output.
- Clarified `_prepare_input_model_matrix` argument docs
  - Updated docstrings in `balance.utils.model_matrix` with explicit
    descriptions of `sample`, `target`, `variables`, and `add_na` behavior
    when preparing model-matrix inputs.
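To make the keep-columns rule concrete, here is a minimal sketch. It assumes rows are plain dicts and, for simplicity, that the output contains exactly the keep columns; `select_output` is a hypothetical helper, not balance's API. A keep column with no modeling role is counted as ignored during processing, yet it still appears in the output rows.

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]


def select_output(
    rows: List[Row],
    keep_columns: Sequence[str],
    role_columns: Sequence[str],
) -> Tuple[List[str], List[Row]]:
    """Hypothetical sketch: return (ignored keep columns, output rows).

    role_columns stands in for the id, weight, covariate, and outcome columns.
    """
    roles = set(role_columns)
    # Keep columns without a role are ignored during processing...
    ignored = [c for c in keep_columns if c not in roles]
    # ...but every keep column is still copied into the output rows.
    output = [{c: row[c] for c in keep_columns} for row in rows]
    return ignored, output


rows = [{"id": 1, "age": 30, "region": "north", "notes": "x"}]
ignored, out = select_output(rows, ["id", "region"], ["id", "age"])
print(ignored)  # → ['region']
print(out)      # → [{'id': 1, 'region': 'north'}]
```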
Bug Fixes
- Weight diagnostics now consistently accept DataFrame inputs
  - `design_effect`, `nonparametric_skew`, `prop_above_and_below`, and
    `weighted_median_breakdown_point` now explicitly normalize DataFrame inputs
    to their first column before computation, matching validation behavior and
    returning scalar/Series outputs consistently.
- Model-matrix robustness improvements
  - `_make_df_column_names_unique()` now avoids suffix collisions when columns
    like `a`, `a_1`, and repeated `a` names appear together, renaming
    duplicates deterministically to prevent downstream clashes.
  - `_prepare_input_model_matrix()` now raises a deterministic `ValueError`
    when the input sample has zero rows, instead of relying on an assertion.
- Stabilized `prop_above_and_below()` return paths
  - `prop_above_and_below()` now builds concatenated outputs only from the
    Series objects that are present and returns `None` when both `below` and
    `above` are `None`, avoiding ambiguous concat inputs while preserving
    existing behavior for valid threshold sets.
- Validated and normalized comma-separated CLI column arguments
  - CLI column-list arguments now trim surrounding whitespace and reject empty
    entries (for example, `"id,,weight"`) with clear `ValueError` messages,
    preventing malformed column specifications from silently propagating.
  - Applied to `--covariate_columns`, `--covariate_columns_for_diagnostics`,
    `--batch_columns`, `--keep_columns`, and `--outcome_columns` parsing.
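The suffix-collision problem is easiest to see with a concrete sketch. The helper below is a hypothetical illustration: its name, the `_k` suffix scheme, and the choice to reserve every original name up front are assumptions, not the actual code of `_make_df_column_names_unique()`. It renames duplicates deterministically and never generates a name that already exists anywhere in the input.

```python
from typing import List, Sequence, Set


def make_names_unique(names: Sequence[str]) -> List[str]:
    """Hypothetical sketch: rename duplicate names as name_1, name_2, ...

    Every original name is reserved up front, so a generated suffix such as
    "a_1" can never clash with a real column of that name, even a later one.
    """
    reserved: Set[str] = set(names)
    taken: Set[str] = set()
    result: List[str] = []
    for name in names:
        if name not in taken:
            # First occurrence keeps its original name.
            candidate = name
        else:
            # Duplicate: pick the smallest suffix that is neither used nor reserved.
            k = 1
            while f"{name}_{k}" in taken or f"{name}_{k}" in reserved:
                k += 1
            candidate = f"{name}_{k}"
        taken.add(candidate)
        result.append(candidate)
    return result


print(make_names_unique(["a", "a_1", "a"]))  # → ['a', 'a_1', 'a_2']
print(make_names_unique(["a", "a", "a_1"]))  # → ['a', 'a_2', 'a_1']
```

A naive scheme that appends `_1`, `_2`, ... per base name without checking existing names would turn `["a", "a", "a_1"]` into `["a", "a_1", "a_1"]`, reintroducing a duplicate; reserving the original names avoids that.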
Tests
- Added end-to-end adjustment test with ASCII plot output and expanded ASCII plot edge-case coverage
  - `TestAsciiPlotsAdjustmentEndToEnd` runs the full adjustment pipeline and
    asserts exact expected ASCII output. Added tests for `ascii_plot_dist` with
    `comparative=False` and mixed categorical+numeric routing.
- Expanded warning coverage for `Sample.from_frame()` ID inference
  - Added assertions that validate all three expected warnings are emitted when
    inferring an `id` column and default weights: ID guessing, ID string
    casting, and automatic weight creation.
- Expanded IPW helper and diagnostics test coverage
  - Added tests for `link_transform()` and `calc_dev()` to validate behavior
    for extreme probabilities and finite 10-fold deviance summaries.
  - Refactored diagnostics tests to use a shared IPW setup helper, added
    edge-case assertions for solver/penalty values and NaN coercion of
    non-scalar inputs, and now assert that labels match fitted model parameters.
- Expanded `prop_above_and_below()` edge-case coverage
  - Added focused tests for empty threshold iterables, mixed `None` threshold
    groups in dict mode, and explicit all-`None` threshold handling across
    return formats.
- Added unit coverage for CLI I/O and empty-batch handling
  - Added focused tests for `BalanceCLI.process_batch()` empty-sample failure
    payloads, `load_and_check_input()` CSV loading paths, and `write_outputs()`
    delimiter-aware output writing for both adjusted and diagnostics files.
Contributors
@sahil350, @neuralsorcerer, @talgalili