facebookresearch/balance 0.17.0 on GitHub

Breaking Changes

CLI: unmentioned columns now go to ignore_columns instead of outcome_columns
- Previously, when --outcome_columns was not explicitly set, all columns that
  were not the id, weight, or a covariate were automatically classified as
  outcome columns. Now those columns are placed into ignore_columns instead.
- Columns that are explicitly mentioned — the id column, weight column,
  covariate columns, and outcome columns — are not ignored.
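
The new rule can be illustrated with a minimal sketch. The helper name classify_columns and the example column names are hypothetical and are not part of the balance CLI; the sketch also leaves out keep and batch columns for brevity.

```python
def classify_columns(all_columns, id_column, weight_column,
                     covariate_columns, outcome_columns=None):
    """Hypothetical illustration of the 0.17.0 rule; not the library's code."""
    outcome_columns = list(outcome_columns or [])
    # Columns the user named explicitly are never ignored.
    mentioned = {id_column, weight_column, *covariate_columns, *outcome_columns}
    # Everything else is ignored instead of being treated as an outcome.
    ignore_columns = [c for c in all_columns if c not in mentioned]
    return {"outcome_columns": outcome_columns, "ignore_columns": ignore_columns}


result = classify_columns(
    ["id", "weight", "age", "gender", "happiness", "notes"],
    id_column="id",
    weight_column="weight",
    covariate_columns=["age", "gender"],
)
print(result)
```

Under this sketch, a column such as happiness is treated as an outcome only if it is passed through outcome_columns. Scripts that relied on the old implicit behavior would need to set --outcome_columns explicitly.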

ASCII comparative histogram and plot improvements
- Added ascii_comparative_hist for comparing multiple distributions against a
  baseline using inline visual indicators (█, ▒, ▐, ░).
- Comparative ASCII plots now order datasets as population → adjusted → sample.
- ascii_plot_dist accepts a new comparative keyword (default True) to
  toggle between comparative and grouped-bar histograms for numeric variables.

Moved dataset loading implementations out of balance.datasets.__init__
- Refactored load_sim_data, load_cbps_data, and load_data into
  balance.datasets.loading_data and re-exported them from
  balance.datasets to preserve the public API while keeping module
  responsibilities focused.

ASCII plot documentation and tutorial examples
- Added rendered text-plot examples to ASCII plot docstrings and documented
  library="balance" support. Updated balance_quickstart.ipynb with
  adjusted vs unadjusted ASCII plot examples.
Improved keep_columns documentation
- Updated docstrings for has_keep_columns(), keep_columns(), and the
  --keep_columns argument to clarify that keep columns control which columns
  appear in the final output CSV. Keep columns that are not id, weight,
  covariate, or outcome columns will be placed into ignore_columns during
  processing but are still retained and available in the output.
Clarified _prepare_input_model_matrix argument docs
- Updated docstrings in balance.utils.model_matrix with
  explicit descriptions for sample, target, variables, and add_na
  behavior when preparing model-matrix inputs.

Weight diagnostics now consistently accept DataFrame inputs
- design_effect, nonparametric_skew, prop_above_and_below, and
  weighted_median_breakdown_point now explicitly normalize DataFrame inputs
  to their first column before computation, matching validation behavior and
  returning scalar/Series outputs consistently.
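
As a rough sketch of what normalizing to the first column means, the example below reduces a DataFrame to its first column before computing a design effect. The helper names first_column and design_effect_sketch are hypothetical, and the formula is the standard Kish design effect, assumed here to match what the library computes.

```python
import pandas as pd


def first_column(w):
    # Hypothetical helper: reduce a DataFrame to its first column,
    # and pass a Series through unchanged.
    if isinstance(w, pd.DataFrame):
        return w.iloc[:, 0]
    return w


def design_effect_sketch(w):
    # Kish's design effect: mean(w^2) / mean(w)^2.
    w = first_column(w)
    return float((w ** 2).mean() / (w.mean() ** 2))


weights = pd.Series([1.0, 1.0, 2.0, 4.0], name="weight")
as_frame = weights.to_frame()

# Both input types give the same scalar result.
print(design_effect_sketch(weights), design_effect_sketch(as_frame))
```
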
Model-matrix robustness improvements
- _make_df_column_names_unique() now avoids suffix collisions when columns
  like a, a_1, and repeated a names appear together, renaming
  duplicates deterministically to prevent downstream clashes.
- _prepare_input_model_matrix() now raises a deterministic ValueError
  when the input sample has zero rows, instead of relying on an assertion.
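
One way to avoid this kind of suffix collision is sketched below. The function make_unique and its numbering scheme are assumptions for illustration; the suffixes chosen by _make_df_column_names_unique() may differ.

```python
def make_unique(names):
    """Hypothetical sketch: give each duplicate the first free numeric suffix."""
    taken = set(names)  # reserve every original name up front
    seen = set()
    result = []
    for name in names:
        if name not in seen:
            # First occurrence keeps its original name.
            seen.add(name)
            result.append(name)
            continue
        # Later occurrences get the lowest suffix that is not already in use.
        n = 1
        candidate = f"{name}_{n}"
        while candidate in taken:
            n += 1
            candidate = f"{name}_{n}"
        taken.add(candidate)
        result.append(candidate)
    return result


print(make_unique(["a", "a_1", "a"]))
```

Because every original name is reserved before any renaming happens, the second a cannot be renamed to a_1, which is already present. The output depends only on the input order, so the renaming is deterministic.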
Stabilized prop_above_and_below() return paths
- prop_above_and_below() now builds concatenated outputs only from present
  Series objects and returns None when both below and above are None,
  avoiding ambiguous concat inputs while preserving existing behavior for valid
  threshold sets.
Validated and normalized comma-separated CLI column arguments
- CLI column-list arguments now trim surrounding whitespace and reject empty
  entries (for example, "id,,weight") with clear ValueError messages,
  preventing malformed column specifications from silently propagating.
- Applied to --covariate_columns, --covariate_columns_for_diagnostics,
  --batch_columns, --keep_columns, and --outcome_columns parsing.
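
The validation described above could look roughly like the following. The function parse_column_list and its error message are hypothetical; the actual CLI helper and message wording may differ.

```python
def parse_column_list(raw, arg_name="--covariate_columns"):
    """Hypothetical parser for a comma-separated column argument."""
    # Trim whitespace around each entry.
    columns = [part.strip() for part in raw.split(",")]
    # Reject empty entries such as the middle item of "id,,weight".
    if any(part == "" for part in columns):
        raise ValueError(f"{arg_name} contains an empty column name: {raw!r}")
    return columns


print(parse_column_list(" age , gender "))

try:
    parse_column_list("id,,weight", arg_name="--keep_columns")
except ValueError as err:
    print(err)
```
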

Added end-to-end adjustment test with ASCII plot output and expanded ASCII plot edge-case coverage
- TestAsciiPlotsAdjustmentEndToEnd runs the full adjustment pipeline and
  asserts exact expected ASCII output. Added tests for ascii_plot_dist with
  comparative=False and mixed categorical+numeric routing.
Expanded warning coverage for Sample.from_frame() ID inference
- Added assertions that all three expected warnings are emitted when
  inferring an id column and default weights: ID guessing, ID string
  casting, and automatic weight creation.
Expanded IPW helper and diagnostics test coverage
- Added tests for link_transform() and calc_dev() to validate behavior
  for extreme probabilities and finite 10-fold deviance summaries.
- Refactored diagnostics tests to use a shared IPW setup helper, added
  edge-case assertions for solver/penalty values and NaN coercion of
  non-scalar inputs, and added assertions that labels match fitted model
  parameters.
Expanded prop_above_and_below() edge-case coverage
- Added focused tests for empty threshold iterables, mixed None threshold
  groups in dict mode, and explicit all-None threshold handling across
  return formats.
Added unit coverage for CLI I/O and empty-batch handling
- Added focused tests for BalanceCLI.process_batch() empty-sample failure
  payloads, load_and_check_input() CSV loading paths, and write_outputs()
  delimiter-aware output writing for both adjusted and diagnostics files.