github facebookresearch/balance 0.17.0
0.17.0 (2026-03-17)

Breaking Changes

  • CLI: unmentioned columns now go to ignore_columns instead of outcome_columns
    • Previously, when --outcome_columns was not explicitly set, all columns that
      were not the id, weight, or a covariate were automatically classified as
      outcome columns. Now those columns are placed into ignore_columns instead.
    • Columns that are explicitly mentioned (the id column, weight column,
      covariate columns, and outcome columns) are not ignored.
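The new rule can be sketched as follows. This is only an illustration of the behavior described above, not the balance CLI's actual code: classify_columns and its parameters are hypothetical names, and the sketch assumes the column roles arrive as plain Python lists.

```python
def classify_columns(all_columns, id_column, weight_column,
                     covariate_columns, outcome_columns=None):
    """Hypothetical helper illustrating the 0.17.0 rule; not part of balance."""
    outcome_columns = list(outcome_columns or [])
    # Columns the user named explicitly, in any role.
    mentioned = {id_column, weight_column, *covariate_columns, *outcome_columns}
    # Anything not explicitly mentioned is now ignored. Before 0.17.0 these
    # columns would have become outcomes when no outcomes were given.
    ignore_columns = [c for c in all_columns if c not in mentioned]
    return {"outcome_columns": outcome_columns, "ignore_columns": ignore_columns}


columns = ["id", "weight", "age", "gender", "happiness", "notes"]
result = classify_columns(columns, "id", "weight", ["age", "gender"])
print(result)
# {'outcome_columns': [], 'ignore_columns': ['happiness', 'notes']}
```

Under this rule, a workflow that relied on the old default would need to pass its outcome columns explicitly (here, "happiness") to keep them classified as outcomes.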

New Features

  • ASCII comparative histogram and plot improvements
    • Added ascii_comparative_hist for comparing multiple distributions against a
      baseline using inline visual indicators.
    • Comparative ASCII plots now order datasets as population → adjusted → sample.
    • ascii_plot_dist accepts a new comparative keyword (default True) to
      toggle between comparative and grouped-bar histograms for numeric variables.

Code Quality & Refactoring

  • Moved dataset loading implementations out of balance.datasets.__init__
    • Refactored load_sim_data, load_cbps_data, and load_data into
      balance.datasets.loading_data and re-exported them from
      balance.datasets to preserve the public API while keeping module
      responsibilities focused.

Documentation

  • ASCII plot documentation and tutorial examples
    • Added rendered text-plot examples to ASCII plot docstrings and documented
      library="balance" support. Updated balance_quickstart.ipynb with
      adjusted vs unadjusted ASCII plot examples.
  • Improved keep_columns documentation
    • Updated docstrings for has_keep_columns(), keep_columns(), and the
      --keep_columns argument to clarify that keep columns control which columns
      appear in the final output CSV. Keep columns that are not id, weight,
      covariate, or outcome columns will be placed into ignore_columns during
      processing but are still retained and available in the output.
  • Clarified _prepare_input_model_matrix argument docs
    • Updated docstrings in balance.utils.model_matrix with
      explicit descriptions for sample, target, variables, and add_na
      behavior when preparing model-matrix inputs.

Bug Fixes

  • Weight diagnostics now consistently accept DataFrame inputs
    • design_effect, nonparametric_skew, prop_above_and_below, and
      weighted_median_breakdown_point now explicitly normalize DataFrame inputs
      to their first column before computation, matching validation behavior and
      returning scalar/Series outputs consistently.
  • Model-matrix robustness improvements
    • _make_df_column_names_unique() now avoids suffix collisions when columns
      like a, a_1, and repeated a names appear together, renaming
      duplicates deterministically to prevent downstream clashes.
    • _prepare_input_model_matrix() now raises a deterministic ValueError
      when the input sample has zero rows, instead of relying on an assertion.
  • Stabilized prop_above_and_below() return paths
    • prop_above_and_below() now builds concatenated outputs only from present
      Series objects and returns None when both below and above are None,
      avoiding ambiguous concat inputs while preserving existing behavior for valid
      threshold sets.
  • Validated and normalized comma-separated CLI column arguments
    • CLI column-list arguments now trim surrounding whitespace and reject empty
      entries (for example, "id,,weight") with clear ValueError messages,
      preventing malformed column specifications from silently propagating.
    • Applied to --covariate_columns, --covariate_columns_for_diagnostics,
      --batch_columns, --keep_columns, and --outcome_columns parsing.
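The trimming and validation of column-list arguments can be sketched as below. This is a minimal illustration under stated assumptions, not the library's implementation: parse_column_list is a hypothetical name, and the error message wording is invented for the example.

```python
def parse_column_list(raw, arg_name):
    """Hypothetical parser: split on commas, trim whitespace, reject empties."""
    if raw is None:
        # Argument not supplied; nothing to parse.
        return None
    columns = [part.strip() for part in raw.split(",")]
    if any(col == "" for col in columns):
        # An empty entry (e.g. from "id,,weight") is treated as malformed.
        raise ValueError(f"{arg_name} contains an empty column name: {raw!r}")
    return columns


print(parse_column_list(" id , weight ", "--keep_columns"))  # ['id', 'weight']

try:
    parse_column_list("id,,weight", "--keep_columns")
except ValueError as err:
    print(err)
```

Failing at parse time means a malformed specification is reported next to the argument that caused it, rather than surfacing later as a missing-column error.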

Tests

  • Added end-to-end adjustment test with ASCII plot output and expanded ASCII plot edge-case coverage
    • TestAsciiPlotsAdjustmentEndToEnd runs the full adjustment pipeline and
      asserts exact expected ASCII output. Added tests for ascii_plot_dist with
      comparative=False and mixed categorical+numeric routing.
  • Expanded warning coverage for Sample.from_frame() ID inference
    • Added assertions that validate all three expected warnings are emitted
      when inferring an id column and default weights, including ID guessing,
      ID string casting, and automatic weight creation.
  • Expanded IPW helper and diagnostics test coverage
    • Added tests for link_transform() and calc_dev() to validate behavior
      for extreme probabilities and finite 10-fold deviance summaries.
    • Refactored diagnostics tests to use a shared IPW setup helper, added
      edge-case assertions for solver/penalty values and NaN coercion of
      non-scalar inputs, and added assertions that labels match fitted model
      parameters.
  • Expanded prop_above_and_below() edge-case coverage
    • Added focused tests for empty threshold iterables, mixed None threshold
      groups in dict mode, and explicit all-None threshold handling across
      return formats.
  • Added unit coverage for CLI I/O and empty-batch handling
    • Added focused tests for BalanceCLI.process_batch() empty-sample failure
      payloads, load_and_check_input() CSV loading paths, and write_outputs()
      delimiter-aware output writing for both adjusted and diagnostics files.

Contributors

@sahil350, @neuralsorcerer, @talgalili

Full Changelog

0.16.0...0.17.0
