github facebookresearch/balance 0.19.0
0.19.0 (2026-04-06)

Highlights

This is a major architecture release. balance now has two new foundational
classes — SampleFrame and BalanceFrame — that cleanly separate data
representation from adjustment logic. The existing Sample API remains backward
compatible apart from the deprecated methods removed under Breaking Changes:
Sample now inherits from both new classes
(Sample → BalanceFrame → SampleFrame), and all other existing code continues to work
unchanged. The new classes can also be used directly for a more explicit,
composable workflow — see the new tutorial notebook for a complete walkthrough.

For full technical details — including ASCII diagrams of the class hierarchy,
internal structure, column classification, object lifecycle, linked-samples
expansion, data flow, and the Sample.__new__ guard — see
docs/architecture/architecture_0_19_0.md.

Breaking Changes

  • Removed Sample.design_effect() — use sample.weights().design_effect() instead.
    Deprecated since 0.18.0.
  • Removed Sample.design_effect_prop() — use sample.weights().design_effect_prop() instead.
    Deprecated since 0.18.0.
  • Removed Sample.plot_weight_density() — use sample.weights().plot() instead.
    Deprecated since 0.18.0.
  • Removed Sample.covar_means() — use sample.covars().mean() instead
    (with .rename(index={'self': 'adjusted'}).reindex(['unadjusted', 'adjusted', 'target']).T for the same format).
    Deprecated since 0.18.0.
  • Removed Sample.outcome_sd_prop() — use sample.outcomes().outcome_sd_prop() instead.
    Deprecated since 0.18.0.
  • Removed Sample.outcome_variance_ratio() — use sample.outcomes().outcome_variance_ratio() instead.
    Deprecated since 0.18.0.

New Features

  • Added SampleFrame — DataFrame container with explicit column-role metadata
    (sample_frame.py). Holds a DataFrame and tracks which columns are covariates,
    weights, outcomes, predicted outcomes, and ignored.

    • Factory methods: from_frame() (with explicit covar_columns parameter
      and auto-detection of id/weight columns) and from_csv().
    • DataFrame view properties: df_covars, df_outcomes, df_weights,
      df_ignored, id_column, weight_series.
    • Column-role list properties: covar_columns, weight_columns_all,
      outcome_columns, predicted_outcome_columns, ignored_columns.
    • Weight management: add_weight_column(), set_active_weight(),
      rename_weight_column(), set_weight_metadata() / weight_metadata() for
      provenance tracking.
    • Comprehensive input validation (null/negative/non-numeric weights, null IDs,
      duplicate IDs, overlapping column roles).
    • See docs/architecture/architecture_0_19_0.md for column classification
      rules, auto-detection logic, and internal structure.
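
The validation rules listed for SampleFrame (overlapping column roles; null, negative, or non-numeric weights) can be illustrated with a small standalone sketch. ColumnRoles and check_weights are hypothetical names invented for this example. They do not reproduce SampleFrame's implementation, its parameters, or its error messages.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnRoles:
    """Hypothetical column-role record; not balance's SampleFrame."""

    covars: tuple[str, ...] = ()
    weights: tuple[str, ...] = ()
    outcomes: tuple[str, ...] = ()
    ignored: tuple[str, ...] = ()

    def __post_init__(self) -> None:
        # Reject any column that is assigned to more than one role.
        seen: dict[str, str] = {}
        for role in ("covars", "weights", "outcomes", "ignored"):
            for col in getattr(self, role):
                if col in seen:
                    raise ValueError(
                        f"column {col!r} is listed as both {seen[col]} and {role}"
                    )
                seen[col] = role


def check_weights(values: list[object]) -> None:
    """Raise ValueError on null, non-numeric, or negative weights."""
    for i, w in enumerate(values):
        if w is None or (isinstance(w, float) and math.isnan(w)):
            raise ValueError(f"weight at row {i} is null")
        if isinstance(w, bool) or not isinstance(w, (int, float)):
            raise ValueError(f"weight at row {i} is not numeric: {w!r}")
        if w < 0:
            raise ValueError(f"weight at row {i} is negative: {w}")


roles = ColumnRoles(covars=("age", "gender"), weights=("weight",), outcomes=("happiness",))
check_weights([1.0, 0.5, 2])  # passes silently
```

Failing early at construction time keeps later steps from having to re-check roles or weights.
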
  • Added BalanceFrame — adjustment orchestrator for survey weighting
    (balance_frame.py). Pairs a responder SampleFrame with a target SampleFrame
    for reweighting.

    • Public constructor: BalanceFrame(sample=..., target=...). Target is optional —
      BalanceFrame(sample=sf) creates a target-less instance.
    • Core API: set_target(), adjust(), covars() / weights() / outcomes()
      (BalanceDF views), summary(), diagnostics().
    • adjust(method="ipw") returns a new BalanceFrame (immutable pattern) with
      adjusted weights. Supports "ipw", "cbps", "rake", "poststratify",
      "null", and custom callables.
    • Convenience properties: df (responder-only DataFrame, mirrors Sample.df),
      df_all (combined responder + target + unadjusted with "source" column),
      has_target, is_adjusted, model.
    • Covariate overlap validation at construction and set_target().
    • to_csv(), to_download(), keep_only_some_rows_columns() for export
      and filtering.
    • See docs/architecture/architecture_0_19_0.md for property delegation,
      adjustment flow, and linked-samples expansion.
  • Compound/sequential adjustments: adjust() can now be called multiple
    times on the same object. Each call uses the current (previously adjusted)
    weights as design weights, compounding adjustments. For example, run IPW first
    to correct broad imbalances, then rake on a specific variable for fine-tuning.
    The active weight column always keeps its original name (e.g., "weight");
    the original unadjusted baseline is always preserved for diagnostics
    (asmd_improvement() shows total improvement across all steps). See
    docs/architecture/architecture_0_19_0.md
    for the weight history tracking mechanism (weight_pre_adjust,
    weight_adjusted_N columns).
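
The compounding behaviour can be pictured with a toy model: each adjustment receives the current weights as its design weights and returns a new object, while the original baseline stays available. ToyFrame and scale_rows below are hypothetical and written with the standard library only. The column names are borrowed from the note above, but how balance actually populates them is described in the architecture document, not here.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ToyFrame:
    """Hypothetical stand-in for an adjustable frame; not balance's class."""

    columns: dict[str, tuple[float, ...]]
    n_adjustments: int = 0

    @classmethod
    def from_weights(cls, weights: list[float]) -> ToyFrame:
        w = tuple(weights)
        # The active column keeps the name "weight"; the baseline is stored once.
        return cls(columns={"weight": w, "weight_pre_adjust": w})

    def adjust(
        self, method: Callable[[tuple[float, ...]], tuple[float, ...]]
    ) -> ToyFrame:
        # The current (possibly already adjusted) weights act as design weights.
        new_weights = tuple(method(self.columns["weight"]))
        step = self.n_adjustments + 1
        columns = dict(self.columns)  # copy, so the original object is untouched
        columns["weight"] = new_weights
        columns[f"weight_adjusted_{step}"] = new_weights
        return ToyFrame(columns=columns, n_adjustments=step)


def scale_rows(rows: set[int], factor: float):
    """Toy 'adjustment': multiply the design weights of the given rows."""

    def method(design_weights: tuple[float, ...]) -> tuple[float, ...]:
        return tuple(
            w * factor if i in rows else w for i, w in enumerate(design_weights)
        )

    return method


base = ToyFrame.from_weights([1.0, 1.0, 2.0])
step1 = base.adjust(scale_rows({0}, 2.0))      # first, broad correction
step2 = step1.adjust(scale_rows({0, 1}, 1.5))  # second, fine-tuning on top
print(step2.columns["weight"], step2.columns["weight_pre_adjust"])
```

Because every step returns a new object, base and step1 remain usable for comparison after step2 is created.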

  • Sample refactored to inherit from SampleFrame and BalanceFrame:
    Sample is now a thin facade (~242 lines) via multiple inheritance
    (Sample → BalanceFrame → SampleFrame). All adjustment, diagnostics, and
    data-access logic lives in the base classes. No public API changes beyond
    the removals listed under Breaking Changes; all remaining Sample methods
    continue to work identically.

  • Sample.is_adjusted is now a @property returning _CallableBool — works
    both as sample.is_adjusted (property) and sample.is_adjusted() (legacy
    method call). _CallableBool also supports arithmetic via __mul__/__rmul__.
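
One way to make an attribute work both as a property and as a legacy method call is to return a small object that is truthy like a bool and also callable. The sketch below is an illustration of that idea only. CallableBool and ToySample are hypothetical names, and the real _CallableBool may differ in detail.

```python
from __future__ import annotations


class CallableBool:
    """Hypothetical bool-like wrapper that can also be called."""

    def __init__(self, value: bool) -> None:
        self._value = bool(value)

    def __bool__(self) -> bool:
        # Supports `if obj.is_adjusted:` and `not obj.is_adjusted`.
        return self._value

    def __call__(self) -> bool:
        # Supports the legacy `obj.is_adjusted()` spelling.
        return self._value

    def __mul__(self, other):
        # Arithmetic behaves like a plain bool (True -> 1, False -> 0).
        return self._value * other

    def __rmul__(self, other):
        return other * self._value

    def __repr__(self) -> str:
        return repr(self._value)


class ToySample:
    """Hypothetical class exposing the flag as a property."""

    def __init__(self, adjusted: bool) -> None:
        self._adjusted = adjusted

    @property
    def is_adjusted(self) -> CallableBool:
        return CallableBool(self._adjusted)


s = ToySample(adjusted=True)
print(bool(s.is_adjusted), s.is_adjusted(), s.is_adjusted * 3)
```

A wrapper like this is not the True singleton, so identity checks such as `x is True` would fail on it; truthiness checks and calls are the forms it supports.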

  • Added bidirectional conversion between Sample, SampleFrame, and BalanceFrame:

    • SampleFrame.from_sample(sample) / Sample.to_sample_frame(): convert a
      Sample to a SampleFrame with proper column-role mapping.
    • BalanceFrame.from_sample(sample) / Sample.to_balance_frame(): convert a
      Sample (with target) to a BalanceFrame, preserving adjustment state.
    • BalanceFrame.to_sample(): convert a BalanceFrame back to a Sample.
  • Added formula support to Sample.covars() for downstream diagnostics:

    • Sample.covars() now accepts a formula argument and stores it on the
      returned BalanceDFCovars object.
    • BalanceDFCovars.kld() now honors formula-driven model matrices (including
      interactions such as "age_group * gender") when a formula is provided via
      covars(formula=...).
    • Formula settings are now propagated to linked covariate views (target,
      unadjusted) so comparative diagnostics run on consistent design matrices.

Code Quality & Refactoring

  • Defined BalanceDFSource protocol and decoupled BalanceDF from Sample
    — Added BalanceDFSource runtime-checkable protocol that both Sample and
    SampleFrame satisfy, enabling BalanceDF to work with either. Removed the
    hard dependency on Sample (the statement from balance.sample_class import Sample)
    in balancedf_class.py.
    See docs/architecture/architecture_0_19_0.md
    for protocol members and satisfaction diagram.
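
The decoupling relies on structural typing: a consumer checks that an object has the required members rather than importing a concrete class. The following is a minimal sketch using typing.Protocol with runtime_checkable. WeightedSource and its two methods are hypothetical; the actual members of BalanceDFSource are listed in the architecture document and are not reproduced here.

```python
from __future__ import annotations

from typing import Protocol, runtime_checkable


@runtime_checkable
class WeightedSource(Protocol):
    """Hypothetical protocol; not the real BalanceDFSource."""

    def weight_values(self) -> list[float]: ...

    def covar_names(self) -> list[str]: ...


class FacadeLike:
    """Plays the role of a Sample-like class; does not inherit the protocol."""

    def weight_values(self) -> list[float]:
        return [1.0, 2.0]

    def covar_names(self) -> list[str]:
        return ["age"]


class ContainerLike:
    """Plays the role of a SampleFrame-like class; also no inheritance."""

    def __init__(self, weights: list[float], covars: list[str]) -> None:
        self._weights = weights
        self._covars = covars

    def weight_values(self) -> list[float]:
        return list(self._weights)

    def covar_names(self) -> list[str]:
        return list(self._covars)


class Unrelated:
    """Missing weight_values(), so it does not satisfy the protocol."""

    def covar_names(self) -> list[str]:
        return []


def total_weight(source: WeightedSource) -> float:
    # The consumer depends only on the protocol, never on a concrete class.
    if not isinstance(source, WeightedSource):
        raise TypeError(f"{type(source).__name__} does not satisfy WeightedSource")
    return sum(source.weight_values())


print(total_weight(FacadeLike()), total_weight(ContainerLike([0.5, 0.5], ["age"])))
```

Note that isinstance() against a runtime-checkable protocol only verifies that the members exist; it does not check their signatures or return types.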

  • Extracted _build_summary() and _build_diagnostics() into summary_utils.py
    — Standalone functions accepting plain DataFrames/Series, enabling code reuse
    across Sample and BalanceFrame without circular imports.

  • BalanceDF.__init__(): added optional links parameter for explicit link
    injection, allowing BalanceDF to work with sources that do not carry mutable
    _links (e.g. SampleFrame).

  • Typing modernization: Dict → dict, Tuple → tuple, List → list
    in annotations. cast() replaced with _assert_type() where applicable.
    Internal naming standardized across BalanceDF helpers (snake_case convention).
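
As an illustration of the annotation change, the same function is shown below in the older typing-module style and in the built-in generic style. The second half contrasts typing.cast(), which does nothing at runtime, with a checking helper. The helper is named checked here and is hypothetical: it assumes that _assert_type() performs a runtime isinstance check, which the note above does not spell out.

```python
from __future__ import annotations

from typing import Dict, List, Tuple, TypeVar, cast

T = TypeVar("T")


# Before: capitalised aliases imported from typing.
def total_by_column_old(rows: List[Tuple[str, float]]) -> Dict[str, float]:
    totals: Dict[str, float] = {}
    for name, value in rows:
        totals[name] = totals.get(name, 0.0) + value
    return totals


# After: built-in generics, no typing import needed for the containers.
def total_by_column(rows: list[tuple[str, float]]) -> dict[str, float]:
    totals: dict[str, float] = {}
    for name, value in rows:
        totals[name] = totals.get(name, 0.0) + value
    return totals


def checked(value: object, expected: type[T]) -> T:
    """Hypothetical runtime-checked alternative to cast()."""
    if not isinstance(value, expected):
        raise TypeError(
            f"expected {expected.__name__}, got {type(value).__name__}"
        )
    return value


rows = [("age", 1.0), ("age", 2.0), ("income", 5.0)]
print(total_by_column(rows))

unchecked = cast(int, "not an int")  # cast() returns the value unchanged
verified = checked(3, int)           # checked() validates before returning
```

The practical difference is that a wrong assumption passes silently through cast() but fails immediately with a runtime check.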

Tutorials

  • Added balance_quickstart_new_api.ipynb — end-to-end tutorial demonstrating the
    new SampleFrame/BalanceFrame API. Mirrors the original balance_quickstart.ipynb
    step-by-step but uses only the new classes (no Sample). Covers: loading data,
    creating SampleFrames, building a BalanceFrame, adjusting (IPW + CBPS), inspecting
    diagnostics (summary, ASMD, covariate means, design effect), visualization
    (plotly, seaborn KDE, ASCII plots), outcome analysis, transformations, compound
    adjustments, filtering rows/columns, and exporting to CSV.

Documentation

  • Added ARCHITECTURE.md — new top-level architecture document covering the class hierarchy,
    5-step workflow, key classes, weighting methods, supporting modules, and file layout.
    CLAUDE.md now links to this file instead of duplicating architecture content.
  • Added docs/architecture/architecture_0_19_0.md — detailed technical record of the 0.19.0
    architecture with ASCII diagrams covering: class hierarchy before/after, SampleFrame internals,
    BalanceFrame internal structure, object lifecycle state transitions, BalanceDF linked-samples
    expansion, BalanceDFSource protocol, BalanceDF class hierarchy, data flow, _links graph, and
    Sample.__new__ guard.
  • Updated README.md — added "Developer and AI assistant resources" section linking to
    ARCHITECTURE.md and CLAUDE.md.

LLM/GenAI

  • Updated CLAUDE.md project context files for Claude Code users, covering architecture,
    build/test instructions (Meta and open-source), code conventions, and pre-submit checklist.
  • Updated .github/copilot-instructions.md review checklist to add missing conventions
    (MIT license header, from __future__ import annotations, factory pattern, seed fixing,
    deprecation style).

Tests

  • Added comprehensive test suites for the new classes:
    • test_balance_frame.py: ~120 tests covering construction, adjustment (all
      methods), covars/weights/outcomes integration, summary/diagnostics, analytics,
      df/export/filter, missing data, end-to-end equivalence with Sample API,
      conversion from/to Sample.
    • test_sample_frame.py: protocol conformance, weight/id/covar/outcome access,
      set_weights, BalanceDF construction, Sample-to-SampleFrame conversion.
    • test_sample.py: internal SampleFrame backing, _CallableBool, conversion
      methods, is_adjusted property/callable.
    • test_balancedf.py: BalanceDFSource protocol conformance, mock sources,
      regression tests for existing Sample API.
    • test_sample_diagnostics_helper.py: verifies _build_summary() and
      _build_diagnostics() produce identical output to Sample methods.

Contributors

@talgalili, @sahil350, @neuralsorcerer

Full Changelog

0.18.0...0.19.0
