Highlights
This is a major architecture release. balance now has two new foundational
classes, SampleFrame and BalanceFrame, that cleanly separate data
representation from adjustment logic. The existing Sample API is fully backward
compatible: Sample now inherits from both new classes
(Sample → BalanceFrame → SampleFrame), and all existing code continues to work
unchanged. The new classes can also be used directly for a more explicit,
composable workflow; see the new tutorial notebook,
balance_quickstart_new_api.ipynb, for a complete walkthrough.
For full technical details — including ASCII diagrams of the class hierarchy,
internal structure, column classification, object lifecycle, linked-samples
expansion, data flow, and the Sample.__new__ guard — see
docs/architecture/architecture_0_19_0.md.
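The resolution order Sample → BalanceFrame → SampleFrame is what Python produces when a class lists two bases. A minimal sketch with toy classes, assuming a plain class Sample(BalanceFrame, SampleFrame) declaration (the real declaration and the Sample.__new__ guard are described in the architecture document and are not reproduced here), shows how such a facade picks up behaviour from both parents:

```python
# Hypothetical stand-ins: these toy classes only mimic the shape of the 0.19.0
# hierarchy; they are not balance's real SampleFrame, BalanceFrame, or Sample.
class SampleFrame:
    """Data side: would hold the DataFrame and its column-role metadata."""

    def data_role(self) -> str:
        return "stores data and column roles"


class BalanceFrame:
    """Adjustment side: would pair responder and target data and run adjust()."""

    def adjustment_role(self) -> str:
        return "orchestrates adjustment"


class Sample(BalanceFrame, SampleFrame):
    """Thin facade: every method below comes from one of the two base classes."""


# Python's method resolution order searches Sample, then BalanceFrame, then SampleFrame.
mro = [cls.__name__ for cls in Sample.__mro__]
print(mro)  # ['Sample', 'BalanceFrame', 'SampleFrame', 'object']

s = Sample()
print(isinstance(s, SampleFrame), isinstance(s, BalanceFrame))  # True True
print(s.data_role(), "/", s.adjustment_role())
```

Because the facade defines nothing itself, every attribute lookup falls through to the base classes in MRO order, which is why existing Sample code keeps working once the logic moves into the parents.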
Breaking Changes
- Removed Sample.design_effect(): use sample.weights().design_effect() instead.
  Deprecated since 0.18.0.
- Removed Sample.design_effect_prop(): use
  sample.weights().design_effect_prop() instead. Deprecated since 0.18.0.
- Removed Sample.plot_weight_density(): use sample.weights().plot() instead.
  Deprecated since 0.18.0.
- Removed Sample.covar_means(): use sample.covars().mean() instead, chaining
  .rename(index={'self': 'adjusted'}).reindex(['unadjusted', 'adjusted', 'target']).T
  to get the same output format. Deprecated since 0.18.0.
- Removed Sample.outcome_sd_prop(): use sample.outcomes().outcome_sd_prop()
  instead. Deprecated since 0.18.0.
- Removed Sample.outcome_variance_ratio(): use
  sample.outcomes().outcome_variance_ratio() instead. Deprecated since 0.18.0.
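For readers migrating away from Sample.design_effect(): as we understand it, sample.weights().design_effect() reports Kish's design effect, n * sum(w^2) / (sum(w))^2. The pure-Python sketch below (kish_design_effect is a hypothetical helper, not part of balance) shows what that number measures without depending on balance itself:

```python
from typing import Sequence


def kish_design_effect(weights: Sequence[float]) -> float:
    """Kish's design effect: n * sum(w^2) / (sum(w))^2.

    Hypothetical helper for illustration; equals 1.0 for equal weights and
    grows as the weights become more variable.
    """
    n = len(weights)
    if n == 0:
        raise ValueError("weights must be non-empty")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return n * sum(w * w for w in weights) / (total * total)


print(kish_design_effect([1.0, 1.0, 1.0, 1.0]))  # 1.0
print(kish_design_effect([1.0, 1.0, 1.0, 5.0]))  # 1.75
```

The measure is scale-invariant (multiplying every weight by a constant leaves it unchanged), so it summarizes how unequal the weights are rather than their absolute size.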
New Features
- Added SampleFrame, a DataFrame container with explicit column-role metadata
  (sample_frame.py). It holds a DataFrame and tracks which columns are
  covariates, weights, outcomes, predicted outcomes, and ignored.
  - Factory methods: from_frame() (with an explicit covar_columns parameter and
    auto-detection of id/weight columns) and from_csv().
  - DataFrame view properties: df_covars, df_outcomes, df_weights, df_ignored,
    id_column, weight_series.
  - Column-role list properties: covar_columns, weight_columns_all,
    outcome_columns, predicted_outcome_columns, ignored_columns.
  - Weight management: add_weight_column(), set_active_weight(),
    rename_weight_column(), and set_weight_metadata()/weight_metadata() for
    provenance tracking.
  - Comprehensive input validation: null, negative, or non-numeric weights;
    null IDs; duplicate IDs; overlapping column roles.
  - See docs/architecture/architecture_0_19_0.md for column classification
    rules, auto-detection logic, and internal structure.
- Added BalanceFrame, an adjustment orchestrator for survey weighting
  (balance_frame.py). It pairs a responder SampleFrame with a target
  SampleFrame for reweighting.
  - Public constructor: BalanceFrame(sample=..., target=...). The target is
    optional: BalanceFrame(sample=sf) creates a target-less instance.
  - Core API: set_target(), adjust(), covars()/weights()/outcomes()
    (BalanceDF views), summary(), diagnostics().
  - adjust(method="ipw") returns a new BalanceFrame with adjusted weights
    (immutable pattern). Supported methods are "ipw", "cbps", "rake",
    "poststratify", "null", and custom callables.
  - Convenience properties: df (responder-only DataFrame, mirrors Sample.df),
    df_all (combined responder + target + unadjusted data with a "source"
    column), has_target, is_adjusted, model.
  - Covariate overlap validation at construction and in set_target().
  - to_csv(), to_download(), and keep_only_some_rows_columns() for export and
    filtering.
  - See docs/architecture/architecture_0_19_0.md for property delegation,
    adjustment flow, and linked-samples expansion.
- Compound/sequential adjustments: adjust() can now be called repeatedly,
  including on an already adjusted object. Each call uses the current
  (previously adjusted) weights as design weights, so the adjustments compound.
  For example, run IPW first to correct broad imbalances, then rake on a
  specific variable for fine-tuning. The active weight column always keeps its
  original name (e.g., "weight"), and the original unadjusted baseline is
  always preserved for diagnostics (asmd_improvement() shows the total
  improvement across all steps). See docs/architecture/architecture_0_19_0.md
  for the weight history tracking mechanism (weight_pre_adjust and
  weight_adjusted_N columns).
- Sample refactored to inherit from SampleFrame and BalanceFrame. Sample is now
  a thin facade (~242 lines) built via multiple inheritance
  (Sample → BalanceFrame → SampleFrame). All adjustment, diagnostics, and
  data-access logic lives in the base classes. There are no public API
  changes: all existing Sample methods continue to work identically.
- Sample.is_adjusted is now a @property returning _CallableBool, so it works
  both as sample.is_adjusted (property) and as sample.is_adjusted() (legacy
  method call). _CallableBool also supports arithmetic via __mul__/__rmul__.
- Added bidirectional conversion between Sample, SampleFrame, and BalanceFrame:
  - SampleFrame.from_sample(sample) / Sample.to_sample_frame(): convert a
    Sample to a SampleFrame with proper column-role mapping.
  - BalanceFrame.from_sample(sample) / Sample.to_balance_frame(): convert a
    Sample (with target) to a BalanceFrame, preserving adjustment state.
  - BalanceFrame.to_sample(): convert a BalanceFrame back to a Sample.
- Added formula support to Sample.covars() for downstream diagnostics:
  - Sample.covars() now accepts a formula argument and stores it on the
    returned BalanceDFCovars object.
  - BalanceDFCovars.kld() now honors formula-driven model matrices (including
    interactions such as "age_group * gender") when a formula is provided via
    covars(formula=...).
  - Formula settings are now propagated to linked covariate views (target,
    unadjusted), so comparative diagnostics run on consistent design matrices.
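Compound adjustments combine two ideas from the list above: adjust() returns a new object instead of mutating the old one, and each call treats the current weights as design weights. The following is a minimal toy sketch of that pattern, under loudly stated assumptions: ToyFrame, its multiplicative step functions, and the bookkeeping are hypothetical; only the weight_pre_adjust / weight_adjusted_N names come from these notes, and the real tracking mechanism is described in docs/architecture/architecture_0_19_0.md.

```python
from dataclasses import dataclass, replace
from typing import Callable, Tuple

Weights = Tuple[float, ...]
# Hypothetical adjustment step: takes the current (design) weights and
# returns one multiplicative factor per unit.
Step = Callable[[Weights], Weights]


@dataclass(frozen=True)
class ToyFrame:
    """Toy stand-in for an adjustable frame; not balance's BalanceFrame."""

    weight: Weights  # active weights always live under the same name
    history: Tuple[Tuple[str, Weights], ...] = ()  # earlier weight columns

    def adjust(self, step: Step) -> "ToyFrame":
        """Return a NEW frame; self is never modified (immutable pattern)."""
        factors = step(self.weight)
        # Current weights act as design weights, so factors compound.
        new_weight = tuple(w * f for w, f in zip(self.weight, factors))
        # Record the unadjusted baseline only on the first call.
        history = self.history or (("weight_pre_adjust", self.weight),)
        label = f"weight_adjusted_{len(history)}"
        return replace(self, weight=new_weight, history=history + ((label, new_weight),))


base = ToyFrame(weight=(1.0, 1.0, 1.0, 1.0))
# Step 1: broad correction (hypothetical stand-in for IPW) doubling units 3 and 4.
step1 = base.adjust(lambda w: (1.0, 1.0, 2.0, 2.0))
# Step 2: fine-tuning (hypothetical stand-in for raking) halving unit 1.
step2 = step1.adjust(lambda w: (0.5, 1.0, 1.0, 1.0))

print(base.weight)   # (1.0, 1.0, 1.0, 1.0)
print(step2.weight)  # (0.5, 1.0, 2.0, 2.0)
print([name for name, _ in step2.history])
# ['weight_pre_adjust', 'weight_adjusted_1', 'weight_adjusted_2']
```

Keeping the baseline in the first history slot is what lets a diagnostic compare the final weights against the unadjusted ones, no matter how many steps were chained.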
Code Quality & Refactoring
- Defined the BalanceDFSource protocol and decoupled BalanceDF from Sample.
  Added a runtime-checkable BalanceDFSource protocol that both Sample and
  SampleFrame satisfy, so BalanceDF works with either, and removed the hard
  import (from balance.sample_class import Sample) from balancedf_class.py.
  See docs/architecture/architecture_0_19_0.md for the protocol members and
  the satisfaction diagram.
- Extracted _build_summary() and _build_diagnostics() into summary_utils.py.
  These standalone functions accept plain DataFrames/Series, enabling code
  reuse across Sample and BalanceFrame without circular imports.
- BalanceDF.__init__() now takes an optional links parameter for explicit link
  injection, allowing BalanceDF to work with sources that do not carry a
  mutable _links attribute (e.g., SampleFrame).
- Typing modernization: Dict → dict, Tuple → tuple, and List → list in
  annotations; cast() replaced with _assert_type() where applicable. Internal
  naming standardized across BalanceDF helpers (snake_case convention).
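Structural typing is what lets BalanceDF accept either a Sample or a SampleFrame without importing Sample. A minimal sketch with typing.Protocol and @runtime_checkable follows; ToySource, its single covariate_names() member, and the other names are hypothetical (the architecture document lists BalanceDFSource's actual members). Note that a runtime isinstance check against a protocol only verifies that the members exist, not their signatures.

```python
from typing import List, Protocol, runtime_checkable


@runtime_checkable
class ToySource(Protocol):
    """Hypothetical one-member protocol; the real BalanceDFSource members differ."""

    def covariate_names(self) -> List[str]: ...


class ToySampleFrame:
    # Satisfies ToySource structurally: it never references ToySource.
    def covariate_names(self) -> List[str]:
        return ["age", "gender"]


class NotASource:
    pass


def count_covariates(source: ToySource) -> int:
    """A consumer that depends only on the protocol, not on a concrete class."""
    if not isinstance(source, ToySource):  # allowed because of @runtime_checkable
        raise TypeError(f"{type(source).__name__} does not satisfy ToySource")
    return len(source.covariate_names())


print(count_covariates(ToySampleFrame()))   # 2
print(isinstance(NotASource(), ToySource))  # False
```

The consumer module needs to import only the protocol, so the dependency on any concrete class (and the circular import it could cause) disappears.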
Tutorials
- Added balance_quickstart_new_api.ipynb, an end-to-end tutorial demonstrating
  the new SampleFrame/BalanceFrame API. It mirrors the original
  balance_quickstart.ipynb step by step but uses only the new classes (no
  Sample). It covers loading data, creating SampleFrames, building a
  BalanceFrame, adjusting (IPW + CBPS), inspecting diagnostics (summary, ASMD,
  covariate means, design effect), visualization (plotly, seaborn KDE, ASCII
  plots), outcome analysis, transformations, compound adjustments, filtering
  rows/columns, and exporting to CSV.
Documentation
- Added ARCHITECTURE.md, a new top-level architecture document covering the
  class hierarchy, 5-step workflow, key classes, weighting methods, supporting
  modules, and file layout. CLAUDE.md now links to this file instead of
  duplicating architecture content.
- Added docs/architecture/architecture_0_19_0.md, a detailed technical record
  of the 0.19.0 architecture with ASCII diagrams covering: the class hierarchy
  before/after, SampleFrame internals, BalanceFrame internal structure, object
  lifecycle state transitions, BalanceDF linked-samples expansion, the
  BalanceDFSource protocol, the BalanceDF class hierarchy, data flow, the
  _links graph, and the Sample.__new__ guard.
- Updated README.md with a "Developer and AI assistant resources" section
  linking to ARCHITECTURE.md and CLAUDE.md.
LLM/GenAI
- Updated the CLAUDE.md project context files for Claude Code users, covering
  architecture, build/test instructions (Meta and open-source), code
  conventions, and a pre-submit checklist.
- Updated the .github/copilot-instructions.md review checklist to add missing
  conventions (MIT license header, from __future__ import annotations, factory
  pattern, seed fixing, deprecation style).
Tests
- Added comprehensive test suites for the new classes:
  - test_balance_frame.py: ~120 tests covering construction, adjustment (all
    methods), covars/weights/outcomes integration, summary/diagnostics,
    analytics, df/export/filter, missing data, end-to-end equivalence with the
    Sample API, and conversion from/to Sample.
  - test_sample_frame.py: protocol conformance, weight/id/covar/outcome
    access, set_weights, BalanceDF construction, Sample-to-SampleFrame
    conversion.
  - test_sample.py: internal SampleFrame backing, _CallableBool, conversion
    methods, is_adjusted property/callable.
  - test_balancedf.py: BalanceDFSource protocol conformance, mock sources,
    regression tests for the existing Sample API.
  - test_sample_diagnostics_helper.py: verifies that _build_summary() and
    _build_diagnostics() produce output identical to the Sample methods.
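The is_adjusted behaviour that test_sample.py exercises (a property that can also be called like the old method) can be sketched in plain Python. CallableBool and ToySample below are hypothetical stand-ins, not balance's private _CallableBool, and the arithmetic assumes that multiplying by the object behaves like multiplying by the underlying bool.

```python
from typing import Any


class CallableBool:
    """Hypothetical stand-in for a bool that can also be called."""

    __slots__ = ("_value",)

    def __init__(self, value: bool) -> None:
        self._value = bool(value)

    def __bool__(self) -> bool:
        # Supports property-style use: `if sample.is_adjusted:`
        return self._value

    def __call__(self) -> bool:
        # Supports legacy method-style use: `sample.is_adjusted()`
        return self._value

    def __mul__(self, other: Any) -> Any:
        # Multiplies like the underlying bool (True acts as 1, False as 0).
        return self._value * other

    __rmul__ = __mul__

    def __repr__(self) -> str:
        return repr(self._value)


class ToySample:
    def __init__(self, adjusted: bool) -> None:
        self._adjusted = adjusted

    @property
    def is_adjusted(self) -> CallableBool:
        return CallableBool(self._adjusted)


s = ToySample(adjusted=True)
print(bool(s.is_adjusted))  # True (property style)
print(s.is_adjusted())      # True (legacy call style)
print(3 * s.is_adjusted)    # 3 (int defers to __rmul__)
```

Returning a small wrapper instead of a plain bool is what lets one attribute serve both call styles without a deprecation break.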
Contributors
@talgalili, @sahil350, @neuralsorcerer