github vega/altair v5.1.0
Version 5.1.0

10 months ago

What's Changed

Enhancements

  1. The chart.transformed_data() method was added to extract transformed chart data

For example, given an Altair chart that includes aggregations:

import altair as alt
from vega_datasets import data

cars = data.cars.url
chart = alt.Chart(cars).mark_bar().encode(
    y='Cylinders:O',
    x='mean_acc:Q'
).transform_aggregate(
    mean_acc='mean(Acceleration)',
    groupby=["Cylinders"]
)
chart

It is now possible to call the chart.transformed_data() method to extract a pandas DataFrame containing the transformed data.

chart.transformed_data()

This method is dependent on VegaFusion with the embed extras enabled.
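For intuition, the aggregate transform in the chart above groups rows by Cylinders and averages Acceleration within each group; transformed_data() returns that result as a DataFrame. The plain-Python sketch below mimics the computation on a few made-up rows. It does not use Altair or VegaFusion, the sample values are invented, and the helper name mean_by_group is hypothetical.

```python
from collections import defaultdict

# Hypothetical sample rows; the values are invented for illustration
# and are not taken from the cars dataset.
rows = [
    {"Cylinders": 4, "Acceleration": 16.0},
    {"Cylinders": 4, "Acceleration": 18.0},
    {"Cylinders": 8, "Acceleration": 12.0},
    {"Cylinders": 8, "Acceleration": 11.0},
    {"Cylinders": 6, "Acceleration": 15.5},
]


def mean_by_group(rows, groupby, field):
    """Return {group value: mean of field}, i.e. mean(field) grouped by groupby."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[groupby]].append(row[field])
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}


mean_acc = mean_by_group(rows, "Cylinders", "Acceleration")
print(mean_acc)  # {4: 17.0, 8: 11.5, 6: 15.5}
```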


  2. Introduction of a new data transformer named vegafusion

VegaFusion is an external project that provides efficient Rust implementations of most of Altair's data transformations. With VegaFusion as the data transformer, Altair can avoid the MaxRowsError by performing data-intensive aggregations in Python and pruning unused columns from the source dataset.

The data transformer can be enabled as follows:

import altair as alt
alt.data_transformers.enable("vegafusion")  # default is "default"
# Output: DataTransformerRegistry.enable('vegafusion')

One can now visualize a very large DataFrame as a histogram where the binning is done within VegaFusion:

import pandas as pd
import altair as alt

# prepare dataframe with 1 million rows
flights = pd.read_parquet(
    "https://vegafusion-datasets.s3.amazonaws.com/vega/flights_1m.parquet"
)

delay_hist = alt.Chart(flights).mark_bar(tooltip=True).encode(
    alt.X("delay", bin=alt.Bin(maxbins=30)),
    alt.Y("count()")
)
delay_hist
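To see why evaluating the bin-and-count step before rendering helps with large inputs, consider the rough plain-Python sketch below. It is only an illustration of the idea, not VegaFusion's implementation: the function name binned_counts, the fixed bin edges, and the synthetic delay values are all assumptions made for this example.

```python
def binned_counts(values, start, stop, step):
    """Count values into fixed-width bins [start, start + step), ..., up to stop."""
    n_bins = (stop - start) // step
    counts = [0] * n_bins
    for v in values:
        if start <= v < stop:
            counts[(v - start) // step] += 1
    return [
        {"bin_start": start + i * step, "bin_end": start + (i + 1) * step, "count": c}
        for i, c in enumerate(counts)
    ]


# Synthetic integer "delays" in the range [-60, 239]; made up for illustration.
delays = [(i * 37) % 300 - 60 for i in range(100_000)]

summary = binned_counts(delays, start=-60, stop=240, step=10)
print(len(delays), "rows reduced to", len(summary), "bins")
```

Only the small summary table would need to reach the renderer, which is why the row count of the source DataFrame stops being the limiting factor.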

When the vegafusion data transformer is active, data transformations are pre-evaluated when displaying or saving charts and when converting them to a dictionary or JSON.

See a detailed overview on the VegaFusion Data Transformer in the documentation.


  3. A JupyterChart class was added to support accessing params and selections from Python

The JupyterChart class makes it possible to update charts after they have been displayed and access the state of interactions from Python.

For example, given an Altair chart that includes an interval selection as a brush:

import altair as alt
from vega_datasets import data

source = data.cars()
brush = alt.selection_interval(name="interval", value={"x": [80, 160], "y": [15, 30]})

chart = alt.Chart(source).mark_point().encode(
    x='Horsepower:Q',
    y='Miles_per_Gallon:Q',
    color=alt.condition(brush, 'Cylinders:O', alt.value('grey')),
).add_params(brush)

jchart = alt.JupyterChart(chart)
jchart

It is now possible to read the defined interval selection from Python using the JupyterChart:

jchart.selections.interval.value
# Output: {'Horsepower': [80, 160], 'Miles_per_Gallon': [15, 30]}

The selection dictionary may be converted into a pandas query to filter the source DataFrame:

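# Build a pandas query string such as "80 <= `Horsepower` <= 160 and ..."
# (named query_expr to avoid shadowing the built-in filter)
query_expr = " and ".join([
    f"{v[0]} <= `{k}` <= {v[1]}"
    for k, v in jchart.selections.interval.value.items()
])
source.query(query_expr)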
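The string-building step can be checked in isolation, without a running widget. The sketch below hard-codes the selection dictionary shown earlier and wraps the join in a helper; the name selection_to_query is hypothetical and not part of Altair.

```python
# Selection state copied from the example output above.
selection = {"Horsepower": [80, 160], "Miles_per_Gallon": [15, 30]}


def selection_to_query(selection):
    """Build a pandas-style query string from {field: [low, high]} ranges."""
    return " and ".join(
        f"{low} <= `{field}` <= {high}"
        for field, (low, high) in selection.items()
    )


query_expr = selection_to_query(selection)
print(query_expr)
# 80 <= `Horsepower` <= 160 and 15 <= `Miles_per_Gallon` <= 30
```

The backticks let pandas handle column names that are not valid Python identifiers. Note that an empty dictionary produces an empty string, which DataFrame.query rejects, so a caller would likely want to skip the query in that case.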

Another possibility of the new JupyterChart class is to use ipywidgets to control parameters in Altair. Here we use an ipywidgets IntSlider to control the Altair parameter named cutoff.

import altair as alt
import pandas as pd
import numpy as np
from ipywidgets import IntSlider, link, VBox

rand = np.random.RandomState(42)

df = pd.DataFrame({
    'xval': range(100),
    'yval': rand.randn(100).cumsum()
})

cutoff = alt.param(name="cutoff", value=23)

chart = alt.Chart(df).mark_point().encode(
    x='xval',
    y='yval',
    color=alt.condition(
        alt.datum.xval < cutoff,
        alt.value('red'), alt.value('blue')
    )
).add_params(
    cutoff
)
jchart = alt.JupyterChart(chart)

slider = IntSlider(min=0, max=100, description='ipywidget')
link((slider, "value"), (jchart.params, "cutoff"))

VBox([slider, jchart])

The JupyterChart class is dependent on AnyWidget. See a detailed overview in the new documentation page on JupyterChart Interactivity.


  4. Support for field encoding inference for objects that support the DataFrame Interchange Protocol

We are maturing support for objects built upon the DataFrame Interchange Protocol in Altair.
Given the following pandas DataFrame with an ordered categorical column-type:

import altair as alt
from vega_datasets import data

# Clean Title column
movies = data.movies()
movies["Title"] = movies["Title"].astype(str)

# Convert MPAA rating to an ordered categorical
rating = movies["MPAA_Rating"].astype("category")
rating = rating.cat.reorder_categories(
    ['Open', 'G', 'PG', 'PG-13', 'R', 'NC-17', 'Not Rated']
).cat.as_ordered()
movies["MPAA_Rating"] = rating

# Build chart using pandas
chart = alt.Chart(movies).mark_bar().encode(
    alt.X("MPAA_Rating"),
    alt.Y("count()")
)
chart

We can convert the DataFrame to a PyArrow Table and observe that the types are inferred in the same way when rendering the chart.

import pyarrow as pa

# Build chart using PyArrow
chart = alt.Chart(pa.Table.from_pandas(movies)).mark_bar().encode(
    alt.X("MPAA_Rating"),
    alt.Y("count()")
)
chart

Vega-Altair support of the DataFrame Interchange Protocol is dependent on PyArrow.


  5. A new transform method transform_extent is available

The following example shows how this transform can be used:

import pandas as pd
import altair as alt

df = pd.DataFrame(
    [
        {"a": "A", "b": 28},
        {"a": "B", "b": 55},
        {"a": "C", "b": 43},
        {"a": "D", "b": 91},
        {"a": "E", "b": 81},
        {"a": "F", "b": 53},
        {"a": "G", "b": 19},
        {"a": "H", "b": 87},
        {"a": "I", "b": 52},
    ]
)

base = alt.Chart(df, title="A Simple Bar Chart with Lines at Extents").transform_extent(
    extent="b", param="b_extent"
)
bars = base.mark_bar().encode(x="b", y="a")
lower_extent_rule = base.mark_rule(stroke="firebrick").encode(
    x=alt.value(alt.expr("scale('x', b_extent[0])"))
)
upper_extent_rule = base.mark_rule(stroke="firebrick").encode(
    x=alt.value(alt.expr("scale('x', b_extent[1])"))
)
bars + lower_extent_rule + upper_extent_rule
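The extent transform computes the minimum and maximum of a field and exposes them as a two-element parameter, here b_extent. As a quick sanity check of where the two rules in the example should land, the same values can be computed in plain Python (this only mirrors the arithmetic and does not call Altair):

```python
# The "b" column from the DataFrame in the example above.
values_b = [28, 55, 43, 91, 81, 53, 19, 87, 52]

# [minimum, maximum], matching what b_extent[0] and b_extent[1] refer to.
b_extent = [min(values_b), max(values_b)]
print(b_extent)  # [19, 91]
```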



  6. It is now possible to add configurable pixels-per-inch (ppi) metadata to saved and displayed PNG images

import altair as alt
from vega_datasets import data

source = data.cars()

chart = alt.Chart(source).mark_boxplot(extent="min-max").encode(
    alt.X("Miles_per_Gallon:Q").scale(zero=False),
    alt.Y("Origin:N"),
)
chart.save("box.png", ppi=300)

alt.renderers.enable("png", ppi=144) # default ppi is 72
chart
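The relationship between ppi, pixel dimensions, and physical size is simple arithmetic, sketched below. The sketch assumes that the PNG's pixel dimensions are scaled by ppi / 72 so that the physical size stays constant; that scaling rule, the helper names, and the 360 x 216 example size are assumptions for illustration and are not stated in these release notes.

```python
def scaled_pixels(width_px, height_px, ppi, base_ppi=72):
    """Pixel size after scaling by ppi / base_ppi (assumed behaviour)."""
    factor = ppi / base_ppi
    return (round(width_px * factor), round(height_px * factor))


def physical_size_inches(width_px, height_px, ppi):
    """Physical size implied by pixel dimensions and ppi metadata."""
    return (width_px / ppi, height_px / ppi)


# A hypothetical chart that is 360 x 216 pixels at the default 72 ppi.
print(scaled_pixels(360, 216, ppi=144))         # (720, 432)
print(physical_size_inches(720, 432, ppi=144))  # (5.0, 3.0)
print(physical_size_inches(360, 216, ppi=72))   # (5.0, 3.0)
```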


Bug Fixes

  • Don't call len on DataFrame Interchange Protocol objects (#3111)

Maintenance

  • Add support for new referencing logic in version 4.18 of the jsonschema package

Backward-Incompatible Changes

  • Drop support for Python 3.7 which is end-of-life (#3100)
  • Hard dependencies: Increase minimum required pandas version to 0.25 (#3130)
  • Soft dependencies: Increase minimum required vl-convert-python version to 0.13.0 and increase minimum required vegafusion version to 1.4.0 (#3163, #3160)
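To check an environment against the minimums listed above before upgrading, something like the following standard-library sketch could be used. The names MINIMUM_VERSIONS, parse_version, and meets_minimum are hypothetical, and the parser only understands plain dotted release numbers such as 1.4.0; pre-release or local version strings are reported as unrecognized rather than compared.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Minimum versions copied from the list above.
MINIMUM_VERSIONS = {
    "pandas": "0.25",
    "vl-convert-python": "0.13.0",
    "vegafusion": "1.4.0",
}


def parse_version(text):
    """Parse a plain dotted release string like '1.4.0' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(installed, minimum):
    """Return True if installed >= minimum, padding the shorter one with zeros."""
    have, need = parse_version(installed), parse_version(minimum)
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


print("Python >= 3.8:", sys.version_info >= (3, 8))
for name, minimum in MINIMUM_VERSIONS.items():
    try:
        ok = meets_minimum(version(name), minimum)
        status = "ok" if ok else f"needs >= {minimum}"
    except PackageNotFoundError:
        status = "not installed"  # acceptable for the soft dependencies
    except ValueError:
        status = "unrecognized version string"
    print(f"{name}: {status}")
```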

Release Notes by Pull Request

  • Explicitly specify arguments for to_dict and to_json methods for top-level chart objects by @binste in #3073
  • Add Vega-Lite to Vega compiler registry and format arg to to_dict() and to_json() by @jonmmease in #3071
  • Sanitize timestamps in arrow tables by @jonmmease in #3076
  • Fix ridgeline example by @binste in #3082
  • Support extracting transformed chart data using VegaFusion by @jonmmease in #3081
  • Improve troubleshooting docs regarding Vega-Lite 5 by @binste in #3074
  • Make transformed_data public and add initial docs by @jonmmease in #3084
  • MAINT: Gitignore venv folders and use gitignore for black by @binste in #3087
  • Fixed Wheat and Wages case study by @thomend in #3086
  • Type hints: Parts of folders "vegalite", "v5", and "utils" by @binste in #2976
  • Fix CI by @jonmmease in #3095
  • Add VegaFusion data transformer with mime renderer, save, and to_dict/to_json integration by @jonmmease in #3094
  • Unpin vl-convert-python in dev/ci dependencies by @jonmmease in #3099
  • Drop support for Python 3.7 which is end-of-life by @binste in #3100
  • Add support to transformed_data for reconstructed charts (with from_dict/from_json) by @binste in #3102
  • Add VegaFusion data transformer documentation by @jonmmease in #3107
  • Don't call len on DataFrame interchange protocol object by @jonmmease in #3111
  • copied percentage calculation in example by @thomend in #3116
  • Distributions and medians of likert scale ratings by @thomend in #3120
  • Support for type inference for DataFrames using the DataFrame Interchange Protocol by @jonmmease in #3114
  • Add some 5.1.0 release note entries by @jonmmease in #3123
  • Add a code of conduct by @joelostblom in #3124
  • master -> main by @jonmmease in #3126
  • Handle pyarrow-backed columns in pandas 2 DataFrames by @jonmmease in #3128
  • Fix accidental requirement of Pandas 1.5. Bump minimum Pandas version to 0.25. Run tests with it by @binste in #3130
  • Add Roadmap and CoC to the documentation by @jonmmease in #3129
  • MAINT: Use importlib.metadata and packaging instead of deprecated pkg_resources by @binste in #3133
  • Add online JupyterChart widget based on AnyWidget by @jonmmease in #3119
  • feat(widget): prefer lodash-es/debounce to reduce import size by @manzt in #3135
  • Fix contributing descriptions by @thomend in #3121
  • Implement governance structure based on GitHub's MVG by @binste in #3139
  • Type hint schemapi.py by @binste in #3142
  • Add JupyterChart section to Users Guide by @jonmmease in #3137
  • Add governance page to the website by @jonmmease in #3144
  • MAINT: Remove altair viewer as a development dependency by @binste in #3147
  • Add support for new referencing resolution in jsonschema>=4.18 by @binste in #3118
  • Update Vega-Lite to 5.14.1. Add transform_extent by @binste in #3148
  • MAINT: Fix type hint errors which came up with new pandas-stubs release by @binste in #3154
  • JupyterChart: Add support for params defined in the extent transform by @jonmmease in #3151
  • doc: Add tooltip to Line example with custom order by @NickCrews in #3155
  • docs: examples: add line plot with custom order by @NickCrews in #3156
  • docs: line: Improve prose on custom ordering by @NickCrews in #3158
  • docs: examples: remove connected_scatterplot by @NickCrews in #3159
  • Refactor optional import logic and verify minimum versions by @jonmmease in #3160
  • Governance: Mark @binste as committee chair by @binste in #3165
  • Add ppi argument for saving and displaying charts as PNG images by @jonmmease in #3163
  • Silence AnyWidget warning (and support hot-reload) in development mode by @jonmmease in #3166
  • Update roadmap.rst by @mattijn in #3167
  • Add return type to transform_extent by @binste in #3169
  • Use import_vl_convert in _spec_to_mimebundle_with_engine for better error message by @jonmmease in #3168
  • update example world projections by @mattijn in #3170
  • Send initial selections to Python in JupyterChart by @jonmmease in #3172

New Contributors

Full Changelog: v5.0.1...v5.1.0
