Major features and improvements
- In a significant change, we have introduced KedroSession, which is responsible for managing the lifecycle of a Kedro run.
- Created a new Kedro Starter: kedro new --starter=mini-kedro. With it, you can use the DataCatalog as a standalone component in a Jupyter notebook and later transition into the rest of the Kedro framework.
DatasetSpecs with Hooks to run before and after datasets are loaded from/saved to the catalog.
- Added a command: kedro catalog create. For a registered pipeline, it creates a <conf_root>/<env>/catalog/<pipeline_name>.yml configuration file with MemoryDataSet datasets for each dataset that is missing from
.kedro.yml) for project configuration, in line with Python best practice.
ProjectContext is no longer needed, except for very complex customisations.
settings.py together implement sensible default behaviour. As a result, context_path is also now an optional key in
TemplatedConfigLoader now supports Jinja2 template syntax alongside its original syntax.
- Made registration Hooks mandatory, as the only way to customise the
DataCatalog used in a project. If no such Hook is provided in
KedroContextError is raised. There are sensible defaults defined in any project generated with Kedro >= 0.16.5.
Bug fixes and other changes
ParallelRunner no longer results in a run failure when triggered from a notebook, if the run is started using
before_node_run can now overwrite node inputs by returning a dictionary with the corresponding updates.
- Added minimal, black-compatible flake8 configuration to the project template.
- Extra parameters are no longer incorrectly passed from
pyspark requirements to allow for installation of
- Added a --fs-args option to the kedro pipeline pull command to specify configuration options for the fsspec filesystem arguments used when pulling modular pipelines from non-PyPI locations.
- Bumped maximum required fsspec version to 0.9.
- Bumped maximum supported s3fs version to 0.5 (the S3FileSystem interface has changed since version 0.4.1).
- In Kedro 0.17.0 we have deleted the deprecated
kedro.context modules in favour of
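The before_node_run change listed above can be illustrated with a minimal sketch. The class name InputOverrideHooks, its constructor arguments and the fake node used in the demonstration are hypothetical and exist only for this example; the Hook receives the node and its inputs and returns a dictionary holding only the inputs it wants to replace. The import fallback is there purely so the sketch also runs where Kedro is not installed.

```python
from types import SimpleNamespace
from typing import Any, Dict, Optional

try:
    from kedro.framework.hooks import hook_impl
except ImportError:
    # Fallback so the sketch can run without Kedro installed:
    # a no-op decorator standing in for Kedro's hook marker.
    def hook_impl(func):
        return func


class InputOverrideHooks:
    """Hypothetical Hook class that replaces selected inputs of one node."""

    def __init__(self, node_name: str, overrides: Dict[str, Any]):
        self._node_name = node_name
        self._overrides = overrides

    @hook_impl
    def before_node_run(self, node, inputs: Dict[str, Any]) -> Optional[Dict[str, Any]]:
        # Leave every other node untouched.
        if node.name != self._node_name:
            return None
        # Return only the updates that correspond to inputs the node actually has.
        return {key: value for key, value in self._overrides.items() if key in inputs}


# Demonstration with a stand-in object instead of a real Kedro node (assumption).
fake_node = SimpleNamespace(name="train_model")
hooks = InputOverrideHooks("train_model", {"params:learning_rate": 0.1})
updates = hooks.before_node_run(
    node=fake_node, inputs={"params:learning_rate": 0.01, "features": [1, 2, 3]}
)
```

In a real project such a class would be registered alongside the project's other Hooks; the names of the inputs here are illustrative only.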
Other breaking changes to the API
False when the dataset does not exist, as opposed to raising an exception.
- The pipeline-specific catalog.yml file is no longer automatically created for modular pipelines when running kedro pipeline create. Use kedro catalog create to replace this functionality.
kedro new. To generate boilerplate example code, you should use a Kedro starter.
- Changed the --verbose flag from a global command flag to a project-specific command flag (e.g. kedro --verbose new becomes kedro new --verbose).
- Dropped support of the dataset_credentials key in credentials in
get_source_dir() was removed from
- Dropped support of
kedro new --starter now defaults to fetching the starter template matching the installed Kedro version.
cli.py and moved it inside the Python package (src/<package_name>/), for a better packaging and deployment experience.
.kedro.yml from the project template and replaced it with
KEDRO_CONFIGS constant (previously residing in
kedro pipeline create CLI command to add a boilerplate parameter config file in conf/<env>/pipelines/<pipeline_name>/parameters.yml. CLI commands kedro pipeline delete/pull were updated accordingly.
KedroContext constructor now takes package_name as first argument.
- Custom context class is set via
KedroContext.hooks attribute. Instead, hooks should be registered in
- Restricted names given to nodes to match the regex pattern
KedroContext._create_data_catalog(). They have been replaced by registration hooks, namely
register_catalog() (see also upcoming deprecations).
Upcoming deprecations for Kedro 0.18.0
kedro.framework.context.load_context will be removed in release 0.18.0.
kedro.framework.cli.get_project_context will be removed in release 0.18.0.
- We've added a DeprecationWarning to the decorator API for both
pipeline. These will be removed in release 0.18.0. Use Hooks to extend a node's behaviour instead.
- We've added a DeprecationWarning to the Transformers API when adding a transformer to the catalog. These will be removed in release 0.18.0. Use Hooks to customise the
Thanks for supporting contributions
Migration guide from Kedro 0.16.* to 0.17.*
Reminder: Our documentation on how to upgrade Kedro covers a few key things to remember when updating any Kedro version.
The Kedro 0.17.0 release contains some breaking changes. If you update Kedro to 0.17.0 and then try to work with projects created against earlier versions of Kedro, you may encounter some issues when trying to run kedro commands in the terminal for that project. Here's a short guide to getting your projects running against the new version of Kedro.
Note: As always, if you hit any problems, please check out our documentation:
To get an existing Kedro project to work after you upgrade to Kedro 0.17.0, we recommend that you create a new project against Kedro 0.17.0 and move the code from your existing project into it. Let's go through the changes, but first, note that if you create a new Kedro project with Kedro 0.17.0 you will not be asked whether you want to include the boilerplate code for the Iris dataset example. We've removed this option (you should now use a Kedro starter if you want to create a project that is pre-populated with code).
To create a new, blank Kedro 0.17.0 project to drop your existing code into, run kedro new as usual. We also recommend creating a new virtual environment for your new project, or you might run into conflicts with existing dependencies.
pyproject.toml: Copy the following three keys from the .kedro.yml of your existing Kedro project into the pyproject.toml file of your new Kedro 0.17.0 project:

    [tool.kedro]
    package_name = "<package_name>"
    project_name = "<project_name>"
    project_version = "0.17.0"
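As a rough illustration of this step, the sketch below builds the Kedro section of pyproject.toml from the text of an old .kedro.yml. It is not part of Kedro: the function names are hypothetical, the parser is deliberately naive (it only understands flat `key: value` lines, which is an assumption about how the old file is laid out), and the table is written as `[tool.kedro]`, following the standard `tool` table convention for pyproject.toml.

```python
from typing import Dict

# Keys carried over from the old file; project_version is set to the new release.
COPIED_KEYS = ("package_name", "project_name")


def read_flat_keys(kedro_yml_text: str) -> Dict[str, str]:
    """Naively parse flat `key: value` lines (assumption: no nesting, no lists)."""
    values: Dict[str, str] = {}
    for raw_line in kedro_yml_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        key, value = line.split(":", 1)
        values[key.strip()] = value.strip().strip("'\"")
    return values


def render_kedro_section(kedro_yml_text: str, project_version: str = "0.17.0") -> str:
    """Return the text of the Kedro table for the new pyproject.toml."""
    values = read_flat_keys(kedro_yml_text)
    lines = ["[tool.kedro]"]
    for key in COPIED_KEYS:
        lines.append(f'{key} = "{values[key]}"')
    lines.append(f'project_version = "{project_version}"')
    return "\n".join(lines) + "\n"


# Example input modelled on an old-style .kedro.yml (contents are illustrative).
old_config = (
    "context_path: my_pkg.run.ProjectContext\n"
    "project_name: My Project\n"
    "project_version: 0.16.6\n"
    "package_name: my_pkg\n"
)
section = render_kedro_section(old_config)
```

For anything beyond three flat keys, a proper YAML parser would be the safer choice; the sketch only shows which values move and where they end up.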
Check your source directory. If you defined a different source directory (source_dir), make sure you also move that to
Copy files from your existing project:
- Copy the subfolders of project/src/project_name/pipelines from the existing project to the new one
- Copy the subfolders of project/src/test/pipelines from the existing project to the new one
- Copy the requirements your project needs into
- Copy your project configuration from the conf folder. Take note of the new locations needed for modular pipeline configuration (move it from
conf/<env>/catalog/pipeline_name.yml and likewise for
- Copy from the data/ folder of your existing project, if needed, into the same location in your new project.
- Copy any Hooks from
- Copy subfolders of
Update your new project's README and docs as necessary.
settings.py: For example, if you specified additional Hook implementations in
hooks, or listed plugins under
.kedro.yml, you will need to move them to
    from <package_name>.hooks import MyCustomHooks, ProjectHooks

    HOOKS = (ProjectHooks(), MyCustomHooks())
    DISABLE_HOOKS_FOR_PLUGINS = ("my_plugin1",)
node names. From 0.17.0, the only characters allowed in node names are letters, digits, hyphens, underscores and full stops. If you have previously defined node names that contain spaces or other characters that are no longer permitted, you will need to rename those nodes.
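To find node names that need renaming, a small check along these lines may help. It is a sketch built from the description above, not Kedro's own validation: the pattern, the function names and the restriction to ASCII letters are all assumptions made for this example.

```python
import re
from typing import Iterable, List

# Assumed pattern: ASCII letters, digits, hyphens, underscores and full stops only.
ALLOWED_NODE_NAME = re.compile(r"[A-Za-z0-9_.-]+")


def invalid_node_names(names: Iterable[str]) -> List[str]:
    """Return the names that contain characters outside the allowed set."""
    return [name for name in names if not ALLOWED_NODE_NAME.fullmatch(name)]


def suggest_name(name: str) -> str:
    """Replace each run of disallowed characters with a single underscore."""
    return re.sub(r"[^A-Za-z0-9_.-]+", "_", name).strip("_")


# Illustrative node names; only the second and fourth need renaming.
candidates = ["split_data", "train model", "report.v2", "eval:accuracy"]
to_rename = {name: suggest_name(name) for name in invalid_node_names(candidates)}
```

The suggested replacements are only a starting point; check that the new names are still unique within each pipeline before applying them.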
Copy changes to kedro_cli.py. If you previously customised the kedro run command or added more CLI commands to your kedro_cli.py, you should move them into <project_root>/src/<package_name>/cli.py. Note, however, that the new way to run a Kedro pipeline is via a KedroSession, rather than using the
    from kedro.framework.session import KedroSession

    with KedroSession.create(package_name=...) as session:
        session.run()
Copy changes made to ConfigLoader. If you have defined a custom class, such as TemplatedConfigLoader, by overriding ProjectContext._create_config_loader, you should move the contents of the function in
Copy changes made to DataCatalog. Likewise, if you have ProjectContext._create_catalog, you should copy-paste the contents into
Optional: If you have plugins such as Kedro-Viz installed, it's likely that Kedro 0.17.0 won't work with their older versions, so please either upgrade to the plugin's newest version or follow their migration guides.