New features
- New `filter_out()` companion to `filter()`.

  - Use `filter()` when specifying rows to keep.
  - Use `filter_out()` when specifying rows to drop.

  `filter_out()` simplifies cases where you would have previously used a `filter()` to drop rows. It is particularly useful when missing values are involved. For example, to drop rows where the `count` is zero:

      df |> filter(count != 0 | is.na(count))

      df |> filter_out(count == 0)

  With `filter()`, you must provide a "negative" condition of `!= 0` and must explicitly guard against accidentally dropping rows with `NA`. With `filter_out()`, you directly specify rows to drop and you don't have to guard against dropping rows with `NA`, which tends to result in much clearer code.

  This work is a result of Tidyup 8: Expanding the `filter()` family, with a lot of great feedback from the community (#6560, #6891).
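To make the `NA` pitfall concrete, here is a minimal sketch on a made-up data frame (`df` and its values are assumptions for illustration only). The `filter_out()` call is guarded so the snippet still runs on a dplyr release that does not export it.

```r
library(dplyr)

# Hypothetical data: one zero count and one missing count
df <- tibble(id = 1:4, count = c(0L, 3L, NA, 5L))

# Naive negative condition: `NA != 0` is NA, and filter() drops rows where
# the condition is NA, so the row with the missing count disappears too
naive <- df |> filter(count != 0)

# Guarded version from the text: explicitly keep rows with a missing count
guarded <- df |> filter(count != 0 | is.na(count))

naive$id    # 2 4
guarded$id  # 2 3 4

# As described in the text, filter_out() states the rows to drop directly.
# Only run it where the installed dplyr actually exports it.
if ("filter_out" %in% getNamespaceExports("dplyr")) {
  dropped <- df |> filter_out(count == 0)
  print(dropped)
}
```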
- New `when_any()` and `when_all()`, which are elementwise versions of `any()` and `all()`. Alternatively, you can think of them as performing repeated `|` and `&` on any number of inputs, for example:

  - `when_any(x, y, z)` is equivalent to `x | y | z`.
  - `when_all(x, y, z)` is equivalent to `x & y & z`.

  `when_any()` is particularly useful within `filter()` and `filter_out()` to specify comma separated conditions combined with `|` rather than `&`, like:

      # With `|`
      countries |>
        filter(
          (name %in% c("US", "CA") & between(score, 200, 300)) |
            (name %in% c("PR", "RU") & between(score, 100, 200))
        )

      # With `when_any()`, you drop the explicit `|`, the extra `()`, and your
      # conditions are all indented to the same level
      countries |>
        filter(when_any(
          name %in% c("US", "CA") & between(score, 200, 300),
          name %in% c("PR", "RU") & between(score, 100, 200)
        ))

      # To drop these rows instead, use `filter_out()`
      countries |>
        filter_out(when_any(
          name %in% c("US", "CA") & between(score, 200, 300),
          name %in% c("PR", "RU") & between(score, 100, 200)
        ))

  This work is a result of Tidyup 8: Expanding the `filter()` family.
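The difference between the collapsing `any()` / `all()` and their elementwise counterparts can be sketched in base R, using the `|` and `&` equivalences stated above. The vectors `x`, `y`, and `z` are made up for illustration, and the `when_any()` / `when_all()` calls only run if the installed dplyr exports them.

```r
x <- c(TRUE, FALSE, FALSE, TRUE)
y <- c(FALSE, FALSE, TRUE, TRUE)
z <- c(FALSE, FALSE, FALSE, TRUE)

# any() and all() collapse all inputs into a single value
any(x, y, z)  # TRUE
all(x, y, z)  # FALSE

# The elementwise forms return one result per position
any_elementwise <- x | y | z  # TRUE FALSE TRUE TRUE
all_elementwise <- x & y & z  # FALSE FALSE FALSE TRUE

# Per the text, these are what when_any() and when_all() compute
if (requireNamespace("dplyr", quietly = TRUE) &&
    "when_any" %in% getNamespaceExports("dplyr")) {
  print(dplyr::when_any(x, y, z))
  print(dplyr::when_all(x, y, z))
}
```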
- `case_when()` is now part of a family of 4 related functions, 3 of which are new:

  - Use `case_when()` to create a new vector based on logical conditions.
  - Use `replace_when()` to update an existing vector based on logical conditions.
  - Use `recode_values()` to create a new vector by mapping all old values to new values.
  - Use `replace_values()` to update an existing vector by mapping some old values to new values.

  Learn all about these in a new vignette, `vignette("recoding-replacing")`.

  `replace_when()` is particularly useful for conditionally mutating rows within one or more columns, and can be thought of as an enhanced version of `base::replace()`.

  `recode_values()` and `replace_values()` have the familiar `case_when()`-style formula interface for easy interactive use, but also have `from` and `to` arguments as a way for you to incorporate a pre-built lookup table, making them more holistic replacements for both `case_match()` and `recode()`.

  This work is a result of Tidyup 7: Recoding and replacing values in the tidyverse, with a lot of great feedback from the community (#7728, #7729).
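The distinction between "create a new vector by mapping all old values" and "update an existing vector by mapping some old values" can be sketched with a lookup table in base R. This is only an analogue built on `match()`, not how dplyr implements these functions. The `from` / `to` names mirror the arguments mentioned above, and using `NA` for unmatched inputs in the recoding case is an assumption of this sketch.

```r
x <- c("a", "b", "c", "a")

# A pre-built lookup table: old values and their replacements
from <- c("a", "b")
to <- c("apple", "banana")

# Position of each element of x in the lookup table (NA when absent)
idx <- match(x, from)

# "Recode" style: build a brand new vector; unmatched inputs become NA here
recoded <- to[idx]

# "Replace" style: start from x and overwrite only the matched positions
replaced <- x
hit <- !is.na(idx)
replaced[hit] <- to[idx[hit]]

recoded   # "apple" "banana" NA "apple"
replaced  # "apple" "banana" "c" "apple"
```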
- `case_when()` has gained a new `.unmatched` argument. For extra safety, set `.unmatched = "error"` rather than providing a `.default` when you believe that you've handled every possible case, and it will error if a case is left unhandled. The new `recode_values()` also has this argument (#7653).
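A short sketch of the problem `.unmatched = "error"` is meant to catch: without a `.default`, an unhandled case silently becomes `NA`. The vector `x` and the labels are made up, and the stricter call is guarded so that it only runs when the installed `case_when()` has the argument.

```r
library(dplyr)

x <- c(1, 5, 12)

# The value 12 matches neither condition, so it silently becomes NA
size <- case_when(
  x < 3 ~ "small",
  x < 10 ~ "medium"
)
size  # "small" "medium" NA

# With `.unmatched = "error"`, the text says the unhandled case errors instead.
# The error is caught here so the script keeps running.
strict <- NULL
if (".unmatched" %in% names(formals(case_when))) {
  strict <- tryCatch(
    case_when(
      x < 3 ~ "small",
      x < 10 ~ "medium",
      .unmatched = "error"
    ),
    error = function(e) conditionMessage(e)
  )
  print(strict)
}
```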
- `if_else()`, `case_when()`, and `coalesce()` have gotten significantly faster and use much less memory due to a rewrite in C via vctrs (#7723, #7725, #7727).

- New `ptype` argument for `between()`, allowing users to specify the desired output type. This is particularly useful for ordered factors and other complex types where the default common type behavior might not be ideal (#6906, @JamesHWade).

- New `rbind()` method for `rowwise_df` to avoid creating corrupt rowwise data frames (r-lib/vctrs#1935).
Lifecycle changes
Newly stable
- `.by` has moved from experimental to stable (#7762).

- `reframe()` has moved from experimental to stable (#7713, @VisruthSK).
Newly breaking
- `if_else()` no longer allows `condition` to be a logical array. It must be a logical vector with no `dim` attribute (#7723).
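If existing code passes a logical matrix as `condition`, one way to adapt is to drop the `dim` attribute first. A minimal sketch, where the matrix `m` is made up for illustration:

```r
library(dplyr)

# A logical array: it carries a `dim` attribute
m <- matrix(c(TRUE, FALSE, TRUE, FALSE), nrow = 2)
is.null(dim(m))  # FALSE

# as.vector() drops `dim`, leaving a plain logical vector in column-major order
condition <- as.vector(m)
is.null(dim(condition))  # TRUE

result <- if_else(condition, "yes", "no")
result  # "yes" "no" "yes" "no"
```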
Newly deprecated
- `case_match()` is soft-deprecated, and is fully replaced by `recode_values()` and `replace_values()`, which are more flexible, more powerful, and have much better names.

- In `case_when()`, supplying all size 1 LHS inputs along with a size >1 RHS input is now soft-deprecated. This is an improper usage of `case_when()` that should instead be a series of if statements, like:

      # Scalars!
      code <- 1L
      flavor <- "vanilla"

      # Improper usage:
      case_when(
        code == 1L && flavor == "chocolate" ~ x,
        code == 1L && flavor == "vanilla" ~ y,
        code == 2L && flavor == "vanilla" ~ z,
        .default = default
      )

      # Recommended:
      if (code == 1L && flavor == "chocolate") {
        x
      } else if (code == 1L && flavor == "vanilla") {
        y
      } else if (code == 2L && flavor == "vanilla") {
        z
      } else {
        default
      }

  The recycling behavior that allows this style of `case_when()` to work is unsafe, and can result in silent bugs that we'd like to guard against with an error in the future (#7082).
- The `dplyr.legacy_locale` global option is soft-deprecated. If you used this to affect the ordering of `arrange()`, use `arrange(.locale =)` instead. If you used this to affect the ordering of `group_by() |> summarise()`, follow up with an additional call to `arrange(.locale =)` instead (#7760).
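A sketch of both migrations, on made-up data. It assumes the default `"C"` locale ordering of `arrange()` and of grouped summaries. The `"en"` locale is just an example choice, and those calls are guarded because non-`"C"` locales need the stringi package to be installed.

```r
library(dplyr)

df <- tibble(name = c("apple", "Banana", "cherry", "apple"), n = 1:4)

# Default C locale: uppercase letters sort before lowercase ones
c_order <- df |> arrange(name)
c_order$name  # "Banana" "apple" "apple" "cherry"

# Grouped summaries come back with the group keys in C-locale order
totals <- df |>
  group_by(name) |>
  summarise(n = sum(n))
totals$name  # "Banana" "apple" "cherry"

if (requireNamespace("stringi", quietly = TRUE)) {
  # Instead of the global option: request the locale per call
  en_order <- df |> arrange(name, .locale = "en")

  # For group_by() |> summarise(): re-sort the result afterwards
  totals_en <- totals |> arrange(name, .locale = "en")

  print(en_order)
  print(totals_en)
}
```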
- Passing `size` to `if_else()` is now deprecated. The output size is always taken from the `condition` (#7722).
Other deprecation advancements
- The following were already deprecated, and are now defunct and throw an error:

  - All underscored standard evaluation versions of major dplyr verbs. Deprecated in 0.7.0 (Jun 2017), use the non-underscored version of the verb with unquoting instead, see `vignette("programming")`. This includes:

    - `add_count_()`
    - `add_tally_()`
    - `arrange_()`
    - `count_()`
    - `distinct_()`
    - `do_()`
    - `filter_()`
    - `funs_()`
    - `group_by_()`
    - `group_indices_()`
    - `mutate_()`
    - `tally_()`
    - `transmute_()`
    - `rename_()`
    - `select_()`
    - `slice_()`
    - `summarise_()`
    - `summarize_()`

  - `mutate_each()`, `mutate_each_()`, `summarise_each()`, and `summarise_each_()`. Deprecated in 0.7.0 (Jun 2017), use `across()` instead.

  - Returning more or less than 1 row per group in `summarise()`. Deprecated in 1.1.0 (Jan 2023), use `reframe()` instead.

  - `combine()`. Deprecated in 1.0.0 (May 2020), use `c()` or `vctrs::vec_c()` instead.

  - `src_mysql()`, `src_postgres()`, `src_sqlite()`, `src_local()`, and `src_df()`. Deprecated in 1.0.0 (May 2020), use `tbl()` instead.

  - `tbl_df()` and `as.tbl()`. Deprecated in 1.0.0 (May 2020), use `tibble::as_tibble()` instead.

  - `add_rownames()`. Deprecated in 1.0.0 (May 2020), use `tibble::rownames_to_column()` instead.

  - The `.drop` argument of `add_count()`. Deprecated in 1.0.0 (May 2020), had no effect.

  - The `add` argument of `group_by()` and `group_by_prepare()`. Deprecated in 1.0.0 (May 2020), use `.add` instead.

  - The `.dots` argument of `group_by()` and `group_by_prepare()`. Deprecated in 1.0.0 (May 2020).

  - The `...` argument of `group_keys()` and `group_indices()`. Deprecated in 1.0.0 (May 2020), use `group_by()` first.

  - The `keep` argument of `group_map()`, `group_modify()`, and `group_split()`. Deprecated in 1.0.0 (May 2020), use `.keep` instead.

  - Using `across()` and data frames in `filter()`. Deprecated in 1.0.8 (Feb 2022), use `if_any()` or `if_all()` instead.

  - `multiple = NULL` in joins. Deprecated in 1.1.1 (Mar 2023), use `multiple = "all"` instead.

  - `multiple = "error" / "warning"` in joins. Deprecated in 1.1.1 (Mar 2023), use `relationship = "many-to-one"` instead.

  - The `vars` argument of `group_cols()`. Deprecated in 1.0.0 (Jan 2023).
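For code that still relies on some of the now-defunct features, the replacements named in the list above look roughly like this. The data frame and column names are made up for illustration, and `rlang::sym()` is just one of several ways to unquote a column name held in a string.

```r
library(dplyr)

df <- tibble(g = c("a", "a", "b"), x = c(1, 2, 3), y = c(10, 20, 30))

# Instead of mutate_(): the regular verb with unquoting.
# `col` holds a column name as a string; !! injects it as a symbol.
col <- "x"
doubled <- df |> mutate(doubled = (!!rlang::sym(col)) * 2)

# Instead of summarise_each(): across() inside summarise()
means <- df |> summarise(across(c(x, y), mean), .by = g)

# Instead of returning several rows per group from summarise(): reframe()
quartiles <- df |> reframe(q = quantile(x, c(0.25, 0.75)), .by = g)

doubled$doubled  # 2 4 6
means            # one row per group
quartiles        # two rows per group
```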
- The following were already deprecated, and now warn unconditionally if used:

  - `all_equal()`. Deprecated in 1.1.0 (Jan 2023), use `all.equal()` instead.

  - `progress_estimated()`. Deprecated in 1.0.0 (May 2020).

  - `filter()` with a 1 column matrix. Deprecated in 1.1.0 (Jan 2023), use a vector instead.

  - `slice()` with a 1 column matrix. Deprecated in 1.1.0 (Jan 2023), use a vector instead.

  - Not supplying the `.cols` argument of `across()`. Deprecated in 1.1.0 (Jan 2023).

  - `group_indices()` with no arguments. Deprecated in 1.0.0 (May 2020), use `cur_group_id()` instead.
- The following were already soft-deprecated, and now warn once per session if used:

  - `cur_data()` and `cur_data_all()`. Deprecated in 1.1.0 (Jan 2023), use `pick()` instead.

  - The `...` argument of `across()`. Deprecated in 1.1.0 (Jan 2023), use an anonymous function instead.

  - Using `by = character()` to perform a cross join. Deprecated in 1.1.0 (Jan 2023), use `cross_join()` instead.
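The three replacements in this list can be sketched as follows. All data frames here are made up for illustration.

```r
library(dplyr)

df <- tibble(g = c("a", "a", "b"), x = c(1, NA, 3), y = c(4, 5, 6))

# pick() instead of cur_data(): name the columns you need explicitly
with_missing <- df |>
  mutate(n_missing = rowSums(is.na(pick(x, y))))

# An anonymous function instead of across(..., na.rm = TRUE)
means <- df |>
  summarise(across(c(x, y), \(col) mean(col, na.rm = TRUE)), .by = g)

# cross_join() instead of a join with `by = character()`
sizes <- tibble(size = c("S", "M"))
colours <- tibble(colour = c("red", "blue", "green"))
combos <- cross_join(sizes, colours)

with_missing$n_missing  # 0 1 0
means                   # per-group means with missing values removed
nrow(combos)            # 6
```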
Removed
The following were already defunct, and have been removed:
- `id()`. Deprecated in 0.5.0 (Jun 2016), use `vctrs::vec_group_id()` instead. If your package uses NSE and implicitly relied on the variable `id` being available, you now need to put `utils::globalVariables("id")` inside one of your package files to tell R that `id` is a column name.

- `failwith()`. Deprecated in 0.7.0 (Jun 2017), use `purrr::possibly()` instead.

- `select_vars()` and `select_vars_()`. Deprecated in 0.8.4 (Jan 2020), use `tidyselect::vars_select()` instead.

- `rename_vars()` and `rename_vars_()`. Deprecated in 0.8.4 (Jan 2020), use `tidyselect::vars_rename()` instead.

- `select_var()`. Deprecated in 0.8.4 (Jan 2020), use `tidyselect::vars_pull()` instead.

- `current_vars()`. Deprecated in 0.8.4 (Jan 2020), use `tidyselect::peek_vars()` instead.

- `bench_tbls()`, `compare_tbls()`, `compare_tbls2()`, `eval_tbls()`, and `eval_tbls2()`. Deprecated in 1.0.0 (May 2020).

- `location()` and `changes()`. Deprecated in 1.0.0 (May 2020), use `lobstr::ref()` instead.
Minor improvements and bug fixes
- The base pipe is now used throughout the documentation (#7711).

- The superseded `recode()` now has updated documentation showing how to migrate to `recode_values()` and `replace_values()`.

- The `.groups` message emitted by `summarise()` is hopefully more clear now (#6986).

- `storms` has been updated to include 2023 and 2024 data (#7111, @tomalrussell).
- `if_any()` and `if_all()` are now more consistent in all use cases (#7059, #7077, #7746, @jrwinget). In particular:

  - When called with zero inputs, `if_any()` returns `FALSE` and `if_all()` returns `TRUE`.

  - When called with one input, both now return logical vectors rather than the original column.

  - The result of applying `.fns` now must be a logical vector.
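The zero-input results match the convention base R uses for `any()` and `all()`. The sketch below shows that convention with two hypothetical helpers, `combine_any()` and `combine_all()`, which are not dplyr functions and are not a description of how `if_any()` / `if_all()` are implemented.

```r
# Base R convention for empty input
any(logical())  # FALSE
all(logical())  # TRUE

# Hypothetical helpers: fold a list of logical vectors of length `n`,
# starting from the identity element of the operator
combine_any <- function(conditions, n) Reduce(`|`, conditions, rep(FALSE, n))
combine_all <- function(conditions, n) Reduce(`&`, conditions, rep(TRUE, n))

# With no conditions at all, only the identity element is left
combine_any(list(), 3)  # FALSE FALSE FALSE
combine_all(list(), 3)  # TRUE TRUE TRUE

# With one condition, the result is that condition as a logical vector
combine_any(list(c(TRUE, FALSE, NA)), 3)  # TRUE FALSE NA
```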
- `tally_n()` creates fully qualified function calls for duckplyr compatibility (#7046).

- Empty `rowwise()` list-column elements now resolve to `logical()` rather than a random logical of length 1 (#7710).

- `last_dplyr_warnings()` no longer prevents objects from being garbage collected (#7649).

- `case_when()` now throws correctly indexed errors when `NULL`s are supplied in `...` (#7739).

- `case_when()` now throws a better error if one of the conditions is an array (#6862, @ilovemane).

- `bind_rows()` now replaces empty (or `NA`) element names in a list with its numeric index while preserving existing names (#7719, @Meghansaha).
- New `slice_sample()` example showing how to use it to shuffle rows (#7707, @Hzanib).
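One way to shuffle rows with `slice_sample()` is to sample every row without replacement. This is a sketch on made-up data and may differ from the example added to the package documentation.

```r
library(dplyr)

df <- tibble(id = 1:5, value = letters[1:5])

# Sampling 100% of the rows without replacement returns every row once,
# in random order. The seed only makes the shuffle reproducible.
set.seed(123)
shuffled <- df |> slice_sample(prop = 1)

nrow(shuffled)     # 5
sort(shuffled$id)  # 1 2 3 4 5
```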
- Updated `across()` examples to include an example using `everything()` (#7621, @JBrandenburg02).

- Clarified how `slice_min()` and `slice_max()` work in the introduction vignette (#7717, @ccani007).

- Fixed an edge case when coercing data frames to matrices (#7004).

- Fixed an issue where duckplyr's ALTREP data frames were being materialized early due to internal usage of `ncol()` (#7049).

- Progress towards making dplyr conformant with the public C API of R (#7741, #7797).

- R >= 4.1.0 is now required, in line with the tidyverse standard of supporting the previous 5 minor releases of R (#7711).