Breaking changes for package developers
- The major change in this version is that dplyr now depends on the selecting
  backend of the tidyselect package. If you have been linking to the
  dplyr::select_helpers documentation topic, you should update the link to
  point to tidyselect::select_helpers.
- Another change that causes warnings in packages is that dplyr now exports the
  exprs() function. This causes a collision with Biobase::exprs(). Either
  import functions from dplyr selectively rather than in bulk, or do not import
  Biobase::exprs() and refer to it with a namespace qualifier.
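
The two workarounds for the exprs() collision could look roughly like the sketch below. The helper names pick_engine_columns() and expression_matrix() are hypothetical, and the roxygen2 tag assumes the package generates its NAMESPACE with roxygen2.

```r
# Option 1: import only the dplyr functions the package needs,
# instead of a bulk `@import dplyr` (roxygen2 tag, assumed workflow).
#' @importFrom dplyr select filter
NULL

# Hypothetical helper: dplyr verbs can also be called fully qualified.
pick_engine_columns <- function(data) {
  dplyr::select(data, "cyl", "disp")
}

# Option 2: do not import Biobase::exprs(); qualify the call instead.
# Hypothetical helper; calling it requires Biobase to be installed.
expression_matrix <- function(eset) {
  Biobase::exprs(eset)
}
```

Either option keeps a single exprs() visible inside the package namespace, so the import collision warning should no longer be triggered.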
Bug fixes
- distinct(data, "string") now returns a one-row data frame again. (The
  previous behavior was to return the data unchanged.)
- do() operations with more than one named argument can access the . pronoun
  (#2998).
- Reindexing grouped data frames (e.g. after filter() or ..._join())
  never updates the "class" attribute. This also avoids unintended updates
  to the original object (#3438).
- Fixed rare column name clash in ..._join() with non-join
  columns of the same name in both tables (#3266).
- Fixed ntile() and row_number() ordering to use the locale-dependent
  ordering functions in R when dealing with character vectors, rather than
  always using the C-locale ordering function in C (#2792, @foo-bar-baz-qux).
- Summaries of summaries (such as summarise(b = sum(a), c = sum(b))) are
  now computed using standard evaluation for simplicity and correctness, at
  the cost of being slightly slower (#3233).
- Fixed summarise() for empty data frames with zero columns (#3071).
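
A small sketch of two of the fixes above, the constant-string case in distinct() and a summary that refers to an earlier summary. The data frame df is made up for illustration.

```r
library(dplyr)

# Illustrative data, not taken from the release notes.
df <- data.frame(a = c(1, 2, 3))

# A constant string has a single distinct value, so one row comes back.
one_row <- distinct(df, "string")
nrow(one_row)

# `c` is computed from the summary `b`, not from a column of `df`.
totals <- summarise(df, b = sum(a), c = sum(b))
totals
```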
Major changes
- enexpr(), expr(), exprs(), sym() and syms() are now
  exported. sym() and syms() construct symbols from strings or character
  vectors. The expr() variants are equivalent to quo(), quos() and
  enquo() but return simple expressions rather than quosures. They support
  quasiquotation.
- dplyr now depends on the new tidyselect package to power select(),
  rename(), pull() and their variants (#2896). Consequently
  select_vars(), select_var() and rename_vars() are
  soft-deprecated and will start issuing warnings in a future version.

  Following the switch to tidyselect, select() and rename() fully support
  character vectors. You can now unquote variables like this:

      vars <- c("disp", "cyl")
      select(mtcars, !! vars)
      select(mtcars, -(!! vars))

  Note that this only works in selecting functions because in other contexts
  strings and character vectors are ambiguous. For instance strings are a valid
  input in mutating operations and mutate(df, "foo") creates a new column by
  recycling "foo" to the number of rows.
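
As a rough illustration of the newly exported helpers, the sketch below builds symbols from strings, unquotes them into verbs, and contrasts that with a string passed to mutate(). The variable names (col, cols, by_cyl, and so on) are chosen for the example only.

```r
library(dplyr)

# sym() and syms() turn strings into symbols.
col  <- sym("cyl")
cols <- syms(c("cyl", "gear"))

# Unquote one symbol with !! and a list of symbols with !!!.
by_cyl  <- mtcars %>% group_by(!!col)   %>% summarise(mean_mpg = mean(mpg))
by_both <- mtcars %>% group_by(!!!cols) %>% summarise(n = n())

# expr() returns a plain expression (not a quosure) and supports unquoting.
e <- expr(mean(!!col))

# In a mutating context a string is data, not a column selection:
# "foo" is recycled into a new column.
with_foo <- mutate(head(mtcars, 3), "foo")
```

Because expr() does not capture an environment, the resulting expression is best suited to cases where the evaluation context is already known, such as code generation inside a single function.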
Minor changes
- Support for raw vector columns in arrange(), group_by(), mutate(),
  summarise() and ..._join() (minimal raw x raw support initially) (#1803).
- bind_cols() handles unnamed lists (#3402).
- bind_rows() works around corrupt columns that have the object bit set
  while having no class attribute (#3349).
- combine() returns logical() when all inputs are NULL (or when there
  are no inputs) (#3365, @zeehio).
- distinct() now supports renaming columns (#3234).
- Hybrid evaluation simplifies dplyr::foo() to foo() (#3309). Hybrid
  functions can now be masked by regular R functions to turn off hybrid
  evaluation (#3255). The hybrid evaluator finds functions from dplyr even if
  dplyr is not attached (#3456).
- In mutate() it is now illegal to use a data.frame in the rhs (#3298).
- Support !!! in recode_factor() (#3390).
- row_number() works on empty subsets (#3454).
- select() and vars() now treat NULL as empty inputs (#3023).
- Scoped select and rename functions (select_all(), rename_if() etc.)
  now work with grouped data frames, adapting the grouping as necessary
  (#2947, #3410). group_by_at() can group by an existing grouping variable
  (#3351). arrange_at() can use grouping variables (#3332).
- slice() no longer enforces tibble classes when the input is a simple
  data.frame, and ignores 0 (#3297, #3313).
- transmute() no longer prints a message when including a group variable.
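
A few of the minor changes above can be tried out as follows. This is only a sketch: level_key and the result names are invented for the example.

```r
library(dplyr)

# distinct() can rename the column it works on.
cylinders <- distinct(mtcars, cylinders = cyl)

# select() treats NULL as an empty selection.
nothing <- select(mtcars, NULL)

# recode_factor() accepts a named vector spliced with !!!.
level_key <- c(a = "apple", b = "banana")
fruit <- recode_factor(c("a", "b", "a"), !!!level_key)
```

Splicing a named vector keeps the recoding table in one place, which can be easier to maintain than spelling out each replacement as a separate argument.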
Documentation
- Improved documentation for funs() (#3094) and set operations (e.g. union())
  (#3238, @edublancas).
Error messages
- Better error message if dbplyr is not installed when accessing database
  backends (#3225).
- arrange() fails gracefully on data.frame columns (#3153).
- Corrected error message when calling cbind() with an object of wrong
  length (#3085).
- Add warning with explanation to distinct() if any of the selected columns
  are of type list (#3088, @foo-bar-baz-qux), or when used on unknown columns
  (#2867, @foo-bar-baz-qux).
- Show clear error message for bad arguments to funs() (#3368).
- Better error message in ..._join() when joining data frames with duplicate
  or NA column names. Joining such data frames with a semi- or anti-join
  now gives a warning, which may be converted to an error in future versions
  (#3243, #3417).
- Dedicated error message when trying to use columns of the Interval
  or Period classes (#2568).
- Added an .onDetach() hook that allows for plyr to be loaded and attached
  without the warning message that says functions in dplyr will be masked,
  since dplyr is no longer attached (#3359, @jwnorman).
Performance
- sample_n() and sample_frac() on grouped data frames are now faster,
  especially for those with a large number of groups (#3193, @saurfang).
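
For reference, grouped sampling draws rows within each group, which is the code path the speed-up above concerns. A minimal sketch using mtcars, with illustrative variable names:

```r
library(dplyr)

grouped <- group_by(mtcars, cyl)

# Two rows are drawn from each of the three cyl groups.
two_per_group <- sample_n(grouped, 2)

# Half of the rows are drawn from each group.
half_per_group <- sample_frac(grouped, 0.5)
```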
Internal
- Compute variable names for joins in R (#3430).
- Bumped Rcpp dependency to 0.12.15 to avoid imperfect detection of NA
  values in hybrid evaluation fixed in RcppCore/Rcpp#790 (#2919).
- Avoid cleaning the data mask, a temporary environment used to evaluate
  expressions. If the environment, in which e.g. a mutate() expression
  is evaluated, is preserved until after the operation, accessing variables
  from that environment now gives a warning but still returns NULL
  (#3318).