New functions
- between() vector function efficiently determines if numeric values fall
  in a range, and is translated to a special form for SQL (#503).
- count() makes it even easier to do (weighted) counts (#358).
- data_frame() by @kevinushey is a nicer way of creating data frames.
  It never coerces column types (no more stringsAsFactors = FALSE!),
  never munges column names, and never adds row names. You can use previously
  defined columns to compute new columns (#376).
- distinct() returns distinct (unique) rows of a tbl (#97). Supply
  additional variables to return the first row for each unique combination
  of variables.
- Set operations, intersect(), union() and setdiff() now have methods
  for data frames, data tables and SQL database tables (#93). They pass their
  arguments down to the base functions, which will ensure they raise errors if
  you pass in too many arguments.
- Joins (e.g. left_join(), inner_join(), semi_join(), anti_join())
  now allow you to join on different variables in x and y tables by
  supplying a named vector to by. For example, by = c("a" = "b") joins
  x.a to y.b.
- n_groups() function tells you how many groups are in a tbl. It returns
  1 for ungrouped data. (#477)
- transmute() works like mutate() but drops all variables that you didn't
  explicitly refer to (#302).
- rename() makes it easy to rename variables: it works similarly to
  select() but it preserves columns that you didn't otherwise touch.
- slice() allows you to select rows by position (#226). It includes
  positive integers, drops negative integers and you can use expressions like
  n().
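
The functions listed above can be tried on a small example. The following is a minimal sketch, assuming a reasonably recent dplyr installation; the data frames x, y, df1 and df2 and all of their column names are hypothetical and do not come from these notes. Details such as which columns distinct() keeps may differ between dplyr versions.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration only.
x <- data.frame(a = c(1, 2, 2, 3), g = c("u", "u", "v", "v"), w = c(10, 20, 30, 40))
y <- data.frame(b = c(2, 3, 4), label = c("two", "three", "four"))

# between(): is each value inside the closed range [2, 3]?
in_range <- between(x$a, 2, 3)

# count(): weighted count, summing w within each level of g
weighted <- count(x, g, wt = w)

# distinct(): one row per unique value of a
uniq <- distinct(x, a)

# Set operations on data frames: rows present in both inputs
df1 <- data.frame(v = 1:3)
df2 <- data.frame(v = 2:4)
common <- intersect(df1, df2)

# Join on differently named keys: x.a is matched to y.b
joined <- inner_join(x, y, by = c("a" = "b"))

# n_groups(): 1 for ungrouped data, otherwise the number of groups
ng_flat <- n_groups(x)
ng_grouped <- n_groups(group_by(x, g))

# transmute() keeps only the variables it creates;
# rename() keeps every column and changes only the name given
doubled <- transmute(x, a2 = a * 2)
renamed <- rename(x, key = a)

# slice(): select rows by position; n() refers to the last row
last_row <- slice(x, n())
```

The named vector passed to by reads as "left key = right key", so the joined result keeps the column name from x.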
Programming with dplyr (non-standard evaluation)
- You can now program with dplyr: every function that does non-standard
  evaluation (NSE) has a standard evaluation (SE) version ending in _.
  This is powered by the new lazyeval package which provides all the tools
  needed to implement NSE consistently and correctly.
- See vignette("nse") for full details.
- regroup() is deprecated. Please use the more flexible group_by_()
  instead.
- summarise_each_q() and mutate_each_q() are deprecated. Please use
  summarise_each_() and mutate_each_() instead.
- funs_q has been replaced with funs_.
Removed and deprecated features
- %.% has been deprecated: please use %>% instead. chain() is
  defunct. (#518)
- filter.numeric() removed. Need to figure out how to reimplement with
  new lazy eval system.
- The Progress refclass is no longer exported to avoid conflicts with shiny.
  Instead use progress_estimated() (#535).
- src_monetdb() is now implemented in MonetDB.R, not dplyr.
- show_sql() and explain_sql() and matching global options dplyr.show_sql
  and dplyr.explain_sql have been removed. Instead use show_query() and
  explain().
Minor improvements and bug fixes
- Main verbs now have individual documentation pages (#519).
- %>% is simply re-exported from magrittr, instead of creating a local copy
  (#496, thanks to @jimhester)
- Examples now use nycflights13 instead of hflights because the variables
  have better names and there are a few interlinked tables (#562). Lahman
  and nycflights13 are (once again) suggested packages. This means many
  examples will not work unless you explicitly install them with
  install.packages(c("Lahman", "nycflights13")) (#508).
- dplyr now depends on Lahman 3.0.1. A number of examples have been updated
  to reflect modified field names (#586).
- do() now displays the progress bar only when used in interactive prompts
  and not when knitting (#428, @jimhester).
- glimpse() now prints a trailing new line (#590).
- group_by() has more consistent behaviour when grouping by constants:
  it creates a new column with that value (#410). It renames grouping
  variables (#410). The first argument is now .data so you can create
  new groups with name x (#534).
- Now instead of overriding lag(), dplyr overrides lag.default(),
  which should avoid clobbering lag methods added by other packages.
  (#277).
- mutate(data, a = NULL) removes the variable a from the returned
  dataset (#462).
- trunc_mat() and hence print.tbl_df() and friends get a width argument
  to control the default output width. Set options(dplyr.width = Inf) to
  always show all columns (#589).
- select() gains one_of() selector: this allows you to select variables
  provided by a character vector (#396). It fails immediately if you give an
  empty pattern to starts_with(), ends_with(), contains() or matches()
  (#481, @leondutoit). Fixed buglet in select() so that you can now create
  variables called val (#564).
- Switched from RC to R6.
- tally() and top_n() work consistently: neither accidentally
  evaluates the wt param. (#426, @mnel)
- rename handles grouped data (#640).
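
Two of the changes above, dropping a variable with mutate() and selecting from a character vector with one_of(), are easiest to see side by side. This is a minimal sketch, assuming a reasonably recent dplyr installation; the data frame df and the vector wanted are hypothetical names chosen for illustration.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration only.
df <- data.frame(a = 1:3, b = 4:6, c = 7:9)

# Assigning NULL inside mutate() removes the variable from the result
dropped <- mutate(df, a = NULL)

# one_of() selects the variables named in a character vector
wanted <- c("a", "c")
picked <- select(df, one_of(wanted))
```

Keeping the column names in a character vector is useful when the set of variables is computed at run time rather than typed into the call.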
Minor improvements and bug fixes by backend
Databases
- The db backend system has been completely overhauled in order to make
  it possible to add backends in other packages, and to support a much
  wider range of databases. See vignette("new-sql-backend") for instructions
  on how to create your own (#568).
- src_mysql() gains a method for explain().
- When mutate() creates a new variable that uses a window function,
  automatically wrap the result in a subquery (#484).
- Correct SQL generation for first() and last() (#531).
- order_by() now works in conjunction with window functions in databases
  that support them.
Data frames/tbl_df
- All verbs now understand how to work with difftime() (#390) and
  AsIs (#453) objects. They all check that colnames are unique (#483), and
  are more robust when columns are not present (#348, #569, #600).
- Hybrid evaluation bugs fixed:
  - Call substitution stopped too early when a sub expression contained a
    $ (#502).
  - Handle :: and ::: (#412).
  - cumany() and cumall() properly handle NA (#408).
  - nth() now correctly preserves the class when using dates, times and
    factors (#509).
  - no longer substitutes within order_by() because order_by() needs to do
    its own NSE (#169).
- [.tbl_df always returns a tbl_df (i.e. drop = FALSE is the default)
  (#587, #610).
- [.grouped_df preserves important output attributes (#398).
- arrange() keeps the grouping structure of grouped data (#491, #605),
  and preserves input classes (#563).
- contains() accidentally matched regular expressions, now it passes
  fixed = TRUE to grep() (#608).
- filter() asserts all variables are white listed (#566).
- mutate() makes a rowwise_df when given a rowwise_df (#463).
- rbind_all() creates tbl_df objects instead of raw data.frames.
- If select() doesn't match any variables, it returns a 0-column data frame,
  instead of the original (#498). It no longer fails when some columns
  are not named (#492)
- sample_n() and sample_frac() methods for data.frames exported.
  (#405, @alyst)
- A grouped data frame may have 0 groups (#486). Grouped df objects
  gain some basic validity checking, which should prevent some crashes
  related to corrupt grouped_df objects made by rbind() (#606).
- More coherence when joining columns of compatible but different types,
  e.g. when joining a character vector and a factor (#455),
  or a numeric and integer (#450)
- mutate() works on zero-row grouped data frames, and
  with list columns (#555).
- LazySubset was confused about input data size (#452).
- Internal n_distinct() is stricter about its inputs: it requires one symbol
  which must be from the data frame (#567).
- rbind_*() handle data frames with 0 rows (#597). They fill character
  vector columns with NA instead of blanks (#595). They work with
  list columns (#463).
- Improved handling of encoding for column names (#636).
- Improved handling of hybrid evaluation re $ and @ (#645).
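
A few of the data frame behaviours described above can be checked directly. The sketch below assumes a reasonably recent dplyr installation and uses as_tibble() as a stand-in for creating a tbl_df, which is an assumption about current versions and not something these notes state; the data frame df and its column names are hypothetical.

```r
library(dplyr)

# Hypothetical toy data; the column name "a.b" contains a literal dot.
df <- data.frame(a.b = 1:3, ab = 4:6, g = c("x", "y", "x"))

# contains() matches its argument literally, so "." is not a regex wildcard
literal <- select(df, contains("."))

# When no variable matches, select() returns a 0-column data frame
none <- select(df, starts_with("zzz"))

# arrange() keeps the grouping structure of grouped data
sorted <- arrange(group_by(df, g), desc(ab))

# Single-bracket subsetting of a tbl_df does not drop to a bare vector
tb <- as_tibble(df)
one_col <- tb[, "ab"]
```

If contains() still used regular expressions, the pattern "." would match every column name, so the first result is a quick way to confirm the literal matching.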
Data tables
- Fix major omission in tbl_dt() and grouped_dt() methods: I was
  accidentally doing a deep copy on every result :(
- summarise() and group_by() now retain over-allocation when working with
  data.tables (#475, @arunsrinivasan).
- joining two data.tables now correctly dispatches to data table methods,
  and result is a data table (#470)
Cubes
- summarise.tbl_cube() works with single grouping variable (#480).