This post explores some basic concepts of do() and gives some advice on using it and programming with it.

do() is a verb (function) of dplyr. dplyr is a powerful R package for data manipulation, written and maintained by Hadley Wickham. It lets you perform common data manipulation tasks on data frames, such as filtering rows, selecting specific columns, reordering rows, adding new columns, summarizing data, and computing arbitrary operations.

First of all, install the dplyr package and then load it.
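A minimal sketch of both steps. The installation line is commented out because it only needs to run once per machine; uncomment it if dplyr is not yet installed.

```r
# Install dplyr from CRAN (needed only once; uncomment to run)
# install.packages("dplyr")

# Attach the package for the current session
library(dplyr)
```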

We will analyze the use of do() with the following dataset, created with random data:
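A minimal sketch of such a dataset, assuming one grouping column and two numeric columns. The names df, group, x and y, the seed, and the sizes are illustrative assumptions, not the values used in the original post.

```r
# Fix the seed so the random draws are reproducible
set.seed(123)

# Hypothetical dataset: 3 groups with 4 observations each
df <- data.frame(
  group = rep(c("a", "b", "c"), each = 4),
  x = rnorm(12),
  y = rnorm(12)
)

head(df)
```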

We first convert it to a tbl_df object, which has a more compact print method. The conversion does not change the data in the input data frame.
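The conversion might look like the sketch below. The data frame is a hypothetical stand-in (df, group, x and y are assumed names). Recent dplyr releases deprecate tbl_df(), so this sketch uses as_tibble(), which dplyr re-exports and which returns an object of class tbl_df.

```r
library(dplyr)

# Hypothetical random dataset (names and sizes are assumptions)
set.seed(123)
df <- data.frame(
  group = rep(c("a", "b", "c"), each = 4),
  x = rnorm(12),
  y = rnorm(12)
)

# Convert to a tibble; only the class (and print method) changes
df_tbl <- as_tibble(df)
df_tbl
```

The values in df_tbl are the same as in df; printing it shows the dimensions and column types along with the first rows.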

Basic Concepts of do() (Non-Standard Evaluation Version)

As already mentioned, do() performs arbitrary operations on a data frame and can return more than a single number.

To use do(), you must know that:

  • it always returns a data frame
  • unlike the other data manipulation verbs of dplyr, do() requires the . placeholder inside the expression to apply; the placeholder refers to the data the expression has to work with.

  • it is designed to be used with dplyr's group_by() to compute operations within groups.

  • the argument of do() can be named or unnamed:
    • named arguments (more than one may be supplied) become list-columns, with one element for each group.

    • an unnamed argument (only one may be supplied) must evaluate to a data frame, and the group labels are repeated for each row it returns.
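The points above can be sketched as follows. The dataset and the names df, by_group, top2 and models are hypothetical; head() and lm() are just convenient operations to illustrate the unnamed and named forms.

```r
library(dplyr)

# Hypothetical random dataset (names and sizes are assumptions)
set.seed(123)
df <- data.frame(
  group = rep(c("a", "b", "c"), each = 4),
  x = rnorm(12),
  y = rnorm(12)
)

by_group <- group_by(df, group)

# Unnamed argument: the expression must return a data frame.
# The . placeholder stands for the rows of the current group.
top2 <- do(by_group, head(., 2))

# Named argument: the result of each group is stored in a
# list-column called fit, one element per group.
models <- do(by_group, fit = lm(y ~ x, data = .))

top2
models
```

Here top2 keeps the first two rows of every group, while models has one row per group with a fitted lm object stored in the fit column.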

do() is used in the same way with user-defined functions.

Let us define the following function, which performs two simple operations returning a data frame:
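One possible function of this kind, as a sketch: my_fun and the column names x and y are hypothetical, and the two operations (a mean and a standard deviation) are chosen only for illustration.

```r
# Hypothetical helper: takes a data frame with numeric columns x and y
# and returns a one-row data frame with two summary values
my_fun <- function(d) {
  data.frame(
    mean_x = mean(d$x),
    sd_y = sd(d$y)
  )
}

# Quick check on a tiny hand-made data frame
res <- my_fun(data.frame(x = c(1, 2, 3), y = c(2, 4, 6)))
res
```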

If the argument is named, the result is:

Otherwise, if the argument is unnamed, the result is:
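Both cases can be sketched side by side. The dataset, the helper my_fun, and the result names named_res and unnamed_res are assumptions made for this example.

```r
library(dplyr)

# Hypothetical random dataset and helper function (assumed names)
set.seed(123)
df <- data.frame(
  group = rep(c("a", "b", "c"), each = 4),
  x = rnorm(12),
  y = rnorm(12)
)
my_fun <- function(d) data.frame(mean_x = mean(d$x), sd_y = sd(d$y))

by_group <- group_by(df, group)

# Named argument: each group's data frame is stored in the list-column res
named_res <- do(by_group, res = my_fun(.))

# Unnamed argument: the returned data frames are bound together,
# with the group label attached to each row
unnamed_res <- do(by_group, my_fun(.))

named_res
unnamed_res
```

The unnamed form is usually the easier one to work with afterwards, since its output is a flat data frame with one row per group.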

Programming with do_() (Standard Evaluation Version)

How can we wrap the previous operations inside a function? Simple: use do_() (the SE version of do()) together with the interp() function from the lazyeval package.

Continue reading on Quantide blog...
