Fun Standardizing Non-Standard Evaluation

Using dplyr calls on the back-end of the rfishbase re-write means working around the non-standard evaluation (NSE), as described in the dplyr vignette.

Grab the data I was using for this:

library("dplyr")
downloader::download("https://github.com/cboettig/2015/raw/fc0d9185659e7976927d0ec91981912537ac6018/assets/data/2015-02-06-taxa.csv", "taxa.csv")
all_taxa <- read.csv("taxa.csv")

Consider a simple NSE dplyr call:

x <- filter(all_taxa, Family == 'Scaridae')

The simplest standard-evaluation (SE) version of this needs three things: a formula expression (~), the SE version of the function (same name with a trailing _, here filter_()), and its .dots argument:

.dots <- list(~Family == 'Scaridae')
x1 <- filter_(all_taxa, .dots=.dots)

identical(x, x1)

[1] TRUE
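Why a formula? A one-sided formula stores the unevaluated expression together with the environment it was created in. The following is a rough base-R sketch of that mechanism, not how dplyr actually implements filter_(). It uses the built-in mtcars data so it runs without downloading anything:

```r
# A one-sided formula holds an unevaluated expression plus its environment
f <- ~cyl == 6
expr <- f[[2]]            # the expression `cyl == 6`
env  <- environment(f)    # the environment the formula was created in

# Evaluate with the data frame's columns in scope, falling back to the
# formula's environment for any other names
keep <- eval(expr, envir = mtcars, enclos = env)
res  <- mtcars[keep, ]
```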

Using formulas also lets us pass the value we filter on (here, the family name) as a variable:

family <- 'Scaridae'
.dots <- list(~Family == family)
x2 <- filter_(all_taxa, .dots=.dots)
identical(x, x2)

[1] TRUE
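This works because family is looked up in the formula's environment, which is where the formula was created and not where it is eventually evaluated. So the same trick still works when the formula is built inside a helper function. Here is a base-R sketch. make_condition() is a hypothetical helper, and mtcars stands in for all_taxa:

```r
# Hypothetical helper: build a condition whose value comes from an argument
make_condition <- function(value) {
  ~cyl == value   # `value` is captured in this function's environment
}

f <- make_condition(6)
# `value` lives only inside make_condition()'s environment, yet evaluation
# finds it there, while `cyl` is found among the data frame's columns
keep <- eval(f[[2]], envir = mtcars, enclos = environment(f))
res  <- mtcars[keep, ]
```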

If we want both the key and value to vary, we need to get pretty fancy to subvert the non-standard evaluation:

library(lazyeval)
family <- 'Scaridae'
field <- 'Family'
.dots <- list(interp(~y == x, 
                     .values = list(y = as.name(field), x = family)))
x3 <- filter_(all_taxa, .dots=.dots)
identical(x, x3)

[1] TRUE
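What interp() does here is substitute names into an expression. Base R can do the same kind of substitution with substitute() (or bquote()). The sketch below uses mtcars, with a column name chosen at run time from a string:

```r
field <- "cyl"   # column name held in a string
val   <- 6       # value held in a variable

# Splice in the column as a symbol and the value as a constant
expr <- substitute(y == x, list(y = as.name(field), x = val))
print(expr)      # cyl == 6

keep <- eval(expr, envir = mtcars)
res  <- mtcars[keep, ]
```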

A bit more fun: generalize the interp() approach to take an arbitrary number of name-value pairs:

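query <- list(Family = 'Scaridae', SpecCode = 5537)
# one `column == value` formula per name-value pair in query
dots <- lapply(names(query), function(level){
  value <- query[[level]]
  interp(~y == x, .values = list(y = as.name(level), x = value))
})
# filter_() ANDs together the multiple conditions passed in .dots
x4 <- filter_(all_taxa, .dots = dots)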
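The same idea can be written as a self-contained function. Below is a base-R sketch. filter_by() is a hypothetical name, and it combines the per-column conditions with &, which is how filter_() treats several .dots entries:

```r
# Hypothetical helper: keep rows where every named column equals its value
filter_by <- function(df, query) {
  conds <- lapply(names(query), function(col) {
    substitute(y == x, list(y = as.name(col), x = query[[col]]))
  })
  # Evaluate each condition against the data, then AND them together
  keep <- Reduce(`&`, lapply(conds, eval, envir = df))
  df[keep, , drop = FALSE]
}

res <- filter_by(mtcars, list(cyl = 6, gear = 4))
```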

More fun standardizing NSE

The previous examples only apply to filter_(). The general idea is the same for other functions such as mutate_(), but the pattern doesn’t translate directly. Here are some patterns I’ve adopted when using mutate_(). First, consider the familiar NSE usage:

df <- mutate(mtcars, displ_l = disp / 61.0237)
head(df)

   mpg cyl disp  hp drat    wt  qsec vs am gear carb  displ_l
1 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4 2.621932
2 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4 2.621932
3 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1 1.769804
4 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1 4.227866
5 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2 5.899347
6 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1 3.687092

Again we use the list(interp(...)) pattern. Note that this time we name the new column with setNames(), that is, by naming the elements of the list:

dots <- setNames(list(lazyeval::interp(~x / y, x = quote(disp), y=61.0237)), "displ_l")
df2 <- mutate_(mtcars, .dots = dots)
identical(df, df2)

[1] TRUE

Of course, if the divisor isn’t a variable, y can be skipped and the value written directly into the formula, as in interp(~x / 61.0237, x = quote(disp)).
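The naming trick has a base-R counterpart: a named list of expressions, where each expression is evaluated against the data and stored in a column with that element's name. This is a sketch that reuses the conversion constant from above. new_cols is just an illustrative name:

```r
# Named list: element name = new column, element value = expression
new_cols <- setNames(
  list(substitute(x / y, list(x = as.name("disp"), y = 61.0237))),
  "displ_l"
)

df3 <- mtcars
for (nm in names(new_cols)) {
  # evaluate each expression with the (updated) data frame's columns in scope
  df3[[nm]] <- eval(new_cols[[nm]], envir = df3)
}
```

The loop evaluates against the updated df3 rather than the original data. That lets a later expression in the list refer to a column created earlier, which mirrors how mutate() behaves.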

More `dplyr` patterns

Also thought I would scribble down some other common dplyr patterns I find myself re-using.

Applying a function that returns a data.frame to each element of a list, then combining the results into a single data.frame:

mylist %>% lapply(myfun) %>% dplyr::bind_rows()

To place this deeper in the hadleyverse, purrr::map could be dropped in for lapply in the above example.
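For comparison, here is a base-R sketch of the same pattern, with do.call(rbind, ...) standing in for dplyr::bind_rows(). The toy mylist and myfun below are made up for illustration:

```r
mylist <- list(a = 1:3, b = 4:10)   # toy input list

# Hypothetical function returning a one-row data.frame per element
myfun <- function(v) data.frame(n = length(v), mean = mean(v))

out <- do.call(rbind, lapply(mylist, myfun))
```

Base rbind() expects the pieces to share the same columns. bind_rows() is more forgiving and fills columns missing from some pieces with NA.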

Another common pattern for me is expand.grid() %>% group_by() %>% do(). Here’s a recent example of mine, which also shows how to define group_by_all(), since that is usually the grouping I need from an expand.grid() call (that is, I want to apply a function over all combinations of some parameter settings, etc.).
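The example itself isn't reproduced here. The following is a rough base-R sketch of the same idea. Each row of an expand.grid() result is one combination of parameter settings, which is why grouping by all columns is the natural choice, and a function is applied once per combination. The parameters n and sigma and the function run_one() are hypothetical:

```r
# Every combination of two made-up parameters
grid <- expand.grid(n = c(10, 100), sigma = c(0.5, 1))

# Hypothetical per-combination computation returning a data.frame
run_one <- function(n, sigma) {
  data.frame(n = n, sigma = sigma, se = sigma / sqrt(n))
}

# One call per row (i.e. per parameter combination), then stack the results
results <- do.call(rbind, Map(run_one, grid$n, grid$sigma))
```

Later dplyr releases added a built-in group_by_all(), so defining it by hand is mainly needed on older versions.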

Something I hope is not a common pattern but one I struggled with for a bit: making recursive calls of the above pattern for nested lists. This code in RNeXML illustrates my solution, which required both function recursion and function closure.
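The RNeXML code isn't reproduced here. As a loose illustration of why both ingredients show up, consider a hypothetical flatten_nested() function. It recurses into a nested list of data.frames and uses an anonymous function, a closure over the current x and path, to label each leaf's rows with the list names leading to it:

```r
flatten_nested <- function(x, path = character()) {
  if (is.data.frame(x)) {
    # leaf: tag each row with the path of list names that led here
    return(cbind(path = paste(path, collapse = "/"), x))
  }
  # recursive case: the anonymous function closes over `x` and `path`
  do.call(rbind, lapply(names(x), function(nm) {
    flatten_nested(x[[nm]], c(path, nm))
  }))
}

nested <- list(
  a = data.frame(v = 1:2),
  b = list(c = data.frame(v = 3L), d = data.frame(v = 4:5))
)
flat <- flatten_nested(nested)
```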
