“Data rectangling”: the process of turning highly nested data structures (e.g. JSON, XML) into a tabular format.
Data rectangling is a brilliant turn of phrase coined by Jenny Bryan (UBC, RStudio), a leader in the #rstats community. The recording or slides of Jenny’s talk on the subject give a much better introduction to the idea and to working with it in R, particularly through the purrr package.
As nice as purrr is for the task, I’ve recently found that the jqr package from Scott Chamberlain and co can be a much easier way to go about rectangling your JSON. Here’s a quick comparison based on an example from the lesson Jenny has on Data Rectangling.
#devtools::install_github("jennybc/repurrrsive")
library(jsonlite)
library(tidyverse)
library(repurrrsive)
library(jqr)
Using purrr
gh_flat <- gh_repos %>% flatten() # abandon nested structure and hope we didn't need it
gh_tibble <- tibble(
  name     = gh_flat %>% map_chr("name"),
  issues   = gh_flat %>% map_int("open_issues_count"),
  wiki     = gh_flat %>% map_lgl("has_wiki"),
  homepage = gh_flat %>% map_chr("homepage", .default = ""), # some repos have no homepage
  owner    = gh_flat %>% map_chr(c("owner", "login"))
)
gh_tibble %>% DT::datatable() # DT is not attached above, so call it by namespace
Note we need to be explicit about missing value defaults and types.
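To see why the default matters, here is a minimal sketch using two made-up repo records (the `repos` list and its values are hypothetical, not taken from `gh_repos`). It assumes only that purrr is installed.

```r
library(purrr)

# Two hypothetical repo records; the second has no homepage (NULL).
repos <- list(
  list(name = "alpha", homepage = "https://example.org"),
  list(name = "beta",  homepage = NULL)
)

# Without a default, map_chr() cannot turn the NULL into a string and errors.
no_default <- tryCatch(
  map_chr(repos, "homepage"),
  error = function(e) "error"
)

# With .default, the missing value is filled in explicitly.
with_default <- map_chr(repos, "homepage", .default = NA_character_)
```

Whether to fill with `""` or `NA_character_` is a judgment call; the point is that purrr makes you choose.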
Using jqr
Note that we can simply exploit the object typing already encoded in the data (int, lgl, chr).
f <- system.file("extdata/gh_repos.json", package = "repurrrsive")
read_file(f) %>%
  jq('.[][] | {
    name: .name,
    issues: .open_issues_count,
    wiki: .has_wiki,
    homepage: .homepage,
    owner: .owner.login
  } ') %>%
  jqr::combine() %>% # combine the stream of objects into a single JSON array
  fromJSON() %>%
  DT::datatable()
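The typing comes along for free because the JSON parser maps JSON value types onto R column types. A small sketch with hypothetical records (the field values are invented for illustration), assuming only jsonlite:

```r
library(jsonlite)

# Hypothetical flat records, shaped like the output of the jq filter above.
json <- '[
  {"name": "alpha", "issues": 0, "wiki": true,  "homepage": null},
  {"name": "beta",  "issues": 3, "wiki": false, "homepage": "https://example.org"}
]'

df <- fromJSON(json)

# Inspect the column types: JSON numbers, booleans, and strings become
# integer, logical, and character columns; a JSON null becomes NA.
str(df)
```

No per-column mapper or default is needed here: the missing homepage simply arrives as `NA`.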
This example only scratches the surface of the jq syntax. The jq manual provides a nice overview of this intuitive syntax. jq can also perform a wide range of data processing on the elements, including conditionals, comparisons, regular expressions, math, and so forth. While these are great, most R users will want to learn just enough jq syntax to get back a nice data rectangle, and then dplyr can take over.
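As a rough illustration of the conditionals mentioned above, the sketch below filters records inside jq with `select()` before handing a rectangle back to R. The JSON records are hypothetical, and the example assumes jqr and jsonlite are installed.

```r
library(jqr)
library(jsonlite)

# Hypothetical JSON input: an array of three made-up repo records.
json <- '[
  {"name": "alpha", "open_issues_count": 0, "has_wiki": true},
  {"name": "beta",  "open_issues_count": 3, "has_wiki": false},
  {"name": "gamma", "open_issues_count": 7, "has_wiki": true}
]'

# Keep only repos with open issues, then reshape each into a flat object.
busy <- jq(json, '.[] | select(.open_issues_count > 0) | {name: .name, issues: .open_issues_count}')

# Combine the stream of objects into one JSON array and parse it into a data frame.
busy_df <- fromJSON(jqr::combine(busy))
```

From here `busy_df` is an ordinary data frame, so any further filtering or summarising can happen in dplyr rather than in jq.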