Using JSON Queries (JQ)
Pure JSON
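library(jqr)       # jq() and combine()
library(jsonlite)  # fromJSON() and toJSON()
library(magrittr)  # the %>% pipe
vita <- readr::read_file("../../static/js/vita.json")
# Emit one {year, author} object per author of each work listed under "@reverse".author
jq(vita, '."@reverse".author[] |
{ year: .dateCreated,
author: .author[] | [.givenName, .familyName] | join(" ")
}') %>% combine() %>% fromJSON()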
year | author |
---|---|
2017-12-09 | Wayne M. Getz |
2017-12-09 | Charles R. Marshall |
2017-12-09 | Colin J. Carlson |
2017-12-09 | Luca Giuggioli |
2017-12-09 | Sadie J. Ryan |
2017-12-09 | Stephanie S. Romañach |
2017-12-09 | Carl Boettiger |
2017-12-09 | Samuel D. Chamberlain |
2017-12-09 | Laurel Larsen |
2017-12-09 | Paolo D’Odorico |
2017-12-09 | David O’Sullivan |
2017-03-15 | Stephanie E. Hampton |
2017-03-15 | Matthew B. Jones |
2017-03-15 | Leah A. Wasser |
2017-03-15 | Mark P. Schildhauer |
2017-03-15 | Sarah R. Supp |
2017-03-15 | Julien Brun |
2017-03-15 | Rebecca R. Hernandez |
2017-03-15 | Carl Boettiger |
2017-03-15 | Scott L. Collins |
2017-03-15 | Louis J. Gross |
2017-03-15 | Denny S. Fernández |
2017-03-15 | Amber Budden |
2017-03-15 | Ethan P. White |
2017-03-15 | Tracy K. Teal |
2017-03-15 | Stephanie G. Labou |
2017-03-15 | Juliann E. Aukema |
2016-8-19 | T. Alex Perkins |
2016-8-19 | Carl Boettiger |
2016-8-19 | Benjamin L. Phillips |
2016-5-6 | Carl Boettiger |
2016-5-6 | Michael Bode |
2016-5-6 | James N. Sanchirico |
2016-5-6 | Jacob LaRiviere |
2016-5-6 | Alan Hastings |
2016-5-6 | Paul R. Armsworth |
2015-11-16 | Carl Boettiger |
2015-11-16 | Scott Chamberlain |
2015-11-16 | Ted Harte |
2015-11-16 | Karthik Ram |
2015-9-3 | Carl Boettiger |
2015-9-3 | Scott Chamberlain |
2015-9-3 | Rutger Vos |
2015-9-3 | Hilmar Lapp |
2015-1-28 | Carl Boettiger |
2015-1-7 | Boettiger |
2015-1-7 | M. Mangel |
2015-1-7 | S. Munch |
2013-7-10 | Boettiger |
2013-7-10 | Alan Hastings |
2013-6-20 | Carl Boettiger |
2013-6-20 | Noam Ross |
2013-6-20 | Alan Hastings |
2013-01-08 | Boettiger |
2013-01-08 | Alan Hastings |
2012-10-10 | Boettiger |
2012-10-10 | Alan Hastings |
2012-11-6 | Boettiger |
2012-11-6 | D. T. Lang |
2012-11-6 | P. C. Wainwright |
2012-10-11 | Carl Boettiger |
2012-10-11 | Duncan Temple Lang |
2012-5-16 | |
2012-5-16 | Alan Hastings |
2012-3-13 | Jeremy M. Beaulieu |
2012-3-13 | Dwueng-Chwuan Jhwueng |
2012-3-13 | |
2012-3-13 | Brian C. O’Meara |
2012-2-19 | Carl Boettiger |
2012-2-19 | Graham Coop |
2012-2-19 | Peter Ralph |
2009-10-19 | Carl Boettiger |
2009-10-19 | Jonathan Dushoff |
2009-10-19 | Joshua S. Weitz |
2006-11-27 | James J. Wray |
2006-11-27 | Neta A. Bahcall |
2006-11-27 | Paul Bode |
2006-11-27 | Carl Boettiger |
2006-11-27 | Philip F. Hopkins |
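To see why the query returns one row per author rather than one row per article, it can help to run the same filter on a small document. The sketch below uses a made-up two-author record (the names and date are placeholders, not data from vita.json) and assumes the jqr and jsonlite packages are installed.

```r
library(jqr)       # jq(), combine()
library(jsonlite)  # fromJSON()

# Hypothetical toy record, not taken from vita.json
toy <- '{
  "dateCreated": "2020-01-01",
  "author": [
    {"givenName": "Ada",   "familyName": "Lovelace"},
    {"givenName": "Grace", "familyName": "Hopper"}
  ]
}'

# `.author[]` iterates inside the object constructor, so jq emits
# one {year, author} object for each element of the author array
rows <- jq(toy, '{
  year: .dateCreated,
  author: .author[] | [.givenName, .familyName] | join(" ")
}')

# combine() wraps the stream of objects into a single JSON array,
# which fromJSON() then simplifies to a data frame
df <- fromJSON(combine(rows))
print(df)
```

Because the date is repeated for every author, the result is already in a "long" shape that is convenient for grouping or counting by author.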
With JSON-LD frame
By first constructing a frame, we can get back just the subset of the data we are interested in. This is not as powerful as a graph query, but it still has aspects of schema-on-read.
frame <-
'{
"@context": "http://schema.org",
"@type": "ScholarlyArticle",
"author": {
"@type": "Person",
"givenName": {},
"familyName": {},
"@explicit": true
},
"dateCreated": {},
"@explicit": true
}'
vita <- jsonld::jsonld_frame("../../static/js/vita.json", frame)
as.character(vita) %>%
jq('."@graph"[] | {
year: .dateCreated,
author: .author[] | [.givenName, .familyName] | join(" ")
}') %>% combine() %>% fromJSON()
year | author |
---|---|
2017-12-09 | Wayne M. Getz |
2017-12-09 | Charles R. Marshall |
2017-12-09 | Colin J. Carlson |
2017-12-09 | Luca Giuggioli |
2017-12-09 | Sadie J. Ryan |
2017-12-09 | Stephanie S. Romañach |
2017-12-09 | Carl Boettiger |
2017-12-09 | Samuel D. Chamberlain |
2017-12-09 | Laurel Larsen |
2017-12-09 | Paolo D’Odorico |
2017-12-09 | David O’Sullivan |
2017-12-09 | Carl Boettiger |
2016-8-19 | T. Alex Perkins |
2016-8-19 | Carl Boettiger |
2016-8-19 | Benjamin L. Phillips |
2013-6-20 | Carl Boettiger |
2013-6-20 | Noam Ross |
2013-6-20 | Alan Hastings |
2009-10-19 | Carl Boettiger |
2009-10-19 | Jonathan Dushoff |
2009-10-19 | Joshua S. Weitz |
2013-01-08 | Carl Boettiger |
2013-01-08 | Alan Hastings |
2006-11-27 | James J. Wray |
2006-11-27 | Neta A. Bahcall |
2006-11-27 | Paul Bode |
2006-11-27 | Carl Boettiger |
2006-11-27 | Philip F. Hopkins |
2017-03-15 | Stephanie E. Hampton |
2017-03-15 | Matthew B. Jones |
2017-03-15 | Leah A. Wasser |
2017-03-15 | Mark P. Schildhauer |
2017-03-15 | Sarah R. Supp |
2017-03-15 | Julien Brun |
2017-03-15 | Rebecca R. Hernandez |
2017-03-15 | Carl Boettiger |
2017-03-15 | Scott L. Collins |
2017-03-15 | Louis J. Gross |
2017-03-15 | Denny S. Fernández |
2017-03-15 | Amber Budden |
2017-03-15 | Ethan P. White |
2017-03-15 | Tracy K. Teal |
2017-03-15 | Stephanie G. Labou |
2017-03-15 | Juliann E. Aukema |
2012-5-16 | Carl Boettiger |
2012-5-16 | Alan Hastings |
2012-10-10 | Carl Boettiger |
2012-10-10 | Alan Hastings |
2013-7-10 | Carl Boettiger |
2013-7-10 | Alan Hastings |
2015-1-7 | Carl Boettiger |
2015-1-7 | M. Mangel |
2015-1-7 | S. Munch |
2015-9-3 | Carl Boettiger |
2015-9-3 | Scott Chamberlain |
2015-9-3 | Rutger Vos |
2015-9-3 | Hilmar Lapp |
2012-11-6 | Carl Boettiger |
2012-11-6 | D. T. Lang |
2012-11-6 | P. C. Wainwright |
2012-2-19 | Carl Boettiger |
2012-2-19 | Graham Coop |
2012-2-19 | Peter Ralph |
2012-3-13 | Jeremy M. Beaulieu |
2012-3-13 | Dwueng-Chwuan Jhwueng |
2012-3-13 | Carl Boettiger |
2012-3-13 | Brian C. O’Meara |
2012-10-11 | Carl Boettiger |
2012-10-11 | Duncan Temple Lang |
SPARQL and RDF
A simple example
"http://dx.doi.org/10.1002/ece3.2314" %>%
httr::GET(httr::add_headers(Accept="application/rdf+xml")) %>%
httr::content(as = "parsed", type = "application/xml") %>%
xml2::write_xml("ex.xml")
Our rdflib functions perform the simple task of parsing this rdfxml file into R (as a redland rdf class object) and then writing it back out in jsonld serialization:
library(rdflib)  # rdf_parse() and rdf_serialize()
rdf_parse("ex.xml", "rdfxml") %>%
rdf_serialize("ex.json", "jsonld")
We now have a JSON file. We can clean this file up a bit by replacing the long URIs with short prefixes by “compacting” the file into a specific JSON-LD context. FOAF, OWL, and Dublin Core are all recognized by schema.org, so we need not declare them at all here. The PRISM and BIBO ontologies are not, so we simply declare them as additional prefixes:
context <-
'{ "@context": [
"http://schema.org",
{
"prism": "http://prismstandard.org/namespaces/basic/2.1/",
"bibo": "http://purl.org/ontology/bibo/"
}]
}'
library(jsonld)  # jsonld_compact() and jsonld_frame()
json <- jsonld_compact("ex.json", context)
Switching contexts and framing
context <-
'{
"prism": "http://prismstandard.org/namespaces/basic/2.1/",
"dc": "http://purl.org/dc/terms/",
"bibo": "http://purl.org/ontology/bibo/",
"foaf": "http://xmlns.com/foaf/0.1/",
"owl": "http://www.w3.org/2002/07/owl#",
"xsd": "http://www.w3.org/2001/XMLSchema#",
"schema": "http://schema.org/",
"schema:pageStart": "prism:startingPage",
"schema:pageEnd": "prism:endingPage",
"schema:volumeNumber": "prism:volume",
"schema:identifier": {"@id": "prism:issn", "@type": "@id"},
"schema:Periodical": "bibo:Journal",
"schema:author": "dc:creator",
"schema:isPartOf": "dc:isPartOf",
"schema:publisher": "dc:publisher",
"schema:name": "dc:title",
"schema:familyName": "foaf:familyName",
"schema:givenName": "foaf:givenName",
"schema:Person": "foaf:Person",
"schema:sameAs": {"@id": "owl:sameAs", "@type": "@id"},
"schema:Date": "xsd:date",
"schema:datePublished": {"@id": "http://purl.org/dc/terms/date", "@type": "schema:Date"}
}'
Compact the raw JSON into this context:
jsonld_compact("ex.json", context) %>%
fromJSON(simplifyVector = FALSE) -> X
Now replace that context with the schema.org context, which is a bit of a hack:
X[["@context"]] <- "http://schema.org"
X %>%
toJSON(auto_unbox = TRUE, pretty = TRUE) %>%
jsonld_compact("http://schema.org") -> Y
Now frame our desired results to explicitly include only the elements we request, giving the graph in the desired tree structure:
frame <-
'{"@context": "http://schema.org",
"@graph": {
"id": {},
"name": {},
"pageStart": {},
"pageEnd": {},
"isPartOf": {
"name": {},
"identifier": {},
"@explicit": true
},
"author": [
{
"givenName": {},
"familyName": {},
"@explicit": true
}],
"@explicit": true
}
}'
jsonld_frame(Y, frame)
## {
## "@context": "http://schema.org",
## "@graph": [
## {
## "id": "http://dx.doi.org/10.1002/ece3.2314",
## "author": [
## {
## "id": "http://id.crossref.org/contributor/carl-boettiger-2etprmps2zm1a",
## "type": "Person",
## "familyName": "Boettiger",
## "givenName": "Carl"
## },
## {
## "id": "http://id.crossref.org/contributor/t-alex-perkins-2etprmps2zm1a",
## "type": "Person",
## "familyName": "Perkins",
## "givenName": "T. Alex"
## },
## {
## "id": "http://id.crossref.org/contributor/benjamin-l-phillips-2etprmps2zm1a",
## "type": "Person",
## "familyName": "Phillips",
## "givenName": "Benjamin L."
## }
## ],
## "isPartOf": null,
## "name": "After the games are over: life-history trade-offs drive dispersal attenuation following range expansion",
## "pageEnd": "6434",
## "pageStart": "6425"
## }
## ]
## }
Note that the RDF has a different semantic model than schema.org: for instance, volume is a property of the scholarly article (well, it’s untyped in the RDF, but it’s a property of the object described by the article DOI), while in schema.org, volumeNumber is a property of a Periodical (or PublicationVolume), whose hasPart values are PublicationIssue objects, whose hasPart values are in turn ScholarlyArticle objects. The whole purpose of the JSON-LD functions is to respect semantics, so there is no way to use JSON-LD operations to alter these semantics.
As long as we aren’t changing the object structures, though, we can change the vocabulary. This is really also something of a hack: we compact the original data, and then just chop off the @context and provide our own @context that gives schema.org definitions to the terms.
JSON-LD is commonly used to change key names, but this assumes that both contexts can be defined relative to the same URIs. For example, we can say that in the context of Dublin Core, implicitly "title": "http://schema.org/name", or explicitly:
"https://purl.org/dc/title": "http://schema.org/name".
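As a minimal sketch of renaming a key through a shared URI, the example below compacts a toy document into a second context. The document, the key names, and the title string are all hypothetical; only jsonld_compact() from the jsonld package and fromJSON() from jsonlite are assumed.

```r
library(jsonld)    # jsonld_compact()
library(jsonlite)  # fromJSON()

# Hypothetical document whose key "title" is bound to a schema.org URI
doc <- '{
  "@context": {"title": "http://schema.org/name"},
  "title": "An example title"
}'

# A second context that binds a different key, "name", to the same URI
new_context <- '{"@context": {"name": "http://schema.org/name"}}'

# Compaction resolves keys to their URIs, then shortens them again
# using whatever terms the new context defines for those URIs
renamed <- jsonld_compact(doc, new_context)
out <- fromJSON(renamed)
out$name
```

The rename works only because both keys resolve to the same URI. If the two contexts pointed at different URIs, the original key would come back as a full URI instead of being renamed.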
Perhaps this ought instead to be done with an ontological operation and the assertion of sameAs and similar relationships. Perhaps that would also permit moving between these different levels?
Note that items with specific types must be declared as such to match types expected in schema.org. Others can be captured as schema.org terms just by setting the default @vocab.
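The effect of a default @vocab can be checked by expanding a small document, as in the sketch below. The record is a made-up placeholder, and the code assumes the jsonld and jsonlite packages are installed.

```r
library(jsonld)    # jsonld_expand()
library(jsonlite)  # fromJSON()

# Hypothetical record: no term is declared individually, only a default @vocab
doc <- '{
  "@context": {"@vocab": "http://schema.org/"},
  "givenName": "Ada",
  "familyName": "Lovelace"
}'

# Expansion replaces every plain key with the @vocab URI plus the key
expanded <- fromJSON(jsonld_expand(doc), simplifyVector = FALSE)
keys <- names(expanded[[1]])
print(keys)
```

Each plain key is resolved against the vocabulary URI, so no per-term declaration is needed for untyped values.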