Semantic Markup Examples For The Lab Notebook

All notebook entries are formatted with XHTML compliant (polyglot) HTML5 semantic structure. This means that any entry can be parsed with a generic XML parser to extract the entry content in <article>, the <header>, <footer>, <aside>, etc. The <head> section provides <title> and essential <meta> tags declaring the character encoding (which also sets the MIME type for HTML5).

Head

In _includes/header.html we introduce some basic academic archive metadata using the Dublin Core ontology. The html5 <header> seems like a good place for dc:title. Other metadata is added invisibly using content in a meta tag.

<head prefix="dc: https://purl.org/dc/terms/">
<!-- HTML5 metadata -->
<meta charset="utf-8" /> <!--same as <meta http-equiv='Content-Type' content='Type=text/html; charset=utf-8'> -->
<meta name="author" content="Carl Boettiger" />
<meta name="keywords" content="Ecology, Evolution, Open Science, Reproducible Research" />
<meta name="description" content="My open lab notebook: research in theoretical ecology and evolution" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>{{ page.title }}</title>
<!-- RDFa Metdata (in Dublin Core) -->
<meta property="dc:creator" content="Carl Boettiger" />
<meta property="dc:title" content="{{ page.title }}" />
<meta property="dc:type" content="Lab Notebook" />
<meta property="dc:format" content="text/html" />
<meta property="dc:language" content="en" />

More metadata is displayed in the sidebar (in an html5 aside element), including date, tags and category. I’ve added the following RDFa to describe these (see examples with the Jekyll liquid code in _includes/sidebar.html)

<aside prefix="dc: https://purl.org/dc/terms/">
Posted on
<time datetime="{{ page.date | date_to_xmlschema }}" 
      property="dc:created">{{ page.date | date_to_string }}</time>.

This puts the date in an HTML5 time element described as the dc.created time of the publication. Tags and categories use a little Liquid code so Jekyll can import them from the YAML header. The link to the tag page is described with the HTML rel attribute for tag, while the tag itself is given the RDFa property of a blog keyword from schema.org. Categories are given the more formal dc:subject property. (See _includes/pagetags.html for Jekyll/Liquid code for looping over multiple tags, etc).

  <a rel="tag" class="tag" href="/tags.html#{{ tag | cgi_escape }}">#<span 
  property="https://schema.org/BlogPosting/keywords">{{ tag }}</span></a>
  <a rel="tag" class="category" href="/categories.html#{{ category | cgi_escape }}"><span property="dc:subject">{{ category }}</span></a>

The next and previous buttons get the HTML5 rel values next and prev (see HTML5 semantics). Note that all the Dublin Core elements are properties of the page on which they occur.

The <footer> contains licenses and contact information through social media. I’ve used the FOAF vocabulary to indicate my membership to the various social networks provided in the links. The social networks are properties of a FOAF:person, (me), which is indicated in the RDFa resource:

<div class="span4" vocab="https://xmlns.com/foaf/0.1/" typeof="Person"
resource="https://www.carlboettiger.info#me">
  <a property="account" href="mailto:[email protected]"><img
  src="/assets/img/icon-email.png" alt="email"> </a>
  <a property="account" href="https://twitter.com/cboettig"><img
  src="/assets/img/icon-twitter.png" alt="twitter"> </a>
  <a property="account" href="https://github.com/cboettig"><img
  src="https://www.carlboettiger.info/assets/img/icon-github.png"
  alt="github"> </a> <a property="account"
  href="https://www.mendeley.com/profiles/carl-boettiger"><img
  src="/assets/img/icon-mendeley.png" alt="mendeley"></a>
<a href="/atom.xml"><img src="/assets/img/icon-rss.png"
alt="feed"></a> </div>

It might be worth adding more FOAF terms describing the relationship of these networks (such as distinguishing my profile name and the base url) using meta elements to avoid adding clutter text. I use the Creative Commons namespace to identify my CC0 license. The license is a property of the page on which it is found, like the Dublin Core metadata.

<a property="https://creativecommons.org/ns#license"
href="https://creativecommons.org/publicdomain/zero/1.0/"><img
src="https://i.creativecommons.org/l/zero/1.0/88x31.png" alt="CC0"
style="float:right"></a>

When these elements are combined in a given entry, the RDFa generates this structure

Easily entering RDFa

All of this markup content is either generally static or filled in programatically by Jekyll from the YAML headers (e.g. title, categories, tags, date), leaving me free to put the RDFa into the template files just once and forget about it. In this way I can continue to write everything in Markdown, rather than the more cumbersome HTML riddled by the addition of RDFa attributes. Unfortunately there is no easy way to escape this in writing the Homepage markup, described below, but a single case isn’t so bad.

Another obvious candidate for linked data in the posts is the citation information. This markup can probably be automatically generated for each post using scripts (see Automated feeds in notebook entries), so I plan to add this to my knitcitations package when I get a chance (including CiTO support!).

Homepage

My homepage contains a brief description about me that is ripe for FOAF and schema.org. While this content is reasonably static, it does mean writing directly in HTML+RDFa, with the markup making the source almost unreadable. (It might be possible to force more of this data into YAML header and have it added through the invisible meta tags, but that seems rather convoluted). Capturing the “buisness card” information as well as the “social” information requires mixing ontologies, which can lead to some interesting complications. To make a long story short:

Implicit logic is possible with linked data: we can infer that schema:jobTitle can be a property of a foaf:Person because it is a property of a schema:Person and someone declares them as owl:equivalentProperty.
In practice it may be most robust to explicitly declare the relation by exploiting the fact that we can give multiple arguments to attributes like property and typeof.

<div id="me" prefix="foaf: https://xmlns.com/foaf/0.1/" prefix="schema:
https://schema.org/Person" typeof="foaf:Person schema:Person"
resource="https://www.carlboettiger.info#me"> 
I am  <a property="foaf:homepage schema:url"
href="https://www.carlboettiger.info"><span property="foaf:name
schema:name"><span property="foaf:givenName schema:givenName">Carl</span>
<span property="foaf:familyName schema:familyName">Boettiger</span></span></a>, <span
property="schema:jobTitle">a graduate student</span>

(This excerpt gives the general flavor, see the html source of index.html for a full example.)

The homepage generates this RDFa structured graph (from rdfa.info/play)

Notes

Some background reading on semantics

Nice collection of posts on scholarly html, but too little mention of the serious linked data stuff coming out of biscol, ievobio/nescent, etc. In particular, Martin’s history gives some nice perspective.
The war for schema. Which ontologies should we choose? (e.g. schema.org’s take
On the good & the bad of schema.org and the role of search giants

Mixing ontologies had me very confused.

How do we mix ontologies
How do we meet Google Rich Snippets requirements using FOAF?
How do we avoid redundant vocabulary terms?
We can avoid the redundancy using implicit reasoning (exploiting the logic of linked data), but we may prefer to be explict. This video does a nice job of explaining these concepts. If we do semantics right, we should be able to “dereference” an object away from a particular ontology.

Tools and Resources

Google rich snippets
rdfa extractor (into RDF, JSON, n3). Also has validator, but doesn’t point to the nu variant for html5.
RDFa Play. Wow, very nice.
sameAs.org – search onotologies for an existing term, identify identical terms in different ontologies, etc. Simply brilliant, what I’ve always wanted!

knitcitations update: adding formats an semantics.

Looking at citeproc / CSL github page.
Raw citeproc is an XML stylesheet, XSLT, which can render XML into HTML (and much more). Possible other options are using something like citeproc-js or citeproc-ruby, or writing citeproc-R.
Not quite clear how to take a CSL and a set of references and generate HTML; aforementioned packages may address this.
For the moment, probably easier to convert R bibitem to display with RDFa with scholarlyhtml recommended ontology than to enter raw XML reference and format via citeproc. Should also support adding in the cito tags. With full addresses in property, should be able to avoid any need for namespace/header modifications.