Notes

Exploring XML Schema in R

I danced around the room after reading these lines in the excellent XML and Web Technologies for Data Sciences with R book:

We can generate S4 class definitions corresponding to the data structures that are represented in the XML document. We can programmatically generate R code that will map an XML document adhering to this schema to corresponding R data structures. Similarly, we can automate the creation of code to serialize, or write, an R object of a particular type to a suitable XML format corresponding to the schema

Wow. Automatically turn EML objects into R objects and vice versa! Generate a NeXML parser for R automatically. The world is your oyster now!

Diving in

Using install_github("XMLSchema", "omegahat").

Trouble so far.

EML

readSchema("eml.xsd")

From Duncan:

As I suspected from just the description of the problem, there is a recursive circularity in the schema files - eml-documentation.xsd imports eml-text.xsd which imports eml-documentation.xsd. I’ll try to find some time to handle this.

Staying tuned…

NexML

This is a schema for phylogenetic trees. Being able to use this schema would be a big improvement over the current approach of our treebase package, which queries a vertically integrated database that is far less expressive then the NeXML schema.

I tried this schema, but cannot download the file for some reason:

nex <- readSchema("https://www.nexml.org/2009/nexml.xsd")
failed to load HTTP resource
Error: 1: failed to load HTTP resource
> 

I download a copy from here and set its xsd/ directory as my working directory:

> nex <- readSchema("nexml.xsd")
Error: XML content does not seem to be XML: './continuous.xsd'

because the path should be characters/continuous.xsd. Not sure why I’m not managing to navigate the paths correctly here.

KML example from book

I cannot load the KML schema directly from the website:

> kx = "https://developers.google.com/kml/schema/kml21.xsd"
> kml = readSchema(kx)
Error: XML content does not seem to be XML: 'https://developers.google.com/kml/schema/kml21.xsd'

Likewise,

kml = readSchema( "https://schemas.opengis.net/kml/2.2.0/ogckml22.xsd")
failed to load HTTP resource
Error: 1: failed to load HTTP resource

fails, though the other example works

pmml = readSchema("https://www.dmg.org/v4-1/pmml-4-1.xsd")

If I wget the xsd and save it locally first, then I can readSchema on the local file. If I have options(error = recover) set, then

 defineClasses(kml)

gets me deep into the debugger. If I run it with options(error=NULL), then I get warnings:

> defineClasses(kml)
character(0)
Warning messages:
1: In (function (className, baseType, where = globalenv())  :
  https://earth.google.com/kml/2.1.Object is an existing class
2: In (function (className, baseType, where = globalenv())  :
  https://earth.google.com/kml/2.1.Feature is an existing class
3: In (function (className, baseType, where = globalenv())  :
  https://earth.google.com/kml/2.1.LookAt is an existing class
4: In (function (className, baseType, where = globalenv())  :
  https://earth.google.com/kml/2.1.Geometry is an existing class
5: In (function (className, baseType, where = globalenv())  :
  https://earth.google.com/kml/2.1.StyleSelector is an existing class
6: In (function (className, baseType, where = globalenv())  :
  https://earth.google.com/kml/2.1.TimePrimitive is an existing class
> 

I load an minimal example file and try fromXML, but hit an error:

> doc <- xmlParse("kmlexample.xml")
> fromXML(doc)
Error in UseMethod("xmlAttrs", node) : 
  no applicable method for 'xmlAttrs' applied to an object of class "c('XMLInternalDocument', 'XMLAbstractDocument')"

Stay tuned for more from Duncan on getting this rolling..

Reading

  • Early warning signals in multivariate dynamics: r citet("https://arxiv.org/abs/1306.3465", "discusses"). Population size grows as approaches collapse, multidemensional data needed to detect slowing down (eigenvalue estimates from multidimensional autoregressive model)

  • Little grants are more bang for the buck than big grants: r citet("10.1371/journal.pone.0065263", "discusses")

Code tricks

From Egan, on getting a useful version of rvm on Ubuntu 12.04 (lets me install bundle install on the lab-notebook gemfile so I can deploy from zero.ucdavis, etc.)

RVM has to be installed locally because the packaged version is broken. Before installation, the following packages need to be installed globally (unless you have root access in which case the installer will take care of them):

    bison
    libyaml-dev
    libgdbm-dev
    gawk
    sqlite

To install RVM, run:

    \curl -L https://get.rvm.io | bash -s stable --ruby --autolibs=enable --auto-dotfiles

which takes a while because it downloads and builds a lot. The installer takes care of the configuration automatically and all you need to do after installation is log into a fresh shell to source the required functions and environment.