Lab Notebook

(Introduction)

Coding

  • cboettig pushed to master at ropensci/EML: clean up .travis.yml 10:40 2014/04/22
  • cboettig pushed to master at cboettig/labnotebook: update links on software layout 10:33 2014/04/22
  • cboettig pushed to master at ropensci/EML: README updated Rbuildignore updated 10:32 2014/04/22
  • cboettig pushed to devel at ropensci/EML: README updated Rbuildignore updated 10:31 2014/04/22
  • cboettig pushed to master at cboettig/multiple_uncertainty: adding annotations cleanup 10:20 2014/04/22

Discussing

Reading

  • The limitations of diversity metrics in directing global marine conservation: Marine Policy (2014). Pages: 123-125. James P.W. Robinson, Easton R. White, Logan D. Wiwchar, Danielle C. Claar, Justin P. Suraci, Julia K. Baum et al. 05:26 2014/04/11
  • Integrating abundance and functional traits reveals new global hotspots of fish diversity.: Nature (2013). Volume: 501, Issue: 7468. Pages: 539-42. Rick D Stuart-Smith, Amanda E Bates, Jonathan S Lefcheck, J Emmett Duffy, Susan C Baker, Russell J Thomson, Jemina F Stuart-Smith, Nicole a Hill, Stuart J Kininmonth, Laura Airoldi, Mikel a Becerro, Stuart J Campbell, Terence P Dawson, Sergio a Navarrete, German a Soler, Elisabeth M a Strain, Trevor J Willis, Graham J Edgar et al. 05:26 2014/04/11
  • A bayesian approach to identifying and compensating for model misspecification in population models.: Ecology (2014). Volume: 95, Issue: 2. Pages: 329-41. James T Thorson, Kotaro Ono, Stephan B Munch et al. 06:42 2014/04/05
  • Quantitative Bioscience for the 21st Century: BioScience (2005). Volume: 55, Issue: 6. Pages: 511. Alan Hastings, Peter Arzberger, Ben Bolker, Scott Collins, Anthony R. Ives, Norman a. Johnson, Margaret a. Palmer et al. 05:23 2014/03/28

Entries

Notes

18 Apr 2014

EML

I’ve added a interface for custom units. You don’t have to mess around with additionalMetadata as it should be handled automatically. Just define the unit itself, like this 1:

  create_custom_unit(id = "metersSquaredPerHectare",
                     parentSI = "dimensionless",
                     unitType = "dimensionless",
                     multiplierToSI = "0.0001",
                     description = "Square meters per hectare")

and then use the id you give as the unit type in your unit.defs. create_custom_unit updates a custom_units list in the EMLConfig environment, which the eml or write_eml functions detect and use to write in the additional metadata. 2

If the custom unit is not defined and the session is interactive, EML will prompt for essential fields in the unit definition. Other optional fields are also described in the documentation.


  1. Note that I just use “dimensionless” as an illustration here, a the better choice might perhaps be to define an area/area unitType as @mbjones discusses in #12. (We’ll add support for OBOE semantics eventually, but that’s much more complex)

  2. This isn’t ideal, since the use of environments means the function has “side effects” and you could inadvertently include unit definitions you defined for some other reason in your EML file (if you didn’t do eml_reset_config() or start a fresh R session). That wouldn’t break anything technical, but would seem a bit strange to define units you didn’t use. The environment mechanism can be avoided by explicitly passing list of custom_units to the eml or eml_write functions.

Read more



Notes

17 Apr 2014

Units in EML

Overview of how units are determined in the EML package:

library(EML)
dat <- data.set(river = factor(c("SAC",  "SAC",   "AM")),
               spp   = c("Oncorhynchus tshawytscha",  "Oncorhynchus tshawytscha", "Oncorhynchus kisutch"),
               stg   = ordered(c("smolt", "parr", "smolt"), levels=c("parr", "smolt")), # levels indicates increasing level, eg. parr < smolt
               ct    = c(293L,    410L,    210L),
               day   = as.Date(c("2013-09-01", "2013-09-1", "2013-09-02")),

               stringsAsFactors = FALSE,

               col.defs = c("River site used for collection",
                            "Species scientific name",
                            "Life Stage",
                            "count of live fish in traps",
                            "day traps were sampled (usually in morning thereof)"),

               unit.defs = list(c(SAC = "The Sacramento River",                         # Factor, levels defined explicitly
                                  AM = "The American River"),
                                "Scientific name",                                      # Character string (levels not defined)
                                c(parr = "third life stage",                            # Ordered factor
                                  smolt = "fourth life stage"),
                                c(unit = "number", precision = 1, bounds = c(0, Inf)),  # Integer
                                c(format = "YYYY-MM-DD", precision = 1)))               # Date

The EML package provides a variety of interfaces to transform this into EML format, depending on the level of granularity desired. At the highest level, as user can directly call eml_write (aliased as write.eml to mimic other write file conventions),

eml_write(dat, "example.xml", contact = "Carl Boettiger <cboettig@ropensci.org")

This call works simply by calling the lower level functions. If dat is already the S4 object representation for eml or dataset object, they are dealt with directly. Otherwise, the function simply passes its arguments to the eml constructor function. The main difference between the eml constructor and write_eml is that the eml function returns an eml S4 object, while the write_eml function takes the additional step of transforming the S4 eml structure into XML and writing it to the desired file.

The eml function, also available to the user, has more optional named arguments than write_eml, though these can all be given to the higher level write_eml as well since they are passed through the ... mechanism. (Its default arguments illustrate calls to some of the lower-level constructors, such as eml_coverage, and will automatically try and read creator and contact from the configuration environment if they are not provided)

> eml
function (dat = NULL, title = "metadata",
          creator = get("defaultCreator", envir = EMLConfig),
          contact = get("defaultContact", envir = EMLConfig),
          coverage = eml_coverage(scientific_names = NULL,
                                  dates = NULL,
                                  geographic_description = NULL,
                                  NSEWbox = NULL),
          methods = new("methods"),
          ...,
          additionalMetadata = c(new("additionalMetadata")),
          citation = NULL,
          software = NULL,
          protocol = NULL)

Because all other arguments are optional, it is sufficient to call this function with the dat argument alone. The data object is allowed to be NULL if at least one of the other top-level alternatives to a dataset is provided: citation, software or protocol.

my_eml <- eml(dat)

This function follows the same logic as before: The function first constructs a unique identifier for the EML packageId. If dat is already an S4 dataset or dataTable object it is added immediately to a new("eml" object; otherwise a dataTable object is constructed with the next helper function, eml_dataTable. (These constructors prefaced with (eml_) are always just thin wrappers around the direct construction of these S4 objects with new("classname", ...), and exist just to simplify certain things wich are usually and frequently automated, such as creating unique ide elements.)

So far our data.set object dat is just passed unchanged from write_eml to eml to eml_dataTable.

my_dataTable <- eml_dataTable(dat)

This constructor is the first to peer inside the dat object, extracting metadata for the attributeList elements. (The dat object is also passed to the eml_physical constructor, which will use the actual data.frame to write the csv file.)

The metadata extraction is performed in two steps. First, the helper function detect_class extracts a list of the necessary metadata from the data.set. Then this list is coerced into an EML attributeList (Note in this review it becomes clear that the coercion is not a flexible and robust way to handle this, so this task is now performed by eml_attributeList and in turn, eml_attribute, following the same logic as above.) Currently, detect_class takes the legacy format of having a data.frame and a list of meta objects, structured as column name, col definitions, and unit defintions (each as character vectors, like in a data.set. Here is where things get tricky. detect_class uses the declared class of the column to decide how to interpret the column and unit metadata, using the following mapping:

numeric or integer : ratio
ordered factor: ordinal/enumeratedDomain
factor : nominal/enumeratedDomain
POSIXlt, POSIXct, Date : dateTime
character : nominal/textDomain

Looks like to would be best for eml_attribute to handle this mapping itself, particulary since the different conventions bifurcate at different spots (e.g. we must know if a nominal is enumerated or text). This could also allow for finer handling of optional unit information, such as the bounds or precision.

Read more



Notes

11 Apr 2014

labnotebook

Updates to configuration to be more portable and lightweight:

  • Drop APIs for Mendeley and Github in favor of just pulling from atom/rss feeds. Doesn’t require a platform-specific API or authentication, and has proven more robust, if sometimes less flexible and fancy.
  • Drop git-modified and sha timestamps. It’s nice having these on the pages and included in the rdfa, but the information is accessible from the git repository and from Github. The overhead time in computing these values and successfully including them wasn’t worth it.
  • However, add a sha for the vita page using a simple filter.
  • Wishing I could just embed svgs into md output as XML, instead of as binary. See full description of issue: yihui/knitr#754.

  • Installed vim-pandoc’s dev versions, vim-pantondoc and vim-pandoc-syntax via pathogen; seem to be working rather nicely.

reproducible paper example

from richfitz mwpennel and co: A manuscript with examples executed by makefile (or R source script) and checked and built by travis: richfitz/wood. Wow. Particularly nice jobs in:

  • handling downloading of raw data from original (Dryad-based) sources. Makefile seems to handle doing this in a cache-friendly way that is lighter on Travis network.
  • providing the cleaned, process data as seperate data files in clearly marked output directory. Still, the output would be more modular if done in some more generic format than R’s binary serialization as rds files. They do also provide data on Dryad, which is presumably the processed data in a more sensible format, though their data DOI, 10.5061/dryad.v7m14, is still embargoed in advance of publication at the time of writing.
  • Travis checks not only that code runs, but that it downloads and that the manuscript and figures can be complied (by LaTeX), with supplement knitted and compiled as html.
  • in a very nice touch, Travis pushes html and pdf results back to Github on gh-pages.

Some very minor nitpicks:

  • manuscript isn’t strictly a literate (knitr) document, though figures are being produced on the fly from code by make. May be a sensible choice (though misses a few opportunities for dynamic inline values).
  • license not declared, but already coming soon richfitz/wood#8
  • discussion of citation of code: richfitz/wood#11
  • trouble in executing locally, see richfitz/wood#12

rfishbase

Read more



Multiple Uncertainty Notes

10 Apr 2014

multiple uncertainty

  • Set up Paul Fackler’s MDPSOLVE and run Paul’s code. Installing user-contributed extensions in matlab seems mostly a matter adding the unzipped source directories to the path, e.g. as so:
currentdir = cd
cd /home/cboettig/.matlab/MDPSOLVE
addpath(genpath(cd))

cd /home/cboettig/.matlab/plot2svg
addpath(genpath(cd))

cd(currentdir)

(I still find matlab/octave’s syntax of not distinguishing character strings from variable names from function names kind of terrifying, but I guess that’s the price of working with economists.)

Code runs, with svg output using the user-contributed svg function plot2svg since Matlab doesn’t seem to include such basic functionality (though works fine on octave).

ropensci

Morning mostly spent wrapping up proposal steps

  • Pandoc-latex formatting stuff (xelatex, Times New Roman 12pt, doublespacing, makefile configuration, latex-template configuration, margins, yaml header metadata).

  • Reworking introduction into seperate sections

  • feedback on further issues through issue tracker.

Reading

  • Further evidence that anything can happen with regards to transitions in spatial dynamics: Gowda et al, Phys Rev E

Read more



Multiple Uncertainty Notes

09 Apr 2014

Attempt to replicate Figure 3, but with one noise at a time

  • Code: one_noise_at_a_time.m
  • Uniform Noise
  • Large noise means devation of 0.5, as in Sethi
  • Other noise terms are 0 instead of 0.1

Figure 3

  • Code: carl_fig3.m
  • Uniform Noise
  • Large noise means devation of 0.5, as in Sethi
  • Small noise means 0.1, as in Sethi

Now fixed. See:


One noise at a time, lognormal noise, sigma = 0.3. (code)

Read more



Multiple Uncertainty Notes

08 Apr 2014

Growth Noise Only

Run:

matlab -nodesktop < testing.m > testing.log

Results from running testing.m:

  • log normal noise
  • sigma_g = 0.2
  • Other noise set to zero.
  • Coarse grid 0:5:150
  • (See linked code all parameters)
Growth noise only
Growth noise only

measurement Noise only

measurement noise only
measurement noise only

Implementation Noise only

Implementation noise only
Implementation noise only

  • Thinking about the best way to embed SVGs in the notebook, see docs/21
  • Thoughts on date metadata, see docs/20

Read more



Notes

07 Apr 2014

ropensci

Working on proposal. Discussion of section 2: other players in the domain (or rather, the many community interfaces with rOpenSci).

other

rfishbase

Add FAO areas (from 04/05), see rfishbase/20

EML

  • Discussion of unit handling, feedback from Karen. EML/12

  • Need to update creation of filenames for csv files following Matt’s advice: EML/106

  • No luck with XMLSchema::readSchema parsing the measurement unit schema definitions in STMML

> XMLSchema::readSchema("../stmml.xsd")
Error in FUN(X[[1L]], ...) :
  formal argument "localElements" matched by multiple actual arguments

misc/reading

  • teary: discuss interviews. send copy of materials for archive.
  • review request EcoLet
  • follow-up emails from Wisconsin visit. Follow-up reading:

  • RDCEP

  • Here’s the RDCEP robust control paper

  • Here are two papers on the value of experimentation, 1, 2. Key references are the two papers by V. Wieland cited there. Steve and Buz discussed one of these in “Panceas” paper

  • Here’s a link to the BDW Brookings paper which has an interesting discussion by Sargent and Sims

  • This one proposes an approach to reporting to policy makers that respects uncertainties

  • Here is one we did for the World Bank to deal with uncertainties in growth policy

  • Since optimal control and even robust control can lead to controls that are complicated functions of the state vector, there’s a literature in applications to central banking that stresses “simple rules”, here’s a recent example from our shop that has lots of references,

  • Finally, on EWS’s, there’s a general multivariate mathematical approach to variance-convariance matrix approaches to EWS’s in the Mathematical Appendix to this paper by Biggs et al.

Read more



Wisconsin Visit

03 Apr 2014

Carl Boettiger Schedule 3-4 April

Thursday

Person/activity Time Location
Arrive 3:00 Dane County Airport
Tony 4:00 pickup at hotel
Jesse Miller 4:30 Birge 448
Kyle Webert 5:00 Birge 457
Cristina Herren 5:30 Birge 456
Dinner with Tony, Karen Strier 6:00 leave from Birge Hall

Friday

Person/activity Time Location
Breakfast, Meghan Fitzgerald and Fan Huan 7:30 pickup at hotel
Cecile Ane, Lam Si Tung Ho 9:00 Birge 341
Steve Carpenter and CFL folk 10:45 CFL 226A
Jeremy Ash 2:00 Birge 338
Break, walk over with Jacob 2:30 Birge 460
prep for seminar 3:00 Nolan Hall
seminar 3:30 Nolan Hall
reception 6:00 Tony’s house

Misc notes:

  • See email from Buz for robust control reading. email to Karen.

Read more



Ropenhack Notes

31 Mar 2014

Breakfast

  • discussion of what makes an API good or a total pain
  • discussion of extending data.set class with metadata

Morning Session

Chatting with Hadley:

Best practices for R client packages to REST APIs

  • (When) Do we return response call metadata?
  • Very modular function construction
  • Error handling, assertthat checks
  • Explicit parsing, as seperate function. e.g.
  • Better alternatives in place of require when loading SUGGESTS lists, which still pollutes the namespace; see examples in httr.

We’ve begun writing dependencies package to address some of the issues in managing version dependencies of R packages.

Afternoon Session

Working with Martin and Gavin on docs workflow.

One challenge of this setup is passing secure data over travis, such as the keys necessary to execute the git push. Travis has a mechanism for encrypting and transmitting these keys.

Otherwise, working in simplifying and sanitizing the separation between ruby and knitr interfaces in providing a dynamic coding static blogging platform.

Also various interactions and video interview for event documentation.

Evening

Excellent social; discussions with Josh Bloom re BIDS etc

Wednesday

Mostly hacking on docs workflow, see commit log etc.

Read more