Notes

Assembling various loose ends and some ropensci work.

Misc things to work out and write at some date

Catch up on literature reading (unsorted list in Mendeley). Get entries into notebook.
Write thoughts on selecting links anchor text in the notebook
knitr script workflow: externalizing chunks, cache and figure prefixes.
branching projects and unique repositories
source code for all manuscripts into repositories
clean up directories. in particular, establish separate prosecutors-fallacy repository
Comments on workflow: Date-oriented (notebook) vs File-oriented (Github) explanation. (Or is Github both, via history button, and the notebook just the summary that links to it?)

ropensci

As per yesterday’s announcement, we are officially funded by a generous grant from the Alfred P. Sloan Foundation!

Today:

Writing feedback on goals, comments to advisors

taxize

My friendly review of Scott’s taxize manuscript (pdf)

ropensci website

Scott has a good question about automatic TOC sidebars.

Can generate TOC with redcarpet filter, but it grabs all headers. Unclear how to modify so that it grabs only headers above a certain level. See my attempt as an issue in redcarpet repo. Similarly pandoc only provides --toc as a compile flag, which does not give us the same control over it’s placement in the HTML.

misc semantics

Interesting example of RDF visualization and linking to external linked-data resources. In particular, good use/example of linking to DBpedia information. See issue #63

Misc

Open peer review on copernicus platform
Smith fellows 2014 applications open
Neil’s five stars of research software
Titus’s academic software notsuck
Nature Chemistry’s take on Twitter for publishers
A Nature Chemistry blog reflects on Pasteur’s Quadrant
G8 Science ministers call for global research infrastructure, open scientific research data and expanded access to results

How I use git with svn repositories

I don’t use SVN much, but in my opinion “commits” mean something different in SVN then in Git anyway, so I’d rather not entangle the commit history. Because merging is hard in SVN and all local changes are slave to the master central repository, commits tend to be larger, more polished things. Commits and merges are cheap and easy in git, so I commit much more often. By having to commit twice, I am deliberate about whether I’m committing to SVN, with all the overhead that involves, or whether I’m just committing to git.

I think my approach is much easier than this

My replies to data quality survey

What constitutes data quality?

Statistical quality Measurement error, the degree to which potential covariates are either controlled for or co-measured, and the sample size.

Computational quality Open and standardized format (csv, xml in defined schema of the field, etc) Thorough metadata conforming to tightly defined and verifiable schema. Terms defined as URIs / linked data. Openly accessible under permissive licensing.

How might data quality be assessed generally?

Statistical quality depends on what the researcher wants to do with it, and is generally easy for the researcher to assess. All the elements listed under statistical quality should simply be included in metadata, and researchers can draw their own conclusions about sample size, covariates, etc.

Computational quality measurement should reflect data management practices more broadly. Something like Tim Berners Lee’s five-star mechanism for linked data, or badges that indicate if data includes verification tools, etc.

DataCite’s Role?

Good quality data should be incentivised the same way good publications are. If good quality data can make a career, researchers will rise to the challenge with a little help for proofreading and formatting. Ease of depositing data in a useful format and sharing that data, along with metrics to reflect the impact it has, can go a long way in this regard.