Thoughts on the New Policies for Data Archiving at NSF and in Common Journals

This post is a work in progress, a scratch pad where I am assembling what I’ve been learning, along with resources, about the new data archiving policies emerging from NSF and from journals relevant to ecology and evolution.  I hope to highlight not only the policies, but also the issues, opportunities, and concerns around them.

I am hoping to help organize a workshop to discuss these issues in my department this Winter.  The Davis Open Science group is planning a series of these workshops, hoping to work with departments, the libraries, and the UC Davis Office of Research and its Responsible Conduct of Research program (in compliance with NIH/NSF ethics requirements), as well as resident faculty and editors.  Suggestions on what you would like to see in such a workshop are much appreciated.

What are the new NSF policies?

Each discipline has (or will have; see notes and refs by Heather Piwowar) its own guidelines, but the basic gist is perhaps best summarized in an excerpt from this NSF statement:

This supplement should describe how the proposal will conform to NSF policy on the dissemination and sharing of research results (see AAG Chapter VI.D.4), and may include:

  1. The types of data, samples, physical collections, software, curriculum materials, and other materials to be produced in the course of the project;

  2. the standards to be used for data and metadata format and content (where existing standards are absent or deemed inadequate, this should be documented along with any proposed solutions or remedies);

  3. policies for access and sharing including provisions for appropriate protection of privacy, confidentiality, security, intellectual property, or other rights or requirements;

  4. policies and provisions for re-use, re-distribution, and the production of derivatives; and

  5. plans for archiving data, samples, and other research products, and for preservation of access to them.

Data management requirements and plans specific to the Directorate, Office, Division, Program, or other NSF unit, relevant to a proposal are available at: https://www.nsf.gov/bfa/dias/policy/dmp.jsp. If guidance specific to the program is not available, then the requirements established in this section apply. [….] The Data Management Plan will be reviewed as an integral part of the proposal, coming under Intellectual Merit or Broader Impacts or both, as appropriate for the scientific community of relevance.

What are the new requirements being set by journals?

The American Naturalist

This journal requires, as a condition for publication, that data supporting the results in the paper should be archived in an appropriate public archive, such as GenBank, TreeBASE, Dryad, or the Knowledge Network for Biocomplexity. Data are important products of the scientific enterprise, and they should be preserved and usable for decades in the future. Authors may elect to have the data publicly available at time of publication, or, if the technology of the archive allows, may opt to embargo access to the data for a period up to a year after publication. Exceptions may be granted at the discretion of the editor, especially for sensitive information such as human subject data or the location of endangered species.

Whitlock, M.C. et al. Data archiving. The American Naturalist 175, 145-6 (2010).

Evolution has a similar policy, outlined in:

Fairbairn, D.J. The Advent of Mandatory Data Archiving. Evolution (2010) (preprint)

Journals with policy in place (links and editions coming)

  • The American Naturalist (see Whitlock, 2010; above)

  • Evolution (see Fairbairn, 2010; above)

  • Journal of Evolutionary Biology (as mentioned in Whitlock)

  • Molecular Ecology (as mentioned in Whitlock)

  • Heredity (as mentioned in Whitlock)

  • ESA journals: Ecology, Ecological Monographs, Ecological Applications, and Frontiers.  ESA currently has a position on the Dryad board and a soft data deposition requirement.  It also maintains the Ecological Archive. (pers. comm.)

What are the available archives?

There’s a variety of potential archives.  Identifying the best archive for the type of data involved may be complicated.

Funding Models & Concerns

Cost is a key concern: good archiving takes resources, and does little good if repositories aren’t maintained or persistent.  While the repositories are supported by their own grants (for the moment), it seems they will generally charge the journals to which the publication accompanying the data is submitted.  The journals may in turn charge the authors, who would presumably have to write these costs into their NSF grants.

Like the policies themselves, these details are still being actively worked out.  Todd Vision, associate director of informatics at NESCent, which currently oversees the Dryad project, discusses this in this FF forum.

Good references on funding models include these papers by Beagrie and others:

Why Archive?  What’s the added value of Repositories?

Dryad has a good explanation of what happens when data is deposited.

The need for data archiving in ecology and evolutionary biology has been persuasively argued in articles in a variety of our journals for some time.  See:

Impact of Sharing and Citing Data

In addition to funding, primary concerns involve how data will be shared and cited.  When this is done well, researchers are appropriately credited and rewarded when their data is re-used, and research becomes faster, supported by larger data sets, more reproducible, and broader in impact.  A set of articles highlights many of these issues:

Data citation is closely tied to the discussion of data licenses, such as CC licenses that allow conditional reuse, vs content in the public domain that can be used unconditionally. It seems data is not generally viewed by the law as a creative work but as fact, and thus does not fall under intellectual property and copyright licenses. Acknowledging this, repositories such as Dryad designate data as Creative Commons Zero (CC0), which is essentially an intentional decision to put data into the public domain. See more discussion of when this applies and when data can be CC-BY, etc., and Wilbanks’ explanation.

Following the Developments; Sources

The Data Citation collection on Mendeley provides an actively maintained collection of articles discussing practices for depositing, citing, incentives, and tools for data management and publication.

Discussion continues on a few FF forum threads, which have helped assemble this information:

1 2

EDIT

Dryad maintains an excellent list of the growing number of Ecology and Evolution partner journals that mandate archiving.