LibraryGuides: Research Data Management: Sharing Data & Data Repositories

What is Data Sharing?

Data sharing is the practice of making research data accessible to other stakeholders, including investigators, research subjects, and the broader public. This often involves submitting data to repositories like Harvard Dataverse, Figshare, or institutional repositories, allowing others to discover and use it. Increasingly, funding agencies, publishers, and research institutions mandate or strongly encourage data sharing to promote transparency, enhance research reproducibility, and increase the impact of research. Proper data citation and publication also ensure that the original researchers receive appropriate credit for their contributions.

What is a Data Repository?

Data repositories are tools for sharing and preserving research data. There are hundreds of repositories worldwide. Some cater to a specific research community, while others are general-purpose. Repositories may be called data centers, data archives, or scientific databases.

They are often divided into three categories:

Institutional Repositories (IRs) are affiliated with a researcher’s institution.

Domain-specific or Disciplinary Repositories (DRs) are discipline-specific and often operated by a professional organization, a consortium of researchers, or a similar group.

General-purpose or Open Repositories (ORs) allow researchers to deposit and make their data available regardless of disciplinary or institutional affiliation.

Finding Data Repositories

Re3data.org
Re3data is a registry of data repositories, covering a wide range of disciplines from around the world. It allows researchers to search for repositories in their discipline and to identify relevant policies and terms of use.
FAIRSharing Catalog
FAIRSharing is a curated catalog of databases, along with associated standards and policies. It also includes standards and databases recommended by journal or funder data policies.
Scientific Data Recommended Repositories
A list of disciplinary and open repositories evaluated to ensure that they meet the data access, preservation and stability requirements of Nature's Scientific Data journal.
NIH Data Repositories
National Institutes of Health-supported data repositories that make data accessible for reuse. Most accept submissions of appropriate data from NIH-funded investigators (and others), but some restrict data submission to only those researchers involved in a specific research network.

Disciplinary Repositories

Disciplinary data repositories are set up to accommodate the data needs of a specific research community. They are the most likely to offer both the specialist domain knowledge and the data management expertise needed to ensure data are properly kept and used.

They may provide the ideal solution to meet data archiving and public access expectations of funding agencies, publishers, and the researcher community. However, they are also the most likely to be selective, requiring advance planning to meet standards for metadata and documentation.

Using Disciplinary Data Repositories
Since there are many data repositories, it is important to review terms and conditions before use.

1. Is the repository reputable and who supports it?
It may be listed in re3data or FAIRSharing, or be broadly recognized by the research community. Better yet, it may be endorsed by a journal, funder, or professional society.

2. Will it take data you want to deposit and how are data deposited?
Data may need to be of a particular type and file format. Some repositories allow self-deposit while others mediate deposit.

3. Will the repository be safe in legal terms?
Some repositories may be capable of safely storing sensitive or restricted data, while others may not. Ideally the repository allows depositors to assign terms of use and licenses.

4. Will the repository sustain the data value?
A repository can add value by making data findable, accessible, interoperable and reusable (FAIR) for the long term. This includes assigning persistent identifiers (like DOIs) to datasets, requiring standard metadata for discoverability, and conducting file preservation activities.

5. Will it support analysis and track data usage?
Repositories may also provide citation information to users and usage tracking for the depositor.

From: Whyte, A. (2015). ‘Where to keep research data: DCC checklist for evaluating data repositories’, v.1. Edinburgh: Digital Curation Centre. Available online: www.dcc.ac.uk/resources/how-guides
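
When comparing several candidate repositories, it can help to record the answers to the five questions above in a consistent form. The following is a minimal sketch of one way to do that, not part of the DCC checklist itself. The keys (`reputable`, `accepts_data`, and so on), the `RepositoryReview` class, and the "Example Repository" entry are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field

# The five checklist questions from this guide. The short keys are
# hypothetical labels chosen for this sketch.
CHECKLIST = {
    "reputable": "Is the repository reputable and who supports it?",
    "accepts_data": "Will it take data you want to deposit and how are data deposited?",
    "legally_safe": "Will the repository be safe in legal terms?",
    "sustains_value": "Will the repository sustain the data value?",
    "tracks_usage": "Will it support analysis and track data usage?",
}


@dataclass
class RepositoryReview:
    """Records yes/no answers to the checklist for one repository."""

    name: str
    answers: dict = field(default_factory=dict)  # checklist key -> True or False

    def unanswered(self):
        """Checklist keys that have not been reviewed yet."""
        return [key for key in CHECKLIST if key not in self.answers]

    def concerns(self):
        """Full text of every question answered 'no'."""
        return [CHECKLIST[key] for key in CHECKLIST if self.answers.get(key) is False]

    def passes(self):
        """True only when every question has been answered 'yes'."""
        return all(self.answers.get(key) is True for key in CHECKLIST)


# Hypothetical review: three questions answered, one of them 'no'.
review = RepositoryReview(
    name="Example Repository",
    answers={"reputable": True, "accepts_data": True, "legally_safe": False},
)

print("Still to check:", review.unanswered())
print("Concerns:", review.concerns())
print("Meets all criteria:", review.passes())
```

A simple yes/no record like this is deliberately coarse. In practice, each answer would likely need notes on the repository's actual terms and conditions.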

Generalist Repositories

Generalist repositories accept data regardless of data type, format, content, or disciplinary focus. Subject-specific repositories are preferred if an appropriate one is available for your research topic or data type.

Generalist Repository Comparison Chart
Compares size limits, licensing options, costs, etc.
Best Practices for Data Submission in Generalist Repositories: A Checklist
Harvard Dataverse
Harvard Dataverse Repository is a free data repository open to all researchers from any discipline, both inside and outside of the Harvard community, where you can share, archive, cite, access, and explore research data.
Dryad
Dryad is an open data publishing platform and community committed to the open availability and routine re-use of all research data. Dryad fully curates all data and metadata and publishes exclusively under a Creative Commons Public Domain License (CC0).
Figshare
Figshare is a freely available open data publishing platform for all researchers where they can share and get credit for all types of scholarly output including any file type from any research discipline. The Figshare+ repository supports sharing of larger datasets.
Mendeley Data
Mendeley Data is a free repository specialized for research data. Users can search more than 20 million datasets indexed from thousands of data repositories, and collect and share datasets with the research community following the FAIR data principles.
Open Science Framework (OSF)
OSF is a free and open source project management tool that supports researchers throughout their entire project lifecycle in open science best practices.
Vivli
Vivli is an independent, non-profit organization that has developed a global data-sharing and analytics platform. Its focus is on sharing individual participant-level data from completed clinical trials to serve the international research community.
Zenodo
Zenodo is an open repository built on open source software, by researchers for researchers. It runs from the CERN data centre, whose purpose is the long-term preservation of digital objects. CERN maintains one of the largest scientific datasets in the world for high-energy physics.

Librarian

Data Services - Phoenix
