Research Data: Defining Data

What is Research Data?

Funding agencies have policies governing how research data are archived and made available. But what do they mean by “data”?

The United States Office of Management and Budget defines data in the following way: 

"Research data is defined as the recorded factual material commonly accepted in the scientific community as necessary to validate research findings, but not any of the following: preliminary analysis, drafts of scientific papers, plans for future research, peer reviews, or communications with colleagues. This "recorded" material excludes physical objects (e.g., laboratory samples). Research data also do not include:

  • (A) Trade secrets, commercial information, materials necessary to be held confidential by a researcher until they are published, or similar information which is protected under law; and
  • (B) Personnel and medical information and similar information the disclosure of which would constitute a clearly unwarranted invasion of personal privacy, such as information that could be used to identify a particular person in a research study." (emphasis added)

In a practical context...

Consider which files and information you would need to give someone who wanted to validate your published research findings. You cannot, and should not, save everything; prioritize the subset of your data that meets the definition above. For example, raw data files may be critical during the initial processing phase of a research project, but they might no longer be needed once the data have been converted to a more workable format (say, binary instrument output converted to ASCII). Since you cannot save and manage every digital bit you have ever collected, ask what you or others would need in order to reproduce your results.
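
One way to decide whether a raw binary file can be set aside is to confirm that the converted ASCII copy carries the same information. The Python sketch below illustrates that check under stated assumptions: the record layout (a 32-bit timestamp followed by a 64-bit reading), the column names, and the function names are all hypothetical, and a real instrument format would need its own layout.

```python
import csv
import io
import struct

# Hypothetical record layout: little-endian uint32 timestamp + float64 reading.
RECORD = struct.Struct("<Id")


def binary_to_ascii(raw: bytes) -> str:
    """Convert packed binary records to CSV text with a header row."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["timestamp", "reading"])
    for timestamp, reading in RECORD.iter_unpack(raw):
        # repr() yields the shortest string that round-trips the float exactly.
        writer.writerow([timestamp, repr(reading)])
    return out.getvalue()


def ascii_to_binary(text: str) -> bytes:
    """Rebuild the packed records from the CSV text (used only as a check)."""
    rows = csv.reader(io.StringIO(text))
    next(rows)  # skip the header row
    return b"".join(RECORD.pack(int(t), float(r)) for t, r in rows)


# Made-up sample values standing in for real instrument output.
raw = b"".join(RECORD.pack(t, r) for t, r in [(0, 20.5), (1, 20.75), (2, 0.1)])
text = binary_to_ascii(raw)

# If the round trip reproduces the original bytes, the ASCII copy is lossless.
assert ascii_to_binary(text) == raw
```

If the round trip succeeds, the ASCII file alone may be enough to validate your findings. If it fails, the conversion loses information, and the raw file is probably part of the data worth keeping.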