Stony Brook University

Research Data Guide

Resources to help you manage your research data.

File Names and Variables

File Names

File names should:

  1. Represent the contents of the file. Though file names shouldn't be excessively long, wherever possible they should capture the key information about file data.
  2. Have clear, intuitive names where possible. You might know what "DV" means right now, but it will be helpful to future researchers if you specify "DataValidation" in your file name.
  3. Allow for the full scope of your data set. Files will sort neatly if all numerals have the same number of digits—for example, if you have a few thousand soil samples, use a name like SoilSample0001 instead of SoilSample1. 
  4. Be unique. You don't want to end up with multiple files named Samples.xlsx, even if they are in different folders to start.
  5. Not include characters other than numbers, letters, and underscores. Special characters can cause problems when migrating data, even if they are permitted in the system where the file is created.
  6. Be consistent and recordable. Prepare documentation so that others can easily parse your filenames!
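
As a rough illustration of points 3 and 5, the Python sketch below builds zero-padded names and checks that a name uses only letters, numbers, and underscores. The function names (build_sample_name, is_portable_name) and the choice to allow one simple extension such as .csv are assumptions made for this example, not part of any particular tool.

```python
import re

# Assumption for this sketch: a name is "portable" when its stem contains only
# letters, digits, and underscores, followed by one simple extension.
ALLOWED_NAME = re.compile(r"[A-Za-z0-9_]+\.[A-Za-z0-9]+")


def build_sample_name(prefix: str, number: int, total: int, extension: str = "csv") -> str:
    """Build a file name padded with zeros so that `total` samples sort neatly."""
    width = len(str(total))  # e.g. 3000 samples -> 4 digits
    return f"{prefix}{number:0{width}d}.{extension}"


def is_portable_name(file_name: str) -> bool:
    """Return True if the name avoids spaces and special characters."""
    return ALLOWED_NAME.fullmatch(file_name) is not None


# Padded names sort in numeric order; unpadded names do not.
padded = [build_sample_name("SoilSample", n, total=3000) for n in (1, 2, 10)]
unpadded = ["SoilSample1.csv", "SoilSample2.csv", "SoilSample10.csv"]
print(sorted(padded) == padded)      # True
print(sorted(unpadded) == unpadded)  # False: SoilSample10 sorts before SoilSample2

print(is_portable_name("SoilSample0001.csv"))  # True
print(is_portable_name("Soil Sample #1.csv"))  # False
```

Running a check like this before sharing or migrating files can catch problem names early.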

Here's an example of a well-constructed, well-documented file name:

MaizeRootCarbon_012_056_af_0423_raw.csv
MaizeRootCarbon = experiment name
012   = experiment number
056   = sample number
af   = stain used, acid fuchsin
0423  = image coordinates, two digits each (04 across, 23 down)
raw  = data stage
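
A documented naming scheme like this one can also be read by a script. The following Python sketch splits the example name into its parts. The ImageFileName type and parse_file_name function are hypothetical names chosen for this illustration, and the sketch assumes that every file follows the six-part pattern shown above.

```python
from typing import NamedTuple


class ImageFileName(NamedTuple):
    experiment_name: str
    experiment_number: str  # kept as text to preserve leading zeros
    sample_number: str      # kept as text to preserve leading zeros
    stain: str
    column: int             # position across
    row: int                # position down
    data_stage: str


def parse_file_name(file_name: str) -> ImageFileName:
    """Split a name such as MaizeRootCarbon_012_056_af_0423_raw.csv into its parts."""
    stem, _, _extension = file_name.rpartition(".")
    parts = stem.split("_")
    if len(parts) != 6:
        raise ValueError(f"expected 6 underscore-separated parts in {file_name!r}")
    name, experiment, sample, stain, coordinates, stage = parts
    if len(coordinates) != 4 or not (coordinates.isascii() and coordinates.isdigit()):
        raise ValueError(f"expected 4-digit coordinates, got {coordinates!r}")
    return ImageFileName(
        experiment_name=name,
        experiment_number=experiment,
        sample_number=sample,
        stain=stain,
        column=int(coordinates[:2]),
        row=int(coordinates[2:]),
        data_stage=stage,
    )


parsed = parse_file_name("MaizeRootCarbon_012_056_af_0423_raw.csv")
print(parsed.experiment_name, parsed.column, parsed.row, parsed.data_stage)
```

A name that does not fit the pattern raises an error, which is one way to find files that drifted from the documented convention.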

Variables

Variables should: 

  1. Have human-readable and understandable names. You might know what the variable "UPID" refers to right now, but it will be helpful to future researchers if you specify "University Personal Identification Number" as you record your data.
  2. Have a clear meaning. For example, a variable labeled "senior" is human-readable, but could be recording a number of different pieces of information: is the subject a senior citizen? Senior researcher? Senior in high school? What is the standard for any of these things? A variable name like "Over_65" is more meaningful.
  3. Be measured and recorded consistently. Do not measure in ounces in one experiment and in grams in another. Do not record "Last Name, First Name" in one experiment and "Last Name, First Initial" in another.

You can help keep your variable usage clear by using a data dictionary. Data dictionaries are documents that include variable names along with their descriptions, data types (such as date or integer), units of measurement, possible values, etc.
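
A data dictionary can be as simple as a table or spreadsheet. To show the idea, here is a minimal Python sketch of a dictionary kept alongside the data and used to check a record. The variable names (participant_id, over_65, mass_g), their descriptions, and the check_record function are all invented for this example. A real project would define its own.

```python
# A hypothetical data dictionary: one entry per variable, with a description,
# a data type, and (where relevant) a unit of measurement or allowed values.
DATA_DICTIONARY = {
    "participant_id": {
        "description": "Identifier assigned to each participant",
        "type": str,
    },
    "over_65": {
        "description": "Whether the participant was over 65 at enrollment",
        "type": bool,
        "allowed": {True, False},
    },
    "mass_g": {
        "description": "Sample mass",
        "type": float,
        "unit": "grams",
    },
}


def check_record(record: dict) -> list[str]:
    """Return a list of problems found when comparing a record to the dictionary."""
    problems = []
    for name, spec in DATA_DICTIONARY.items():
        if name not in record:
            problems.append(f"{name}: missing")
            continue
        value = record[name]
        if not isinstance(value, spec["type"]):
            expected = spec["type"].__name__
            problems.append(f"{name}: expected {expected}, got {type(value).__name__}")
        elif "allowed" in spec and value not in spec["allowed"]:
            problems.append(f"{name}: {value!r} is not an allowed value")
    for name in record:
        if name not in DATA_DICTIONARY:
            problems.append(f"{name}: not defined in the data dictionary")
    return problems


good = {"participant_id": "P001", "over_65": False, "mass_g": 12.5}
bad = {"participant_id": "P002", "over_65": "yes", "senior": True}
print(check_record(good))
print(check_record(bad))
```

In this sketch the second record is flagged for a text value where true/false is expected, a missing mass, and an undefined variable called "senior".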

Data Standards

Using standards as you collect your data will help you and any future researchers trying to make sense of your data. Data standards are specific to a particular type of experiment or field of study. They can define:

  • What data to collect. Are there specific metadata or variables you need to record? 
  • How to represent that data. What variable name structure should you use?
  • What vocabulary to use. Is there a controlled vocabulary or ontology relevant to your needs?
  • How to communicate the data. Should a particular data model be used to transmit the information?

A standard can involve any or all of these components. There is no standard for standards, and many thousands of data standards have been developed. A standard has very limited utility unless others in the field use it, so before adopting one, find out whether it has widespread support or is, for example, used only by the lab that developed it.

Where to find standards

Lab Notebooks

Electronic lab notebooks can help provide infrastructure for organizing and managing your data. There are many different lab notebooks on the market, all with different affordances. What works for one researcher or lab may not work for another: Harvard Medical School has developed an expansive comparison grid to help you evaluate lab notebooks.