Stony Brook University

Research Data: File Organization

What Should I Focus on When Organizing Data?

When you start your research, you need to make some fundamental decisions, and data organization should be among them. The choices you make will vary with the type of research you do, but everyone must address the same issues. Consider the following as you organize your data:

  • File version control (see Data Organization Tools)
  • Directory structure & file naming conventions (see below)
  • File naming conventions for specific disciplines (see the section of that name)
  • File structure
  • Use the same structure for backups

File Naming Best Practices

File names should provide context for the files that they name, and distinguish them from files that may be similar. Many files are used independently of their file or directory structure, so provide sufficient description in the file name.

1. Be consistent

  • Have conventions for naming:
    • Directory structure
    • Folder names
    • File names
  • Always include the same information (e.g. date and time)
  • Retain the order of information (e.g. YYYYMMDD, not MMDDYYYY)
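
One practical reason to keep the year first is that a plain alphabetical sort then matches chronological order. The following is a minimal Python sketch; the dates and the "run_" prefix are made up for illustration only.

```python
from datetime import date

# Hypothetical acquisition dates, for illustration only.
dates = [date(2021, 3, 15), date(2020, 11, 2), date(2021, 1, 30)]

# Year-first stamps: alphabetical order equals chronological order.
iso_names = sorted(f"run_{d:%Y%m%d}.csv" for d in dates)

# Month-first stamps: alphabetical order groups by month, not by year,
# so the 2020 file ends up last.
us_names = sorted(f"run_{d:%m%d%Y}.csv" for d in dates)

print(iso_names)
print(us_names)
```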

2. Be descriptive so others can understand your meaning.

Try to keep file and folder names under 32 characters.

Within reason, include relevant information such as:

  • Unique identifier (e.g. project name or grant number in the folder name)
  • Project or research data name
  • Conditions (Lab instrument, Solvent, Temperature, etc.)
  • Run of experiment (sequential)
  • Date (in file properties too)
  • Use lowercase, application-specific three-letter file extensions: mov, tif, wrl
  • When using sequential numbering, use leading zeros to allow for multi-digit versions. For example, a sequence of 1-10 should be numbered 01-10; a sequence of 1-100 should be numbered 001-100 (001, 010, 100).
  • Do not use special characters: & , * % # ; ( ) ! @ $ ^ ~ ' { } [ ] ? < > -
  • Use only one period, placed before the file extension (e.g. name_paper.doc,
    NOT name.paper.doc OR name_paper..doc)
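
The rules above are easy to check automatically before files are shared. The sketch below is one possible checker, not a standard tool: the function names check_name and numbered are hypothetical, and the pattern is deliberately stricter than the list above (it allows only letters, digits, and underscores, which also rules out spaces).

```python
import re

MAX_LENGTH = 32  # guideline above: keep names under 32 characters

# Assumption: letters, digits, and underscores, then one period and a
# lowercase three-character extension. Relax this to suit your project.
VALID_NAME = re.compile(r"[A-Za-z0-9_]+\.[a-z0-9]{3}")


def check_name(name):
    """Return a list of guideline violations (empty if the name passes)."""
    problems = []
    if len(name) >= MAX_LENGTH:
        problems.append("name is 32 characters or longer")
    if name.count(".") != 1:
        problems.append("name must contain exactly one period")
    if not VALID_NAME.fullmatch(name):
        problems.append("name has disallowed characters or a bad extension")
    return problems


def numbered(stem, index, total, ext):
    """Zero-pad index to the width of total, e.g. 7 of 100 becomes 007."""
    width = len(str(total))
    return f"{stem}_{index:0{width}d}.{ext}"


print(check_name("name_paper.doc"))
print(check_name("name.paper.doc"))
print(numbered("run", 7, 100, "tif"))
```

Returning a list of problems rather than a single True/False makes it easier to report everything that is wrong with a name in one pass.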

Example: Project_instrument_location_YYYYMMDD[hh][mm][ss][_extra].ext
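
Generating names from a single function helps keep the pattern above consistent across a project. This is a minimal sketch under stated assumptions: build_name and its parameters are hypothetical, the sample values are invented, and the optional time is simplified to "all of hhmmss or none".

```python
from datetime import datetime


def build_name(project, instrument, location, when, ext,
               extra="", with_time=False):
    """Assemble Project_instrument_location_YYYYMMDD[hhmmss][_extra].ext."""
    stamp = when.strftime("%Y%m%d%H%M%S" if with_time else "%Y%m%d")
    parts = [project, instrument, location, stamp]
    if extra:
        parts.append(extra)
    # Extensions are forced to lowercase, as recommended above.
    return "_".join(parts) + "." + ext.lower()


# Hypothetical values, for illustration only.
name = build_name("Tides", "ctd", "LIS", datetime(2021, 3, 15, 9, 5, 0), "CSV")
print(name)
```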

Directory Structure Naming Conventions

The structure of directories/folders for organizing the files should also have a clear, documented naming convention.

The top-level folder or directory should include the project title, unique identifier, and date (year).

Directories/folders within the substructure should be divided by a common theme. For example, each folder may contain a run of an experiment or a different version of each dataset.
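
As an illustration of such a layout, the sketch below creates a top-level folder named from the project title, identifier, and year, with one zero-padded subfolder per experimental run. The function name make_project_tree, the "run_" prefix, and the sample identifier are assumptions made for this example, not part of any standard.

```python
import tempfile
from pathlib import Path


def make_project_tree(root, title, identifier, year, runs):
    """Create <title>_<identifier>_<year>/run_NN folders; return the top folder."""
    top = Path(root) / f"{title}_{identifier}_{year}"
    width = max(2, len(str(runs)))  # leading zeros keep folders sorted
    for i in range(1, runs + 1):
        (top / f"run_{i:0{width}d}").mkdir(parents=True, exist_ok=True)
    return top


# Demonstration in a temporary directory; "GRANT0001" is a made-up identifier.
with tempfile.TemporaryDirectory() as tmp:
    top = make_project_tree(tmp, "Tides", "GRANT0001", 2021, runs=3)
    print(top.name, sorted(p.name for p in top.iterdir()))
```

Running the same function against the backup location gives the backup an identical structure.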


Adapted from: GeorgiaTech and University of Oregon

Data Organization Tools

Data Identifiers

Dataset identifiers allow your data to be referenced and shared. Data identifiers must be globally unique and persistent: they must not be repeated elsewhere and they must not change over time.

Identifier schemes:

  • URI: Uniform Resource Identifier
  • PURL: Persistent Uniform Resource Locator
  • DOI: Digital Object Identifier
  • HDL: The Handle System
  • InChI: IUPAC International Chemical Identifier

File Naming Conventions for Specific Disciplines

Many communities of practice have standard recommendations, for example:

File Renaming

Version Control

Workflow Tools

Bibliographic Management

(Adapted from GeorgiaTech)