Research Data Management Services

Data Management Tips

File naming conventions, schemas, metadata, and file type management can help you ensure that the right data is being preserved and disseminated.

File management: names, formats, dictionaries

File management helps during the research process by providing consistency and enabling easier sharing of materials. File management consists of three facets: naming conventions, file formats, and data dictionaries.

Naming conventions

Meaningful file names and file folders. As you document your research lifecycle, describe it in consistent terms. For example, if your lab work includes repeated measurements during an experiment, which facets of each measurement matter: the date of measurement, the parameters under which it was taken, the equipment used, the researcher's initials, the lab name? These facets can serve as building blocks for your file folders and file names.
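
As a rough illustration of facets as building blocks, the sketch below joins a date, an equipment label, and a researcher's initials into one file name. The function name build_file_name, the facet order, and the year-month-day date layout are assumptions made for this example, not a prescribed convention.

```python
from datetime import date


def build_file_name(measured_on: date, instrument: str, initials: str,
                    run: int, extension: str = "csv") -> str:
    """Join the facets of one measurement into a single file name."""
    facets = [
        measured_on.strftime("%Y%m%d"),  # year-month-day sorts chronologically
        instrument.lower(),              # hypothetical equipment label
        initials.upper(),                # researcher's initials
        f"{run:02d}",                    # run number padded to two digits
    ]
    return "_".join(facets) + "." + extension


print(build_file_name(date(2024, 3, 5), "spec2", "jd", 3))
# prints 20240305_spec2_JD_03.csv
```

Building names in one place like this keeps every file in a project on the same pattern, whichever facets your lab decides matter.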

Length, characters, and spacing. Whenever possible, keep names short (under 25 characters). Avoid spaces in your file names, and use a period (.) only to separate the meaningful name from the file extension (like .csv or .mov). Avoid special characters whenever possible. Instead, use underscores (_) or dashes (-) to separate the meaningful facets of your file name.
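
These rules are simple enough to check automatically. The following is a minimal sketch, assuming that the 25-character limit counts the extension and that only ASCII letters, digits, underscores, dashes, and periods are acceptable; check_file_name is a hypothetical helper, not part of any standard tool.

```python
import string

MAX_LENGTH = 25  # the guidance suggests names under 25 characters
ALLOWED = set(string.ascii_letters + string.digits + "_-.")


def check_file_name(name: str) -> list[str]:
    """Return the problems found in a file name (an empty list if none)."""
    problems = []
    if len(name) >= MAX_LENGTH:
        problems.append(f"{len(name)} characters long; aim for under {MAX_LENGTH}")
    if " " in name:
        problems.append("contains spaces")
    if name.count(".") > 1:
        problems.append("uses a period outside the file extension")
    # Spaces are reported separately, so leave them out of this check.
    special = sorted(set(name) - ALLOWED - {" "})
    if special:
        problems.append("contains special characters: " + "".join(special))
    return problems


print(check_file_name("20240305_spec2_JD_03.csv"))  # prints []
print(check_file_name("my data (final).v2.csv"))    # reports three problems
```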

Versioning. Account for versioning whenever possible. Decide what constitutes a version, and apply that definition consistently across your project. Pad version numbers with leading zeroes so that every number has the same width. For example, once a file reaches 10 versions, unpadded numbers no longer sort in numeric order. File names such as:

  • file1_v1.csv
  • file1_v2.csv
  • file1_v10.csv
  • file2_v1.csv
  • file10_v1.csv

could unintentionally sort as follows when names are compared character by character:

  • file10_v1.csv
  • file1_v1.csv
  • file1_v10.csv
  • file1_v2.csv
  • file2_v1.csv

Instead, use:

  • file01_v01.csv
  • file01_v02.csv
  • file01_v10.csv
  • file02_v01.csv
  • file10_v01.csv
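
The effect of padding can be checked with a short script. This sketch uses Python's built-in sorted(), which compares strings character by character; other tools may order unpadded names differently. pad_numbers is a hypothetical helper, and the default width of two digits is an assumption that suits up to 99 files or versions.

```python
import os
import re

unpadded = ["file1_v1.csv", "file1_v2.csv", "file1_v10.csv",
            "file2_v1.csv", "file10_v1.csv"]


def pad_numbers(name: str, width: int = 2) -> str:
    """Zero-pad every run of digits in the name, leaving the extension alone."""
    stem, extension = os.path.splitext(name)
    padded_stem = re.sub(r"\d+", lambda m: m.group().zfill(width), stem)
    return padded_stem + extension


padded = [pad_numbers(name) for name in unpadded]

# Without padding, file10_v1.csv lands ahead of file2_v1.csv.
print(sorted(unpadded))

# With padding, the sorted order matches the numeric order.
print(sorted(padded))
# prints ['file01_v01.csv', 'file01_v02.csv', 'file01_v10.csv', 'file02_v01.csv', 'file10_v01.csv']
```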

File formats

As you review your research output, determine whether each format is proprietary (it can only be opened by specific software) or can be exported to a format that commonly used software reads. If you plan to deposit in a repository, check whether it has specific format requirements you need to meet. Finally, consider what features may be lost when the data is exported.

It is often useful to group file formats by category to aid in export and reuse. Stanford's data management services provide the following guidance:

  • Containers: TAR, GZIP, ZIP
  • Databases: XML, CSV
  • Geospatial: SHP, DBF, GeoTIFF, NetCDF
  • Moving images: MOV, MPEG, AVI, MXF
  • Sounds: WAVE, AIFF, MP3, MXF
  • Statistics: ASCII, DTA, POR, SAS, SAV
  • Still images: TIFF, JPEG 2000, PDF, PNG, GIF, BMP
  • Tabular data: CSV
  • Text: XML, PDF/A, HTML, ASCII, UTF-8
  • Web archive: WARC
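
A category list like this can be turned into a quick check on a project folder. The sketch below is one possible approach: the file extensions are assumed common spellings of the formats named above, the mapping is partial (formats without a single obvious extension are left out), and categories_for is a hypothetical helper.

```python
from pathlib import Path

# Category -> extensions. The extensions are assumptions for illustration;
# extend or correct them to match the formats your project uses.
FORMAT_CATEGORIES = {
    "Containers": {".tar", ".gz", ".zip"},
    "Databases": {".xml", ".csv"},
    "Geospatial": {".shp", ".dbf", ".tif", ".tiff", ".nc"},
    "Moving images": {".mov", ".mpg", ".mpeg", ".avi", ".mxf"},
    "Sounds": {".wav", ".aif", ".aiff", ".mp3", ".mxf"},
    "Statistics": {".dta", ".por", ".sav"},
    "Still images": {".tif", ".tiff", ".jp2", ".pdf", ".png", ".gif", ".bmp"},
    "Tabular data": {".csv"},
    "Text": {".xml", ".pdf", ".html", ".txt"},
    "Web archive": {".warc"},
}


def categories_for(file_name: str) -> list[str]:
    """Return every category whose extensions include this file's extension."""
    extension = Path(file_name).suffix.lower()
    return sorted(category for category, extensions in FORMAT_CATEGORIES.items()
                  if extension in extensions)


print(categories_for("results.csv"))  # prints ['Databases', 'Tabular data']
print(categories_for("notes.docx"))   # prints []
```

An empty result flags a file whose format is not on the list and may need to be exported before deposit.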

Data dictionaries

Data dictionaries document the fields in your data set, which helps others understand its structure, variable names, and relationships between fields. A dictionary does not have to be elaborate, but it should record aspects such as variable name, units, allowed values, and description. OSF's guidance provides a general overview, but disciplines or funding agencies may have their own standards and requirements.
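
A data dictionary can be as simple as a small CSV file stored next to the data. The sketch below assumes four columns taken from the aspects named above; the variable names, units, and allowed values are invented for illustration, and undocumented_columns is a hypothetical helper for spotting fields that the dictionary has not described yet.

```python
import csv
import io

DICTIONARY_FIELDS = ["variable", "units", "allowed_values", "description"]

# Invented entries for an imaginary temperature data set.
data_dictionary = [
    {"variable": "sample_id", "units": "", "allowed_values": "S001-S999",
     "description": "Unique sample identifier"},
    {"variable": "temp_c", "units": "degrees Celsius", "allowed_values": "-20 to 60",
     "description": "Air temperature at the time of measurement"},
]


def dictionary_to_csv(entries: list[dict]) -> str:
    """Render the dictionary entries as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=DICTIONARY_FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(entries)
    return buffer.getvalue()


def undocumented_columns(data_columns: list[str], entries: list[dict]) -> list[str]:
    """List the data set columns that have no entry in the dictionary."""
    documented = {entry["variable"] for entry in entries}
    return [column for column in data_columns if column not in documented]


print(dictionary_to_csv(data_dictionary))
print(undocumented_columns(["sample_id", "temp_c", "notes"], data_dictionary))
# prints ['notes']
```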

Metadata

Metadata provides information about what is inside the data set to aid in discovery and use of the materials you have produced during your research. It allows us to know:

  • Who created the data
  • How it was created
  • When it was created
  • What tools might be needed to view and use the data
  • What rights and use conditions apply
  • How the data connects to related objects
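
One lightweight way to capture these points is a small metadata record saved alongside the data file. The sketch below is an illustration only: the key names are hypothetical and do not follow any particular metadata standard, and the values are placeholders. Your discipline or repository may require a formal schema instead.

```python
import json

# Hypothetical key names, one per question in the list above.
REQUIRED_KEYS = ["creator", "method", "created", "software", "rights", "related"]

# Placeholder values for an imaginary data file.
metadata = {
    "creator": "J. Doe",
    "method": "Spectrometer readings taken hourly",
    "created": "2024-03-05",
    "software": "Any CSV reader",
    "rights": "CC BY 4.0",
    "related": ["20240305_spec2_JD_03.csv"],
}


def missing_keys(record: dict) -> list[str]:
    """List the required keys that are absent or empty in a record."""
    return [key for key in REQUIRED_KEYS if not record.get(key)]


print(missing_keys(metadata))          # prints []
print(json.dumps(metadata, indent=2))  # text to save next to the data file
```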

For more information, see the University of Pittsburgh's Research Data Management guide.