Data Management Tips

File naming conventions, schemas, metadata, and file type management can help you ensure that the right data is being preserved and disseminated.

File management: names, formats, dictionaries

File management helps during the research process by providing consistency and enabling easier sharing of materials. File management consists of three facets: naming conventions, file formats, and data dictionaries.

Naming conventions

Meaningful file names and file folders. As you document your research's lifecycle, begin to think about that lifecycle in consistent terms. For example, if your lab work includes repeated measurements during an experiment, which facets of each measurement are important: the date of measurement, the parameters under which it was taken, the equipment used, the researcher's initials, the lab name? These can serve as building blocks for your file folders and file names.
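As a rough illustration of using facets as building blocks, the sketch below assembles a file name from a measurement date, lab name, equipment label, and researcher's initials. The function name, the facet order, the YYYYMMDD date format, and the sample values are all assumptions made for this example, not a prescribed convention.

```python
from datetime import date


def build_file_name(measured_on: date, lab: str, equipment: str,
                    initials: str, extension: str) -> str:
    """Join facets with underscores; the date comes first so names sort by day."""
    facets = [measured_on.strftime("%Y%m%d"), lab, equipment, initials]
    return "_".join(facets) + "." + extension


# Hypothetical values: a measurement taken on 5 March 2024 in a chemistry lab.
name = build_file_name(date(2024, 3, 5), "chem", "sp2", "jd", "csv")
print(name)  # 20240305_chem_sp2_jd.csv
```

Putting the date first is a design choice: it keeps files from the same day together when a folder is sorted by name.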

Length, characters, and spacing. Whenever possible, shorter names (under 25 characters) are preferable. Avoid using spaces in your file names, and use a period (.) only to separate your meaningful name from the file extension (like .csv or .mov). Avoid special characters whenever possible. Instead, use underscores (_) or dashes (-) to separate the meaningful facets of your file name.
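These rules are simple enough to check automatically. The following is a minimal sketch, assuming the 25-character limit counts the whole name including the extension and that "special characters" means anything other than letters, digits, underscores, dashes, and periods; `check_file_name` is a hypothetical helper written for this guide.

```python
import re

MAX_LENGTH = 25  # assumption: the limit applies to the full name, extension included


def check_file_name(name: str) -> list[str]:
    """Return a list of problems found in a file name (empty if none)."""
    problems = []
    if len(name) >= MAX_LENGTH:
        problems.append("name is 25 characters or longer")
    if " " in name:
        problems.append("name contains spaces")
    if name.count(".") != 1:
        problems.append("name should contain exactly one period")
    # Spaces are reported separately above, so they are excluded here.
    if re.search(r"[^A-Za-z0-9_.\- ]", name):
        problems.append("name contains special characters")
    return problems


print(check_file_name("run01_v02.csv"))           # []
print(check_file_name("my data (final).v2.csv"))  # three problems
```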

Versioning. Account for versioning whenever possible. Decide what constitutes a version, then apply that definition consistently across the project. Pad numbers with leading zeroes so that names sort in numeric order. For example, once you have 10 or more files or versions, unpadded names such as:

file1_v1.csv
file1_v2.csv
file1_v10.csv
file2_v1.csv
file10_v1.csv

could unintentionally sort as:

file1_v1.csv
file1_v10.csv
file1_v2.csv
file10_v1.csv
file2_v1.csv

Instead, use:

file01_v01.csv
file01_v02.csv
file01_v10.csv
file02_v01.csv
file10_v01.csv
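The effect of padding can be reproduced in a few lines. The sketch below assumes two-digit padding and uses a hypothetical `padded_name` helper. Python's `sorted` compares strings character by character, so the exact misordering it produces may differ from the listing above, which depends on the tool doing the sorting, but the unpadded names still fall out of numeric order.

```python
def padded_name(file_number: int, version: int, width: int = 2) -> str:
    """Build a name such as file01_v02.csv with zero-padded numbers."""
    return f"file{file_number:0{width}d}_v{version:0{width}d}.csv"


# (file number, version) pairs, listed in the intended numeric order.
pairs = [(1, 1), (1, 2), (1, 10), (2, 1), (10, 1)]

unpadded = [f"file{n}_v{v}.csv" for n, v in pairs]
padded = [padded_name(n, v) for n, v in pairs]

print(sorted(unpadded) == unpadded)  # False: string sorting breaks numeric order
print(sorted(padded) == padded)      # True: padded names keep numeric order
```

If a project could exceed 99 files or versions, a wider `width` would be needed from the start, since renaming files later is error-prone.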

File formats

As you review your research output, determine if your format is proprietary (can only be opened by specific software) or exportable to commonly used software. If there are guidelines for deposit to a repository, are there specific requirements you need to meet? Finally, what features may be lost during export of that data?

It is often useful to think about file formats in classifications to aid in export and reuse. Stanford's data management services provides the following guidance:

Containers: TAR, GZIP, ZIP
Databases: XML, CSV
Geospatial: SHP, DBF, GeoTIFF, NetCDF
Moving images: MOV, MPEG, AVI, MXF
Sounds: WAVE, AIFF, MP3, MXF
Statistics: ASCII, DTA, POR, SAS, SAV
Still images: TIFF, JPEG 2000, PDF, PNG, GIF, BMP
Tabular data: CSV
Text: XML, PDF/A, HTML, ASCII, UTF-8
Web archive: WARC
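One way to put a classification like this to work is to flag project files whose extensions are not on the list. The sketch below covers only a handful of the formats above; the extension spellings (for example .tif for TIFF) are assumptions, and formats that appear in more than one category, such as CSV, are assigned to a single category here for simplicity.

```python
from pathlib import Path
from typing import Optional

# Partial, illustrative mapping from file extension to a category listed above.
PREFERRED_EXTENSIONS = {
    ".csv": "Tabular data",
    ".tif": "Still images",
    ".tiff": "Still images",
    ".png": "Still images",
    ".wav": "Sounds",
    ".xml": "Text",
    ".warc": "Web archive",
}


def classify(file_name: str) -> Optional[str]:
    """Return the category for a file's extension, or None if it is not listed."""
    return PREFERRED_EXTENSIONS.get(Path(file_name).suffix.lower())


print(classify("scan01.TIF"))  # Still images
print(classify("notes.docx"))  # None: a candidate for export to another format
```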

Data dictionaries

Data dictionaries provide documentation for what fields are inside your data set. This helps with understanding the data set's structure, variable names, and relationships between fields. The dictionary does not have to be elaborate but should document aspects like variable name, units, allowed values, and description. OSF's guidance provides a general overview, but disciplines or funding agencies may have their own standards and requirements.
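A data dictionary can itself be a small CSV file kept alongside the data. The sketch below uses the four aspects named above as columns; the variables, units, and allowed values are invented for illustration, and a discipline or funder may require different columns.

```python
import csv
import io

FIELDS = ["variable_name", "units", "allowed_values", "description"]

# Hypothetical entries describing three columns of a measurement data set.
data_dictionary = [
    {"variable_name": "sample_id", "units": "", "allowed_values": "S001-S999",
     "description": "Unique identifier for each sample"},
    {"variable_name": "temp_c", "units": "degrees Celsius", "allowed_values": "-20 to 120",
     "description": "Temperature at time of measurement"},
    {"variable_name": "measured_on", "units": "", "allowed_values": "YYYY-MM-DD",
     "description": "Date the measurement was taken"},
]


def to_csv(rows: list[dict]) -> str:
    """Serialize data dictionary rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(to_csv(data_dictionary))
```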

Metadata

Metadata provides information about what is inside the data set to aid in discovery and use of the materials you have produced during your research. It allows us to know:

Who created the data
How it was created
When it was created
What tools might be needed to view and use the data
What rights and use conditions apply
Which related objects the data connects to
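The questions above can be captured in a small metadata record stored next to the data file. This is a minimal sketch only: the field names and sample values are hypothetical and are not drawn from any particular metadata standard, which your discipline or repository may require instead.

```python
import json

# One hypothetical field per question in the list above.
REQUIRED_FIELDS = ["creator", "method", "created", "software", "rights", "related"]

metadata = {
    "creator": "J. Doe",
    "method": "Repeated spectrometer readings, one per sample",
    "created": "2024-03-05",
    "software": "Any spreadsheet or text editor that reads CSV",
    "rights": "CC BY 4.0",
    "related": ["file01_v01.csv"],
}


def missing_fields(record: dict) -> list[str]:
    """List required fields that are absent or empty in a metadata record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


print(missing_fields(metadata))  # []
print(json.dumps(metadata, indent=2))
```

Writing the record as JSON keeps it readable in any text editor and easy to convert later if a repository asks for a specific schema.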

For more information, see the University of Pittsburgh's Research Data Management guide.