The Yangambi (DRC) meteorological office

File formats and naming convention guidelines

Best practices in dealing with digital data...

This is an almost verbatim copy of the Stanford Library's best practices for file naming and file formats. Within the COBECORE project we will adhere to these naming and format guidelines, with a strong emphasis on completely open formats and human-readable data.

Advised standards are highlighted in bold, but connections with previous projects may make a different standard preferable. In such cases, consistency with the previous or ongoing work takes priority, so that the data sets remain complementary.

File naming

How you organize and name your files will have a big impact on your ability to find those files later and to understand what they contain. You should be consistent and descriptive in naming and organizing files so that it is obvious where to find specific data and what the files contain.

It’s a good idea to set up a clear directory structure that includes information like the project title, a date, and some type of unique identifier. Individual directories may be set up by date, researcher, experimental run, or whatever makes sense for you and your research.

Information for file names

File names should allow you to identify a precise experiment from the name. Choose a format for naming your files and use it consistently.

You might consider including some of the following information in your file names, but you can include any information that will allow you to distinguish your files from one another.

  • Project or experiment name or acronym
  • Location/spatial coordinates
  • Researcher name/initials
  • Date or date range of experiment
  • Type of data
  • Conditions
  • Version number of file
  • Three-letter file extension for application-specific files

Another good idea is to include in the directory a readme.txt file that explains your naming format along with any abbreviations or codes you have used.

Tips for file naming

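  • A good format for date designations is YYYYMMDD or YYMMDD. This format makes sure all of your files stay in chronological order, even over the span of many years. Prefer YYYYMMDD when the data cross a century boundary, because two-digit years then sort out of order (for example, 000101 sorts before 990101).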
  • Try not to make file names too long, since long file names do not work well with all types of software.
  • Do not use special characters such as ~ ! @ # $ % ^ & * ( ) ` ; < > ? , [ ] { } ‘ “
  • When using a sequential numbering system, use leading zeros for clarity and to make sure files sort in sequential order. For example, use “001, 002, …010, 011 … 100, 101, etc.” instead of “1, 2, …10, 11 … 100, 101, etc.”
  • Do not use spaces. Some software will not recognize file names with spaces, and file names with spaces must be enclosed in quotes when using the command line. Other options include:
    • Underscores, e.g. file_name.xxx
    • Camel case, where the first letter of each section of text after the first word is capitalized, e.g. fileName.xxx
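
The tips above can be combined in a small helper. The following is a minimal sketch in Python, not a prescribed COBECORE tool: the function name build_file_name, the field order (project, data type, date, sequence number) and the example values are assumptions chosen for illustration. It writes the date as YYYYMMDD, pads the sequence number with leading zeros, replaces spaces with underscores and drops special characters.

```python
import re
from datetime import date

# Anything other than letters, digits, underscore, period and hyphen is dropped.
_DISALLOWED = re.compile(r"[^A-Za-z0-9_.-]")


def build_file_name(project, data_type, day, sequence, extension, width=3):
    """Assemble 'project_datatype_YYYYMMDD_NNN.ext' (hypothetical layout)."""
    parts = [
        project,
        data_type,
        day.strftime("%Y%m%d"),      # date as YYYYMMDD
        f"{sequence:0{width}d}",     # leading zeros, e.g. 12 -> 012
    ]
    # Spaces become underscores; remaining special characters are removed.
    cleaned = [_DISALLOWED.sub("", p.strip().replace(" ", "_")) for p in parts]
    return "_".join(cleaned) + "." + extension.lower().lstrip(".")


# Illustrative values only.
print(build_file_name("COBECORE", "rainfall", date(1952, 3, 7), 12, "csv"))
# prints: COBECORE_rainfall_19520307_012.csv
```

Whatever layout is chosen, it should be described in the readme.txt of the directory so that others can decode the names.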

Renaming files

You may already have a lot of data collected for your project and wish to organize and rename these files for easier data management. If you have too many files to rename them all by hand, a bulk renaming application can do the work for you.
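
A short script is another option when no renaming tool is at hand. The sketch below uses only the Python standard library; the function names plan_renames and apply_renames, and the directory and prefix in the usage comment, are hypothetical. It numbers the files in a directory in sorted order and defaults to a dry run, so the planned names can be checked before anything is changed on disk.

```python
from pathlib import Path


def plan_renames(directory, prefix, width=3):
    """Return (old, new) path pairs numbering files as prefix_001.ext, ..."""
    files = sorted(p for p in Path(directory).iterdir() if p.is_file())
    return [
        (old, old.with_name(f"{prefix}_{i:0{width}d}{old.suffix.lower()}"))
        for i, old in enumerate(files, start=1)
    ]


def apply_renames(pairs, dry_run=True):
    """Print each rename; only touch the disk when dry_run is False."""
    for old, new in pairs:
        print(f"{old.name} -> {new.name}")
        if dry_run or old == new:
            continue
        if new.exists():
            # Refuse to overwrite an existing file.
            raise FileExistsError(new)
        old.rename(new)


# Hypothetical usage: preview first, then rename for real.
# pairs = plan_renames("scans", "yangambi_scan")
# apply_renames(pairs)                 # dry run, prints the plan
# apply_renames(pairs, dry_run=False)  # performs the renames
```

Working on a copy of the data is advisable, since a rename cannot be undone automatically.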


File formats

The file formats you use have a direct impact on your ability to open those files at a later date and on the ability of other people to access those data.

Proprietary vs. open formats

You should save data in a non-proprietary (open) file format. Conversion to an open data format should not result in data loss from your files. The Library of Congress has published a Recommended Formats Statement that discusses this topic in great depth.
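
One way to check that a conversion loses nothing is a round trip: write the data to the open format, read it back, and compare. The following is a minimal sketch for CSV using the Python standard library; the function name survives_csv_round_trip and the sample values are assumptions for illustration. It compares values as text only, so it is a quick sanity check and not a full validation.

```python
import csv
import io


def survives_csv_round_trip(header, rows):
    """Write rows to CSV text, read them back, and compare as strings."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(rows)

    buffer.seek(0)
    read_back = list(csv.reader(buffer))
    expected = [[str(v) for v in header]] + [[str(v) for v in row] for row in rows]
    return read_back == expected


# Illustrative values only.
header = ["date", "temperature", "note"]
print(survives_csv_round_trip(header, [["19520307", 23.4, "rain, storm"]]))  # True
# A missing value (None) is written as an empty field, so the check fails.
print(survives_csv_round_trip(header, [["19520308", None, "gap"]]))          # False
```

The second call shows the kind of silent change such a check is meant to catch: missing values need an explicit, documented code before the data are converted.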

Guidelines for choosing formats

When selecting file formats for archiving, the formats should ideally be:

  • Non-proprietary
  • Unencrypted
  • Uncompressed
  • In common usage by the research community
  • Adherent to an open, documented standard
  • Interoperable among diverse platforms and applications
  • Fully published and available royalty-free
  • Fully and independently implementable by multiple software providers on multiple platforms without any intellectual property restrictions for necessary technology
  • Developed and maintained by an open standards organization with a well-defined inclusive process for evolution of the standard.

Some preferred file formats

  • Containers: TAR, GZIP, ZIP
  • Databases: XML, CSV, JSON
  • Geospatial: SHP, DBF, GeoTIFF, NetCDF
  • Moving images: MOV, MPEG, AVI, MXF
  • Sounds: WAVE, AIFF, MP3, MXF
  • Statistics: ASCII, DTA, POR, SAS, SAV
  • Still images: TIFF, JPEG, PNG
  • Tabular data: CSV
  • Text: XML, HTML, ASCII, UTF-8