The file formats you choose directly affect whether you can open your files at a later date and whether other people can access your data.
Proprietary vs. open formats
You should save data in a non-proprietary (open) file format. Converting to an open format should not cause any data loss, so compare the converted files against the originals before relying on them. The Library of Congress has published a Recommended Formats Statement that discusses this topic in depth.
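One way to check that a conversion between open formats loses nothing is a round trip: convert, read the result back, and compare it with the original. The sketch below is only an illustration using Python's standard library. The function names and the sample data are hypothetical, and it assumes a small CSV file with a header row whose values can all be treated as text.

```python
import csv
import io
import json


def csv_to_records(csv_text):
    """Parse CSV text into a list of row dictionaries keyed by the header."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def records_to_json(records):
    """Serialize the rows as JSON, keeping non-ASCII characters readable."""
    return json.dumps(records, ensure_ascii=False, indent=2)


def conversion_is_lossless(csv_text):
    """Convert CSV -> JSON, read the JSON back, and compare with the original rows."""
    original = csv_to_records(csv_text)
    restored = json.loads(records_to_json(original))
    return original == restored


# Hypothetical sample data, used only to exercise the check.
sample = "site,temp_c\nA,21.5\nB,19.0\n"
print(conversion_is_lossless(sample))
```

The comparison treats every value as a string, which is how the `csv` module reads them. A real conversion that changes types (for example, text to numbers or dates) would need a comparison that accounts for that.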
Guidelines for choosing formats
File formats selected for archiving should ideally be:
- In common usage by the research community
- Adherent to an open, documented standard
- Interoperable among diverse platforms and applications
- Fully published and available royalty-free
- Fully and independently implementable by multiple software providers on multiple platforms without any intellectual property restrictions for necessary technology
- Developed and maintained by an open standards organization with a well-defined inclusive process for evolution of the standard
Some preferred file formats
- Containers: TAR, GZIP, ZIP
- Databases: XML, CSV, JSON
- Geospatial: SHP, DBF, GeoTIFF, NetCDF
- Moving images: MOV, MPEG, AVI, MXF
- Sounds: WAVE, AIFF, MP3, MXF
- Statistics: ASCII, DTA, POR, SAS, SAV
- Still images: TIFF, JPEG, PNG
- Tabular data: CSV
- Text: XML, HTML, ASCII, UTF-8
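The list above can be turned into a simple lookup for screening files before they are archived. The following is a minimal sketch, not an authoritative mapping: the file extensions are assumptions based on common conventions (the list names formats, not extensions), entries without a single obvious extension such as SAS are left out, and plain `.txt` stands in for ASCII and UTF-8 text. The names `PREFERRED_EXTENSIONS` and `preferred_categories` are hypothetical.

```python
from pathlib import Path

# Assumed extensions for the formats named in the list above.
PREFERRED_EXTENSIONS = {
    "Containers": {".tar", ".gz", ".zip"},
    "Databases": {".xml", ".csv", ".json"},
    "Geospatial": {".shp", ".dbf", ".tif", ".tiff", ".nc"},
    "Moving images": {".mov", ".mpg", ".mpeg", ".avi", ".mxf"},
    "Sounds": {".wav", ".aif", ".aiff", ".mp3", ".mxf"},
    "Statistics": {".txt", ".dta", ".por", ".sav"},
    "Still images": {".tif", ".tiff", ".jpg", ".jpeg", ".png"},
    "Tabular data": {".csv"},
    "Text": {".xml", ".html", ".txt"},
}


def preferred_categories(filename):
    """Return the sorted categories whose assumed extensions match the file."""
    suffix = Path(filename).suffix.lower()
    return sorted(
        category
        for category, extensions in PREFERRED_EXTENSIONS.items()
        if suffix in extensions
    )


# An empty list means the extension is not among the assumed preferred ones.
for name in ["survey.csv", "photo.TIFF", "notes.docx"]:
    print(name, preferred_categories(name))
```

An extension only hints at a format. A check like this cannot confirm that a file's contents actually follow the standard its name suggests.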