Electronic documents are everywhere, and the old method of using pen and paper is becoming increasingly scarce. However, the introduction of electronic formats has added a new level of complexity that has significant implications for society. A paper document is a physical thing. It can be easily read (as long as the handwriting is legible), and as long as you have a pencil or a pen, you can write your own. As long as they are not damaged by water or fire, paper documents tend to last a long time and are often readable after hundreds of years. Electronic documents, on the other hand, are a subtle combination of electric charges. In order to read or write such a document, you must have knowledge that tells you how those electric charges are ordered. This knowledge is called the file format. You will also need software that understands this file format and can read or write it for you.
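To make the dependence on format knowledge concrete, here is a minimal Python sketch. The five byte values are chosen only for illustration, and the two encodings stand in for "knowing" and "not knowing" the format; nothing here comes from a real document.

```python
# The same stored bytes, read under two different assumptions about their format.
raw = bytes([0x63, 0x61, 0x66, 0xC3, 0xA9])

as_utf8 = raw.decode("utf-8")      # the format the bytes were actually written in
as_latin1 = raw.decode("latin-1")  # a wrong guess about the format

print(as_utf8)    # café
print(as_latin1)  # cafÃ©
```

The bytes never change, yet the wrong assumption silently produces different text. With a complex document format the result of guessing is usually not a slightly wrong word but an unreadable file.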
Often when a file format is created, the knowledge of the file format is given out and people are given permission to use the format freely. Then software developers get together and create software that can use this file format. Finally, people can get this software and use the file format. A format like this, one that does not have restrictions on its use, is known as a free file format.1)
However, not all file formats are free. Sometimes a file format is created and the owners keep the knowledge of the format secret. They might let others use it under limited circumstances, but with discriminatory restrictions on its use. Often they will develop special software to use the format, and people are forced to use that software if they want to use the file format. Those who choose to use the format will be bound by any license restrictions on that software. These kinds of formats are called non-free.2)
At the present moment, non-free file formats have a number of advantages that encourage their use. Sometimes there are no equivalent free formats. When there is a competing free format, software writers may refuse to support it because it threatens part of their business. Sometimes the restrictions on a non-free format are only partial. Finally, most people will not immediately benefit from using free formats.
The number of cases where there is no equivalent free file format has gotten much smaller, and the cases themselves more esoteric, over the past decade, but they still exist. For example, Macromedia Flash was the only way to transmit vector graphics over the Internet before the advent of SVG3).
Commercial interests often enter the picture as well. For example, Microsoft Word uses several non-free formats. Microsoft Word has a monopoly on word processing software because other software cannot easily read or write those file formats. Recently a free format called the OpenDocument Format (ODF) has arisen. ODF support would allow Microsoft Word to be compatible with dozens of other programs 4). However, Microsoft refuses to support it because that would allow other word processing software to compete with Microsoft Word. A good example of this is the recent push for ODF in Massachusetts, which has been stalled by a non-profit organization named Citizens Against Government Waste, which receives funding from Microsoft.
In addition, owners of non-free formats will sometimes allow use of their formats but restrict that use. They may charge a fee, or they may permit the format to be used only for non-commercial purposes. Those who do not fall under the restricted categories might end up using the format anyway.
To the consumer, the benefits of free file formats are not immediately visible. However, this can be a double-edged sword, because by the time the problems become visible, it is often too late. Mark Pilgrim used iMovie, iTunes, and iPhoto for many years, and when he finally decided to switch to some other software he found that:
Years of creating content, most recently video content in iMovie. Home movies of my children being born and growing up … All editability is lost. All my iTunes ratings and playlists are lost. All my iPhoto tags and ratings are lost.
Earlier I made a rather strong claim that electronic formats have significant implications for society, so I will now give the reasons for this claim. First, free formats are essential to the success of our democracies. Second, free enterprise ultimately requires free formats. Finally, free formats are necessary for long-term data preservation and our ability to remember history.
Democracy requires free information. James Madison, the author of the United States Bill of Rights, understood this when he wrote:
A popular Government without popular information or the means of acquiring it, is but a Prologue to a Farce or a Tragedy or perhaps both. Knowledge will forever govern ignorance, and a people who mean to be their own Governors, must arm themselves with the power knowledge gives.
Additionally, free information is only possible with free formats. If the file format is not freely readable, then the information contained inside cannot be considered free. Jimmy Wales, the founder of Wikipedia, writes:
If we offer information in a proprietary or patent-encumbered format … we are forcing others who want to use our allegedly free knowledge to themselves use proprietary software.
Thus, democracy requires free information, and that information must be in a free format.
Free enterprise is a fundamental part of modern economics. The idea is that people are free to make money as they see fit.5) Once a non-free format becomes established as a monopoly, however, it becomes abusive. The owners have full control over the format, and the users of the format are bound to keep using it in order to access their existing documents and files. This creates a high barrier for anyone interested in using an alternative format, a situation called vendor lock-in. Vendor lock-in kills free enterprise by keeping new companies from competing with the established monopoly.
The last item, data preservation, is not limited to this debate, but file formats are an important factor in it. Previous technologies for storing knowledge were fairly long lasting. Paper can get burnt and rocks can crack, but barring a catastrophe they tended to last many lifetimes. Electronic formats give us a whole new way to shoot ourselves in the foot. After a bad experience with Microsoft Word, Neal Stephenson wrote:
There are very few fixed assumptions in [writing], but one of them is that once you have written a word, it is written, and cannot be unwritten. The ink stains the paper, the chisel cuts the stone, the stylus marks the clay, and something has irrevocably happened (my brother-in-law is a theologian who reads 3250-year-old cuneiform tablets—he can recognize the handwriting of particular scribes, and identify them by name). But word-processing software—particularly the sort that employs special, complex file formats—has the eldritch power to unwrite things. A small change in file formats, or a few twiddled bits, and months' or years' literary output can cease to exist.
Non-free formats make the problem much worse. Say the owners of the format die, or their company is sold to someone else who decides to kill the format, or the owners update the format and the new version is not compatible with the old one. As long as the format is free and documented, someone knowledgeable can write software to read or convert it. However, users of non-free formats are at the mercy of the format's owners. The owners may provide tools to convert the old documents, but even that is not certain.
So that is the problem, but merely understanding a problem does us no good unless there is some solution. At the moment there is no single solution, but rather a number of half-solutions. It seems unlikely that a completely satisfactory solution will arise in the near future, at least as long as non-free formats exist.
One solution is to settle on the simplest common denominator: use plain ASCII text for everything. This has obvious disadvantages. You would have to abandon any hope of internationalization, because most languages have characters that are not among the 128 ASCII characters. Any data that is not representable as text (like images) will not work either. In spite of these limitations, it is useful when preserving data is of the utmost importance. Project Gutenberg does this.
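The internationalization limit is easy to see in practice. The following Python sketch is only an illustration: the helper name `ascii_safe` and the sample strings are my own, not part of any project's tooling.

```python
def ascii_safe(text: str) -> bool:
    """Return True if every character is one of the 128 ASCII characters."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


# Hypothetical samples: plain English, accented Latin text, and Japanese.
samples = ["Project Gutenberg", "naïve", "日本語"]
results = {sample: ascii_safe(sample) for sample in samples}

for sample, ok in results.items():
    print(sample, "fits in ASCII" if ok else "cannot be stored as ASCII")
```

Only the first sample survives. A single accented letter is enough to push a text outside ASCII, which is why this approach suits archives of mostly English text far better than general use.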
Another solution is to simply ignore the problem and continue to use what works best at the moment. This is the easiest solution, and it seems to be the most popular at the moment. However, as long as non-free formats continue to be in widespread use, we will suffer from the problems associated with them.
The last solution is to switch to free formats and mandate that only those be used. This is no easy task, but some have decided that the benefits outweigh the disadvantages. Just recently the French National Assembly standardized on the OpenDocument Format for storage and exchange of government documents.