In 1884, Lord Kelvin had the following to say about inadequate measurement systems:
"You, in this country, are subjected to the British insularity in weights and measures; you use the foot, inch and yard. I am obliged to use that system, but must apologize to you for doing so, because it is so inconvenient, and I hope Americans will do everything in their power to introduce the French metrical system. ... I look upon our English system as a wickedly, brain-destroying system of bondage under which we suffer. The reason why we continue to use it, is the imaginary difficulty of making a change, and nothing else; but I do not think in America that any such difficulty should stand in the way of adopting so splendidly useful a reform." [Source]
120 years later, America (and, sadly, much of Cascadia) still hasn't heeded His words. On the contrary, we have shackled ourselves with an additional, modern form of measuremental bondage that is even more brain-destroying than anything the most wicked Brit could have devised -- one that even perverts the system that Kelvin advocated. I am speaking of the units we use to measure data on computers: the bytes, kilobytes, megabytes, gigabytes, terabytes, and, yes, even petabytes that we have all become so familiar with, and yet can be so confused by.
Let's start with the basic units. The bit, in case you didn't know, is the smallest unit of information in a binary system. There are no fractional bits, and the term is unambiguous. This is an acceptable unit. The byte is a little more convoluted. In present-day usage, 1 byte = 8 bits. However, the term originally referred to the number of bits needed to encode a character. Consequently, there were computer systems where bytes were different numbers of bits. This system-specific functional term only later became a general unit of information when the 8-bit character size became a standard, resulting in one term having two incompatible meanings (albeit one now considered obsolete) in the same field.
But the real confusion comes when bit and byte are used together. The abbreviation for byte is uppercase B, whereas the abbreviation for bit is lowercase b. In theory this seems simple and even elegant, but in practice people often use B and b indiscriminately, usually out of ignorance of the difference (not to mention problems caused by caps lock scofflaws and e. e. cummings wannabes). Oddly, the original "bite" was given a "y" so that it wouldn't be misspelled "bit," but this rather obvious abbreviation problem was overlooked.
The next level of trouble comes from the prefixes used with these two terms. How many bytes are in a kilobyte? The answer depends on whom you ask. Computer science people would say 1 kilobyte = 1,024 bytes (1,024 is a round number in binary notation: 10000000000). But this ignores the accepted meanings of the Metric prefixes (kilo = 1,000, mega = 1,000,000, etc.), so proponents of the Metric System correctly reject this usage as improper.
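To see how far the two readings drift apart, here is a minimal Python sketch. The names `PREFIXES` and `excess_percent` are made up for illustration; the only assumption is that each prefix step is read as either 1,000 or 1,024.

```python
# 1,024 is "round" in binary: a one followed by ten zeros.
print(format(1024, "b"))

# Metric prefix names, in order; each step multiplies by 1,000 (Metric)
# or by 1,024 (the computer science habit).
PREFIXES = ["kilo", "mega", "giga", "tera", "peta"]


def excess_percent(step: int) -> float:
    """Percent by which 2**(10*step) exceeds 10**(3*step)."""
    return (2 ** (10 * step) / 10 ** (3 * step) - 1) * 100


for step, name in enumerate(PREFIXES, start=1):
    print(f"{name}: binary reading is {excess_percent(step):.1f}% larger")
```

The gap is about 2.4% at kilo and grows with every step, so the ambiguity matters more as storage gets bigger.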
Now if it were just people in an unrelated field being persnickety, then maybe this wouldn't really be a practical problem for computer users. However, the proper Metric meanings of the prefixes are used by some in the computing industry, although often out of ulterior motives. For instance, if a hard drive manufacturer says a drive has 10 gigabytes, then it actually has 10,000,000,000 bytes, not 10,737,418,240 bytes as your operating system would measure 10 gigabytes -- you may be getting less than you think you're getting. The result of this mix of proper and improper use of Metric prefixes is ambiguity and the potential for errors (or dishonest pricing).
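The hard drive arithmetic can be checked in a few lines of Python. This is only a sketch; the variable names are illustrative, and it assumes an operating system that divides by 2^30 when it reports "gigabytes."

```python
# Two readings of "10 gigabytes".
METRIC_GIGA = 10**9   # giga as the Metric System defines it
BINARY_GIGA = 2**30   # "giga" as the operating system is assumed to use it

advertised_bytes = 10 * METRIC_GIGA   # what the manufacturer ships
expected_bytes = 10 * BINARY_GIGA     # what the buyer may have expected

shortfall = expected_bytes - advertised_bytes
reported = advertised_bytes / BINARY_GIGA

print(f"advertised: {advertised_bytes:,} bytes")
print(f"expected:   {expected_bytes:,} bytes")
print(f"shortfall:  {shortfall:,} bytes")
print(f"the operating system would show about {reported:.2f} 'gigabytes'")
```

Under those assumptions the drive comes up 737,418,240 bytes short and shows as roughly 9.31 "gigabytes."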
So how should we solve these problems? For starters, we need to replace the term byte with one whose abbreviation is more obviously distinct from that of bit. Just as Kelvin urged Americans to follow the lead of the French, I am urging everyone to use the French term for 8 bits: octet. This term -- born out of anglophobia -- is both unambiguous and descriptive. Octet literally means a group of eight. In the context of informational measurement, it means 8 bits. Octet is abbreviated o, so there's no confusing it with bit. Plus, the French have already been using it for years, with no problems.
Next, we need to stop misusing the Metric prefixes. Kilo is defined as 1,000 and it should never be used for anything else. Instead, we should widely adopt the binary prefixes that were approved as a standard by the International Electrotechnical Commission (IEC) in 1998. These prefixes are as follows:
- kibi- (Ki) = 2^10 (1,024)
- mebi- (Mi) = 2^20 (1,048,576)
- gibi- (Gi) = 2^30 (1,073,741,824)
- tebi- (Ti) = 2^40 (1,099,511,627,776)
- pebi- (Pi) = 2^50 (1,125,899,906,842,624)
(For more, see the NIST reference page on prefixes for binary multiples.)
Thus, the old, tired, and confused kilobyte (8,192 bits) will be reborn as the kibioctet (Kio). So, computer users, say hello to your new friend kibioctet, as well as his buddies mebioctet (Mio), gibioctet (Gio), tebioctet (Tio), and, yes, even pebioctet (Pio).
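For anyone who wants to put the new terms to work, here is a small Python sketch that prints an octet count with the IEC binary prefixes and the "o" symbol. The function name `format_octets` and the two-decimal output format are my own choices for illustration, not part of any standard.

```python
# IEC binary prefix symbols, each step a factor of 1,024 (2**10).
BINARY_PREFIXES = ["", "Ki", "Mi", "Gi", "Ti", "Pi"]


def format_octets(count: int) -> str:
    """Render an octet count using the largest binary prefix that fits."""
    if count < 0:
        raise ValueError("octet count must be non-negative")
    value = float(count)
    index = 0
    # Divide by 1,024 until the value is small or we run out of prefixes.
    while value >= 1024 and index < len(BINARY_PREFIXES) - 1:
        value /= 1024
        index += 1
    if index == 0:
        return f"{count} o"
    return f"{value:.2f} {BINARY_PREFIXES[index]}o"


print(format_octets(1024))      # one kibioctet
print(format_octets(10**10))    # the "10 gigabyte" hard drive
```

The list stops at pebi- because that is where the list above stops; anything larger is simply shown as a big number of Pio.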
UPDATE (2004-06-21): Here are some trendy badges so you can educate your visitors about these important new data measurement terms and show your site's support for sensible standards:
UPDATE (2004-07-06): This article, with additional information, can now be found at ZPi Labs: Kibioctets. Any further updates will go on that page. If you want to link to this information, link there.