The Sensible Way Of Measuring Binary Data

A proposal by Lyle Zapato

The Problem

In 1884, the Lord Kelvin had the following to say about inadequate measurement systems:

"You, in this country, are subjected to the British insularity in weights and measures; you use the foot, inch and yard. I am obliged to use that system, but must apologize to you for doing so, because it is so inconvenient, and I hope Americans will do everything in their power to introduce the French metrical system. ... I look upon our English system as a wickedly, brain-destroying system of bondage under which we suffer. The reason why we continue to use it, is the imaginary difficulty of making a change, and nothing else; but I do not think in America that any such difficulty should stand in the way of adopting so splendidly useful a reform." [Source]

120 years later, America (and, sadly, much of Cascadia) still hasn't heeded His words. To the contrary, we have shackled ourselves with an additional modern form of measuremental bondage that is even more brain-destroying than anything the most wicked Brit could have devised -- one that even perverts the system that Kelvin advocated. I am speaking of the units we use to measure data on computers: the bytes, kilobytes, megabytes, etc. that we have all become so familiar with, and yet can be so confused by.

Let's start with the basic units. The bit, in case you didn't know, is the smallest unit of information in a binary system. There are no fractional bits, and the term is unambiguous. This is an acceptable unit. The byte is a little more convoluted. In present-day usage, 1 byte = 8 bits. However, the term originally referred to the number of bits needed to encode a character. Consequently, there were computer systems where bytes were different numbers of bits. This system-specific functional term only later became a general unit of information when the 8-bit character size became a standard, resulting in one term having two incompatible meanings (albeit one now considered obsolete) in the same field.

But the real confusion comes when bit and byte are used together. The abbreviation or symbol for byte is uppercase B, whereas the symbol for bit is lowercase b. In theory this seems simple and even elegant, but in practice people often use B/b indiscriminately, usually out of ignorance of the difference (not to mention problems caused by caps lock scofflaws and e. e. cummings wannabes). Oddly, the original "bite" was given a "y" so that it wouldn't be misspelled "bit," but this rather obvious abbreviation problem was overlooked (even odder considering the widespread use at the time of text-based computer systems with no lowercase letters).

The next level of trouble comes from the prefixes used with these two terms. How many bytes are in a kilobyte? The answer depends on whom you ask. Computer science people would say 1 kilobyte = 1,024 bytes (1,024 is a round number in binary notation: 10000000000). But this ignores the accepted meanings of the SI prefixes (kilo = 1,000, mega = 1,000,000, etc.), which means proponents of the Metric System correctly reject this usage as improper.

Ambiguous Meaning Of Prefixes
Unit        CS Use                 SI Use
kilobyte    1,024 bytes            1,000 bytes
megabyte    1,048,576 bytes        1,000,000 bytes
gigabyte    1,073,741,824 bytes    1,000,000,000 bytes
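
The two columns of this table can be reproduced with a few lines of Python. This is only a sketch: the names SI_STEP, CS_STEP and interpretations are made up for illustration, and the numbers are the ones already listed in the table.

```python
# Sketch only: the names are illustrative; the values match the table.
SI_STEP = 1000   # Metric: each prefix step multiplies by 1,000
CS_STEP = 1024   # computer-science habit: each step multiplies by 1,024

PREFIXES = ["kilo", "mega", "giga"]


def interpretations(prefix):
    """Return (cs_bytes, si_bytes) for one '<prefix>byte'."""
    power = PREFIXES.index(prefix) + 1
    return CS_STEP ** power, SI_STEP ** power


for prefix in PREFIXES:
    cs_bytes, si_bytes = interpretations(prefix)
    gap_percent = (cs_bytes - si_bytes) / si_bytes * 100
    print(f"{prefix}byte: CS {cs_bytes:,} vs SI {si_bytes:,} bytes "
          f"(gap {gap_percent:.1f}%)")
```

The gap between the two readings is not constant: it is about 2.4% at kilo, 4.9% at mega and 7.4% at giga, so the ambiguity gets worse as storage gets bigger.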

Now if it were just people in an unrelated field being persnickety, then maybe this wouldn't really be a practical problem for computer users. However, the proper Metric meanings of the prefixes are used by some in the computing industry, although often out of ulterior motives. For instance, if a hard drive manufacturer says a drive has 10 gigabytes, then it actually has 10,000,000,000 bytes, not the 10,737,418,240 bytes that your operating system would count as 10 gigabytes -- you may be getting less than you think you're getting. The result of this mix of proper and improper use of Metric prefixes is ambiguity and the potential for errors (or dishonest pricing).
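
To make the hard drive arithmetic concrete, here is a small Python sketch. It assumes, as the paragraph does, that the label counts in powers of 1,000 and the operating system in powers of 1,024; the function name drive_report and its dictionary keys are hypothetical.

```python
GIGA_SI = 1000 ** 3       # what the drive label means by "giga"
GIGA_BINARY = 1024 ** 3   # what the operating system means by "giga"


def drive_report(advertised_gigabytes):
    """Compare a drive label with what a binary-counting OS expects and shows."""
    actual_bytes = advertised_gigabytes * GIGA_SI
    expected_bytes = advertised_gigabytes * GIGA_BINARY
    return {
        "actual_bytes": actual_bytes,
        "expected_bytes": expected_bytes,
        "missing_bytes": expected_bytes - actual_bytes,
        "os_reported": actual_bytes / GIGA_BINARY,
    }


report = drive_report(10)
print(f"missing: {report['missing_bytes']:,} bytes")
print(f"OS shows: {report['os_reported']:.2f} 'gigabytes'")
```

For the 10 gigabyte drive, the difference comes to 737,418,240 bytes, and an operating system counting in powers of 1,024 would display the drive as roughly 9.31 "gigabytes".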

When we combine the two problems above, things get even worse:

Compounded Confusion
Symbol Used    Possible Meanings
kb or KB       1,000 bits
               1,024 bits
               8,000 bits
               8,192 bits
kB             8,000 bits
               8,192 bits
               Person is 133t?

There's also the possibility that various mixes of these different interpretations could all end up being fed into a single calculation, resulting in errors even greater than above and potentially leading to disaster (much like when the mixing of metric and imperial measurements resulted in NASA's Mars Climate Orbiter being lost in space).
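
How far apart the readings can drift is easy to check. The following Python sketch enumerates every size that a carelessly written "kb" might stand for, assuming only the two prefix meanings and the two unit meanings discussed above; the names PREFIX_VALUES, UNIT_BITS and possible_meanings are illustrative.

```python
from itertools import product

PREFIX_VALUES = {"SI kilo": 1000, "binary kilo": 1024}
UNIT_BITS = {"bit": 1, "byte": 8}


def possible_meanings():
    """Every size in bits that an unqualified 'kb' could denote."""
    return sorted(
        prefix * unit
        for prefix, unit in product(PREFIX_VALUES.values(), UNIT_BITS.values())
    )


meanings = possible_meanings()
worst_case_ratio = max(meanings) / min(meanings)
print(meanings)
print(f"largest reading is {worst_case_ratio:.3f}x the smallest")
```

The four readings are 1,000, 1,024, 8,000 and 8,192 bits, so two people using the same symbol can disagree by a factor of more than eight before any calculation has even started.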

The Solution

So how should we solve these problems? For starters, we need to replace the term byte with one that has a more obvious abbreviated distinction from bit. Just as Kelvin urged Americans to follow the lead of the French, I am urging everyone to use the French term for 8 bits: octet. This term -- born out of anglophobia -- is both unambiguous and descriptive. Octet literally means a group of eight. In the context of informational measurement, it means 8 bits. Octet is abbreviated o, so there's no confusing it with bits. Plus, the French have already been using it for years, with no problems. Thus:

Base Units Of Binary Information
bit      b    the quantum of binary information
octet    o    8 bits (formerly 1 byte)

Next, we need to stop misusing the Metric prefixes. A kilo is defined as 1,000 and it should never be used for something else. Instead, we should widely adopt the binary prefixes that were approved as a standard in 1998 by the International Electrotechnical Commission (IEC -- whose first president, incidentally, was Lord Kelvin). These prefixes are as follows:

Binary Multiple Prefixes
kibi-    Ki    2^10 (1,024)
mebi-    Mi    2^20 (1,048,576)
gibi-    Gi    2^30 (1,073,741,824)
tebi-    Ti    2^40 (1,099,511,627,776)
pebi-    Pi    2^50 (1,125,899,906,842,624)
exbi-    Ei    2^60 (1,152,921,504,606,846,976)
zebi-    Zi    2^70 (1,180,591,620,717,411,303,424)
yobi-    Yi    2^80 (1,208,925,819,614,629,174,706,176)

(For more on these prefixes, see the official standard in IEC 60027-2, the IEC article "When is a kilobyte a kibibyte?", and the NIST reference page on prefixes for binary multiples.)
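
The pattern behind the prefixes is simply that each one is 2 raised to the next multiple of ten. As a sketch, the list of prefixes can be regenerated in Python like this; BINARY_PREFIXES and prefix_multiplier are names invented for the example, not part of any standard library.

```python
BINARY_PREFIXES = [
    ("kibi", "Ki"), ("mebi", "Mi"), ("gibi", "Gi"), ("tebi", "Ti"),
    ("pebi", "Pi"), ("exbi", "Ei"), ("zebi", "Zi"), ("yobi", "Yi"),
]


def prefix_multiplier(name):
    """Return the multiplier for a binary prefix: 2**10 for kibi, 2**20 for mebi, ..."""
    for step, (prefix, _symbol) in enumerate(BINARY_PREFIXES, start=1):
        if prefix == name:
            return 2 ** (10 * step)
    raise ValueError(f"unknown binary prefix: {name}")


for step, (prefix, symbol) in enumerate(BINARY_PREFIXES, start=1):
    print(f"{prefix}- ({symbol}) = 2^{10 * step} = {prefix_multiplier(prefix):,}")
```

Python integers have arbitrary precision, so even the yobi multiplier is computed exactly rather than rounded.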

Combining these prefixes with the base unit bit, we get the following binary exponential units:

Binary Exponential Units of Data (Bit)
Unit       Symbol    Number Of Bits
kibibit    Kib       1,024 bits
mebibit    Mib       1,048,576 bits
gibibit    Gib       1,073,741,824 bits
tebibit    Tib       1,099,511,627,776 bits
pebibit    Pib       1,125,899,906,842,624 bits
exbibit    Eib       1,152,921,504,606,846,976 bits
zebibit    Zib       1,180,591,620,717,411,303,424 bits
yobibit    Yib       1,208,925,819,614,629,174,706,176 bits

And combining the prefixes with the base unit octet, we get:

Binary Exponential Units of Data (Octet)
Unit         Symbol    Number Of Bits (Deprecated Unit)
kibioctet    Kio       8,192 bits (kilobyte)
mebioctet    Mio       8,388,608 bits (megabyte)
gibioctet    Gio       8,589,934,592 bits (gigabyte)
tebioctet    Tio       8,796,093,022,208 bits (terabyte)
pebioctet    Pio       9,007,199,254,740,992 bits (petabyte)
exbioctet    Eio       9,223,372,036,854,775,808 bits (exabyte)
zebioctet    Zio       9,444,732,965,739,290,427,392 bits (zettabyte)
yobioctet    Yio       9,671,406,556,917,033,397,649,408 bits (yottabyte)
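
For anyone wanting to display sizes in these units, a minimal Python sketch might look like the following. The function names format_octets and octets_to_bits are hypothetical, and showing two decimal places is an arbitrary choice for the example, not something the IEC standard prescribes.

```python
BITS_PER_OCTET = 8
SYMBOLS = ["", "Ki", "Mi", "Gi", "Ti", "Pi", "Ei", "Zi", "Yi"]


def octets_to_bits(octets):
    """Convert an octet count to bits."""
    return octets * BITS_PER_OCTET


def format_octets(octets):
    """Render an octet count with the largest binary prefix that fits."""
    value = float(octets)
    index = 0
    # Divide by 1,024 until the value drops below one step or prefixes run out.
    while value >= 1024 and index < len(SYMBOLS) - 1:
        value /= 1024
        index += 1
    return f"{value:.2f} {SYMBOLS[index]}o"


print(format_octets(1024))
print(format_octets(10_000_000_000))
print(f"{octets_to_bits(1024):,} bits in 1 Kio")
```

Here 1,024 octets is rendered as "1.00 Kio", and a drive holding 10,000,000,000 octets as "9.31 Gio", with no room left for argument about which kind of giga was meant.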

(For brevity, I have referred to the system above and the two sets of binary exponential units derived from it as the "kibioctet standard" -- or informally just "kibioctets", as in "promote kibioctets" below -- since the unit kibioctet concisely shows the two differences separating the standard from the deprecated nonstandard standard currently in use.)

Promote Kibioctets

Feel free to use these badges to show your site's compliance with the kibioctet standard and to promote kibioctet usage and understanding:

kibioctet mebioctet gibioctet tebioctet pebioctet