Take a collection of real-world data such as the lengths of all rivers in the world, the populations of counties in the United States, the net worths of American corporations, or the street addresses of all residents of Detroit. Strip away all the information except the leading digits. What percentage of these digits do you expect to be 1’s? 2’s? 9’s? Surely the answer is the same for all digits: 11% (1/9), right?
It turns out that in many cases, the leading digits are not uniformly distributed, but obey Benford’s law: the leading digit occurs with frequency . [See Terence Tao's post for details on when Benford's law applies, but roughly speaking the data must be very spread out (e.g., spanning several orders of magnitude), so data sets like zip codes in the US or heights of adult males won't follow Benford's law.]
Thus we expect the leading digits to occur with the following frequencies:
Let us look at a concrete example: powers of 2. The first 20 powers of 2 are: 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768, 65536, 131072, 262144, and 524288. The leading digit is 1 six times, that’s 30%, it is 2 four times (20%), and 9 zero times (umm,… 0%).
I computed the first 100 and first 500 powers of 2 and found that the leading digits have percentages:
Benford’s law has been used to detect fraud in tax returns and in election results—the numbers that people make up typically do not follow Benford’s law. For more information you may want to read other accounts of Benford’s law on the web.
In this blog post I will give a proof just for the powers of 2. It is a clever proof that uses a theorem about rotations of the circle that I wrote about last summer.
For simplicity we’ll consider a circle of circumference 1; equivalently we may think of the circle as with 0 and 1 glued together or as , the set of real numbers, modulo 1. Let denote the fractional part of the real number . Then all represent the same point on the circle.
Let be any real number. We can think of the sequence as the orbit of 0 under repeated rotations of the circle by . We begin with the point 0 on the circle, rotate by to obtain the point , rotate by again to obtain , etc. In my earlier blog post I stated the following theorem.
Theorem. Suppose is an irrational number.
1. Then is a dense subset of the circle.
2. Moreover, if is an interval in the circle of length and there are elements of in , then .
Part 1 of the theorem states that the orbit of the point 0 under an irrational rotation by “fills up” the entire circle; that is, every open interval on the circle, regardless of how small, intersects this set of points. Part 2 says something even stronger. It says that the orbit fills up the circle in a uniform way—asymptotically the amount of time the orbit spends in an interval is equal to the length of the interval (we often say that “the time average equals the space average”).
So what does this have to do with Benford’s law?
The leading digit of is provided there exists a nonnegative integer such that
Taking the logarithm (base 10) of the above inequality we obtain
Notice that for , and are between 0 and 1. Thus we conclude that the leading digit of is provided
But is an irrational number. By the second part of our theorem we know that the percentage of the set that intersects the interval is precisely the length of the interval , , which is what Benford’s law says.