Irrational rotations of the circle and Benford’s law

Take a collection of real-world data such as the lengths of all rivers in the world, the populations of counties in the United States, the net worths of American corporations, or the street addresses of all residents of Detroit. Strip away all the information except the leading digits. What percentage of these digits do you expect to be 1’s? 2’s? 9’s? Surely the answer is the same for all digits: 11% (1/9), right?

It turns out that in many cases, the leading digits are not uniformly distributed, but obey Benford’s law: the leading digit {d} occurs with frequency {\log_{10}(1+1/d)}. [See Terence Tao’s post for details on when Benford’s law applies, but roughly speaking the data must be very spread out (e.g., spanning several orders of magnitude), so data sets like zip codes in the US or heights of adult males won’t follow Benford’s law.]

Thus we expect the leading digits to occur with the following frequencies:

\displaystyle \begin{array}{cc} \text{Leading digit} &\text{Benford's prediction}\\ 1	& 30.10\%\\ 2	& 17.61\%\\ 3	& 12.49\%\\ 4	& 9.69\%\\ 5	& 7.92\%\\ 6	& 6.69\%\\ 7	& 5.80\%\\ 8	& 5.11\%\\ 9	& 4.58\% \end{array}

Let us look at a concrete example: powers of 2. The first 20 powers of 2 are: 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768, 65536, 131072, 262144, and 524288. The leading digit is 1 six times, that’s 30%, it is 2 four times (20%), and 9 zero times (umm,… 0%).

I computed the first 100 and first 500 powers of 2 and found that the leading digits have percentages:

\displaystyle \begin{array}{cccc} \text{Leading digit} & \text{First 100 powers} &\text{First 500 powers} & \text{Benford's prediction}\\ 1	& 30\%	&30.2\%& 30.10\%\\ 2	& 17\%	&17.6\%& 17.61\%\\ 3	& 13\%	&12.4\%& 12.49\%\\ 4	& 10\%	&9.6\%& 9.69\%\\ 5	& 7\%	&7.8\%& 7.92\%\\ 6	& 7\%	&6.8\%& 6.69\%\\ 7	& 6\%	&5.6\%& 5.80\%\\ 8	& 5\%	&5.0\%& 5.11\%\\ 9	& 5\%& 	4.6\%&4.58\% \end{array}

Amazing!

Benford’s law has been used to detect fraud in tax returns and in election results—the numbers that people make up typically do not follow Benford’s law. For more information you may want to read other accounts of Benford’s law on the web.

In this blog post I will give a proof just for the powers of 2. It is a clever proof that uses a theorem about rotations of the circle that I wrote about last summer.

For simplicity we’ll consider a circle of circumference 1; equivalently we may think of the circle as {[0,1]} with 0 and 1 glued together or as {\mathbb{R}/\mathbb{Z}}, the set of real numbers, modulo 1. Let {\{x\}} denote the fractional part of the real number {x}. Then {\{3.25\}=\{239.25\}=\{-1.75\}=0.25} all represent the same point on the circle.

Let {\alpha} be any real number. We can think of the sequence {0,\{\alpha\},\{2\alpha\},\{3\alpha\},\ldots} as the orbit of 0 under repeated rotations of the circle by {\alpha}. We begin with the point 0 on the circle, rotate by {\alpha} to obtain the point {\{\alpha\}}, rotate by {\alpha} again to obtain {\{2\alpha\}}, etc. In my earlier blog post I stated the following theorem.

Theorem. Suppose {\alpha} is an irrational number.
1. Then {\big\{0,\{\alpha\},\{2\alpha\},\{3\alpha\},\ldots\big\}} is a dense subset of the circle.
2. Moreover, if {I} is an interval in the circle of length {l(I)} and there are {n(k)} elements of {\big\{0,\{\alpha\},\{2\alpha\},\{3\alpha\},\ldots,\{(k-1)\alpha\}\big\}} in {I}, then {\displaystyle\lim_{k\rightarrow\infty}\frac{n(k)}{k}=l(I)}.

Part 1 of the theorem states that the orbit of the point 0 under an irrational rotation by {\alpha} “fills up” the entire circle; that is, every open interval on the circle, regardless of how small, intersects this set of points. Part 2 says something even stronger. It says that the orbit fills up the circle in a uniform way—asymptotically the amount of time the orbit spends in an interval is equal to the length of the interval (we often say that “the time average equals the space average”).

So what does this have to do with Benford’s law?

The leading digit of {2^{m}} is {d} provided there exists a nonnegative integer {n} such that

\displaystyle d10^{n}\le 2^{m}< (d+1)10^{n}.

Taking the logarithm (base 10) of the above inequality we obtain

\displaystyle n+\log_{10}(d)\le m\log_{10}2<n+\log_{10}(d+1).

Equivalently,

\displaystyle m\log_{10}2\in[n+\log_{10}(d),n+\log_{10}(d+1)).

Notice that for {d=1,\ldots 9}, {\log_{10}(d)} and {\log_{10}(d+1)} are between 0 and 1. Thus we conclude that the leading digit of {2^{m}} is {d} provided

\displaystyle \{m\log_{10}2\}\in[\log_{10}(d),\log_{10}(d+1)).

But {\log_{10}(2)} is an irrational number. By the second part of our theorem we know that the percentage of the set {\big\{0,\{\log_{10} 2\},\{2\log_{10} 2\},\{3\log_{10} 2\},\ldots\big\}} that intersects the interval {I=[\log_{10}(d),\log_{10}(d+1))} is precisely the length of the interval {I}, {l(I)=\log_{10}(d+1)-\log_{10}(d)=\log_{10}(1+1/d)}, which is what Benford’s law says.

7 Comments

  1. Pat Ballew says:

    Dave,

    In my blog on Benford’s law I point out that, “As you might have guessed, someone else did it earlier; a half century earlier. In 1881 a note to the American Journal of Mathematics by an American astronomer named Simon Newcomb described an unusual observation. He had noticed that the tables of logarithms that were in common use back then by astronomers, always had the pages of the lower numbers more dog-eared than the pages of the higher numbers. He suggested that natural observations tend to start with the number one more often than with an eight or nine. For some reason, the observation went without much comment. ”

    For my students, I point out that before they think Newcomb was TOOOO bright, I point out that in 1903 (only months before the Wright Bros. flight at Kittyhawk) he declared boldly, “”Aerial flight is one of that class of problems with which man cannot cope”.

    1. Thanks, Pat. I hadn’t seen your earlier post. I like that story about Newcomb. I’ll have to see if I can find a book of logarithms in our college’s archives so that I can see the page usage for myself. Also thanks for giving me the publication year of his note. I was able to find it in our online holdings in an instant.

  2. Ivan says:

    There is a misspelling in the last formula.
    should be “log(1 + 1/d)”; there is sign mismatch.
    Feel free to remove this comment)

    1. Thanks. It is fixed now.

  3. Rick Meese says:

    I think it’s useful to imagine a counter. If you have a lot of numbers you’re counting, the dials at the lower end would move quickly; the ones at the higher end (that represent millions, or hundreds of thousands) would move slowly. Those high numbers wouldn’t flip over very often as the smaller numbers kept turning through all the digits. How often do you hear of 900 million anyway. It’s usually 1.2 million or some lower number, especially with $$$. Thanks for the post.

Comments are closed.