Turing’s topological proof that every written alphabet is finite

Recently one of my colleagues was reading Alan Turing‘s groundbreaking 1936 article “On Computable Numbers with an Application to the Entscheidungsproblem.” This is the article in which Turing introduced the turing machine, solved Hilbert’s Entscheidungsproblem (`decision problem’), and proved that the halting problem is undecidable. It is viewed by many as the foundation of computer science.

My colleague shared it with me because it contains a neat use of topology. In this paper Turing gives a topological argument that every written alphabet must be finite. For example, our alphabet has 26 letters, 52 including both capital and lower case, 10 numerals, numerous punctuation marks, etc. Finitely many. Even if we came up with a scheme to generate new symbols, we would only be able to create finitely many.

Here is the explanation in Turing’s own words (we’re particularly interested in the accompanying footnote).

Computing is normally done by writing certain symbols on paper. We may suppose this paper is divided into squares like a child’s arithmetic book… I shall… suppose that the number of symbols which may be printed is finite. If we were to allow an infinity of symbols, then there would be symbols differing to an arbitrarily small extent.* The effect of this restriction of the number of symbols is not very serious. It is always possible to use sequences of symbols in the place of single symbols. Thus an Arabic numeral such as 17 or 999999999999999 is normally treated as a single symbol. Similarly in any European language words are treated as single symbols (Chinese, however, attempts to have an enumerable infinity of symbols). The differences from our point of view between the single and compound symbols is that the compound symbols, if they are too lengthy, cannot be observed at one glance. This is in accordance with experience. We cannot tell at a glance whether 9999999999999999 and 999999999999999 are the same.

*If we regard a symbol as literally printed on a square we may suppose that the square is {0\le x \le 1}, {0 \le y \le 1}. The symbol is defined as a set of points in this square, viz. the set occupied by printer’s ink. If these sets are restricted to be measurable, we can define the “distance” between two symbols as the cost of transforming one symbol into the other if the cost of moving unit area of printer’s ink unit distance is unity, and there is an infinite supply of ink at {x = 2}, {y = 0}. With this topology, the symbols form a conditionally compact space.

Here’s my more modern topological interpretation of this claim.

Terms from topology

I am assuming that the reader is familiar with the terms metric, metric space, topological space, and compact set.

As a brief refresher, recall that a function {d:X\times X\rightarrow \mathbb{R}} is a metric on {X} if for any elements {a,b,c\in X},

  1. {d(a,b)\ge 0}, with equality iff {a=b},
  2. {d(a,b)=d(b,a)}, and
  3. {d(a,b)+d(b,c)\ge d(a,c)}.

If {X} has a metric {d}, then we say that {(X,d)} is a metric space. Given a metric we can define open neighborhoods, and thus generate a topology. So every metric space is a topological space.

A subset {A} of a topological space {X} is compact if every open cover of {A} has a finite subcover. In ordinary Euclidean space being compact is equivalent to being closed and bounded. In this discussion we let {F(X)} denote the set of all compact subsets of {X}.

The Hausdorff metric

We would like to speak about the distance between two compact sets. That is, we’d like a metric for the set {F(X)}.

Let {(X,d)} be a metric space and {A} and {B} be two compact subsets of {X}. Suppose {a} is the point in {A} farthest from any point in {B} and that this distance is {d(a,B)}. Similarly, suppose {b} is the point in {B} farthest from any point in {A} and that this distance is {d(b,A)}. Then the Hausdorff distance between {A} and {B}, {D(A,B)}, is the larger of {d(a,B)} and {d(b,A)}. It is not difficult to show that {D} satisfies the properties of a metric listed above. This is the Hausdorff metric.

For example, suppose {A}, {B}, {C}, and {D} are the colored sets shown below. The farthest point in {A} from {B} (marked {a} in the diagram) is 4 units away. Similarly, the farthest point in {B} from {A} (marked {b}) is 3 units away. Thus {D(A,B)=\max\{3,4\}=4}. In the next picture, the blue set {C} is a subset of the green set {E}. Thus {d(c,E)=0} for every {c\in C}. The point {e\in E} is farthest from {C}, with {d(e,C)=0.5}. Thus {D(C,E)=\max\{0,0.5\}=0.5}.

One nice property of the Hausdorff metric is that if {X} is a compact space, then so is {F(X)}.

Proof of Turing’s claim

We would like to prove that any written alphabet is finite. First, we make the following assumptions.

  1. Each symbol can be drawn inside a given square. More specifically, we will assume that each symbol is a compact subset of {[0,1]\times[0,1]}.
  2. The human eye cannot tell two symbols apart when they are too similar. That is, there is an {\varepsilon>0} such that any symbols {A,B\subset [0,1]\times[0,1]} with {D(A,B)<\varepsilon} are visually indistinguishable.

Let {X=[0,1]\times[0,1]}. Then {F(X)} is the set of all possible symbols. Since {X} is compact, so is {F(X)}.

Let {B_{\varepsilon}(A)=\{B\in F(X)\colon D(A,B)<\varepsilon\}} denote the {\varepsilon}-ball around {A\in F(X)} using the Hausdorff metric. Then the set of all possible {\varepsilon}-balls, {\{B_{\varepsilon}(A):A\in F(X)\}}, is an open cover of {F(X)}. Since {F(X)} is compact, there exists a finite subcover, {\{B_{\varepsilon}(A_{1}),\ldots,B_{\varepsilon}(A_{n})\}}.

From this we can conclude that every symbol {A\in F(X)} is visually indistinguishable from at least one of the {A_{i}}. That is, there can be no more than {n} visually distinct symbols. In particular, there can be no written alphabet with more than {n} letters!

[Note: in (1) we assume that all symbols are compact subsets of {X}. We could have dropped the compactness assumption, but if we did then the Hausdorff metric becomes a pseudometric and things get a little messier.]


There is nothing special about assuming our figures are letters in an alphabet. Using the same reasoning, we can conclude that it is possible to draw only finitely many pictures with a black pen on a given canvas.

What if we are allowed to use color? Suppose that the visible spectrum of colors is a compact set {Y}. Two colors that are very similar are visually indistinguishable. Thus we can use the identical argument, but now a symbol is a compact subset of {[0,1]\times[0,1]\times Y}, to conclude that there are only finitely many color pictures on a given canvas.


  1. Nat STahl says:

    Does this extend to music as well?

    1. Good question. I’d guess “yes” if we were to frame our hypotheses right. For example, the set of pitches that we can hear is compact, the pieces cannot exceed any fixed length (time), and each note is a pure sine wave. Maybe we could relax that last one—I’m not sure.

  2. D. Eppstein says:

    The argument makes sense using Hausdorff distance, the way you have it, but it sounds to me like Turing intended something different, a form of earth movers’ distance.

    1. Thanks! I was careful not to say that I was giving “Turing’s proof,” although I didn’t explicitly say that I wasn’t…! I was trying to give a proof along the same lines as his using things that I knew.

      I had/have some questions about what he wrote in that short footnote. I was not sure if his “metric” was equivalent to the Hausdorff metric (thanks for posting that link), I didn’t know why he mentioned measurable sets, and I was not sure why he said that the set of symbols was conditionally compact instead of compact. Probably you need measurability to use the metric that you mention, and in that case the space is only conditionally compact. (I’m not a point set topologist.) I’ll keep thinking about it.

  3. DavidC says:

    Interesting post! Thank you.

Comments are closed.