The transcendence of e

A real number {x} is called algebraic if it is the root of a polynomial with integer coefficients. Examples of algebraic numbers are {1/2} (it is the root of {p(x)=2x-1}), {\sqrt{2}} ({p(x)=x^{2}-2}), the golden ratio ({p(x)=x^{2}-x-1}), and the single real root of the quintic polynomial {x^{5}-x+1} (which cannot be expressed with radicals).

A real number that is not algebraic is called transcendental. It is easy to see that every transcendental number is irrational, but, as we see above, not every irrational number is transcendental. In 1874 Georg Cantor proved that there are only countably many algebraic numbers. Thus there must be uncountably many transcendental numbers! The vast, vast majority of real numbers are transcendental.

Despite the fact that almost every real number is transcendental, it is very difficult to prove that a given number is transcendental. Joseph Liouville discovered the first transcendental number in 1844:

\displaystyle \sum_{n=1}^{\infty}10^{-n!}=0.1100010000000000000000010\ldots

In 1873 Charles Hermite proved that {e} is transcendental and nine years later Ferdinand von Lindemann proved that {\pi} is transcendental. Some other transcendental numbers are: {e^{\pi}} (we have not yet proved that {\pi^{e}} is transcendental), {2^{\sqrt{2}}}, {\ln(2)}, {\sin(1)}, and {i^{i}} (yes, this is a real number: {e^{-\pi/2}}).

In this sequence of three blog posts I will prove that {e} is transcendental. At a glance the proof looks long and complicated, but the proof is really quite straightforward (this post is the longest of the three). Most of the proof is nothing more than algebraic manipulations and divisibility arguments with integers. There is some differential calculus, but nothing beyond Calculus I: the mean value theorem, the derivative of {e^{x},} the product rule, and the limit {\displaystyle\lim_{n\rightarrow\infty}\frac{a^{n}}{n!}=0}. The proof also uses the infinitude of primes.

This proof is based on Adolf Hurwitz‘s 1893 simplification of Hermite’s proof. (To be specific, I used Herstein’s Topics in Algebra as a source for the details of Hurwitz’s proof.)

Three-step proof that e is transcendental
Step 1

Step 2
Step 3

Theorem. {e} is transcendental.

This is a proof by contradiction. Suppose {e} is algebraic. That is, {e} is the root of a polynomial {\varphi(x)=c_{0}+c_{1}x+c_{2}x^{2}+\cdots+c_{n}x^{n}} with integer coefficients. In other words,

\displaystyle \varphi(e)=c_{0}+c_{1}e+c_{2}e^{2}+\cdots+c_{n}e^{n}=0.

We may assume that {c_{0}>0}. (If {c_{0}} is zero divide {\varphi(x)} by {x}, or a sufficiently large power of {x}; if {c_{0}<0}, multiply {\varphi(x)} by {-1}.)

Step 1.

In this first post we will prove the following lemma. We use the familiar notation that {f^{(k)}} is the {k}th derivative of {f} (and {f^{(0)}=f}).

Lemma 1. Suppose {e} is a root of the polynomial {\varphi(x)=c_{0}+c_{1}x+c_{2}x^{2}+\cdots+c_{n}x^{n}}. Let {f} be a polynomial and {F(x)=\sum_{i=0}^{\infty}f^{(i)}(x)}. Then there exist {\alpha_{1},\ldots,\alpha_{n}\in(0,1)} such that {c_{0}F(0)+\cdots+c_{n}F(n)=c_{1}\beta_{1}+\cdots+c_{n}\beta_{n}}, where {\beta_{k}=-ke^{k(1-\alpha_{k})}f(k\alpha_{k})}.

Suppose {f} is a polynomial of degree {r} with real coefficients. Then {f^{(r+1)}(x)=f^{(r+2)}(x)=\cdots=0}. In particular, the function {F} is a finite sum:

\displaystyle  F(x)=\sum_{i=0}^{\infty}f^{(i)}(x)=f(x)+f^{(1)}(x)+\cdots+f^{(r)}(x).

Notice that the derivative of {F} has the following simple form

F'(x)=f^{(1)}(x)+f^{(2)}(x)+\cdots+f^{(r+1)}(x)
=f^{(1)}(x)+f^{(2)}(x)\cdots+f^{(r)}(x)
=F(x)-f(x).

Define a new function {g(x)=e^{-x}F(x)}. The derivative of {g} also has a compact form. Using the product rule we obtain

g'(x)=-e^{-x}F(x)+e^{-x}F'(x)
=-e^{-x}F(x)+e^{-x}(F(x)-f(x))
=-e^{-x}f(x).

At this point we must invoke the mean value theorem.

Theorem. [Mean value theorem] Suppose {f:\mathbb{R}\rightarrow\mathbb{R}} is a differentiable function and {a<b}. Then there exists {c\in(a,b)} such that {\displaystyle f'(c)=\frac{f(b)-f(a)}{b-a}}.

Since the number {c} is between {a} and {b}, we can write it as {c=a+\alpha(b-a)} where {\alpha\in(0,1)}. Thus we may restate the theorem as follows.

Theorem. [Mean value theorem (vers. 2)] Suppose {f:\mathbb{R}\rightarrow\mathbb{R}} is a differentiable function and {a<b}. Then there exists {\alpha\in(0,1)} such that {\displaystyle f'(a+\alpha(b-a))=\frac{f(b)-f(a)}{b-a}}.

The function {g} is differentiable, so we may apply the mean value theorem to the interval {[0,k]} ({a=0}, {b=k}). It guarantees {\alpha\in(0,1)} such that

\displaystyle g'(0+\alpha(k-0))=\frac{g(k)-g(0)}{k-0}.

Using the definition of {g} and our expression for {g'} this equality becomes

\displaystyle -e^{-\alpha k}f(k\alpha)=\frac{e^{-k}F(k)-e^{0}F(0)}{k}.

A little algebra yields the following equality (we will call this common value {\beta}).

\displaystyle F(k)-e^{k}F(0)=-ke^{k(1-\alpha)}f(k\alpha)=\beta.

We now repeat this for all of the intervals {[0,k]} for {k=1,\ldots,n}, where {n} is the degree of the polynomial {\varphi} for which {\varphi(e)=0}. Each application of the mean value theorem yields its own {\alpha_{k}} and its own corresponding {\beta_{k}}:

\displaystyle F(k)-e^{k}F(0)=-ke^{k(1-\alpha_{k})}f(k\alpha_{k})=\beta_{k}.

Multiply the left and right sides of this equality by {c_{k}} (the {k}th coefficient of {\varphi(x)}) to obtain

\displaystyle c_{k}F(k)-c_{k}e^{k}F(0)=c_{k}\beta_{k}.

Summing these equation for {k=1,\ldots, n} we obtain

\displaystyle (c_{1}F(1)+\cdots+c_{n}F(n))-F(0)(c_{1}e+\cdots c_{n}e^{n})=c_{1}\beta_{1}+\cdots+c_{n}\beta_{n}.

But, since {\varphi(e)=0}, we know that {c_{1}e+\cdots c_{n}e^{n}=-c_{0}}. So the equality simplifies to

\displaystyle c_{0}F(0)+\cdots+c_{n}F(n)=c_{1}\beta_{1}+\cdots+c_{n}\beta_{n}.

This proves the lemma.

In the next step we will choose a specific polynomial {f} for the lemma.