12.2 Information-Theoretic Arguments

Looking at the game of 20 questions another way: if there were an algorithm A that could always win the game with 19 or fewer questions, then we could represent A as a binary decision tree:

The internal nodes of the tree are questions, and the leaves are "verdicts" -- decisions that some guess is correct -- like "the number is 21,345".

If the algorithm A exists, then the tree has height at most 19 and has at least one million leaves, since each possible answer needs its own verdict.

But we know that a binary tree of height k has at most 2^k leaves. 2^19 = 524,288 is less than one million, so the algorithm A can't exist.
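The arithmetic behind this claim is easy to check. The lines below are only a sanity check of the powers of two involved; the variable names are illustrative and are not part of the original argument.

```python
# Compare the leaf capacity of a binary tree of height 19 (and 20)
# with the one million verdicts the game needs.
capacity_19 = 2 ** 19
capacity_20 = 2 ** 20
needed = 1_000_000

print(capacity_19)             # 524288
print(capacity_19 < needed)    # True: 19 questions cannot be enough
print(capacity_20 >= needed)   # True: 20 questions leave enough room
```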

THEOREM: A binary tree of height H has at most 2^H leaves. (height=H)==>(#leaves<=2^H)

PROOF: If H=0, then the tree has one node, and one leaf, and 2^H=1. Now assume that H>=1, and that for j=0,1,2,..,H-1 a binary tree of height j has at most 2^j leaves. Suppose T is a binary tree of height H. Since H>=1, T has a root node and left and right sub-trees L and R, and the root is not a leaf, so L and R can't both be empty. If not empty, the heights of L and R are between 0 and H-1, inclusive. Therefore, by the induction hypothesis, L and R each have no more than 2^(H-1) leaves (an empty sub-tree has no leaves at all). Every leaf of T is either a leaf of L or a leaf of R. Therefore T has no more than 2^(H-1) + 2^(H-1) = 2^H leaves. QED
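For small heights the bound can also be confirmed by brute force. The sketch below assumes a representation chosen only for illustration: the empty tree is None and a non-empty tree is a pair (left, right). The function names are hypothetical. It enumerates every binary tree of height at most 3 and reports the largest leaf count found at each height.

```python
from itertools import product

def trees_up_to(h):
    """All binary trees of height <= h, including the empty tree (None)."""
    if h < 0:
        return [None]
    smaller = trees_up_to(h - 1)
    return [None] + [(l, r) for l, r in product(smaller, repeat=2)]

def height(t):
    """Height of a tree; a single node has height 0, the empty tree -1."""
    if t is None:
        return -1
    return 1 + max(height(t[0]), height(t[1]))

def leaves(t):
    """Number of nodes with no children."""
    if t is None:
        return 0
    if t == (None, None):
        return 1
    return leaves(t[0]) + leaves(t[1])

for H in range(4):
    worst = max(leaves(t) for t in trees_up_to(H)
                if t is not None and height(t) == H)
    assert worst <= 2 ** H
    print(H, worst, 2 ** H)
```

An exhaustive check like this is no substitute for the induction, but it does show that the bound is attained at each of these heights, by the tree in which every level is full.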

COROLLARY: Any binary tree with N leaves must have height at least lg(N).

PROOF: Suppose that a binary tree has N leaves and height H. By the theorem, the tree has no more than 2^H leaves. Therefore N<=2^H, and so lg(N)<=H. QED
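Since the height is a whole number, the corollary says the height is at least the smallest integer H with 2^H >= N. A minimal sketch of that computation follows; the helper name min_height is an assumption made for this example, and it uses integer bit lengths rather than floating-point logarithms to avoid rounding issues.

```python
import math

def min_height(n_leaves):
    """Smallest integer H with 2**H >= n_leaves (n_leaves >= 1)."""
    return (n_leaves - 1).bit_length()

for n in (1, 2, 3, 4, 5, 1_000_000):
    h = min_height(n)
    # h is an integer upper bound on lg(n), and h - 1 is too small.
    assert 2 ** h >= n and (h == 0 or 2 ** (h - 1) < n)
    print(n, h, math.log2(n))
```

For one million leaves this gives 20, which matches the 20 questions of the game.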

THEOREM 12.2.1 Any binary tree with at least 2^H leaves has an average height of at least H. (#leaves>=2^H)==>(average height>=H)

THEOREM 12.2.1 (logarithmic form) Any binary tree with N leaves has an average height of at least lg N.

PROOF: We proceed by induction on N.

If the number of leaves = N = 1, then the tree must look like one of these pictures:


    ... *  *  *  *   *  ...
       /   /      \   \
      *   *        *   *
     /                  \
    *	                 *
In each case the (average) height is 0 or more, and lg(N) = lg(1) = 0.

Now suppose that N>=2, and that for j=1,..,N-1 whenever a binary tree has j leaves, its average height is at least lg(j).

If T is a binary tree with N leaves, then T has left and right sub-trees L and R, not both empty. Let p be the number of leaves in L, and N-p the number of leaves in R.

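If averageHeight(T) denotes the average height of the tree T, that is, the average length of a path from the root of T to a leaf, then N*averageHeight(T) is the sum of the lengths of all the paths from the root of T to the leaves. Each such path consists of one edge from the root of T to the root of L or R, followed by a path from that root to the leaf. Summing over the p leaves in L and the N-p leaves in R, we see that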

N*averageHeight(T)

= p + p*averageHeight(L) + (N-p) + (N-p)*averageHeight(R)

= N + p*averageHeight(L) + (N- p)*averageHeight(R)
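The identity can be checked on a small example. The sketch below assumes an illustrative representation (None for the empty tree, a pair (left, right) for a node) and hypothetical helper names; it uses exact fractions so that the two sides can be compared without rounding.

```python
from fractions import Fraction

LEAF = (None, None)  # a node with no children

def leaf_depths(tree, depth=0):
    """Depths of all leaves; a tree is None (empty) or a (left, right) pair."""
    if tree is None:
        return []
    left, right = tree
    if left is None and right is None:
        return [depth]
    return leaf_depths(left, depth + 1) + leaf_depths(right, depth + 1)

def average_height(tree):
    """Average root-to-leaf path length, as an exact fraction."""
    depths = leaf_depths(tree)
    return Fraction(sum(depths), len(depths))

L = (LEAF, (LEAF, LEAF))   # 3 leaves, at depths 1, 2, 2
R = (LEAF, LEAF)           # 2 leaves, at depths 1, 1
T = (L, R)

N = len(leaf_depths(T))
p = len(leaf_depths(L))

lhs = N * average_height(T)
rhs = N + p * average_height(L) + (N - p) * average_height(R)
print(lhs, rhs)  # 12 12
```

Here N = 5 and p = 3. The five paths in T have total length 12, which is the 5 extra edges out of the root plus the totals 5 and 2 inside L and R.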

Assume for the moment that neither L nor R is empty.

---------------------------
Start Argument "A"

Then 1 <= p <= N-1 and 1 <= N-p <= N-1. By the induction hypothesis, averageHeight(L) >= lg(p) and averageHeight(R) >= lg(N-p). Therefore:

(1) N*averageHeight(T) >= N + p*lg(p) + (N-p)*lg(N-p)

By analyzing the real-valued function

g(x) = x*lg(x) + (N-x)*lg(N-x), for 0 < x < N,

we can determine that the derivative is

g'(x) = lg(x) - lg(N-x),

which is zero just when x=N/2. The second derivative is

g''(x) = (1/ln2)[(1/x)+(1/(N-x))]

which is positive for all 0 < x < N. So g is convex on that interval and attains its minimum at x = N/2. Therefore:

p*lg(p) + (N-p)*lg(N-p) >= (N/2)lg(N/2) + (N/2)lg(N/2) = N*lg(N/2) = N*lg(N) - N,

and so relation (1) implies that

N*averageHeight(T) >= N + N*lg(N) - N = N*lg(N).

In other words, averageHeight(T) >= lg(N).

End Argument "A"
---------------------------
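The key inequality of Argument "A", p*lg(p) + (N-p)*lg(N-p) >= N*lg(N) - N, can be spot-checked numerically. The sketch below is only a floating-point check over a limited range of N (the range and the tolerance TOL are arbitrary choices made for this example), not a replacement for the calculus argument.

```python
import math

def g(x, n):
    """g(x) = x*lg(x) + (n-x)*lg(n-x), for 0 < x < n."""
    return x * math.log2(x) + (n - x) * math.log2(n - x)

TOL = 1e-9  # allowance for floating-point rounding

for n in range(2, 201):
    bound = n * math.log2(n) - n   # the value of g at x = n/2
    for p in range(1, n):
        assert g(p, n) >= bound - TOL

print("inequality holds for 2 <= N <= 200")
```

Equality occurs when p = N/2, which is why the check needs a small tolerance for even N.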

It remains to handle the possibility that L or R is empty. Suppose, for the sake of definiteness, that R is empty. Then L contains all N leaves and averageHeight(T) = 1 + averageHeight(L), so it suffices to show that averageHeight(L) >= lg(N). If L has two non-empty sub-trees, then "Argument A" applies to L. If not, it suffices to show that the one sub-tree of L has averageHeight >= lg(N), and we repeat the reasoning one level down. Since the number of leaves in T is N, and N >= 2, we must eventually reach a descendant G of the root of T that has two non-empty sub-trees. G contains all N leaves, so "Argument A" applies to G and proves that averageHeight(G) >= lg(N). Since averageHeight(T) = v + averageHeight(G), where v is the number of steps required to reach G from the root of T, this also proves that averageHeight(T) >= lg(N). QED
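The theorem as a whole, including trees in which some nodes have only one child, can be checked exhaustively for small trees. The sketch below assumes an illustrative representation (None for the empty tree, a pair (left, right) for a node) with hypothetical function names, and examines every non-empty binary tree of height at most 3.

```python
import math
from itertools import product

def trees_up_to(h):
    """All binary trees of height <= h, including the empty tree (None)."""
    if h < 0:
        return [None]
    smaller = trees_up_to(h - 1)
    return [None] + [(l, r) for l, r in product(smaller, repeat=2)]

def leaf_depths(tree, depth=0):
    """Depths of all leaves of the tree."""
    if tree is None:
        return []
    left, right = tree
    if left is None and right is None:
        return [depth]
    return leaf_depths(left, depth + 1) + leaf_depths(right, depth + 1)

checked = 0
for t in trees_up_to(3):
    if t is None:
        continue
    depths = leaf_depths(t)
    n = len(depths)
    # sum(depths) is N*averageHeight(T); compare it with N*lg(N).
    assert sum(depths) >= n * math.log2(n) - 1e-9
    checked += 1

print(checked)  # 676
```

The enumeration includes chains and other lopsided shapes, so it exercises both Argument "A" and the case where a sub-tree is empty.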