Counting Nodes in Planar Rooted Trees

Theodore Hwa

The following was inspired by a graph theory problem posted by Bill Taylor on sci.math. A planar rooted tree is a tree with a distinguished node (the root) and a given order of siblings for each node. (i.e. the left to right arrangement of nodes is important) The following are all the planar rooted trees of order 4. The node marked * is the root.

  *      *        *        *      *
  |     / \      / \       |     /|\
  o    o   o    o   o      o    o o o
  |    |            |     / \ 
  o    o            o    o   o
  |                       
  o

(If the above graphs don't look right, switch the font in your browser to Courier or some other fixed-width font.)

It is a (well-known?) fact that the number of planar rooted trees having n+1 vertices is the nth Catalan number: C(2n,n)/(n+1). But this means that the total number of nodes among all the planar rooted trees of order n+1 is C(2n,n). My question is: Is there a bijective proof of this fact? This would consist of assigning to each node a subset of size n, from a set of size 2n.

In the following, I give a bijective construction. The bijection and its proof all use well-known combinatorial ideas.

The construction is as follows. Given a p.r.t. (planar rooted tree) of n+1 vertices, start from the root and perform a "circular" depth first traversal of the tree. Siblings should be traversed in their given order. Write down every node encountered, *including* when the traversal "pops" from a child to its parent. We do not allow directly moving from one sibling to another; traversal is strictly along the edges of the tree. The traversal must terminate at the root. For example, in the second 4-node tree above:

   A
  / \
 B   C
 |
 D

The order of traversal is: A, B, D, B, A, C, A. One way to think of this is a sequence of function calls in a computer program, where each parent calls its children in their given order. Then the traversal is obtained by running the program to completion and writing down the sequence of functions we encounter.

Now label each node with a subset of {0,1,2,...,2n-1} indicating all of its positions in the traversal, where the initial root node is position 0 (included in the subset; but the final occurrence of the root in position 2n is not included). This is NOT yet the bijection we seek. For the tree above, the labeling is:

A = {0,4} B = {1,3} C={5} D={2}

Note that:

the size of vertex's label is equal to the degree of the vertex.
the integers in a vertex's label are either all odd, or all even.
two nodes are connected if and only if two consecutive integers occur in their respective labels (of course, we work modulo 2n).

We define the "index" of a vertex to be the smallest integer assigned to it. Thus the indices are: A=0, B=1, C=5, D=2. The root always has index 0.

Now, fix any vertex v, and define the *signature* of v to be the subset of {0,1,...,2n-1} obtained as follows. For each vertex w different from v, choose the integer in w's label that is next in sequence starting from the index of v and proceeding forward, wrapping around if necessary. Put all these choices together to obtain a subset of size n. This is the claimed bijection.

The signature of A is {1,2,5}
The signature of B is {2,4,5}
The signature of C is {0,1,2}
The signature of D is {3,4,5}

For example, to find the signature of D, note that its index is 2. From the label A = {0,4}, we choose the value 4, because 4 comes before 0 if we start at 2 and count forwards (modulo 6). For the label B = {1,3}, we choose the value 3. For the label C = {5}, 5 is the only choice. Therefore, the signature of D is {3,4,5}.

Here are signatures for all the p.r.t. of order 4 (the case n=3):


  123    125      134      124       135
   |     / \      / \       |       / | \
  023  245 012  234 014    024   235 145 013
   |    |            |     / \
  035  345          015  034 025
   |
  045

Note that every combination of 3 integers from {0,1,2,3,4,5} occurs exactly once. I have also verified the case n=4 by computer (source, in Pascal, available on request).

Reversing the process

This construction was motivated by another combinatorial interpretation of the Catalan numbers: C(2n,n)/(n+1) is the number of sequences of balanced parentheses of length 2n. Equivalently, it is the number of sequences of +1 and -1's of length 2n, with total sum 0, for which the cumulative sum is always <=0. This will show up below.

It is important to realize that because the traversal is circular, it can be started at an arbitrary node, not just the root. The bijection can be described in a slightly different manner. Given a vertex v, perform the same traversal, now starting and ending at v. Perform the same labeling, except that v, the initial vertex, now starts at position *equal to its index*. To get the signature of v, write down the position of the *first* occurrence of each node, other than v, along the traversal.

Here's how we reverse the process. Given a signature, we compute a sequence of cumulative sums s_k = a_0 + ... a_{k-1}, for 0 <= k < 2n, where a_i = -1 if i occurs in the signature, and a_i = 1 if it does not. Select an index m for which s_m is maximal.

Now construct the tree as follows. Start with a root node labelled m. Work your way through successive integers m, m+1, ..., back to m. For each integer s which occurs in the signature, draw a new vertex emanating from the current node, and label it s. If s does not occur in the signature, back up to the current node's parent, and label it s. We will never be forced beyond the original node (having no parent) because the cumulative sums of +1's and -1's never go higher than that of the starting node (it was chosen maximal). (+1 means move *up* the tree from a node to its parent, while -1 means move *down* the tree while creating a new node.)

Now re-root the tree at the node that gets labelled 0, and our desired node is, of course, the node labelled m, with which we started the above process. It is easy to see from this construction that the signature of the resulting node must be the original signature.

Some Questions

Can you read off the properties of a node from its signature? For example, can you tell if it is a root, a leaf, how many children it has, etc. ?

Answer:The number of children of a node is equal to one less than the number of times the maximal cumulative sum occurs, unless it is the root, in which case it is the number of times the maximal sum occurs. A node is a root if and only if all cumulative sums are <=0. A node is a leaf if and only if the maximal cumulative sum is unique and nonzero.

Can you use this bijection to solve the problem posted by Bill Taylor to sci.math recently, which states that exactly half the nodes of all p.r.t.'s of a given size are leaf nodes? (I have a not so interesting brute-force induction proof of this.)

Send any comments to me, Theodore Hwa, at hwatheod@cs.stanford.edu