Counting Nodes in Planar Rooted Trees

Theodore Hwa

Alternate version of this article (describes the bijection is slightly different language; also trees are upside down). Warning: a few errors have been fixed in this version but not the alternate version yet.

The following was inspired by a graph theory problem posted by Bill Taylor on sci.math. A planar rooted tree is a tree with a distinguished node (the root) and a given order of siblings for each node. (i.e. the left to right arrangement of nodes is important) The following are all the planar rooted trees of order 4. The node marked * is the root.

o
|
o    o            o    o   o
|    |            |     \ /
o    o   o    o   o      o    o o o
|     \ /      \ /       |     \|/
*      *        *        *      *

(If the above graphs don't look right, switch the font in your browser to Courier or some other fixed-width font.)

It is a (well-known?) fact that the number of planar rooted trees having n+1 vertices is the nth Catalan number: C(2n,n)/(n+1). But this means that the total number of nodes among all the planar rooted trees of order n+1 is C(2n,n). My question is: Is there a bijective proof of this fact? This would consist of assigning to each node a subset of size n, from a set of size 2n.

In the following, I give a bijective construction. The bijection and its proof all use well-known combinatorial ideas.

The construction is as follows. Given a p.r.t. (planar rooted tree) of n+1 vertices, start from the root and perform a "circular" depth first traversal of the tree. Siblings should be traversed in their given order. Write down every edge traversed, *including* when the traversal "pops" from a child to its parent. We do not allow directly moving from one sibling to another; traversal is strictly along the edges of the tree. The traversal must terminate at the root. For example, in the second 4-node tree above:

 D
 |
 B   C
  \ /
   A

The order of traversal is: A-B-D-B-A-C-(A). One way to think of this is a sequence of function calls in a computer program, where each parent calls its children in their given order. Then the traversal is obtained by running the program to completion and writing down the sequence of functions we encounter. We put the last (A) in parentheses because the traversal should be regarded as a circular list, returning to the original A. However, any reference to the order of nodes will be with respect to the canonical ordering encountered when we first made the traversal.

Now, fix any vertex v, and define the *signature* of v to be the following sequence of +'s and -'s. Starting with the edge following the first occurrence of v in the traversal, write down either a + or - underneath each edge in the traversal, moving forwards along the circular list. The rule is to write down a + if the edge is being traversed for the first time, and - if the edge is being traversed for the second time. Every edge will be traversed exactly twice, once in each direction. Therefore, we will have n +'s and n -'s.

The signature of A is ++--+-
The signature of B is -+-++-
The signature of C is ++---+
The signature of D is --+++-

For example, to obtain the signature of B, write the traversal in canonical order:

A B D B A C (A)

B occurs for the first time as the second letter, so we start with the edge B-D. By the rule, we must write down a +. Then D-B, which we just saw, so write -. Then B-A gets a +. Then A-C gets a +. Then C-A gets a -. Then A-B gets a -. We have:

A B D B A C (A)
 - + - + + -

Here are signatures for all the p.r.t. of order 4 (the case n=3). We write 234 to mean that positions 2,3,4 contain +'s, and the others contain -'s. We number positions from 0. The root is at the top.


  012    014      023      013       024
   |     / \      / \       |       / | \
  125  134 015  123 035    135   124 034 025
   |    |            |     / \
  245  234          045  235 145
   |
  345

Note that every combination of 3 integers from {0,1,2,3,4,5} occurs exactly once. I have also verified the case n=4 by computer (source, in Pascal, available on request).

Reversing the process

This construction was motivated by another combinatorial interpretation of the Catalan numbers: C(2n,n)/(n+1) is the number of sequences of balanced parentheses of length 2n. Equivalently, it is the number of sequences of +1 and -1's of length 2n, with total sum 0, for which the cumulative sum is always >=0. This will show up below.

Here's how we reverse the process. Given a signature a_0 ... a_{2n-1}, we compute a sequence of cumulative sums s_k = a_0 + ... a_{k-1}, for 0 <= k < 2n. ('+' is interpeted as +1, '-' is interpreted as -1). Choose an index m for which s_m is minimal.

Now construct the tree as follows. Start with a temporary root node. (This will not necessarily be the actual root of the p.r.t. we construct.) Read one symbol at a time from the signature, starting with the symbol at position m. If the symbol is a '+', draw a new node emanating upwards from the current node. If the symbol is a '-', drop down to the current node's parent. We will never be forced below the temporary root node (having no parent) because the cumulative sums of +1's and -1's never go lower than that of the starting node (it was chosen minimal).

As you move along the signature, pay close attention when you get to the first symbol of the signature as given. The node you are at *before* processing that symbol, is the *actual* root of the tree. Re root the tree at that actual root, to get a p.r.t. The node you are at *after* processing that symbol, is the *first child* of the actual root.

How do we order siblings? The creation process above is again a circular traversal. We order siblings in the order they are encountered *starting the traversal at the actual root*.

We have thus constructed a p.r.t. We have also identified a node, namely, the temporary root we started with. It is easy to see from this construction that the signature of the resulting node must be the original signature. (Well, as usual in these combinatorics constructions, I think trying it out for yourself will probably be more convincing than reading a written explanation.)

Some Questions

Can you read off the properties of a node from its signature? For example, can you tell if it is a root, a leaf, how many children it has, etc. ?

Answer:The degree of a node is the number of times the minimal cumulative sum occurs. Therefore, the number of children of a node is equal to one less the number of times the minimal cumulative sum occurs, unless it is the root, in which case it is the number of times the minimal sum occurs. A node is a root if and only if the minimal cumulative sum of its signature is zero. A node is a leaf if and only if the minimal cumulative sum of its signature is unique and nonzero.

Can you use this bijection to solve the problem posted by Bill Taylor to sci.math recently, which states that exactly half the nodes of all p.r.t.'s of a given size are leaf nodes? (I have a not so interesting brute-force induction proof of this.)

Send any comments to me, Theodore Hwa, at hwatheod@cs.stanford.edu