AVL Trees

AVL trees are named after G. M. Adelson-Velsky and E. M. Landis -- the two Soviet mathematicians who first defined them in 1962.

AVL trees are height-balanced binary search trees. For each node, the difference in height between the left and right subtrees is at most 1.

It is known that in an AVL tree with N nodes, the overall height never exceeds 1.45*log(N). The math required to get an average figure is difficult; empirical studies put the average height close to log(N). (The logs mentioned here are base 2.)
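
For example, with N = 1,000,000 nodes, log(N) is about 20, so the height is guaranteed to be at most about 1.45*20, i.e. roughly 29, compared with 20 for a perfectly balanced tree.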

A "three-way switch" is required for each node -- to tell if the subtrees are equal in height, or which of the left or right is higher. The scheme can be implemented with as few as two bits. We will denote the balance values as 0, -1, and +1, representing respectively balanced node, one that is "heavy" on the left, and one that is "heavy" on the right.

In an AVL tree, insertion starts normally. Afterwards, manipulations called "rotations" are done if necessary to keep the tree from getting too far out of balance.

When inserting, the "pivot node" of the insertion operation is an important concept.

DEF: The pivot node P for a new node N being inserted into an AVL tree is the imbalanced node (balance value -1 or +1) on the search path from the root to N that is furthest from the root.

Note: "imbalanced" here refers to the states of the nodes before the new node N is inserted into the tree.

It is possible that there is no pivot node -- this happens when all the nodes on the search path for N are balanced.
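
To make the definition concrete, here is a hypothetical C helper (building on the avl_node sketch above; the function name is illustrative) that locates the pivot for a key about to be inserted. It simply remembers the deepest node with a nonzero balance value on the search path.

    #include <stddef.h>   /* for NULL */

    /* Returns the pivot for an insertion of 'key', or NULL if every node
       on the search path is balanced.  Assumes 'key' is not already in
       the tree. */
    struct avl_node *find_pivot(struct avl_node *root, int key)
    {
        struct avl_node *pivot = NULL;
        for (struct avl_node *cur = root; cur != NULL;
             cur = (key < cur->key) ? cur->left : cur->right) {
            if (cur->balance != 0)
                pivot = cur;
        }
        return pivot;
    }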

Insertion into an AVL tree can be broken into three cases. The first two do not require that any rotations be done. In the third case, a rotation is always required.

Case 1: no pivot node -- in this case the insertion is done and the "balance bits" are adjusted in all the nodes on the path from the new node up to the root. Every node reached from the left receives a balance of "-1" (left is higher), and every node reached from the right receives a balance of "+1" (right is higher). Note that the work for the insertion remains O(log(N)), where N is the number of elements in the tree.

Case 2: a pivot node exists and the new node gets inserted on the short side of the pivot. In this case balance bits need to be adjusted only from the new node up to the pivot. The algorithm is as in Case 1 -- add 1 when coming up from the right, and subtract 1 when coming up from the left. In this case, the pivot will change from +1 or -1 to 0.
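
As an illustration (again building on the avl_node sketch, with a hypothetical function name), the balance-bit adjustment shared by Cases 1 and 2 can be done by re-walking the search path from the starting node (the root in Case 1, the pivot in Case 2) down toward the new key. This is equivalent to walking up from the new node as described above.

    /* Called after the new node has been linked into the tree.
       'start' is the root (Case 1) or the pivot (Case 2); 'new_key' is
       the key that was just inserted.  The new node itself keeps
       balance 0, so the loop stops when it is reached. */
    void adjust_balances(struct avl_node *start, int new_key)
    {
        struct avl_node *cur = start;
        while (cur != NULL && cur->key != new_key) {
            if (new_key < cur->key) {
                cur->balance -= 1;      /* went left: left side got taller */
                cur = cur->left;
            } else {
                cur->balance += 1;      /* went right: right side got taller */
                cur = cur->right;
            }
        }
    }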

Case 3: a pivot exists and the new node gets inserted on the high side of the pivot node. This configuration initially violates the balance property of the AVL tree and a rotation must be done to balance the tree.

Four different "rotations" are used: single left rotation, single right rotation, double left rotation, and double right rotation.

These rotations are all simple variations of the same thing. The left and right versions are mirror images of each other. Furthermore, the double rotations can be decomposed and seen to be the result of doing two single rotations. A first single rotation is done in one direction at a lower level. Next a second single rotation in the opposite direction is done at the level of the pivot.
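
To make this concrete, here is a minimal C sketch of the rotation primitives, assuming the avl_node layout above and hypothetical function names. Each function returns the new root of the rotated subtree; the balance-bit updates are omitted because they depend on the balance values of the nodes involved, and naming conventions for the double rotations vary from book to book.

    /* Single right rotation: the left child comes up, the old subtree
       root goes down to the right. */
    struct avl_node *rotate_right(struct avl_node *p)
    {
        struct avl_node *l = p->left;
        p->left  = l->right;
        l->right = p;
        return l;
    }

    /* Single left rotation: mirror image of the above. */
    struct avl_node *rotate_left(struct avl_node *p)
    {
        struct avl_node *r = p->right;
        p->right = r->left;
        r->left  = p;
        return r;
    }

    /* Double right rotation: a single left rotation one level down,
       followed by a single right rotation at the pivot. */
    struct avl_node *double_rotate_right(struct avl_node *p)
    {
        p->left = rotate_left(p->left);
        return rotate_right(p);
    }

    /* Double left rotation: the mirror image. */
    struct avl_node *double_rotate_left(struct avl_node *p)
    {
        p->right = rotate_right(p->right);
        return rotate_left(p);
    }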

After a rotation is done, some balance bits have to be adjusted. At most one rotation is required to balance the tree after doing an insertion. The total work involved is O(log(N)).

The general methods for doing rotations can be described using example AVL trees of height 5 or more. The actions required to rotate in height 3 or 4 AVL trees are somewhat special, but easy to figure out.

Deletion of a node from an AVL tree can require more than one rotation -- in fact, up to log(N) rotations may be required. The worst case occurs when deleting the leftmost node of a Fibonacci tree (the AVL tree of a given height with the fewest possible nodes; its two subtrees are Fibonacci trees whose heights differ by exactly 1). In this case, a rotation has to be done at each node on the path from the deleted node up to the root.

However, in large AVL trees studied empirically, only approximately 0.214 rotations per deletion and 0.465 rotations per insertion were required.

Implementation of AVL trees is complex. For moderate data sizes with random insertion and deletion orders, an ordinary (unbalanced) binary search tree is preferable because of its lower overhead.

For disk-based applications, B-trees are preferred.