AVL Trees
AVL trees are named after
Adel'son-Vel'ski and Landis -- two
Russian mathematicians who first
defined them.
AVL trees are binary search trees that
are height-balanced with a tolerance
of 1: for each node, the heights of
the left and right subtrees differ by
at most 1.
It is known that in an AVL tree with
N nodes, overall height never exceeds
about 1.45*log(N). (The math required
to get an average figure is
difficult.) Empirical studies put the
average height close to log(N). (The
logs mentioned above are base-2.)
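
One way to see where a bound of this
kind comes from is to count the fewest
nodes an AVL tree of a given height
can have. The sketch below is only an
illustration: the name min_nodes is
hypothetical, and it assumes height is
counted in edges (a single node has
height 0).

```python
import math

def min_nodes(h):
    """Fewest nodes in an AVL tree of height h (height counted in edges)."""
    a, b = 1, 2                  # minimum sizes for heights 0 and 1
    if h == 0:
        return a
    for _ in range(h - 1):
        # Sparsest tree of the next height: a root plus the sparsest
        # trees of the two preceding heights.
        a, b = b, a + b + 1
    return b

for h in range(6):
    n = min_nodes(h)
    # Compare the height with 1.45 * log2 of the smallest possible size.
    print(h, n, round(1.45 * math.log2(n), 2))
```

A tree of height h holds at least
min_nodes(h) nodes, so if h stays
below 1.45*log(min_nodes(h)) it also
stays below 1.45*log(N) for the
actual node count N.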
A "three-way switch" is required for
each node -- to tell if the subtrees
are equal in height, or which of the
left or right is higher. The scheme
can be implemented with as few as two
bits. We will denote the balance
values as 0, -1, and +1, representing
respectively a balanced node, one that
is "heavy" on the left, and one that
is "heavy" on the right.
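
A minimal sketch of a node carrying
such a balance value is given below.
The class and function names (Node,
height, set_balances) are
illustrative assumptions, and a plain
integer is used rather than a packed
two-bit field.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right
        self.balance = 0   # -1: left-heavy, 0: balanced, +1: right-heavy

def height(node):
    """Height counted in edges; an empty subtree has height -1."""
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))

def set_balances(node):
    """Recompute every balance value from the actual subtree heights."""
    if node is not None:
        set_balances(node.left)
        set_balances(node.right)
        node.balance = height(node.right) - height(node.left)

# A four-node tree whose left side is one level deeper than its right.
root = Node(10, Node(5, Node(3)), Node(20))
set_balances(root)
```

Recomputing heights like this is
slow; it is shown only to pin down
what the stored value means. A real
insertion keeps the values current
incrementally.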
In an AVL tree, insertion starts
normally. Afterwards, manipulations
called "rotations" are done if
necessary to keep the tree from
getting too far out of balance.
Insertion is organized around the
"pivot node" of the insertion
operation.
DEF: the pivot node P of a new node N
that is being inserted into an AVL
tree: P is the imbalanced node on the
search path from the root to N that is
furthest from the root (closest to N).
Note: "imbalanced" here means a
balance value of -1 or +1, and refers
to the states of the nodes before the
new node N is inserted into the tree.
It is possible that there is no pivot
node: all the nodes on the search path
for N may be balanced.
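
The search for the pivot can be
sketched as a single walk down the
search path. The names Node and
find_pivot are hypothetical, and the
sketch assumes the key is not already
in the tree.

```python
class Node:
    def __init__(self, key, balance=0, left=None, right=None):
        self.key = key
        self.balance = balance   # -1, 0, or +1, as stored before insertion
        self.left = left
        self.right = right

def find_pivot(root, key):
    """Return the deepest node with a nonzero balance on the search
    path for key, or None if every node on the path is balanced."""
    pivot = None
    node = root
    while node is not None:
        if node.balance != 0:
            pivot = node         # a later match replaces an earlier one
        node = node.left if key < node.key else node.right
    return pivot

# 10 is left-heavy, 5 is right-heavy, 7 and 20 are balanced.
tree = Node(10, -1, Node(5, +1, None, Node(7)), Node(20))
print(find_pivot(tree, 6).key)
print(find_pivot(tree, 25).key)
print(find_pivot(Node(10), 3))
```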
Insertion into an AVL tree can be
broken into three cases. The first
two do not require that any rotations
be done. In the third case, a
rotation is always required.
Case 1: no pivot node -- in this case
the insertion is done and the "balance
bits" are adjusted in the tree in all
the nodes on the path from the new
node up to the root. Every node
reached from the left receives a
balance of "-1" (left is higher), and
every node reached from the right
receives a balance of "+1" (right is
higher). Note that the work for the
insertion remains O(log(N)), where N
here denotes the number of elements in
the tree.
Case 2: a pivot node exists and the
new node gets inserted on the short
side of the pivot. In this case
balance bits need to be adjusted only
from the new node up to the pivot.
The algorithm is as in Case 1 -- add 1
when coming up from the right, and
subtract 1 when coming up from the
left. In this case, the pivot will
change from +1 or -1 to 0.
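
Cases 1 and 2 can be sketched together
as follows. This is an illustration
under stated assumptions, not a
definitive implementation: the names
are hypothetical, nodes have no parent
pointers (so the balances are adjusted
walking down from the pivot or root
rather than up from the new node), and
Case 3 is only detected, not handled.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.balance = 0   # -1 left-heavy, 0 balanced, +1 right-heavy

def insert_case_1_or_2(root, key):
    """Insert key when no rotation is needed.
    Returns (root, case), where case is 1 or 2."""
    new = Node(key)
    if root is None:
        return new, 1
    pivot = parent = None
    node = root
    while node is not None:            # search, remembering the pivot
        if key == node.key:
            raise ValueError("duplicate key")
        if node.balance != 0:
            pivot = node
        parent = node
        node = node.left if key < node.key else node.right
    if pivot is not None:
        side = -1 if key < pivot.key else +1
        if pivot.balance == side:      # new node lands on the heavy side
            raise NotImplementedError("Case 3: a rotation is needed")
    if key < parent.key:
        parent.left = new
    else:
        parent.right = new
    # Adjust balances on the path from the pivot (or the root) to the new node.
    node = pivot if pivot is not None else root
    while node is not new:
        if key < node.key:
            node.balance -= 1          # path continues to the left
            node = node.left
        else:
            node.balance += 1          # path continues to the right
            node = node.right
    return root, (1 if pivot is None else 2)

root, case = insert_case_1_or_2(None, 10)
root, case = insert_case_1_or_2(root, 20)   # Case 1: 10 becomes right-heavy
root, case = insert_case_1_or_2(root, 5)    # Case 2: 10 returns to balanced
```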
Case 3: a pivot exists and the new
node gets inserted on the high side of
the pivot node, that is, the side
that is already heavy. After the
insertion the pivot's subtrees differ
in height by 2, which violates the
balance property of the AVL tree, and
a rotation must be done to rebalance
the tree.
Four different "rotations" are used:
single left rotation, single right
rotation, double left rotation, and
double right rotation.
These rotations are all simple
variations of the same thing. The
left and right versions are mirror
images of each other. Furthermore,
the double rotations can be decomposed
and seen to be the result of doing two
single rotations. A first single
rotation is done in one direction at a
lower level. Next a second single
rotation in the opposite direction is
done at the level of the pivot.
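
The structural part of the rotations
can be sketched as below; balance bits
are left out here. The function names
are assumptions: each is named for the
direction in which the pivot moves,
which may or may not match the naming
convention intended above.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def rotate_left(p):
    """Single rotation: p's right child moves up, p moves down-left."""
    c = p.right
    p.right, c.left = c.left, p
    return c                     # new root of this subtree

def rotate_right(p):
    """Mirror image: p's left child moves up, p moves down-right."""
    c = p.left
    p.left, c.right = c.right, p
    return c

def double_rotate_right(p):
    """Rotate left one level below the pivot, then right at the pivot."""
    p.left = rotate_left(p.left)
    return rotate_right(p)

def double_rotate_left(p):
    """Rotate right one level below the pivot, then left at the pivot."""
    p.right = rotate_right(p.right)
    return rotate_left(p)

# A left-right zigzag: 30 -> 10 -> 20.
zigzag = Node(30, Node(10, None, Node(20)))
top = double_rotate_right(zigzag)
print(top.key, top.left.key, top.right.key)
```

Each function returns the new root of
the rotated subtree; the caller is
responsible for hooking it back into
the parent of the old pivot.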
After a rotation is done, some balance
bits have to be adjusted. At most one
rotation (single or double) is
required to rebalance the tree after
an insertion. The total work involved
is O(log(N)).
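
Putting the three cases together, a
complete insertion might look like the
sketch below. It is one possible
arrangement, not the only one: all
names are hypothetical, duplicates are
simply ignored, nodes have no parent
pointers, and the balance adjustments
after a double rotation follow the
standard textbook rules rather than
anything stated above.

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None
        self.balance = 0      # -1 left-heavy, 0 balanced, +1 right-heavy

def rotate_left(p):           # right child of p moves up
    c = p.right
    p.right, c.left = c.left, p
    return c

def rotate_right(p):          # left child of p moves up
    c = p.left
    p.left, c.right = c.right, p
    return c

def insert(root, key):
    """Insert key; return the (possibly new) root. Duplicates are ignored."""
    new = Node(key)
    if root is None:
        return new
    pivot = pivot_parent = parent = None
    node = root
    while node is not None:   # search, remembering the deepest imbalanced node
        if key == node.key:
            return root
        if node.balance != 0:
            pivot, pivot_parent = node, parent
        parent = node
        node = node.left if key < node.key else node.right
    if key < parent.key:
        parent.left = new
    else:
        parent.right = new
    side = 0 if pivot is None else (-1 if key < pivot.key else +1)
    case3 = pivot is not None and pivot.balance == side
    # Cases 1 and 2 adjust from the root or pivot; Case 3 starts below the pivot.
    if pivot is None:
        node = root
    elif case3:
        node = pivot.left if side == -1 else pivot.right
    else:
        node = pivot
    while node is not new:
        if key < node.key:
            node.balance -= 1
            node = node.left
        else:
            node.balance += 1
            node = node.right
    if not case3:
        return root
    if side == -1:
        child = pivot.left
        if child.balance == -1:               # single rotation
            top = rotate_right(pivot)
            pivot.balance = child.balance = 0
        else:                                 # double rotation
            grand = child.right
            pivot.left = rotate_left(child)
            top = rotate_right(pivot)
            pivot.balance = +1 if grand.balance == -1 else 0
            child.balance = -1 if grand.balance == +1 else 0
            grand.balance = 0
    else:                                     # mirror image
        child = pivot.right
        if child.balance == +1:
            top = rotate_left(pivot)
            pivot.balance = child.balance = 0
        else:
            grand = child.left
            pivot.right = rotate_right(child)
            top = rotate_left(pivot)
            pivot.balance = -1 if grand.balance == +1 else 0
            child.balance = +1 if grand.balance == -1 else 0
            grand.balance = 0
    if pivot_parent is None:                  # the pivot was the root
        return top
    if pivot_parent.left is pivot:
        pivot_parent.left = top
    else:
        pivot_parent.right = top
    return root

def check(node):
    """Return the height (in edges), asserting every stored balance is right."""
    if node is None:
        return -1
    lh, rh = check(node.left), check(node.right)
    assert node.balance == rh - lh and abs(rh - lh) <= 1
    return 1 + max(lh, rh)

def inorder(node):
    return [] if node is None else inorder(node.left) + [node.key] + inorder(node.right)

root = None
for k in [1, 2, 3, 4, 5, 6, 7]:   # sorted input would degrade a plain BST
    root = insert(root, k)
```

The check function is there only to
validate the sketch; it recomputes
heights and is not part of insertion.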
The general methods for doing
rotations can be described using
example AVL trees of height 5 or
more. The actions required to rotate
in height 3 or 4 AVL trees are
somewhat special, but easy to figure
out.
Deletion of a node from an AVL tree
can require more than one rotation.
In fact, up to log(N) rotations may be
required. This case occurs when
deleting the leftmost node of a
Fibonacci tree. In this case, a
rotation has to be done through each
node on the path up from the deleted
node to the root.
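
For reference, a Fibonacci tree is the
sparsest AVL tree of a given height.
The sketch below builds one; it does
not implement deletion. The names are
hypothetical, height is counted in
edges, and placing the shorter subtree
on the left is an assumption about the
orientation meant above.

```python
class Node:
    def __init__(self, left=None, right=None):
        self.left = left
        self.right = right

def fibonacci_tree(h):
    """Sparsest AVL tree of height h (in edges); h = -1 gives the empty tree."""
    if h < 0:
        return None
    if h == 0:
        return Node()
    # Assumption: the shorter subtree goes on the left.
    return Node(fibonacci_tree(h - 2), fibonacci_tree(h - 1))

def height(node):
    return -1 if node is None else 1 + max(height(node.left), height(node.right))

def size(node):
    return 0 if node is None else 1 + size(node.left) + size(node.right)

def every_internal_node_leans_right(node):
    """True if each node that has a child has balance exactly +1."""
    if node is None or (node.left is None and node.right is None):
        return True
    return (height(node.right) - height(node.left) == 1
            and every_internal_node_leans_right(node.left)
            and every_internal_node_leans_right(node.right))

for h in range(6):
    print(h, size(fibonacci_tree(h)))
```

In such a tree no node with a child is
balanced, so there is no slack
anywhere on the path to the root,
which is plausibly why a single
deletion can force rebalancing at
every level.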
However, in large AVL trees studied
empirically, only about 0.214
rotations per deletion and 0.465 per
insertion were required on average.
Implementation of AVL trees is
complex. For moderate data sizes with
random insertion and deletion orders,
a plain (unbalanced) binary search
tree can be preferable because of its
lower overhead.
For disk-based applications, B-trees
are preferred.