nearest pair algorithm

(rev. 03/11/2012)

Algorithm for Finding Nearest Pair of Points in a Set P of Points in the Plane

#1 Pre-processing: If necessary, rotate the set so that no two points have the same x-coordinate or the same y-coordinate. (I'm not sure of the complexity of finding such a rotation. We'll just restrict ourselves to seeking a solution to the simplified problem in which no two points share an x-coordinate or a y-coordinate.)
#2 Pre-processing: Also produce lists P_x (the points of P, sorted by increasing x-coordinate) and P_y (the points of P, sorted by increasing y-coordinate). "Attach to each entry in each list a record of the position of that point in both lists." So P_x and P_y are lists of entries like this: (x_i, y_i, rank of x_i in P_x, rank of y_i in P_y). So, for example, an element like this (3, 8, 24, 81) would be the point (3,8) in the plane and the 24 and 81 mean that this point is the 24th item in P_x and the 81st element in P_y. This can all be done in O(N*log(N)) time where |P|=N. As an example of a possible structure, P_x could be implemented as an array of N pointers, the ith item pointing to a record representing the point with x-rank equal to i. P_y could be implemented similarly, pointing into the same collection of records as P_x.
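As an illustration of this pre-processing step, here is a minimal Python sketch. The function name build_sorted_lists and the record layout [x, y, x_rank, y_rank] (a mutable list, with 1-based ranks) are assumptions made for this example, not something the notes prescribe. Python lists hold references, so P_x and P_y share one set of records, much like the two arrays of pointers described above.

```python
def build_sorted_lists(points):
    """Return (P_x, P_y) for an iterable of (x, y) pairs.

    Each record is a list [x, y, x_rank, y_rank] with 1-based ranks.
    P_x and P_y hold references to the same record objects.
    """
    records = [[x, y, 0, 0] for (x, y) in points]
    P_x = sorted(records, key=lambda rec: rec[0])  # O(N log N)
    P_y = sorted(records, key=lambda rec: rec[1])  # O(N log N)
    for i, rec in enumerate(P_x, start=1):
        rec[2] = i  # position of this point in P_x
    for i, rec in enumerate(P_y, start=1):
        rec[3] = i  # position of this point in P_y
    return P_x, P_y


P_x, P_y = build_sorted_lists([(3, 8), (1, 5), (7, 2)])
```

The two sorts dominate the cost, so the whole step is O(N*log(N)); the two rank-filling passes are O(N) each.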
Let Q be the first ceil(N/2) elements of P_x - the "left half". Let R be the last floor(N/2) elements of P_x - the "right half".
In O(N) time create Q_x, a version of the "left half" Q sorted by increasing x-value. This can be done by walking through P_x and selecting the first ceil(N/2) elements to build a new list. We might use a new array of ceil(N/2) pointers for Q_x and point the pointers at duplicate records of the ones used for P_x. These records have the form (s,t,u,v), where s is the x-coord, t is the y-coord, u is the x-rank, and v is the y-rank. The ranks u and v are relative to P_x and P_y, but u is also correct relative to Q_x.
In O(N) time create Q_y - the elements of Q in order of ascending y-value. This can be done by traversing P_y from lowest y to highest y, and selecting the points whose x-rank is <= ceil(N/2). This information could be used to create another array of pointers pointing into the set of records used with Q_x. The idea would be, for the ith element found in P_y with x-rank r_x <= ceil(N/2), to point the ith element of the new Q_y array at the element J=(s,t,u,v), to which the r_xth element of Q_x points, and change the value of v to i, to indicate the y-rank of J within Q_y.
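In a similar manner, create R_x and R_y in O(N) time - lists of the points on the "right side" by increasing x-value and y-value. (For a point of R, the x-rank u relative to P_x must be reduced by ceil(N/2) to be correct relative to R_x.)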
To summarize, we can construct, in O(N) time, Q_x, Q_y, R_x, and R_y, so that we now have two problems of half the size, problems that are exactly alike in structure to the original problem represented by P, P_x, and P_y. Note that this did NOT require sorting the half-sets. We need do only O(N) work to obtain sorted half-sets alike in structure to P_x and P_y.
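The linear-time split described above might be sketched as follows. This is only one possible reading: the function name split_halves is hypothetical, records are assumed to be lists [x, y, x_rank, y_rank] with 1-based ranks, and the handling of the right half (subtracting ceil(N/2) from the x-rank) is my interpretation of "in a similar manner".

```python
def split_halves(P_x, P_y):
    """Split into (Q_x, Q_y, R_x, R_y) using O(N) work and no sorting.

    Records are lists [x, y, x_rank, y_rank] with 1-based ranks.
    The half-lists use duplicate records, so P_x and P_y are unchanged.
    """
    n = len(P_x)
    half = (n + 1) // 2  # ceil(N/2)

    # Duplicate the records; the first `half` entries of P_x form Q_x.
    Q_x = [list(rec) for rec in P_x[:half]]
    R_x = [list(rec) for rec in P_x[half:]]
    for rec in R_x:
        rec[2] -= half  # make the x-rank relative to R_x

    # One pass over P_y (ascending y) distributes points to Q_y or R_y.
    Q_y, R_y = [], []
    for rec in P_y:
        r_x = rec[2]  # x-rank relative to P_x
        if r_x <= half:
            copy = Q_x[r_x - 1]
            Q_y.append(copy)
            copy[3] = len(Q_y)  # y-rank relative to Q_y
        else:
            copy = R_x[r_x - half - 1]
            R_y.append(copy)
            copy[3] = len(R_y)  # y-rank relative to R_y
    return Q_x, Q_y, R_x, R_y


# Five points, already in the (x, y, x_rank, y_rank) form.
P_x = [[1, 5, 1, 3], [3, 8, 2, 5], [4, 1, 3, 1], [7, 2, 4, 2], [9, 6, 5, 4]]
P_y = [P_x[2], P_x[3], P_x[0], P_x[4], P_x[1]]
Q_x, Q_y, R_x, R_y = split_halves(P_x, P_y)
```

Because Q_y and R_y are filled while walking P_y in ascending y order, they come out sorted without any comparison sort.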
Next, recursively determine the closest pairs (q₀^*, q₁^*) of points in Q and (r₀^*, r₁^*) in R. (Assume that the base case is where the set of points contains 3 or fewer elements, and the solution is computed in that case just by examining all possible pairs of points.)
Let δ = min( d(q₀^*, q₁^*), d(r₀^*, r₁^*) ).
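For the base case, examining all pairs can be sketched in a few lines of Python. The function name brute_force_closest is hypothetical, and points are assumed to be sequences whose first two entries are the x- and y-coordinates (so records carrying extra rank fields work as well).

```python
from itertools import combinations
from math import hypot


def distance(p, q):
    """Euclidean distance d(p, q), using only the first two fields."""
    return hypot(p[0] - q[0], p[1] - q[1])


def brute_force_closest(points):
    """Return the closest pair among 2 or 3 points by checking all pairs."""
    return min(combinations(points, 2), key=lambda pair: distance(*pair))


pair = brute_force_closest([(0, 0), (5, 5), (1, 1)])
delta = distance(*pair)
```

With at most 3 points there are at most 3 pairs to examine, so the base case costs constant time.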
Let L be the vertical line that goes through x^* - the x-coordinate of the rightmost point in Q.
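Any pair of points of P whose distance apart is less than δ must consist of one point of Q and one point of R, because two points that are both in Q, or both in R, are at least δ apart. Both points of such a pair lie within δ of L, that is, inside the "strip" of width 2δ with center-line L.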
By scanning through P_y in O(N) time, select the set of elements of P that lie inside the strip, ordered by increasing y-value. (We simply select the points whose x-coordinates differ from x^* by δ or less.) Call that ordered list S_y.
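For each point s ∈ S_y, compute the distance from s to each of the next 15 points in S_y (or to all of the remaining points, if fewer than 15 remain). Let (σ, σ') be the pair achieving the minimum of these distances, after completing the entire traversal of S_y. Since each point is compared with at most 15 others, the traversal requires O(N) work.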
Return the lesser of δ and d(σ, σ'), along with the pair of points that achieves it.
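The strip scan and the final comparison with δ might look like the Python sketch below. The function name closest_in_strip and its return convention (a distance plus a pair, with None for the pair when nothing in the strip beats δ) are assumptions for this example; points are taken to be sequences whose first two entries are x and y, and P_y is assumed to be sorted by increasing y.

```python
from math import hypot


def closest_in_strip(P_y, x_star, delta, window=15):
    """Scan the strip around the line x = x_star.

    Returns (best, pair): best is the lesser of delta and the smallest
    distance found in the strip; pair is None if nothing beats delta.
    """
    # Points within delta of the line, still in increasing y order.
    S_y = [p for p in P_y if abs(p[0] - x_star) <= delta]

    best, best_pair = delta, None
    for i, s in enumerate(S_y):
        # Compare s with at most `window` points that follow it in S_y.
        for t in S_y[i + 1 : i + 1 + window]:
            d = hypot(s[0] - t[0], s[1] - t[1])
            if d < best:
                best, best_pair = d, (s, t)
    return best, best_pair


P_y = [(0, 0), (10, 1), (4, 2), (5, 3), (20, 4)]
best, pair = closest_in_strip(P_y, x_star=4, delta=3)
```

Starting best at delta folds the last step into the scan: when the pair comes back as None, the closest pair overall is whichever of the two recursive results achieved δ.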