Closest-pair problem
Introduction
The closest-pair problem in computational geometry is the problem of finding the pair of two closest points given a set of points in Euclidean space. This algorithm uses the divide-and-conquer approach for the Euclidean space. The algorithm is used in geographical information systems to help find the proximity of two points in a coordinate plane, image processing in detecting the closely related pixel regions, and proximity-based clustering, etc.
Implementation
(* Closest-pair problem *)
type point = { x : float; y : float }
let cmpPointX (a : point) (b : point) = compare a.x b.x
let cmpPointY (a : point) (b : point) = compare a.y b.y
let distSqrd (seg : (point * point) option) =
match seg with
| None -> max_float
| Some(line) ->
let a = fst line in
let b = snd line in
let dx = a.x -. b.x in
let dy = a.y -. b.y in
dx*.dx +. dy*.dy
let dist seg =
sqrt (distSqrd seg)
let shortest l1 l2 =
if distSqrd l1 < distSqrd l2 then
l1
else
l2
let halve l =
let n = List.length l in
BatList.split_at (n/2) l
let rec closestBoundY from maxY (ptsByY : point list) =
match ptsByY with
| [] -> None
| hd :: tl ->
if hd.y > maxY then
None
else
let toHd = Some(from, hd) in
let bestToRest = closestBoundY from maxY tl in
shortest toHd bestToRest
let rec closestInRange ptsByY maxDy =
match ptsByY with
| [] -> None
| hd :: tl ->
let fromHd = closestBoundY hd (hd.y +. maxDy) tl in
let fromRest = closestInRange tl maxDy in
shortest fromHd fromRest
let rec closestPairByX (ptsByX : point list) =
if List.length ptsByX < 2 then
None
else
let (left, right) = halve ptsByX in
let leftResult = closestPairByX left in
let rightResult = closestPairByX right in
let bestInHalf = shortest leftResult rightResult in
let bestLength = dist bestInHalf in
let divideX = (List.hd right).x in
let inBand = List.filter(fun(p) -> abs_float(p.x -. divideX) < bestLength) ptsByX in
let byY = List.sort cmpPointY inBand in
let bestCross = closestInRange byY bestLength in
shortest bestInHalf bestCross
let closestPair pts =
let ptsByX = List.sort cmpPointX pts in
closestPairByX ptsByX
let parsePoint str =
let sep = Str.regexp_string "," in
let tokens = Str.split sep str in
let xStr = List.nth tokens 0 in
let yStr = List.nth tokens 1 in
let xVal = (float_of_string xStr) in
let yVal = (float_of_string yStr) in
{ x = xVal; y = yVal }
let loadPoints filename =
let ic = open_in filename in
let result = ref [] in
try
while true do
let s = input_line ic in
if s <> "" then
let p = parsePoint s in
result := p :: !result;
done;
!result
with End_of_file ->
close_in ic;
!result
;;
let loaded = (loadPoints "Points.txt") in
let start = Sys.time() in
let c = closestPair loaded in
let taken = Sys.time() -. start in
Printf.printf "Took %f [s]
" taken;
match c with
| None -> Printf.printf "No closest pair
"
| Some(seg) ->
let a = fst seg in
let b = snd seg in
Printf.printf "(%f, %f) (%f, %f) Dist %f
" a.x a.y b.x b.y (dist c)
The algorithm consists of dividing the set of n points into two subsets of n/2 points, finding the closest pair in each subset recursively, and then combining these pairs to make the final result. The combining step where two closest pairs cross the midpoint takes O(n) time because it sorts the set of points in Y dimension using the two closest pairs’ distance to select the boundary points.
The algorithm has the following steps:
- Sort the points according to their x coordinates.
- For each half of the set, recursively find the closest pair.
- Let d be the minimum distance obtained between a point in the left subset and a point in the right subset.
- Combine the two closest pairs obtained from the left and right subsets (from the recursive calls).
- Check the strip with width d centred at the middle x-coordinate of the closest pair of the boundary lines, for pairs of points closer than d apart. If there is a pair within d, return the closest pair.
Step-by-step Explanation
- The algorithm requires the points to be given as an array, sorted based on their x-coordinate.
- For the given points, the algorithm recursively divides the points into two halves based on their x-coordinate.
- The recursion continues until the number of points becomes too small for partitioning.
- Then, the closest points in the adjacency are considered and their distance is computed.
- Points with a distance below the threshold will be excluded from the final result.
- The points are sorted based on their y-coordinates, allowing us to quickly find the closest pair in the vertical strip of points near the dividing line.
- Check whether the minimum distance obtained in each of the subproblems less than the distance across the center strip. Combine the answers obtained from each subproblem.
Complexity Analysis
Let’s assume that the given set of points has n points.
The algorithm sorts the points according to their x-coordinates, which takes O(n lg n) time. The algorithm uses a divide-and-conquer approach in which half of the points are assigned to the left and right independently. Therefore, we can describe the running time of the algorithm as follows:
- T(n) = 2 T(n/2) + O(n)
We have to find the closest pair in the entire set by merging the closest pair in both the right and left parts. It takes O(n) time to combine the two solutions.
The time complexity of the algorithm can be expressed as:
- T(n) = 2T(n/2) + n + O(n)
- T(n) = 2T(n/2) + O(n)
Using the Master theorem, we can conclude that the above recurrence relation’s time complexity is O(nlogn).
Therefore, the total time complexity of the algorithm will become O(n log n). The space complexity of the algorithm is also O(n), as we are sorting and combining the points by the highest and lowest variables.
The algorithm is effective in eliminating all but closest-pair candidates within the strip, which results in a relatively small number of candidates. It is robust and works well even for large data sets.