Introduction

In computer science, a heap is a tree data structure similar to that of a binary search tree but with two differences. First, the shape of the heap tree can be arbitrary, unlike the binary search tree, which always satisfies left < root < right. Second, each node in the heap has a higher priority than its child nodes, unlike the binary search tree, which has the priority only on the left child. The algorithm in this markdown file implements the Min-Heap, which is a heap where the root node always contains the minimum value.

The Min-Heap algorithm is widely used for implementing priority queues, sorting algorithms, graph algorithms, and in compression algorithms. The algorithm is also used to lazily generate combinations as a sorting mechanism.

Implementation


(* Min-Heap *)

type 'a t =
  { mutable used : int
  ; mutable data : 'a array
  ; cmp : 'a -> 'a -> int
  }

let left_son pos = (pos lsl 1) + 1

let right_son pos = (pos lsl 1) + 2

let parent pos = (pos - 1) lsr 1

(* can't set an initial size without knowing the type *)
let make cmp = { used = 0; data = [||]; cmp }

let make_int ?(initial_size = 0) cmp = { used = 0; data = Array.make initial_size 0; cmp }

let make_float ?(initial_size = 0) cmp =
  { used = 0; data = Array.make initial_size 0.; cmp }


let size_arr heap = Array.length heap.data

let extend_size heap = heap.data <- Array.(append heap.data (copy heap.data))

let size heap = heap.used

let push ({ used; cmp; _ } as heap) x =
  if size_arr heap = 0
  then (
    heap.data <- [| x |];
    heap.used <- 1)
  else (
    if size_arr heap <= used then extend_size heap;
    (* insert x at the end and swap it with its parent if x is smaller *)
    let pos = ref used in
    let par = ref (parent !pos) in
    while !pos > 0 && cmp x heap.data.(!par) < 0 do
      heap.data.(!pos) <- heap.data.(!par);
      pos := !par;
      par := parent !pos
    done;
    heap.used <- used + 1;
    heap.data.(!pos) <- x)


let pop ({ data; cmp; _ } as heap) =
  match heap.used with
  | 0 -> invalid_arg "owl_utils_heap: can't pop an element from an empty heap"
  | _ ->
    let x = data.(0) in
    (* replace the first element with the last one and swap it with its
     * smallest son *)
    heap.used <- heap.used - 1;
    data.(0) <- data.(heap.used);
    let pos = ref 0 in
    let again = ref true in
    while !again do
      let small = ref !pos in
      let left = left_son !pos
      and right = right_son !pos in
      if left < heap.used && cmp data.(left) data.(!small) < 0 then small := left;
      if right < heap.used && cmp data.(right) data.(!small) < 0 then small := right;
      if !pos = !small
      then again := false
      else (
        let tmp = data.(!pos) in
        data.(!pos) <- data.(!small);
        data.(!small) <- tmp;
        pos := !small)
    done;
    x


let peek heap =
  match heap.used with
  | 0 -> invalid_arg "owl_utils_heap: can't peek at an empty heap"
  | _ -> heap.data.(0)


let is_empty heap = heap.used = 0

let to_array heap = Array.sub heap.data 0 heap.used

The Min-Heap algorithm uses an array-based data structure to store the binary tree. When an element is inserted into the heap, it is placed at the end of the array, and moved up to its correct position by swapping it with its parent until it is in the right location. Similarly, when the root node is removed, the last element in the heap replaces it. The new root node is swapped down with its smaller child nodes until it is in the correct location. This operation is called “heapifying.”

Step-by-Step Explanation

  1. The algorithm begins with defining the data structure for a Min-Heap, which is an array of elements x, where each parent element x is smaller than its child elements in the binary tree.
  2. The function “push” is called to add a new element x to the heap. If the binary tree is empty, the first element is added as the root node. Otherwise, the function extends the size of the array if its current size is not sufficient to store the new node.
  3. The new element x is then added to the end of the binary tree. If the added element is smaller than its parent, it is swapped with its parent, and this process continues up the binary tree until the node is placed in its correct position.
  4. The function “pop” selects the minimum element, which is the root node of the heap, removes it and returns it. The function replaces the root node with the last element and starts to heapify, i.e., compare the swapped node with its child nodes. If the swapped node is smaller than the child nodes, the node is in the right place. Otherwise, swap the node with the smaller child node and repeat until the node is in the correct location.
  5. The function “peek” is called to view the minimum element without removing it from the heap. The function checks that the heap is not empty before returning the element.
  6. The “to_array” function returns all the elements in the heap as an array.
  7. The function “is_empty” checks if the binary tree is empty, i.e., no root nodes exist.

Complexity Analysis

The time complexity of the Min-Heap algorithm largely depends on the number of heapify operations that the “push” and “pop” functions perform. Let us assume that the array has n elements.

  1. The time complexity of adding a new element using the “push” function is O(log n) since the frequency of comparisons required to find the correct node position in the binary tree decreases by half with each level.
  2. The time complexity of removing the root node using the “pop” function is also O(log n) because it requires adjusting the position of the last child node and then heapifying by comparing the node with its child nodes.
  3. The time complexity of peeking at the minimum element using the “peek” function is constant time, i.e., O(1), because it does not require heapifying or swapping any elements.
  4. The time complexity of converting the binary tree to an array using the “to_array” function is O(n) since it requires iterating through all heap elements.
  5. The time complexity of checking if the heap is empty using the “is_empty” function is a constant time O(1).

In summary, the Min-Heap algorithm has a time complexity of O(log(n)) for adding and removing elements, O(n) for converting the binary tree to an array, and an O(1) for checking if the binary heap is empty.