Shannon Entropy
Shannon Entropy
Introduction
Shannon entropy is a concept in information theory that measures the uncertainty or randomness associated with a set of data. It was introduced by Claude Shannon, a mathematician and electrical engineer, in his 1948 paper “A Mathematical Theory of Communication”.
Shannon entropy is widely used in cryptography, data compression, and other fields where information needs to be encoded and transmitted efficiently and securely.
Implementation
(* Shannon Entropy *)
module CharMap = Map.Make(Char)
let entropy s =
let count map c =
CharMap.update c (function Some n -> Some (n +. 1.) | None -> Some 1.) map
and calc _ n sum =
sum +. n *. Float.log2 n
in
let sum = CharMap.fold calc (String.fold_left count CharMap.empty s) 0.
and len = float (String.length s) in
Float.log2 len -. sum /. len
let () =
entropy "1223334444" |> string_of_float |> print_endline
The implementation of the Shannon entropy algorithm is written in OCaml. It takes a string as input and returns the entropy of the string as a floating-point number.
Step-by-step Explanation
- Define a module CharMap as a map from characters to floating-point numbers using the OCaml built-in Map library.
- Define the
entropyfunction which takes a stringsas an argument. - Define the
countfunction which takes a CharMapmapand a charactercas arguments. It counts the number of occurrences of charactercin the string and updates the map accordingly. Ifcis not in the map, it is added with a count of 1. - Define the
calcfunction which takes a character and its frequency in the string as arguments. It returns the product of frequency and the base-2 logarithm of frequency. - In the
entropyfunction, foldcountover the input stringsstarting with an empty CharMap to get a mapping of each character to its frequency. - Fold
calcover the CharMap obtained in step 5 to calculate the Shannon entropy for the input string. - Calculate the length of the input string as a float using
String.lengthand perform the final calculation to get the entropy as a floating-point number. - The entropy value is returned from the
entropyfunction.
Complexity Analysis
The Shannon entropy algorithm calculates the frequency of each character in a given string and then applies the Shannon entropy formula using that frequency distribution.
Let n be the length of the input string. Then:
- The
String.fold_leftcall in step 5 takesO(n)time to traverse the input string and update the character frequency map. - The
CharMap.foldcall in step 6 takesO(k log k)time wherekis the number of distinct characters in the input string. In the worst case,kcan be as large as the total number of Unicode characters which is around 143,000. - The final calculation in step 7 takes constant time.
Therefore, the overall time complexity of the algorithm is O(n + k log k). In practice, the k log k term dominates since it is much larger than n for most input strings.