Posts /

ITA Baranov’s Walkthrough. Chapter 9

Twitter Facebook Google+
07 Jan 2017

Medians and Order Statistics

image-center

The ith order statistic of a set of n elements is the ith smallest element. For example, the minimum of a set of elements is the first order statistic (i = 1), and the maximum is the nth order statistic (i = n). A median, informally, is the “halfway point” of the set. When n is odd, the median is unique, occurring at i = (n+1)/2. When n is even, there are two medians, occurring at i = n/2 and i = n/2+1. Next, the chapter addresses the problem of selecting the ith order statistic from a set of n distinct numbers. We formally specify the selection problem as follows:

Input: A set A of n (distinct) numbers and an integer i, with 1 ≤ i ≤ n

Output: The element x is in set A that is larger than exactly i - 1 other elements of A.

We can no doubt solve the problem using any of comparison sorts presented earlier in O(nlgn) time. However, there is a much faster solution.

Selection in linear time

To solve this problem, the book presents a divide-and-conquer algorithm. The algorithm RANDOMIZED-SELECT is modeled after the quicksort algorithm of Chapter 7. As in quicksort, we partition the input array recursively. But unlike quicksort, which recursively processes both sides of the partition, RANDOMIZED-SELECT works on only one side of the partition. This difference makes the latter to have the expected running time of Θ(n), assuming that the elements are distinct.

RANDOMIZED-SELECT(A,p,r,i)
if p == r     // base case
	return A[p]
q = RANDOMIZED-PARTITION(A,p,r) // partition
k = q - p + 1 // left side + 1 (pivot)
if i == k  //the pivot value is the answer
	return A[q] 
elseif i < k  // the answer is in the front 
	return RANDOMIZED-SELECT(A,p,q-1,i)
else          // the answer is in the back half 
	return RANDOMIZED-SELECT(A,q+1,r,i-k)

By doing some math, the book concludes that we can find any order statistic, and in particular the median, in expected linear time, assuming that the elements are distinct. There is also an algorithm for selection problem that has a worst-case linear time but I won’t cover it here since it’s not as practical as this one.


Twitter Facebook Google+