Chapitre 3: Greedy algorithms

Interval scheduling problem
18:54

Problem (Interval scheduling problem)

  • Job starts at and finishes at
  • Two jobs are compatible if they don't overlap
  • Goal: find maximum subset of mutually compatible jobs

Question

What kind of heuristics would you consider to select your maximum subset?

What is wrong with earliest start time, shortest interval or fewest conflicts?

Dynamic programming solution
18:54

Consider the set of activities that start after activity finishes and before starts, and write the size of an optimal solution on .

If is part of the optimal solution, then

We brute-force the choice of to obtain:

Exercise

Implement a top-down memoized version of the problem. What is its running time?

DP solution
18:54

Loading

Greedy template
18:54

Greedy template

  • Consider jobs in some order
  • Select first job
  • Remove incompatible jobs
  • Repeat steps 2 and 3 while still possible

Question

How would you order the activities?

Optimality of greedy algorithm
18:54

Proposition

The greedy algorithm that considers jobs in increasing order of finish time is optimal.

Phrased in terms of the DP, we could say

Proposition

The optimal solution of could always be assumed to contain the activity in with earliest finish time.

Greedy implementation
18:54

Exercise

Implement a greedy version of the interval scheduling problem

Loading

Exercises
18:54

Exercise (Interval partitioning)

  • Lectures have start and finish times
  • Goal: find smallest number of classrooms to schedule all lectures.

Implement a solution, prove it's optimal and find its running time.

Programming exercises
18:54

Exercise

Given an array of nonnegative integers, you are initially positioned at the first index of the array. Each element in the array represents your maximum jump length at that position. Determine if you are able to reach the last index.

Exercise

What if we want to know the minimum number of jumps required to reach the last index?

Loading

Exercise: Lemonade change
18:54

Exercise

At a lemon stand, each lemonade costs 5 euros. Each customer will buy one lemonade and pay with either a 5, 10, or 20 euro bill. You must determine whether it is possible to provide the correct change to each customer, knowing that you don't have any change at first.

Lossless text compression
18:54

Question

Given a text that uses 32 symbols (26 letters, space, punctuation), how can we encode this text in bits?

Question

How can we use the relative frequencies of the letters to reduce our encoding? How do we know when the next symbol begins?

Prefix code
18:54

A prefix code does not have ambiguities.

Definition (Prefix code)

A prefix code for a set is a function

such that is never a prefix of if are distinct.

Codes: binary tree representation
18:54

Example

Draw the tree associated with

Question

When is a binary tree a representation of a prefix code?

Exercise

What is the meaning of 111010001111101000

Optimality
18:54

Definition (Average bits per letter)

In a binary tree, the formula becomes

Problem

Given an alphabet , find a prefix code that minimizes the ABL.

Towards Huffman encoding
18:54

Key observations

  • Lowest frequency letters should be at the lowest level in the tree of an optimal prefix code.
  • For , the lowest level always contains at least two leaves.
  • The order in which items appear in a level does not matter

Huffman encoding by hand
18:54

Huffman encoding example

Example

Encode the string "pepper" using Huffman encoding

Huffman: pseudo-code
18:54

Huffman(S) {
  if |S|=2 {
    return tree with root and 2 leaves
  } else {
    let y and z be lowest-frequency letters in S
    S’ = S
    remove y and z from S’
    insert new letter ω in S’ with fω=fy+fz
    T’ = Huffman(S’)
    T = add two children y and z to leaf ω from T’
    return T
  }
} 

Question

What's the time complexity?

Huffman: correctness
18:54

Proposition

Let T' be the tree obtained by removing two sibling leaves , and relabelling the parent with frequency .

Proposition

Huffman code for S achives the minimum ABL of any prefix code

Priority queue
18:54

We need a priority queue model so we can associate shorter encodings with the most frequent letters. The following method should be implemented and run "fast".

  • insert(S, x)
  • max(S)
  • extract_max(S)
  • increase_key(S, x, k)

Data structure: Heaps
18:54

Definition (Heap)

A heap is an array visualized as a nearly complete binary trees.

Example

Represent the following array as a heap

0123456789
231234451078291

Question

Given a node , what are the indices of its children? What about the index of the parent?

Max and min-heaps
18:54

Definition (Max/min-heaps)

A max (resp. min) heap is a heap that has the additional property that a parent is always greater (resp. less) than or equal to its children.

A min heap is ideal to find the two least frequent letters!

Min-Heaps in Python's standard library
18:54

Question

How are these methods implemented to keep the heap invariant? Why do we sift down instead of up?

Loading

Huffman encoding: implementation
18:54

Loading

Huffman encoding: implementation
18:54

Exercise

Implement Huffman's encoding. How many bits do you save on Shakespeare's 18th sonnet?

Loading

Huffman code: exercises
18:54

Huffman code: exercises
18:54

Exercices: homework
18:54