ALGO Notes


ENGRECE 235: Design and Analysis of Algorithms
Course Code: 15845
Quarter: Fall 2003

    Professor Pai H. Chou

    Electrical Engineering and Computer Science

    September 26, 2003


    Contents

1 Introduction to Analysis and Design of Algorithms
1.1 Algorithm
1.1.1 What's an algorithm?
1.1.2 Classes of algorithms
1.1.3 Example: sorting problem
1.1.4 Many sorting algorithms available
1.2 Analysis
1.2.1 Execution model assumption
1.2.2 Algorithm runtime
1.3 Algorithm design
1.3.1 Approaches
1.3.2 Example: merge sort
1.4 Problem complexity vs. Algorithm Complexity
1.5 Data Structures
1.6 Python

2 Growth of Functions, Data structures
2.1 Asymptotic Notations
2.1.1 big O = asymptotic upper bound (page 44)
2.1.2 big Ω = asymptotic lower bound (page 45)
2.1.3 big Θ = asymptotic tight bounds
2.1.4 little-o, little-ω: not asymptotically tight
2.2 Polynomials, log, factorial, fibonacci
2.2.1 Polynomial in n of degree d
2.2.2 Logarithm
2.2.3 Factorial
2.2.4 Fibonacci
2.2.5 Misc
2.3 Python: Functions
2.4 Data Structures Review
2.4.1 Array implementation
2.4.2 Stack and Calls/Recursion
2.4.3 Linked List
2.4.4 Binary search tree
2.4.5 Rooted trees w/ unbounded branches

3 Recurrences; QuickSort
3.1 Recurrence: by Substitution method
3.2 Recursion tree, convert to summation
3.3 The Master Method
3.4 QuickSort
3.4.1 Worst case: uneven partitioning
3.4.2 Best case: balanced partitioning
3.4.3 Average case


4 Heap sort, Priority Queues
4.1 Heap data structure: Binary heap
4.2 Heapify
4.3 BuildHeap
4.3.1 Complexity
4.3.2 HeapSort
4.4 Python Programming
4.5 Priority Queues
4.5.1 Heap Insert
4.5.2 Extract max
4.5.3 Heap Increase key

5 Linear Time sorting, Medians/Order Stat
5.1 Comparison-based sorting: prob. complexity
5.2 Non-comparison sorts
5.2.1 Counting sort
5.2.2 RadixSort
5.2.3 Bucket sort
5.3 Median and Order statistics
5.3.1 Find min? max? both min and max?
5.3.2 How to find the i-th smallest w/out sorting?
5.3.3 Can show that worst-case select is O(n)
5.3.4 Justification

6 Binary Search Trees and Red-Black Trees
6.1 Binary Search Trees
6.1.1 API for BST
6.2 Python Programming with Trees
6.2.1 Node data structure
6.2.2 Tuple-encoded Trees
6.3 Red-Black Trees
6.3.1 Five Properties (important!!)
6.3.2 How to maintain the Red-Black Tree properties?
6.3.3 Insertion
6.3.4 Deletion
6.3.5 Tree data structure

7 Dynamic Programming
7.1 Optimization problems
7.1.1 Dynamic programming
7.2 Case study: Assembly line scheduling
7.3 Matrix-Chain multiply
7.3.1 Optimal parenthesization to minimize # scalar mults
7.4 Longest common subsequence
7.5 Optimal Binary Search Trees

8 Greedy Algorithm
8.1 Activity Selection problem
8.2 Greedy choice property
8.3 Huffman codes

9 Basic Graph Algorithm
9.1 Graphs G = (V, E)
9.2 Breadth-first search
9.3 Depth-first search
9.4 Topological sort
9.5 Strongly Connected Components (SCC)
9.6 Python Programming: How to represent a graph!?
9.6.1 Simple: Directed graph, no weights
9.6.2 Weighted graph, use dictionary of dictionaries


9.6.3 Wrap data structure in a class

10 Minimum Spanning Trees
10.1 Min-spanning trees
10.2 Kruskal's algorithm
10.2.1 Runtime analysis
10.3 Prim's algorithm

11 Single Src Shortest Paths
11.1 DAG shortest paths
11.1.1 Notation
11.2 Dijkstra's
11.2.1 Algorithm
11.2.2 Example
11.2.3 Correctness proof sketch
11.2.4 What about negative weights?
11.3 Bellman-Ford
11.3.1 Algorithm
11.3.2 Correctness
11.3.3 Complexity: O(VE)
11.4 Bellman-Ford: special case of linear programming
11.4.1 Bellman-Ford as a Constraint Solver!

12 All Pairs Shortest Paths
12.1 Special case Linear Programming (cont'd)
12.1.1 Correctness sketch
12.2 All Pairs shortest path
12.2.1 Formulation
12.2.2 Repackage this as a matrix structure
12.2.3 Floyd-Warshall: O(V^3)
12.2.4 Algorithm: very simple!
12.3 Transitive closure: E* (Warshall's)
12.4 Sparse graphs: Johnson's algorithm
12.4.1 Proof weight works

13 Flow Networks
13.1 Flow network
13.1.1 Maxflow methods
13.2 FORD-FULKERSON
13.3 EDMONDS-KARP
13.4 Maximal bipartite matching

14 NP-Completeness
14.1 Other Graph problems
14.1.1 Euler Tour
14.1.2 Hamiltonian cycles
14.1.3 Traveling Salesman
14.1.4 Constructive vs. Decision vs. Verification Problems
14.2 Problem complexity
14.2.1 Formal Languages
14.2.2 Decision problem in Formal language
14.2.3 Decision in polynomial time
14.2.4 P, NP, co-NP in terms of formal languages
14.3 Reduction
14.4 NP, NP-complete, NP-hard
14.4.1 How to show L ∈ NP-Complete?
14.4.2 Examples: Circuit Satisfiability
14.4.3 NP completeness proofs
14.5 NPC Proof for (Formula) Satisfiability problem
14.5.1 3-CNF satisfiability (3-SAT)


14.6 NPC proof cont'd: k-clique problem (CLIQUE)

15 Approx. Algorithms
15.1 k-VERTEX-COVER problem
15.1.1 Observation
15.1.2 Approximation algorithm for Vertex Cover
15.2 Hamiltonian (HAM-CYCLE)
15.3 Traveling Salesman Problem (TSP)
15.3.1 Approximated TSP
15.3.2 General TSP
15.4 Subset-sum problem
15.4.1 Exact solution
15.4.2 NP Complete proof
15.4.3 Polynomial-time approximation
15.5 Set Covering problem
15.5.1 Greedy approximation for SET-COVER


    Lecture 1

    Introduction to Analysis and Design of Algorithms

    Administrative

    Course syllabus: http://e3.uci.edu/03f/15845/syllabus.html

    1.1 Algorithm

1.1.1 What's an algorithm?

    input, output problem statement

    sequence of steps

    data structures

    must always halt and output must satisfy property

    1.1.2 Classes of algorithms

    sorting (classified by problem)

tree traversal, graph algorithms (classified by data structures / representation)

    dynamic programming (classified by technique)

incremental, greedy, divide-&-conquer algorithms (classified by decision criteria)

    Why this class?

    fundamental cross cutting applications

analysis aspect: problem complexity, what problems are easy/hard, how to quantify

    many solutions to the same problem

    many applications of a given solution

    1.1.3 Example: sorting problem

input: array A = [a1, a2, ..., an]

output: array A' = [a'1, a'2, ..., a'n] such that a'1 ≤ a'2 ≤ ... ≤ a'n

    1.1.4 Many sorting algorithms available:

    insertion sort

See Chapter 2 Sec. 2.1 for explanation; we will revisit sorting later.

idea: partition A into two: A[1..(j−1)] already nondecreasing, A[j..n] not yet sorted.

    grow this sorted subarray by one more element

insert it into the tail of the sorted one, but need to make it sorted

shift everybody up by one until the nondecreasing property is satisfied again

done when j − 1 = n

(Figure: insertion sort trace on input A = [5, 2, 4, 6, 1, 3]. At each step j = 2..6, A[1..j−1] is the sorted prefix and A[j..n] is not yet sorted; the new A[j] is inserted into the prefix, producing the output [1, 2, 3, 4, 5, 6].)


    1.2 Analysis

    1.2.1 Execution model assumption:

RAM (random access machine), as opposed to stack, queue, or FSM!

sequential instruction execution

alternative assumptions: parallelism, communication, synchronization

associative memory (content-addressable)

data structures: pointers/linked lists instead of copying

    1.2.2 Algorithm runtime

measure it, but want a formula T(n) where n is the problem size

    want to factor out processor speed as a scaling factor

    worst case, best case, average case

    algorithm memory consumption (visit later)

Chapter 3 deals with order of growth, e.g. Θ(n^2)

Insertion sort: O(n^2)

Quick sort: O(n^2) worst case, O(n lg n) avg.

Merge sort, Heap Sort: Θ(n lg n)

    1.3 Algorithm design

    1.3.1 Approaches:

    incremental

    divide-and-conquer (e.g., MergeSort, QuickSort)

    greedy (e.g., Min-Spanning Tree)

    backtracking

    randomization

    genetic

    1.3.2 Example: merge sort

    divide array into two halves

    sort left and sort right

    interleave the two halves

    uses Recursion

Θ(n lg n) complexity, where lg = log base 2

but uses at least twice as much memory in order to keep merge linear

(Figure: MergeSort recursion on input A[1:6] = [5, 2, 4, 6, 1, 3]: MergeSort on A[1:3] and A[4:6] subdivides down to single elements, then Merge(A[1:1], A[2:2]), Merge(A[1:2], A[3:3]), ..., Merge(A[4:5], A[6:6]), and finally Merge(A[1:3], A[4:6]) produce the output [1, 2, 3, 4, 5, 6].)
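A minimal Python sketch of this scheme (my illustration, not the book's code): the Merge step copies the two sorted halves into temporaries, which is exactly the extra memory noted above.

def MergeSort(A, p, r):
    # sort A[p..r] in place (0-based, inclusive bounds)
    if p < r:
        q = (p + r) // 2
        MergeSort(A, p, q)
        MergeSort(A, q + 1, r)
        Merge(A, p, q, r)

def Merge(A, p, q, r):
    L = A[p:q+1]      # copy of the sorted left half
    R = A[q+1:r+1]    # copy of the sorted right half
    i = j = 0
    for k in range(p, r + 1):
        # take from L when R is exhausted or L's head is no larger
        if j >= len(R) or (i < len(L) and L[i] <= R[j]):
            A[k] = L[i]; i += 1
        else:
            A[k] = R[j]; j += 1

>>> x = [5, 2, 4, 6, 1, 3]
>>> MergeSort(x, 0, len(x) - 1)
>>> x
[1, 2, 3, 4, 5, 6]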

1.4 Problem complexity vs. Algorithm Complexity

Measures how hard the problem is

Bounds the complexity of all algorithms for this problem

Example: Can't do better than O(n lg n) for comparison-based sorting

But can do better (linear time) using non-comparison-based algo.

The problem itself may be hard! No efficient algorithm exists for those problems.

NP-Complete problems: guess an answer, verify (yes/no) in polynomial time; otherwise exponential or worse.

Approximation algorithms: not minimum/optimal cost, but comes within a factor.

    1.5 Data Structures

    fixed: arrays, matrices

    growable: stacks, queues, linked list, tree, heap

    key addressable: hash table

    structures: graphs, flow networks

    1.6 Python

    Scripting language, free!

    Already installed on unix; download for any platform

    Easy, powerful, interactive

    Best way is to try it out; helps you debug algorithm

    Tutorial at http://www.python.org/doc/current/tut/tut.html


    Important features

    No Semicolons, no C-style { }

    Indentation is significant

    Loosely typed

    # starts a comment, unless inside string

    Interactive mode

% python
>>> 2 + 3
5
>>> x = (2 + 3) * 4
>>> x
20
>>> print "hello world"
hello world

    Lists and Arrays

>>> x = [1, 2, 3]
>>> x
[1, 2, 3]
>>> x[0]
1
>>> x.append(20)
>>> x
[1, 2, 3, 20]
>>> len(x) # get the number of elements
4
>>> x.pop() # pop from tail like a stack
20
>>> x
[1, 2, 3]
>>> x.pop(0) # pop from head like a queue
1
>>> x
[2, 3]

    You can have other data types in lists and mix them:

>>> dayofweek = ['Sun', 'Mon', 'Tue', 'Wed']
>>> dayofweek[3]
'Wed'

    Hash Table (Dictionary)

    Q: What if we want to look up the number from the name?

Hash tables let you use the name string as an index. However, one big difference is that lists are ordered, whereas hash tables are not.

>>> d = {} # make an empty hash table
>>> d['Sun'] = 0
>>> d['Mon'] = 1
>>> d['Tue'] = 2
>>> d['Wed'] = 3
>>> d
{'Wed': 3, 'Tue': 2, 'Mon': 1, 'Sun': 0}
>>> d['Tue']
2
>>> d.has_key('Mon')
1 # this means true
>>> d.has_key('Blah blah')
0 # this means false
>>> d.keys()
['Wed', 'Tue', 'Mon', 'Sun']


    Lecture 2

    Growth of Functions, Data structures

    This lecture:

big O, big Ω, and big Θ, and how they grow

fundamental data structures

2.1 Asymptotic Notations

purpose: express run time as T(n), n = input size

assumption: n ∈ N = {0, 1, 2, ...}, the natural numbers

    2.1.1 big O = asymptotic upper bound: (page 44)

f(n) = O(g(n)): O(g(n)) is the set of functions upper-bounded as 0 ≤ f(n) ≤ c·g(n) for some c > 0 and all n ≥ n0

f(n) grows no faster than g(n); example: 2n^2 = O(n^3), for c = 1, n0 = 2

funny equality, more like a membership: 2n^2 ∈ O(n^3); n^2 and n^3 are actually functions, not values.

sloppy but convenient

2.1.2 big Ω = asymptotic lower bound (page 45)

swap the places of f(n) and c·g(n): 0 ≤ c·g(n) ≤ f(n) for all n ≥ n0

example: √n = Ω(lg n), for c = 1, n0 = 16

2.1.3 big Θ = asymptotic tight bounds

need two constants c1 and c2 to sandwich f(n): 0 ≤ c1·g(n) ≤ f(n) ≤ c2·g(n).

example: (1/2)n^2 − 2n = Θ(n^2) for c1 = 1/4, c2 = 1/2, n0 = 8

Theorem: f(n) = Θ(g(n)) if and only if f(n) = O(g(n)) and f(n) = Ω(g(n)).

(Figure: the three bounds: c·g(n) staying above f(n) for O(g(n)); c·g(n) staying below f(n) for Ω(g(n)); c1·g(n) and c2·g(n) sandwiching f(n) for Θ(g(n)), in each case for n ≥ n0.)

Example: insertion sort

O(n^2) worst case, Ω(n) best case:

T(n) = c1·n                    // for loop
     + c2·(n − 1)              // key ← A[j]
     + Σ_{j=2..n} Insert(j)    // loop to insert j
     + c8·(n − 1)              // A[i + 1] ← key

Best case of Insert(j) is Θ(1), worst is Θ(j).

Issue: Θ(n^2) + Θ(n) is still Θ(n^2)

Example: MERGESORT(A, p, r)

where p, r are the lower and upper bound indexes to array A.

if p < r then
    q := ⌊(p + r)/2⌋
    MergeSort(A, p, q); MergeSort(A, q + 1, r)
    Merge(A, p, q, r)

T(n) = Θ(1) if n = 1; T(n) = 2T(n/2) + Θ(n) if n > 1

T(n) is expressed in terms of T itself! This is called a recurrence. We want a closed-form solution (no T on the right-hand side of the equation).

To solve the recurrence:

rewrite as T(n) = c if n = 1; T(n) = 2T(n/2) + cn if n > 1

T(n) = 2T(n/2) + cn                  (expand)
     = 2(2T(n/4) + c(n/2)) + cn
     = 4T(n/4) + 2cn
     = 8T(n/8) + 3cn = ...
     = 2^i · T(n/2^i) + i·cn

We can divide n by 2 at most lg n times before n = 1. So T(n) = (lg n)·cn + cn = Θ(n lg n). Chapter 4 will show how to solve recurrences systematically.

2.1.4 little-o, little-ω: not asymptotically tight

o(g(n)) = { f(n) : ∀c > 0 ∃n0 > 0 : 0 ≤ f(n) < c·g(n) ∀n ≥ n0 }
ω(g(n)) = { f(n) : ∀c > 0 ∃n0 > 0 : 0 ≤ c·g(n) < f(n) ∀n ≥ n0 }


    Example

2n^2 = O(n^2) is a tight bound ⇒ 2n^2 ≠ o(n^2). 2n = O(n^2) is not tight ⇒ 2n = o(n^2). Does not make sense to have a little-θ.

    2.2 Polynomials, log, factorial, fibonacci

    2.2.1 Polynomial in n of degree d:

p(n) = Σ_{i=0..d} a_i·n^i = Θ(n^d)

Polynomially bounded: f(n) = O(n^k) for constant k.

    2.2.2 logarithm

useful identity: a^(log_b c) = c^(log_b a).

polylogarithmically bounded if f(n) = O(lg^k n) for constant k

grows slower than any polynomial: lg^b n = o(n^a) for any constant a > 0.

log-star: lg* n = min{ i ≥ 0 : lg^(i) n ≤ 1 } (log of log of log ... of log of n); grows very, very slowly. Example: lg*(2^65536) = 5.

    2.2.3 Factorial:

Loose bound: n! grows faster than 2^n but slower than n^n

precisely, n! = Θ(n^(n+1/2) e^(−n)) (Stirling's approximation)

lg(n!) = Θ(n lg n)

    2.2.4 Fibonacci

    grows exponentially

    2.2.5 misc

2^(2^n) grows faster than n!

(n + 1)! grows faster than n!

n! grows faster than n·2^n, which grows faster than 2^n

2^(lg n) = n

    2.3 Python: Functions

function definition: def functionName(paramList):

variables have local scope by default

control flow: for, while, if

range(a, b) function: returns the list [a, ..., b−1]
>>> range(1,5)
[1, 2, 3, 4]

Algorithm from book (p.24), and Python code (file isort.py):

INSERTION-SORT(A)
1 for j ← 2 to length[A]
2   do key ← A[j]
3      i ← j − 1
4      while i > 0 and A[i] > key
5        do A[i + 1] ← A[i]
6           i ← i − 1
7      A[i + 1] ← key

def InsertionSort(A):
    for j in range(1, len(A)):
        key = A[j]
        i = j - 1
        while i >= 0 and A[i] > key:
            A[i+1] = A[i]
            i = i - 1
        A[i+1] = key

Don't forget the colon : for def, for, and while constructs

Book's A[1...n] ⇒ Python's A[0...n−1]; book's 2...n ⇒ Python's 1...n−1.

line 1: range(1,len(A)) ⇒ [1...len(A)-1]

% python
>>> from isort import * # read file isort.py
>>> x = [2,7,3,8,1] # create test case
>>> InsertionSort(x) # call routine
>>> x # look at result
[1, 2, 3, 7, 8]

    Works for other data types, too!

    2.4 Data Structures Review

    Concepts:

    Stacks: Last-in First-out

    Queue: First-in First-out

    Trees: Root, children

    2.4.1 Array implementation

    Stack S: need a stack pointer (top)

Push(x): S[++top] ← x
Pop(): return S[top--]

    Note: Actually stacks can grow downward also.

    Note: To be safe, check stack underflow/overflow.

    Queue Q: need 2 pointers (head, tail)

Enqueue(x): Q[tail] ← x; tail ← (tail mod len(Q)) + 1

Dequeue(): temp ← Q[head]; head ← (head mod len(Q)) + 1; return temp

    Note: To be safe, check queue underflow/overflow.
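A Python sketch of both implementations (my illustration, with the under/overflow checks included; the queue uses the same circular head/tail arithmetic in 0-based form):

class Stack:
    def __init__(self, size):
        self.S = [None] * size
        self.top = -1                   # empty stack
    def push(self, x):
        if self.top + 1 == len(self.S):
            raise OverflowError("stack overflow")
        self.top += 1
        self.S[self.top] = x            # S[++top] <- x
    def pop(self):
        if self.top < 0:
            raise IndexError("stack underflow")
        x = self.S[self.top]
        self.top -= 1                   # return S[top--]
        return x

class Queue:
    def __init__(self, size):
        self.Q = [None] * size
        self.head = self.tail = self.count = 0
    def enqueue(self, x):
        if self.count == len(self.Q):
            raise OverflowError("queue overflow")
        self.Q[self.tail] = x
        self.tail = (self.tail + 1) % len(self.Q)   # wrap around
        self.count += 1
    def dequeue(self):
        if self.count == 0:
            raise IndexError("queue underflow")
        x = self.Q[self.head]
        self.head = (self.head + 1) % len(self.Q)
        self.count -= 1
        return x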

    2.4.2 Stack and Calls/Recursion

Calling a routine ⇒ push locals, parameters, return address

Return ⇒ pop and continue


Example: MergeSort (cf. MergeSort Fig. from last lecture)

L (Left), R (Right), M (Merge). Successive call-stack snapshots (bottom to top), one per step:

(1) [1,6]
(2) [1,6]L [1,3]
(3) [1,6]L [1,3]L [1,2]
(4) [1,6]L [1,3]L [1,2]L [1,1]
(5) [1,6]L [1,3]L [1,2]R [2,2]
(6) [1,6]L [1,3]L [1,2]M
(7) [1,6]L [1,3]R [3,3]
(8) [1,6]L [1,3]M

... and the walk continues symmetrically with [1,6]R for the right half.

2.4.3 Linked List

Singly linked (next) or Doubly linked (next, prev)

need a key

need a NIL value

How to implement linked lists?

Pre-allocate arrays prev[1:n], next[1:n], key[1:n], where a node x is an index ∈ {0...n} and 0 is NIL.

Records (structures in C, classes in Java): x.prev, x.next, x.key, and NULL for NIL.

Rearrange by adjusting links, not moving data (key)

Recall: INSERTIONSORT with an array ⇒ need to copy objects to insert.

Linked list representation: once we know where to insert, adjust links in O(1) time.

Array version: best case O(1), worst case O(n), average case O(n/2) = O(n).

(Figure: insert(3) into the sorted linked list 1 → 2 → 4 → 5 → 6: (1) allocate a node for the data; (2) adjust links so that 2 points to 3 and 3 points to 4.)

    Search for the item

    e.g., where to insert?

list is unsorted: worst/average case O(n) time to find it! (compare keys one at a time).

list is sorted:

Array: binary search in O(lg n) time

Singly linked list: O(n) avg/worst case, need to do a linear search

Doubly linked list: does not help! Still O(n): can't jump to the middle of the list, because the links are only to the head/tail.

    2.4.4 Binary search tree

pointer to root; each node has pointers to left-child, right-child

optional pointer to parent

left smaller, right bigger key value ⇒ O(lg n) search, if BALANCED

Can still be O(n) if not balanced!!

Red-black tree (Ch. 13) is a way to keep a BST balanced

a Heap (Ch. 6) implements a nearly balanced binary tree using an array; no pointers

(Figure: the same keys stored three ways: a doubly linked list with head and tail pointers; a binary (search) tree with a root pointer; and an n-ary tree using left-child, right-sibling links.)

    2.4.5 rooted trees w/ unbounded branches

    Could have arbitrary number of children!

    left-child: head of linked list to children (down one level)

    right-sibling: link to next on same level

    optional link back to parent
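A minimal Python node along these lines (field and function names are my own, for illustration):

class NaryNode:
    def __init__(self, key):
        self.key = key
        self.left_child = None      # head of the list of children, one level down
        self.right_sibling = None   # next node on the same level
        self.parent = None          # optional link back to the parent

def children(x):
    # iterate over all children of x, however many there are
    c = x.left_child
    while c is not None:
        yield c
        c = c.right_sibling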


    Lecture 3

    Recurrences; QuickSort

    This lecture:

    Chapter 4: Recurrences, Master method

    Substitution method

Recursion tree

Master method: T(n) = aT(n/b) + f(n)

Quicksort (Ch. 7.1-7.2) (if we have time today)

    Summation review (Appendix A)

    3.1 Recurrence: by Substitution method

    guess-and-verify

    rename variables if necessary.

    mathematical induction: assume true for n0, show truefor n

    n

    0.

    issue: boundary conditions, floor/ceiling use inequal-ity to bound those cases.

    avoid pitfall with constant terms.

Example w/ floor

T(n) = Θ(1) if n = 1; T(n) = 2T(⌊n/2⌋) + n if n > 1.

Guess T(n) = O(n lg n).

sufficient to show T(n) ≤ cn lg n for some c > 0.

Assume T(⌊n/2⌋) ≤ c⌊n/2⌋ lg(⌊n/2⌋).

T(n) ≤ 2(c⌊n/2⌋ lg(⌊n/2⌋)) + n     (substitution)
     ≤ cn lg(n/2) + n              (drop floor)
     = cn lg n − cn lg 2 + n       (lg(a/b) = lg a − lg b)
     = cn lg n − cn + n            (lg 2 = 1)
     ≤ cn lg n for c ≥ 1.

Avoid Pitfalls

What if we guessed T(n) = O(n)??

Try to show T(n) ≤ cn:

T(n) ≤ 2(c⌊n/2⌋) + n    (subst. into T(n) ≤ 2T(⌊n/2⌋) + n)
     ≤ cn + n           (drop floor)
     = (c + 1)n         (factor out const)

which is not ≤ cn. Fails!

    must be careful not to drop constants arbitrarily.

subtractive term in guessed solution

Example: T(n) = T(⌊n/2⌋) + T(⌈n/2⌉) + 1

Guess: O(n). First attempt: verify T(n) ≤ cn:

T(n) ≤ c⌊n/2⌋ + c⌈n/2⌉ + 1
     = cn + 1, which is not ≤ cn

Oops! But this does not mean T(n) ≠ O(n).

works if we guess T(n) ≤ cn − b for some b ≥ 0:

T(n) ≤ (c⌊n/2⌋ − b) + (c⌈n/2⌉ − b) + 1
     = cn − 2b + 1
     ≤ cn − b, which works for b ≥ 1.


Variable Renaming

introduce a new variable to replace a difficult expression, making it easier to solve.

Example: T(n) = 2T(√n) + lg n

Rename: let m = lg n (so n = 2^m): T(2^m) = 2T(2^(m/2)) + m

Rename: let S(m) = T(2^m) ⇒ S(m) = 2S(m/2) + m

This looks like S(m) = O(m lg m)

Substitute back, with m = lg n: T(n) = T(2^m) = S(m) = O(m lg m) = O((lg n) lg(lg n)).

    3.2 Recursion tree, convert to summation

Example: T(n) = 3T(⌊n/4⌋) + Θ(n^2).

Visualize as a recursion tree: T(n) = 3T(n/4) + cn^2 for some c > 0.

(Figure: recursion tree: the root costs cn^2, its 3 children cost c(n/4)^2 each, for (3/16)cn^2 at the next level, and so on down to T(1) leaves.)

Convert to summation:

T(n) = cn^2 + (3/16)cn^2 + (3/16)^2 cn^2 + ... + (leaf level)

# Levels = 1 + log_4 n, by a similar argument to MergeSort: the root is 1 level, then divide log_4 n times.

Width at level i is 3^i, so at the bottom level, width is 3^(log_4 n).

Apply the identity a^(log_b c) = c^(log_b a) to rewrite 3^(log_4 n) = n^(log_4 3).

T(n) = Σ_{i=0..(log_4 n)−1} (3/16)^i cn^2 + Θ(n^(log_4 3))
     ≤ (1/(1 − 3/16)) cn^2 + Θ(n^(log_4 3)) = O(n^2)   (geometric series)

3.3 The Master Method

intuition: divide-and-conquer

a-way branch (recursive calls) per level,

each subproblem size = n/b,

combine cost is f(n).

#levels = log_b n

#nodes(at level i) = a^i

#nodes(at leaf level) = a^(log_b n) = n^(log_b a)

Example: MERGESORT: a = 2, b = 2, f(n) = Θ(n); floor/ceiling don't matter


Three cases, comparing f(n) to n^(log_b a):

1. if f(n) = O(n^(log_b a − ε)) for some ε > 0, then T(n) = Θ(n^(log_b a))

2. if f(n) = Θ(n^(log_b a)), then T(n) = Θ(n^(log_b a) lg n)

3. if f(n) = Ω(n^(log_b a + ε)) for some ε > 0, and a·f(n/b) ≤ c·f(n) for some const c < 1 and all n ≥ n0, then T(n) = Θ(f(n)).

Restated:

1. if f(n) is polynomially slower than #leaves = n^(log_b a), then the leaves dominate the cost; it's Θ(a^(log_b n)) = Θ(n^(log_b a))

2. f(n) (conquer) and n^(log_b a) (divide) are equally fast (with lg n levels), then it's Θ(n^(log_b a) lg n).

3. the conquer cost f(n) is dominant.

Important note:

The Master method does not cover all recurrences!

f(n) and n^(log_b a) must differ polynomially (i.e., by a factor n^ε); smaller differences don't apply.

Actually, a more general result for case 2:

if f(n)/n^(log_b a) = Θ(lg^k n) for constant k ≥ 0, then T(n) = Θ(n^(log_b a) lg^(k+1) n)

Examples

T(n) = 9T(n/3) + n

f(n) = n, a = 9, b = 3, n^(log_b a) = n^(log_3 9) = Θ(n^2)

case 1, because f(n) = O(n^(log_3 9 − ε)) = O(n^(2−1)), smaller by a polynomial factor

T(n) = Θ(n^(log_b a)) = Θ(n^2).

T(n) = T(2n/3) + 1

a = 1, b = 3/2, f(n) = 1, n^(log_b a) = n^(log_{3/2} 1) = n^0 = 1

case 2, because f(n) = 1 = Θ(n^(log_b a)).

T(n) = Θ(n^0 lg n) = Θ(lg n)

T(n) = 3T(n/4) + n lg n

a = 3, b = 4, f(n) = n lg n, n^(log_b a) = n^(log_4 3)

case 3, because f(n) = n lg n = Ω(n^(log_4 3 + ε))

T(n) = Θ(f(n)) = Θ(n lg n)

    3.4 QuickSort

    Divide and Conquer:

    pick a pivot to serve as the dividing point

partition array A around the pivot: one side with elements smaller than the pivot, one side larger

    recursively sort two sublists

    no need for merge!

(Figure: input A[p..r] = [2, 8, 7, 1, 3, 5, 6, 4]; pick pivot x = 4 (from A[r]). After PARTITION we want "≤ x | x | > x", in this case [2, 1, 3] 4 [7, 5, 6, 8]; then QUICKSORT recursively on each side.)

QUICKSORT(A, p, r)
1 if p < r
2   then q ← PARTITION(A, p, r)
3        QUICKSORT(A, p, q − 1)
4        QUICKSORT(A, q + 1, r)

Algorithm from book (p.146), and Python code:

PARTITION(A, p, r)
1 x ← A[r]
2 i ← p − 1
3 for j ← p to r − 1
4   do if A[j] ≤ x
5        then i ← i + 1
6             exchange A[i] ↔ A[j]
7 exchange A[i + 1] ↔ A[r]
8 return i + 1

def Partition(A, p, r):
    x = A[r]
    i = p - 1
    for j in range(p, r):
        if A[j] <= x:
            i = i + 1
            A[i], A[j] = A[j], A[i]
    A[i+1], A[r] = A[r], A[i+1]
    return i + 1
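The recursive driver in Python is then a direct transcription (a sketch using the Partition above):

def QuickSort(A, p, r):
    if p < r:
        q = Partition(A, p, r)   # pivot lands in its final sorted position q
        QuickSort(A, p, q - 1)
        QuickSort(A, q + 1, r)

>>> x = [2, 8, 7, 1, 3, 5, 6, 4]
>>> QuickSort(x, 0, len(x) - 1)
>>> x
[1, 2, 3, 4, 5, 6, 7, 8]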


3.4.1 Worst case: uneven partitioning

T(n) = max_{0≤q≤n−1} (T(q) + T(n − q − 1)) + Θ(n)
     ≤ max_{0≤q≤n−1} (cq^2 + c(n − q − 1)^2) + Θ(n)
     = c · max_{0≤q≤n−1} (q^2 + (n − q − 1)^2) + Θ(n)
     ≤ c(n − 1)^2 + Θ(n)
     = c(n^2 − 2n + 1) + Θ(n)
     = cn^2 − c(2n − 1) + Θ(n) ≤ cn^2

    3.4.2 Best case: balanced partitioning

T(n) ≤ 2T(n/2) + Θ(n). Can apply the 2nd case of the Master theorem:

if f(n) = Θ(n^(log_b a)), then T(n) = Θ(n^(log_b a) lg n).

T(n) = Θ(n^(log_2 2) lg n) = Θ(n lg n)

    3.4.3 Average case

    (next time)


    Lecture 4

    Heap sort, Priority Queues

    This lecture:

    Quick sort (continued from last time)

    Heap property, Heap sort

    Python implementation (for homework)

    4.1 Heap data structure: Binary heap

    nearly balanced binary tree

    use array instead of pointer data structure

Definition:

parent(i) = ⌊i/2⌋

leftChild(i) = 2i

rightChild(i) = 2i + 1

heap property, by default Max Heap: A[parent(i)] ≥ A[i]; Min Heap: A[parent(i)] ≤ A[i]

height of heap as a binary tree is Θ(lg n).

(Figure: a (max) heap stored in an array, indices 1; 2, 3; 4, 5, 6, 7 level by level: the root x satisfies x ≥ y, z for its children y, z; y's left and right children are both ≤ y, and z's are both ≤ z, recursively.)

Examples: Are these (max)-heaps?

[7, 5, 6, 2, 3, 4]? (a) 7 ≥ 5, 6; (b) 5 ≥ 2, 3; (c) 6 ≥ 4 ⇒ Yes.

[7, 6, 5, 4, 3, 2]? (a) 7 ≥ 6, 5; (b) 6 ≥ 4, 3; (c) 5 ≥ 2 ⇒ Yes.

[2, 5, 7, 4, 3, 6]? No. However, it is almost a heap if we don't consider the root: (b) 5 ≥ 4, 3; (c) 7 ≥ 6. This can be fixed with a call to HEAPIFY.

    Heap API

HEAPIFY(A, i) (incremental fix): assume the L and R subtrees of i are heaps; make A[i...n] a heap.

BUILDHEAP(A): input an unsorted array, rearrange it to make it a heap

HEAPSORT(A): sorting alg., uses HEAPIFY and BUILDHEAP

    4.2 Heapify:

Assumption: called when L(i), R(i) are heaps, but A[i] itself isn't ≥ its children

after the call: the largest of the children is propagated up to the root position, so that the whole thing is a heap again.

MAX-HEAPIFY(A, i, n) (see page 130, slightly different)
1 l ← LEFT(i)                          (index for left child)
2 r ← RIGHT(i)                         (index for right child)
3 if (l ≤ n) and (A[l] > A[i])
4   then largest ← l                   (left child exists and is bigger)
5   else largest ← i                   (either no L or root is bigger)
6 if (r ≤ n) and (A[r] > A[largest])
7   then largest ← r                   (right child exists and is bigger)
  (no else clause: either no R or root is bigger; already taken care of by line 5)
8 if largest ≠ i
9   then exchange A[i] ↔ A[largest]
10       MAX-HEAPIFY(A, largest, n)

(Figure: HEAPIFY on [2, 5, 7, 4, 3, 6]: the larger child 7 swaps up into the root, then 6 swaps up in the affected subtree, giving the heap [7, 5, 6, 4, 3, 2]. Done.)

    Correctness: by induction.
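For the Python homework setup, a transcription under 0-based indexing (a sketch; with 0-based arrays the children of i are 2i + 1 and 2i + 2 instead of 2i and 2i + 1):

def MaxHeapify(A, i, n):
    # L and R subtrees of i are heaps; float A[i] down until A[0..n-1] is one too
    l, r = 2 * i + 1, 2 * i + 2
    largest = i
    if l < n and A[l] > A[i]:
        largest = l          # left child exists and is bigger
    if r < n and A[r] > A[largest]:
        largest = r          # right child exists and is bigger
    if largest != i:
        A[i], A[largest] = A[largest], A[i]
        MaxHeapify(A, largest, n)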


    Time complexity of Heapify

O(lg n): binary tree argument.

Or, O(h) where h = height of the heap as a binary tree.

4.3 BuildHeap

input an unsorted array, output a heap after the call

a leaf = a trivial heap, so start one level above the leaves

BUILD-MAX-HEAP(A) (page 133)
1 n ← length[A]
2 for i ← ⌊n/2⌋ downto 1
3   do MAX-HEAPIFY(A, i, n)

(Figure: BUILD-MAX-HEAP on [2, 3, 4, 5, 6, 7], array indices 1; 2, 3; 4, 5, 6, 7 level by level: the leaves are trivial heaps; HEAPIFY at i = 3, then i = 2, then i = 1 floats the large keys up, ending with the heap [7, 6, 4, 5, 3, 2]. Done.)

Correctness: the assumption for Heapify always holds.

    4.3.1 Complexity:

a loose upper bound is O((n/2) lg n) = O(n lg n): MAX-HEAPIFY is O(lg n), iterated n/2 times.

However! A tighter bound is O(n): HEAPIFY near the leaf level is much cheaper than near the root level!

Height of an n-element heap = ⌊lg n⌋ (binary tree); # nodes at height h is ≤ ⌈n/2^(h+1)⌉

Height 0 (leaves): n/2^(0+1) = n/2 nodes. One level higher ⇒ half as many nodes

HEAPIFY called on a node at height h is O(h) (see the last page if you forgot)

T(n) = Σ_{h=0..⌊lg n⌋} (# of HEAPIFY calls at height h) × (time of HEAPIFY at height h)
     = Σ_{h=0..⌊lg n⌋} ⌈n/2^(h+1)⌉ · O(h)
     = O(n · Σ_{h=0..⌊lg n⌋} h/2^h)      (factor out n, multiply by 2)

Apply the integrating and differentiating series (Appendix A):

Σ_{k=0..∞} k·x^k = x/(1 − x)^2.

Plug in x = 1/2, and we get Σ_{h=0..∞} h/2^h = (1/2)/(1 − 1/2)^2 = 2.

T(n) = O(n · 2) = O(n)

    4.3.2 HeapSort

The root of the heap is the largest: stick it at the end, A[n]

Use HEAPIFY on A[1...n−1] to find the second largest, stick it in position A[n−1]

HEAPSORT(A) (page 136, slightly different)
1 BUILD-MAX-HEAP(A, n)           O(n) time
2 for i ← n downto 2             (n − 1 times)
3   do exchange A[1] ↔ A[i]      O(1) time
4      MAX-HEAPIFY(A, 1, i − 1)  O(lg(i − 1)) time

Total time: O(n) (BUILDHEAP) + O(n) (loop) × O(lg n) (HEAPIFY) = O(n lg n)

(Figure: HEAPSORT on input [2, 3, 4, 5, 6, 7]. Line 1 builds the max heap [7, 6, 4, 5, 3, 2]. Each iteration of line 2 swaps A[1] with A[i], so the sorted tail grows while the prefix is almost a heap (root wrong, but children OK); line 4's Heapify(A, 1, i − 1) makes the prefix a heap again. Continue until the whole array is sorted.)
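In the same 0-based Python setting as the MaxHeapify sketch from 4.2, both routines are short (again a sketch, not the book's code):

def BuildMaxHeap(A):
    n = len(A)
    for i in range(n // 2 - 1, -1, -1):   # last internal node down to the root
        MaxHeapify(A, i, n)

def HeapSort(A):
    BuildMaxHeap(A)                       # O(n)
    for i in range(len(A) - 1, 0, -1):    # n - 1 iterations
        A[0], A[i] = A[i], A[0]           # move current max to its final slot
        MaxHeapify(A, 0, i)               # restore the heap on A[0..i-1]

>>> x = [2, 7, 3, 8, 1]
>>> HeapSort(x)
>>> x
[1, 2, 3, 7, 8]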

    4.4 Python Programming

See http://e3.uci.edu/01f/15545/heap-hint.html; or for PDF, try http://e3.uci.edu/01f/15545/heap-hint.pdf

4.5 Priority Queues

want to get the highest priority element

MAXHEAPINSERT(A, x)

EXTRACTMAX(A)


    Implementation choices:

unsorted: O(n) to extract, O(1) to insert (anywhere)

sorted linked list: O(1) to extract, O(n) to insert

Heap: O(lg n) to insert, O(lg n) to extract

4.5.1 Heap Insert:

assume A is already a heap

increase the size of A by one, put the new element at the end, and propagate it up until its parent is larger (or it's the largest)

MAX-HEAP-INSERT(A, key) (diff. from book page 140)
1 heap-size[A] ← heap-size[A] + 1
2 i ← heap-size[A]
3 while i > 1 and A[Parent(i)] < key
4   do A[i] ← A[Parent(i)]
5      i ← Parent(i)
6 A[i] ← key

(Figure: MaxHeapInsert(8) on the heap [7, 6, 4, 5, 3, 2]: make room at the end, shift the smaller parents 4 and then 7 down along the path, and place key = 8 at the root.)

O(lg n) time (climb up from the tail to at most the root)

assume short-circuit evaluation (i.e., check the index first: when i > 1 fails, A[Parent(i)] is not even evaluated!)

Even though the algorithm says heap-size[A] ← heap-size[A] + 1, you probably can't do that in any programming language. You need to actually make room for A[n + 1], or else your program can crash. In Python, call A.append to increase the physical size of A.

in Python, heap-size[A] can be derived from len(A), rather than a global variable.
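Putting those remarks together, a 0-based Python sketch (A.append supplies the new slot, and Parent(i) becomes (i − 1)//2):

def MaxHeapInsert(A, key):
    A.append(key)                 # grow the physical size by one
    i = len(A) - 1
    while i > 0 and A[(i - 1) // 2] < key:
        A[i] = A[(i - 1) // 2]    # shift the smaller parent down
        i = (i - 1) // 2
    A[i] = key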

    4.5.2 Extract max:

pull the max (already at the root)

move the last one to the front and heapify (since both children are heaps)

in Python, use A.pop to accomplish heap-size[A] ← heap-size[A] − 1.

HEAP-EXTRACT-MAX(A)
1 if heap-size[A] < 1
2   then error "heap underflow"
3 max ← A[1]
4 A[1] ← A[heap-size[A]]
5 heap-size[A] ← heap-size[A] − 1
6 MAX-HEAPIFY(A, 1)
7 return max

    4.5.3 Heap Increase key

HEAP-INCREASE-KEY(A, i, key)

A is a heap, i is an index to an element.

Purpose: overwrite element A[i] with key (which is expected to be larger than A[i], hence "increase key"), while making minor changes to restore the heap property.

Strategy: propagate A[i] up towards the root, similar to MAX-HEAP-INSERT.

HEAP-INCREASE-KEY(A, i, key) (page 140)
1 if key < A[i]
2   then error "new key is smaller than current key"
3 A[i] ← key
4 while i > 1 and A[Parent(i)] < A[i]
5   do exchange A[i] ↔ A[Parent(i)]
6      i ← Parent(i)

not a common routine in priority queues; used later in min-spanning tree algorithms

very unusual to implement MAX-HEAP-INSERT as shown on page 140.

implies a max heap. (HEAP-DECREASE-KEY would imply a min heap.)

timing complexity similar to HEAPINSERT, except no reallocation of storage size


    Lecture 5

    Linear Time sorting, Medians/Order Stat

    This lecture

problem complexity of comparison-based sorting (8.1)

non-comparison sorting: counting sort (8.2), radix sort (8.3), bucket sort (8.4)

5.1 Comparison-based sorting: prob. complexity

    Decision Tree argument

Use a tree to represent all possible execution traces of an algorithm (Fig. 8.1)

(Figure: decision tree for sorting three elements: each internal node compares a pair, e.g. 1:2, branching on ≤ / >, and each leaf is one of the possible orders, such as a[1] ≤ a[2] ≤ a[3] or a[3] < a[1] ≤ a[2].)

two-way branch on each ≤ / > comparison; the tree must have at least n! leaves (one per permutation of the input), so its height, the worst-case number of comparisons, is at least lg(n!) = Ω(n lg n).


5.2 Non-comparison sorts

5.2.1 Counting sort

C[0..k]: key count; overloaded to be an index

Initialize C[0..k] ← 0
for each input element A[i]
    do C[A[i]]++                      (count # times each key appears)
for i ← 1 to k
    do C[i] += C[i − 1]               (prefix sums: converts C from a count to an index to the end of each key's region)
for i ← n downto 1
    do C[A[i]]--; B[C[A[i]]] ← A[i]   (copy A[i] into its region, back to front)

(Figure: counting sort of A = [2, 5, 3, 0, 2, 3, 0, 3] with k = 5: counting # times each key appears gives C = [2, 0, 2, 3, 0, 1]; the end-of-region indices are C = [2, 2, 4, 7, 7, 8]; copying from A to B in reverse order fills B = [0, 0, 2, 2, 3, 3, 3, 5].)

stable sort (relative positions preserved)

uses a second buffer B

complexity = O(n + k), effectively linear if k = O(n).
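A compact Python sketch of the three passes (0-based; my illustration of the algorithm above):

def CountingSort(A, k):
    # keys are integers in 0..k; returns a new, stably sorted list B
    C = [0] * (k + 1)
    for key in A:                 # count # times each key appears
        C[key] += 1
    for i in range(1, k + 1):     # prefix sums: C[i] = index just past key i's region
        C[i] += C[i - 1]
    B = [None] * len(A)
    for key in reversed(A):       # reverse scan keeps equal keys in order (stable)
        C[key] -= 1
        B[C[key]] = key
    return B

>>> CountingSort([2, 5, 3, 0, 2, 3, 0, 3], 5)
[0, 0, 2, 2, 3, 3, 3, 5]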

    5.2.2 RadixSort

Idea: for keys that have digits,

use stable sorting to sort by one digit at a time

start from the least significant digit!

(Figure: radix sort of input [329, 457, 657, 839, 436, 720, 355]: stable-sort by the least significant digit first, then by the 2nd LSD, then by the MSD, giving [329, 355, 436, 457, 657, 720, 839].)

    Important!! does not work if you start from MSD!!!

(Figure: attempting the same sort MSD-first on the same input: after stable-sorting by the most significant digit and then by the lower digits, the result [329, 355, 457, 436, 657, 720, 839] is not sorted. Oops!)

Need to do a lot of bookkeeping if you really want to sort from the MSD.


also useful for sorting by multiple keys (by name, by SAT score, etc.): start from the secondary key before the primary key

runtime complexity: Θ(d(n + k)), where d = # of digits in the key, k = radix (# possible values per digit)

reason: make d passes. Each pass is Θ(n + k) (e.g., counting sort)
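A short Python sketch for base-10 integer keys (an illustration; Python's sorted() is guaranteed stable, so it can stand in for the per-digit counting sort):

def RadixSort(A, d):
    # d passes, least significant digit first
    for p in range(d):
        A = sorted(A, key=lambda x: (x // 10 ** p) % 10)
    return A

>>> RadixSort([329, 457, 657, 839, 436, 720, 355], 3)
[329, 355, 436, 457, 657, 720, 839]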

    5.2.3 Bucket sort

assume values are distributed within some range.

project each value into one of n buckets; (insertion-) sort each bucket

(Figure: bucket sort of 20 input values: each value is projected into one of 10 buckets by its leading digit; each bucket is then sorted individually and the buckets are concatenated in order.)

    Timing Complexity: expected to be linear (to be revisited)
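A Python sketch under that uniformity assumption (values in [lo, hi); names are my own, and sorted() stands in for the per-bucket insertion sort):

def BucketSort(A, lo, hi):
    n = len(A)
    B = [[] for _ in range(n)]      # one bucket per input element
    for x in A:
        # project the value onto a bucket index; clamp the top edge
        idx = min(int((x - lo) * n / (hi - lo)), n - 1)
        B[idx].append(x)
    out = []
    for b in B:
        out.extend(sorted(b))       # each bucket is expected to be tiny
    return out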


    5.3 Median and Order statistics

    5.3.1 Find min? max? both min and max?

n − 1 comparisons for min (keep the smallest so far)

also n − 1 comparisons for max

both min/max: 2n − 2, but could do better: 3⌊n/2⌋ comparisons

1. compare pairs (⌊n/2⌋ comparisons)
2. search for the min in the smaller set
3. search for the max in the bigger set

(Figure: input [3, 7, 5, 8, 6, 1, 9, 4] compared in pairs: ⌊n/2⌋ pairwise comparisons split the elements into a "smaller" set and a "bigger" set; ⌊n/2⌋ − 1 more comparisons find the min (1) in the smaller set, and ⌊n/2⌋ − 1 find the max (9) in the bigger set.)

    5.3.2 How to find the i-th smallest w/out sorting?

    keep track of min & max so far

    find the median

Use QuickSort's partitioning. Each pass puts the pivot at its correct rank (absolute sorted position):

if the pivot is the position we want, return it; otherwise, continue partitioning in whichever of the two intervals contains position i.

RANDOMIZED-SELECT(A, p, r, i)
1 if p = r
2   then return A[p]
3 q ← RANDOMIZED-PARTITION(A, p, r)
4 k ← q − p + 1
5 if i = k                  (the pivot value is the answer)
6   then return A[q]
7 else if i < k
8   then return RANDOMIZED-SELECT(A, p, q − 1, i)
9 else return RANDOMIZED-SELECT(A, q + 1, r, i − k)
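A Python sketch (reusing the Partition code from the QuickSort lecture; the random pivot swap plays the role of RANDOMIZED-PARTITION, and i is a 1-based rank as in the pseudocode):

import random

def RandomizedSelect(A, p, r, i):
    # return the i-th smallest element of A[p..r]
    if p == r:
        return A[p]
    s = random.randint(p, r)        # randomized pivot choice
    A[s], A[r] = A[r], A[s]
    q = Partition(A, p, r)
    k = q - p + 1                   # rank of the pivot within A[p..r]
    if i == k:
        return A[q]                 # the pivot value is the answer
    elif i < k:
        return RandomizedSelect(A, p, q - 1, i)
    else:
        return RandomizedSelect(A, q + 1, r, i - k)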

Complexity of (randomized) Select:

Best case: O(n): first pass, very lucky, the pivot is what we want

Even partitioning: n + n/2 + n/4 + ... + 2 + 1 ≤ 2n = O(n)

Worst case: n + (n − 1) + (n − 2) + ... = O(n^2)

Average time: use randomization: probability 1/n of each position i as pivot. Expected cost = sum over i of (probability × cost of i as pivot):

T(n) = Σ_{i=1..n−1} (1/n) T(max(i, n − i)) + O(n)
     ≤ Σ_{i=⌊n/2⌋..n−1} (2/n) T(i) + O(n)

Guess T(n) ≤ an; verify:

T(n) ≤ (2/n) Σ_{i=⌊n/2⌋..n−1} a·i + cn
     ≤ (2/n) · (n/2) · (a(n − 1) + a⌊n/2⌋)/2 + cn
     ≤ (3/4)an + cn.

Choose a = 4c ⇒ T(n) ≤ an = O(n)

    5.3.3 Can show that worst-case select is O(n)

    (Theoretical interest only)

Divide the n elements into sublists of LENGTH 5 each, for ⌈n/5⌉ lists

Find the medians of these lists: brute force (constant time) on each list.

Recurse to find the median x of these ⌈n/5⌉ medians. Partition around x. Let k = rank(x)

if i = k, return x

if i < k, then Select recursively to find the i-th smallest in the first part

else Select recursively to find the (i − k)-th smallest in the second part.

    5.3.4 Justification:

(x means "don't know", s = smaller, L = larger; * is the pivot, the median of medians)

x x x L L L L
x x x L L L L
s s s * L L L
s s s s x x x
s s s s x x x

At least 3 of the 5 elements in half of the columns are ≥ the pivot (the L region), and symmetrically ≤ it (the s region), so each recursive call discards a constant fraction of the elements.

Lecture 6

Binary Search Trees and Red-Black Trees

6.1 Binary Search Trees

example: for the list [2, 3, 5, 5, 7, 8], a binary search tree:

(Figure: a BST with root 5, left child 3 (children 2 and 5), and right child 7 (right child 8).)

another valid BST for the same list:

(Figure: the same keys arranged in a different valid shape, smaller keys to the left and larger keys to the right.)

    6.1.1 API for BST

INORDER-TREE-WALK(x): print keys in the BST in sorted order

TREE-SEARCH(x, k): find the tree node with key k.

TREE-MAX(x) and TREE-MIN(x), assume x is the root

TREE-SUCCESSOR(x) and TREE-PREDECESSOR(x), for any tree node (need not be the root)

TREE-INSERT(T, z) and TREE-DELETE(T, z)

INORDER-TREE-WALK(x)
1 if x ≠ NIL
2   then INORDER-TREE-WALK(left[x])
3        print key[x]
4        INORDER-TREE-WALK(right[x])

Note: variants of tree visits (binary trees in general, not necessarily BST):

inorder = recursive left, root, recursive right

preorder = root, recursive left, recursive right

postorder = recursive left, recursive right, root

Timing complexity of INORDER-TREE-WALK: Θ(n) time.

Assume root x; left-subtree[x] has k nodes, right-subtree[x] has n − k − 1 nodes; assume each call incurs overhead d.

Proof by the substitution method: guess T(n) = (c + d)n + c:

T(n) = T(k) + T(n − k − 1) + d
     = ((c + d)k + c) + ((c + d)(n − k − 1) + c) + d
     = (c + d)n + c.

TREE-SEARCH(x, k)
1 if x = NIL or k = key[x]
2   then return x
3 if k < key[x]
4   then return TREE-SEARCH(left[x], k)
5   else return TREE-SEARCH(right[x], k)

returns x if found (key[x] matches k), NIL if not found


if z has two children ⇒ find and cut the successor y, paste it over z

the successor y must be either a leaf or have one child; it can't have two children.

(Figure: Delete(5) in a BST where 5 has two children. Step 1: cut 5's successor 6 by splicing (in the case of a leaf, just cut). Step 2: paste 6 over 5.)

Equivalently, could find the predecessor x, cut x, and paste it over z.

Rotation:

Transform a BST while still preserving the BST property

(Figure: LEFT-ROTATE(x) and RIGHT-ROTATE(y) exchange the parent/child roles of x and y.)

Right rotate: x is the left child of y; we want to make y the right child of x. x gets to keep its left child as before; x's old right subtree (whose values all lie between x and y) becomes y's new left subtree.

6.2 Python Programming with Trees

6.2.1 Node data structure

>>> x = Node(3) # makes a new node with key = 3
>>> x.key
3

Note that calling Node(3) actually calls the Node class's __init__ (constructor) routine.

Use x.key, x.left instead of key[x], left[x], etc.

We can insert additional fields even after the constructor is called!

example: x.color = 'red' # add a new color attribute

    Example routine:

    def TreeMaximum(x):

    while x.right != None:

    x = x.right

    return x
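The Node class itself never appears in these notes; here is a minimal version consistent with the session above (field names assumed), along with TREE-SEARCH transcribed the same way:

class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def TreeSearch(x, k):
    # return the node with key k in the subtree rooted at x, or None
    if x is None or k == x.key:
        return x
    if k < x.key:
        return TreeSearch(x.left, k)
    else:
        return TreeSearch(x.right, k)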

    6.2.2 Tuple-encoded Trees

Tuples vs Lists:

list: L = [1, 2, 'a', 'b', 'c']

tuple: T = (1, 2, 'a', 'b', 'c')

Similarities:

Both tuples and lists let you access elements by their indices: e.g. L[3] and T[3] both yield the value 'b'.

len works for both: both len(L) and len(T) yield 5.

Differences:

syntax: tuples use ( ); lists use [ ]

tuples are immutable (read-only): not allowed to say T[3] = 'z'

OK to do for lists: L[3] = 'z'

tuples can be compared lexicographically: (1, 2) < (1, 2, 3) < (1, 3)

Can have tuples within tuples: (1, (2, 3, (4, 5)), 6)
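One plausible tuple encoding of a tree nests (key, left, right) triples; this is a sketch of the convention, not necessarily the exact one used in the notes:

# the first example BST from section 6.1, as nested (key, left, right) tuples
tree = (5,
        (3, (2, None, None), (5, None, None)),
        (7, None, (8, None, None)))

def tuple_inorder(t):
    if t is not None:
        key, left, right = t
        tuple_inorder(left)
        print(key)
        tuple_inorder(right)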


    Lecture 7

    Dynamic Programming

    This lecture: Dynamic programming

    Optimization problems and Dynamic programming

    Assembly line scheduling

    matrix multiply

longest common subsequence

    optimal binary search tree

    7.1 Optimization problems

many possible solutions with different costs

want to maximize or minimize some cost function

unlike sorting, where a list is either sorted or not (partially sorted doesn't quite count)

examples: matrix-chain multiply (same results, just faster or slower),

knapsack problem (thief filling up a sack), compression

    7.1.1 Dynamic programming

    programming here means tabular method

Instead of re-computing the same subproblem, save results in a table and look them up (in constant time!)

    significance: convert an otherwise exponential/factorial-time problem to a polynomial-time one!

    Problem characteristic: Recursively decomposable

    Search space: a lot of repeated sub-configurations

    optimal substructure in solution

    7.2 Case study: Assembly line scheduling

    two assembly lines 1 and 2

each line i has n stations Si,j for the n stages of assembly

each station takes time ai,j

chassis at stage j must travel to stage (j + 1) next

option to stay in the same assembly line or switch to the other assembly line

time overhead ti,j to switch away from line i after station Si,j.

(figure: two assembly lines over six stages; entry times e1 = 2, e2 = 4; station times a1,j = 7, 9, 3, 4, 8, 4 and a2,j = 8, 5, 6, 4, 5, 7; transfer times ti,j between the lines; exit times x1 = 3, x2 = 2)

    Want to minimize time for assembly

    Line-1 only: 2 + 7 + 9 + 3 + 4 + 8 + 4 + 3 = 40

    Line-2 only: 4 + 8 + 5 + 6 + 4 + 5 + 7 + 2 = 41

    Optimal: 2 + 7 + (2) + 5 + (1) + 3 + (1) + 4 + 5 +(1) + 4 + 3 = 38

How many possible paths? 2^n (two choices at each stage)

    Optimal substructure

Global optimum contains optimal solutions to subproblems

    Fastest way through any station Si,j must consist of

    shortest path from beginning to Si,j

    shortest path from Si,j to the end

That is, cannot take a longer path to Si,j and make up for it in stages (j + 1) ... n.


Notation: fi[j] = fastest possible time from start through station Si,j (but not continuing).

ei, xi are entry/exit costs on line i. Goal: find the global optimum f* = min(f1[n] + x1, f2[n] + x2).

initially, at stage 1 (for line l = 1 or 2):
fl[1] = el (entry time) + al,1 (assembly time)

at any stage j > 1 on line l (m denotes the other line):

fl[j] = min( fl[j−1]                 same line l
             fm[j−1] + tm,j−1 )      other line m + transfer
        + al,j                       (assembly time at station Sl,j)

    Can write this as a recursive program:

F(i, j)
  if j = 1
    then return ei + ai,1
    else return min(F(i, j−1),
                    F(i%2 + 1, j−1) + t(i%2+1),j−1) + ai,j

    But! There are several problems:

many repeated evaluations of F(i, j), could be O(2^n) time: use a 2-D array f[i, j] to remember the running minimum

does not track the path: use array li[j] to remember which line gave us this min

    Iterative version shown in book on p. 329.

Run time is Θ(n).
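A minimal iterative Python sketch of this DP; the argument layout (lists a, t, e, x with 0-based lines and stages) is an assumption:

def fastest_way(a, t, e, x, n):
    # a[i][j]: assembly time at station j of line i (i in {0, 1})
    # t[i][j]: transfer time when leaving line i after station j
    # e[i], x[i]: entry and exit times of line i
    f = [[0] * n for _ in range(2)]
    l = [[0] * n for _ in range(2)]   # l[i][j]: line used at stage j-1
    for i in range(2):
        f[i][0] = e[i] + a[i][0]
    for j in range(1, n):
        for i in range(2):
            m = 1 - i                 # the other line
            stay = f[i][j - 1]
            switch = f[m][j - 1] + t[m][j - 1]
            if stay <= switch:
                f[i][j], l[i][j] = stay + a[i][j], i
            else:
                f[i][j], l[i][j] = switch + a[i][j], m
    return min(f[0][n - 1] + x[0], f[1][n - 1] + x[1])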

    7.3 Matrix-Chain multiply

    Basic Matrix Multiply

A is p × q, B is q × r

(figure: A (p × q) times B (q × r) yields AB (p × r))

product is a p × r matrix: ci,j = Σ_{y=1..q} ai,y · by,j

total number of scalar multiplications = p · q · r

    Multiply multiple matrices

matrix multiplication is associative: (AB)C = A(BC); with C being r × s, both yield a p × s matrix

Total # multiplications can be different!

(AB)C is
  pqr to multiply AB first,
  + prs to multiply (AB) (p × r) with C (r × s)
  = pqr + prs total # multiplications

On the other hand, A(BC) is pqs + qrs

Example: if p = 10, q = 100, r = 5, s = 50, then

pqr + prs = 5000 + 2500 = 7500

pqs + qrs = 50000 + 25000 = 75000, ten times as many!

Generalize to a matrix chain: A1, A2, A3, ..., An. But there are many ways to parenthesize!

P(1) = 1: (A), nothing to multiply
P(2) = 1: (AB)
P(3) = 2: A(BC), (AB)C
P(4) = 5: A(B(CD)), A((BC)D), (AB)(CD), (A(BC))D, ((AB)C)D
P(5) = 14 ... exponential growth

P(n) = 1 if n = 1;  Σ_{k=1..n−1} P(k) P(n−k) for n ≥ 2.

This is Ω(4^n / n^{3/2}): at least exponential!

7.3.1 Optimal parenthesization to minimize # scalar mults

(figure: parse tree for ((A1 A2) A3)(A4 A5), with subchains A1..2, A1..3, A4..5, and A1..5; a top-level split into A1..k and Ak+1..n)

Notation:

let Ai...j denote the matrix product Ai Ai+1 ··· Aj; matrix Ai has dimension pi−1 × pi

    Optimal substructure

if the optimal parenthesization for A1...j at the top level is (L)(R) = (A1...k)(Ak+1...j), then

    L must be optimal for A1...k, and

    R must be optimal for Ak+1...j

    Proof by contradiction

    Let M(i, j) = Minimum cost from the ith to the jth matrix

M(i, j) = 0 if i = j;

M(i, j) = min_{i ≤ k < j} { M(i, k) + M(k+1, j) + pi−1 pk pj } if i < j


    Observation

don't enumerate the space! bottom-up: no need to take the min so many times!

instead of recomputing M(i, k), remember it in array m[i, k]

bookkeeping to track the optimal partitioning point. See Fig. 7.1

O(n^3) time, Θ(n^2) space (for the m and s arrays)

    7.4 Longest common subsequence

Example sequences X = ⟨A, B, C, B, D, A, B⟩, Y = ⟨B, D, C, A, B, A⟩

a subsequence of X is Z = ⟨B, C, D, B⟩

Longest common subsequences (LCS) of length 4: ⟨B, C, B, A⟩ and ⟨B, D, A, B⟩

This is a maximization; costs add, but each match adds only 1 (a length increment)

Brute force:

enumerate all subsequences of X (length m), check whether each is a subsequence of Y (length n)

# of subsequences of X = 2^m (binary decision at each point whether to include each letter)

worst case time is Θ(n · 2^m) because each check runs against Y's length n

    Better way:

Notation: Xk = length-k prefix of string X; xi is the ith character in string X

Z = z1 ... zk is an LCS of X = x1 ... xm and Y = y1 ... yn

if xm = yn then zk = xm = yn, and Zk−1 is an LCS of Xm−1, Yn−1.

if xm ≠ yn then

if zk ≠ xm then Z is an LCS of Xm−1, Y; if zk ≠ yn then Z is an LCS of X, Yn−1.

c[i, j] = LCS length of Xi, Yj

c[i, j] = 0 if i = 0 or j = 0

c[i, j] = c[i−1, j−1] + 1 if xi = yj (match)

c[i, j] = max(c[i, j−1], c[i−1, j]) otherwise (no match, advance either)

    Algorithm

c[1 : m, 0] ← 0; c[0, 1 : n] ← 0
for i ← 1 to m
  do for j ← 1 to n
    do if xi = yj
      then c[i, j] ← c[i−1, j−1] + 1
           b[i, j] ← match (↖)
      else if c[i−1, j] ≥ c[i, j−1]
        then c[i, j] ← c[i−1, j]   ▷ copy the longer length
             b[i, j] ← dec i (↑)
        else   ▷ c[i, j−1] > c[i−1, j]
             c[i, j] ← c[i, j−1]
             b[i, j] ← dec j (←)

The c table for X = ⟨A, B, C, B, D, A, B⟩ (m = 7) and Y = ⟨B, D, C, A, B, A⟩ (n = 6):

       j   0   1   2   3   4   5   6
           y:  B   D   C   A   B   A
  i=0      0   0   0   0   0   0   0
  i=1  A   0   0   0   0   1   1   1
  i=2  B   0   1   1   1   1   2   2
  i=3  C   0   1   1   2   2   2   2
  i=4  B   0   1   1   2   2   3   3
  i=5  D   0   1   2   2   2   3   3
  i=6  A   0   1   2   2   3   3   4
  i=7  B   0   1   2   2   3   4   4

Time Θ(mn), space Θ(mn).

    7.5 Optimal Binary Search Trees

input: n keys K = k1, k2, ..., kn and n + 1 dummy keys D = d0, d1, ..., dn

d0 < k1 < d1 < k2 < d2 < ... < kn < dn

key ki has probability pi, and dummy key di has probability qi, with

Σ_{i=1..n} pi + Σ_{i=0..n} qi = 1

want: the binary search tree that yields the fastest search (fewer steps) for frequently used words

the keys ki should be internal nodes, and the dummy keys di should be leaves.


MATRIX-CHAIN-ORDER(p)
 1  n ← length[p] − 1
 2  for i ← 1 to n
 3    do m[i, i] ← 0
 4  for l ← 2 to n   ▷ l = length of interval considered
 5    do for i ← 1 to n − l + 1   ▷ starting index, one range per length
 6      do j ← i + l − 1   ▷ ending index, always l − 1 away from the starting index
 7         m[i, j] ← ∞
 8         for k ← i to j − 1   ▷ different partitions between i and j
 9           do q ← m[i, k] + m[k+1, j] + pi−1 pk pj
10              if q < m[i, j]
11                then m[i, j] ← q
12                     s[i, j] ← k   ▷ remember best k between i, j
13  return m, s

Figure 7.1: MATRIX-CHAIN-ORDER and a graphical illustration: the chain length l runs from 2 to n; the starting index i of subchain Ai...j runs from 1 to n − l + 1, which implies the ending index j = i + l − 1; the split point k runs from i to j − 1.
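A direct Python transcription, as a sketch (p is the dimension list: matrix Ai is p[i−1] × p[i]):

import math

def matrix_chain_order(p):
    n = len(p) - 1
    m = [[0] * (n + 1) for _ in range(n + 1)]
    s = [[0] * (n + 1) for _ in range(n + 1)]
    for l in range(2, n + 1):           # chain length
        for i in range(1, n - l + 2):   # starting index
            j = i + l - 1               # ending index
            m[i][j] = math.inf
            for k in range(i, j):       # partition point
                q = m[i][k] + m[k + 1][j] + p[i - 1] * p[k] * p[j]
                if q < m[i][j]:
                    m[i][j] = q
                    s[i][j] = k
    return m, s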

optimize for the common case. A balanced tree might not be good!

Example tree (not optimal):

                    k2 (0.10)
                  /           \
          k1 (0.15)           k4 (0.10)
          /      \            /       \
   d0 (0.05)  d1 (0.10)  k3 (0.05)   k5 (0.20)
                          /    \       /    \
                   d2 (0.05) d3 (0.05) d4 (0.05) d5 (0.10)

Expected search cost of tree T = Σ over all nodes of (depth + 1) × probability.

Optimal substructure: if root = kr, L = (i ... r−1), R = (r+1 ... j), then L, R must be optimal subtrees.

Expected cost e[i, j]:

e[i, j] = qi−1 (a dummy leaf) if j = i − 1

e[i, j] = min_{i ≤ r ≤ j} { e[i, r−1] (left subtree) + e[r+1, j] (right subtree) + w(i, j) (add one depth) } if i ≤ j

where w(i, j) = Σ_{l=i..j} pl + Σ_{l=i−1..j} ql is the total probability weight of the range.

(figure panels: computing e bottom-up on the example tree)

e[1,0] = q0 = 0.05   (dummy leaf d0)
e[2,1] = q1 = 0.10   (dummy leaf d1)

e[1,1] = w(1,1) + e[1,0] + e[2,1]
       = (p1 + q0 + q1) + 0.05 + 0.10 = 0.45

(root k1 with dummy leaves d0 and d1; the w term adds one more depth for all nodes in the left and right subtrees)

e[1,5] = w(1,5) + e[1,0] + e[2,5]

(candidate root k1 for the whole range: its left subtree is just d0, and its right subtree holds k2 ... k5)

use arrays to remember e[i, j], w[i, j] instead of recomputing

    use array root[i, j] to remember root positions

OPTIMAL-BST(p, q, n) (page 361)
 1  for i ← 1 to n + 1
 2    do e[i, i−1] ← qi−1
 3       w[i, i−1] ← qi−1
 4  for l ← 1 to n
 5    do for i ← 1 to n − l + 1
 6      do j ← i + l − 1
 7         e[i, j] ← ∞
 8         w[i, j] ← w[i, j−1] + pj + qj
 9         for r ← i to j
10           do t ← e[i, r−1] + e[r+1, j] + w[i, j]
11              if t < e[i, j]
12                then e[i, j] ← t
13                     root[i, j] ← r
14  return e, root
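The same routine in Python, as a sketch (p is used 1-indexed, so p[0] is a placeholder; e and w get an extra row to allow the e[i, i−1] entries):

import math

def optimal_bst(p, q, n):
    e = [[0.0] * (n + 1) for _ in range(n + 2)]
    w = [[0.0] * (n + 1) for _ in range(n + 2)]
    root = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 2):
        e[i][i - 1] = q[i - 1]
        w[i][i - 1] = q[i - 1]
    for l in range(1, n + 1):
        for i in range(1, n - l + 2):
            j = i + l - 1
            e[i][j] = math.inf
            w[i][j] = w[i][j - 1] + p[j] + q[j]
            for r in range(i, j + 1):
                t = e[i][r - 1] + e[r + 1][j] + w[i][j]
                if t < e[i][j]:
                    e[i][j] = t
                    root[i][j] = r
    return e, root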


    Lecture 8

    Greedy Algorithm

    This lecture

greedy algorithms: make the choice that looks best at the moment

    activity selection problem

    Huffman (with Python programming hints)

    8.1 Activity Selection problem

Set S of n activities

si = start time of activity i, fi = finish time of i. Assume 0 ≤ si < fi < ∞ (finite, positive/nonzero execution delay).

i, j are compatible if fi < fj implies fi ≤ sj (i.e., no overlap)

Goal: find a maximum-size subset A of mutually compatible activities

maximize |A| = # of activities

NOT maximize utilization!!

there may be several optimal solutions, but it's sufficient to find just one.

Greedy Solution: pick the next compatible task with the earliest finish time.

    Example

(figure: timeline from 0 to 14 with the list of tasks sorted by finish time fi, and the optimal schedules marked)
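A short Python sketch of this greedy rule; representing activities as (start, finish) pairs is an assumed input format:

def select_activities(activities):
    chosen = []
    last_finish = 0
    for s, f in sorted(activities, key=lambda a: a[1]):  # by finish time
        if s >= last_finish:      # compatible with everything chosen so far
            chosen.append((s, f))
            last_finish = f
    return chosen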

    Notation

S = {a1 ... an} set of activities.

a0 = start, an+1 = end (fake) activities: f0 = 0, sn+1 = ∞.

Si,j = {ak ∈ S : fi ≤ sk < fk ≤ sj} (i.e., the set of activities compatible with both ai and aj); S = S0,n+1

assumption: sorted by finish time: f0 ≤ f1 ≤ f2 ≤ ... ≤ fn < fn+1

Claim: if i ≥ j then Si,j = ∅ (impossible to start later and finish earlier).


    Optimal substructure

Let A = an optimal set (activity selection) for S, and Ai,j = an optimal one for Si,j.

if ak ∈ Ai,j then Ai,k must be an optimal solution to Si,k

Proof: assume there exists A′i,k with more activities than our Ai,k (that is, |A′i,k| > |Ai,k|). Then we could construct a larger A′i,j by using A′i,k as its prefix: a contradiction to the claim that Ai,j is optimal.

    Formulation:

Want to maximize c[0, n+1], where c[i, j] = max # of compatible activities in Si,j.

c[i, j] = 0 if Si,j = ∅

c[i, j] = max_{i < k < j} { c[i, k] + c[k, j] + 1 } if Si,j ≠ ∅


    Ratios $6, $5, $4

Greedy solution: take item 1, item 2, but item 3 won't fit: get $160/30kg

Optimal: take item 2, item 3, leave item 1: $220/50kg

(figure: item 1 = $60/10kg, item 2 = $100/20kg, item 3 = $120/30kg; knapsack capacity 50kg; $:wt ratios 6, 5, 4)

Greedy is not optimal for 0-1 knapsack:

highest $:wt ratio first, keep adding until exceeding knapsack capacity

add item 1 ($60/10kg) first, then item 2 ($100/20kg); oops! item 3 ($120/30kg) won't fit and must be left behind

result: $160 / 30kg

Optimal for 0-1: in this case, the best fit

take item 2 ($100/20kg) and item 3 ($120/30kg)

result: $220 / 50kg

Greedy would work for fractional knapsack: take items 1 and 2 whole, plus a 20kg fraction of item 3

result: $240 / 50kg

    How to solve 0-1 knapsack?

    dynamic programming, in O(nW) time.

    idea: iterate over total weights up to the limit

need to assume integer weights
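A compact sketch of the O(nW) table; items as (value, weight) pairs and W the integer capacity are assumed, and a one-dimensional rolling table replaces the full n × W one:

def knapsack_01(items, W):
    best = [0] * (W + 1)          # best[w] = max value with weight limit w
    for value, weight in items:
        # scan weights downward so each item is used at most once
        for w in range(W, weight - 1, -1):
            best[w] = max(best[w], best[w - weight] + value)
    return best[W]

On the example above, knapsack_01([(60, 10), (100, 20), (120, 30)], 50) returns 220.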

    8.3 Huffman codes

fixed-length code:

same # bits for all characters

variable-length code:

more frequent characters use fewer bits (e.g. vowels like a e i o u vs. q z)

Goal: minimize Σ_{c ∈ C} frequency(c) · bitlength(codeword(c))

lossless! preserves the same information

    Comparison example: 6 characters

                           a    b    c    d    e     f
frequency (in thousands)   45   13   12   16   9     5
fixed-length codeword      000  001  010  011  100   101
var-length codeword        0    101  100  111  1101  1100

    Why is variable length better?

fixed (3 bits per char): 100,000 chars take 300,000 bits

variable length: (45 · 1 + (13 + 12 + 16) · 3 + (9 + 5) · 4) · 1000 = 224,000 bits (25% saving)

problem: how do we know where each codeword begins?

prefix codes: no codeword is a prefix of another codeword. e.g., only 'a' starts with 0, so 00 means "a a".

a leading 1 needs more bits: 10 is a prefix for either b (101) or c (100); 11 is a prefix for d, e, f

Example: 001011101 uniquely parses as 0 (a), 0 (a), 101 (b), 1101 (e): no ambiguity

    Algorithm

HUFFMAN(C) (page 388)   ▷ C is the set of nodes carrying (letter, frequency)
1  n ← |C|
2  Q ← C   ▷ put all elements into a min-priority queue
3  for i ← 1 to n − 1
4    do z ← MAKE-NEW-NODE
5       left[z] ← x ← EXTRACT-MIN(Q)
6       right[z] ← y ← EXTRACT-MIN(Q)
7       f[z] ← f[x] + f[y]
8       INSERT(Q, z)
9  return EXTRACT-MIN(Q)

    Python version

    http://e3.uci.edu/02f/15845/huffman.html
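Independent of the linked version, a heapq-based sketch: trees are either a bare character or a (left, right) pair, and the counter breaks frequency ties so tuple comparison never reaches the trees themselves:

import heapq

def huffman(freqs):
    # freqs: dict mapping character -> frequency
    heap = [(f, i, c) for i, (c, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        fx, _, x = heapq.heappop(heap)   # two lowest-frequency trees
        fy, _, y = heapq.heappop(heap)
        count += 1
        heapq.heappush(heap, (fx + fy, count, (x, y)))
    return heap[0][2]                    # root of the code tree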


(figure: the finished code tree, with left edges labeled 0 and right edges labeled 1; internal node weights 14, 25, 30, 55 and root weight 100)

initially: min-priority Q gets character/frequency nodes

Q: 'f': 5, 'e': 9, 'c': 12, 'b': 13, 'd': 16, 'a': 45

i=1: 1. extract the two min nodes  2. make a binary tree z with the summed frequency  3. insert z into Q

z = 14 ('f', 'e');  Q: 'c': 12, 'b': 13, 14, 'd': 16, 'a': 45

i=2:

z = 25 ('c', 'b');  Q: 14, 'd': 16, 25, 'a': 45

i=3:

z = 30 (14, 'd');  Q: 25, 30, 'a': 45

continued... when all done:

root 100 = ('a': 45, 55);  55 = (25, 30);  25 = ('c': 12, 'b': 13);  30 = (14, 'd': 16);  14 = ('f': 5, 'e': 9)

    Time complexity:

n − 1 iterations, each heap operation is O(lg n), so this is O(n lg n) for n characters

    Correctness

    Greedy property:

C = alphabet. x, y ∈ C are the two chars with lowest frequencies (most rarely used).

Then there exists an optimal prefix code such that the codewords for x and y have the same length and differ only in the last bit.

If a, b are leaves at the deepest depth and have higher frequency, then we can swap a, b with x, y and obtain a new tree with no greater cost (sum of frequency × depth).

    Optimal substructure:

T is an optimal tree for alphabet C if T′ is an optimal tree for alphabet C′, where

C and C′ are the same except x, y ∈ C while z ∈ C′, and f is the same except f[z] = f[x] + f[y];

we construct T from T′ by replacing the leaf node for z with an internal node that is the parent of x, y.


    Lecture 9

    Basic Graph Algorithm

    This lecture

    Graph data structure

    Graph traversal ordering definition

    9.1 Graphs G = (V,E)

V: vertices. Notation: u, v ∈ V

E: edges, each connects a pair of vertices; E ⊆ V × V, (u, v) ∈ E

    Variations

    undirected or directed graph (digraph)

weighted: by default, this means weighted edges, w : E → R. Notation: the weight of edge (u, v) is written w(u, v)

    Also possible to have weighted vertices

    Undirected Graphs

    degree of a vertex: # of edges connected to the vertex

complete graph, aka clique: undirected graph where all pairs of vertices are connected

bipartite graph: undirected graph whose V = V1 ∪ V2, and E ⊆ V1 × V2

multigraph: can have multiple edges between the same pair of vertices (including self edges)

    hypergraph: an edge can connect more than two vertices

    Digraphs

in-degree: # incoming edges; out-degree: # outgoing edges

path: ⟨v0, v1, ..., vk⟩, where (vi, vi+1) ∈ E

    simple path: a path where all vi are distinct

cycle: a nontrivial simple path plus (vk, v0) to close the path

    DAG: directed acyclic graph (no cycle)

strongly connected: digraph whose vertices are all reachable from each other.

    Representation

    Adjacency list

Associate each vertex with a linked list of its neighbors on its outgoing edges

    good for sparse graphs

    for weighted graph, need to store weight in linked list

    Adjacency matrix

    A bit matrix to indicate presence/absence of edge

    weighted: can store weight in matrix

(example digraph: u → v, y;  v → w, x;  w → x;  x → y;  y → v)

Adjacency list:
  u: v → y
  v: x → w
  w: x
  x: y
  y: v

Adjacency matrix (row = from, column = to):
     u  v  w  x  y
  u     1        1
  v        1  1
  w           1
  x              1
  y     1

    tradeoffs:

Adj. List: easier to grow # of vertices, good for sparse graphs

Adj. Matrix: faster access, but O(V^2) storage
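In Python, a dict of lists is a natural adjacency-list form; this is a sketch of the example digraph above (the notes use this Adj style again later):

Adj = {
    'u': ['v', 'y'],
    'v': ['x', 'w'],
    'w': ['x'],
    'x': ['y'],
    'y': ['v'],
}

print(Adj['u'])   # ['v', 'y']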


    9.2 Breadth-first search

distance here means # of edges, not edge weight! e.g., start from u:

δ = 0: u
δ = 1: v, y
δ = 2: w, x

(figure: the example digraph with the distance-1 ring {v, y} and the distance-2 ring {w, x} around the start u)

visit vertices in increasing shortest distance: visit all vertices at distance d before visiting distance d+1 vertices

    BFS order may not be unique!!

Example: BFS(G, u) can be [u, v, y, w, x], [u, v, y, x, w], [u, y, v, w, x], or [u, y, v, x, w]

    Algorithm

BFS(G, s)
 1  for each u ∈ V[G] − {s}
 2    do color[u] ← WHITE
 3       d[u] ← ∞
 4       π[u] ← NIL
 5  color[s] ← GRAY
 6  d[s] ← 0
 7  π[s] ← NIL
 8  Q ← ∅
 9  ENQUEUE(Q, s)
10  while Q ≠ ∅
11    do u ← DEQUEUE(Q)
12       for each v ∈ Adj[u]
13         do if color[v] = WHITE
14           then color[v] ← GRAY
15                d[v] ← d[u] + 1
16                π[v] ← u
17                ENQUEUE(Q, v)
18       color[u] ← BLACK

main idea: for each vertex at distance d, enqueue its neighbors at distance d+1

    avoid queuing more than once: use color

    WHITE: never been enqueued

    GRAY: currently in queue

    BLACK: already visited

actually the color is somewhat redundant... it is enough to test π (predecessor)
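A deque-based Python sketch; per the remark above, membership in d plays the role of the color test, and Adj is a dict of lists:

from collections import deque

def bfs(Adj, s):
    d = {s: 0}            # discovered distances
    pi = {s: None}
    q = deque([s])
    while q:
        u = q.popleft()
        for v in Adj[u]:
            if v not in d:        # v is still "WHITE"
                d[v] = d[u] + 1
                pi[v] = u
                q.append(v)
    return d, pi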

Lemma 22.1: for any edge (u, v) ∈ E, δ(s, v) ≤ δ(s, u) + 1

if we can reach u in δ edges, we can reach v in δ + 1 edges (by taking edge (u, v))

v might actually be closer than u, but that's ok

d[v] computed by BFS satisfies d[v] = δ(s, v):

base case: since we start with d[s] = 0, all those u with δ(s, u) = 1 will have d[u] = δ(s, u) = 1

induction: assume d[u] = δ(s, u), and consider edge (u, v). If color[v] ≠ WHITE, then v has been visited before and d[v] ≤ δ(s, u) + 1. If color[v] = WHITE, then it is being visited for the first time: d[v] = δ(s, u) + 1 = d[u] + 1 = δ(s, v).

    9.3 Depth-first search

    depth is not necessarily the same as distance in BFS!

    discover vertices (use a stack) before we visit!

pushing vertices on the stack (if we haven't pushed them before)

    DFS order may not be unique!

Example: DFS with its tree (vertices u, v, w, x, y, z, starting at u)

(figure: DFS tree rooted at u, with discovery/finish times 1 through 12 laid out under the parenthesis structure)

(u (v (x x) (y (z z) y) v) (w w) u)

Output at each closing parenthesis: [x, z, y, v, w, u]

DFS(G)
1  for each u ∈ V[G]
2    do color[u] ← WHITE
3       π[u] ← NIL
4  time ← 0
5  for each u ∈ V[G]
6    do if color[u] = WHITE
7      then DFS-VISIT(u)

DFS-VISIT(u)
1  color[u] ← GRAY   ▷ discovered u
2  time ← time + 1
3  d[u] ← time
4  for each v ∈ Adj[u]
5    do if color[v] = WHITE
6      then π[v] ← u
7           DFS-VISIT(v)
8  color[u] ← BLACK
9  f[u] ← time ← time + 1
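A recursive Python sketch recording discovery and finish times (Adj is a dict of lists as before):

def dfs(Adj):
    color = {u: 'WHITE' for u in Adj}
    d, f, pi = {}, {}, {}
    time = 0

    def visit(u):
        nonlocal time
        color[u] = 'GRAY'        # u discovered
        time += 1
        d[u] = time
        for v in Adj[u]:
            if color[v] == 'WHITE':
                pi[v] = u
                visit(v)
        color[u] = 'BLACK'       # u finished
        time += 1
        f[u] = time

    for u in Adj:
        if color[u] == 'WHITE':
            visit(u)
    return d, f, pi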


(Lemma 22.13) The component graph G^SCC is a DAG

    Notation:

C denotes a strongly connected component

d[C] = min_{v ∈ C} d[v] (earliest of the discovery times)

f[C] = max_{v ∈ C} f[v] (latest of the finish times)

C, C′: two distinct SCCs of G

(Lemma 22.14) if edge (u, v) ∈ E with u ∈ C and v ∈ C′, then f(C) > f(C′).

Intuition: C′ is deeper than C and therefore finishes earlier.

(Lemma 22.15, paraphrased from book!) if (v, u) ∈ E^T with v ∈ C and u ∈ C′, then f(C) < f(C′).

Proof: (v, u) ∈ E^T implies (u, v) ∈ E; because v ∈ C is deeper (finishes earlier), f(C′) > f(C) by Lemma 22.14.

>>> Adj['t']
['x', 'y', 'z']

Vertices V of G:

>>> Adj.keys()
['t', 'z', 's', 'x', 'y']
# Note: could be in another order