ALGO Notes


ENGRECE 235: Design and Analysis of Algorithms
Course Code: 15845
Quarter: Fall 2003

    Professor Pai H. Chou

    Electrical Engineering and Computer Science

    September 26, 2003


    Contents

1 Introduction to Analysis and Design of Algorithms
1.1 Algorithm
1.1.1 What's an algorithm?
1.1.2 Classes of algorithms
1.1.3 Example: sorting problem
1.1.4 Many sorting algorithms available
1.2 Analysis
1.2.1 Execution model assumption
1.2.2 Algorithm runtime
1.3 Algorithm design
1.3.1 Approaches
1.3.2 Example: merge sort
1.4 Problem complexity vs. Algorithm Complexity
1.5 Data Structures
1.6 Python

2 Growth of Functions, Data structures
2.1 Asymptotic Notations
2.1.1 big O = asymptotic upper bound (page 44)
2.1.2 big Ω = asymptotic lower bound (page 45)
2.1.3 big Θ = asymptotic tight bounds
2.1.4 little-o, little-ω: not asymptotically tight
2.2 Polynomials, log, factorial, fibonacci
2.2.1 Polynomial in n of degree d
2.2.2 Logarithm
2.2.3 Factorial
2.2.4 Fibonacci
2.2.5 Misc
2.3 Python: Functions
2.4 Data Structures Review
2.4.1 Array implementation
2.4.2 Stack and Calls/Recursion
2.4.3 Linked List
2.4.4 Binary search tree
2.4.5 Rooted trees w/ unbounded branches

3 Recurrences; QuickSort
3.1 Recurrence: by Substitution method
3.2 Recursion tree, convert to summation
3.3 The Master Method
3.4 QuickSort
3.4.1 Worst case: uneven partitioning
3.4.2 Best case: balanced partitioning
3.4.3 Average case


4 Heap sort, Priority Queues
4.1 Heap data structure: Binary heap
4.2 Heapify
4.3 BuildHeap
4.3.1 Complexity
4.3.2 HeapSort
4.4 Python Programming
4.5 Priority Queues
4.5.1 Heap Insert
4.5.2 Extract max
4.5.3 Heap Increase key

5 Linear Time sorting, Medians/Order Stat
5.1 Comparison-based sorting: prob. complexity
5.2 Non-comparison sorts
5.2.1 Counting sort
5.2.2 RadixSort
5.2.3 Bucket sort
5.3 Median and Order statistics
5.3.1 Find min? max? both min and max?
5.3.2 How to find the i-th smallest w/out sorting?
5.3.3 Can show that worst-case select is O(n)
5.3.4 Justification

6 Binary Search Trees and Red-Black Trees
6.1 Binary Search Trees
6.1.1 API for BST
6.2 Python Programming with Trees
6.2.1 Node data structure
6.2.2 Tuple-encoded Trees
6.3 Red-Black Trees
6.3.1 Five Properties (important!!)
6.3.2 How to maintain the Red-Black Tree properties?
6.3.3 Insertion
6.3.4 Deletion
6.3.5 Tree data structure

7 Dynamic Programming
7.1 Optimization problems
7.1.1 Dynamic programming
7.2 Case study: Assembly line scheduling
7.3 Matrix-Chain multiply
7.3.1 Optimal parenthesization to minimize # scalar mults
7.4 Longest common subsequence
7.5 Optimal Binary Search Trees

8 Greedy Algorithm
8.1 Activity Selection problem
8.2 Greedy choice property
8.3 Huffman codes

9 Basic Graph Algorithm
9.1 Graphs G = (V, E)
9.2 Breadth-first search
9.3 Depth-first search
9.4 Topological sort
9.5 Strongly Connected Components (SCC)
9.6 Python Programming: How to represent a graph!?
9.6.1 Simple: Directed graph, no weights
9.6.2 Weighted graph, use dictionary of dictionaries


9.6.3 Wrap data structure in a class

10 Minimum Spanning Trees
10.1 Min-spanning trees
10.2 Kruskal's algorithm
10.2.1 Runtime analysis
10.3 Prim's algorithm

11 Single Src Shortest Paths
11.1 DAG shortest paths
11.1.1 Notation
11.2 Dijkstra's
11.2.1 Algorithm
11.2.2 Example
11.2.3 Correctness proof sketch
11.2.4 What about negative weights?
11.3 Bellman-Ford
11.3.1 Algorithm
11.3.2 Correctness
11.3.3 Complexity: O(VE)
11.4 Bellman-Ford: special case of linear programming
11.4.1 Bellman-Ford as a Constraint Solver!

12 All Pairs Shortest Paths
12.1 Special case Linear Programming (cont'd)
12.1.1 Correctness sketch
12.2 All Pairs shortest path
12.2.1 Formulation
12.2.2 Repackage this as a matrix structure
12.2.3 Floyd-Warshall: O(V^3)
12.2.4 Algorithm: very simple!
12.3 Transitive closure: E* (Warshall's)
12.4 Sparse graphs: Johnson's algorithm
12.4.1 Proof weight works

13 Flow Networks
13.1 Flow network
13.1.1 Maxflow methods
13.2 FORD-FULKERSON
13.3 EDMONDS-KARP
13.4 Maximal bipartite matching

14 NP-Completeness
14.1 Other Graph problems
14.1.1 Euler Tour
14.1.2 Hamiltonian cycles
14.1.3 Traveling Salesman
14.1.4 Constructive vs. Decision vs. Verification Problems
14.2 Problem complexity
14.2.1 Formal Languages
14.2.2 Decision problem in Formal language
14.2.3 Decision in polynomial time
14.2.4 P, NP, co-NP in terms of formal languages
14.3 Reduction
14.4 NP, NP-complete, NP-hard
14.4.1 How to show L ∈ NP-Complete?
14.4.2 Examples: Circuit Satisfiability
14.4.3 NP completeness proofs
14.5 NPC Proof for (Formula) Satisfiability problem
14.5.1 3-CNF satisfiability (3-SAT)


14.6 NPC proof cont'd: k-clique problem (CLIQUE)

15 Approx. Algorithms
15.1 k-VERTEX-COVER problem
15.1.1 Observation
15.1.2 Approximation algorithm for Vertex Cover
15.2 Hamiltonian (HAM-CYCLE)
15.3 Traveling Salesman Problem (TSP)
15.3.1 Approximated TSP
15.3.2 General TSP
15.4 Subset-sum problem
15.4.1 Exact solution
15.4.2 NP Complete proof
15.4.3 Polynomial-time approximation
15.5 Set Covering problem
15.5.1 Greedy approximation for SET-COVER


    Lecture 1

    Introduction to Analysis and Design of Algorithms

    Administrative

    Course syllabus: http://e3.uci.edu/03f/15845/syllabus.html

    1.1 Algorithm

1.1.1 What's an algorithm?

    input, output problem statement

    sequence of steps

    data structures

    must always halt and output must satisfy property

    1.1.2 Classes of algorithms

    sorting (classified by problem)

tree traversal, graph algorithms (classified by data structures / representation)

    dynamic programming (classified by technique)

incremental, greedy, divide-&-conquer algorithms (classified by decision criteria)

    Why this class?

    fundamental cross cutting applications

analysis aspect: problem complexity, what problems are easy/hard, how to quantify

    many solutions to the same problem

    many applications of a given solution

    1.1.3 Example: sorting problem

input: array A = [a1, a2, ..., an]

output: array A' = [a'1, a'2, ..., a'n] such that a'1 ≤ a'2 ≤ ... ≤ a'n

    1.1.4 Many sorting algorithms available:

    insertion sort

See Chapter 2 Sec. 2.1 for explanation; we will revisit sorting later.

idea: partition A into two: A[1..(j−1)] already nondecreasing, A[j..n] not yet sorted.

    grow this sorted subarray by one more element

insert it into the tail of the sorted one, but need to make it sorted

shift everybody up by one until the nondecreasing property is satisfied again

done when j − 1 = n

(Figure: insertion sort trace on input A = [5, 2, 4, 6, 1, 3]. At each step j = 2..6, A[1..j−1] is the sorted prefix and A[j..n] is not yet sorted; the new A[j] is inserted into the prefix, producing the output [1, 2, 3, 4, 5, 6].)


    1.2 Analysis

    1.2.1 Execution model assumption:

RAM (random access machine), as opposed to stack, queue, or FSM!

sequential instruction execution

alternative assumptions: parallelism, communication, synchronization

associative memory (content-addressable)

data structures: pointers/linked lists instead of copying

    1.2.2 Algorithm runtime

measure it, but want a formula T(n) where n is the problem size

    want to factor out processor speed as a scaling factor

    worst case, best case, average case

    algorithm memory consumption (visit later)

Chapter 3 deals with order of growth, e.g. Θ(n^2)

Insertion sort: O(n^2)

Quick sort: O(n^2) worst case, O(n lg n) avg.

Merge sort, Heap Sort: Θ(n lg n)

    1.3 Algorithm design

    1.3.1 Approaches:

    incremental

    divide-and-conquer (e.g., MergeSort, QuickSort)

    greedy (e.g., Min-Spanning Tree)

    backtracking

    randomization

    genetic

    1.3.2 Example: merge sort

    divide array into two halves

    sort left and sort right

    interleave the two halves

    uses Recursion

Θ(n lg n) complexity, where lg = log base 2

but uses at least twice as much memory in order to keep merge linear

(Figure: MergeSort recursion on input A[1:6] = [5, 2, 4, 6, 1, 3]: MergeSort on A[1:3] and A[4:6] subdivides down to single elements, then Merge(A[1:1], A[2:2]), Merge(A[1:2], A[3:3]), ..., Merge(A[4:5], A[6:6]), and finally Merge(A[1:3], A[4:6]) produce the output [1, 2, 3, 4, 5, 6].)
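A minimal Python sketch of this scheme (my illustration, not the book's code): the Merge step copies the two sorted halves into temporaries, which is exactly the extra memory noted above.

def MergeSort(A, p, r):
    # sort A[p..r] in place (0-based, inclusive bounds)
    if p < r:
        q = (p + r) // 2
        MergeSort(A, p, q)
        MergeSort(A, q + 1, r)
        Merge(A, p, q, r)

def Merge(A, p, q, r):
    L = A[p:q+1]      # copy of the sorted left half
    R = A[q+1:r+1]    # copy of the sorted right half
    i = j = 0
    for k in range(p, r + 1):
        # take from L when R is exhausted or L's head is no larger
        if j >= len(R) or (i < len(L) and L[i] <= R[j]):
            A[k] = L[i]; i += 1
        else:
            A[k] = R[j]; j += 1

>>> x = [5, 2, 4, 6, 1, 3]
>>> MergeSort(x, 0, len(x) - 1)
>>> x
[1, 2, 3, 4, 5, 6]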

1.4 Problem complexity vs. Algorithm Complexity

Measures how hard the problem is

Bounds the complexity of all algorithms for this problem

Example: Can't do better than O(n lg n) for comparison-based sorting

But can do better (linear time) using non-comparison-based algo.

The problem itself may be hard! No efficient algorithm exists for those problems.

NP-Complete problems: guess an answer, verify (yes/no) in polynomial time; otherwise exponential or worse.

Approximation algorithms: not minimum/optimal cost, but comes within a factor.

    1.5 Data Structures

    fixed: arrays, matrices

    growable: stacks, queues, linked list, tree, heap

    key addressable: hash table

    structures: graphs, flow networks

    1.6 Python

    Scripting language, free!

    Already installed on unix; download for any platform

    Easy, powerful, interactive

    Best way is to try it out; helps you debug algorithm

    Tutorial at http://www.python.org/doc/current/tut/tut.html


    Important features

    No Semicolons, no C-style { }

    Indentation is significant

    Loosely typed

    # starts a comment, unless inside string

    Interactive mode

% python
>>> 2 + 3
5
>>> x = (2 + 3) * 4
>>> x
20
>>> print "hello world"
hello world

    Lists and Arrays

>>> x = [1, 2, 3]
>>> x
[1, 2, 3]
>>> x[0]
1
>>> x.append(20)
>>> x
[1, 2, 3, 20]
>>> len(x) # get the number of elements
4
>>> x.pop() # pop from tail like a stack
20
>>> x
[1, 2, 3]
>>> x.pop(0) # pop from head like a queue
1
>>> x
[2, 3]

    You can have other data types in lists and mix them:

>>> dayofweek = ['Sun', 'Mon', 'Tue', 'Wed']
>>> dayofweek[3]
'Wed'

    Hash Table (Dictionary)

    Q: What if we want to look up the number from the name?

Hash tables let you use the name string as an index. However, one big difference is that lists are ordered, whereas hash tables are not.

>>> d = {} # make an empty hash table
>>> d['Sun'] = 0
>>> d['Mon'] = 1
>>> d['Tue'] = 2
>>> d['Wed'] = 3
>>> d
{'Wed': 3, 'Tue': 2, 'Mon': 1, 'Sun': 0}
>>> d['Tue']
2
>>> d.has_key('Mon')
1 # this means true
>>> d.has_key('Blah blah')
0 # this means false
>>> d.keys()
['Wed', 'Tue', 'Mon', 'Sun']


    Lecture 2

    Growth of Functions, Data structures

    This lecture:

big O, big Ω, and big Θ, and how they grow

fundamental data structures

2.1 Asymptotic Notations

purpose: express run time as T(n), n = input size

assumption: n ∈ N = {0, 1, 2, ...}, the natural numbers

    2.1.1 big O = asymptotic upper bound: (page 44)

f(n) = O(g(n)): O(g(n)) is the set of functions upper-bounded as 0 ≤ f(n) ≤ c·g(n) for some c > 0 and all n ≥ n0

f(n) grows no faster than g(n); example: 2n^2 = O(n^3), for c = 1, n0 = 2

funny equality, more like a membership: 2n^2 ∈ O(n^3); n^2 and n^3 are actually functions, not values.

sloppy but convenient

2.1.2 big Ω = asymptotic lower bound (page 45)

swap the places of f(n) and c·g(n): 0 ≤ c·g(n) ≤ f(n) for all n ≥ n0

example: √n = Ω(lg n), for c = 1, n0 = 16

2.1.3 big Θ = asymptotic tight bounds

need two constants c1 and c2 to sandwich f(n): 0 ≤ c1·g(n) ≤ f(n) ≤ c2·g(n).

example: (1/2)n^2 − 2n = Θ(n^2) for c1 = 1/4, c2 = 1/2, n0 = 8

Theorem: f(n) = Θ(g(n)) if and only if f(n) = O(g(n)) and f(n) = Ω(g(n)).

(Figure: the three bounds: c·g(n) staying above f(n) for O(g(n)); c·g(n) staying below f(n) for Ω(g(n)); c1·g(n) and c2·g(n) sandwiching f(n) for Θ(g(n)), in each case for n ≥ n0.)

Example: insertion sort

O(n^2) worst case, Ω(n) best case:

T(n) = c1·n                    // for loop
     + c2·(n − 1)              // key ← A[j]
     + Σ_{j=2..n} Insert(j)    // loop to insert j
     + c8·(n − 1)              // A[i + 1] ← key

Best case of Insert(j) is Θ(1), worst is Θ(j).

Issue: Θ(n^2) + Θ(n) is still Θ(n^2)

Example: MERGESORT(A, p, r)

where p, r are the lower and upper bound indexes to array A.

if p < r then
    q := ⌊(p + r)/2⌋
    MergeSort(A, p, q); MergeSort(A, q + 1, r)
    Merge(A, p, q, r)

T(n) = Θ(1) if n = 1; T(n) = 2T(n/2) + Θ(n) if n > 1

T(n) is expressed in terms of T itself! This is called a recurrence. We want a closed-form solution (no T on the right-hand side of the equation).

To solve the recurrence:

rewrite as T(n) = c if n = 1; T(n) = 2T(n/2) + cn if n > 1

T(n) = 2T(n/2) + cn                  (expand)
     = 2(2T(n/4) + c(n/2)) + cn
     = 4T(n/4) + 2cn
     = 8T(n/8) + 3cn = ...
     = 2^i · T(n/2^i) + i·cn

We can divide n by 2 at most lg n times before n = 1. So T(n) = (lg n)·cn + cn = Θ(n lg n). Chapter 4 will show how to solve recurrences systematically.

2.1.4 little-o, little-ω: not asymptotically tight

o(g(n)) = { f(n) : ∀c > 0 ∃n0 > 0 : 0 ≤ f(n) < c·g(n) ∀n ≥ n0 }
ω(g(n)) = { f(n) : ∀c > 0 ∃n0 > 0 : 0 ≤ c·g(n) < f(n) ∀n ≥ n0 }


    Example

2n^2 = O(n^2) is a tight bound ⇒ 2n^2 ≠ o(n^2). 2n = O(n^2) is not tight ⇒ 2n = o(n^2). Does not make sense to have a little-θ.

    2.2 Polynomials, log, factorial, fibonacci

    2.2.1 Polynomial in n of degree d:

p(n) = Σ_{i=0..d} a_i·n^i = Θ(n^d)

Polynomially bounded: f(n) = O(n^k) for constant k.

    2.2.2 logarithm

useful identity: a^(log_b c) = c^(log_b a).

polylogarithmically bounded if f(n) = O(lg^k n) for constant k

grows slower than any polynomial: lg^b n = o(n^a) for any constant a > 0.

log-star: lg* n = min{ i ≥ 0 : lg^(i) n ≤ 1 } (log of log of log ... of log of n); grows very, very slowly. Example: lg*(2^65536) = 5.

    2.2.3 Factorial:

Loose bound: n! grows faster than 2^n but slower than n^n

precisely, n! = Θ(n^(n+1/2) e^(−n)) (Stirling's approximation)

lg(n!) = Θ(n lg n)

    2.2.4 Fibonacci

    grows exponentially

    2.2.5 misc

2^(2^n) grows faster than n!

(n + 1)! grows faster than n!

n! grows faster than n·2^n, which grows faster than 2^n

2^(lg n) = n

    2.3 Python: Functions

function definition: def functionName(paramList):

variables have local scope by default

control flow: for, while, if

range(a, b) function: returns the list [a, ..., b−1]
>>> range(1,5)
[1, 2, 3, 4]

Algorithm from book (p.24), and Python code (file isort.py):

INSERTION-SORT(A)
1 for j ← 2 to length[A]
2   do key ← A[j]
3      i ← j − 1
4      while i > 0 and A[i] > key
5        do A[i + 1] ← A[i]
6           i ← i − 1
7      A[i + 1] ← key

def InsertionSort(A):
    for j in range(1, len(A)):
        key = A[j]
        i = j - 1
        while i >= 0 and A[i] > key:
            A[i+1] = A[i]
            i = i - 1
        A[i+1] = key

Don't forget the colon : for def, for, and while constructs

Book's A[1...n] ⇒ Python's A[0...n−1]; book's 2...n ⇒ Python's 1...n−1.

line 1: range(1,len(A)) ⇒ [1...len(A)-1]

% python
>>> from isort import * # read file isort.py
>>> x = [2,7,3,8,1] # create test case
>>> InsertionSort(x) # call routine
>>> x # look at result
[1, 2, 3, 7, 8]

    Works for other data types, too!

    2.4 Data Structures Review

    Concepts:

    Stacks: Last-in First-out

    Queue: First-in First-out

    Trees: Root, children

    2.4.1 Array implementation

    Stack S: need a stack pointer (top)

Push(x): S[++top] ← x
Pop(): return S[top--]

    Note: Actually stacks can grow downward also.

    Note: To be safe, check stack underflow/overflow.

    Queue Q: need 2 pointers (head, tail)

Enqueue(x): Q[tail] ← x; tail ← (tail mod len(Q)) + 1

Dequeue(): temp ← Q[head]; head ← (head mod len(Q)) + 1; return temp

    Note: To be safe, check queue underflow/overflow.
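A Python sketch of both implementations (my illustration, with the under/overflow checks included; the queue uses the same circular head/tail arithmetic in 0-based form):

class Stack:
    def __init__(self, size):
        self.S = [None] * size
        self.top = -1                   # empty stack
    def push(self, x):
        if self.top + 1 == len(self.S):
            raise OverflowError("stack overflow")
        self.top += 1
        self.S[self.top] = x            # S[++top] <- x
    def pop(self):
        if self.top < 0:
            raise IndexError("stack underflow")
        x = self.S[self.top]
        self.top -= 1                   # return S[top--]
        return x

class Queue:
    def __init__(self, size):
        self.Q = [None] * size
        self.head = self.tail = self.count = 0
    def enqueue(self, x):
        if self.count == len(self.Q):
            raise OverflowError("queue overflow")
        self.Q[self.tail] = x
        self.tail = (self.tail + 1) % len(self.Q)   # wrap around
        self.count += 1
    def dequeue(self):
        if self.count == 0:
            raise IndexError("queue underflow")
        x = self.Q[self.head]
        self.head = (self.head + 1) % len(self.Q)
        self.count -= 1
        return x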

    2.4.2 Stack and Calls/Recursion

Calling a routine ⇒ push locals, parameters, return address

Return ⇒ pop and continue


Example: MergeSort (cf. MergeSort Fig. from last lecture)

L (Left), R (Right), M (Merge). Successive call-stack snapshots (bottom to top), one per step:

(1) [1,6]
(2) [1,6]L [1,3]
(3) [1,6]L [1,3]L [1,2]
(4) [1,6]L [1,3]L [1,2]L [1,1]
(5) [1,6]L [1,3]L [1,2]R [2,2]
(6) [1,6]L [1,3]L [1,2]M
(7) [1,6]L [1,3]R [3,3]
(8) [1,6]L [1,3]M

... and the walk continues symmetrically with [1,6]R for the right half.

2.4.3 Linked List

Singly linked (next) or Doubly linked (next, prev)

need a key

need a NIL value

How to implement linked lists?

Pre-allocate arrays prev[1:n], next[1:n], key[1:n], where a node x is an index ∈ {0...n} and 0 is NIL.

Records (structures in C, classes in Java): x.prev, x.next, x.key, and NULL for NIL.

Rearrange by adjusting links, not moving data (key)

Recall: INSERTIONSORT with an array ⇒ need to copy objects to insert.

Linked list representation: once we know where to insert, adjust links in O(1) time.

Array version: best case O(1), worst case O(n), average case O(n/2) = O(n).

(Figure: insert(3) into the sorted linked list 1 → 2 → 4 → 5 → 6: (1) allocate a node for the data; (2) adjust links so that 2 points to 3 and 3 points to 4.)

    Search for the item

    e.g., where to insert?

list is unsorted: worst/average case O(n) time to find it! (compare keys one at a time).

list is sorted:

Array: binary search in O(lg n) time

Singly linked list: O(n) avg/worst case, need to do a linear search

Doubly linked list: does not help! Still O(n): can't jump to the middle of the list, because the links are only to the head/tail.

    2.4.4 Binary search tree

pointer to root; each node has pointers to left-child, right-child

optional pointer to parent

left smaller, right bigger key value ⇒ O(lg n) search, if BALANCED

Can still be O(n) if not balanced!!

Red-black tree (Ch. 13) is a way to keep a BST balanced

a Heap (Ch. 6) implements a nearly balanced binary tree using an array; no pointers

(Figure: the same keys stored three ways: a doubly linked list with head and tail pointers; a binary (search) tree with a root pointer; and an n-ary tree using left-child, right-sibling links.)

    2.4.5 rooted trees w/ unbounded branches

    Could have arbitrary number of children!

    left-child: head of linked list to children (down one level)

    right-sibling: link to next on same level

    optional link back to parent
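A minimal Python node along these lines (field and function names are my own, for illustration):

class NaryNode:
    def __init__(self, key):
        self.key = key
        self.left_child = None      # head of the list of children, one level down
        self.right_sibling = None   # next node on the same level
        self.parent = None          # optional link back to the parent

def children(x):
    # iterate over all children of x, however many there are
    c = x.left_child
    while c is not None:
        yield c
        c = c.right_sibling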


    Lecture 3

    Recurrences; QuickSort

    This lecture:

    Chapter 4: Recurrences, Master method

    Substitution method

Recursion tree

Master method: T(n) = aT(n/b) + f(n)

Quicksort (Ch. 7.1-7.2) (if we have time today)

    Summation review (Appendix A)

    3.1 Recurrence: by Substitution method

    guess-and-verify

    rename variables if necessary.

    mathematical induction: assume true for n0, show truefor n

    n

    0.

    issue: boundary conditions, floor/ceiling use inequal-ity to bound those cases.

    avoid pitfall with constant terms.

Example w/ floor

T(n) = Θ(1) if n = 1; T(n) = 2T(⌊n/2⌋) + n if n > 1.

Guess T(n) = O(n lg n).

sufficient to show T(n) ≤ cn lg n for some c > 0.

Assume T(⌊n/2⌋) ≤ c⌊n/2⌋ lg(⌊n/2⌋).

T(n) ≤ 2(c⌊n/2⌋ lg(⌊n/2⌋)) + n     (substitution)
     ≤ cn lg(n/2) + n              (drop floor)
     = cn lg n − cn lg 2 + n       (lg(a/b) = lg a − lg b)
     = cn lg n − cn + n            (lg 2 = 1)
     ≤ cn lg n for c ≥ 1.

Avoid Pitfalls

What if we guessed T(n) = O(n)??

Try to show T(n) ≤ cn:

T(n) ≤ 2(c⌊n/2⌋) + n    (subst. into T(n) ≤ 2T(⌊n/2⌋) + n)
     ≤ cn + n           (drop floor)
     = (c + 1)n         (factor out const)

which is not ≤ cn. Fails!

    must be careful not to drop constants arbitrarily.

subtractive term in guessed solution

Example: T(n) = T(⌊n/2⌋) + T(⌈n/2⌉) + 1

Guess: O(n). First attempt: verify T(n) ≤ cn:

T(n) ≤ c⌊n/2⌋ + c⌈n/2⌉ + 1
     = cn + 1, which is not ≤ cn

Oops! But this does not mean T(n) ≠ O(n).

works if we guess T(n) ≤ cn − b for some b ≥ 0:

T(n) ≤ (c⌊n/2⌋ − b) + (c⌈n/2⌉ − b) + 1
     = cn − 2b + 1
     ≤ cn − b, which works for b ≥ 1.


Variable Renaming

introduce a new variable to replace a difficult expression, making it easier to solve.

Example: T(n) = 2T(√n) + lg n

Rename: let m = lg n (so n = 2^m): T(2^m) = 2T(2^(m/2)) + m

Rename: let S(m) = T(2^m) ⇒ S(m) = 2S(m/2) + m

This looks like S(m) = O(m lg m)

Substitute back, with m = lg n: T(n) = T(2^m) = S(m) = O(m lg m) = O((lg n) lg(lg n)).

    3.2 Recursion tree, convert to summation

Example: T(n) = 3T(⌊n/4⌋) + Θ(n^2).

Visualize as a recursion tree: T(n) = 3T(n/4) + cn^2 for some c > 0.

(Figure: recursion tree: the root costs cn^2, its 3 children cost c(n/4)^2 each, for (3/16)cn^2 at the next level, and so on down to T(1) leaves.)

Convert to summation:

T(n) = cn^2 + (3/16)cn^2 + (3/16)^2 cn^2 + ... + (leaf level)

# Levels = 1 + log_4 n, by a similar argument to MergeSort: the root is 1 level, then divide log_4 n times.

Width at level i is 3^i, so at the bottom level, width is 3^(log_4 n).

Apply the identity a^(log_b c) = c^(log_b a) to rewrite 3^(log_4 n) = n^(log_4 3).

T(n) = Σ_{i=0..(log_4 n)−1} (3/16)^i cn^2 + Θ(n^(log_4 3))
     ≤ (1/(1 − 3/16)) cn^2 + Θ(n^(log_4 3)) = O(n^2)   (geometric series)

3.3 The Master Method

intuition: divide-and-conquer

a-way branch (recursive calls) per level,

each subproblem size = n/b,

combine cost is f(n).

#levels = log_b n

#nodes(at level i) = a^i

#nodes(at leaf level) = a^(log_b n) = n^(log_b a)

Example: MERGESORT: a = 2, b = 2, f(n) = Θ(n); floor/ceiling don't matter


Three cases, comparing f(n) to n^(log_b a):

1. if f(n) = O(n^(log_b a − ε)) for some ε > 0, then T(n) = Θ(n^(log_b a))

2. if f(n) = Θ(n^(log_b a)), then T(n) = Θ(n^(log_b a) lg n)

3. if f(n) = Ω(n^(log_b a + ε)) for some ε > 0, and a·f(n/b) ≤ c·f(n) for some const c < 1 and all n ≥ n0, then T(n) = Θ(f(n)).

Restated:

1. if f(n) is polynomially slower than #leaves = n^(log_b a), then the leaves dominate the cost; it's Θ(a^(log_b n)) = Θ(n^(log_b a))

2. f(n) (conquer) and n^(log_b a) (divide) are equally fast (with lg n levels), then it's Θ(n^(log_b a) lg n).

3. the conquer cost f(n) is dominant.

Important note:

The Master method does not cover all recurrences!

f(n) and n^(log_b a) must differ polynomially (i.e., by a factor n^ε); smaller differences don't apply.

Actually, a more general result for case 2:

if f(n)/n^(log_b a) = Θ(lg^k n) for constant k ≥ 0, then T(n) = Θ(n^(log_b a) lg^(k+1) n)

Examples

T(n) = 9T(n/3) + n

f(n) = n, a = 9, b = 3, n^(log_b a) = n^(log_3 9) = Θ(n^2)

case 1, because f(n) = O(n^(log_3 9 − ε)) = O(n^(2−1)), smaller by a polynomial factor

T(n) = Θ(n^(log_b a)) = Θ(n^2).

T(n) = T(2n/3) + 1

a = 1, b = 3/2, f(n) = 1, n^(log_b a) = n^(log_{3/2} 1) = n^0 = 1

case 2, because f(n) = 1 = Θ(n^(log_b a)).

T(n) = Θ(n^0 lg n) = Θ(lg n)

T(n) = 3T(n/4) + n lg n

a = 3, b = 4, f(n) = n lg n, n^(log_b a) = n^(log_4 3)

case 3, because f(n) = n lg n = Ω(n^(log_4 3 + ε))

T(n) = Θ(f(n)) = Θ(n lg n)

    3.4 QuickSort

    Divide and Conquer:

    pick a pivot to serve as the dividing point

partition array A around the pivot: one side with elements smaller than the pivot, one side larger

    recursively sort two sublists

    no need for merge!

(Figure: input A[p..r] = [2, 8, 7, 1, 3, 5, 6, 4]; pick pivot x = 4 (from A[r]). After PARTITION we want "≤ x | x | > x", in this case [2, 1, 3] 4 [7, 5, 6, 8]; then QUICKSORT recursively on each side.)

QUICKSORT(A, p, r)
1 if p < r
2   then q ← PARTITION(A, p, r)
3        QUICKSORT(A, p, q − 1)
4        QUICKSORT(A, q + 1, r)

Algorithm from book (p.146), and Python code:

PARTITION(A, p, r)
1 x ← A[r]
2 i ← p − 1
3 for j ← p to r − 1
4   do if A[j] ≤ x
5        then i ← i + 1
6             exchange A[i] ↔ A[j]
7 exchange A[i + 1] ↔ A[r]
8 return i + 1

def Partition(A, p, r):
    x = A[r]
    i = p - 1
    for j in range(p, r):
        if A[j] <= x:
            i = i + 1
            A[i], A[j] = A[j], A[i]
    A[i+1], A[r] = A[r], A[i+1]
    return i + 1
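The recursive driver in Python is then a direct transcription (a sketch using the Partition above):

def QuickSort(A, p, r):
    if p < r:
        q = Partition(A, p, r)   # pivot lands in its final sorted position q
        QuickSort(A, p, q - 1)
        QuickSort(A, q + 1, r)

>>> x = [2, 8, 7, 1, 3, 5, 6, 4]
>>> QuickSort(x, 0, len(x) - 1)
>>> x
[1, 2, 3, 4, 5, 6, 7, 8]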


3.4.1 Worst case: uneven partitioning

T(n) = max_{0≤q≤n−1} (T(q) + T(n − q − 1)) + Θ(n)
     ≤ max_{0≤q≤n−1} (cq^2 + c(n − q − 1)^2) + Θ(n)
     = c · max_{0≤q≤n−1} (q^2 + (n − q − 1)^2) + Θ(n)
     ≤ c(n − 1)^2 + Θ(n)
     = c(n^2 − 2n + 1) + Θ(n)
     = cn^2 − c(2n − 1) + Θ(n) ≤ cn^2

    3.4.2 Best case: balanced partitioning

T(n) ≤ 2T(n/2) + Θ(n). Can apply the 2nd case of the Master theorem:

if f(n) = Θ(n^(log_b a)), then T(n) = Θ(n^(log_b a) lg n).

T(n) = Θ(n^(log_2 2) lg n) = Θ(n lg n)

    3.4.3 Average case

    (next time)


    Lecture 4

    Heap sort, Priority Queues

    This lecture:

    Quick sort (continued from last time)

    Heap property, Heap sort

    Python implementation (for homework)

    4.1 Heap data structure: Binary heap

    nearly balanced binary tree

    use array instead of pointer data structure

Definition:

parent(i) = ⌊i/2⌋

leftChild(i) = 2i

rightChild(i) = 2i + 1

heap property, by default Max Heap: A[parent(i)] ≥ A[i]; Min Heap: A[parent(i)] ≤ A[i]

height of heap as a binary tree is Θ(lg n).

(Figure: a (max) heap stored in an array, indices 1; 2, 3; 4, 5, 6, 7 level by level: the root x satisfies x ≥ y, z for its children y, z; y's left and right children are both ≤ y, and z's are both ≤ z, recursively.)

Examples: Are these (max)-heaps?

[7, 5, 6, 2, 3, 4]? (a) 7 ≥ 5, 6; (b) 5 ≥ 2, 3; (c) 6 ≥ 4 ⇒ Yes.

[7, 6, 5, 4, 3, 2]? (a) 7 ≥ 6, 5; (b) 6 ≥ 4, 3; (c) 5 ≥ 2 ⇒ Yes.

[2, 5, 7, 4, 3, 6]? No. However, it is almost a heap if we don't consider the root: (b) 5 ≥ 4, 3; (c) 7 ≥ 6. This can be fixed with a call to HEAPIFY.

    Heap API

HEAPIFY(A, i) (incremental fix): assume the L and R subtrees of i are heaps; make A[i...n] a heap.

BUILDHEAP(A): input an unsorted array, rearrange it to make it a heap

HEAPSORT(A): sorting alg., uses HEAPIFY and BUILDHEAP

    4.2 Heapify:

Assumption: called when L(i), R(i) are heaps, but A[i] itself isn't ≥ its children

after the call: the largest of the children is propagated up to the root position, so that the whole thing is a heap again.

MAX-HEAPIFY(A, i, n) (see page 130, slightly different)
1 l ← LEFT(i)                          (index for left child)
2 r ← RIGHT(i)                         (index for right child)
3 if (l ≤ n) and (A[l] > A[i])
4   then largest ← l                   (left child exists and is bigger)
5   else largest ← i                   (either no L or root is bigger)
6 if (r ≤ n) and (A[r] > A[largest])
7   then largest ← r                   (right child exists and is bigger)
  (no else clause: either no R or root is bigger; already taken care of by line 5)
8 if largest ≠ i
9   then exchange A[i] ↔ A[largest]
10       MAX-HEAPIFY(A, largest, n)

(Figure: HEAPIFY on [2, 5, 7, 4, 3, 6]: the larger child 7 swaps up into the root, then 6 swaps up in the affected subtree, giving the heap [7, 5, 6, 4, 3, 2]. Done.)

    Correctness: by induction.
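For the Python homework setup, a transcription under 0-based indexing (a sketch; with 0-based arrays the children of i are 2i + 1 and 2i + 2 instead of 2i and 2i + 1):

def MaxHeapify(A, i, n):
    # L and R subtrees of i are heaps; float A[i] down until A[0..n-1] is one too
    l, r = 2 * i + 1, 2 * i + 2
    largest = i
    if l < n and A[l] > A[i]:
        largest = l          # left child exists and is bigger
    if r < n and A[r] > A[largest]:
        largest = r          # right child exists and is bigger
    if largest != i:
        A[i], A[largest] = A[largest], A[i]
        MaxHeapify(A, largest, n)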


    Time complexity of Heapify

O(lg n): binary tree argument.

Or, O(h) where h = height of the heap as a binary tree.

4.3 BuildHeap

input an unsorted array, output a heap after the call

a leaf = a trivial heap, so start one level above the leaves

BUILD-MAX-HEAP(A) (page 133)
1 n ← length[A]
2 for i ← ⌊n/2⌋ downto 1
3   do MAX-HEAPIFY(A, i, n)

(Figure: BUILD-MAX-HEAP on [2, 3, 4, 5, 6, 7], array indices 1; 2, 3; 4, 5, 6, 7 level by level: the leaves are trivial heaps; HEAPIFY at i = 3, then i = 2, then i = 1 floats the large keys up, ending with the heap [7, 6, 4, 5, 3, 2]. Done.)

Correctness: the assumption for Heapify always holds.

    4.3.1 Complexity:

a loose upper bound is O((n/2) lg n) = O(n lg n): MAX-HEAPIFY is O(lg n), iterated n/2 times.

However! A tighter bound is O(n): HEAPIFY near the leaf level is much cheaper than near the root level!

Height of an n-element heap = ⌊lg n⌋ (binary tree); # nodes at height h is ≤ ⌈n/2^(h+1)⌉

Height 0 (leaves): n/2^(0+1) = n/2 nodes. One level higher ⇒ half as many nodes

HEAPIFY called on a node at height h is O(h) (see the last page if you forgot)

T(n) = Σ_{h=0..⌊lg n⌋} (# of HEAPIFY calls at height h) × (time of HEAPIFY at height h)
     = Σ_{h=0..⌊lg n⌋} ⌈n/2^(h+1)⌉ · O(h)
     = O(n · Σ_{h=0..⌊lg n⌋} h/2^h)      (factor out n, multiply by 2)

Apply the integrating and differentiating series (Appendix A):

Σ_{k=0..∞} k·x^k = x/(1 − x)^2.

Plug in x = 1/2, and we get Σ_{h=0..∞} h/2^h = (1/2)/(1 − 1/2)^2 = 2.

T(n) = O(n · 2) = O(n)

    4.3.2 HeapSort

The root of the heap is the largest: stick it at the end, A[n]

Use HEAPIFY on A[1...n−1] to find the second largest, stick it in position A[n−1]

HEAPSORT(A) (page 136, slightly different)
1 BUILD-MAX-HEAP(A, n)           O(n) time
2 for i ← n downto 2             (n − 1 times)
3   do exchange A[1] ↔ A[i]      O(1) time
4      MAX-HEAPIFY(A, 1, i − 1)  O(lg(i − 1)) time

Total time: O(n) (BUILDHEAP) + O(n) (loop) × O(lg n) (HEAPIFY) = O(n lg n)

(Figure: HEAPSORT on input [2, 3, 4, 5, 6, 7]. Line 1 builds the max heap [7, 6, 4, 5, 3, 2]. Each iteration of line 2 swaps A[1] with A[i], so the sorted tail grows while the prefix is almost a heap (root wrong, but children OK); line 4's Heapify(A, 1, i − 1) makes the prefix a heap again. Continue until the whole array is sorted.)
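In the same 0-based Python setting as the MaxHeapify sketch from 4.2, both routines are short (again a sketch, not the book's code):

def BuildMaxHeap(A):
    n = len(A)
    for i in range(n // 2 - 1, -1, -1):   # last internal node down to the root
        MaxHeapify(A, i, n)

def HeapSort(A):
    BuildMaxHeap(A)                       # O(n)
    for i in range(len(A) - 1, 0, -1):    # n - 1 iterations
        A[0], A[i] = A[i], A[0]           # move current max to its final slot
        MaxHeapify(A, 0, i)               # restore the heap on A[0..i-1]

>>> x = [2, 7, 3, 8, 1]
>>> HeapSort(x)
>>> x
[1, 2, 3, 7, 8]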

    4.4 Python Programming

See http://e3.uci.edu/01f/15545/heap-hint.html; or for PDF, try http://e3.uci.edu/01f/15545/heap-hint.pdf

4.5 Priority Queues

want to get the highest priority element

MAXHEAPINSERT(A, x)

EXTRACTMAX(A)


    Implementation choices:

unsorted: O(n) to extract, O(1) to insert (anywhere)

sorted linked list: O(1) to extract, O(n) to insert

Heap: O(lg n) to insert, O(lg n) to extract

4.5.1 Heap Insert:

assume A is already a heap

increase the size of A by one, put the new element at the end, and propagate it up until its parent is larger (or it's the largest)

MAX-HEAP-INSERT(A, key) (diff. from book page 140)
1 heap-size[A] ← heap-size[A] + 1
2 i ← heap-size[A]
3 while i > 1 and A[Parent(i)] < key
4   do A[i] ← A[Parent(i)]
5      i ← Parent(i)
6 A[i] ← key

(Figure: MaxHeapInsert(8) on the heap [7, 6, 4, 5, 3, 2]: make room at the end, shift the smaller parents 4 and then 7 down along the path, and place key = 8 at the root.)

O(lg n) time (climb up from the tail to at most the root)

assume short-circuit evaluation (i.e., check the index first: when i > 1 fails, A[Parent(i)] is not even evaluated!)

Even though the algorithm says heap-size[A] ← heap-size[A] + 1, you probably can't do that in any programming language. You need to actually make room for A[n + 1], or else your program can crash. In Python, call A.append to increase the physical size of A.

in Python, heap-size[A] can be derived from len(A), rather than a global variable.
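Putting those remarks together, a 0-based Python sketch (A.append supplies the new slot, and Parent(i) becomes (i − 1)//2):

def MaxHeapInsert(A, key):
    A.append(key)                 # grow the physical size by one
    i = len(A) - 1
    while i > 0 and A[(i - 1) // 2] < key:
        A[i] = A[(i - 1) // 2]    # shift the smaller parent down
        i = (i - 1) // 2
    A[i] = key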

    4.5.2 Extract max:

pull the max (already at the root)

move the last one to the front and heapify (since both children are heaps)

in Python, use A.pop to accomplish heap-size[A] ← heap-size[A] − 1.

HEAP-EXTRACT-MAX(A)
1 if heap-size[A] < 1
2   then error "heap underflow"
3 max ← A[1]
4 A[1] ← A[heap-size[A]]
5 heap-size[A] ← heap-size[A] − 1
6 MAX-HEAPIFY(A, 1)
7 return max

    4.5.3 Heap Increase key

HEAP-INCREASE-KEY(A, i, key)

A is a heap, i is an index to an element.

Purpose: overwrite element A[i] with key (which is expected to be larger than A[i], hence "increase key"), while making minor changes to restore the heap property.

Strategy: propagate A[i] up towards the root, similar to MAX-HEAP-INSERT.

HEAP-INCREASE-KEY(A, i, key) (page 140)
1 if key < A[i]
2   then error "new key is smaller than current key"
3 A[i] ← key
4 while i > 1 and A[Parent(i)] < A[i]
5   do exchange A[i] ↔ A[Parent(i)]
6      i ← Parent(i)

not a common routine in priority queues; used later in min-spanning tree algorithms

very unusual to implement MAX-HEAP-INSERT as shown on page 140.

implies a max heap. (HEAP-DECREASE-KEY would imply a min heap.)

timing complexity similar to HEAPINSERT, except no reallocation of storage size


    Lecture 5

    Linear Time sorting, Medians/Order Stat

    This lecture

problem complexity of comparison-based sorting (8.1)

non-comparison sorting: counting sort (8.2), radix sort (8.3), bucket sort (8.4)

5.1 Comparison-based sorting: prob. complexity

    Decision Tree argument

Use a tree to represent all possible execution traces of an algorithm (Fig. 8.1)

(Figure: decision tree for sorting three elements: each internal node compares a pair, e.g. 1:2, branching on ≤ / >, and each leaf is one of the possible orders, such as a[1] ≤ a[2] ≤ a[3] or a[3] < a[1] ≤ a[2].)

two-way branch on each ≤ / > comparison; the tree must have at least n! leaves (one per permutation of the input), so its height, the worst-case number of comparisons, is at least lg(n!) = Ω(n lg n).


5.2 Non-comparison sorts

5.2.1 Counting sort

C[0..k]: key count; overloaded to be an index

Initialize C[0..k] ← 0
for each input element A[i]
    do C[A[i]]++                      (count # times each key appears)
for i ← 1 to k
    do C[i] += C[i − 1]               (prefix sums: converts C from a count to an index to the end of each key's region)
for i ← n downto 1
    do C[A[i]]--; B[C[A[i]]] ← A[i]   (copy A[i] into its region, back to front)

(Figure: counting sort of A = [2, 5, 3, 0, 2, 3, 0, 3] with k = 5: counting # times each key appears gives C = [2, 0, 2, 3, 0, 1]; the end-of-region indices are C = [2, 2, 4, 7, 7, 8]; copying from A to B in reverse order fills B = [0, 0, 2, 2, 3, 3, 3, 5].)

stable sort (relative positions preserved)

uses a second buffer B

complexity = O(n + k), effectively linear if k = O(n).
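A compact Python sketch of the three passes (0-based; my illustration of the algorithm above):

def CountingSort(A, k):
    # keys are integers in 0..k; returns a new, stably sorted list B
    C = [0] * (k + 1)
    for key in A:                 # count # times each key appears
        C[key] += 1
    for i in range(1, k + 1):     # prefix sums: C[i] = index just past key i's region
        C[i] += C[i - 1]
    B = [None] * len(A)
    for key in reversed(A):       # reverse scan keeps equal keys in order (stable)
        C[key] -= 1
        B[C[key]] = key
    return B

>>> CountingSort([2, 5, 3, 0, 2, 3, 0, 3], 5)
[0, 0, 2, 2, 3, 3, 3, 5]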

    5.2.2 RadixSort

Idea: for keys that have digits,

use stable sorting to sort by one digit at a time

start from the least significant digit!

(Figure: radix sort of input [329, 457, 657, 839, 436, 720, 355]: stable-sort by the least significant digit first, then by the 2nd LSD, then by the MSD, giving [329, 355, 436, 457, 657, 720, 839].)

    Important!! does not work if you start from MSD!!!

(Figure: attempting the same sort MSD-first on the same input: after stable-sorting by the most significant digit and then by the lower digits, the result [329, 355, 457, 436, 657, 720, 839] is not sorted. Oops!)

Need to do a lot of bookkeeping if you really want to sort from the MSD.


also useful for sorting by multiple keys (by name, by SAT score, etc.): start from the secondary key before the primary key

runtime complexity: Θ(d(n + k)), where d = # of digits in the key, k = radix (# possible values per digit)

reason: make d passes. Each pass is Θ(n + k) (e.g., counting sort)
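A short Python sketch for base-10 integer keys (an illustration; Python's sorted() is guaranteed stable, so it can stand in for the per-digit counting sort):

def RadixSort(A, d):
    # d passes, least significant digit first
    for p in range(d):
        A = sorted(A, key=lambda x: (x // 10 ** p) % 10)
    return A

>>> RadixSort([329, 457, 657, 839, 436, 720, 355], 3)
[329, 355, 436, 457, 657, 720, 839]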

    5.2.3 Bucket sort

assume values are distributed within some range.

project each value into one of n buckets; (insertion-) sort each bucket

(Figure: bucket sort of 20 input values: each value is projected into one of 10 buckets by its leading digit; each bucket is then sorted individually and the buckets are concatenated in order.)

    Timing Complexity: expected to be linear (to be revisited)
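A Python sketch under that uniformity assumption (values in [lo, hi); names are my own, and sorted() stands in for the per-bucket insertion sort):

def BucketSort(A, lo, hi):
    n = len(A)
    B = [[] for _ in range(n)]      # one bucket per input element
    for x in A:
        # project the value onto a bucket index; clamp the top edge
        idx = min(int((x - lo) * n / (hi - lo)), n - 1)
        B[idx].append(x)
    out = []
    for b in B:
        out.extend(sorted(b))       # each bucket is expected to be tiny
    return out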


    5.3 Median and Order statistics

    5.3.1 Find min? max? both min and max?

n − 1 comparisons for min (keep the smallest so far)

also n − 1 comparisons for max

both min/max: 2n − 2, but could do better: 3⌊n/2⌋ comparisons

1. compare pairs (⌊n/2⌋ comparisons)
2. search for the min in the smaller set
3. search for the max in the bigger set

(Figure: input [3, 7, 5, 8, 6, 1, 9, 4] compared in pairs: ⌊n/2⌋ pairwise comparisons split the elements into a "smaller" set and a "bigger" set; ⌊n/2⌋ − 1 more comparisons find the min (1) in the smaller set, and ⌊n/2⌋ − 1 find the max (9) in the bigger set.)

    5.3.2 How to find the i-th smallest w/out sorting?

    keep track of min & max so far

    find the median

Use QuickSort's partitioning. Each pass puts the pivot at its correct rank (absolute sorted position):

if the pivot is the position we want, return it; otherwise, continue partitioning in whichever of the two intervals contains position i.

RANDOMIZED-SELECT(A, p, r, i)
1 if p = r
2   then return A[p]
3 q ← RANDOMIZED-PARTITION(A, p, r)
4 k ← q − p + 1
5 if i = k                  (the pivot value is the answer)
6   then return A[q]
7 else if i < k
8   then return RANDOMIZED-SELECT(A, p, q − 1, i)
9 else return RANDOMIZED-SELECT(A, q + 1, r, i − k)
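A Python sketch (reusing the Partition code from the QuickSort lecture; the random pivot swap plays the role of RANDOMIZED-PARTITION, and i is a 1-based rank as in the pseudocode):

import random

def RandomizedSelect(A, p, r, i):
    # return the i-th smallest element of A[p..r]
    if p == r:
        return A[p]
    s = random.randint(p, r)        # randomized pivot choice
    A[s], A[r] = A[r], A[s]
    q = Partition(A, p, r)
    k = q - p + 1                   # rank of the pivot within A[p..r]
    if i == k:
        return A[q]                 # the pivot value is the answer
    elif i < k:
        return RandomizedSelect(A, p, q - 1, i)
    else:
        return RandomizedSelect(A, q + 1, r, i - k)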

Complexity of (randomized) Select:

Best case: O(n): first pass, very lucky, the pivot is what we want

Even partitioning: n + n/2 + n/4 + ... + 2 + 1 ≤ 2n = O(n)

Worst case: n + (n − 1) + (n − 2) + ... = O(n^2)

Average time: use randomization: probability 1/n of each position i as pivot. Expected cost = sum over i of (probability × cost of i as pivot):

T(n) = Σ_{i=1..n−1} (1/n) T(max(i, n − i)) + O(n)
     ≤ Σ_{i=⌊n/2⌋..n−1} (2/n) T(i) + O(n)

Guess T(n) ≤ an; verify:

T(n) ≤ (2/n) Σ_{i=⌊n/2⌋..n−1} a·i + cn
     ≤ (2/n) · (n/2) · (a(n − 1) + a⌊n/2⌋)/2 + cn
     ≤ (3/4)an + cn.

Choose a = 4c ⇒ T(n) ≤ an = O(n)

    5.3.3 Can show that worst-case select is O(n)

    (Theoretical interest only)

Divide the n elements into sublists of LENGTH 5 each, for ⌈n/5⌉ lists

Find the medians of these lists: brute force (constant time) on each list.

Recurse to find the median x of these ⌈n/5⌉ medians. Partition around x. Let k = rank(x)

if i = k, return x

if i < k, then Select recursively to find the i-th smallest in the first part

else Select recursively to find the (i − k)-th smallest in the second part.

    5.3.4 Justification:

(x means "don't know", s = smaller, L = larger; * is the pivot, the median of medians)

x x x L L L L
x x x L L L L
s s s * L L L
s s s s x x x
s s s s x x x

At least 3 of the 5 elements in half of the columns are ≥ the pivot (the L region), and symmetrically ≤ it (the s region), so each recursive call discards a constant fraction of the elements.

Lecture 6

Binary Search Trees and Red-Black Trees

6.1 Binary Search Trees

example: for the list [2, 3, 5, 5, 7, 8], a binary search tree:

(Figure: a BST with root 5, left child 3 (children 2 and 5), and right child 7 (right child 8).)

another valid BST for the same list:

(Figure: the same keys arranged in a different valid shape, smaller keys to the left and larger keys to the right.)

    6.1.1 API for BST

INORDER-TREE-WALK(x): print keys in the BST in sorted order

TREE-SEARCH(x, k): find the tree node with key k.

TREE-MAX(x) and TREE-MIN(x), assume x is the root

TREE-SUCCESSOR(x) and TREE-PREDECESSOR(x), for any tree node (need not be the root)

TREE-INSERT(T, z) and TREE-DELETE(T, z)

INORDER-TREE-WALK(x)
1 if x ≠ NIL
2   then INORDER-TREE-WALK(left[x])
3        print key[x]
4        INORDER-TREE-WALK(right[x])

Note: variants of tree visits (binary trees in general, not necessarily BST):

inorder = recursive left, root, recursive right

preorder = root, recursive left, recursive right

postorder = recursive left, recursive right, root

Timing complexity of INORDER-TREE-WALK: Θ(n) time.

Assume root x; left-subtree[x] has k nodes, right-subtree[x] has n − k − 1 nodes; assume each call incurs overhead d.

Proof by the substitution method: guess T(n) = (c + d)n + c:

T(n) = T(k) + T(n − k − 1) + d
     = ((c + d)k + c) + ((c + d)(n − k − 1) + c) + d
     = (c + d)n + c.

TREE-SEARCH(x, k)
1 if x = NIL or k = key[x]
2   then return x
3 if k < key[x]
4   then return TREE-SEARCH(left[x], k)
5   else return TREE-SEARCH(right[x], k)

returns x if found (key[x] matches k), NIL if not found


if z has two children ⇒ find and cut the successor y, paste it over z

the successor y must be either a leaf or have one child; it can't have two children.

(Figure: Delete(5) in a BST where 5 has two children. Step 1: cut 5's successor 6 by splicing (in the case of a leaf, just cut). Step 2: paste 6 over 5.)

Equivalently, could find the predecessor x, cut x, and paste it over z.

Rotation:

Transform a BST while still preserving the BST property

(Figure: LEFT-ROTATE(x) and RIGHT-ROTATE(y) exchange the parent/child roles of x and y.)

Right rotate: x is the left child of y; we want to make y the right child of x. x gets to keep its left child as before; x's old right subtree (whose values all lie between x and y) becomes y's new left subtree.

6.2 Python Programming with Trees

6.2.1 Node data structure

>>> x = Node(3) # makes a new node with key = 3
>>> x.key
3

Note that calling Node(3) actually calls the Node class's __init__ (constructor) routine.

Use x.key, x.left instead of key[x], left[x], etc.

We can insert additional fields even after the constructor is called!

example: x.color = 'red' # add a new color attribute

    Example routine:

    def TreeMaximum(x):

    while x.right != None:

    x = x.right

    return x
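The Node class itself never appears in these notes; here is a minimal version consistent with the session above (field names assumed), along with TREE-SEARCH transcribed the same way:

class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def TreeSearch(x, k):
    # return the node with key k in the subtree rooted at x, or None
    if x is None or k == x.key:
        return x
    if k < x.key:
        return TreeSearch(x.left, k)
    else:
        return TreeSearch(x.right, k)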

    6.2.2 Tuple-encoded Trees

Tuples vs Lists:

list: L = [1, 2, 'a', 'b', 'c']

tuple: T = (1, 2, 'a', 'b', 'c')

Similarities:

Both tuples and lists let you access elements by their indices: e.g. L[3] and T[3] both yield the value 'b'.

len works for both: both len(L) and len(T) yield 5.

Differences:

syntax: tuples use ( ); lists use [ ]

tuples are immutable (read-only): not allowed to say T[3] = 'z'

OK to do for lists: L[3] = 'z'

tuples can be compared lexicographically: (1, 2) < (1, 2, 3) < (1, 3)

Can have tuples within tuples: (1, (2, 3, (4, 5)), 6)
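One plausible tuple encoding of a tree nests (key, left, right) triples; this is a sketch of the convention, not necessarily the exact one used in the notes:

# the first example BST from section 6.1, as nested (key, left, right) tuples
tree = (5,
        (3, (2, None, None), (5, None, None)),
        (7, None, (8, None, None)))

def tuple_inorder(t):
    if t is not None:
        key, left, right = t
        tuple_inorder(left)
        print(key)
        tuple_inorder(right)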


    Lecture 7

    Dynamic Programming

    This lecture: Dynamic programming

    Optimization problems and Dynamic programming

    Assembly line scheduling

    matrix multiply

longest common subsequence

    optimal binary search tree

    7.1 Optimization problems

many possible solutions with different costs

want to maximize or minimize some cost function

unlike sorting, where a list is either sorted or not (partially sorted doesn't quite count)

examples: matrix-chain multiply (same results, just faster or slower),

knapsack problem (thief filling up a sack), compression

    7.1.1 Dynamic programming

    programming here means tabular method

Instead of re-computing the same subproblem, save results in a table and look them up (in constant time!)

    significance: convert an otherwise exponential/factorial-time problem to a polynomial-time one!

    Problem characteristic: Recursively decomposable

    Search space: a lot of repeated sub-configurations

    optimal substructure in solution

    7.2 Case study: Assembly line scheduling

    two assembly lines 1 and 2

each line i has n stations Si,j for the n stages of assembly

each station takes time ai,j

chassis at stage j must travel to stage (j + 1) next

option to stay in the same assembly line or switch to the other assembly line

time overhead ti,j to switch away from line i after station Si,j.

(figure: two assembly lines over six stages; entry times e1 = 2, e2 = 4; station times a1,j = 7, 9, 3, 4, 8, 4 and a2,j = 8, 5, 6, 4, 5, 7; transfer times ti,j between the lines; exit times x1 = 3, x2 = 2)

    Want to minimize time for assembly

    Line-1 only: 2 + 7 + 9 + 3 + 4 + 8 + 4 + 3 = 40

    Line-2 only: 4 + 8 + 5 + 6 + 4 + 5 + 7 + 2 = 41

    Optimal: 2 + 7 + (2) + 5 + (1) + 3 + (1) + 4 + 5 +(1) + 4 + 3 = 38

How many possible paths? 2^n (two choices at each stage)

    Optimal substructure

Global optimum contains optimal solutions to subproblems

    Fastest way through any station Si,j must consist of

    shortest path from beginning to Si,j

    shortest path from Si,j to the end

That is, cannot take a longer path to Si,j and make up for it in stages (j + 1) ... n.


Notation: fi[j] = fastest possible time from start through station Si,j (but not continuing).

ei, xi are entry/exit costs on line i. Goal: find the global optimum f* = min(f1[n] + x1, f2[n] + x2).

initially, at stage 1 (for line l = 1 or 2):
fl[1] = el (entry time) + al,1 (assembly time)

at any stage j > 1 on line l (m denotes the other line):

fl[j] = min( fl[j−1]                 same line l
             fm[j−1] + tm,j−1 )      other line m + transfer
        + al,j                       (assembly time at station Sl,j)

    Can write this as a recursive program:

F(i, j)
  if j = 1
    then return ei + ai,1
    else return min(F(i, j−1),
                    F(i%2 + 1, j−1) + t(i%2+1),j−1) + ai,j

    But! There are several problems:

many repeated evaluations of F(i, j), could be O(2^n) time: use a 2-D array f[i, j] to remember the running minimum

does not track the path: use array li[j] to remember which line gave us this min

    Iterative version shown in book on p. 329.

Run time is Θ(n).
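A minimal iterative Python sketch of this DP; the argument layout (lists a, t, e, x with 0-based lines and stages) is an assumption:

def fastest_way(a, t, e, x, n):
    # a[i][j]: assembly time at station j of line i (i in {0, 1})
    # t[i][j]: transfer time when leaving line i after station j
    # e[i], x[i]: entry and exit times of line i
    f = [[0] * n for _ in range(2)]
    l = [[0] * n for _ in range(2)]   # l[i][j]: line used at stage j-1
    for i in range(2):
        f[i][0] = e[i] + a[i][0]
    for j in range(1, n):
        for i in range(2):
            m = 1 - i                 # the other line
            stay = f[i][j - 1]
            switch = f[m][j - 1] + t[m][j - 1]
            if stay <= switch:
                f[i][j], l[i][j] = stay + a[i][j], i
            else:
                f[i][j], l[i][j] = switch + a[i][j], m
    return min(f[0][n - 1] + x[0], f[1][n - 1] + x[1])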

    7.3 Matrix-Chain multiply

    Basic Matrix Multiply

A is p × q, B is q × r

(figure: A (p × q) times B (q × r) yields AB (p × r))

product is a p × r matrix: ci,j = Σ_{y=1..q} ai,y · by,j

total number of scalar multiplications = p · q · r

    Multiply multiple matrices

matrix multiplication is associative: (AB)C = A(BC); with C being r × s, both yield a p × s matrix

Total # multiplications can be different!

(AB)C is
  pqr to multiply AB first,
  + prs to multiply (AB) (p × r) with C (r × s)
  = pqr + prs total # multiplications

On the other hand, A(BC) is pqs + qrs

Example: if p = 10, q = 100, r = 5, s = 50, then

pqr + prs = 5000 + 2500 = 7500

pqs + qrs = 50000 + 25000 = 75000, ten times as many!

Generalize to a matrix chain: A1, A2, A3, ..., An. But there are many ways to parenthesize!

P(1) = 1: (A), nothing to multiply
P(2) = 1: (AB)
P(3) = 2: A(BC), (AB)C
P(4) = 5: A(B(CD)), A((BC)D), (AB)(CD), (A(BC))D, ((AB)C)D
P(5) = 14 ... exponential growth

P(n) = 1 if n = 1;  Σ_{k=1..n−1} P(k) P(n−k) for n ≥ 2.

This is Ω(4^n / n^{3/2}): at least exponential!

7.3.1 Optimal parenthesization to minimize # scalar mults

(figure: parse tree for ((A1 A2) A3)(A4 A5), with subchains A1..2, A1..3, A4..5, and A1..5; a top-level split into A1..k and Ak+1..n)

Notation:

let Ai...j denote the matrix product Ai Ai+1 ··· Aj; matrix Ai has dimension pi−1 × pi

    Optimal substructure

if the optimal parenthesization for A1...j at the top level is (L)(R) = (A1...k)(Ak+1...j), then

    L must be optimal for A1...k, and

    R must be optimal for Ak+1...j

    Proof by contradiction

    Let M(i, j) = Minimum cost from the ith to the jth matrix

M(i, j) = 0 if i = j;

M(i, j) = min_{i ≤ k < j} { M(i, k) + M(k+1, j) + pi−1 pk pj } if i < j


    Observation

don't enumerate the space! bottom-up: no need to take the min so many times!

instead of recomputing M(i, k), remember it in array m[i, k]

bookkeeping to track the optimal partitioning point. See Fig. 7.1

O(n^3) time, Θ(n^2) space (for the m and s arrays)

    7.4 Longest common subsequence

Example sequences X = ⟨A, B, C, B, D, A, B⟩, Y = ⟨B, D, C, A, B, A⟩

a subsequence of X is Z = ⟨B, C, D, B⟩

Longest common subsequences (LCS) of length 4: ⟨B, C, B, A⟩ and ⟨B, D, A, B⟩

This is a maximization; costs add, but each match adds only 1 (a length increment)

Brute force:

enumerate all subsequences of X (length m), check whether each is a subsequence of Y (length n)

# of subsequences of X = 2^m (binary decision at each point whether to include each letter)

worst case time is Θ(n · 2^m) because each check runs against Y's length n

    Better way:

Notation: Xk = length-k prefix of string X; xi is the ith character in string X

Z = z1 ... zk is an LCS of X = x1 ... xm and Y = y1 ... yn

if xm = yn then zk = xm = yn, and Zk−1 is an LCS of Xm−1, Yn−1.

if xm ≠ yn then

if zk ≠ xm then Z is an LCS of Xm−1, Y; if zk ≠ yn then Z is an LCS of X, Yn−1.

c[i, j] = LCS length of Xi, Yj

c[i, j] = 0 if i = 0 or j = 0

c[i, j] = c[i−1, j−1] + 1 if xi = yj (match)

c[i, j] = max(c[i, j−1], c[i−1, j]) otherwise (no match, advance either)

    Algorithm

c[1 : m, 0] ← 0; c[0, 1 : n] ← 0
for i ← 1 to m
  do for j ← 1 to n
    do if xi = yj
      then c[i, j] ← c[i−1, j−1] + 1
           b[i, j] ← match (↖)
      else if c[i−1, j] ≥ c[i, j−1]
        then c[i, j] ← c[i−1, j]   ▷ copy the longer length
             b[i, j] ← dec i (↑)
        else   ▷ c[i, j−1] > c[i−1, j]
             c[i, j] ← c[i, j−1]
             b[i, j] ← dec j (←)

The c table for X = ⟨A, B, C, B, D, A, B⟩ (m = 7) and Y = ⟨B, D, C, A, B, A⟩ (n = 6):

       j   0   1   2   3   4   5   6
           y:  B   D   C   A   B   A
  i=0      0   0   0   0   0   0   0
  i=1  A   0   0   0   0   1   1   1
  i=2  B   0   1   1   1   1   2   2
  i=3  C   0   1   1   2   2   2   2
  i=4  B   0   1   1   2   2   3   3
  i=5  D   0   1   2   2   2   3   3
  i=6  A   0   1   2   2   3   3   4
  i=7  B   0   1   2   2   3   4   4

Time Θ(mn), space Θ(mn).

    7.5 Optimal Binary Search Trees

input: n keys K = k1, k2, ..., kn and n + 1 dummy keys D = d0, d1, ..., dn

d0 < k1 < d1 < k2 < d2 < ... < kn < dn

key ki has probability pi, and dummy key di has probability qi, with

Σ_{i=1..n} pi + Σ_{i=0..n} qi = 1

want: the binary search tree that yields the fastest search (fewer steps) for frequently used words

the keys ki should be internal nodes, and the dummy keys di should be leaves.


MATRIX-CHAIN-ORDER(p)
 1  n ← length[p] − 1
 2  for i ← 1 to n
 3    do m[i, i] ← 0
 4  for l ← 2 to n   ▷ l = length of interval considered
 5    do for i ← 1 to n − l + 1   ▷ starting index, one range per length
 6      do j ← i + l − 1   ▷ ending index, always l − 1 away from the starting index
 7         m[i, j] ← ∞
 8         for k ← i to j − 1   ▷ different partitions between i and j
 9           do q ← m[i, k] + m[k+1, j] + pi−1 pk pj
10              if q < m[i, j]
11                then m[i, j] ← q
12                     s[i, j] ← k   ▷ remember best k between i, j
13  return m, s

Figure 7.1: MATRIX-CHAIN-ORDER and a graphical illustration: the chain length l runs from 2 to n; the starting index i of subchain Ai...j runs from 1 to n − l + 1, which implies the ending index j = i + l − 1; the split point k runs from i to j − 1.
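A direct Python transcription, as a sketch (p is the dimension list: matrix Ai is p[i−1] × p[i]):

import math

def matrix_chain_order(p):
    n = len(p) - 1
    m = [[0] * (n + 1) for _ in range(n + 1)]
    s = [[0] * (n + 1) for _ in range(n + 1)]
    for l in range(2, n + 1):           # chain length
        for i in range(1, n - l + 2):   # starting index
            j = i + l - 1               # ending index
            m[i][j] = math.inf
            for k in range(i, j):       # partition point
                q = m[i][k] + m[k + 1][j] + p[i - 1] * p[k] * p[j]
                if q < m[i][j]:
                    m[i][j] = q
                    s[i][j] = k
    return m, s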

optimize for the common case. A balanced tree might not be good!

Example tree (not optimal):

                    k2 (0.10)
                  /           \
          k1 (0.15)           k4 (0.10)
          /      \            /       \
   d0 (0.05)  d1 (0.10)  k3 (0.05)   k5 (0.20)
                          /    \       /    \
                   d2 (0.05) d3 (0.05) d4 (0.05) d5 (0.10)

Expected search cost of tree T = Σ over all nodes of (depth + 1) × probability.

Optimal substructure: if root = kr, L = (i ... r−1), R = (r+1 ... j), then L, R must be optimal subtrees.

Expected cost e[i, j]:

e[i, j] = qi−1 (a dummy leaf) if j = i − 1

e[i, j] = min_{i ≤ r ≤ j} { e[i, r−1] (left subtree) + e[r+1, j] (right subtree) + w(i, j) (add one depth) } if i ≤ j

where w(i, j) = Σ_{l=i..j} pl + Σ_{l=i−1..j} ql is the total probability weight of the range.

(figure panels: computing e bottom-up on the example tree)

e[1,0] = q0 = 0.05   (dummy leaf d0)
e[2,1] = q1 = 0.10   (dummy leaf d1)

e[1,1] = w(1,1) + e[1,0] + e[2,1]
       = (p1 + q0 + q1) + 0.05 + 0.10 = 0.45

(root k1 with dummy leaves d0 and d1; the w term adds one more depth for all nodes in the left and right subtrees)

e[1,5] = w(1,5) + e[1,0] + e[2,5]

(candidate root k1 for the whole range: its left subtree is just d0, and its right subtree holds k2 ... k5)

use arrays to remember e[i, j], w[i, j] instead of recomputing

    use array root[i, j] to remember root positions

OPTIMAL-BST(p, q, n) (page 361)
 1  for i ← 1 to n + 1
 2    do e[i, i−1] ← qi−1
 3       w[i, i−1] ← qi−1
 4  for l ← 1 to n
 5    do for i ← 1 to n − l + 1
 6      do j ← i + l − 1
 7         e[i, j] ← ∞
 8         w[i, j] ← w[i, j−1] + pj + qj
 9         for r ← i to j
10           do t ← e[i, r−1] + e[r+1, j] + w[i, j]
11              if t < e[i, j]
12                then e[i, j] ← t
13                     root[i, j] ← r
14  return e, root
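The same routine in Python, as a sketch (p is used 1-indexed, so p[0] is a placeholder; e and w get an extra row to allow the e[i, i−1] entries):

import math

def optimal_bst(p, q, n):
    e = [[0.0] * (n + 1) for _ in range(n + 2)]
    w = [[0.0] * (n + 1) for _ in range(n + 2)]
    root = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 2):
        e[i][i - 1] = q[i - 1]
        w[i][i - 1] = q[i - 1]
    for l in range(1, n + 1):
        for i in range(1, n - l + 2):
            j = i + l - 1
            e[i][j] = math.inf
            w[i][j] = w[i][j - 1] + p[j] + q[j]
            for r in range(i, j + 1):
                t = e[i][r - 1] + e[r + 1][j] + w[i][j]
                if t < e[i][j]:
                    e[i][j] = t
                    root[i][j] = r
    return e, root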


    Lecture 8

    Greedy Algorithm

    This lecture

greedy algorithms: make the choice that looks best at the moment

    activity selection problem

    Huffman (with Python programming hints)

    8.1 Activity Selection problem

Set S of n activities

si = start time of activity i, fi = finish time of i. Assume 0 ≤ si < fi < ∞ (finite, positive/nonzero execution delay).

i, j are compatible if fi < fj implies fi ≤ sj (i.e., no overlap)

Goal: find a maximum-size subset A of mutually compatible activities

maximize |A| = # of activities

NOT maximize utilization!!

there may be several optimal solutions, but it's sufficient to find just one.

Greedy Solution: pick the next compatible task with the earliest finish time.

    Example

(figure: timeline from 0 to 14 with the list of tasks sorted by finish time fi, and the optimal schedules marked)
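A short Python sketch of this greedy rule; representing activities as (start, finish) pairs is an assumed input format:

def select_activities(activities):
    chosen = []
    last_finish = 0
    for s, f in sorted(activities, key=lambda a: a[1]):  # by finish time
        if s >= last_finish:      # compatible with everything chosen so far
            chosen.append((s, f))
            last_finish = f
    return chosen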

    Notation

S = {a1 ... an} set of activities.

a0 = start, an+1 = end (fake) activities: f0 = 0, sn+1 = ∞.

Si,j = {ak ∈ S : fi ≤ sk < fk ≤ sj} (i.e., the set of activities compatible with both ai and aj); S = S0,n+1

assumption: sorted by finish time: f0 ≤ f1 ≤ f2 ≤ ... ≤ fn < fn+1

Claim: if i ≥ j then Si,j = ∅ (impossible to start later and finish earlier).


    Optimal substructure

Let A = an optimal set (activity selection) for S, and Ai,j = an optimal one for Si,j.

if ak ∈ Ai,j then Ai,k must be an optimal solution to Si,k

Proof: assume there exists A′i,k with more activities than our Ai,k (that is, |A′i,k| > |Ai,k|). Then we could construct a larger A′i,j by using A′i,k as its prefix: a contradiction to the claim that Ai,j is optimal.

    Formulation:

Want to maximize c[0, n+1], where c[i, j] = max # of compatible activities in Si,j.

c[i, j] = 0 if Si,j = ∅

c[i, j] = max_{i < k < j} { c[i, k] + c[k, j] + 1 } if Si,j ≠ ∅


    Ratios $6, $5, $4

Greedy solution: take item 1, item 2, but item 3 won't fit: get $160/30kg

Optimal: take item 2, item 3, leave item 1: $220/50kg

(figure: item 1 = $60/10kg, item 2 = $100/20kg, item 3 = $120/30kg; knapsack capacity 50kg; $:wt ratios 6, 5, 4)

Greedy is not optimal for 0-1 knapsack:

highest $:wt ratio first, keep adding until exceeding knapsack capacity

add item 1 ($60/10kg) first, then item 2 ($100/20kg); oops! item 3 ($120/30kg) won't fit and must be left behind

result: $160 / 30kg

Optimal for 0-1: in this case, the best fit

take item 2 ($100/20kg) and item 3 ($120/30kg)

result: $220 / 50kg

Greedy would work for fractional knapsack: take items 1 and 2 whole, plus a 20kg fraction of item 3

result: $240 / 50kg

    How to solve 0-1 knapsack?

    dynamic programming, in O(nW) time.

    idea: iterate over total weights up to the limit

need to assume integer weights
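A compact sketch of the O(nW) table; items as (value, weight) pairs and W the integer capacity are assumed, and a one-dimensional rolling table replaces the full n × W one:

def knapsack_01(items, W):
    best = [0] * (W + 1)          # best[w] = max value with weight limit w
    for value, weight in items:
        # scan weights downward so each item is used at most once
        for w in range(W, weight - 1, -1):
            best[w] = max(best[w], best[w - weight] + value)
    return best[W]

On the example above, knapsack_01([(60, 10), (100, 20), (120, 30)], 50) returns 220.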

    8.3 Huffman codes

fixed-length code:

same # bits for all characters

variable-length code:

more frequent characters use fewer bits (e.g. vowels like a e i o u vs. q z)

Goal: minimize Σ_{c ∈ C} frequency(c) · bitlength(codeword(c))

lossless! preserves the same information

    Comparison example: 6 characters

                           a    b    c    d    e     f
frequency (in thousands)   45   13   12   16   9     5
fixed-length codeword      000  001  010  011  100   101
var-length codeword        0    101  100  111  1101  1100

    Why is variable length better?

fixed (3 bits per char): 100,000 chars take 300,000 bits

variable length: (45 · 1 + (13 + 12 + 16) · 3 + (9 + 5) · 4) · 1000 = 224,000 bits (25% saving)

problem: how do we know where each codeword begins?

prefix codes: no codeword is a prefix of another codeword. e.g., only 'a' starts with 0, so 00 means "a a".

a leading 1 needs more bits: 10 is a prefix for either b (101) or c (100); 11 is a prefix for d, e, f

Example: 001011101 uniquely parses as 0 (a), 0 (a), 101 (b), 1101 (e): no ambiguity

    Algorithm

HUFFMAN(C) (page 388)   ▷ C is the set of nodes carrying (letter, frequency)
1  n ← |C|
2  Q ← C   ▷ put all elements into a min-priority queue
3  for i ← 1 to n − 1
4    do z ← MAKE-NEW-NODE
5       left[z] ← x ← EXTRACT-MIN(Q)
6       right[z] ← y ← EXTRACT-MIN(Q)
7       f[z] ← f[x] + f[y]
8       INSERT(Q, z)
9  return EXTRACT-MIN(Q)

    Python version

    http://e3.uci.edu/02f/15845/huffman.html
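Independent of the linked version, a heapq-based sketch: trees are either a bare character or a (left, right) pair, and the counter breaks frequency ties so tuple comparison never reaches the trees themselves:

import heapq

def huffman(freqs):
    # freqs: dict mapping character -> frequency
    heap = [(f, i, c) for i, (c, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        fx, _, x = heapq.heappop(heap)   # two lowest-frequency trees
        fy, _, y = heapq.heappop(heap)
        count += 1
        heapq.heappush(heap, (fx + fy, count, (x, y)))
    return heap[0][2]                    # root of the code tree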


(figure: the finished code tree, with left edges labeled 0 and right edges labeled 1; internal node weights 14, 25, 30, 55 and root weight 100)

initially: min-priority Q gets character/frequency nodes

Q: 'f': 5, 'e': 9, 'c': 12, 'b': 13, 'd': 16, 'a': 45

i=1: 1. extract the two min nodes  2. make a binary tree z with the summed frequency  3. insert z into Q

z = 14 ('f', 'e');  Q: 'c': 12, 'b': 13, 14, 'd': 16, 'a': 45

i=2:

z = 25 ('c', 'b');  Q: 14, 'd': 16, 25, 'a': 45

i=3:

z = 30 (14, 'd');  Q: 25, 30, 'a': 45

continued... when all done:

root 100 = ('a': 45, 55);  55 = (25, 30);  25 = ('c': 12, 'b': 13);  30 = (14, 'd': 16);  14 = ('f': 5, 'e': 9)

    Time complexity:

n − 1 iterations, each heap operation is O(lg n), so this is O(n lg n) for n characters

    Correctness

    Greedy property:

C = alphabet. x, y ∈ C are the two chars with lowest frequencies (most rarely used).

Then there exists an optimal prefix code such that the codewords for x and y have the same length and differ only in the last bit.

If a, b are leaves at the deepest depth and have higher frequency, then we can swap a, b with x, y and obtain a new tree with no greater cost (sum of frequency × depth).

    Optimal substructure:

T is an optimal tree for alphabet C if T′ is an optimal tree for alphabet C′, where

C and C′ are the same except x, y ∈ C while z ∈ C′, and f is the same except f[z] = f[x] + f[y];

we construct T from T′ by replacing the leaf node for z with an internal node that is the parent of x, y.


    Lecture 9

    Basic Graph Algorithm

    This lecture

    Graph data structure

    Graph traversal ordering definition

    9.1 Graphs G = (V,E)

V: vertices. Notation: u, v ∈ V

E: edges, each connects a pair of vertices; E ⊆ V × V, (u, v) ∈ E

    Variations

    undirected or directed graph (digraph)

weighted: by default, this means weighted edges, w : E → R. Notation: the weight of edge (u, v) is written w(u, v)

    Also possible to have weighted vertices

    Undirected Graphs

    degree of a vertex: # of edges connected to the vertex

complete graph, aka clique: undirected graph where all pairs of vertices are connected

bipartite graph: undirected graph whose V = V1 ∪ V2, and E ⊆ V1 × V2

multigraph: can have multiple edges between the same pair of vertices (including self edges)

    hypergraph: an edge can connect more than two vertices

    Digraphs

in-degree: # incoming edges; out-degree: # outgoing edges

path: ⟨v0, v1, ..., vk⟩, where (vi, vi+1) ∈ E

    simple path: a path where all vi are distinct

cycle: a nontrivial simple path plus (vk, v0) to close the path

    DAG: directed acyclic graph (no cycle)

strongly connected: digraph whose vertices are all reachable from each other.

    Representation

    Adjacency list

Associate each vertex with a linked list of its neighbors on its outgoing edges

    good for sparse graphs

    for weighted graph, need to store weight in linked list

    Adjacency matrix

    A bit matrix to indicate presence/absence of edge

    weighted: can store weight in matrix

(example digraph: u → v, y;  v → w, x;  w → x;  x → y;  y → v)

Adjacency list:
  u: v → y
  v: x → w
  w: x
  x: y
  y: v

Adjacency matrix (row = from, column = to):
     u  v  w  x  y
  u     1        1
  v        1  1
  w           1
  x              1
  y     1

    tradeoffs:

Adj. List: easier to grow # of vertices, good for sparse graphs

Adj. Matrix: faster access, but O(V^2) storage
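In Python, a dict of lists is a natural adjacency-list form; this is a sketch of the example digraph above (the notes use this Adj style again later):

Adj = {
    'u': ['v', 'y'],
    'v': ['x', 'w'],
    'w': ['x'],
    'x': ['y'],
    'y': ['v'],
}

print(Adj['u'])   # ['v', 'y']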


    9.2 Breadth-first search

distance here means # of edges, not edge weight! e.g., start from u:

δ = 0: u
δ = 1: v, y
δ = 2: w, x

(figure: the example digraph with the distance-1 ring {v, y} and the distance-2 ring {w, x} around the start u)

visit vertices in increasing shortest distance: visit all vertices at distance d before visiting distance d+1 vertices

    BFS order may not be unique!!

Example: BFS(G, u) can be [u, v, y, w, x], [u, v, y, x, w], [u, y, v, w, x], or [u, y, v, x, w]

    Algorithm

BFS(G, s)
 1  for each u ∈ V[G] − {s}
 2    do color[u] ← WHITE
 3       d[u] ← ∞
 4       π[u] ← NIL
 5  color[s] ← GRAY
 6  d[s] ← 0
 7  π[s] ← NIL
 8  Q ← ∅
 9  ENQUEUE(Q, s)
10  while Q ≠ ∅
11    do u ← DEQUEUE(Q)
12       for each v ∈ Adj[u]
13         do if color[v] = WHITE
14           then color[v] ← GRAY
15                d[v] ← d[u] + 1
16                π[v] ← u
17                ENQUEUE(Q, v)
18       color[u] ← BLACK

main idea: for each vertex at distance d, enqueue its neighbors at distance d+1

    avoid queuing more than once: use color

    WHITE: never been enqueued

    GRAY: currently in queue

    BLACK: already visited

actually the color is somewhat redundant... it is enough to test π (predecessor)
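A deque-based Python sketch; per the remark above, membership in d plays the role of the color test, and Adj is a dict of lists:

from collections import deque

def bfs(Adj, s):
    d = {s: 0}            # discovered distances
    pi = {s: None}
    q = deque([s])
    while q:
        u = q.popleft()
        for v in Adj[u]:
            if v not in d:        # v is still "WHITE"
                d[v] = d[u] + 1
                pi[v] = u
                q.append(v)
    return d, pi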

Lemma 22.1: for any edge (u, v) ∈ E, δ(s, v) ≤ δ(s, u) + 1

if we can reach u in δ edges, we can reach v in δ + 1 edges (by taking edge (u, v))

v might actually be closer than u, but that's ok

d[v] computed by BFS satisfies d[v] = δ(s, v):

base case: since we start with d[s] = 0, all those u with δ(s, u) = 1 will have d[u] = δ(s, u) = 1

induction: assume d[u] = δ(s, u), and consider edge (u, v). If color[v] ≠ WHITE, then v has been visited before and d[v] ≤ δ(s, u) + 1. If color[v] = WHITE, then it is being visited for the first time: d[v] = δ(s, u) + 1 = d[u] + 1 = δ(s, v).

    9.3 Depth-first search

    depth is not necessarily the same as distance in BFS!

    discover vertices (use a stack) before we visit!

pushing vertices on the stack (if we haven't pushed them before)

    DFS order may not be unique!

Example: DFS with its tree (vertices u, v, w, x, y, z, starting at u)

(figure: DFS tree rooted at u, with discovery/finish times 1 through 12 laid out under the parenthesis structure)

(u (v (x x) (y (z z) y) v) (w w) u)

Output at each closing parenthesis: [x, z, y, v, w, u]

DFS(G)
1  for each u ∈ V[G]
2    do color[u] ← WHITE
3       π[u] ← NIL
4  time ← 0
5  for each u ∈ V[G]
6    do if color[u] = WHITE
7      then DFS-VISIT(u)

DFS-VISIT(u)
1  color[u] ← GRAY   ▷ discovered u
2  time ← time + 1
3  d[u] ← time
4  for each v ∈ Adj[u]
5    do if color[v] = WHITE
6      then π[v] ← u
7           DFS-VISIT(v)
8  color[u] ← BLACK
9  f[u] ← time ← time + 1
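A recursive Python sketch recording discovery and finish times (Adj is a dict of lists as before):

def dfs(Adj):
    color = {u: 'WHITE' for u in Adj}
    d, f, pi = {}, {}, {}
    time = 0

    def visit(u):
        nonlocal time
        color[u] = 'GRAY'        # u discovered
        time += 1
        d[u] = time
        for v in Adj[u]:
            if color[v] == 'WHITE':
                pi[v] = u
                visit(v)
        color[u] = 'BLACK'       # u finished
        time += 1
        f[u] = time

    for u in Adj:
        if color[u] == 'WHITE':
            visit(u)
    return d, f, pi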


(Lemma 22.13) The component graph G^SCC is a DAG

    Notation:

C denotes a strongly connected component

d[C] = min_{v ∈ C} d[v] (earliest of the discovery times)

f[C] = max_{v ∈ C} f[v] (latest of the finish times)

C, C′: two distinct SCCs of G

(Lemma 22.14) if edge (u, v) ∈ E with u ∈ C and v ∈ C′, then f(C) > f(C′).

Intuition: C′ is deeper than C and therefore finishes earlier.

(Lemma 22.15, paraphrased from book!) if (v, u) ∈ E^T with v ∈ C and u ∈ C′, then f(C) < f(C′).

Proof: (v, u) ∈ E^T implies (u, v) ∈ E; because v ∈ C is deeper (finishes earlier), f(C′) > f(C) by Lemma 22.14.

>>> Adj['t']
['x', 'y', 'z']

Vertices V of G:

>>> Adj.keys()
['t', 'z', 's', 'x', 'y']
# Note: could be in another order