114
Transmisión de Datos Multimedia – http://www.grc.upv.es/docencia/tdm – Master IC 2007/2008 Tema 2: Aspectos de encaminamiento Tema 2: Aspectos de encaminamiento Algoritmos básicos de encaminamiento Link state Distance Vector Encaminamiento en Internet RIP OSPF BGP Multi-Protocol Label Switching (MPLS). IP multicast

Tema 2: Aspectos de encaminamiento - grc.upv.es · Transmisión de Datos Multimedia -Master IC 2007/2008 2 1 3 2 0111 value in arriving packet’s header routing algorithm local forwarding

Embed Size (px)

Citation preview

Transmisión de Datos Multimedia – http://www.grc.upv.es/docencia/tdm – Master IC 2007/2008

Tema 2: Aspectos de encaminamientoTema 2: Aspectos de encaminamiento

�Algoritmos básicos de encaminamiento � Link state

� Distance Vector

�Encaminamiento en Internet � RIP

� OSPF

� BGP

�Multi-Protocol Label Switching (MPLS).

�IP multicast

Transm

isión de Datos Multimedia -Master IC 2007/2008

2

1

23

0111

value in arriving

packet’s header

routing algorithm

local forwarding table

header value output link

0100

01010111

1001

3

22

1

Interplay between routing, forwarding

Computer Networking: A Top Down Approach Featuring the

Internet, 3rd edition.

Jim Kurose, Keith RossAddison-Wesley, July 2004.

Transm

isión de Datos Multimedia -Master IC 2007/2008

3

u

yx

wv

z2

2

13

1

1

2

53

5

Graph: G = (N,E)

N = set of routers = { u, v, w, x, y, z }

E = set of links ={ (u,v), (u,x), (v,x), (v,w), (x,w), (x,y), (w,y), (w,z), (y,z) }

Graph abstraction

Remark: Graph abstraction is useful in other network contexts

Example: P2P, where N is set of peers and E is set of TCP connections

Transm

isión de Datos Multimedia -Master IC 2007/2008

4

Graph abstraction: costs

u

yx

wv

z2

2

13

1

1

2

53

5• c(x,x’) = cost of link (x,x’)

- e.g., c(w,z) = 5

• cost could always be 1, or

inversely related to bandwidth,

or inversely related to

congestion

Cost of path (x1, x2, x3,…, xp) = c(x1,x2) + c(x2,x3) + … + c(xp-1,xp)

Question: What’s the least-cost path between u and z ?

Routing algorithm: algorithm that finds least-cost path

Transm

isión de Datos Multimedia -Master IC 2007/2008

5

Routing Algorithm classification

Global or decentralized information?

Global:

� all routers have complete topology, link cost info

� “link state” algorithms

Decentralized:

� router knows physically-connected neighbors, link costs to neighbors

� iterative process of computation, exchange of info with neighbors

� “distance vector” algorithms

Static or dynamic?

Static:

� routes change slowly over time

Dynamic:

� routes change more quickly

� periodic update

� in response to link cost changes

Transm

isión de Datos Multimedia -Master IC 2007/2008

6

A Link-State Routing Algorithm: Dijsktra’s Algorithm

� net topology, link costs known to all nodes

� accomplished via “link state broadcast”

� all nodes have same info� computes least cost paths from one

node (‘source”) to all other nodes

� gives forwarding table for that node

� iterative: after k iterations, know least cost path to k destinations

Notation:

� c(x,y): link cost from node x to y; = ∞ if not direct neighbors

� D(v): current value of cost of path from source to destination v

� p(v): predecessor node along path from source to v

� N': set of nodes whose least cost path definitively known

1 Initialization:

2 N' = {u} 3 for all nodes v 4 if v adjacent to u 5 then D(v) = c(u,v) 6 else D(v) = ∞7 8 Loop

9 find w not in N' such that D(w) is a minimum 10 add w to N'11 update D(v) for all v adjacent to w and not in N' : 12 D(v) = min( D(v), D(w) + c(w,v) ) 13 /* new cost to v is either old cost to v or known 14 shortest path cost to w plus cost from w to v */ 15 until all nodes in N'

Transm

isión de Datos Multimedia -Master IC 2007/2008

7

Dijkstra’s algorithm example

A F

B

D E

C2

2

2

3

1

1

1

3

5

step SPT D(b), P(b) D(c), P(c) D(d), P(d) D(e), P(e) D(f), P(f)

0 A 2, A 5, A 1, A ~ ~

5

B C D E F

Transm

isión de Datos Multimedia -Master IC 2007/2008

8

Dijkstra’s algorithm example

A F

B

D E

C2

2

2

3

1

1

1

3

5

step SPT D(b), P(b) D(c), P(c) D(d), P(d) D(e), P(e) D(f), P(f)

0 A 2, A 5, A 1, A ~ ~

1 AD 2, A 4, D 2, D ~

5

B C D E F

Transm

isión de Datos Multimedia -Master IC 2007/2008

9

Dijkstra’s algorithm example

A F

B

D E

C2

2

2

3

1

1

1

3

5

step SPT D(b), P(b) D(c), P(c) D(d), P(d) D(e), P(e) D(f), P(f)

0 A 2, A 5, A 1, A ~ ~

1 AD 2, A 4, D 2, D ~

2 ADE 2, A 3, E 4, E

5

B C D E F

Transm

isión de Datos Multimedia -Master IC 2007/2008

1

0

Dijkstra’s algorithm example

A F

B

D E

C2

2

2

3

1

1

1

3

5

step SPT D(b), P(b) D(c), P(c) D(d), P(d) D(e), P(e) D(f), P(f)

0 A 2, A 5, A 1, A ~ ~

1 AD 2, A 4, D 2, D ~

2 ADE 2, A 3, E 4, E

3 ADEB 3, E 4, E

5

B C D E F

Transm

isión de Datos Multimedia -Master IC 2007/2008

1

1

Dijkstra’s algorithm example

A F

B

D E

C2

2

2

3

1

1

1

3

5

step SPT D(b), P(b) D(c), P(c) D(d), P(d) D(e), P(e) D(f), P(f)

0 A 2, A 5, A 1, A ~ ~

1 AD 2, A 4, D 2, D ~

2 ADE 2, A 3, E 4, E

3 ADEB 3, E 4, E

4 ADEBC 4, E

5

B C D E F

Transm

isión de Datos Multimedia -Master IC 2007/2008

1

2

Dijkstra’s algorithm example

A F

B

D E

C2

2

2

3

1

1

1

3

5

step SPT D(b), P(b) D(c), P(c) D(d), P(d) D(e), P(e) D(f), P(f)

0 A 2, A 5, A 1, A ~ ~

1 AD 2, A 4, D 2, D ~

2 ADE 2, A 3, E 4, E

3 ADEB 3, E 4, E

4 ADEBC 4, E

5

B C D E F

Transm

isión de Datos Multimedia -Master IC 2007/2008

1

3

Dijkstra’s algorithm example

A

ED

CB

F

Resulting shortest-path tree from A:

BD

E

C

F

(A,B)

(A,D)

(A,D)

(A,D)

(A,D)

destination link

Resulting forwarding table in A:

Transm

isión de Datos Multimedia -Master IC 2007/2008

1

4

Dijkstra’s algorithm, discussion

Algorithm complexity: n nodes

� each iteration: need to check all nodes, w, not in N

� n(n+1)/2 comparisons: O(n2)

� more efficient implementations possible: O(nlogn)

Oscillations possible:

� e.g., link cost = amount of carried traffic

A

D

C

B1 1+e

e0

e

1 1

0 0

A

D

C

B2+e 0

001+e 1

A

D

C

B0 2+e

1+e10 0

A

D

C

B2+e 0

e01+e 1

initially… recompute

routing… recompute … recompute

Transm

isión de Datos Multimedia -Master IC 2007/2008

1

5

Distance Vector Algorithm

� Bellman-Ford Equation (dynamic programming)

� Define: dx(y) := cost of least-cost path from x to y

� Then

� where min is taken over all neighbors v of x

vdx(y) = min {c(x,v) + dv(y) }

Transm

isión de Datos Multimedia -Master IC 2007/2008

1

6

Bellman-Ford example

u

yx

wv

z2

2

13

1

1

2

53

5

Clearly, dv(z) = 5, dx(z) = 3, dw(z) = 3

du(z) = min { c(u,v) + dv(z),c(u,x) + dx(z),c(u,w) + dw(z) }

= min {2 + 5,1 + 3,5 + 3} = 4

Node that achieves minimum is nexthop in shortest path ➜ forwarding table

B-F equation says:

Transm

isión de Datos Multimedia -Master IC 2007/2008

1

7

Distance Vector Algorithm

� Node x maintains the Distance vector:

� Dx = [Dx(y): y є N ]

� Where Dx(y) = estimate of least cost from x to y

� Node x also maintains its neighbors’ distance vectors

� For each neighbor v, x maintains Dv = [Dv(y): y є N ]

� Node x knows the cost to each neighbor v: c(x,v)

Transm

isión de Datos Multimedia -Master IC 2007/2008

1

8

Distance vector algorithm

Basic idea:

� Each node periodically sends its own distance vector estimate toneighbors

� When a node x receives new DV estimate from neighbor, it updatesits own DV using B-F equation:

� Under minor, natural conditions, the estimate Dx(y) converge to the actual least cost dx(y)

Dx(y) ← minv{c(x,v) + Dv(y)} for each node y ∊ N

Transm

isión de Datos Multimedia -Master IC 2007/2008

1

9

Distance Vector Algorithm

� Iterative, asynchronous: each local iteration caused by:

� local link cost change

� DV update message from neighbor

� Distributed:

� each node notifies neighbors only when its DV changes

� neighbors then notify their neighbors if necessary

wait for (change in local link

cost of msg from neighbor)

recompute estimates

if DV to any dest has

changed, notify neighbors

Each node:

Transm

isión de Datos Multimedia -Master IC 2007/2008

2

0

x y z

xyz

0 2 7

∞∞ ∞∞∞ ∞

from

cost tofrom

from

x y z

xyz

0 2 3

from

cost tox y z

xyz

0 2 3

from

cost to

x y z

xyz

∞ ∞

∞∞ ∞

cost tox y z

xyz

0 2 7from

cost to

x y z

xyz

0 2 3

from

cost to

x y z

xyz

0 2 3from

cost tox y z

xyz

0 2 7

from

cost to

x y z

xyz

∞∞ ∞7 1 0

cost to

∞2 0 1

∞ ∞ ∞

2 0 17 1 0

2 0 17 1 0

2 0 13 1 0

2 0 13 1 0

2 0 1

3 1 0

2 0 1

3 1 0

time

x z12

7

y

node x table

node y table

node z table

Dx(y) = min{c(x,y) + Dy(y), c(x,z) + Dz(y)}

= min{2+0 , 7+1} = 2Dx(z) = min{c(x,y) + Dy(z), c(x,z) + Dz(z)}

= min{2+1 , 7+0} = 3

Transm

isión de Datos Multimedia -Master IC 2007/2008

2

1

Comparison of LS and DV algorithms

� Message complexity

� LS: with n nodes, E links, O(nE) msgs sent

� DV: exchange between neighbors only

� convergence time varies

� Speed of Convergence

� LS: O(n2) algorithm requires O(nE) msgs

� may have oscillations

� DV: convergence time varies

� may be routing loops

� count-to-infinity problem

� Robustness: what happens if router malfunctions?

� LS:

� node can advertise incorrect link cost

� each node computes only its own table

� DV:

� DV node can advertise incorrect path cost

� each node’s table used by others

– error propagate thru network

Transmisión de Datos Multimedia – http://www.grc.upv.es/docencia/tdm – Master IC 2007/2008

Tema 2: Aspectos de encaminamientoTema 2: Aspectos de encaminamiento

�Algoritmos básicos de encaminamiento � Link state

� Distance Vector

�Encaminamiento en Internet � RIP

� OSPF

� BGP

�Multi-Protocol Label Switching (MPLS).

�IP multicast

Transm

isión de Datos Multimedia -Master IC 2007/2008

2

3

Hierarchical Routing

� Our routing study thus far - idealization

� all routers identical

� network “flat”

� … not true in practice

� scale: with 200 million destinations:

� can’t store all dest’s in routing tables!

� routing table exchange would swamp links!

� administrative autonomy

� internet = network of networks

� each network admin may want to control routing in its own network

Transm

isión de Datos Multimedia -Master IC 2007/2008

2

4

Hierarchical Routing

Gateway router

� Direct link to router in another AS

� aggregate routers into regions, “autonomous systems” (AS)

� routers in same AS run same routing protocol

� “intra-AS” routing protocol

� routers in different AS can run different intra-AS routing protocol

Transm

isión de Datos Multimedia -Master IC 2007/2008

2

5

3b

1d

3a

1c2aAS3

AS1

AS21a

2c2b

1b

Intra-AS

Routing

algorithm

Inter-AS

Routing

algorithm

Forwardingtable

3c

Interconnected ASes

� Forwarding table is configured by both intra- and inter-AS routing algorithm

� Intra-AS sets entries for internal dests

� Inter-AS & Intra-As sets entries for external dests

Transm

isión de Datos Multimedia -Master IC 2007/2008

2

6

3b

1d

3a

1c2aAS3

AS1

AS21a

2c2b

1b

3c

Inter-AS tasks

� Suppose router in AS1 receives datagram for which dest is outside of AS1

� Router should forward packet towards one of the gateway routers, but which one?

� AS1 needs:

� to learn which dests are reachable through AS2 and which through AS3

� to propagate this reachabilityinfo to all routers in AS1

� Job of inter-AS routing!

Transm

isión de Datos Multimedia -Master IC 2007/2008

2

7

Learn from inter-AS protocol that subnet x is reachable via multiple gateways

Use routing infofrom intra-AS

protocol to determinecosts of least-cost

paths to eachof the gateways

Hot potato routing:Choose the gateway

that has the smallest least cost

Determine fromforwarding table the interface I that leads

to least-cost gateway. Enter (x,I) in

forwarding table

Example: Choosing among multiple ASes

� Now suppose AS1 learns from the inter-AS protocol that subnet x is reachable from AS3 and from AS2.

� To configure forwarding table, router 1d must determine towards which gateway it should forward packets for dest x.

� This is also the job on inter-AS routing protocol!

� Hot potato routing: send packet towards closest of two routers.

Transm

isión de Datos Multimedia -Master IC 2007/2008

2

8

Intra-AS Routing

� Also known as Interior Gateway Protocols (IGP)

� Most common Intra-AS routing protocols:

� RIP: Routing Information Protocol

� OSPF: Open Shortest Path First

� IGRP: Interior Gateway Routing Protocol (Cisco proprietary)

Transm

isión de Datos Multimedia -Master IC 2007/2008

2

9

RIP ( Routing Information Protocol)

� Distance vector algorithm

� Included in BSD-UNIX Distribution in 1982

� Distance metric: # of hops (max = 15 hops)

DC

BA

u v

w

x

yz

destination hopsu 1v 2w 2x 3y 3z 2

From router A to subsets:

Transm

isión de Datos Multimedia -Master IC 2007/2008

3

0

RIP advertisements

� Distance vectors: exchanged among neighbors every 30 sec via Response Message (also called advertisement)

� Each advertisement: list of up to 25 destination nets within AS

� Table processing:

� RIP routing tables managed by application-level process called route-d (daemon)

� advertisements sent in UDP packets, periodically repeated

physical

link

network forwarding(IP) table

Transprt(UDP)

routed

physical

link

network(IP)

Transprt(UDP)

routed

forwardingtable

Transm

isión de Datos Multimedia -Master IC 2007/2008

3

1

RIP: Link Failure and Recovery

� If no advertisement heard after 180 sec --> neighbor/link declared dead

� routes via neighbor invalidated

� new advertisements sent to neighbors

� neighbors in turn send out new advertisements (if tables changed)

� link failure info quickly propagates to entire net

� poison reverse used to prevent ping-pong loops (infinite distance = 16 hops)

Transm

isión de Datos Multimedia -Master IC 2007/2008

3

2

OSPF (Open Shortest Path First)

� “open”: publicly available

� Uses Link State algorithm

� LS packet dissemination

� Topology map at each node

� Route computation using Dijkstra’s algorithm

� OSPF advertisement carries one entry per neighbor router

� Advertisements disseminated to entire AS (via flooding)

� Carried in OSPF messages directly over IP (rather than TCP or UDP

Transm

isión de Datos Multimedia -Master IC 2007/2008

3

3

OSPF “advanced” features (not in RIP)

� Security: all OSPF messages authenticated (to prevent malicious intrusion)

� Multiple same-cost paths allowed (only one path in RIP)

� For each link, multiple cost metrics for different TOS (e.g., satellite link cost set “low” for best effort; high for real time)

� Integrated uni- and multicast support:

� Multicast OSPF (MOSPF) uses same topology data base as OSPF

� Hierarchical OSPF in large domains.

Transm

isión de Datos Multimedia -Master IC 2007/2008

3

4

Hierarchical OSPF

� Two-level hierarchy: local area, backbone.

� Link-state advertisements only in area

� each nodes has detailed area topology; only know direction (shortest path) to nets in other areas.

� Area border routers: “summarize”distances to nets in own area, advertise to other Area Border routers.

� Backbone routers: run OSPF routing limited to backbone.

� Boundary routers: connect to other AS’s.

Transm

isión de Datos Multimedia -Master IC 2007/2008

3

5

Internet inter-AS routing: BGP

� BGP (Border Gateway Protocol): the de facto standard

� BGP provides each AS a means to:1. Obtain subnet reachability information from neighboring ASs.

2. Propagate the reachability information to all routers internal to the AS.

3. Determine “good” routes to subnets based on reachability information and policy.

� Allows a subnet to advertise its existence to rest of the Internet: “I am here”

Transm

isión de Datos Multimedia -Master IC 2007/2008

3

6

BGP basics

� Pairs of routers (BGP peers) exchange routing info over semi-permanent TCP connections: BGP sessions

� Note that BGP sessions do not correspond to physical links.

� When AS2 advertises a prefix to AS1, AS2 is promising it will forward any datagrams destined to that prefix towards the prefix.

� AS2 can aggregate prefixes in its advertisement

3b

1d

3a

1c2a

AS3

AS1

AS21a

2c

2b

1b

3c

eBGP session

iBGP session

Transm

isión de Datos Multimedia -Master IC 2007/2008

3

7

BGP route selection

� Router may learn about more than 1 route to some prefix. Router must select route.

� Elimination rules:

� Local preference value attribute: policy decision

� Shortest AS-PATH

� Closest NEXT-HOP router: hot potato routing

� Additional criteria

Transm

isión de Datos Multimedia -Master IC 2007/2008

3

8

Why different Intra- and Inter-AS routing ?

� Policy:

� Inter-AS: admin wants control over how its traffic routed, who routes through its net.

� Intra-AS: single admin, so no policy decisions needed

� Scale:

� hierarchical routing saves table size, reduced update traffic

� Performance:

� Intra-AS: can focus on performance

� Inter-AS: policy may dominate over performance

Transmisión de Datos Multimedia – http://www.grc.upv.es/docencia/tdm – Master IC 2007/2008

Tema 2: Aspectos de encaminamientoTema 2: Aspectos de encaminamiento

�Algoritmos básicos de encaminamiento � Link state

� Distance Vector

�Encaminamiento en Internet � RIP

� OSPF

� BGP

�Multi-Protocol Label Switching (MPLS).

�IP multicast

Transm

isión de Datos Multimedia -Master IC 2007/2008

4

0

MPLS - The Motivation

� IP Protocol Suite - the most predominant networking technology.

� Voice & Data convergence on a single network infrastructure.

� Continual increase in number of users.

� Demand for higher connection speeds.

� Increase in traffic volumes.

� Ever-increasing number of ISP networks.

� MPLS Working Groups and Standards

� Standardized by the IETF - currently in Draft stage.

� MPLS recommendations are done by IP players for IP services

� MPLS core components are generic

� MPLS doesn’t use specific technology process

Transm

isión de Datos Multimedia -Master IC 2007/2008

4

1

MPLS and ISO model

PPP

Physical (Optical - Electrical) 1

2

IP 3

4

Applications7to

5

Frame

RelayATM (*)

TCP UDP

PPP Frame Relay ATM (*)

MPLS

(*) ATM overlay model

(without addressing and

P-NNI) is considered as an ISO layer 2 protocol.

IETF main goal is that when a layer is added, no modification is needed on the existing layers.All new protocol must be backward compatible

Transm

isión de Datos Multimedia -Master IC 2007/2008

4

2

Some MPLS Terms...

� LER - Label Edge Router

� LSR - Label Switch Router

� FEC - Forward Equivalence Class

� Label - Associates a packet to a FEC

� Label Stack - Multiple labels containing information on how a packet is forwarded.

� Shim - Header containing a Label Stack

� Label Switch Path - path that a packet follows for a specific FEC

� LDP - Label Distribution Protocol, used to distribute Label information between MPLS-aware network devices

� Label Swapping - manipulation of labels to forward packets towards the destination.

Transm

isión de Datos Multimedia -Master IC 2007/2008

4

3

Routing protocol OSPF OSPF OSPF

Attributes Precedence

Local tableLabel table Local table Local table

LSP (Label-Switched Paths)Label swapping Label removal

Classification

Label assignment

Ingress

Node

Core

Node

Egress

Node

Label SwitchLayer 2

Layer 1

Layer 2

Layer 1

Layer 2

Layer 1

Layer 2

Layer 1

Layer 2

Layer 1

MPLS Architecture

FEC table Local table Local table Local table

Forward Equivalence Class

Transm

isión de Datos Multimedia -Master IC 2007/2008

4

4

Label swapping

Label removal

Classification

Label assignment

Label swapping

Label removal

Classification

Label assignment

OSPF / RIP / IS-IS

Label Switch Path

Label table

Ingress

Node

Core

Node

Egress

Node

Layer 2

Layer 1

Layer 2

Layer 1

Layer 2

Layer 1

Precedence

Label table Label table

Layer 2

Layer 1

Layer 2

Layer 1

FEC FEC FEC

MPLS process

Transm

isión de Datos Multimedia -Master IC 2007/2008

4

5

LSR

LER

LSR

LER

IP PacketIP Packet w/ Label

L3 RoutingL3 Routing

Label SwappingLabel Swapping

LER: Label Edge Router

LERLER

L3 RoutingL3 Routing

L3 Routing

MPLS Cloud

LSR: Label Switch(ing) Router

Transm

isión de Datos Multimedia -Master IC 2007/2008

4

6

Ingress Label FEC Egress Label

6 138.120.6/24 - xxxx 9

Ingress Label AttributeFEC Egress LabelIngress Label FEC Egress Label

6 138.120.6/24 - xxxx 9

Attribute

A

6 138.120.6/24 - xxxx 12B

•FECs are manually initiated by the operator

•A FEC is associated to at least one Label

FEC Classification

� A packet can be mapped to a particular FEC based on the following criteria:

� destination IP address,

� source IP address,

� TCP/UDP port,

� in case of inter AS-MPLS, Source-AS and Dest-AS,

� class of service,

� application used,

� …

� any combination of the previous criteria.

Transm

isión de Datos Multimedia -Master IC 2007/2008

4

7

L2 Type L2 TypePort PortIngress Label Egress LabelFEC

ATM 1-1 12 (i.e. 4/65) F1 22 (i.e. 5/65)3-4ATM

ATM 1-1 15 (i.e. 0/25) F4 9 (i.e. 101) 5-1FR

Gig Eth 5-1 7 F1 22 (i.e. 4/65)3-4ATM

What is a Label?

� A label is a short, fixed length, locally significant identifier used to identify a FEC.

� The label can be identified by the L2 technology identifier (e.g. VPI/VCI for ATM, DLCI for FR or MPLS label for PPP/Ethernet).

� MPLS Label Assignment Schemes� Topology Driven

� Label assignment in response to routing protocols (OSPF and BGP) updates

� Control Driven � Label assignment in response to RSVP, CR-LDP requests

� Traffic Driven � Label assignment in response to flow detection & triggering

Transm

isión de Datos Multimedia -Master IC 2007/2008

4

8

Label

(20 bits)

Exp

(3 bits)

S

(1 bit)

TTL

(8bits)

Label : Label value (0 to 15 are reserved for special use)

Exp : Experimental Use

S : Bottom of Stack (set to 1 for the last entry in the label)

TTL : Time To Live

The MPLS Shim Header

� The Label (Shim Header) is represented as a sequence of Label Stack Entry

� Each Label Stack Entry is coded by 4 bytes (32 bits) as described

� 20 Bits is reserved for the Label Identifier (also named Label)

� Based on the contents of the label a swap, push (impose) or pop(dispose) operation can be performed on the packet's label stack

Transm

isión de Datos Multimedia -Master IC 2007/2008

4

9

Label Switched Path

5 12

Ingress

Interface

Ingress

LabelFEC Egress

Interface

Egress

Label

1 138.120 312

Ingress

Interface

Ingress

LabelFEC Egress

Interface

Egress

Label

1 138.120 x4

53

IngressInterface

IngressLabel

FEC Egress

Interface

EgressLabel

1 x 138.120

MPLS switch

MPLS switch

MPLS switch

MPLS switch1

2

3

1 2

3

1

2

3

4

1

2

3

138.120

192.168127.20

Transm

isión de Datos Multimedia -Master IC 2007/2008

5

0

MPLS switch

MPLS switch

MPLS switch

MPLS switch1

2

3

1 2

3

1

2

3

4

1

2

3

138.120

192.168127.20

138.120.6.12

138.120.6.12

138.120.6.12138.120.6.12

138.120.6.12

138.120.6.12

??

138.120.6.12

Default3

Ingress

Interface

Ingress

LabelFEC Egress

Interface

Egress

Label

1 x None

??

138.120.6.12

Default Default

Ingress

Interface

Ingress

LabelFEC Egress

Interface

Egress

Label

1 None 3

??

138.120.6.12 ??

138.120.6.12

Default

Ingress

Interface

Ingress

LabelFEC Egress

Interface

Egress

Label

1 None x4

??138.120.6.12

Hop by Hop IP forwarding

Transm

isión de Datos Multimedia -Master IC 2007/2008

5

1

5 12

Ingress

Interface

Ingress

LabelFEC Egress

Interface

Egress

Label

1 138.120 312

Ingress

Interface

Ingress

LabelFEC Egress

Interface

Egress

Label

1 138.120 x4

53

IngressInterface

IngressLabel

FEC EgressInterface

Egress

Label

1 x 138.120

MPLS switch

MPLS switch

MPLS switch

MPLS switch1

2

3

1 2

3

1

2

3

4

1

2

3

138.120

192.168127.20

138.120.6.12

138.120.6.12

138.120.6.12

138.120.6.12 138.120.6.12

IP forwarding using Label Switch Path

Transm

isión de Datos Multimedia -Master IC 2007/2008

5

2

MPLS Label Distribution Protocol

� LDP - a set of procedures by which one LSR informs the other of the FEC-to-Label binding it has made.

� Currently, several protocols used as Label Distribution Protocol(LDP) are available:

� RSVP-TE (MPLS extension)

� LDP and CR-LDP

� BGP-4 MPLS extensions

� Label Distribution schemes

Transm

isión de Datos Multimedia -Master IC 2007/2008

5

3

Downstream stream on demand

Mapping 12Mapping 5

5 12

Ingress

Interface

Ingress

LabelFEC Egress

Interface

Egress

Label

1 138.120 312

Ingress

Interface

Ingress

LabelFEC Egress

Interface

Egress

Label

1 138.120 x4

53

IngressInterface

IngressLabel

FEC Egress

Interface

EgressLabel

1 x 138.120

Request 138.120

Request 138.120

MPLS switch

MPLS switch

MPLS switch

MPLS switch1

2

3

1 2

3

1

2

3

4

1

2

3

138.120

192.168127.20

The label is requested by the

upstream node and the

downstream node defines the label

used.

Transm

isión de Datos Multimedia -Master IC 2007/2008

5

4

Unsolicited Downstream

MPLS switch

MPLS switch

MPLS switch

MPLS switch1

2

3

1 2

3

1

2

3

4

1

2

3

138.120

192.168

127.20Mapping 12

Mapping 5

5 12

12

5

Ingress

Interface

Ingress

LabelFEC Egress

Interface

Egress

Label

1 138.120 3

Ingress

Interface

Ingress

LabelFEC Egress

Interface

Egress

Label

1 138.120 x4

3

Ingress

Interface

Ingress

LabelFEC Egress

Interface

EgressLabel

1 x 138.120

The downstream node defines

the label and advertises it to the

upstream node.

Transm

isión de Datos Multimedia -Master IC 2007/2008

5

5

Edge LSR Features

� Routing protocols

� FEC Classification

� Initiates LSP setup for Downstream On Demand method

� Adaptation of non-MPLS data to MPLS data

� Layer 2 translation for MPLS data

� Terminated MPLS-VPN

� At least one LDP protocol

� Edge LSR is counted into the TTL count as a regular router

Transm

isión de Datos Multimedia -Master IC 2007/2008

5

6

Core LSR Features

� Routing protocols

� Propagates Downstream On Demand method (request and mapping)

� Layer 2 translation

� High speed label forwarding/switching

� At least one LDP protocol

Transm

isión de Datos Multimedia -Master IC 2007/2008

5

7

MPLS Advantages

� Simplified Forwarding

� Efficient Explicit Routing

� Traffic Engineering

� QoS Routing

� Mappings from IP Packet to Forwarding Equivalence Class (FEC)

� Partitioning of Functionality

� Common Operation over Packet and Cell media

Transmisión de Datos Multimedia – http://www.grc.upv.es/docencia/tdm – Master IC 2007/2008

Tema 2: Aspectos de encaminamientoTema 2: Aspectos de encaminamiento

�Algoritmos básicos de encaminamiento � Link state

� Distance Vector

�Encaminamiento en Internet � RIP

� OSPF

� BGP

�Multi-Protocol Label Switching (MPLS).

�IP multicast

Transm

isión de Datos Multimedia -Master IC 2007/2008

5

9

Multicast = Efficient Data Distribution

Src Src

Transm

isión de Datos Multimedia -Master IC 2007/2008

6

0

Why Multicast ?

� Need for efficient one-to-many delivery of same data

� Applications:

� News/sports/stock/weather updates

� Distance learning

� Configuration, routing updates, service location

� Pointcast-type “push” apps

� Teleconferencing (audio, video, shared whiteboard, text editor)

� Distributed interactive gaming or simulations

� Email distribution lists

� Content distribution; Software distribution

� Web-cache updates

� Database replication

Transm

isión de Datos Multimedia -Master IC 2007/2008

6

1

Why Not Broadcast or Unicast?

� Broadcast:

� Send a copy to every machine on the net

� Simple, but inefficient

� All nodes must process packet even if they don’t care

� Wastes more CPU cycles of slower machines (“broadcast radiation”)

� Network loops lead to “broadcast storms”

� Replicated Unicast:

� Sender sends a copy to each receiver in turn

� Receivers need to register or sender must be pre-configured

� Sender is focal point of all control traffic

� Reliability => per-receiver state, separate sessions/processes at sender

Transm

isión de Datos Multimedia -Master IC 2007/2008

6

2

Multicast Apps Characteristics

� Number of (simultaneous) senders to the group

� The size of the groups

� Number of members (receivers)

� Geographic extent or scope

� Diameter of the group measured in router hops

� The longevity of the group

� Number of aggregate packets/second

� The peak/average used by source

� Level of human interactivity

� Lecture mode vs interactive

� Data-only (eg database replication) vs multimedia

Transm

isión de Datos Multimedia -Master IC 2007/2008

6

3

Reliable Multicast vs. Unreliable Multicast

� When a multicast message is sent by a process, the runtime support of the multicast mechanism is responsible for delivering the message to each process currently in the multicast group.

� As each participating process may be on a separate host, due to factors such as failures of network links and/or network hosts, routing delays, and differences in software and hardware, the time between when a message is sent and when it is received may vary among the recipient processes.

� Moreover, a message may not be received by one or more of the processes at all.

Transm

isión de Datos Multimedia -Master IC 2007/2008

6

4

Classification of multicasting mechanisms in terms of message delivery

� Unreliable multicast:

� The arrival of the correct message at each process is not guaranteed.

� Reliable multicast:

� Guarantees that each message is eventually delivered in a non-corrupted form to each process in the group.

� The definition of reliable multicast requires that each participating process receives exactly one copy of each message sent. It doesnot put any restriction of the order the messages delivered.

� Reliable multicast can be further classified based on the order of the delivery of the messages: unordered, FIFO, causal order, atomic order.

Transm

isión de Datos Multimedia -Master IC 2007/2008

6

5

Classification of reliable multicast -- unordered

� An unordered reliable multicast system guarantees the safe delivery of each message, but it provides no guarantee on the delivery order of the messages.

� Example: Processes P1, P2, and P3 have formed a multicast group.Three messages, m1, m2, m3 have been sent to the group. An unordered reliable multicast system may deliver the messages to each of the three processes in any of these:

m1-m2-m3,

m1-m3-m2,

m2-m1-m3,

m2-m3-m1,

m3-m1-m2,

m3-m2-m1

Transm

isión de Datos Multimedia -Master IC 2007/2008

6

6

Classification of reliable multicast - FIFO

� If process P sent messages mi and mj, in that order, then each process in the multicast group will be delivered the messages miand mj, in that order.

� Note that FIFO multicast places no restriction on the delivery order among messages sent by different processes. For example, P1 sends messages m11 then m12, and P2 sends messages m21 then m22. It is possible for different processes to receive any of the following orders:

m11-m12-m21-m22,

m11-m21-m12-m22,

m11-m21-m22-m12,

m21-m11-m12-m22

m21-m11-m22-m12

m21-m22-m11-m12.

Transm

isión de Datos Multimedia -Master IC 2007/2008

6

7

Classification of reliable multicast – Causal order

� If message mi causes (results in) the occurrence of message mj, then mi will be delivered to each process prior to mj. Messages mi and mj are said to have a causal or happen-before relationship.

� For example, P1 sends a message m1, to which P2 replies with a multicast message m2. Since m2 is triggered by m1, the two messages share a causal relationship of m1-> m2. A causal-order multicast message system ensures that these two messages will bedelivered to each of the processes in the order of m1- m2.

Transm

isión de Datos Multimedia -Master IC 2007/2008

6

8

Classification of reliable multicast –Atomic order

� In an atomic-order multicast system, all messages are guaranteed to be delivered to each participant in the exact same order. Note that the delivery order does not have to be FIFO or causal, but must be identical for each process.

� Example:

� P1 sends m1, P2 sends m2, and P3 sends m3.

� An atomic system will guarantee that the messages will be delivered to each process in only one of the six orders:

m1-m2-m3, m1-m3- m2, m2-m1-m3,

m2-m3-m1, m3-m1-m2, m3-m2-m1.

Transm

isión de Datos Multimedia -Master IC 2007/2008

6

9

IP Multicast Architecture

Hosts

Routers

Service modelService model

Host-to-router protocol

(IGMP)

Multicast routing protocols

(various)

Transm

isión de Datos Multimedia -Master IC 2007/2008

7

0

IP Multicast model: RFC 1112

� Message sent to multicast “group” (of receivers)

� Senders need not be group members

� A group identified by a single “group address”

� Use “group address” instead of destination address in IP packet sent to group

� Groups can have any size;

� Group members can be located anywhere on the Internet

� Group membership is not explicitly known

� Receivers can join/leave at will

� Packets are not duplicated or delivered to destinations outside the group

� Distribution tree constructed for delivery of packets

� No more than one copy of packet appears on any subnet

� Packets delivered only to “interested” receivers => multicast delivery tree changes dynamically

� Network has to actively discover paths between senders and receivers

Transm

isión de Datos Multimedia -Master IC 2007/2008

7

1

IP Multicast Addresses

� Class D IP addresses

� 224.0.0.0 – 239.255.255.255

� Address allocation:

� Well-known (reserved) multicast addresses, assigned by IANA: 224.0.0.x and 224.0.1.x Transient multicast addresses, assigned and reclaimed dynamically, e.g., by “sdr” program

� Each multicast address represents a group of arbitrary size, called a “host group”

� There is no structure within class D address space like subnetting=> flat address space

1 1 1 0 Group ID

Transm

isión de Datos Multimedia -Master IC 2007/2008

7

2

IP Multicast Service

� Sending� Uses normal IP-Send operation, with an IP multicast address specified

as the destination

� Must provide sending application a way to:� Specify outgoing network interface, if >1 available

� Specify IP time-to-live (TTL) on outgoing packet

� Enable/disable loop-back if the sending host is/isn't a member of the destination group on the outgoing interface

� Receiving� Two new operations

� Join-IP-Multicast-Group(group-address, interface)

� Leave-IP-Multicast-Group(group-address, interface)

� Receive multicast packets for joined groups via normal IP-Receive operation

Transm

isión de Datos Multimedia -Master IC 2007/2008

7

3

Link-Layer Transmission/Reception

� Transmission

� IP multicast packet is transmitted as a link-layer multicast, on those links that support multicast

� Link-layer destination address is determined by an algorithm specific to the type of link

� Reception

� Necessary steps are taken to receive desired multicasts on a particular link, such as modifying address reception filters on LAN interfaces

� Multicast routers must be able to receive all IP multicasts on a link, without knowing in advance which groups will be used

Transm

isión de Datos Multimedia -Master IC 2007/2008

7

4

Using Link-Layer Multicast Addresses

� Ethernet and other LANs using 802 addresses:

� Direct mapping! Simpler than unicast! No ARP etc.

� 32 class D addresses may map to one MAC address

� Special OUI for IETF: 0x01-00-5E.

� No mapping needed for point-to-point links

LAN multicast address

0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 1 1 1 1 0 0

1 1 1 0 28 bits

23 bits

IP multicast address

Group bit

Transm

isión de Datos Multimedia -Master IC 2007/2008

7

5

Multicast over LANs & Scoping

� Multicasts are flooded across MAC-layer bridges along a spanning tree

� But flooding may steal sending opportunity for non-member stations which want to transmit

� Almost like broadcast!

� Scope: How far do transmissions propagate?

� Implicit scoping: Reserved Mcast addresses => don’t leave subnet.

� Also called “link-local” addresses

� TTL-based scoping:

� Multicast routers have a configured TTL threshold

� Multicast datagram dropped if TTL <= TTL threshold

� Useful as a blanket parameter.

� Administrative scoping:

� Use a portion of class D address space (239.0.0.0 thru 239.255.255.255)

� Truly local to admin domain; address reuse possible.

� In IPv6, scoping is an internal attribute of an IPv6 multicast address

Transm

isión de Datos Multimedia -Master IC 2007/2008

7

6

Multicast Scope Control – Small TTLs

� TTL expanding-ring search to reach or find a nearby subset of a group

� Rings can be nested, but not overlapping

s

1

2

3

Transm

isión de Datos Multimedia -Master IC 2007/2008

7

7

IP Multicast Architecture

Hosts

Routers

Service model

HostHost--toto--router protocolrouter protocol

(IGMP)(IGMP)

Multicast routing protocols

(various)

Transm

isión de Datos Multimedia -Master IC 2007/2008

7

8

Internet Group Management Protocol

� IGMP: “signaling” protocol to establish, maintain, remove groups on a subnet.

� Objective: keep router up-to-date with group membership of entire LAN

� Routers need not know who all the members are, only that membersexist

� Each host keeps track of which mcast groups are subscribed to

� Socket API informs IGMP process of all joins

Transm

isión de Datos Multimedia -Master IC 2007/2008

7

9

How IGMP Works

� On each link, one router is elected the “querier”

� Querier periodically sends a Membership Query message to the all-systems group (224.0.0.1), with TTL = 1

� On receipt, hosts start random timers (between 0 and 10 seconds)for each multicast group to which they belong

QRouters:

Hosts:

Transm

isión de Datos Multimedia -Master IC 2007/2008

8

0

How IGMP Works (cont.)

� When a host’s timer for group G expires, it sends a Membership Report to group G, with TTL = 1

� Other members of G hear the report and stop (suppress) their timers

� Routers hear all reports, and time out non-responding groups

Q

G G G G

Routers:

Hosts:

Transm

isión de Datos Multimedia -Master IC 2007/2008

8

1

How IGMP Works (cont.)

� Normal case: only one report message per group present is sent in response to a query

� Query interval is typically 60-90 seconds

� When a host first joins a group, it sends immediate reports, instead of waiting for a query

� IGMPv2: Hosts may send a “Leave group” message to “all routers”(224.0.0.2) address

� Querier responds with a Group-specific Query message: see if any group members are present

� Lower leave latency

Transm

isión de Datos Multimedia -Master IC 2007/2008

8

2

The Java Basic Multicast API

� At the transport layer, the basic multicast supported by Java is an extension of UDP (the User Datagram Protocol)

� For the basic multicast, Java provides a set of classes which are closely related to the datagram socket API classes

Transm

isión de Datos Multimedia -Master IC 2007/2008

8

3

Datagram - recap

a byte array

a DatagramPacket object

receiver's

address

a DatagramSocket

object

sender process

a byte array

a DatagramPacket object

a DatagramSocket

object

receiver process

send

receive

object reference

data flow

Transm

isión de Datos Multimedia -Master IC 2007/2008

8

4

The Java Basic Multicast API - 2

� There are four major classes in the API, the first three of which we have already seen in the context of datagram sockets.

� InetAddress: In the datagram socket API, this class represents the IP address of the sender or receiver. In multicasting, this class can be used to identify a multicast group.

� DatagramPacket: As with datagram sockets, an object of this class represents an actual datagram; in multicast, a DatagramPacketobject represents a packet of data sent to all participants or received by each participant in a multicast group.

� DatagramSocket: In the datagram socket API, this class represents a socket through which a process may send or receive data.

� MulticastSocket : A MulticastSocket is a DatagramSocket, with additional capabilities for joining and leaving a multicast group. An object of the multicast datagram socket class can be used for sending and receiving multicast packets. In the Java API, a MulticastSocket object is bound to a port address, e.g. 3456, and methods of the object allows for the joining and leaving of a multicast address, e.g. 239.1.2.3

Transm

isión de Datos Multimedia -Master IC 2007/2008

8

5

Joining a multicast group

To join a multicast group at IP address m and UDP port p, a MulticastSocket object must be instantiated with p, then the object’s joinGroup method can be invoked specifying the address m:

// join a Multicast group at IP address

// 239.1.2.3 and port 3456

InetAddress group =

InetAddress.getByName("239.1.2.3"); MulticastSocket s =

new MulticastSocket(3456); s.joinGroup(group);

Transm

isión de Datos Multimedia -Master IC 2007/2008

8

6

Sending to a multicast group

A multicast message can be sent using syntax similar with the datagram socket API.

String msg = "a multicast message.";

InetAddress group =

InetAddress.getByName("239.1.2.3");

MulticastSocket s =

new MulticastSocket(3456);

s.joinGroup(group); // optional

DatagramPacket hi =

new DatagramPacket(msg.getBytes( ),

msg.length( ),group, 3456);

s.send(hi);

Transm

isión de Datos Multimedia -Master IC 2007/2008

8

7

Receiving messages sent to a multicast group

A process that has joined a multicast group may receive messages sent to the group using syntax similar to receiving data using a datagram socket API.

byte[] buf = new byte[1000];

InetAddress group =

InetAddress.getByName("239.1.2.3");

MulticastSocket s =

new MulticastSocket(3456);

s.joinGroup(group);

DatagramPacket recv =

new DatagramPacket(buf,buf.length);

s.receive(recv);

Transm

isión de Datos Multimedia -Master IC 2007/2008

8

8

Leaving a multicast group

� A process may leave a multicast group by invoking the leaveGroupmethod of a MulticastSocket object, specifying the multicast address of the group.

� s.leaveGroup(group);

Transm

isión de Datos Multimedia -Master IC 2007/2008

8

9

Setting the “time-to-live”

� The runtime support needs to propagate a multicast message from a host to a neighboring host in an algorithm which, when executed properly, will eventually deliver the message to all the participants.

� Under some anomalous circumstances, however, it is possible that the algorithm which controls the propagation does not terminate properly, resulting in a packet circulating in the network indefinitely.

� Indefinite message propagation causes unnecessary overhead on the systems and the network.

� To avoid this occurrence, it is recommended that a “time to live”parameter be set with each multicast datagram.

� The time-to-live (ttl) parameter, when set, limits the count of network links or hops that the packet will be forwarded on the network.

Transm

isión de Datos Multimedia -Master IC 2007/2008

9

0

Setting the “time-to-live”

� The recommended ttl settings are:

� 0 if the multicast is restricted to processes on the same host

� 1 if the multicast is restricted to processes on the same subnet

� 32 if the multicast is restricted to processes on the same site

� 64 if the multicast is restricted to is processes on the same region

� 128 is if the multicast is restricted to processes on the same continent

� 255 is the multicast is unrestricted

Transm

isión de Datos Multimedia -Master IC 2007/2008

9

1

Setting the “time-to-live”

String msg = "Hello everyone!";

InetAddress group =

InetAddress.getByName("239.1.2.3"); MulticastSocket s = new MulticastSocket(3456); s.setTimeToLive(1);

// set time-to-live to 1 hop

DatagramPacket hi =

new DatagramPacket(msg.getBytes( ),msg.length( ),group, 3456);

s.send(hi);

The value specified for the ttl must be in the range 0 <= ttl <= 255; an IllegalArgumentException will be thrown otherwise.

Transm

isión de Datos Multimedia -Master IC 2007/2008

9

2

The C version: Joining Multicast Groups

� To join a group, you use the setsockopt() kernel service call with a new parameter. The new parameter is the ip_mreq structure:

� The imr_multiaddr field specifies the multicast group you want tojoin. It is the same format as the sin_addr field in the sockaddr_instructure. The imr_interface field lets you choose a particular hostinterface. This is similar to a bind(), which lets you specify the hostinterface (or leave the host option wide open with an INADDR_ANYvalue).

/************************************************************/

/*** The ip_mreq structure for selecting a multicast addr ***/

/************************************************************/

struct ip_mreq

{

struct in_addr imr_multiaddr; /* known multicast group */

struct in_addr imr_interface; /* network interface */

} ;

Transm

isión de Datos Multimedia -Master IC 2007/2008

9

3

The C version: Joining Multicast Groups

� The following code snippet shows you how to join a group using theip_mreq structure. It sets the imr_interface field to INADDR_ANYmerely for demonstration. Do not use it unless you have only oneinterface on your host; the results can be unpredictable .

/************************************************************/

/*** Join a multicast group ***/

/************************************************************/

const char *GroupID = "224.0.0.10";

struct ip_mreq mreq;

if ( inet_aton(GroupID, &mreq.imr_multiaddr) == 0 )

panic("address (%s) bad", GroupID);

mreq.imr_interface.s_addr = INADDR_ANY;

if ( setsockopt(sd, SOL_IP, IP_ADD_MEMBERSHIP,&mreq,sizeof(mreq))!= 0)

panic("Join multicast failed");

/************************************************************/

/*** Drop a multicast group ***/

/************************************************************/

if ( setsockopt(sd, SOL_IP, IP_DROP_MEMBERSHIP, &mreq, sizeof(mreq)) != 0 )

panic("Drop multicast failed");

Transm

isión de Datos Multimedia -Master IC 2007/2008

9

4

IP Multicast Architecture

Hosts

Routers

Service model

Host-to-router protocol

(IGMP)

Multicast routing

protocols

Transm

isión de Datos Multimedia -Master IC 2007/2008

9

5

Multicast Routing

� Basic objective – build distribution tree for multicast packets

� The “leaves” of the distribution tree are the subnets containing at least one group member (detected by IGMP)

� Multicast service model makes it hard

� Anonymity

� Dynamic join/leave

Transm

isión de Datos Multimedia -Master IC 2007/2008

9

6

Simple Multicast Routing Techniques

� Flood and prune

� Begin by flooding traffic to entire network

� Prune branches with no receivers

� Examples: DVMRP, PIM-DM

� Unwanted state where there are no receivers

� Link-state multicast protocols

� Routers advertise groups for which they have receivers to entirenetwork

� Compute trees on demand

� Example: MOSPF

� Unwanted state where there are no senders

Transm

isión de Datos Multimedia -Master IC 2007/2008

9

7

How to Flood Efficiently ?

� A router forwards a packet from source (S) iff it arrives via the shortest path from the router back to S

� Reverse path check!

� Packet is replicated out all but the incoming interface

� Reverse shortest paths easy to compute � just use info in DV routing tables

� DV gives shortest reverse paths

� Efficient if costs are symmetric

xxyy

tt

SS

a

zz

Forward packets that arrive

on shortest path from “t” to “S”

(assume symmetric routes)

Transm

isión de Datos Multimedia -Master IC 2007/2008

9

8

Problem

� Flooding can cause a given packet to be sent multiple times over the same link: can filter better than this!

� Solution: Reverse Path Broadcasting

xx yy

zz

SS

a

b

duplicate packet

Transm

isión de Datos Multimedia -Master IC 2007/2008

9

9

Reverse Path Broadcasting (RPB)

� Basic idea: forward a packet from S only on child links for S

� Child link of router x for source S: link that has x as parent on the shortest path from the link to S

xx yy

zz

SS

a

b

5 6

child link of x

for S

forward only

to child link

Transm

isión de Datos Multimedia -Master IC 2007/2008

1

0

How to Find Child Links?

� Routing updates !

� If z tells x that it can reach S through y,

� and if the cost of this path is >= the cost of the path from z to S through x,

� then x knows that the link to z is a child link

� In case of tie, lower address wins

xx yy

zz

SS

a

b

5 6

child link of x

for S

forward only

to child link

Transm

isión de Datos Multimedia -Master IC 2007/2008

1

0

Truncated RPB

� This is still a broadcast algorithm – the traffic goes everywhere –lousy filtering!

� First order solution: Truncated RPB

� Don't forward traffic onto networks with no receivers

� Identify leaves

� Leaf links are the child links that no other router uses to reach source S

� Use periodic updates of form: – “this is my next-link to source S”

� If child is not the “next-link” for anyone, it is a leaf

� Detect group membership in leaf (IGMP)

Transm

isión de Datos Multimedia -Master IC 2007/2008

1

0

Reverse Path Multicast (RPM)

� Prune back transmission so that only absolutely necessary links carry traffic

� Use on-demand pruning so that router group state scales with number of active groups

xx yy

tt

SS

vv bbaa

aa bb

data message

prune message

Transm

isión de Datos Multimedia -Master IC 2007/2008

1

0

Basic RPM Idea

� Prune (Source,Group) at leaf if no members

� Send Non-Membership Report (NMR) up the tree

� If all children of router R prune (S,G)

� Propagate prune for (S,G) to parent R

� On timeout:

� Prune dropped

� Flow is reinstated

� Down stream routers re-prune

� Note: this is a soft-state approach

� Grafting: Explicitly reinstate sub-tree when

� IGMP detects new members at leaf, or when a child asks for a graft.

Transm

isión de Datos Multimedia -Master IC 2007/2008

1

0

Putting it together: Topology

G G

S

G

Transm

isión de Datos Multimedia -Master IC 2007/2008

1

0

Flood with Truncated Broadcast

G G

S

G

Transm

isión de Datos Multimedia -Master IC 2007/2008

1

0

Pruning

G G

S

Prune (s,g)

Prune (s,g)

G

Transm

isión de Datos Multimedia -Master IC 2007/2008

1

0

Graft (s,g)

Graft (s,g)

Grafting

G G

S

G

G

Report (g)

Transm

isión de Datos Multimedia -Master IC 2007/2008

1

0

After Grafting Complete

G G

S

G

G

Transm

isión de Datos Multimedia -Master IC 2007/2008

1

0

Reliable Multicast: The Goal

� Implement reliability on top of IP multicast

� Why is this hard ?

� Sender cannot keep state for unknown number of dynamic receivers

� Remember open & dynamic group semantic?

� Algorithms like TCP that estimate path properties such as RTT and congestion window don’t generalize to trees.

� Remember: TCP is only for a unicast session!

� Has to address (N)ACK implosions

Transm

isión de Datos Multimedia -Master IC 2007/2008

1

1

R1

Implosion

S

R3 R4

R2

21

R1

S

R3 R4

R2

Packet 1 is lost All 4 receivers request a resend

Resend request

Transm

isión de Datos Multimedia -Master IC 2007/2008

1

1

Retransmission

� Re-transmitter

� Options: sender, other receivers

� How to retransmit

� Unicast, multicast, scoped multicast, retransmission group, …

� Problem: retransmissions (aka repairs) may reach destinations that don’t require a retransmission

� A.k.a “exposure” problem

� Solution: subcast the re-transmission only to receivers that need it.

Transm

isión de Datos Multimedia -Master IC 2007/2008

1

1

R1

Why Subcast? Exposure problem…

S

R3 R4

R2

21

R1

S

R3 R4

R2

Packet 1 does not reach R1;Receiver 1 requests a resend

Packet 1 resent to all 4 receivers

1

1

Resend request Resent packet

Transm

isión de Datos Multimedia -Master IC 2007/2008

1

1

Ideal Recovery Model

S

R3 R4

R2

2

1

S

R3 R4

R2

Packet 1 reaches R1 but is lost before reaching other Receivers

Only one receiver sends NACK to the nearest S or R with packet

Resend request

1 1Resent packet

Repair sent

only to

those that

need packet

R1 R1

Transm

isión de Datos Multimedia -Master IC 2007/2008

1

1

Reliable Multicast Transport: Issues

� Retransmission can make reliable multicast as inefficient as replicated unicast

� (N)ACK-implosion if all destinations ack at once

� “Crying baby”: a bad link affects entire group

� Heterogeneity: receivers, links, group sizes

� Anonymous/Open/Dynamic Group Model:

� Source does not know # of destinations, and destinations may vanish

� Multicast applications do not need strong reliability of the type provided by TCP.

� Can tolerate some reordering, delay, etc