ICOM6012 Network Layer

Overview
Services and protocols
- transport segment from sending to receiving host
 - network layer protocols in every Internet device, including hosts and routers
 - IP provides best-effort services only
 - two functions
- forwarding (data plane): local action, move arriving packets from router's input link to appropriate router output link
 - routing (control plane): global action, performed by routing algorithms that determine end-to-end source-destination paths
- centralized (e.g. telephone network)
- "emerging" approach under the context of Software-Defined Networking (SDN)
 - routing is done by controller (a centralized server)
- Q: Can different routing algorithms be used easily, and why?
 - A: Yes. The controller determines the routing paths, so we can choose path x (the shortest) one time and path y (the lowest delay) the next. This is possible because the routers do not need to collaborate.
 
 - the controller determines the paths (based on various packet header fields), and configures the forwarding tables at routers
 - In SDN, routers are called "OpenFlow switches", because the routing function is done by the controller
- Q: Can we use this method in the whole network?
 - A: No. This method can only be used in small networks, such as campus, enterprise and datacenter networks.
 
 - should obey the OpenFlow Specification (standard)
 - routing can be designed by software (programming, you can control the routing path by yourself)
 
 - distributed
- routers collaborate with each other to find (shortest) paths (based on destination), and configure their own forwarding tables accordingly
 - self-healing
 - routing protocol is implemented inside routers
 - network operator lacks control of routing paths
 
 
 
 
Forwarding
What is inside a router
- router architecture overview

- input port functions
- decentralized switching (the red one)
- given the datagram destination, look up the output port using the forwarding table in input port memory
 - goal: complete input port processing at "line rate/speed" (use the full link bandwidth, so the input port does not become a bottleneck)
 - queuing: occurs if datagrams arrive faster than the forwarding rate into the switch fabric

 
 
 - switching fabrics
- memory: links have been generated
 - bus: just like broadcasting
 - crossbar: lots of buses, just like a small switch
- connect or disconnect crosspoints so that packets can be transmitted in parallel

 
 - some high-speed routers combine different techniques
 - switching rate: rate at which packets can be transferred
- output-queued switch: switching rate = N times the line rate
- allows N packets (each arriving at line rate R) to reach one output port at the same time
 - an arriving packet is output immediately (ideal), but this is expensive
 
 - input-queued switch: switching rate = line rate
- only allows a single packet at a time to one output port; the others wait (if the buffer overflows, TCP recovers the lost packets)
 - this is the method used in practice

 
 
 
 - output ports
- buffering required when datagrams arrive from fabric faster than the transmission rate
 - scheduling discipline chooses among queued datagrams for transmission
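
The input-port lookup described above can be sketched in a few lines. This is only an illustration: the prefixes and port numbers are made up, and it assumes longest-prefix matching over a plain list, which the notes do not name; real routers use specialised hardware to reach line rate.

```python
import ipaddress

# Hypothetical forwarding table: (destination prefix, output port).
FORWARDING_TABLE = [
    (ipaddress.ip_network("223.1.0.0/16"), 0),
    (ipaddress.ip_network("223.1.2.0/24"), 1),
    (ipaddress.ip_network("0.0.0.0/0"), 2),  # default route
]


def lookup_output_port(dest: str) -> int:
    """Return the output port of the longest prefix that contains dest."""
    addr = ipaddress.ip_address(dest)
    matches = [
        (net.prefixlen, port) for net, port in FORWARDING_TABLE if addr in net
    ]
    # The default route matches everything, so matches is never empty here.
    return max(matches)[1]
```

For example, `lookup_output_port("223.1.2.5")` matches all three entries and picks the /24, while an address outside 223.1.0.0/16 falls through to the default route.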

 
 
 
IP: the Internet protocol
- routing protocols and ICMP all rely on IP, so IP is the "only" standard in the network layer
 - datagram format
- ver: 4 bits, e.g. IPv4, IPv6
 - header length: 4 bits, counted in units of 4 bytes (just like TCP); the minimum header size is 20 bytes
 - type of service: 8 bits, selects which type of service should be used, because the designers believed IP could do more than best-effort delivery (but most people do not use this)
 - length: 16 bits, so the theoretical maximum datagram size is \(2^{16}-1\) bytes
- in practice it is smaller than that, because the link-layer frame has its own size limit
 
 - 16-bit identifier: identifies which fragments belong to the same original datagram
 - flags: fragmentation status, e.g. whether this packet is the last fragment of the original datagram
 - fragment offset: used to reassemble the original datagram in order
- many routers may not support fragmentation
 - try to avoid fragmenting messages (from the designer's point of view)
 - IPv6 removes the 16-bit identifier, flags and fragment offset from the base header
- fragmentation is done by the host, not the router
 
 
 - time to live: set by the sending host, initially at most 255 (the field has 8 bits)
- each router decrements it by 1; when it reaches 0, the datagram is dropped
 - prevents datagrams from looping forever
 
 - upper layer: identifies the protocol carried in the payload, linking the network layer to the layer above
- 6: TCP
 - 17: UDP
 - 89: OSPF
 
 - header checksum: hop-by-hop error detection computed over the header only; each router checks it before forwarding and recomputes it because the time to live value has changed
- the TCP and UDP checksums are end-to-end and cover the whole segment
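
The per-hop handling of time to live and header checksum can be sketched as follows. This is a minimal sketch, assuming a 20-byte header without options; the function names and the sample header are illustrative and not taken from the lecture.

```python
import struct
from typing import Optional


def internet_checksum(header: bytes) -> int:
    """16-bit one's-complement sum over the header, then complemented."""
    if len(header) % 2:
        header += b"\x00"
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def forward(header: bytes) -> Optional[bytes]:
    """Check the checksum, decrement TTL, recompute the checksum.

    Returns None when the datagram must be dropped.
    """
    if internet_checksum(header) != 0:  # a valid header checks to zero
        return None
    ttl = header[8] - 1                 # TTL is byte 8 of the header
    if ttl <= 0:
        return None
    hdr = bytearray(header)
    hdr[8] = ttl
    hdr[10:12] = b"\x00\x00"            # zero the checksum field first
    hdr[10:12] = struct.pack("!H", internet_checksum(bytes(hdr)))
    return bytes(hdr)


# Sample header: UDP (17), TTL 64, 192.168.0.1 -> 192.168.0.199.
SAMPLE = bytes.fromhex("45000073000040004011b861c0a80001c0a800c7")
```

Because only the header is covered, the router never needs to touch the payload; the transport-layer checksum is left for the receiving host.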

 
 
 
IP addressing
- IP address is associated with interface, not host
- multiple interfaces mean multiple IP addresses
 - interface: connection between host/router and physical/wireless link
- router has multiple interfaces, typically
 - host has one or two interfaces, typically (e.g. wired Ethernet, WiFi, Bluetooth)
 - Q: How are interfaces actually connected?
 - A: Ethernet switches, WiFi base station, etc.
 
 
 - subnet: a network inside a network
- device interfaces can communicate with each other without routers
 - should have unique subnet address
 - each forwarding table entry corresponds to a subnet, or a range of addresses, to keep the forwarding table small
- a large forwarding table would become a bottleneck, because every lookup has to search it
 
 
 - structure
- subnet part: high order bits (subnet mask)
 - host part: remaining low order bits
- within a subnet, host address must be unique
 
 
 - old days: classful addressing

 - CIDR (classless interdomain routing)
- subnet portion of address of arbitrary length
 - address format: a.b.c.d/x, where x is the number of bits in the subnet portion of the address
 - special address: 0.0.0.0 means the host has no IP address yet

 
 - how to get IP addresses
- host part: DHCP (dynamic host configuration protocol)
- a DHCP server sits in the subnet (e.g. at home, the WiFi router contains a DHCP server), so no configuration by the user is needed
 - overview (DORA: discover, offer, request, ack)
- host broadcasts "DHCP discover" msg (optional: when renewing after a lease timeout, the host jumps straight to the last two steps)
- broadcast
 - src: 0.0.0.0 68
 - dest: 255.255.255.255 67
 
 - DHCP server responds with "DHCP offer" msg (optional; there can be multiple offers, but the host accepts only one)
- broadcast
 - src: 223.1.2.5 67
 - dest: 255.255.255.255 68
 - lifetime: 3600s
 
 - host requests IP address: "DHCP request" msg
- broadcast (tells all DHCP servers which offer the host accepted, so the others know their offers were not taken)
 - src: 0.0.0.0 68
 - dest: 255.255.255.255 67
 - lifetime: 3600s
 
 - DHCP server sends address: "DHCP ack" msg
- broadcast
 - src: 223.1.2.5 67
 - dest: 255.255.255.255 68
 - lifetime: 3600s
 
 
 - other parts of DHCP
- IP address of first-hop router
 - name and IP address of DNS server (e.g. dns.google.com -> 8.8.8.8; it is free because the operator collects data)
 - subnet mask (indicating network versus host portion of address)
- it determines whether a packet transmission needs a router
 
 
 
 - subnet part: from ISP/ICANN -> public IP addresses (unique on the Internet)
- but when setting up a new WiFi router, the network part comes from the private IP address ranges (unique only within the home network, so the same address can also be used in other networks)
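
The a.b.c.d/x notation and the subnet-mask test from the notes above can be tried out with Python's standard `ipaddress` module. A small sketch: the helper `needs_router` is a hypothetical name, and the addresses only reuse the 223.1.2.x example from the DHCP messages.

```python
import ipaddress

subnet = ipaddress.ip_network("223.1.2.0/24")  # a.b.c.d/x with x = 24

print(subnet.netmask)        # subnet mask in dotted form
print(subnet.num_addresses)  # size of the address range


def needs_router(src: str, dst: str, prefix_len: int) -> bool:
    """True when dst lies outside src's subnet.

    In that case the packet is sent to the first-hop router instead of
    directly to the destination interface.
    """
    src_net = ipaddress.ip_interface(f"{src}/{prefix_len}").network
    return ipaddress.ip_address(dst) not in src_net
```

With a /24 mask, 223.1.2.5 can reach 223.1.2.9 directly, whereas a packet for 223.1.3.9 has to go through the router.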
 
 
 NAT (network address translation)
- used in routers
- uses a port number (which must be unique and cannot be used by other processes) to match the two sides of a connection
 
 - translates a set of IP addresses to another set of IP addresses (using translation table)
 - help preserve the limited amount of IPv4 public IP addresses (with private IP addresses)
- public IP addresses
- publicly registered
 - directly access the Internet with a public IP address
 
 - private IP addresses
- not publicly registered
 - cannot directly access the Internet with a private IP address
 - only used internally; these IP addresses cannot be seen on the Internet or by its routers
 - if a packet on the Internet contains these IP addresses, it may be considered an error and dropped immediately
 
 
 
| name | start IP address | end IP address | subnet | remark |
| --- | --- | --- | --- | --- |
| 24-bit block | 10.0.0.0 | 10.255.255.255 | 10.0.0.0/8 | Apple uses this |
| 20-bit block | 172.16.0.0 | 172.31.255.255 | 172.16.0.0/12 | not many use this |
| 16-bit block | 192.168.0.0 | 192.168.255.255 | 192.168.0.0/16 | most use this, e.g. ASUS |

- advantage: good for security, just like a firewall, which is why we may still use NAT even after moving to IPv6
 - outside hosts want to communicate with an internal host
- DDNS
 - configure the NAT translation table in advance (the router allows you to do that); this is called port forwarding
 
 - Q: Is the IP address you find on your iPhone really your smartphone's own IP address?
- A: No. This IP address is assigned by the WiFi router; on an iPhone you can dial *3001#12345#* to check.
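
The translation table and port forwarding described above can be sketched as a toy class. Everything here is an assumption made for illustration: the class name `Nat`, the starting port 5001 and the addresses are hypothetical, and a real NAT also tracks protocol, timeouts and port collisions.

```python
import itertools


class Nat:
    """Toy NAT: maps (private IP, private port) to a unique public port."""

    def __init__(self, public_ip, first_port=5001):
        self.public_ip = public_ip
        self._next_port = itertools.count(first_port)
        self.out = {}   # (private ip, private port) -> public port
        self.back = {}  # public port -> (private ip, private port)

    def outbound(self, src_ip, src_port):
        """Rewrite the source of an outgoing packet, creating an entry if needed."""
        key = (src_ip, src_port)
        if key not in self.out:
            port = next(self._next_port)
            self.out[key] = port
            self.back[port] = key
        return self.public_ip, self.out[key]

    def inbound(self, dst_port):
        """Return the internal (ip, port), or None if no entry exists (drop)."""
        return self.back.get(dst_port)

    def port_forward(self, public_port, private_ip, private_port):
        """Static entry configured in advance so outside hosts can connect in."""
        self.back[public_port] = (private_ip, private_port)
        self.out[(private_ip, private_port)] = public_port
```

An unsolicited packet from outside finds no table entry and is dropped, which is the firewall-like behaviour mentioned above; `port_forward` is the deliberate exception.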
 
 
 IPv6
- differences between IPv4 and IPv6

 - motivations
- to solve the IPv4 address space shortage problem
 - to speed up packet processing/forwarding (by flow label)
 - to facilitate QoS
 
 - transition from IPv4 to IPv6
- use tunneling: IPv6 datagram carried as payload in IPv4 datagram among IPv4 routers

 - set the upper layer protocol field in the IPv4 header to 41, so that the tunnel endpoint knows the IPv4 datagram carries an IPv6 datagram as its payload
 - problems
- more overhead
 - packets may become too big and be fragmented by a router
- consider that the maximum payload (MTU) of an Ethernet frame is 1500 bytes
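
A quick calculation shows how the tunnel overhead eats into the Ethernet limit. This is a sketch under stated assumptions: a 1500-byte Ethernet payload limit, a 20-byte outer IPv4 header without options, and the fixed 40-byte IPv6 header; the names are illustrative.

```python
ETHERNET_MTU = 1500  # bytes of payload per Ethernet frame
IPV4_HEADER = 20     # outer IPv4 header, assuming no options
IPV6_HEADER = 40     # fixed IPv6 header


def fits_in_tunnel(ipv6_packet_len: int) -> bool:
    """True if the IPv6 packet plus the outer IPv4 header fits in one frame."""
    return ipv6_packet_len + IPV4_HEADER <= ETHERNET_MTU


# Largest IPv6 packet that can be tunnelled without fragmentation,
# and the payload left inside it after the IPv6 header.
max_inner_packet = ETHERNET_MTU - IPV4_HEADER
max_inner_payload = max_inner_packet - IPV6_HEADER
```

An IPv6 packet that filled a 1500-byte frame on its own no longer fits once the outer IPv4 header is added, which is where the fragmentation problem comes from.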
 
 
 - example

 
 
 
Routing
Classification overview

Link-state routing
- net topology, link costs known to all nodes
- achieved via "link state broadcast"; each router knows its own neighbours by configuration
 - all nodes have same info
 
 - each node computes its shortest paths to all other nodes using Dijkstra's algorithm
- based on the shortest paths found, configure the local forwarding table
 
 - OSPF (open shortest path first)
- open: publicly available (Cisco may have monopolized the market in the past)
 - link state routing
- LS packet dissemination
 - topology map at each node
 - route computation using Dijkstra's algorithm
 
 - OSPF advertisement message carries one entry per neighbour
 - advertisement flooded to entire network
- directly over IP (rather than TCP or UDP) with "upper layer = 89"
 
 - reliability (although it runs directly over IP)
- by retransmission, just like DNS
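
The route computation step can be sketched as Dijkstra's algorithm over the topology map, returning the first hop for each destination so the local forwarding table can be filled in. This is a minimal sketch: the graph format and the function name are assumptions, and a real OSPF implementation does considerably more.

```python
import heapq


def dijkstra(graph, source):
    """Return (dist, first_hop) for every node reachable from source.

    graph: {node: {neighbour: link_cost}} with non-negative costs.
    first_hop[dest] is the neighbour of source to forward to.
    """
    dist = {source: 0}
    first_hop = {}
    heap = [(0, source, None)]
    done = set()
    while heap:
        d, node, hop = heapq.heappop(heap)
        if node in done:
            continue  # stale heap entry
        done.add(node)
        if hop is not None:
            first_hop[node] = hop
        for nbr, cost in graph[node].items():
            nd = d + cost
            if nbr not in done and nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                # Leaving the source, the first hop is the neighbour itself.
                heapq.heappush(heap, (nd, nbr, nbr if node == source else hop))
    return dist, first_hop
```

Since all nodes hold the same topology information, every router runs this with itself as `source` and obtains mutually consistent forwarding tables.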
 
 
 

Distance vector routing
- distance = hop count
 - vector = next hop
 - by periodically exchanging distance vectors (DVs) with neighbours, each router learns its neighbours' distances to destinations
 - each router uses the Bellman-Ford algorithm to refine its own DV
- e.g., using neighbour with the shortest distance to a destination as next hop
 - "routing by rumors"

 
 - LS vs DV
- LS has the global topology and information, and each router calculates the paths by itself
 - DV just trusts others: routers share information with each other, and each router knows only what its neighbours know
 - the resulting paths may be the same
 - message complexity
- LS: with n nodes & E links, O(nE) messages sent
 - DV: exchange between neighbours only
 
 - speed of convergence
- LS: relatively fast
 - DV: convergence time varies
 
 - robustness
- LS: node can advertise incorrect link cost, but each node computes only its own table
 - DV: node can advertise incorrect path cost, and every other node's table trusts and uses it (error propagation)
 
 
 - RIP (routing information protocol)
- distance metric: the number of hops (max = 15), each link has cost 1
 - DVs exchanged with neighbours every 30 sec in advertisement messages
 - advertisements sent in UDP segments
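
The per-advertisement update a distance vector router performs can be sketched as below. It is a simplified illustration of the Bellman-Ford update with RIP-style hop counts; the function name and data layout are assumptions, and timers, split horizon and similar refinements are left out.

```python
INFINITY = 16  # with a maximum of 15 hops, 16 stands for "unreachable"


def dv_update(own_dv, next_hop, neighbour, neighbour_dv, link_cost=1):
    """Apply the Bellman-Ford update for one received distance vector.

    own_dv:   {dest: distance} of this router (modified in place)
    next_hop: {dest: neighbour used to reach dest} (modified in place)
    Returns True if own_dv changed and should be advertised again.
    """
    changed = False
    for dest, dist in neighbour_dv.items():
        candidate = min(link_cost + dist, INFINITY)
        current = own_dv.get(dest, INFINITY)
        via_this_neighbour = next_hop.get(dest) == neighbour
        # Take a shorter path, or follow the current next hop's new distance.
        if candidate < current or (via_this_neighbour and candidate != current):
            own_dv[dest] = candidate
            next_hop[dest] = neighbour
            changed = True
    return changed
```

The router never sees the topology, only the distances its neighbours claim, which is why a wrong advertisement spreads ("routing by rumors").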
 
 

Hierarchical routing
- aggregate routers into regions, called "autonomous systems" (AS)
- each AS is assigned a unique AS number (originally 16 bits, now extended to 32 bits because of the shortage)
 - routers in same AS run same routing protocol
- "intra-AS" routing protocol, e.g. OSPF, RIP
 - routers in different ASes can run different intra-AS routing protocols
 
 - ASes must be interconnected via gateway routers
 - forwarding table configured by both intra- and inter-AS routing algorithms
- intra-AS sets entries for internal dests
 - inter-AS & intra-AS set entries for external dests
 - with multiple ASes to choose from, we can use hot potato routing (the shortest way out) or follow a policy (e.g. content provider), etc.
 - two layers are enough today, but in the future it is hard to say
 
 
 

BGP (border gateway protocol)
- the current version is 4 (BGP-4)
 - glue that holds the Internet/ASes together
 - tasks
- for outbound traffic (how to find suitable way to other ASes)
- obtain subnet reachability information from neighbouring ASes
 - propagate reachability information to all AS-internal routers
 - determine good routes to other networks based on reachability information and policy
 
 - for inbound traffic
- advertise subnets that the AS can help to reach
 - so that hosts in outside ASes can reach my AS, i.e. make sure the hosts of my AS are visible
 
 
 BGP session: two BGP routers ("peers") exchange BGP messages
- advertising paths to different subnets ("path vector" protocol)
 - exchange over TCP connections (server port 179)
- reliable data transfer
 - permanent connection, so the connection setup overhead is paid only once
 
 - category
- eBGP (external BGP)
- logical and TCP connection
 - direct link
 
 - iBGP (internal BGP)
- logical and TCP connection
 - share information (purpose)
 
 
 

- comprehensive example
- AS3 is willing to carry transit traffic to N1: router 3a advertises path (N1, AS3) to router 1c over eBGP session
 - 1c applies its IMPORT policies to decide whether it wants to forward packets to N1 via 3a
- if yes, forwarding table in 1c is updated to indicate 3a as the next-hop for N1
 
 - based on its EXPORT policies, assume AS1 is willing to carry transit traffic (from other ASes) to N1 (if AS3 is AS1 customer)
 - AS1 advertises path (N1, AS3, AS1) to AS2 via eBGP session. (note: 1b receives path (N1, AS3) from 1c via iBGP session)

 
 - elimination rules
- local preference (LOCAL_PREF) value attribute: policy decision
 - shortest AS-PATH: AS hops rather than router hops
 - closest NEXT-HOP router: hot potato routing
 - additional criteria: backbone, Tier-1 ISP, etc.
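
The first three elimination rules can be sketched as an ordered comparison. This is an illustrative sketch only: the `Route` fields and the sample values are hypothetical, it assumes a higher LOCAL_PREF is preferred, and real BGP applies further tie-breakers.

```python
from dataclasses import dataclass


@dataclass
class Route:
    prefix: str
    as_path: tuple       # ASes on the path, e.g. ("AS2", "AS3")
    local_pref: int      # policy value; higher is preferred (assumed)
    next_hop_cost: int   # intra-AS cost to reach the NEXT-HOP router


def best_route(candidates):
    """Apply the rules in order; ties fall through to the next rule."""
    return min(
        candidates,
        key=lambda r: (-r.local_pref, len(r.as_path), r.next_hop_cost),
    )
```

Because the policy value is compared first, a longer AS path can still win if the operator prefers it, and hot potato routing only decides among routes that are otherwise equal.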
 
 
Internet Control Message Protocol (ICMP)
Overview
- mainly focus on error reporting (also echo request/reply by ping)

 - network-layer above IP
- upper layer protocol = 1
 
 - ICMP message: type and code

 
Traceroute and ICMP
- source sends three UDP segments to dest
- first set has TTL = 1
 - second set has TTL = 2, etc.
 - unlikely port number (very likely no process is listening on that port at the dest)
 
 - when nth set of datagrams arrives to nth router
- router discards datagrams
 - and sends source ICMP messages (type 11, code 0)
 - ICMP messages include the name of the router & its IP address
 
 - when ICMP messages arrive, source records RTTs
 - stopping criteria
- UDP segment eventually arrives at destination host
 - destination returns ICMP "port unreachable" message (type 3, code 3)
 - source stops
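
The probing loop can be illustrated with a small simulation rather than real sockets. It is a sketch under simplifying assumptions: the path is a made-up list of names, one probe is sent per TTL instead of three, and RTTs are not measured.

```python
def send_probe(path, ttl):
    """Simulate one UDP probe along path (routers, then the destination host).

    Returns (responder, icmp_type, icmp_code).
    """
    for router in path[:-1]:
        ttl -= 1
        if ttl == 0:
            return router, 11, 0  # TTL expired at this router
    return path[-1], 3, 3         # reached the host: port unreachable


def traceroute(path, max_ttl=30):
    """Return the responders in order, stopping at the destination."""
    hops = []
    for ttl in range(1, max_ttl + 1):
        responder, icmp_type, icmp_code = send_probe(path, ttl)
        hops.append(responder)
        if (icmp_type, icmp_code) == (3, 3):  # stopping criterion
            break
    return hops
```

For a hypothetical path `["r1", "r2", "r3", "host"]`, the probes with TTL 1 to 3 are answered by the routers in turn and the fourth probe is answered by the host, at which point the source stops.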
 
 
Datacenter Networks
Load balancer: application-layer routing
- receives external client requests
 - directs workload within datacenter
- datacenter TCP (specifically)
 
 - returns results to external client (hiding datacenter internals from client)
 
