DEPLOYING MPLS AND DiffServ IN THE NETWORK
Existing IP networks mainly carry Internet traffic. Traditionally this traffic has been treated as 'best efforts', meaning that there are no guarantees on packet loss, delay or jitter in the network. Because an increasing number of people depend on the Internet for their work or private life, the demand for high-quality Internet services has increased. Internet Service Providers (ISPs) now offer Service Level Agreements (SLAs) to their customers, specifying the quality they intend to provide in terms of availability, packet loss and other parameters.
By deploying Virtual Private Networks (VPNs) on an IP backbone, the network operator can offer services that it controls end-to-end. On the Internet this is not normally possible, because most traffic originates or terminates on another ISP's network, so its quality is not fully controlled by any single ISP. Companies can use these VPNs to connect their offices and data centres, or even to support transactions with other companies (an IP extranet). Many of the applications running on this service are business critical to these companies (eg, financial transactions and trading) and hence require very high availability. The network operator not only needs to offer an SLA that meets the requirements of these companies, but also has to design and engineer the network to deliver the quality needed.
Voice over IP (VoIP) and video applications place even higher demands on the network. They need not only high availability and the lowest possible delay, but also low, bounded jitter (variation in delay). Because these applications are interactive, any failure to meet these requirements has a direct impact on service quality. Of course, these different quality offerings also allow the network operator to price the services differently. In this way the operator can earn more from its value-added services, increasing revenue and margins.
Network philosophy
It may sound very obvious (and it is!), but good network design is the key to providing good service. In a network with a redundant architecture and sufficient capacity, all services and applications will be of high quality. Many services are billed on actual usage, so more transported packets means more revenue for the operator. The network should not have any bottlenecks under normal conditions; in other words, it should be over-provisioned.
In more detail, this means that, with the use of Traffic Engineering, the network can handle all traffic even if the most critical link fails. The key to QoS is to maximise the amount of time the network runs in this optimal condition. Factors contributing to this include redundancy (dual equipment), a stable routing design, regular configuration audits and mechanisms to deal with Denial of Service (DoS) attacks. To provide QoS, it is better to spend resources and effort on these factors than on the various mechanisms, described later, that deal with failure scenarios. Deploying too complex a feature set, with too many components, will make the network unreliable and unstable.
Unfortunately, even well-designed networks have problems: links fail, routers crash and configuration mistakes happen. In such situations the performance of the network will not be optimal, and it becomes necessary to differentiate between the various services and applications. Once some services are considered more important than others, it is necessary to decide how many 'levels' or 'classes' of service are needed. The following two criteria can be used to validate the number of classes:
1. Does each class have a clear application?
2. Can end users distinguish between the classes?
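An example of classes based on the requirements described above is:
- Real-time: VoIP and video, requiring high availability, low delay and low, bounded jitter.
- Assured: business-critical applications such as financial transactions and trading, requiring very high availability (low packet loss).
- Best efforts: general Internet traffic, with no guarantees on packet loss, delay or jitter.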
It is obvious in this case that each class has a clear application, but the answer to the second question, whether end users can distinguish between the classes, really depends on the network's architecture and implementation. The following sections describe how MPLS and DiffServ can be used to deliver these classes.
MPLS
Multiprotocol Label Switching (MPLS) is a label swapping technology for forwarding data packets in a network. A short, fixed-length label is attached to every packet entering the network, and that label then defines the path the packet follows through the network. This path is predefined and can be set up to satisfy certain constraints, such as a minimum amount of bandwidth or a maximum delay. In 'traditional' IP networks every router makes an independent forwarding decision for each packet (normally based on the shortest path); with MPLS, the first router in the network decides on the entire path. Deploying MPLS enables an operator to use traffic engineering and fast reroute on its backbone, and also to build VPNs.
The first major application of MPLS was traffic engineering. In traditional IP networks packets follow the shortest path from source to destination. This can cause congestion at certain places in the network because traffic is unevenly distributed. With MPLS traffic engineering, predefined paths are set up between the routers in the network, and each path is given a bandwidth requirement. The network then configures itself, using a constraint-based routing protocol, to satisfy all these requirements. Traffic may no longer always follow the shortest path, but it will follow a path on which the bandwidth it needs is available. Even on an over-provisioned network, traffic engineering can be useful.
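The constraint-based routing step can be illustrated with a constrained shortest path first (CSPF) computation: links without enough unreserved bandwidth are pruned, and an ordinary shortest-path search runs on what remains. The Python below is a minimal sketch under stated assumptions: a single bandwidth constraint, a static link table, and the hypothetical helpers `cspf` and `reserve`. In a real network the available bandwidth is distributed by the routing protocol and the chosen path is then signalled to the routers along it; neither step is modelled here.

```python
import heapq


def cspf(links, src, dst, bandwidth):
    """Constrained shortest path first (sketch).

    links maps a directed link (a, b) to (cost, available_bandwidth);
    list both directions for a bidirectional link. Returns the node list
    of the cheapest path whose links all have at least `bandwidth` free,
    or None if no such path exists.
    """
    # Step 1: prune links that cannot carry the requested bandwidth.
    adj = {}
    for (a, b), (cost, avail) in links.items():
        if avail >= bandwidth:
            adj.setdefault(a, []).append((b, cost))

    # Step 2: plain Dijkstra on the pruned topology.
    dist, prev, heap = {src: 0}, {}, [(0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            break
        if d > dist[node]:
            continue  # stale heap entry
        for nxt, cost in adj.get(node, []):
            nd = d + cost
            if nd < dist.get(nxt, float("inf")):
                dist[nxt], prev[nxt] = nd, node
                heapq.heappush(heap, (nd, nxt))

    if dst not in dist:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def reserve(links, path, bandwidth):
    """Subtract an LSP's bandwidth from every link along its path."""
    for a, b in zip(path, path[1:]):
        cost, avail = links[(a, b)]
        links[(a, b)] = (cost, avail - bandwidth)


# One direction only, for brevity; costs are IGP metrics, bandwidth in Mbit/s.
links = {
    ("A", "B"): (1, 10), ("B", "D"): (1, 10),
    ("A", "C"): (2, 10), ("C", "D"): (2, 10),
}
first = cspf(links, "A", "D", 8)   # ['A', 'B', 'D']: the shortest path
reserve(links, first, 8)
second = cspf(links, "A", "D", 5)  # ['A', 'C', 'D']: A-B has only 2 left
print(first, second)
```

The second LSP takes the longer path A-C-D because the shorter one no longer has 5 Mbit/s free, which is the behaviour described above: traffic may leave the shortest path, but only for a path with enough bandwidth.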
Although in theory sufficient bandwidth should be available, hotspots can still occur in the network, caused by factors such as major outages, explosive traffic growth (combined with slow capacity deployment) or uneven traffic distribution. With traffic engineering, traffic can be rerouted around these hotspots. A Label Switched Path (LSP), an MPLS 'tunnel', is set up from every core router to every other core router in the network. A bandwidth constraint, based on collected traffic statistics, is then configured on each LSP, and the network routes the LSPs over paths where the required bandwidth is available. The bandwidth values need to be adjusted daily or weekly to keep the network optimised.
Another, more recent, application of MPLS is Fast Reroute (FRR). When links or nodes fail, it takes time for the network to distribute the changed topology information to all the other nodes, which then re-converge to route around the failure. This typically takes on the order of several seconds, even in the best case. Certain applications, however (eg, voice or trading), require faster convergence than that, and FRR is the tool to achieve it. Fig. 1 shows how FRR works. For each protected link in the network, the routers on each side of that link pre-compute and set up a backup LSP to the router on the other side. This backup LSP follows a redundant path that avoids the protected link. If the link fails, the router detects the failure and can immediately switch traffic to the local backup LSP. In the meantime it signals back to the head-end router to compute a new end-to-end LSP, and traffic flows over the local backup LSP until this new LSP is ready. It is important to note that backup LSPs do not make bandwidth reservations as the primary LSPs do. This means that during the FRR period (a few seconds) the network might experience some congestion; DiffServ is used to deal with that.
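The per-link protection described above can be sketched as two steps: before any failure, compute for every link a bypass path between its two end routers that avoids the link; on failure, let the upstream router splice that bypass into the primary path on its own. The Python below is a minimal sketch assuming fewest-hop (BFS) paths and a topology given as an adjacency dictionary; `backup_path`, `precompute_backups` and `splice_on_failure` are hypothetical names, and, as in the text, the bypass makes no bandwidth reservation.

```python
from collections import deque


def backup_path(adj, a, b):
    """Fewest-hop path from a to b that avoids the direct link a-b (BFS)."""
    prev = {a: None}
    queue = deque([a])
    while queue:
        node = queue.popleft()
        if node == b:
            break
        for nxt in adj[node]:
            if node == a and nxt == b:
                continue  # the protected link itself is not allowed
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    if b not in prev:
        return None  # no redundant path: the link cannot be protected
    path = [b]
    while prev[path[-1]] is not None:
        path.append(prev[path[-1]])
    return path[::-1]


def precompute_backups(adj):
    """Done before any failure: one bypass per direction of every link."""
    return {(a, b): backup_path(adj, a, b) for a in adj for b in adj[a]}


def splice_on_failure(primary, backups, failed):
    """Hop sequence after the router upstream of `failed` switches locally.

    No signalling is modelled: the upstream router simply forwards into the
    precomputed bypass, which rejoins the primary path at the far end.
    """
    hops = [primary[0]]
    for a, b in zip(primary, primary[1:]):
        if {a, b} == set(failed):
            bypass = backups[(a, b)]
            if bypass is None:
                raise RuntimeError(f"link {a}-{b} has no backup")
            hops.extend(bypass[1:])
        else:
            hops.append(b)
    return hops


# A four-router ring A-B-C-D-A; the primary LSP runs A -> B -> C.
adj = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B", "D"], "D": ["A", "C"]}
backups = precompute_backups(adj)
print(backups[("A", "B")])                                      # ['A', 'D', 'C', 'B']
print(splice_on_failure(["A", "B", "C"], backups, ("A", "B")))  # ['A', 'D', 'C', 'B', 'C']
```

The spliced result A-D-C-B-C shows why the head end is still asked to compute a new end-to-end LSP: the local bypass restores connectivity at once, but traffic passes C, is carried on to B and then returns to C until the re-optimised path is ready.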
FRR can restore connectivity within milliseconds of a failure (in practice, in less than 100 ms). The key is that the backup LSP has already been established and only the router detecting the failure needs to take action; no signalling between routers is required, unlike in networks without FRR. This brings SDH/SONET-like restoration times to IP networks. Although FRR is the only mechanism currently available that achieves restoration times of this order, work is under way to improve the convergence of non-MPLS networks as well. By tuning the routing protocols (such as IS-IS and OSPF), it appears possible to achieve convergence times of around one second or less.
The deployment of MPLS is not trivial. It requires a 'seamless' network, built as a single Autonomous System (AS) with a single interior gateway protocol (IGP) and without multiple areas or levels. This is often not the case in existing networks, and can slow down the deployment of MPLS. Although interoperability between different router vendors is not normally a problem, having more than one vendor might slow down the deployment of new MPLS features (eg, FRR), because every vendor needs to fully support the standard. Another important issue is the education and training of the operations staff. MPLS adds a lot of new complexity to the network, and the network operators will need to understand the technology.
Differentiated services
Although MPLS is deployed to prevent congestion, congestion can still occur during major failures or during FRR. In these periods it is essential to make sure that the critical services receive the quality they need, for example the delay and jitter guarantees for VoIP traffic. Queuing and scheduling mechanisms in the routers are used to achieve this. Normally all traffic shares the same queue on an outgoing interface, and when more traffic is sent than the interface can service, the queue fills up and packets are eventually dropped.
This causes delay, jitter and packet loss. Assuming the example of three classes (best efforts, assured and real-time), three parallel queues can be set up, each serviced according to predefined priorities. Two basic queuing/scheduling mechanisms are available: Weighted Round Robin (WRR) and Strict Priority. WRR allocates a configured share of the bandwidth to each queue, guaranteeing that each service gets its fair share (as defined) of the link bandwidth. The Strict Priority queue, however, is serviced whenever it contains a packet, meaning that traffic in this class has absolute priority over all other traffic. This can be used for the real-time service, as it guarantees low delay and jitter.
Although the theory is not very complex, configuring the queue parameters proves to be very hard. The question is how to translate service and application requirements into actual queue configurations. Experimentation is needed to find the right values and fine-tune the configuration, bearing in mind that the configuration only comes into play when there is congestion. In the classes defined earlier, the Strict Priority queue is used for real-time traffic, and WRR for assured and best efforts traffic. If the traffic remaining after the Strict Priority queue has been served is expected to be 20% assured and 80% best efforts, the WRR queues are configured with 90% of the bandwidth for assured traffic and 10% for best efforts. In this set-up the assured service can use up to 90% of the available bandwidth (excluding real-time traffic), to make sure it does not drop any packets, since only 20% of the traffic is expected to be assured (based on statistics collected earlier). This 90/10 split is a rule of thumb based on experience; other configurations are possible as well.
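The combination of Strict Priority and WRR can be sketched as a small scheduler: the real-time queue is always served first, and otherwise the assured and best efforts queues share transmission slots in a 9:1 ratio, matching the 90/10 split above. This Python is a minimal sketch assuming equal-sized packets (so packet counts stand in for bandwidth) and a hypothetical `SpWrrScheduler` class; it is not any router's actual implementation. With those weights the assured class is guaranteed 0.9 of the bandwidth left after real-time traffic while it is expected to need only 0.2 of it, so its load could grow by a factor of 0.9 / 0.2 = 4.5 before it exceeds its guaranteed share while best efforts is also saturated.

```python
from collections import deque


class SpWrrScheduler:
    """Strict priority for one class, weighted round robin for the others.

    Sketch only: assumes equal-sized packets, so WRR slot counts equal
    bandwidth shares. With variable packet sizes a byte-based variant
    (such as deficit round robin) would be needed.
    """

    def __init__(self, priority, weights):
        self.priority = priority
        self.queues = {priority: deque()}
        self.queues.update({name: deque() for name in weights})
        # One WRR round: `weight` consecutive slots for each class.
        self.slots = [name for name, w in weights.items() for _ in range(w)]
        self.pos = 0

    def enqueue(self, cls, packet):
        self.queues[cls].append(packet)

    def dequeue(self):
        """Return (class, packet) to transmit next, or None if all queues are empty."""
        # Strict priority: real-time goes first whenever it has a packet.
        if self.queues[self.priority]:
            return self.priority, self.queues[self.priority].popleft()
        # WRR, work-conserving: slots of empty queues are skipped.
        for _ in range(len(self.slots)):
            cls = self.slots[self.pos]
            self.pos = (self.pos + 1) % len(self.slots)
            if self.queues[cls]:
                return cls, self.queues[cls].popleft()
        return None


# Congestion: assured and best efforts both have a backlog, no real-time.
sched = SpWrrScheduler("real_time", {"assured": 9, "best_effort": 1})
for i in range(1000):
    sched.enqueue("assured", i)
    sched.enqueue("best_effort", i)
served = [sched.dequeue()[0] for _ in range(100)]
print(served.count("assured"), served.count("best_effort"))  # 90 10

# A voice packet arrives and is sent ahead of both backlogs.
sched.enqueue("real_time", "voice")
print(sched.dequeue())  # ('real_time', 'voice')
```

Because the scheduler is work-conserving, a class with an empty queue gives up its slots, so best efforts can use the whole link when there is no assured traffic. The flip side of strict priority is that an unbounded real-time load would starve the other queues; the sketch does not limit it.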
When deploying DiffServ it is also very important to set up monitoring and statistics collection for the classes defined on the network.
Conclusion
MPLS is used for Traffic Engineering and Fast Reroute, and DiffServ deals with congestion. The remaining question is whether end users can distinguish between the classes. For the real-time service, FRR and Strict Priority queuing are used to meet the availability, delay and jitter guarantees; for assured traffic, most of the remaining bandwidth is allocated so that its availability requirement (low packet loss) is met. The rest of the bandwidth is available for best efforts traffic. MPLS traffic engineering optimises the network and increases the quality of all services. The operational aspects of MPLS and DiffServ deployment should not be overlooked. Development, education and training take a lot of resources, and they might not be worthwhile in every situation. It all depends on what is required of the network, and that will be different for every operator. It is also worth bearing in mind that too much complexity creates an unreliable and unstable network, so it is preferable to deploy the simplest solution that meets the requirements.
Thomas Telkamp, Director, Global Data Architecture & Technology, Global Crossing, Hilversum, The Netherlands
© International Clearing House Ltd 1997-2002.