Tuesday, 12 April 2011

Managing remote service dependancies with IP SLA

A simple yet effective way to manage remote service dependancies is with IP SLA.

Imagine you have two routers connected to one ISP, behind those routers you have a shared ethernet LAN in which you have a firewall. The ISP offers you eBGP services and you receive a full routing table from them into each of your CPE routers. Your CPE facing LAN is configured over ethernet and you are running VRRP between your two routers and the firewall in this LAN (212.1.30.0/24). Devices in the CPE LAN all point to the VRRP virtual IP address for service to the Internet.

Screen shot 2011-04-13 at 20.59.09

So, everything is great, VRRP is up and running as master on RouterA. Now you’ve been extra sensible and setup VRRP to ‘track’ an object on each router to preference the VRRP master device. In our case we are tracking object 1, the object is looking for route 1.2.3.4/32 in the routing table.

Screen shot 2011-04-13 at 21.08.53

If the route exists then our priority s the default VRRP priority of 100 however if the route is lost we decrement by 10 and RouterB (configured with a priority of 95 and to preempt) takes over.

Screen shot 2011-04-13 at 21.08.22

You figure if the eBGP neighbor goes down you’ll lose the routing table so you be able to get out into the internet so you should maybe fail over the VRRP address to the other node (hopefully he’s got a full routing table).

Now, what happens if there is a problem on the ISP network so you are receiving a full routing table into both of the routers and yet behind one of the PE routers there is no upstream service and your data is being blackholed there. This is bad. Your VRRP service does NOT fail over because it was tracking a local route and the provider eBGP neighbor is still sending you a full table. Without manual intervention you are out of service.

So this is where ‘ip sla’ takes over. Cisco ip sla stands for IP Service Level Agreements and is used to provide telemetry information about service availability and capability for a number of network streams such as FTP, HTTP, Voice Jitter.

First lets setup a simple SLA object. For this example we’ll do an HTTP GET request against the http://www.defaultroute.co.uk website but htere are a number of different types of SLA (Jitter, ping,FTP etc). The response time from the site will be a deterministic value for GOOD or BAD status. Clearly a DOWN website might be for reasons other than a lost route but it’s just an example.

Screen shot 2011-04-13 at 21.13.44

Lets check to see if it’s running

Screen shot 2011-04-13 at 21.16.35

Looks good. We’re getting a round trip time (RTT) of 5060ms, the DNS lookup is around 52ms. The return code is OK which is what we are looking for. So now we’ll bind this sla monitor to a track object so we can add it to the VRRP tracking.

Screen shot 2011-04-13 at 21.13.35

Lets confirm thats in

Screen shot 2011-04-13 at 21.13.13

So now we’ll add the new track object to theVRRP configuration and we’ll add another 10 to the increment. So now the outcome of losing the route AND connectivity to the web server will reduce the value to 80. However the key here is that losing connectivity to the monitored web server OR the existence of the route to 1.2.3.4/32 will cause the VRRP service to failover.

Screen shot 2011-04-13 at 21.13.24

Job done - experiment and have fun

No comments:

Post a Comment