Question:
Telecommunication network providers and users are concerned about the single point of failure in the “last mile”, which is the single cable from the network provider’s switching station to the customer’s premises. How can a customer protect against that single point of failure? Provide an analysis on whether this presents a good cost-benefit trade-off.
Response:
The obvious answer is to use redundant providers, but redundant links alone do not provide redundancy. To be truly redundant, the solution must incorporate transparent failover. This is no different from a blown electrical circuit in your home: if the freezer is connected to a circuit that blows, the fact that an adjacent outlet could power the freezer is meaningless if you are asleep or on vacation. For a system to have no single point of failure, redundant infrastructure (the easy part) must exist, but the system also needs to be self-healing. This concept underpins the field of site reliability engineering (SRE), which focuses on the self-healing aspects of information systems at scale.

Consumers or SMBs looking to protect themselves from “last mile” failures through infrastructure redundancy could, in principle, use a dial-up connection as a backup, but few still have a POTS line. The more likely option is a router that handles both wireline broadband and wireless broadband connections. Devices like the Failsafe Gigabit N Router for Mobile Broadband from Cradlepoint provide a cost-effective way to achieve transparent circuit failover. Because most ingress and egress traffic on consumer-grade networks (e.g., your home network) is NAT’d, a move from one provider to another can be performed quickly and with little disruption. NAT’d traffic moves between your LAN and the Internet using a single public IP address (typically a DHCP address assigned by your provider), so internal hosts need no reconfiguration when the router switches providers; only the public address changes, although established sessions generally have to reconnect. This makes NAT-based failover a reasonable approach to redundancy.
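To make “transparent failover” concrete, here is a minimal Python sketch of the decision a dual-WAN router makes: probe each link, mark a link down only after several consecutive failed probes (and up again only after several consecutive successes, so a single dropped ping does not make traffic flap between providers), and send traffic over the highest-priority healthy link. The class names, thresholds, and priority scheme are hypothetical illustrations, not Cradlepoint’s or pfSense’s actual implementation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class WanLink:
    """One upstream circuit, e.g. wireline broadband or a cellular modem."""
    name: str
    priority: int          # lower number = more preferred
    up: bool = True
    fail_streak: int = 0   # consecutive failed health probes
    ok_streak: int = 0     # consecutive successful health probes


@dataclass
class FailoverPolicy:
    """Hypothetical priority-based WAN failover with hysteresis."""
    links: list[WanLink]
    down_after: int = 3    # failed probes in a row before a link is marked down
    up_after: int = 5      # successful probes in a row before it is trusted again

    def record_probe(self, name: str, success: bool) -> None:
        """Update one link's health state from a single probe result (e.g. a ping)."""
        link = next(candidate for candidate in self.links if candidate.name == name)
        if success:
            link.ok_streak += 1
            link.fail_streak = 0
            if not link.up and link.ok_streak >= self.up_after:
                link.up = True
        else:
            link.fail_streak += 1
            link.ok_streak = 0
            if link.up and link.fail_streak >= self.down_after:
                link.up = False

    def active_link(self) -> str | None:
        """Return the most preferred healthy link, or None if every link is down."""
        healthy = [link for link in self.links if link.up]
        if not healthy:
            return None
        return min(healthy, key=lambda link: link.priority).name


if __name__ == "__main__":
    policy = FailoverPolicy([WanLink("fios", priority=1), WanLink("lte", priority=2)])
    for _ in range(3):
        policy.record_probe("fios", success=False)
    print(policy.active_link())  # lte
    for _ in range(5):
        policy.record_probe("fios", success=True)
    print(policy.active_link())  # fios
```

The asymmetric thresholds are a design choice: failing over quickly limits downtime, while failing back slowly avoids bouncing onto a circuit that is still unstable.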
My home network is fairly complex, with two wireline circuits, a wireless broadband connection, and multiple site-to-site VPNs to cloud providers. Both of my wireline circuits, as well as my wireless broadband connection, are Verizon circuits: one wireline circuit is business grade and the other is consumer grade, and I use wireless broadband as my tertiary Internet connection (I relied on it for two weeks following Hurricane Sandy). The business circuit differs in speed from my consumer circuit, and it provides me with public-facing IP space and the ability to use my own router instead of the Verizon FiOS-provided router; these are the key differentiators between consumer and business circuits. I use pfSense as my router and firewall of choice, and pfSense manages all of my routing and circuit failover.

Because this is a home lab, I do not use something like BGP to manage external traffic and provide transparent failover; obtaining an AS number and running BGP for a home lab would be costly. Instead, I monitor my circuits with a witness process that runs checks against two IP addresses. For simplicity, call them IP 1 and IP 2. IP 1 is the advertised static public IP address on my Verizon Business circuit and maps to host.domainname; IP 2 is the NAT’d, port-forwarded address on my consumer-grade FiOS circuit and resolves to host.dyndns, where dyndns is Namecheap’s dynamic DNS service. When all is well, the host is directly accessible via IP 1; if something goes wrong, the host becomes available via IP 2. The witness process watches for service availability on IP 1 and IP 2 and, if the service becomes reachable on an alternate path, updates the service’s DNS A record with my domain registrar. My DNS provider is Namecheap, so the witness server tests the service for accessibility and then uses PyNamecheap to update the A record programmatically.
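The witness logic described above can be sketched in Python as follows. This is a minimal sketch under several assumptions: the witness runs outside the home network (for example, on a small cloud VM) so it sees the same paths external users do; “service available” means a TCP connection to port 443 succeeds; and `update_a_record` is a hypothetical callback standing in for the PyNamecheap call, whose API is not reproduced here.

```python
from __future__ import annotations

import socket
from typing import Callable


def tcp_reachable(ip: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to ip:port succeeds within `timeout` seconds.

    Port 443 is an assumption (a web service behind NGINX); probe whatever port
    the published service actually listens on.
    """
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def choose_target(primary_ok: bool, backup_ok: bool,
                  primary_ip: str, backup_ip: str) -> str | None:
    """Prefer the business-circuit address (IP 1); fall back to the consumer one (IP 2)."""
    if primary_ok:
        return primary_ip
    if backup_ok:
        return backup_ip
    return None  # nothing reachable: leave DNS alone rather than guess


def witness_once(
    primary_ip: str,
    backup_ip: str,
    current_a_record: str,
    update_a_record: Callable[[str], None],
    probe: Callable[[str], bool] = tcp_reachable,
) -> str:
    """Run one witness check and return the IP the A record should now hold.

    `update_a_record` is a hypothetical callback; in the setup described above,
    this is where the registrar update (via PyNamecheap) would happen. It is
    only called when the record actually needs to change.
    """
    target = choose_target(probe(primary_ip), probe(backup_ip), primary_ip, backup_ip)
    if target is not None and target != current_a_record:
        update_a_record(target)
        return target
    return current_a_record
```

In practice you would run `witness_once` on a schedule (cron, or a loop with a sleep), keep the returned IP as the new `current_a_record`, and pass in a function that performs the registrar update. Injecting `probe` and `update_a_record` as parameters keeps the decision logic testable without touching the network or the registrar.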
With a short TTL, the updated DNS records propagate quickly and public services become available again. Not all services will fail over, but web services remain available with a little help from NGINX acting as a reverse proxy.
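Why the TTL matters can be shown with a back-of-the-envelope estimate of how long external clients might keep using the dead address: detection time (probe interval × number of failed probes required), plus the time for the DNS update to take effect, plus up to one TTL for resolvers that cached the old record. The function name, parameters, and example values are illustrative assumptions; some resolvers and clients cache longer than the TTL, so treat the result as an estimate rather than a guarantee.

```python
def worst_case_failover_seconds(
    probe_interval_s: float,
    failures_before_switch: int,
    dns_update_s: float,
    ttl_s: float,
) -> float:
    """Rough upper bound on how long external clients may keep using the failed IP.

    - detection: in the worst case the circuit fails just after a successful probe,
      so the witness needs `failures_before_switch` further probes, each
      `probe_interval_s` apart, before it acts
    - update: assumed time for the registrar to publish the new A record
    - caching: a resolver that cached the old record just before the update can
      keep serving it for up to one TTL
    """
    detection = probe_interval_s * failures_before_switch
    return detection + dns_update_s + ttl_s


if __name__ == "__main__":
    # Illustrative values: probe every 60 s, act after 3 failures,
    # 30 s for the registrar update, and a 300 s TTL.
    print(worst_case_failover_seconds(60, 3, 30, 300))  # 510 (8.5 minutes)
    # Dropping the TTL to 60 s shrinks the window to 270 s.
    print(worst_case_failover_seconds(60, 3, 30, 60))   # 270
```

The TTL is often the largest single term, which is why a short TTL on the failover record matters more than shaving seconds off the probe loop.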
The above is not very expensive from a pure infrastructure perspective. Consulting may be costly if you cannot configure it yourself, but the cost of building in redundancy keeps falling. Cloud providers like AWS, with services such as Route 53, S3, and Lambda, make it cost-effective to leverage their site reliability engineering to build disaster-tolerant systems without ever worrying about the physical infrastructure.

Whether the time, energy, and money are worth it, and whether there is an ROI, depends on what you are trying to accomplish and the value of the services you provide. I require public IP address space, which is not offered on Verizon FiOS consumer-grade circuits, and I need a consumer-grade Verizon FiOS line for TV, Internet, and telephone service. For these reasons, it made sense for me to use the consumer-grade line as a backup that provides access to critical systems and services in the event of something like a physical fiber cut, which has happened when a landscaper put a shovel through the fiber (there are two fiber runs from the street to my house).
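One way to frame the ROI question is the expected-loss calculation common in security risk analysis: annualized loss expectancy (ALE) is the single loss expectancy (SLE, modeled here simply as hours down × cost per hour) multiplied by the annualized rate of occurrence (ARO). On expected value, redundancy is worth it when the share of that loss it avoids exceeds its annual cost. The sketch below is a simplified model with hypothetical names; the example numbers are purely illustrative, not figures from the setup described above.

```python
from dataclasses import dataclass


@dataclass
class OutageRisk:
    """Simplified last-mile outage model (all inputs are your own estimates)."""
    outages_per_year: float   # ARO: annualized rate of occurrence
    hours_per_outage: float   # typical time to repair, e.g. a fiber cut
    cost_per_hour: float      # value lost per hour offline

    def single_loss_expectancy(self) -> float:
        # SLE: cost of one outage
        return self.hours_per_outage * self.cost_per_hour

    def annual_loss_expectancy(self) -> float:
        # ALE = SLE x ARO
        return self.single_loss_expectancy() * self.outages_per_year


def net_benefit(risk: OutageRisk, fraction_avoided: float, annual_cost: float) -> float:
    """Expected yearly loss avoided by the backup path, minus its yearly cost.

    fraction_avoided: share of outage time the failover actually covers (0.0 to 1.0);
    services that do not fail over cleanly lower this number.
    A positive result means the redundancy pays for itself on expected value.
    """
    if not 0.0 <= fraction_avoided <= 1.0:
        raise ValueError("fraction_avoided must be between 0 and 1")
    return risk.annual_loss_expectancy() * fraction_avoided - annual_cost


if __name__ == "__main__":
    # Illustrative numbers only: one 48-hour outage per year, $50/hour of value lost,
    # failover covering 75% of it, and a backup path costing $600/year.
    risk = OutageRisk(outages_per_year=1, hours_per_outage=48, cost_per_hour=50.0)
    print(net_benefit(risk, fraction_avoided=0.75, annual_cost=600.0))  # 1200.0
```

When a second line is needed anyway for other reasons (here, TV, Internet, and telephone service on the consumer circuit), the marginal annual cost of using it as a backup is small, which pushes the net benefit toward positive even for modest outage costs.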