URL: https://thenewstack.io/cloudflares-network-shutdown-shows-why-dns-is-a-devops-problem/

Cloudflare's Network Shutdown Shows Why DNS Is a DevOps Problem - The New Stack



Cloudflare’s Network Shutdown Shows Why DNS Is a DevOps Problem

Jul 27th, 2020 9:37am by B. Cameron Gain

Cloudflare’s widespread outage a few weeks ago underscored both the inherent fragility in DNS connections and the importance of building in redundancies to help ensure that such a worst-case scenario does not ever happen.

While it may look like a network operations dilemma, putting solutions in place to prevent such a disaster is very much a DevOps problem. It involves the developer and security teams as much as NetOps, and it means building redundancy and other checks in at the very beginning of the software production cycle.

“It’s really just about redundancy at every level,” Jonathan Sullivan, NS1 chief technology officer and co-founder, told The New Stack.

While Cloudflare, an NS1 competitor, did have DNS redundancy built into its infrastructure, traffic across its network dropped by about 50%, resulting in a 27-minute outage of Cloudflare Internet properties and services, Cloudflare Chief Technology Officer John Graham-Cumming wrote in a blog post.

A router overload in the state of Georgia caused the Cloudflare outage. One way Cloudflare learned to prevent such an event from recurring was to set a limit on the Georgia router’s BGP sessions. If the limit is exceeded, the router’s sessions shut down, diverting traffic away from that part of the backbone instead of attracting traffic from across the backbone, Graham-Cumming wrote. The fix also involved ensuring that no single router location can attract so much traffic that it becomes overloaded.
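
To make the idea of a session limit concrete, here is a minimal sketch in Python. It assumes the limit behaves like a maximum-prefix limit, a common BGP safeguard; the article does not spell out the exact mechanism. The `BgpSession` class, the peer name and the prefixes are all hypothetical, and this is a toy model, not Cloudflare's configuration.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class BgpSession:
    """Toy model of one BGP session guarded by a maximum-prefix limit (hypothetical)."""
    peer: str
    max_prefixes: int
    prefixes: Set[str] = field(default_factory=set)
    established: bool = True

    def receive(self, prefix: str) -> bool:
        """Accept a route; tear the session down once the limit is exceeded."""
        if not self.established:
            return False  # session is already down, nothing is accepted
        self.prefixes.add(prefix)
        if len(self.prefixes) > self.max_prefixes:
            # Too many routes from this peer: drop the session and its routes
            # so this location stops attracting traffic it cannot handle.
            self.established = False
            self.prefixes.clear()
            return False
        return True


session = BgpSession(peer="backbone-peer", max_prefixes=3)
results = [session.receive(f"10.0.{i}.0/24") for i in range(5)]
print(results)              # [True, True, True, False, False]
print(session.established)  # False
```

The fourth route pushes the session over its limit, so the session fails closed and later routes are ignored rather than accepted.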

However, no network provider’s infrastructure is 100% foolproof. “If you’re interested in how to avert such a [Cloudflare-like disaster] if you’re a customer of Cloudflare, the only way around that is to have two DNS providers or two vendors,” Sullivan said.

An organization might, for example, rely on two DNS vendors, and then use middleware such as Terraform across the network fabric to take advantage of common features the two providers offer. A DNS record, for example, might point to an application stored in three different data centers or cloud environments. The record is then written to NS1’s API and to that of another vendor for added redundancy, Sullivan said.
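
The dual-write pattern Sullivan describes can be sketched roughly as follows. This is an illustration only: `InMemoryProvider` is a hypothetical stand-in for a vendor API client, not NS1's or any other vendor's real SDK, and the vendor labels, hostname and documentation-range IP addresses are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class DnsRecord:
    name: str
    rtype: str
    answers: Tuple[str, ...]
    ttl: int = 300


class InMemoryProvider:
    """Hypothetical stand-in for one DNS vendor's API client."""

    def __init__(self, label: str, healthy: bool = True) -> None:
        self.label = label
        self.healthy = healthy
        self.zone: Dict[Tuple[str, str], DnsRecord] = {}

    def upsert(self, record: DnsRecord) -> None:
        if not self.healthy:
            raise ConnectionError(f"{self.label} API unreachable")
        self.zone[(record.name, record.rtype)] = record


def publish(record: DnsRecord, providers: List[InMemoryProvider]) -> Dict[str, str]:
    """Write the same record to every provider and report per-provider status."""
    status: Dict[str, str] = {}
    for provider in providers:
        try:
            provider.upsert(record)
            status[provider.label] = "ok"
        except ConnectionError as exc:
            # One vendor failing must not block the write to the other.
            status[provider.label] = f"failed: {exc}"
    return status


# One record pointing at an application hosted in three locations.
record = DnsRecord("app.example.com", "A",
                   ("192.0.2.10", "198.51.100.10", "203.0.113.10"))
vendor_a = InMemoryProvider("vendor-a")
vendor_b = InMemoryProvider("vendor-b")
status = publish(record, [vendor_a, vendor_b])
print(status)  # {'vendor-a': 'ok', 'vendor-b': 'ok'}
```

In practice a tool such as Terraform would hold the record definition once and apply it through each vendor's provider plugin; the loop above only shows the shape of that idea.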

However, in this case, “you’re a little hamstrung because you can only take advantage of features that exist in both places,” Sullivan said. “But if uptime is really the most important thing, we tend to see people” selecting two vendors for redundancy for their critical domains and then “leveraging us for other specific types of workloads.”
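
The constraint Sullivan mentions amounts to a set intersection: a record that must live at both vendors can only use what both support. A small sketch, with made-up vendor names and feature lists that do not describe any real product:

```python
from typing import Dict, Set


def common_features(catalog: Dict[str, Set[str]]) -> Set[str]:
    """Return the features every vendor in the catalog supports."""
    feature_sets = list(catalog.values())
    if not feature_sets:
        return set()
    return set.intersection(*feature_sets)


# Hypothetical feature catalogs for two DNS vendors.
catalog = {
    "vendor-a": {"A", "AAAA", "CNAME", "geo-routing", "weighted"},
    "vendor-b": {"A", "AAAA", "CNAME", "weighted", "health-checks"},
}

shared = common_features(catalog)
print(sorted(shared))  # ['A', 'AAAA', 'CNAME', 'weighted']
```

Anything outside the shared set, such as `geo-routing` here, would have to stay on a single vendor, which matches the split Sullivan describes between critical domains and other workloads.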

Another option, which Sullivan said is unique to NS1, is that customers can “leverage all of the DevOps trends,” such as adopting Terraform, and “have the ability to go to a cloud provider and push a button and turn on new infrastructure.” “With a vendor like us, you can have physically, logically separate DNS delivery networks, and you get your redundancy and isolation,” Sullivan said.

If NS1 were to “get hit with some massive DDoS attack,” for example, “you still have this other network, totally independent of us, that’s got all of your configs and all of the advanced traffic management capabilities and all the bells and whistles,” Sullivan said.
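
The failover behavior in that scenario can be pictured with a short sketch. It simulates two independent DNS delivery networks as plain Python functions rather than making real DNS queries; the function names, the hostname and the address are all hypothetical.

```python
from typing import Callable, List, Optional

Resolver = Callable[[str], str]


def resolve_with_fallback(name: str, networks: List[Resolver]) -> Optional[str]:
    """Try each independent DNS network in order; return the first answer."""
    for query in networks:
        try:
            return query(name)
        except TimeoutError:
            continue  # this network is unreachable, try the next one
    return None  # every network failed


def attacked_network(name: str) -> str:
    # Simulates a network that is under attack and not answering.
    raise TimeoutError("no response")


def independent_network(name: str) -> str:
    # Simulates a separate network that holds the same configuration.
    return "192.0.2.10"


answer = resolve_with_fallback("app.example.com",
                               [attacked_network, independent_network])
print(answer)  # 192.0.2.10
```

Real resolvers get a similar effect when a domain's NS records list nameservers from more than one network, since they can retry against another nameserver when one does not answer.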

Damage Control

The collateral damage organizations seek to protect themselves from when network disruptions occur varies, of course. Some might just lose developer productivity when they can no longer upload code remotely to a Git repository, while others might stand to lose millions of dollars in just a few hours. In either case, all DevOps teams, including development, security, operations and NetOps, must take DNS and connectivity into account at the earliest stages of building an application.

“Preventing outages is typically an afterthought because you’re so busy running and building your business and it’s ‘if it ain’t broke, don’t fix it.’ But from time to time again, DNS and network connections are sort of a house of cards and sometimes it’s incredible that it works at all after decades of existence,” Sullivan said. “Redundancy must be something you think about while you’re building the application, because a pull-down afterward can be really painful, and in some cases, it’s impossible without a full application rewrite.”

BC Gain is founder and principal analyst for ReveCom Media. His obsession with computers began when he hacked a Space Invaders console to play all day for 25 cents at the local video arcade in the early 1980s. He then...
NS1 is a sponsor of The New Stack.