Bringing down an entire application is easy. All it takes is the failure of a single service and the entire set of services that make up the application can come crashing down like a house of cards. Just one minor error from a non-critical service can be disastrous to the entire application.
There are, of course, many ways to prevent dependent services from failing. However, adding extra resiliency to non-critical services also adds complexity and cost, and sometimes that resiliency is not needed.
Looking at the figure below, what happens if Service D is not critical to the running of Service A? Why should Service A fail simply because Service D has failed? And why should Service D need high resiliency if the highly critical Service A can survive without it?
How do you know when a service dependency link is critical and when it isn't? Service tiers are one way to help manage this.
A service tier is simply a label associated with a service that indicates how critical the service is to the operation of your business. Service tiers let you distinguish between services that are mission-critical and those that are useful and helpful but not essential.
By comparing service tier levels of dependent services, you can determine which service dependencies are your most sensitive and which are less important.
All services in your system, no matter how big or how small, should be assigned a service tier. The following sections outline a scale to get you started (you can make adjustments to these recommendations as necessary to accommodate your particular business needs).
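As a minimal sketch of the labeling idea, assuming a simple in-memory service catalog, tiers can be represented as an integer enum and checked so that no service is left unlabeled. The `ServiceTier` enum, the `find_untiered` helper, and the service names are all hypothetical, chosen only for illustration.

```python
from enum import IntEnum


class ServiceTier(IntEnum):
    """Hypothetical tier labels: a lower number means a more critical service."""
    TIER_1 = 1
    TIER_2 = 2
    TIER_3 = 3
    TIER_4 = 4


def find_untiered(services: list[str], tiers: dict[str, ServiceTier]) -> list[str]:
    """Return every service in the catalog that has not been assigned a tier."""
    return [name for name in services if name not in tiers]


# Hypothetical catalog and tier assignments, for illustration only.
SERVICES = ["login", "checkout", "recommendations", "avatar"]
TIERS = {
    "login": ServiceTier.TIER_1,
    "checkout": ServiceTier.TIER_1,
    "avatar": ServiceTier.TIER_4,
}

print(find_untiered(SERVICES, TIERS))  # ['recommendations']
```

Using `IntEnum` keeps tiers comparable with `<` and `>`, which is convenient when applying the dependency rules described later in this article.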
Tier-1 services are the most critical services in your system. A service is considered Tier 1 if a failure of that service will result in a significant impact to customers or to the company's bottom line.
The following are some examples of Tier-1 services:
A Tier-1 service failure is a serious concern to your company.
A Tier-2 service is one that is important to your business but less critical than a Tier-1 service. A failure in a Tier-2 service can noticeably and meaningfully degrade the customer experience, but it does not completely prevent your customers from interacting with your system.
Tier-2 services are also services that affect your backend business processes in significant ways, but might not be directly noticeable to your customers. The following are some examples of Tier-2 services:
A failure of a Tier-2 service will have a negative customer impact but does not represent a complete system failure.
A Tier-3 service is one whose failure has a minor, unnoticeable, or difficult-to-notice customer impact, or has only limited effects on your business and systems.
The following are some examples of Tier-3 services:
Customers may or may not even notice that a Tier-3 service is failing.
A Tier-4 service is a service that, when it fails, causes no significant effect on the customer experience and does not significantly affect the customer’s business or finances.
The following are some examples of Tier-4 services:
Service tiers impact two aspects of your system: the required responsiveness to problems and the dependencies between services.
The service tier level of a service determines how quickly a problem with that service should be addressed. Of course, the more significant a problem, the faster it should be addressed. But in general, the lower the service tier number, the more important the problem likely is and the faster it should be addressed. A low-to-medium severity Tier-1 problem is likely more important and impactful than a high-severity Tier-4 problem.
Because higher-importance services (those with lower service tier numbers) receive a faster response to problems, tiers also affect your dependency map between services and the assumptions you can make about your service dependencies.
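One simple way to encode this triage rule, assuming that tier always dominates severity, is to sort open incidents on a `(tier, severity)` tuple. The `Severity` scale, the `Incident` record, and the `triage_order` function are hypothetical illustrations, not part of any particular incident-management tool.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    # Hypothetical scale: a lower value means a more severe problem.
    HIGH = 1
    MEDIUM = 2
    LOW = 3


@dataclass
class Incident:
    service: str
    tier: int            # 1 (most critical) .. 4 (least critical)
    severity: Severity


def triage_order(incidents: list[Incident]) -> list[Incident]:
    """Order incidents by tier first, then by severity.

    Sorting on the (tier, severity) tuple means tier dominates: any Tier-1
    incident is handled before any Tier-4 incident, whatever their severities.
    """
    return sorted(incidents, key=lambda i: (i.tier, i.severity))


queue = triage_order([
    Incident("marketing-email", tier=4, severity=Severity.HIGH),
    Incident("login", tier=1, severity=Severity.MEDIUM),
    Incident("search", tier=2, severity=Severity.LOW),
])
print([i.service for i in queue])  # ['login', 'search', 'marketing-email']
```

A strict tier-first ordering is only one possible policy; a team that wants an exceptionally severe Tier-2 problem to jump ahead of a minor Tier-1 problem could use a weighted score instead.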
If a Tier-4 (low priority) service makes a call to a Tier-1 (high priority) service, then it is probably safe for the Tier-4 service to assume that the Tier-1 service will always respond. If for some reason it does not respond, it would typically be acceptable for the Tier-4 service to simply fail as well. After all, if a Tier-1 service for your application is down, significant efforts will immediately be underway to resolve that problem, and the fact that a Tier-4 service is also down will be of little consequence. Think of the case where your web application is down because users cannot log in (a Tier-1 service problem). How concerning will it be that the day's marketing emails might be delayed a bit (a Tier-4 service problem)?
But the reverse is not true. If a Tier-1 service depends on a Tier-4 service, that Tier-1 service must have contingency and failover recovery plans for when the Tier-4 service is down. After all, you don't want a Tier-1 service to fail simply because a much lower priority Tier-4 service is not functioning. For example, you do not want your web application to fail simply because it cannot display the customer's avatar in the corner of every page. Instead, you want it to recover gracefully by not displaying the avatar, while the rest of the application continues to work normally.
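The avatar example can be sketched as a Tier-1 rendering path that catches the Tier-4 failure and omits the avatar. The `AvatarServiceError` exception, `fetch_avatar_url` client, and `render_header` function are hypothetical; the client here is a stand-in that always fails so the fallback path is exercised.

```python
import logging

logger = logging.getLogger(__name__)


class AvatarServiceError(Exception):
    """Raised by the hypothetical Tier-4 avatar client when it is unavailable."""


def fetch_avatar_url(user_id: str) -> str:
    # Stand-in for a real network call to the Tier-4 avatar service.
    # It always fails here, to demonstrate the fallback path.
    raise AvatarServiceError(f"avatar service unavailable for {user_id}")


def render_header(user_id: str) -> str:
    """Tier-1 page rendering that degrades gracefully without the avatar."""
    try:
        avatar_html = f'<img src="{fetch_avatar_url(user_id)}">'
    except AvatarServiceError:
        # Record the failure, but keep the Tier-1 page working.
        logger.warning("avatar lookup failed; rendering header without it")
        avatar_html = ""
    return f"<header>Welcome, {user_id}{avatar_html}</header>"


print(render_header("alice"))  # <header>Welcome, alice</header>
```

Catching only the avatar client's own exception keeps unrelated bugs visible; a production client would typically also bound the call with a timeout so a slow Tier-4 service cannot stall the Tier-1 page.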
Take a look at the figure below, in which we have assigned a service tier to each service. Given the rules described above, note that we need to add resiliency between Service A and Service D, because Service A is a higher priority service (Tier 1) than Service D (Tier 3). Therefore, Service A needs to protect itself from Service D failures.
Now look at Service B. Service B also depends on Service D, but in this case, according to our rules above, Service B does not need additional resiliency between it and Service D. This is because Service B is a lower priority service (Tier 4) than Service D (Tier 3), so it's acceptable for Service B to suffer an outage when Service D is unavailable. In this example, Service D is the more important service.
By carefully analyzing your services and assigning tiers properly, you can determine where to focus your development, testing, and resiliency efforts for inter-service dependencies, prioritizing the most critical and most vulnerable interfaces first without over-investing in less-critical ones.
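The rule applied to Services A, B, and D can be expressed as a one-line comparison of tier numbers. This sketch covers only the two dependencies described in the text (A on D and B on D); the `needs_extra_resiliency` function is hypothetical, and treating equal tiers as not needing extra resiliency is an assumption, since the article only discusses strictly higher and lower tiers.

```python
# Tiers from the example: a lower number means a more critical service.
TIERS = {"A": 1, "B": 4, "D": 3}

# The caller -> callee dependencies discussed in the text.
DEPENDENCIES = [("A", "D"), ("B", "D")]


def needs_extra_resiliency(caller: str, callee: str, tiers: dict[str, int]) -> bool:
    """Return True when the caller is more critical than its callee.

    Assumption: a caller at the same or a less critical tier may fail along
    with its callee, because the callee's outage already gets at least as
    much attention as the caller's would.
    """
    return tiers[caller] < tiers[callee]


for caller, callee in DEPENDENCIES:
    if needs_extra_resiliency(caller, callee, TIERS):
        verdict = "add resiliency"
    else:
        verdict = "may fail with callee"
    print(f"{caller} -> {callee}: {verdict}")
# A -> D: add resiliency
# B -> D: may fail with callee
```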
As described above, service tiers provide a “labeling” system that gives you information on the importance of every service in your system. You can use that label to determine problem escalation policies, procedures and prioritizations.
But you can also use that label to determine the amount and type of backoff and recovery needed when one service cannot complete a call to a service it depends on. What you do and how you respond depends on whether you are calling a higher- or lower-tier service.
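A minimal sketch of a tier-aware retry policy might look like the following. The function name `call_with_tier_policy`, the retry counts (2 versus 5), and the `base_delay` default are assumptions for illustration, not recommendations from this article. When the callee is less critical than the caller, the sketch retries briefly and then returns a fallback so the caller keeps working; otherwise it retries with exponential backoff and then re-raises, letting the caller fail alongside the callee.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_tier_policy(
    call: Callable[[], T],
    caller_tier: int,
    callee_tier: int,
    fallback: T,
    base_delay: float = 0.1,
) -> T:
    """Retry `call` using a hypothetical policy chosen by relative tier.

    - Callee less critical than caller: a couple of quick retries, then
      return `fallback` so the more critical caller keeps working.
    - Callee as or more critical: more retries with exponential backoff,
      then re-raise the last error so the caller fails with the callee.
    """
    protect_caller = caller_tier < callee_tier
    max_attempts = 2 if protect_caller else 5
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                if protect_caller:
                    return fallback
                raise
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            time.sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable: the loop always returns or raises")


def flaky_avatar() -> str:
    # Hypothetical Tier-4 dependency that is currently down.
    raise ConnectionError("avatar service down")


# Tier-1 caller, Tier-4 callee: degrade to the fallback instead of failing.
print(call_with_tier_policy(flaky_avatar, caller_tier=1, callee_tier=4,
                            fallback="default-avatar.png", base_delay=0.01))
# default-avatar.png
```

Catching the broad `Exception` keeps the sketch short; a real client would usually retry only on errors it knows to be transient, such as timeouts or connection failures.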