Monitoring microservices effectively can still be a challenge, as many traditional performance monitoring techniques are ill-suited to providing the required granularity of insight into system performance. Now a former Google and Weave engineer has developed an approach, called the RED Method, that seems to be gaining favor with administrators.
RED “encourages you to come to some sort of consistency of monitoring,” explained Tom Wilkie, the originator of RED and a founder of the new microservices monitoring company Kausal. Wilkie spoke at InfluxData’s Influx Days user event, held Tuesday in New York.
The most immediate benefit of instrumenting microservices along the lines described by RED is that it gives engineers who may not be familiar with a badly performing microservice a standard set of tools to diagnose and correct an issue. RED offers a “consistency across services [that] really helps reduce the cognitive load of your on-call people. It helps them be on call for more services, for services they didn’t write.”
Wilkie used this approach when he was a site reliability engineer (SRE) supporting Google Analytics.
“I didn’t write any of the Google Analytics services, but I was still able to be on call for them because for me, they were just black boxes. When something went wrong, I just had to traverse my little graph, figure out which one was throwing the errors, and then go and look at the logs, file a bug with developers, restart it, whatever,” he said.
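The graph traversal Wilkie describes can be sketched roughly as follows. This is only an illustration, not his tooling: the `find_culprits` helper, the service names, and the error ratios are all hypothetical, and the sketch assumes that errors propagate upstream from a failing dependency to its callers.

```python
def find_culprits(calls, error_ratio, entry, threshold=0.01):
    """Follow errors down the call graph from `entry`.

    Returns the services that are erroring while all of their own
    dependencies look healthy, i.e. the likeliest places to start reading logs.
    """
    culprits = []
    seen = set()

    def visit(service):
        if service in seen:
            return
        seen.add(service)
        if error_ratio.get(service, 0.0) <= threshold:
            return  # healthy: the errors are not coming through here
        failing_deps = [dep for dep in calls.get(service, [])
                        if error_ratio.get(dep, 0.0) > threshold]
        if not failing_deps:
            culprits.append(service)  # erroring, but everything below is fine
        for dep in failing_deps:
            visit(dep)

    visit(entry)
    return culprits


# Hypothetical call graph: which service calls which.
calls = {
    "frontend": ["auth", "reports"],
    "reports": ["storage"],
    "auth": [],
    "storage": [],
}
# Hypothetical fraction of failed requests per service.
error_ratio = {"frontend": 0.20, "auth": 0.0, "reports": 0.35, "storage": 0.40}

culprits = find_culprits(calls, error_ratio, entry="frontend")
```

Because every service reports the same kind of error metric, the walk needs no knowledge of what each service actually does, which is the "black box" property Wilkie points to.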
RED came about because Wilkie was frustrated with the popular USE methodology of performance measurement. Created by Brendan Gregg, USE groups the performance metrics for each system resource into three categories: utilization, saturation, and errors.
System resources being measured can be CPUs, memory, I/O channels, and the like.
“The nice thing about this kind of pattern is that it turns the guesswork of figuring out why things are slow into a much more of a methodological approach,” Wilkie said. With the USE method, Kausal created a set of Grafana dashboards for monitoring Kubernetes infrastructure, using Prometheus as a backend.
The USE approach, however, has its limitations, Wilkie noted. For instance, it is difficult to measure the saturation of memory, or the amount of memory used. Error counts can also be problematic, especially for I/O and memory bandwidth. “Linux, it turns out, is really bad at exposing error counts,” Wilkie said. USE is also more infrastructure-focused, whereas RED focuses more on end-user satisfaction.
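As an alternative, Wilkie developed another easy-to-remember acronym, RED, when he was working at Weave. RED is built around requests, characterizing microservice performance by three measures: rate (the number of requests a service handles per second), errors (the number of those requests that fail), and duration (how long the requests take).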
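As a minimal sketch, assuming RED stands for request rate, errors, and duration, the three numbers can be derived from a window of request records. The `Request` record, its field names, and the `red_summary` helper are hypothetical and not part of any library mentioned in this article; the 99th percentile is just one possible way to summarize duration.

```python
from dataclasses import dataclass
from statistics import quantiles


@dataclass
class Request:
    """One handled request (hypothetical record shape)."""
    duration_s: float  # how long the request took, in seconds
    ok: bool           # False if the request failed


def red_summary(requests, window_s):
    """Summarize a window of requests as rate, errors, and duration."""
    total = len(requests)
    failed = sum(1 for r in requests if not r.ok)
    durations = sorted(r.duration_s for r in requests)
    # quantiles() needs at least two data points; fall back for tiny windows.
    if total >= 2:
        p99 = quantiles(durations, n=100, method="inclusive")[98]
    else:
        p99 = durations[0] if durations else 0.0
    return {
        "rate_per_s": total / window_s,      # R: requests per second
        "errors_per_s": failed / window_s,   # E: failed requests per second
        "error_ratio": failed / total if total else 0.0,
        "duration_p99_s": p99,               # D: 99th-percentile latency
    }


# Four requests observed over a two-second window; one of them failed.
window = [Request(0.10, True), Request(0.20, True),
          Request(0.30, False), Request(0.40, True)]
summary = red_summary(window, window_s=2.0)
```

Computing the same three values for every service is what lets a single dashboard layout be reused across services.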
“The thing I like about RED is that it is microservice-focused, as opposed to USE method which is more about the infrastructure,” said Paul Dix, founder and CEO of InfluxData. InfluxData invited Wilkie to speak at the event because RED was a popular topic of conversation last year at microservices-friendly conferences such as Monitorama and KubeCon.
Wilkie said that RED is actually derived from another, little-known set of performance metrics that he learned as a site reliability engineer at Google, called the Four Golden Signals: latency, traffic, errors, and saturation.
As with USE, Wilkie implemented the RED method as a client library for Prometheus. The data need not stay in Prometheus, however: the open source InfluxDB time-series database supports the Prometheus monitoring tool’s remote read and write API, so Prometheus can be used as a data collector, piping results into the database, and it can query data out of InfluxDB as well.
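To give a feel for what RED-style instrumentation inside a service might look like, here is a small sketch. It is not Wilkie's library and does not use the Prometheus client: the `red_instrumented` decorator, the in-memory dictionaries standing in for a metrics backend, and the `handle_checkout` handler are all hypothetical.

```python
import time
from collections import defaultdict
from functools import wraps

# In-memory stand-ins for a metrics client's counters and duration samples.
REQUESTS = defaultdict(int)    # service name -> requests seen
ERRORS = defaultdict(int)      # service name -> requests that raised
DURATIONS = defaultdict(list)  # service name -> observed durations (seconds)


def red_instrumented(service):
    """Wrap a request handler so every call records rate, errors, duration."""
    def decorator(handler):
        @wraps(handler)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            REQUESTS[service] += 1
            try:
                return handler(*args, **kwargs)
            except Exception:
                ERRORS[service] += 1
                raise  # still surface the failure to the caller
            finally:
                # Record the duration for successes and failures alike.
                DURATIONS[service].append(time.perf_counter() - start)
        return wrapper
    return decorator


@red_instrumented("checkout")
def handle_checkout(cart_total):
    """Hypothetical handler used only to exercise the decorator."""
    if cart_total < 0:
        raise ValueError("cart total cannot be negative")
    return f"charged {cart_total:.2f}"


handle_checkout(19.99)
try:
    handle_checkout(-1)
except ValueError:
    pass  # the failure has already been counted by the wrapper
```

In a real deployment the dictionaries would presumably be replaced by a metrics client's counter and histogram types, so that a monitoring backend could collect the values.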