There’s a delicate balance between isolating workloads according to their security requirements and optimizing compute and resource efficiency.
Machine isolation is a likely solution, but it has its limitations. Google Senior Staff Reliability Engineer Michal Czapiński and Google Site Reliability Engineering Manager Rainer Wolafka are investigating ways to overcome “the limitations of machine isolation.” In a report to Usenix, they present a new isolation method that they call “Workload Security Rings.”
Workload Security Rings (WSR) classifies workloads by security requirements and then isolates each class, enforcing the separation at the machine boundary. This methodology still keeps sensitive and untrusted workloads on separate machines but introduces a new mid-level class between the two. Sensitive data remains safe from hardware and software exploits such as zero-day and DDoS attacks, while resource utilization is higher than with strict machine isolation.
Czapiński and Wolafka came up with their novel approach in the Google production environment, but said “we believe this general technique will be applicable to other contexts such as Kubernetes.”
Czapiński and Wolafka are incredibly confident that Workload Security Rings provide a solution to the tradeoff between compute requirements and security. Additional scheduling constraints group workloads with similar security requirements into rings and keep them from being co-scheduled with jobs of a different clearance level.
In the simplest case, there are three classes of workloads: sensitive, hardened and unhardened.
The hardened workloads fill in the resource utilization cracks that result from the scheduling constraints separating the sensitive and unhardened jobs. The larger the hardened class is, the more resource fluctuations can be absorbed without the need to swap any machines from trusted to untrusted or vice versa.
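The co-scheduling rule can be sketched in a few lines of Python. This is only an illustration of the idea as described here, not the authors’ implementation: the names `WorkloadClass`, `MachinePool`, `ALLOWED_CLASSES`, `eligible_pools` and `can_coschedule` are hypothetical, and the policy table assumes that hardened jobs may run in either pool while sensitive and unhardened jobs each stay in their own.

```python
from enum import Enum


class WorkloadClass(Enum):
    SENSITIVE = "sensitive"
    HARDENED = "hardened"
    UNHARDENED = "unhardened"


class MachinePool(Enum):
    TRUSTED = "trusted"
    UNTRUSTED = "untrusted"


# Assumed policy table (hypothetical): hardened jobs are allowed in both
# pools, which is what lets them absorb leftover capacity on either side.
ALLOWED_CLASSES = {
    MachinePool.TRUSTED: {WorkloadClass.SENSITIVE, WorkloadClass.HARDENED},
    MachinePool.UNTRUSTED: {WorkloadClass.UNHARDENED, WorkloadClass.HARDENED},
}


def eligible_pools(workload):
    """Return the set of machine pools this workload class may be placed in."""
    return {pool for pool, allowed in ALLOWED_CLASSES.items() if workload in allowed}


def can_coschedule(a, b):
    """Two classes may share a machine only if some pool admits both of them."""
    return bool(eligible_pools(a) & eligible_pools(b))
```

Under these assumptions, `can_coschedule(WorkloadClass.SENSITIVE, WorkloadClass.UNHARDENED)` is false, while a hardened job can share a machine with either of the other two classes.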
As long as the hardened footprint is large enough, more workload classes can be added as necessary. Each new class needs a new group of dedicated machines, so the hardened class must be sized to match if it is to keep absorbing the fluctuations and using resources effectively.
Czapiński and Wolafka are confident that WSR’s security “gives a strong guarantee that we will never co-schedule sensitive workloads with ones that are untrusted.” Though hardened workloads are potentially at risk, the ban on remote job creation makes it “prohibitively difficult” for an attacker to move across machines to the trusted pool.
This isn’t a plug-and-play system and does require maintenance, the two warn. To avoid having to migrate machines from one group to another on short notice, Czapiński and Wolafka suggest automatic rebalancing once a week, so that each adjustment accounts for a full seven-day cycle.
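As a rough illustration of what a weekly rebalancing pass might compute, the sketch below sizes the trusted and untrusted pools from seven daily demand peaks. Everything in it is an assumption made for the example rather than something taken from the report: the function name `weekly_pool_sizes`, the use of daily peaks as input, the headroom percentage, and the choice to split spare machines evenly between the pools.

```python
def weekly_pool_sizes(total_machines, sensitive_peaks, unhardened_peaks,
                      headroom_percent=10):
    """Return (trusted, untrusted) pool sizes for the coming week.

    sensitive_peaks / unhardened_peaks: peak machine demand per day over the
    last seven days (hypothetical input format). Hardened jobs are assumed to
    fit into whatever capacity is left in either pool.
    """
    if len(sensitive_peaks) != 7 or len(unhardened_peaks) != 7:
        raise ValueError("expected exactly seven daily peaks per class")

    def with_headroom(peak):
        # Integer ceiling of peak * (100 + headroom_percent) / 100.
        return -(-peak * (100 + headroom_percent) // 100)

    # Each dedicated class must fit in its own pool at the weekly peak.
    trusted_min = with_headroom(max(sensitive_peaks))
    untrusted_min = with_headroom(max(unhardened_peaks))
    if trusted_min + untrusted_min > total_machines:
        raise ValueError("not enough machines for the dedicated classes")

    # Assumption: spare machines are split evenly between the two pools.
    spare = total_machines - trusted_min - untrusted_min
    trusted = trusted_min + spare // 2
    return trusted, total_machines - trusted
```

Sizing against the weekly peak rather than the current load is one way to reflect the seven-day cycle: a pool sized this way should not need machines moved mid-week unless demand exceeds anything seen in the previous seven days.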
Here is the one exception to the security guarantee mentioned earlier. There is always a chance of a sudden load spike. This is the one instance that could lead to a temporary lift of scheduling constraints to prevent or mitigate an outage. This increases the risk of lateral movement between rings and “is not a decision to be taken lightly,” the duo writes.