Chaos Monkey

Test Your Platform With Chaos Engineering

How often does your company test your software? Does it only happen during the development cycle and end when the software has been released? Or do you also test software when something in your pipeline changes? What about your entire system? Do you run regular testing to find out if there are any issues with the whole?

That’s a lot of questions to answer. But testing is crucial, and it should go well beyond basic bug testing during the software development phase. Penetration testing and other types of auditing should be going on throughout your entire business.

But there’s another type of testing that your business should consider: Chaos Engineering. This might well be a new term to you and your IT staff, but it’s one you should get to know because it is becoming an important part of system testing.

What is Chaos Engineering?

Chaos Engineering is the practice of experimenting on your production systems to build confidence that those systems can withstand unexpected conditions.

Consider this: You’ve developed your software and tested it extensively before deploying it. Everything checked out and it’s running smoothly. So long as all of the constituent parts perform as expected and nothing out of the ordinary happens, your business runs like clockwork.

However, life doesn’t work that way. As much as you might want to isolate your systems in silos, doing so won’t prevent something untoward from happening. Why?

  • Human error happens.
  • Hackers do what they do.
  • Machines break down.
  • Network connections fail.
  • Bugs present themselves.
  • The software doesn’t perform as expected.

You won’t be able to test for all of the above using traditional QA practices. That’s where Chaos Engineering comes in. The idea is to intentionally break a system to collect information that can help improve the system’s resiliency. This approach to software testing and QA is ideally suited for distributed systems, where you have multiple pieces of software running across platforms, either on your LAN or in a cloud-hosted environment.

With these distributed systems, there are numerous places where something can fail, and you probably wouldn’t know where they are in the chain until they break. Intentionally breaking them, under conditions you control, makes it possible for you to be better informed of the inherent weaknesses in your systems. Once you know where they break, you can fix them.

By engaging in Chaos Engineering, you can stay one step ahead of the failures, and the attackers, that might otherwise take your systems down. The importance of that can’t be overstated. And the larger your systems grow, the more they need to be tested in such a way.

What is a Distributed System?

A distributed system is a type of system whose components are located across different computers on a network but appear as a single coherent system to either the end user or a service. 

The benefits of distributed systems include horizontal scaling, reliability, and high levels of performance.

Enter Chaos Monkey

Chaos Monkey is a piece of software that randomly terminates instances in a distributed production environment to highlight what engineers need to focus on to make those systems as resilient as possible. 
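
To make the idea concrete, here is a minimal Python sketch of random instance termination. It is a toy simulation, not Chaos Monkey’s actual code: the Instance and InstanceGroup classes and the terminate_random_instance function are hypothetical names invented for this illustration, and nothing in it talks to a real cloud provider.

```python
import random
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Instance:
    """Hypothetical stand-in for one server in a group."""
    name: str
    running: bool = True


@dataclass
class InstanceGroup:
    """Hypothetical group of identical instances behind one service."""
    service: str
    instances: List[Instance] = field(default_factory=list)

    def healthy(self) -> List[Instance]:
        """Return the instances that are still running."""
        return [inst for inst in self.instances if inst.running]


def terminate_random_instance(
    group: InstanceGroup, rng: random.Random
) -> Optional[Instance]:
    """Stop one randomly chosen running instance.

    Returns the stopped instance, or None if nothing was running.
    """
    candidates = group.healthy()
    if not candidates:
        return None
    victim = rng.choice(candidates)
    victim.running = False
    return victim


if __name__ == "__main__":
    group = InstanceGroup("api", [Instance(f"api-{n}") for n in range(3)])
    victim = terminate_random_instance(group, random.Random())
    print(
        f"terminated {victim.name}; "
        f"{len(group.healthy())} of {len(group.instances)} still running"
    )
```

In a real experiment, the interesting part is what happens next: whether the remaining instances keep serving traffic and whether a replacement is started automatically.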

Around 2010, Netflix was moving its systems to the cloud, specifically to Amazon Web Services (AWS). With this move came the realization that hosts could go down at any time. To prepare for such an eventuality, Netflix developers created Chaos Monkey, which would randomly terminate hosts. As those random terminations occurred, the engineers could better discover weaknesses in the system as a whole. This also had the added bonus of helping the engineers discover whether their automated remediation systems functioned as expected.

With the help of Chaos Monkey, Netflix gained a much better understanding of how its systems responded when various components were taken down. This made it possible for the engineers to improve the systems so that such failures caused far less disruption.

It was the creation of Chaos Monkey that brought about the idea of Chaos Engineering. 

Around 2011, Netflix announced that Chaos Monkey had evolved into a larger toolkit, called the Simian Army. The Simian Army is a suite of failure-inducing and auditing tools that adds far more capabilities than Chaos Monkey alone offered.

The tools found in the Simian Army include Janitor Monkey (seeks out and disposes of unused resources within the cloud), Conformity Monkey (seeks out instances that don’t conform to predefined rules and, if found, terminates them), and Security Monkey (locates potential security vulnerabilities and violations).

Both Janitor Monkey and Conformity Monkey are now part of Spinnaker. It’s important to also note that Chaos Monkey doesn’t support deployments that are managed by anything other than Spinnaker, which is a CI/CD solution. So if you’re not using Spinnaker, you can’t use Chaos Monkey.

Benefits of Using Chaos Monkey

There are a couple of very important benefits to using Chaos Monkey for your Chaos Engineering needs. First and foremost, it prepares you for random failures. Your engineers aren’t taken by surprise when something goes down, because you’ll already have backup systems and response protocols in place for multiple contingencies.

Another benefit is that it will encourage your engineers to build redundancy into your systems. Without redundancy, when a system goes down there won’t be another system to automatically take its place.
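
As a minimal sketch of what that redundancy buys you, assume a service with several interchangeable replicas that a caller can try in turn. The call_with_failover function, the AllReplicasDown exception, and the two replica stand-ins below are hypothetical, written only for this illustration.

```python
from typing import Callable, Sequence


class AllReplicasDown(Exception):
    """Raised when no replica could answer the request."""


def call_with_failover(replicas: Sequence[Callable[[], str]]) -> str:
    """Try each replica in order and return the first successful response."""
    failures = 0
    for replica in replicas:
        try:
            return replica()
        except ConnectionError:
            # This replica is down; fall through to the next one.
            failures += 1
    raise AllReplicasDown(f"{failures} replica(s) failed")


def failing_primary() -> str:
    """Stand-in for an instance that has just been terminated."""
    raise ConnectionError("primary is unreachable")


def healthy_backup() -> str:
    """Stand-in for a redundant instance that is still running."""
    return "response from backup"


if __name__ == "__main__":
    # With a backup in place, losing the primary is invisible to the caller.
    print(call_with_failover([failing_primary, healthy_backup]))
```

With only failing_primary in the list, the same call raises AllReplicasDown, which is the situation a random termination is meant to expose before a real outage does.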

Of course, Chaos Monkey won’t tell you how to solve a problem, only that the problem exists. But in the world of distributed systems, uncovering the problems is one of the biggest challenges.

Conclusion

If you have a distributed system that is deployed via Spinnaker, Chaos Monkey is the de facto standard tool for Chaos Engineering and helps ensure that your systems are as resilient as possible. For any business that meets the requirements of Chaos Monkey, this tool should be considered a must-use.
