Synthetic Traffic vs. Canary Deployments
- Synthetic traffic is simulated traffic sent through a system to detect issues early, ideally before real users encounter them.
- Canary deployments are a way to de-risk changes by having the updated version of a system serve only a small subset of requests at first.
- These strategies can be combined with each other and many other release engineering best practices.
During the week, I am a software engineer. One of the lessons I learned early on, especially when leading projects across teams and organizations, is the importance of shared vocabulary. This is amplified when you have many functions involved: Engineering, Product, Legal, Treasury, Accounting, Risk, Compliance, and so on.
Assigning the same meaning to a consistent set of words minimizes confusion. This is the foundation for a shared mental model of users' needs, the product's current capabilities and constraints, and the desired future state.
As a concrete example of this, I want to disambiguate synthetic traffic and canary deployments.
(If you're not already familiar with how software can be deployed in general, start with an introductory resource like this one.)
Synthetic traffic (aka "synthetic monitoring") is exactly what it sounds like: generate simulated traffic and send it through your system to verify that all is working as expected.
The goal is to find issues in your system (unavailable components, sudden performance regressions, etc.) before your real users do.
How exactly you generate this traffic is case-dependent, but consider:
- Who are your users and how do they use your product?
- What are the key user journeys they'll be undertaking when interacting with your system?
- Where do you have dependencies on external partners and systems?
- What absolutely must not fail?
- What can be delayed for some time or partially degraded in its functionality?
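To make this more concrete, here is a minimal sketch of a synthetic probe in Python. It assumes each key user journey can be expressed as a callable that raises an exception on failure. The names `run_probe`, `ProbeResult`, and `max_latency_s` are hypothetical and chosen for illustration; they do not come from any particular monitoring product.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class ProbeResult:
    journey: str
    ok: bool
    latency_s: float
    error: str = ""


def run_probe(
    journey: str,
    action: Callable[[], None],
    max_latency_s: float,
    clock: Callable[[], float] = time.monotonic,
) -> ProbeResult:
    """Run one simulated user journey and report whether it succeeded in time."""
    start = clock()
    try:
        action()  # e.g. log in, search, or check out as a test user
    except Exception as exc:
        # Any exception counts as a failed journey (an unavailable component, etc.).
        return ProbeResult(journey, False, clock() - start, repr(exc))
    elapsed = clock() - start
    if elapsed > max_latency_s:
        # The journey worked but was too slow (a performance regression).
        return ProbeResult(journey, False, elapsed, "latency budget exceeded")
    return ProbeResult(journey, True, elapsed)
```

In practice, a scheduler would call `run_probe` for each journey every few minutes and raise an alarm on failures. The answers to the questions above decide which journeys to probe and how strict each latency budget should be.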
Canary deployments (aka "canary testing") are a specific deployment strategy, where we allow a new version of the system to begin serving a small subset of requests.
The goal is to limit the blast radius of buggy changes, so that as few users as possible are impacted.
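One common way to pick that "small subset" is to hash a stable request attribute into a bucket. The sketch below assumes requests carry some stable key such as a user ID; the function name `routes_to_canary` and the `canary_percent` parameter are hypothetical. Real systems often do this in a load balancer or service mesh instead of application code.

```python
import hashlib


def routes_to_canary(request_key: str, canary_percent: float) -> bool:
    """Return True if this request should be served by the canary.

    request_key is a stable identifier (assumed here to be a user ID), so the
    same user consistently lands on the same version. canary_percent ranges
    from 0 to 100.
    """
    digest = hashlib.sha256(request_key.encode("utf-8")).digest()
    # Map the hash onto 10,000 buckets so percentages like 0.5 are expressible.
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return bucket < canary_percent * 100
```

Hashing instead of picking randomly per request keeps each user's experience consistent, which also makes it easier to reason about who was affected if the canary misbehaves.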
Here's an example usage: "The change rolled back automatically after healthcheck detectors fired on the canary". This means that:
- A small subset of users were routed to the new version of the system ("the canary").
- Those users encountered issues (performance slow-downs, errors, etc.).
- Instrumentation detected this and automated alarms went off ("healthcheck detectors").
- Those alarms automatically triggered a rollback (reversion), where those users were sent back to the current version of the system.
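The detection and rollback steps above can be sketched as a comparison between the canary and the current version. This is only an illustration under stated assumptions: the `HealthSample` fields and the thresholds (`max_error_rate_delta`, `max_latency_ratio`, `min_requests`) are hypothetical defaults, not recommendations from any specific deployment tool.

```python
from dataclasses import dataclass


@dataclass
class HealthSample:
    requests: int
    errors: int
    p99_latency_ms: float


def should_roll_back(
    canary: HealthSample,
    baseline: HealthSample,
    max_error_rate_delta: float = 0.01,
    max_latency_ratio: float = 1.5,
    min_requests: int = 100,
) -> bool:
    """Decide whether the canary looks unhealthy relative to the current version."""
    if canary.requests < min_requests:
        return False  # Not enough canary traffic yet to judge either way.
    canary_rate = canary.errors / canary.requests
    baseline_rate = baseline.errors / baseline.requests if baseline.requests else 0.0
    if canary_rate - baseline_rate > max_error_rate_delta:
        return True  # Canary users are seeing noticeably more errors.
    if baseline.p99_latency_ms > 0:
        if canary.p99_latency_ms / baseline.p99_latency_ms > max_latency_ratio:
            return True  # Canary users are seeing noticeable slow-downs.
    return False
```

When `should_roll_back` returns `True`, the deployment system would send the canary's share of users back to the current version.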
Notice that synthetic traffic and canary deployments can be combined with each other: a small subset of simulated traffic can be run through the updated version of the system to help identify new issues. Then the next stage of the deployment will begin routing a small subset of real traffic to it.
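Here is a rough sketch of that combination as a staged rollout. The stage names and the `stage_is_healthy` callback are assumptions made for illustration; the final full-traffic stage in the usage example is also my addition.

```python
from typing import Callable, List, Sequence, Tuple


def staged_rollout(
    stages: Sequence[str],
    stage_is_healthy: Callable[[str], bool],
) -> Tuple[List[str], str]:
    """Advance through stages in order, stopping at the first unhealthy one."""
    completed: List[str] = []
    for stage in stages:
        if not stage_is_healthy(stage):
            # Stop here so later stages never expose more users to the change.
            return completed, "rolled_back"
        completed.append(stage)
    return completed, "promoted"


# Hypothetical plan: synthetic traffic first, then a small share of real traffic.
EXAMPLE_STAGES = ["synthetic_only", "real_1_percent", "real_100_percent"]
```

Calling `staged_rollout(EXAMPLE_STAGES, check)` with a `check` function backed by your healthcheck detectors means an issue caught during `synthetic_only` never reaches a real user.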
This sense of "canary [deployments]" is the standard within the tech industry.
Why did I write this post then? Because I was surprised to encounter usage of the term "canary" to refer to a script that generates synthetic traffic. This usage seems to originate from AWS, but it is inconsistent with their other public docs.
Let's avoid unnecessary confusion. A canary deployment is a new version of the system that is serving a small subset of requests. It may or may not be receiving synthetic traffic that simulates user interactions.