By adopting monitoring-as-code practices, we can build common artefacts that validate documented application behaviour in both production and pre-production environments. Monitoring-as-code also provides a common definition of functionality, which can help to address cultural, prioritisation, and documentation issues.
Synthetic Monitoring is the use of automation frameworks to periodically check the availability of an application according to a fixed schedule. Traditionally, this has been implemented by probing production application services through health endpoints, or by exercising key API calls and validating that they return a 200 OK status. When errors are detected, alerts can be raised according to defined tolerance thresholds.
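To make the probing model concrete, the sketch below (in TypeScript, assuming Node 18+ where fetch is built in) checks an endpoint on a fixed interval and reports non-200 responses. The endpoint path, interval, and logging are illustrative assumptions rather than the behaviour of any particular product, which would instead feed failures into alerting thresholds.

```typescript
// Minimal sketch of a scheduled health-endpoint probe.
const HEALTH_URL = 'https://example.com/healthz'; // hypothetical health endpoint
const INTERVAL_MS = 60_000;                       // probe every 60 seconds

async function probe(): Promise<void> {
  const started = Date.now();
  try {
    const response = await fetch(HEALTH_URL);
    if (response.status === 200) {
      console.log(`Probe OK in ${Date.now() - started}ms`);
    } else {
      // A real monitoring tool would count this against a tolerance
      // threshold before raising an alert.
      console.error(`Probe failed: HTTP ${response.status}`);
    }
  } catch (err) {
    console.error('Probe failed: endpoint unreachable', err);
  }
}

setInterval(probe, INTERVAL_MS);
```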
In recent years, Synthetic Monitoring tools have gained the capability to automate user interface actions such as clicks and text entry against applications.
There are several types of Synthetic Monitoring tools:
- Probing tools that call application service endpoints for protocols such as HTTP, TCP and ICMP on a fixed schedule. The majority of monitoring tools provide some form of heartbeat capability using an agent running in a fixed location (or, in the case of SaaS versions of these tools, sometimes from multiple locations).
- Recorder-based flows that capture browser activity and recreate that activity on a fixed schedule. Key examples include the browser-based monitors offered by Datadog and Splunk.
- Web synthetic monitors that wrap modern automation libraries such as Playwright or Selenium, where user workflows are simulated in code. Examples include Elastic Synthetics and Checkly, which use Playwright JS, and New Relic, which has adopted Selenium (a minimal sketch of this style follows this list).
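As a rough illustration of the code-based style, the sketch below drives a login workflow with the Playwright library directly, in the way a scheduled monitor might. The URL, selectors, and environment variables are hypothetical; tools such as Elastic Synthetics or Checkly wrap this kind of script in their own runner and scheduling.

```typescript
import { chromium } from 'playwright';

// Sketch of a web synthetic monitor: simulate a login workflow in code.
async function checkLoginWorkflow(): Promise<void> {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  try {
    await page.goto('https://example.com/login');
    await page.fill('#username', process.env.MONITOR_USER ?? '');
    await page.fill('#password', process.env.MONITOR_PASSWORD ?? '');
    await page.click('button[type="submit"]');
    // The monitor asserts the same outcome an E2E test would:
    // a successful login lands the user on their dashboard.
    await page.waitForURL('**/dashboard');
  } finally {
    await browser.close();
  }
}

checkLoginWorkflow().catch((err) => {
  // A scheduler or monitoring agent would turn this failure into an alert.
  console.error('Synthetic monitor failed', err);
  process.exitCode = 1;
});
```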
In End-to-End (E2E) Testing, developers and testers use automation frameworks to write tests that exercise the multiple steps users perform to achieve a particular goal in the application. This allows developers or dedicated testers to validate that user interactions such as clicks and text entry yield the expected results from the entire system. Modern examples include Playwright, Cypress, and Nightwatch, with the venerable Selenium also still widely used by testing teams.
These tests are typically run as part of pre-production validation in Continuous Integration (CI) pipelines.
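Expressed with @playwright/test, such an E2E test looks very similar to the synthetic monitor sketch above; the steps are the same, only the runner and the environment differ. The base URL and selectors below are illustrative assumptions.

```typescript
import { test, expect } from '@playwright/test';

// Sketch of the same login workflow as a CI-run E2E test.
test('user can log in and reach the dashboard', async ({ page }) => {
  const baseUrl = process.env.BASE_URL ?? 'http://localhost:3000';
  await page.goto(`${baseUrl}/login`);
  await page.fill('#username', 'test-user');
  await page.fill('#password', 'test-password');
  await page.click('button[type="submit"]');

  // The assertion mirrors the check the synthetic monitor performs in production.
  await expect(page).toHaveURL(/.*dashboard/);
});
```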
The key overlap between Synthetic Monitoring and E2E Testing is that they both automate the user workflow, albeit at different points in the software lifecycle. Specifically:
- E2E testing catches potential defects and incorrect behavioural assumptions in local development and merging stages via execution in CI pipelines. It can also give an early indication of performance issues if, for example, the duration of the suite or individual tests starts to increase without a known cause.
- Synthetic Monitoring is used to drive alerts to system operators of potential system unavailability, or, in some cases, incorrectness.
In 2021, I struggled to understand why I, as a software engineer, was using Cypress for E2E tests on user features under development, while Site Reliability Engineer (SRE) colleagues were building similar workflows for synthetic monitors using Selenium. Since both of these tools are automation frameworks that can be used for monitoring and E2E testing, using two different tools doesn't always make sense. In my case, key user operations such as searching, filtering, and selection were covered by both the software engineering team's E2E test suite and the SRE team's Synthetic Monitors. The differences were the choice of tooling, the environment in which the tests were run, and the frequency of execution. For the SREs, Selenium fit with the tool they were already using to monitor production.
There may not be 100% overlap between the tests used by developers and SREs. The suite of tests run as part of a CI pipeline to validate the application is generally much more comprehensive than a set of monitors. Despite that, it's still possible to use a common toolchain, allowing developers and SREs to share application behaviour, testing scenarios, and monitoring. Effectively, using monitoring-as-code shifts these monitors left, giving developers more responsibility for monitoring and availability. This is especially important in teams with developers on-call or you-build-it-you-run-it practices. In subsequent sections we shall examine the main reasons why teams may use different tools for these related tasks, and how using a common tool for both Synthetic Monitoring and E2E testing helps address these challenges.
It is often assumed that monitor definitions cannot be reused as tests because the E2E suite needs to be more comprehensive to validate application functionality. But even where the workflows differ, a common automation tool is still beneficial.
There are some technical challenges to be overcome. Production data can be changed by tests, tests can contaminate analytics data, and side-effects in third-party services (such as payments) can be triggered. These risks can be managed by using dedicated test accounts, data cleanup tasks, and the test capabilities of third-party services, such as test cards. Where these mitigations are not possible, E2E tests may not be suitable as monitors. E2E tests run in CI or pre-production environments can use mock or fake services to avoid many of these issues, an option production systems do not have.
Some monitoring tools support conditional disabling, allowing the same spec to be run as a local E2E test or as a production monitor. This allows you to embrace monitoring-as-code practices and use production monitors as E2E tests with the same specification, while disabling execution of specific flows in production where needed.
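How conditional disabling is expressed varies by tool. As one sketch, a Playwright-based spec could gate a side-effecting step on an environment variable, so the same file runs fully as an E2E test in pre-production but skips the risky flow when scheduled as a production monitor; the TARGET_ENV variable, selectors, and test card here are assumptions for illustration.

```typescript
import { test, expect } from '@playwright/test';

const isProduction = process.env.TARGET_ENV === 'production';

test('user can complete checkout', async ({ page }) => {
  // Skip the destructive flow when the spec runs as a production monitor.
  test.skip(isProduction, 'Checkout places real orders; run only in pre-production');

  await page.goto('/checkout'); // assumes baseURL is set in playwright.config
  await page.fill('#card-number', '4242 4242 4242 4242'); // a common test card number
  await page.click('button#pay');
  await expect(page.getByText('Order confirmed')).toBeVisible();
});
```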
Execution time of the test suite is also a challenge in both monitoring and testing scenarios. E2E test and monitoring suites take considerable time to maintain and run, meaning that large suites can elongate the execution time of CI pipeline testing stages. Spikes in duration are a useful anomaly to investigate, as shown in Figure 3. But increased execution times can mean increased costs to run, and this can be made worse if running in multiple locales.
With the increasing focus on the cost of software operations, minimising both the execution time of CI pipelines and of the monitor schedules running user workflows is an important consideration, as is minimising the number of failed runs due to configuration issues. By using monitoring-as-code, we can not only validate the monitors against the software in a pre-production environment, but also identify the key workflows of interest and potentially reduce the number of workflows run in the CI pipeline.
Existing organisational structures in large enterprises foster disparate automation tools across development and operational spaces by making communication between teams more difficult. Conway's law makes such organisational structures difficult to change, as application ecosystems become tied to them. As a result, differing assumptions about application behaviour, user workflow, and performance are baked into the executed user workflows and need to be resolved between the developer and SRE-generated monitoring and test specifications.
You could argue that moving to a you-build-it-you-run-it model, which is becoming more common, solves the issue. However, in some regulated areas that isn't possible due to the segregation of development and operational roles. As a developer in banking, I wasn't allowed to have persistent production access, only a carefully-monitored break-glass option for emergency access. This made a dedicated operations team absolutely essential.
SREs have a combination of engineering and production management skills that tend to make them good at writing monitoring-as-code. Specifically, SREs can encourage the rest of the organisation to converge on a common tool, potentially reducing cost and complexity for the entire organisation. However, a pitfall is that sharing a common tool does make it less clear who is responsible for maintaining the tool itself, along with any shared libraries and monitors. Collaboration and clarity around ownership are important.
Using the same automation tool for monitoring and testing also reduces the learning curve for teams, making it easier to transfer not only knowledge of the tool, but also knowledge of how applications function, through the use of a common artefact. In this case the artefact is the monitor written using our automation tool of choice.
The rise of job titles such as SRE and DevOps Engineer, not to mention emerging AI roles, makes establishing a common understanding of user workflows difficult. While this specialisation allows for deep expertise in engineering, reliability, and security practices, it also means we need to find common tools and artefacts that can help build a common understanding of how the product works.
Monitors-as-code use a common framework that can be understood by developers, testers, and SREs, and they document not just how a feature is expected to function, but also its availability expectations, since the schedule metadata is included alongside the artefact.
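As a sketch of what this looks like in practice, loosely based on Elastic Synthetics' journey API, the workflow steps and the production schedule on which they run sit in the same artefact; the monitor id, schedule, parameters, and selectors here are illustrative assumptions.

```typescript
import { journey, step, monitor, expect } from '@elastic/synthetics';

journey('Search returns results', ({ page, params }) => {
  // The availability expectation lives next to the workflow definition.
  monitor.use({
    id: 'search-returns-results',
    schedule: 10, // check every 10 minutes in production
  });

  step('load the home page', async () => {
    await page.goto(String(params.url)); // target URL supplied via config
  });

  step('search for a product', async () => {
    await page.fill('#search', 'widgets');
    await page.press('#search', 'Enter');
    expect(await page.textContent('.results-count')).toContain('results');
  });
});
```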
Documentation is still an afterthought in software development. While comments, READMEs and practices such as Behaviour Driven Development (BDD) and Architecture-as-Code can give insight into the intended functionality, they can quickly become outdated.
Monitors built using the same automation tool can be used to document the intended user workflow and the steps users are expected to take to navigate through the application. This can help support engineers understand what the steps are, and SREs can comment on availability expectations and error-handling scenarios using a common artefact. Monitors are less likely to become outdated than documentation, as they are run frequently, either as part of CI, as part of production monitoring, or both.
In recent years, tooling for E2E Testing and Synthetic Monitoring has converged on a common core of functionality. If SREs, developers, testers, and other groups who build and run monitors and E2E tests share the same tools, this can support better communication and collaboration between these groups, as well as reducing cost and complexity.