How to Use RTT Metric to Isolate Amazon Connect Performance Issues

By Lennart Schulte on April 18, 2019

When a customer suddenly encountered call quality problems on their Amazon Connect contact center service, the RTT metric in their dashboard gave them the insights they needed to diagnose the problem. The round-trip time (RTT) metric is a valuable indicator of potential performance problems within the Amazon Connect cloud, enabling customers to rapidly react and mitigate trouble.  

Contact Center as a Service (CCaaS) solutions like Amazon Connect can help you simplify operations and improve business agility. But when you move services into the cloud, you lose visibility into and control over your infrastructure. Monitoring and Analytics for Amazon Connect lets you monitor and analyze call quality, helping you detect and isolate CCaaS performance bottlenecks and improve user experiences. In this post I'll explain how a customer used the RTT metric to pinpoint and work around an Amazon Connect service issue.

Amazon Connect Integration Monitors Key WebRTC Performance Indicators

Monitoring and Analytics for Amazon Connect embeds our advanced monitoring capabilities into Amazon Connect agent endpoints, providing real-time visibility into WebRTC sessions. The solution complements Amazon's native reporting capabilities, helping contact center managers and support engineers easily identify, isolate and resolve audio quality issues. It passively monitors WebRTC sessions, collecting hundreds of data points throughout every call. One of the many useful metrics it tracks is round-trip time: the time it takes for a packet to travel from a sending endpoint to a receiving endpoint and back. In the specific case of Amazon Connect, the solution measures RTT from an agent endpoint to the WebRTC session termination point in an Amazon Connect instance and back, as shown below. (RTT does not include the PSTN leg of the call.)

Figure: RTT definition
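This post does not describe how the RTT value is computed internally, so the following is only a sketch of one standard approach for RTP-based media such as WebRTC: the RTCP receiver-report calculation from RFC 3550, where RTT = A - LSR - DLSR. The function name is hypothetical, and the input values are the worked example given in the RFC, not customer data.

```python
def rtt_from_receiver_report(arrival: int, lsr: int, dlsr: int) -> float:
    """Estimate RTT in seconds from RTCP receiver-report fields (RFC 3550).

    All three inputs are 32-bit values in units of 1/65536 second:
      arrival - middle 32 bits of the NTP time when the report arrived (A)
      lsr     - "last SR" timestamp echoed back by the remote endpoint
      dlsr    - delay the remote endpoint held the SR before reporting
    """
    # Subtract modulo 2^32 so the result stays correct across wrap-around.
    delta = (arrival - lsr - dlsr) & 0xFFFFFFFF
    return delta / 65536.0


# Worked example values from RFC 3550, section 6.4.1.
rtt = rtt_from_receiver_report(0xB7108000, 0xB7052000, 0x00054000)
print(f"RTT: {rtt:.3f} s")
```

Because the remote endpoint's holding time (DLSR) is subtracted, the result covers network travel only, which matches the definition of RTT used here.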

As a general rule of thumb, RTTs of 300 ms or less are not perceptible to the average caller. But RTTs exceeding 300 ms indicate delays that reduce the interactivity of the conversation, leading to user frustration and hang-ups, as one customer recently discovered.

RTT Metric Pinpoints Performance Problem to Specific AWS Region

This particular customer runs multiple Amazon Connect instances in several different AWS regions, supporting contact center agents across the United States. One day the customer observed a spike in call quality issues: an unusually high number of agents were reporting delayed audio and audible static. Amazon CloudWatch Metrics did not reveal any obvious Amazon Connect service availability or performance issues, so the customer turned to Monitoring and Analytics for Amazon Connect to determine the root cause of the problem.

The dashboard showed a clear spike in RTTs throughout the event, which explains the delayed audio. In particular, the average RTT for all calls exceeded 1500 ms for two hourly periods, as shown below.

Figure: Hourly RTT averages (bar chart)
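An hourly view like this one can be approximated from raw per-call samples. The sketch below is a minimal illustration, assuming each sample is a (timestamp, RTT in ms) pair; the function names and the sample values are made up for the example and are not the dashboard's actual implementation or the customer's data.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def hourly_rtt_averages(samples):
    """Group (timestamp, rtt_ms) samples by hour and average each group."""
    buckets = defaultdict(list)
    for ts, rtt_ms in samples:
        # Truncate the timestamp to the start of its hour.
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(rtt_ms)
    return {hour: mean(values) for hour, values in sorted(buckets.items())}


def hours_over(averages, limit_ms):
    """Return the hours whose average RTT exceeds limit_ms."""
    return [hour for hour, avg in averages.items() if avg > limit_ms]


# Made-up sample data for illustration only.
samples = [
    (datetime(2019, 4, 1, 9, 5), 120.0),
    (datetime(2019, 4, 1, 9, 40), 180.0),
    (datetime(2019, 4, 1, 10, 10), 1400.0),
    (datetime(2019, 4, 1, 10, 50), 1800.0),
]
averages = hourly_rtt_averages(samples)
print(hours_over(averages, 1500))
```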

Next, the customer took a closer look at each call to see which specific agents were affected by the issue. As it turned out, the agents with excessive RTTs were all logged in to one particular Amazon Connect instance.
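The same isolation step can be sketched in a few lines: group calls by the instance the agent was logged in to, then compare RTTs across groups. This assumes per-call records with `agent`, `instance` and `rtt_ms` fields, which are hypothetical names chosen for the example; the instance names and values are invented, and the 300 ms limit is the rule of thumb mentioned earlier.

```python
from collections import defaultdict
from statistics import median

RTT_LIMIT_MS = 300  # rule-of-thumb threshold from this post


def summarize_by_instance(calls):
    """Summarize call count, median RTT and affected agents per instance."""
    by_instance = defaultdict(list)
    for call in calls:
        by_instance[call["instance"]].append(call)

    summary = {}
    for instance, group in by_instance.items():
        slow = [c for c in group if c["rtt_ms"] > RTT_LIMIT_MS]
        summary[instance] = {
            "calls": len(group),
            "median_rtt_ms": median(c["rtt_ms"] for c in group),
            "slow_agents": sorted({c["agent"] for c in slow}),
        }
    return summary


# Invented records for illustration only.
calls = [
    {"agent": "a1", "instance": "connect-east", "rtt_ms": 110},
    {"agent": "a2", "instance": "connect-east", "rtt_ms": 140},
    {"agent": "b1", "instance": "connect-west", "rtt_ms": 1600},
    {"agent": "b2", "instance": "connect-west", "rtt_ms": 1450},
    {"agent": "b1", "instance": "connect-west", "rtt_ms": 1700},
]
summary = summarize_by_instance(calls)
for name, stats in summary.items():
    print(name, stats)
```

If every affected agent falls under a single instance while the others look normal, the problem is more likely tied to that instance than to individual agent networks.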

A more detailed analysis, examining all calls in 10-minute intervals, shows how the 2+ hour event unfolded, with an increasing number of calls affected by high RTT (shown in red below). All of these calls were associated with the specific Amazon Connect instance experiencing the performance problem.

Figure: Detailed RTT analysis (bar chart)
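A breakdown like this amounts to counting, for each 10-minute interval, how many calls exceeded the RTT threshold. The following is a rough sketch under that assumption; the function names and sample calls are illustrative only, and the 300 ms limit is the rule of thumb from earlier in this post.

```python
from collections import Counter
from datetime import datetime

RTT_LIMIT_MS = 300
BUCKET_MINUTES = 10


def bucket_start(ts):
    """Round a timestamp down to the start of its 10-minute interval."""
    minute = ts.minute - ts.minute % BUCKET_MINUTES
    return ts.replace(minute=minute, second=0, microsecond=0)


def affected_per_bucket(calls):
    """Map each interval to (calls over the limit, total calls)."""
    total = Counter()
    affected = Counter()
    for ts, rtt_ms in calls:
        bucket = bucket_start(ts)
        total[bucket] += 1
        if rtt_ms > RTT_LIMIT_MS:
            affected[bucket] += 1
    return {b: (affected[b], total[b]) for b in sorted(total)}


# Invented calls for illustration only.
calls = [
    (datetime(2019, 4, 1, 10, 2), 120),
    (datetime(2019, 4, 1, 10, 7), 450),
    (datetime(2019, 4, 1, 10, 13), 900),
    (datetime(2019, 4, 1, 10, 18), 1600),
]
result = affected_per_bucket(calls)
for bucket, (bad, count) in result.items():
    print(bucket.strftime("%H:%M"), f"{bad}/{count} calls over {RTT_LIMIT_MS} ms")
```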

Having isolated the problem to a specific Amazon Connect instance, the customer contacted Amazon, who confirmed a performance issue limited to a particular AWS region.

In the short term, the customer instructed agents to log on to a different Amazon Connect instance until the problem was resolved. Ultimately, the company updated its operational procedures to avoid future occurrences. The customer now proactively monitors RTTs so they can detect and contain Amazon Connect performance issues at an early stage—before large numbers of callers are impacted.
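The post does not say how the customer implemented this proactive monitoring. One simple way to do it, sketched here as an assumption rather than as their actual procedure, is to keep a short rolling window of recent RTT samples per instance and flag an instance once the window average crosses the threshold. The class name, window size and instance name are all hypothetical.

```python
from collections import defaultdict, deque


class RttWatch:
    """Flag an instance when its rolling mean RTT exceeds a limit."""

    def __init__(self, limit_ms=300.0, window=5):
        self.limit_ms = limit_ms
        # One fixed-length window of recent samples per instance.
        self.windows = defaultdict(lambda: deque(maxlen=window))

    def observe(self, instance, rtt_ms):
        """Record a sample; return True if the instance should be flagged.

        Nothing is flagged until the window is full, so a single
        outlier on a quiet instance does not raise an alert.
        """
        samples = self.windows[instance]
        samples.append(rtt_ms)
        window_full = len(samples) == samples.maxlen
        return window_full and sum(samples) / len(samples) > self.limit_ms


watch = RttWatch(limit_ms=300, window=3)
alerts = [watch.observe("connect-west", rtt) for rtt in (120, 150, 900, 1200, 1500)]
print(alerts)
```

A small window reacts quickly but is noisier; a larger one trades detection speed for fewer false alarms.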

This customer story is a good example of how Monitoring and Analytics for Amazon Connect can help you efficiently detect, isolate and resolve cloud contact center performance and call quality issues. Using the RTT metric, the customer was able to trace their audio quality issues to a particular Amazon Connect instance, implement a short-term workaround and institute a policy to mitigate future incidents.

Tags: Real-time Communications, WebRTC, Amazon Connect, Contact Centers