We’ve all been there. A video call starts, and everything looks fine as people join and start chatting about COVID restrictions and the situation where they live. Then Matt says “hi”. Everyone stops talking, because Matt’s voice sounds really strange, like a robot. Alice asks whether she’s the only one hearing Matt like a robot, and everyone else confirms it: Matt really does sound like a robot.
Everyone quickly starts blaming the video conferencing platform, but is the platform really at fault?
If this was your platform, how would you defend yourself?
How does robotic voice manifest?
When a participant complains of hearing a robotic voice, the cause is generally a high packet loss rate, or packets being discarded because they arrived too late or too early (jitter). Typically, packet loss of roughly 30% or higher can cause this.
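To make the "lost or discarded" distinction concrete, here is a minimal sketch that counts jitter-buffer discards as effective loss. The counter names loosely follow the WebRTC inbound-RTP statistics (`packetsReceived`, `packetsLost`, `packetsDiscarded`), but the `AudioInterval` type, the helper functions and the 30% cut-off are illustrative assumptions, not a callstats.io API.

```python
from dataclasses import dataclass


# Hypothetical per-interval counters for one inbound audio stream.
@dataclass
class AudioInterval:
    packets_received: int   # packets that reached the receiver
    packets_lost: int       # packets that never arrived
    packets_discarded: int  # received, but dropped by the jitter buffer (late/early)


# Rule-of-thumb threshold from the text (roughly 30%); an assumption, not a spec value.
ROBOTIC_THRESHOLD = 0.30


def effective_loss(interval: AudioInterval) -> float:
    """Fraction of expected packets that were unusable for playout."""
    expected = interval.packets_received + interval.packets_lost
    if expected == 0:
        return 0.0
    # Discarded packets arrived but could not be played, so treat them as lost.
    unusable = interval.packets_lost + interval.packets_discarded
    return unusable / expected


def likely_robotic(interval: AudioInterval) -> bool:
    """True when the unusable share reaches the assumed robotic-voice threshold."""
    return effective_loss(interval) >= ROBOTIC_THRESHOLD
```

For example, with 400 packets received, 150 lost and 30 discarded, network loss alone is 150/550 (about 27%), but the jitter discards push the unusable share to 180/550 (about 33%), over the assumed threshold.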
When packets have been dropped or lost, Packet Loss Concealment (PLC) is triggered. The PLC algorithm tries to fill in the missing audio signal in an attempt to maintain audio clarity for that participant. This is also an area where ML/AI algorithms are replacing traditional heuristics- or statistics-based algorithms. Traditional PLC is quite effective when only a small number of consecutive packets are lost (around 5%), and under those circumstances most people never notice, because PLC does a good job of filling in the gaps. When the loss exceeds what the estimator can reasonably reconstruct, PLC can produce that all too familiar robotic-sounding speech.
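The sketch below illustrates why burst length matters, using the simplest classic concealment technique: repeat the last good frame and fade it out on each further consecutive loss. It is not the algorithm used by any particular codec or by callstats.io (production PLC, for example in Opus, is far more sophisticated); `conceal`, `longest_burst`, the 160-sample frame size and the decay factor are all hypothetical.

```python
from typing import List, Optional

Frame = List[float]


def conceal(frames: List[Optional[Frame]], frame_len: int = 160,
            decay: float = 0.5) -> List[Frame]:
    """Naive PLC: replace each lost frame (None) with the last good frame,
    attenuated by `decay` for every additional consecutive loss."""
    output: List[Frame] = []
    last_good: Frame = [0.0] * frame_len  # assume silence before the first packet
    gain = 1.0
    for frame in frames:
        if frame is not None:
            last_good = frame
            gain = 1.0  # a real packet resets the fade-out
            output.append(list(frame))
        else:
            gain *= decay
            output.append([sample * gain for sample in last_good])
    return output


def longest_burst(frames: List[Optional[Frame]]) -> int:
    """Length of the longest run of consecutive lost frames."""
    longest = current = 0
    for frame in frames:
        current = current + 1 if frame is None else 0
        longest = max(longest, current)
    return longest
```

With `conceal([[1.0, 1.0], None, None, [0.5, 0.5]], frame_len=2)`, the two gaps are filled with `[0.5, 0.5]` and `[0.25, 0.25]`. A single lost frame is bridged almost unnoticed, but a long burst leaves only faded, repeated fragments of one frame, which is where the artificial sound comes from.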
How can you use callstats.io to debug the issue?
To identify these issues in callstats.io, there are a few key metrics you can use. Let’s walk through an example call where several participants complained that another participant’s speech sounded like a robot.
To start, find the specific call you want to investigate. You can filter on either the conference ID or the user ID, and narrow down the date/time range of the search. In my example, I used my user ID and then selected the conference I wanted to view from the resulting conference list. Note that this is a test conference: I deliberately set up a mobile endpoint to exhibit approximately 40% packet loss.
Fig1. Filter on Date/Time and User ID.
Fig2. Selecting the conference to view
Because we know that robotic speech is typically caused by packet loss, a good place to start is the Media Stream Tracks (MST) charts.
The MST charts display information on every outgoing media stream for each participant on the call, so you can easily look at the total fractional loss and/or the 95th-percentile loss. For definitions of these and other metrics, please refer to the callstats.io Help Center.
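As a rough illustration of why both numbers are useful, the sketch below computes a track-level fractional loss and a nearest-rank 95th percentile from per-interval samples. These are common textbook definitions assumed for illustration; the exact callstats.io definitions are in its Help Center, and the function names are hypothetical.

```python
import math
from typing import List, Tuple

# One (packets_lost, packets_expected) pair per reporting interval.
Sample = Tuple[int, int]


def total_fractional_loss(samples: List[Sample]) -> float:
    """Loss over the whole track: total lost / total expected."""
    n_lost = sum(lost for lost, _ in samples)
    n_expected = sum(expected for _, expected in samples)
    return n_lost / n_expected if n_expected else 0.0


def percentile_loss(samples: List[Sample], pct: float = 95.0) -> float:
    """Nearest-rank percentile of the per-interval loss fractions."""
    fractions = sorted(lost / expected for lost, expected in samples if expected > 0)
    if not fractions:
        return 0.0
    rank = max(1, math.ceil(pct * len(fractions) / 100))
    return fractions[rank - 1]
```

For 20 intervals of 100 packets where 18 are clean and 2 lose 40 packets each, the total fractional loss is only 4%, while the 95th percentile is 40%. A short, severe burst (exactly what produces robotic audio) is diluted in the average but stands out in the percentile.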
If you see packet loss across the board, it could be a service-side issue. Whenever you see asymmetric packet loss for one or more participants, you can be fairly certain that it is localised to those participants and/or their local environment. The most common cause of localised packet loss is the participant’s network, specifically WiFi. That said, the root cause of packet loss can be many different things, including insufficient or congested bandwidth, poor QoS on shared data connections, poor or overloaded network equipment (router, WiFi access point), WiFi interference, an overloaded PC/laptop (not enough CPU/RAM available), faulty network equipment (including poor cabling), problems with the ISP, and others.
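The "across the board versus asymmetric" reasoning can be expressed as a simple heuristic. This is a hypothetical sketch, not a callstats.io feature: the 5% per-participant threshold and the 80% share used to call something service-wide are arbitrary assumptions you would tune for your own service.

```python
from typing import Dict, List, Tuple


def classify_loss(loss_by_participant: Dict[str, float],
                  threshold: float = 0.05,
                  service_share: float = 0.8) -> Tuple[str, List[str]]:
    """Classify outgoing-stream loss per participant as service-wide or localised.

    loss_by_participant maps a participant ID to the fractional loss of
    their outgoing streams (0.0 to 1.0).
    """
    lossy = sorted(p for p, loss in loss_by_participant.items() if loss >= threshold)
    if not lossy:
        return "no significant loss", []
    if len(lossy) >= service_share * len(loss_by_participant):
        # Most participants are affected: look at the service side first.
        return "possible service-side issue", lossy
    # Only some participants are affected: likely their network or device.
    return "localised", lossy
```

For example, `classify_loss({"alice": 0.01, "bob": 0.0, "matt": 0.4})` returns `("localised", ["matt"])`, pointing the investigation at Matt’s environment rather than the platform.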
In the screenshot below there are two tracks showing quite a bit of packet loss. Both the audio and video tracks are for the same participant, showing that there is most likely a localised issue for that user.
Fig3. Packet loss can be seen for participant ‘email@example.com’
Digging deeper, we can select one of the participants who reported the problem (09ce9d11) and then select one of the media tracks for the problematic user identified above. Scrolling down to the charts, we can see a sudden spike with a high amount of jitter and packet loss that lasts for approximately 30-40 seconds.
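If you export a loss time series and want to locate such spikes programmatically rather than by eye, a minimal sketch might look like the following. The `(timestamp_seconds, loss_fraction)` input format, the 20% threshold and the 10-second minimum duration are assumptions for illustration only.

```python
from typing import List, Optional, Tuple


def find_loss_episodes(series: List[Tuple[float, float]],
                       threshold: float = 0.2,
                       min_duration_s: float = 10.0) -> List[Tuple[float, float]]:
    """Return (start, end) timestamps of sustained runs with loss >= threshold.

    `series` is a time-ordered list of (timestamp_seconds, loss_fraction).
    Duration is measured from the first to the last sample above threshold.
    """
    episodes: List[Tuple[float, float]] = []
    start: Optional[float] = None
    last_above: Optional[float] = None
    for t, loss in series:
        if loss >= threshold:
            if start is None:
                start = t
            last_above = t
        elif start is not None:
            if last_above - start >= min_duration_s:
                episodes.append((start, last_above))
            start = None
    # Close an episode that is still open at the end of the series.
    if start is not None and last_above - start >= min_duration_s:
        episodes.append((start, last_above))
    return episodes
```

With samples every 5 seconds and 40% loss between t=30 s and t=65 s, the function reports a single episode `(30, 65)`, a 35-second burst similar to the one in the charts, while ignoring one-sample blips.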
Fig4. Select the reporting participant and media track you want to view
Fig5. Period of Packet loss and jitter being shown on Video track 924533673
Fig6. The corresponding period of Packet loss and jitter being shown on Audio track 3848234684
With the same reporting participant selected, you can view the other media tracks, such as the audio track shown in Fig7, which does not display any signs of packet loss. This confirms that the packet loss issue is localised to one participant.
Fig7. Zero packet loss is shown for Audio track 1729071078
In addition to the metrics used above, there are many others at your disposal when looking for and diagnosing audio quality issues. Here are a few of note:
- Conference graphs > quality > objective quality
- Conference graphs > Media Stream tracks > Delay > Latency
- Conference graphs > Media Stream tracks > Jitter
- Conference graphs > Media Stream tracks > audio > concealed events
- Conference graphs > Media Stream tracks > audio > concealed samples
- Peer connection > Delay > RTT
- Peer connection > Delay > Jitter
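The concealment metrics are a direct view of how hard PLC is working. The field names in this sketch (`totalSamplesReceived`, `concealedSamples`, `concealmentEvents`) come from the W3C WebRTC statistics specification for inbound audio streams, where `totalSamplesReceived` includes concealed samples. Whether the callstats.io charts are computed exactly this way is an assumption; `concealment_delta` is a hypothetical helper comparing two stats snapshots.

```python
from typing import Dict


def concealment_delta(prev: Dict[str, int], curr: Dict[str, int]) -> Dict[str, float]:
    """Summarise concealment between two inbound-audio stats snapshots."""
    samples = curr["totalSamplesReceived"] - prev["totalSamplesReceived"]
    concealed = curr["concealedSamples"] - prev["concealedSamples"]
    events = curr["concealmentEvents"] - prev["concealmentEvents"]
    return {
        # Share of played-out audio that was synthesised by PLC.
        "concealed_ratio": concealed / samples if samples > 0 else 0.0,
        "concealment_events": float(events),
        # Long average gaps suggest bursty loss rather than isolated drops.
        "avg_samples_per_event": concealed / events if events > 0 else 0.0,
    }
```

For example, if 480,000 new samples arrived between snapshots (10 seconds at an assumed 48 kHz) and 144,000 of them were concealed across 30 events, 30% of the audio was synthetic and each gap averaged 4,800 samples (100 ms), a pattern consistent with robotic-sounding speech.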
The metrics above are all useful for investigating specific calls, but you can also use callstats.io to look at the bigger picture. By defining search criteria, you can view the overall Service Metrics for a given search result: more than 20 metrics, including connection type, eMOS, average jitter and loss, average RTT, and whether any errors were detected (signalling, media source, ICE, etc.).
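If you also keep your own per-call summaries, a service-level roll-up can be sketched as below. The `CallSummary` fields and `service_metrics` function are hypothetical and cover only a handful of the metrics listed above; note that this sketch takes an unweighted mean of per-call averages, so a short call counts as much as a long one.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class CallSummary:
    conference_id: str
    avg_jitter_ms: float
    avg_loss: float        # fractional loss, 0.0 to 1.0
    avg_rtt_ms: float
    had_ice_error: bool


def service_metrics(calls: List[CallSummary]) -> Dict[str, float]:
    """Unweighted averages across calls plus the share of calls with ICE errors."""
    if not calls:
        return {}
    return {
        "calls": float(len(calls)),
        "avg_jitter_ms": mean(c.avg_jitter_ms for c in calls),
        "avg_loss": mean(c.avg_loss for c in calls),
        "avg_rtt_ms": mean(c.avg_rtt_ms for c in calls),
        "ice_error_rate": sum(c.had_ice_error for c in calls) / len(calls),
    }
```

Weighting by call duration or packet count would be a reasonable alternative if long calls matter more for your service.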
In addition to robotic-sounding speech, high packet loss can also cause choppy-sounding speech in circumstances where PLC is not being used. However, choppy-sounding speech can also occur when the audio level is very low and Voice Activity Detection (VAD) is in use, because VAD can sometimes miss parts of the original voice signal, causing a similar effect.
Finding high packet loss in callstats.io lets you confirm whether it is the cause of the quality issue and where it is occurring. Armed with that information, you can determine whether you need to work on optimising your service and infrastructure, or whether your customers/users need to look more closely at their home/work environments to resolve the issue.
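To show how a low audio level can interact with VAD, here is a deliberately naive energy-threshold gate. Real VAD implementations use more signal features than frame energy, so treat this as an illustration only; `rms`, `vad_gate` and the 0.02 threshold are hypothetical.

```python
import math
from typing import List

Frame = List[float]


def rms(frame: Frame) -> float:
    """Root-mean-square level of one audio frame."""
    return math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0


def vad_gate(frames: List[Frame], threshold: float = 0.02) -> List[Frame]:
    """Naive energy VAD: frames below the RMS threshold are treated as
    non-speech and replaced with silence."""
    return [frame if rms(frame) >= threshold else [0.0] * len(frame)
            for frame in frames]
```

A loud frame (RMS 0.1) passes, but a quiet speech frame (RMS 0.01) is silenced even though it contains voice. If a speaker’s level hovers around the threshold, their speech is cut in and out, which sounds choppy without a single packet being lost.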
This feature is part of 8x8's call quality initiative
This blog post was written by Matthew Rogers and Filipe Leitão, EMEA Solutions Engineer - CPaaS.