WebRTC Quality of Experience: What is the State of the Art?

By Martin Varela on April 10, 2019

Introduction

This is the first of a series of blog posts about Quality of Experience (QoE) in WebRTC. At callstats.io we live and breathe WebRTC, and our aim is to enable our customers to provide the best possible WebRTC experience to their users.

QoE (see this white paper for the rationale behind the definition of QoE, as established by leading researchers in the field) is a fascinating and complex topic, bridging together several domains of expertise ranging from media coding, to networks, to human psychology. In this series of posts we will take a technical deep-dive into QoE for WebRTC services, aiming to help service operators understand the current state of the art, and what is coming in this domain.

Why care about QoE?

This is probably the easiest question about QoE to answer: because it's what best represents the end users' experience with a service. Understanding how users experience the service they are using allows service providers to improve that experience, to optimize service delivery, and to better monetize their offer.

As engineers, we are often very focused on the more technical aspects of service delivery, and can sometimes forget that, ultimately, the end users have the last word. All the hard work we put into improving our service will make no (big) difference if it does not impact the users' experience in a positive way, and as we will see, seemingly obvious technical improvements or deficits often do not translate into similarly obvious impacts on the way users experience a service.

In the end, quality goes hand in hand with a service provider's bottom line. Improving quality helps with customer retention, which is always desirable. In addition, service delivery can often be optimized so as to maintain user satisfaction while allowing for more efficient (read: cheaper) use of resources.

QoE 101

Quality of Experience eluded a proper definition for many years. The definition now accepted in the scientific community is:

The degree of delight or annoyance of the user of an application or service.

Furthermore,

It results from the fulfillment of his or her expectations with respect to the utility and / or enjoyment of the application or service in the light of the user’s personality and current state.

This definition came from the Qualinet COST Action, in the form of the white paper already mentioned above, co-authored by over 70 experts in the field. It is currently the working definition for the ITU-T as well.

Put shortly, QoE concerns itself with how users experience the services they use. Traditionally, most QoE research has focused on media services (e.g., telephony, video), but more recently, other types of services have also caught the interest of the research community, such as web-based services, cloud services, gaming, etc.

QoE is a very subjective concept, and as such, the only "truth" that can be gathered about it is the actual opinion of users. For a given service, we can collect ground truths by means of a subjective assessment, which basically involves a user panel rating a service under a series of carefully crafted conditions, and typically yields a Mean Opinion Score (MOS) for each condition. The methods to carry out these assessments have been well-studied, and many of them are standardized by the ITU-T (a canonical example is the P.800 recommendation for telephony systems). This is a tedious and expensive process, and most of the research work on QoE revolves around ways to avoid it.
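
To make this concrete: a MOS for a given condition is simply the arithmetic mean of the panel's ratings for that condition, usually reported together with a confidence interval. Below is a minimal sketch of that computation (the ratings here are invented for illustration; real assessments follow the much stricter procedures laid out in recommendations such as P.800):

```typescript
// Compute the Mean Opinion Score (MOS) and a 95% confidence interval
// for one test condition, from individual ratings on the usual
// 5-point ACR scale (1 = bad ... 5 = excellent).
function mosWithConfidence(ratings: number[]): { mos: number; ci95: number } {
  const n = ratings.length;
  const mos = ratings.reduce((sum, r) => sum + r, 0) / n;
  const variance =
    ratings.reduce((sum, r) => sum + (r - mos) ** 2, 0) / (n - 1);
  // 1.96: normal approximation, reasonable for typical panel sizes
  const ci95 = 1.96 * Math.sqrt(variance / n);
  return { mos, ci95 };
}

// Invented ratings from a hypothetical 12-person panel:
const ratings = [4, 5, 3, 4, 4, 2, 5, 4, 3, 4, 4, 5];
const { mos, ci95 } = mosWithConfidence(ratings);
console.log(`MOS = ${mos.toFixed(2)} +/- ${ci95.toFixed(2)}`);
```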

Factors influencing QoE

According to the ITU-T and the Qualinet white paper, factors influencing QoE:

Include the type and characteristics of the application or service, context of use, the user's expectations with respect to the application or service and their fulfillment, the user's cultural background, socio-economic issues, psychological profiles, emotional state of the user, and other factors whose number will likely expand with further research.

In general, we can classify these factors into human factors, context factors, and system factors.

Human factors are inherent to the users of the service. They include the physiological, emotional, cultural and socio-economic aspects of each user. In general, human factors are hard to capture, and their impact on QoE is equally hard to understand (though in some cases, human factors such as language have a better-understood impact on some aspects of QoE, such as listening quality for voice streams).

Context factors are related to the situational aspects of how the user actually uses the service. They can further be classified into physical (e.g., location, mobility), temporal (e.g., frequency of use, time of day, duration of use), economic (e.g., service price), and technical (e.g., type of device used, screen size) factors.

System factors are inherent to the service itself: they are those characteristics of the service that can have an impact on the quality experienced by its users. In the case of WebRTC, examples include network performance, the codec used, the video resolution, etc.

System factors contribute the largest component to the users' QoE, that is, the perceived quality of a WebRTC service. They are amenable to monitoring, and they are the focus of callstats.io's interests. That said, the other factor classes should not be ignored, as they can sometimes have a large impact on QoE.
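
To make the system-factor category concrete, here is a minimal sketch of how some of these parameters can be sampled in the browser using the standard RTCPeerConnection.getStats() API. The peer connection `pc` is assumed to already be established, and field availability varies across browsers:

```typescript
// Minimal sketch: sample a few system-factor metrics from a live
// RTCPeerConnection via the standard getStats() API. Assumes `pc`
// is an established connection; field support varies by browser.
async function sampleSystemFactors(pc: RTCPeerConnection): Promise<void> {
  const report = await pc.getStats();
  report.forEach((stats) => {
    if (stats.type === "inbound-rtp") {
      // Received media: loss and jitter as seen by this endpoint.
      console.log(stats.kind, {
        packetsLost: stats.packetsLost,
        jitter: stats.jitter, // in seconds
      });
    } else if (stats.type === "candidate-pair" && stats.nominated) {
      // Network path: round-trip time on the active candidate pair.
      console.log("RTT (s):", stats.currentRoundTripTime);
    }
  });
}
```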

Modeling QoE

As discussed above, obtaining ground truths for QoE is a very expensive and time consuming endeavor. Therefore, a plethora of mechanisms for modeling QoE have emerged from the research community, so as to be able to obtain an approximation of QoE without incurring the costs of subjective assessment.

Objective methods for assessing QoE aim to estimate user ratings using algorithmic approaches, which often rely on comparing the original media to the degraded media (full-reference methods), or some characteristics thereof (reduced-reference methods). These comparisons most often consider models of the human visual or auditory systems, as well as well-known aspects of perception such as the Weber-Fechner law. A third type of objective method, referred to as no-reference, requires no access to the original media, and can be either signal-based or parametric.

For monitoring applications such as callstats.io, which aim to observe the quality of a service in real time, no-reference parametric models are the ones we usually seek, because they allow us to estimate the perceived quality from measurable parameters without needing access to the original media. These parameters can be at the network and application levels, and can include other contextual parameters that may be relevant (device type, location, etc.).
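
As an illustration of what a no-reference parametric model looks like, here is a toy sketch in the spirit of the ITU-T E-model (G.107), mapping one-way delay and packet loss to a MOS estimate. The codec impairment constants (Ie, Bpl) are placeholders rather than standardized values (none exist for Opus, as discussed below), and this is emphatically not the actual callstats.io model:

```typescript
// Toy no-reference parametric estimator in the spirit of the ITU-T
// G.107 E-model: network parameters in, MOS estimate out.
// Ie and Bpl are placeholder codec constants, NOT standardized values.
function estimateMos(oneWayDelayMs: number, packetLossPct: number): number {
  const Ie = 0;   // equipment (codec) impairment factor -- placeholder
  const Bpl = 10; // packet-loss robustness factor -- placeholder

  // Delay impairment Id (simplified G.107 approximation).
  const d = oneWayDelayMs;
  const Id = 0.024 * d + (d > 177.3 ? 0.11 * (d - 177.3) : 0);

  // Effective equipment impairment under random packet loss.
  const IeEff = Ie + ((95 - Ie) * packetLossPct) / (packetLossPct + Bpl);

  // Transmission rating R, then the standard R-to-MOS mapping.
  const R = 93.2 - Id - IeEff;
  if (R < 0) return 1;
  if (R > 100) return 4.5;
  return 1 + 0.035 * R + R * (R - 60) * (100 - R) * 7e-6;
}

console.log(estimateMos(50, 0).toFixed(2));  // short delay, no loss: ~4.4
console.log(estimateMos(300, 5).toFixed(2)); // long delay, 5% loss: ~2.1
```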

The problem with WebRTC QoE

Put briefly, the main problem with WebRTC QoE is that it is a genuinely hard problem. While the quality of real-time media streaming has been the subject of much research over the past 25 years or so, WebRTC brings a slew of new complexities to the table, which have, so far, not been completely solved by the QoE research community.

Among the many things that make WebRTC a complex QoE topic are the following:

  1. It is a fast-moving target. WebRTC uses the latest codec technology (Opus for audio; VP8, and soon VP9, for video), and these codecs lack, for the time being, suitable parametric quality models. There are workarounds for this dearth of QoE models, and we have been working on those, but much work remains to be done in this area.
  2. It is often a multi-party application. WebRTC calls often have tens of users (in diverse regions, with heterogeneous connection quality), and while understanding the audiovisual quality of individual user pairs is not a particularly complex problem, understanding how each user's quality affects the call's dynamics and the quality perceived by all other users is a hard problem for which there is no simple answer yet.
  3. It is used in a multitude of different application scenarios. From phone-like applications (such as contact centers), to online lectures, remote meetings, and live streaming, WebRTC services can be used for pretty much anything. This makes estimating QoE complicated, because those scenarios can imply very different experiences for the participants, even under otherwise comparable conditions (e.g., network performance, browsers used, etc.).
  4. It is inherently multi-modal: voice, video, screen sharing, music, etc. can all be present at different times throughout the lifetime of a call, and their relative importance can also vary with time. QoE models for WebRTC should therefore consider these aspects.
  5. It lives in the browser. While this makes WebRTC ideal for end-users (who can simply connect to a session from just about anywhere, on pretty much any type of device), it severely complicates performing accurate measurements, because the entire application lives inside the browser, and is isolated from the network and the host.
  6. There are many more quality-influencing parameters to consider than in most other media applications. Developing QoE models typically requires effort that increases roughly in geometric proportion to the number of quality-influencing factors considered. The sheer number of factors that need to be considered for WebRTC requires new approaches to QoE modeling, as the traditional approaches quickly become intractable.

That being said, it is certainly possible to improve the current state of the art, and at callstats.io we are working towards that goal, and making significant progress.

Quality at callstats.io

Our Objective Quality (OQ) metric provides a single number representing the overall quality of a completed call. Index values such as OQ are nice, since they provide a simple intuition of what the call's quality was like for users.

Recently, we have been receiving requests to provide, for example, Mean Opinion Score (MOS) estimates and E-model ratings. These are also index values that afford an intuitive grasp of call quality.

While index values are comfortingly simple to understand, they do not paint the full picture. A call's quality can typically vary significantly throughout the call, and there are a number of well-established results in psychology (e.g., Kahneman's peak-end rule, and other recency-related effects) that can make a single value less useful. Moreover, most (if not all) QoE models that provide MOS estimates for audiovisual media are meant to be used on very short time scales (think 10 to 30 seconds), and do not work well on their own for longer periods.
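
As a sketch of why a single average can mislead over a call's lifetime, the snippet below contrasts a plain mean of short-window scores with a peak-end style summary that emphasizes the worst moment and the end of the call. The equal weighting of peak and end is an illustrative choice, not an established model:

```typescript
// Contrast a plain average of short-window quality scores with a
// peak-end style summary: the worst moment and the final moment
// dominate how a call is remembered.
function meanScore(windows: number[]): number {
  return windows.reduce((s, w) => s + w, 0) / windows.length;
}

function peakEndScore(windows: number[]): number {
  const worst = Math.min(...windows);      // the salient "peak" moment
  const end = windows[windows.length - 1]; // how the call ended
  return (worst + end) / 2;                // illustrative weighting
}

// Invented per-window scores: a good call with a bad dip in the middle.
const call = [4.3, 4.2, 4.4, 1.8, 2.1, 4.0, 4.3, 4.2];
console.log("mean:", meanScore(call).toFixed(2));        // 3.66
console.log("peak-end:", peakEndScore(call).toFixed(2)); // 3.00
```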

In the case of multi-user conferences, simply averaging quality scores drops valuable information. For example, a call could show an acceptable average MOS if users were polled afterwards, while still having many users who were completely dissatisfied with the call quality. Conversely, a single user with very bad quality can drag down a MOS estimate, even though all other users may be perfectly content with the quality they experienced.
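
A quick numeric sketch of this effect, with invented per-user scores: both hypothetical calls below have the same mean, but very different fractions of users rating the call "poor or worse" (score below 3, in the spirit of the %PoW measure used in QoE research):

```typescript
// Two hypothetical conferences with the same mean score but very
// different user experiences: the distribution tells the real story.
const callA = [3.5, 3.6, 3.4, 3.5, 3.5]; // everyone mediocre but OK
const callB = [4.8, 4.7, 4.9, 1.2, 1.9]; // most happy, two miserable

const mean = (xs: number[]) => xs.reduce((s, x) => s + x, 0) / xs.length;
// Fraction of users rating "poor or worse" (score below 3).
const poorOrWorse = (xs: number[]) =>
  xs.filter((x) => x < 3).length / xs.length;

const calls: Array<[string, number[]]> = [["A", callA], ["B", callB]];
for (const [name, scores] of calls) {
  console.log(`call ${name}: mean = ${mean(scores).toFixed(2)}, ` +
              `%PoW = ${(100 * poorOrWorse(scores)).toFixed(0)}%`);
}
```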

Because of this, over the past few months we have been working on an enhancement to our OQ aggregation methodology. The main objective is, as always, to bring our customers a better understanding of call- and service-level quality. These enhancements introduce explicit score distributions and more detailed summary descriptions, and they are rooted in the same approach advocated by the QoE research community: moving beyond the MOS.

These OQ enhancements will provide our customers with better quality estimates at both the call and service levels, and empower them to make better decisions when it comes to improving quality. We will soon be updating our dashboard to reflect these changes.

There are many more improvements coming to our OQ metric, including better temporal pooling and call-level aggregation. These will provide a more detailed view of why the quality of a call was at any given level.

Stay tuned for the next post in this series, where we will describe the new callstats.io OQ aggregation, and how it will help our customers better understand and improve the performance of their WebRTC services.