At USENIX NSDI 2018 Keith Winstein and a team from Stanford presented their new research work, Salsify. According to the paper, Salsify reduces video delay by 10.5x from 4,730 ms (in
libwebrtc) to 449 ms (with Salsify). As with any new approach, developers and companies are curious about how it works and how useful it could be for them. The short answer? It looks promising. The long answer? Read on to find out.
What is Salsify, and How Does it Work?
The premise behind Salsify is that improving video codecs has reached a relative standstill, so the next natural step is to optimize the architecture of video systems. Specifically, modern systems generally separate the codec from the transport protocol, and the paper notes that each piece runs its own rate control. This split is partly a consequence of how the pieces are developed: codec rate control, motion compensation, and related algorithms are defined within the appropriate codec consortium, while transport protocols are developed by the IETF.
We at callstats.io agree with the Salsify team that the network and codec should not be separated, and that the algorithms affecting video quality should work together. In our opinion, network capability (estimated loss, delay, throughput) and device capability (CPU, battery, user preferences, network selection) should all be taken into account by the codec when producing a video frame.
The main reason that many cross-layer approaches, including Salsify, take this path is that the codec can then react more quickly to shifts in network conditions and device performance. This is typically implemented as two separate parts integrated into one algorithm: the transport protocol performs packet-by-packet and frame-by-frame congestion control, while the video codec performs frame-by-frame rate control (typically via motion compensation).
Most current systems control the frame rate or bit rate. Salsify instead optimizes the compressed length and transmission time of every individual frame based on available network capacity, and issues a frame only when needed. This approach builds on a purely functional video codec, which lets Salsify examine different encodings of each frame at varying quality levels.
The video codec is tightly integrated with the rest of the application and is implemented in a purely functional style, which makes it possible to explore alternative encodings of each frame and adjust to network capacity. Video frames are not sent at a fixed rate; they are sent when the network can handle them.
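To make the idea concrete, here is a minimal sketch of a Salsify-style per-frame decision loop. All names, the toy size model, and the quality levels are our own invention for illustration, not the paper's actual code: a purely functional encoder returns a new state rather than mutating one, so several candidate encodings of the same frame can be tried, and the transport's byte budget decides which candidate (if any) is sent.

```python
# Hypothetical sketch of a Salsify-style per-frame control loop.
# Names, the toy size model, and quality levels are invented for illustration.
from dataclasses import dataclass

@dataclass(frozen=True)
class Encoded:
    size_bytes: int  # compressed length of this candidate
    quality: int     # higher = better picture, bigger frame
    state: tuple     # immutable encoder state after this frame

def encode(state, frame, quality):
    """Stand-in for a functional encoder: higher quality yields a larger frame.
    Returns a candidate without modifying `state`, so many candidates can
    be produced from the same starting state."""
    size = len(frame) // max(1, 50 - quality)  # toy size model
    return Encoded(size_bytes=size, quality=quality, state=(state, quality))

def pick_frame(state, frame, budget_bytes, qualities=(10, 25, 40)):
    """Encode the frame at several quality levels, then send the
    highest-quality candidate that fits the transport's current per-frame
    byte budget. Return None (skip the frame) if nothing fits."""
    candidates = [encode(state, frame, q) for q in qualities]
    fitting = [c for c in candidates if c.size_bytes <= budget_bytes]
    return max(fitting, key=lambda c: c.quality) if fitting else None
```

The key design point is the last line: when even the lowest-quality candidate exceeds the budget, the loop simply skips the frame rather than queueing it, which is how this style of system avoids building up delay.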
Salsify is an interesting model because there are many parallels to how TCP works: TCP sends data when the network can handle it. In a paper co-authored by Henning Schulzrinne on the delay-friendliness of TCP for real-time traffic, the authors show that an existing video codec (H.264) and TCP can work together without modification if the network latency is around 100 ms, i.e., the queuing delay stays relatively small and TCP's mechanics do not significantly disrupt the video. Salsify makes the interesting assertion that the codec can produce frames at different quality levels with very little added delay (on the order of milliseconds). This is an exciting development with the potential to improve both video quality and video delay, and we are curious to see where the work is headed.
What Benefits Does Salsify Bring?
Salsify was tested over an array of real-world and synthetic network traces in a measurement testbed created by the authors. In their evaluation, the single-core version of Salsify achieved a video delay of 449 ms over an emulated AT&T LTE network path. They found WebRTC's video delay to be 10.5x higher than Salsify's; FaceTime's was 2.3x higher, Hangouts' was 4.2x higher, and Skype's was 1.2x higher.
Comparing Salsify on a Testbed
The paper measured the end-to-end video delay for WebRTC across five different traces: Verizon LTE, AT&T LTE, T-Mobile UMTS, Intermittent Link, and Emulated Wi-Fi Link. Check out a video of their side-by-side comparison of Salsify and WebRTC (Chrome 65) on YouTube. Their results are shown below.
| Trace | Salsify-1c (ms) | WebRTC (ms) |
| --- | --- | --- |
| Emulated Wi-Fi Link | 593.9 | 721.0 |
Salsify and WebRTC 95th percentile video delay values from Salsify: Low-Latency Network Video through Tighter Integration between a Video Codec and a Transport Protocol.
While we appreciate the breadth of the tests performed, we are somewhat skeptical of these WebRTC video delay values. Given that we monitor and manage the performance of real-time media communication at scale, specifically for WebRTC, we wanted to compare these values to real-world data that we have collected on WebRTC video delay.
To give a real-world gauge, we referenced our WebRTC Metrics Report, which gives usage metrics from real-world WebRTC deployments. In it, we establish that 86% of all real-time media sessions have a round-trip time of less than 240 ms. This is a drastic departure from the results the Stanford researchers gathered. Because of this, we speculate that the WebRTC video delay values presented for comparison by the Stanford researchers are significantly higher than they would be in a real-world scenario. We plan to reach out to the Salsify team separately to get a better understanding of how their testbed is set up, and how they came to reach these values.
Salsify is an interesting development in real-time Internet video system architecture. By combining the video codec and the transport protocol, it can adapt quickly to changing network conditions and avoid queueing delays and packet drops. It has potential implications for improving video delay and quality, and we are curious to see who builds on it and how the industry integrates it into products in the future.
Have you experimented with Salsify? What do you think about the potential of this new architecture? Leave us a note in the comments below.
If you are interested in improving your WebRTC service, register for an account today.