What Does Emil Ivov Anticipate from the Future of RTC? [Champion Series]

By Allie Mellen on September 26, 2018

Our real-time communications champion spotlight highlights respected, expert members of the real-time communications industry. These individuals come from varied backgrounds and have experienced their own intriguing challenges and successes. Each has a unique story to tell.


Our next real-time communications champion is Emil Ivov, an influential member of the real-time communications space as founder and project lead of Jitsi and chief video architect at Atlassian.


Read on to learn more about his thoughts on the future of real-time communications.

Who is Emil Ivov?

Emil Ivov is the chief video architect at Atlassian. He is the definition of a real-time communications expert, with over fifteen years of experience in telecommunications across Jitsi, Atlassian, and others. He is the founder and project lead for Jitsi, an open-source project that lets you easily build and deploy video conferencing solutions. He is also an active contributor to the IETF in the real-time applications and infrastructure area. Apart from creating Jitsi, Emil has made notable contributions to the IETF, including SIP extensions for group chat, RTP extensions for audio-level mixing, and several extensions to improve ICE (trickle and latching).


Takeaways from our Conversation with Emil

  1. The one thing that is certain beyond any doubt is that we will be doing more and more real-time communications.
  2. Reducing friction in real-time communications is a difficult problem that is going to be a huge focus in the coming years.
  3. WebRTC is having a tremendous impact, but not in the way that many had expected. The impact comes primarily from its technology rather than from the browser.

Check out the full conversation below.


Emil Ivov’s Thoughts on Real-time Communication and WebRTC


We met with Emil Ivov to discuss how he sees real-time communications changing from an expert perspective. He gave a lot of interesting insights into where real-time communications has been and where it is headed.


Where do you think real-time communications is headed?

There is a generic answer and a specific answer to this question. As you go into the specifics, things become very uncertain. But the one thing that is certain beyond any doubt is that we will be doing more and more real-time communications.


Let’s take an example: commute time. The entire model is that I live in one place and work in another, and I have to move between the two. Because we all have to be in one place when we work, we make bigger buildings. Office spaces grow three-dimensionally, but the transportation network remains two-dimensional. There’s a clear incompatibility there that just becomes worse and worse over time. In order to solve this, you have to make the transportation network three-dimensional. You have to either create flying cars, or do what Elon Musk is suggesting and build a ten-story underground subway. Neither of those things will happen in the next thirty years, yet we still have the problem. The only solution to it is to stop moving for work and work remotely using audio-video communication. I’m particularly confident that this is going to be what happens because I’ve seen it, and we have been doing it for ten years. It works.


Now this is where we get into specifics. Obviously, there is a lot to be done to make this technology less annoying for people to use. We can speculate about exactly how, which specific pieces are going to evolve, and in what way. It’s interesting to speculate, but there is no doubt that it is going to grow very aggressively over the next several decades.


Clearly, there is an explosion of machine learning. I don’t think it will apply to video in the way that many people are expecting today. Today, people use it to put hats and mustaches on people. That has zero interest. I don’t think image manipulation is going to be *that* big. It doesn’t feel like it’s solving a particularly big problem. It’s a niche market, and I don’t think it will be game-changing.


Where I believe machine learning will have a bigger impact is getting video communication to be more easily woven into your day-to-day. For example, transcription of conversations. That in itself sounds boring, especially when half the words are wrong. But it’s a first step that leads to it being more accurate and having that data available as part of your daily routine. You’ll be able to check back on your conversations and get summaries, which I think would be a pretty big deal.


Everyone hates meetings. The only thing they hate more than meetings is missing out on meetings - feeling left out. Out of fear, people just keep attending stuff they don’t really care about. Imagine if you could change all that and have a semi-presence. First, you would get summaries of the meeting - what got discussed and what got decided. Then, you could get real-time notifications:

“Hey, people in this meeting are talking about this thing. Is it important for you to join now?”

That kind of thing actually saves you time and has a lot of potential. It’s still huge speculation, because we don’t know whether machine learning will evolve fast enough, and this is fairly complicated to build. You only have to look at the myriad of services that do summaries for the press to realize that. I don’t know if they can evolve fast enough, but I expect this to be where things get interesting.


There is still a huge problem with audio-video devices and the way we actually interact in a meeting. Audio more so than video. I had to spend years to find a headset that would work for me, as many are inconvenient and leave plenty to be desired. It’s a problem.


Another aspect is that of room installations. Heading into a meeting where you have a group of people talking in one place over a video/audio conference is a huge problem. It’s hard to build devices you can put in a room where everyone can speak and it feels like you are all in the same room. That’s an entirely different use case from individual participation. Feeling as if you’re part of the room is almost impossible today, even though every single conferencing system promises you exactly that.


Everyone knows you have to make an extra effort to be heard over a microphone. It’s so burdensome. If, on the other hand, you can use existing hardware, like the phone you carry in your pocket, to grab audio from all the places in the room, you can have more confidence that every word will be heard. If you know that your phone will capture the audio, or the phone of the person next to you will capture the audio, it’s much simpler. That whole process could have started automatically, because everyone knew they were attending the meeting, so their phones started capturing audio. Of course, if you have multiple audio sources next to each other, they will all capture the same things. You don’t want it all rendered in a conference, so you have to isolate the ones you are not hearing well or combine them together in a smart way. Enrico Marocco from Telecom Italia and I are calling this Crowd Conferencing. It’s another area that has a lot of potential.


Then you have things like whiteboards, for which there is still no good solution. When people meet in person, they usually need to be in front of something: drawing on a whiteboard or using objects that don’t render particularly well in a video stream. There are a bunch of solutions that make you sacrifice something. Either there is a whiteboard that is captured but sent in a read-only format, or you have to sacrifice the simplicity of a physical whiteboard and use an electronic version. This looks like a simple problem, but it is the one that I’m the least confident we will actually solve. It’s so important, though, that I expect people to keep working on it.


Reducing friction in real-time communications is a difficult problem. I don’t want to be thinking about the conference; I want to be in it and forget about that. I think it’s going to be a huge focus in the coming years.


Where does this all fit in with WebRTC?

When WebRTC came around, it landed in a space where a bunch of vendors were competing over who had the best media engine. There were a ton of open source ones, of which we were part, and some vendors were packaging them up in clients, et cetera. WebRTC lands in that, and all of a sudden this is no longer a thing where you can compete. Until then, the entire industry had been competing on quality, and now quality is no longer an issue.


You still have players out there, like Zoom for example, that keep claiming:

“Oh, we're not using WebRTC because we have some fantastic quality magic.”

I haven't seen any evidence of that, so I don't believe that's actually accurate. I think pretty much everyone today is at exactly the same spot in terms of techniques. Everyone gets to the same quality, everyone is on the same network.

WebRTC eliminated this fragmentation. It gave good quality video transport to the masses, and that's an amazing shift in itself, and it put it in a browser, which was also not bad. All of a sudden you had this possibility for people to integrate and more easily build targeted solutions. I don't think that this was as big of a game changer as the quality aspect, because if you look today, the most popular players in real-time communications haven't changed, or at least not in a way that was different from how the industry was evolving before that. You still have the big meetings companies and you still have Microsoft and Facebook, which were doing these things even before WebRTC landed. They are still the ones that have captured the biggest audiences.

So at the end of the day we realized that something is not clicking here. Weren't we supposed to start streaming video everywhere? Why is it that everyone keeps using Skype or Facebook Messenger or WhatsApp? Why aren't we talking through every website left and right? I think there are a few reasons for that.

  1. It still requires too much knowledge to route video services to your site.
  2. The pattern is actually discouraging to people. People don't want to care about the video conference. For example, they don't want to have to learn that this one button changed its place and they have to look for it in a different place. They're comfortable just falling back to Skype or WhatsApp because nothing else gives them that much value.
  3. Even though it's in a browser, computers themselves are not great devices for this application. Consider the whole component model: webcams, microphones, headphones. Some of these devices won’t work, and it ends up being too much. That's what we were seeing with Stride, for example, or with Jitsi. There is a lot that fails because of device management and device issues. That's not inherently a problem with WebRTC; it’s just never going to be significantly better with computers.

Where this is different is on mobile. On mobile, WebRTC isn't making an impact through the browser. It's making an impact because it's open source and people are embedding the engine left and right. We're still going back to eliminating the competition on quality and then falling back to dedicated apps that we like for one reason or another.

To summarize, I think WebRTC is having a tremendous impact, but not in the way that many had expected. The impact comes primarily from its technology rather than from the browser.


Lastly, we had to ask: What do you think of callstats.io?

Before we used callstats.io, in the very early days before it was even really ready for marketing, we tried building our own thing. We tried it twice, actually, and it just always ended up being this huge time sink where we didn’t have a lot of the features we needed.


Since we switched to callstats.io, we’ve used it in all of our deployments and we aren’t even asking ourselves the question of “is there anything else we can use?” because there isn’t. callstats.io is in a unique position where it solves a very clear problem that is on everyone’s minds. There’s no one arguing about it and no one wondering whether this should be solved. callstats.io is the only one solving it. So one couldn’t wish for a better positioning of their product, and callstats.io has that advantage. On top of that, you have some good professionals who really do understand the nature of real-time communication.


There’s simply no alternative to callstats.io.


Please note, this interview was condensed and edited for brevity and clarity.


If you want to learn more about Emil, check out his Twitter @emilivov.


What would you like to hear from our real-time communications champions? Leave us your thoughts in the comments below.

 


Tags: RTC Champions, WebRTC, Real-time Communications