It’s been 5 years since we discussed “Who is Responsible for WebRTC Monitoring”, where we looked at the typical personas in an organization that are responsible for WebRTC monitoring. Now that our API has evolved to accommodate today’s RTC requirements, we thought it would be nice to revisit those profiles in this new blog post and tie them to specific callstats API functions and features.
Revisiting the personas
The Developer is the one who builds new features into the RTC product and needs to observe how those features impact usability or service delivery. For this purpose, the developer should look at events and conference data and, in some cases, delve into granular metrics and build insights from them.
Part of the Developer role is to build a robust application and, along with DevOps, monitor the software deployment, so the two roles should work in close collaboration. One thing we’ve observed to be important for both roles is for the app or product to send as much contextual information as possible, such as the application logs. The developer and operations teams can then delve deeper into an issue without having to look at a separate logging system, and they save time by not having to query and correlate pertinent information.
Instead of application logs, the product can alternatively send contextual events, such as Device List, User, and Interactive Connectivity Establishment (ICE) events, which are shown on the conference timeline graph. For example, when a user loses connectivity and the other participants do not see that person until they rejoin, this is easier to spot visually on a conference timeline than by correlating events within logs.
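To illustrate why a timeline makes this case easy to spot, here is a minimal Python sketch that orders contextual events by time and reports the windows during which a participant was disconnected. The `Event` record and the event names `iceConnected` and `iceDisconnected` are assumptions made for this example, not the callstats event schema.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Event:
    timestamp: float  # seconds since the conference started
    user_id: str
    kind: str         # hypothetical names: "iceConnected", "iceDisconnected"


def disconnected_windows(events: List[Event]) -> Dict[str, List[Tuple[float, float]]]:
    """Return, per user, the (start, end) windows spent disconnected."""
    open_since: Dict[str, float] = {}
    windows: Dict[str, List[Tuple[float, float]]] = {}
    for ev in sorted(events, key=lambda e: e.timestamp):
        if ev.kind == "iceDisconnected":
            # Remember when the outage started; ignore repeated disconnects.
            open_since.setdefault(ev.user_id, ev.timestamp)
        elif ev.kind == "iceConnected" and ev.user_id in open_since:
            start = open_since.pop(ev.user_id)
            windows.setdefault(ev.user_id, []).append((start, ev.timestamp))
    return windows


events = [
    Event(0.0, "alice", "iceConnected"),
    Event(0.5, "bob", "iceConnected"),
    Event(42.0, "bob", "iceDisconnected"),
    Event(57.5, "bob", "iceConnected"),
]
gaps = disconnected_windows(events)
```

In this made-up conference, `gaps` contains a single window for `bob` and nothing for `alice`, which is the same information a timeline view shows at a glance.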
To summarise, sending application logs and incorporating the events API helps centralise application-specific data, like error messages, user feedback, dominant speakers, etc., in one place, which in turn helps correlate device or network errors with your custom events. This way your investigation and troubleshooting activities become faster and more efficient.
The Product Manager has a broader vision of the product and the system around it. They also handle where the product is hosted, and are hence responsible for product scalability and, along with DevOps, for infrastructure stability.
In a nutshell, the Product Manager needs to mitigate the impact that the user environment can have on the service.
Product managers need all the metadata they can get while keeping a good balance with the privacy of the end-user. The callstats product is GDPR compliant and reduces risk by encrypting all data as soon as it is received. We treat all your metadata, such as usernames and conferenceIDs, as well as device lists and the user’s IP address/location data, as Personally Identifiable Information (PII), so all these metadata fields are encrypted as soon as the callstats infrastructure receives them.
The typical metadata that is important for isolating and correlating your growth, poor connectivity, dropped calls, churn, or other artefacts are:
- Geolocation or Location-specific data such as IP address -- tracks how and where your product is being used and helps forecast scaling.
- ServerName, pbxID -- tells you which servers or PBXes were involved, giving you an idea of where the servers are relative to the end user.
- SiteID, tenantID -- gives you a sense of where these users are connecting from; for example, “Building 8, Helsinki” would narrow down the users in Helsinki to a known physical location.
- Versions, OS -- gives information about the deployment and whether certain versions or installations (Android vs iOS, mobile vs desktop) are correlated with the degradation.
The Product Manager will be looking to understand whether the user experience is related to a particular category of the system, and to benchmark the conditions against everything else, such as a previous version, a different geography, a different tenant, or a different server. The benchmarking allows the PM to figure out quickly where to invest more engineering and ops resources.
The main way to get insights, apart from support tickets and benchmarking, is to ask the end user directly for their feedback. That feedback can be passed on to callstats via the User Feedback API.
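The benchmarking idea can be sketched in a few lines of Python: group sessions by one metadata field and compare the share of dropped calls across the groups. The session records below are invented for illustration, and the field names `appVersion` and `dropped` are assumptions rather than callstats field names.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def drop_rate_by(sessions: Iterable[Mapping[str, object]], key: str) -> Dict[str, float]:
    """Fraction of dropped sessions for each value of one metadata field."""
    totals: Dict[str, int] = defaultdict(int)
    dropped: Dict[str, int] = defaultdict(int)
    for s in sessions:
        group = str(s.get(key, "unknown"))
        totals[group] += 1
        if s.get("dropped"):
            dropped[group] += 1
    return {g: dropped[g] / totals[g] for g in totals}


# Invented sample data, only to show the shape of the comparison.
sessions = [
    {"appVersion": "2.3", "siteID": "Helsinki", "dropped": False},
    {"appVersion": "2.3", "siteID": "Helsinki", "dropped": False},
    {"appVersion": "2.4", "siteID": "Helsinki", "dropped": True},
    {"appVersion": "2.4", "siteID": "Berlin", "dropped": False},
]
by_version = drop_rate_by(sessions, "appVersion")
by_site = drop_rate_by(sessions, "siteID")
```

Running the same function over different keys (version, site, tenant, server) gives a quick view of which category stands out from the rest.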
The Support Team is on the front line, triaging tickets and handling customer feedback. Without the right tools, any troubleshooting attempt becomes a guessing game. Their main goal is to deliver the best customer experience possible even during periods when the product is not performing at its best.
The main thing the support team wants to isolate is whether a particular issue lies with the product or is endemic to the end user’s environment. Poor connectivity or an underperforming network is the most common external root cause for support escalations. In this case, special attention needs to be paid to the ICE Events, which help Support and Operations narrow down the root cause to an underlying network disruption.
To back up the support team’s thesis that the network was already underperforming before the call, the product needs to incorporate the Pre-call Tests (PCT) from the Smart Connectivity Test (SCT) Suite. SCTs check network conditions such as throughput, loss, round-trip time (RTT or latency), and jitter whenever you call the API. We recommend that individual users who work from home run it every 5-30 minutes, while for users working from an office or similar location (where you would have several users connecting to your service), running it every hour or two would provide sufficient information.
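As a rough sketch of how an application might schedule these tests, the Python below picks the delay before the next pre-call test from the recommended ranges. The function name, the `at_shared_site` flag, and the choice to randomise within each range are assumptions made for this example; how the test itself is triggered depends on the SDK and is not shown.

```python
import random

# Intervals in seconds, taken from the recommendation above.
HOME_INTERVAL = (5 * 60, 30 * 60)         # every 5-30 minutes
OFFICE_INTERVAL = (60 * 60, 2 * 60 * 60)  # every hour or two


def next_test_delay(at_shared_site: bool, rng: random.Random) -> float:
    """Pick the delay (in seconds) before the next pre-call test for one user."""
    low, high = OFFICE_INTERVAL if at_shared_site else HOME_INTERVAL
    # Randomising within the range is our own choice: it spreads tests out so
    # that many users at one site do not probe the network at the same moment.
    return rng.uniform(low, high)


rng = random.Random(7)
home_delay = next_test_delay(at_shared_site=False, rng=rng)
office_delay = next_test_delay(at_shared_site=True, rng=rng)
```

A fixed interval would work just as well for a single home user; the jitter mainly matters at shared sites where several clients share one uplink.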
The Support and Operations teams will be looking for real-time information about ongoing calls and for comprehensive dashboards that present meaningful data, helping them act proactively instead of reactively. Proactive Notifications over webhook, which alert them to problems before they become an outage, are the way to go.
Overall, all these personas need to exist within your organization. If you are running over a CPaaS, the need for Ops reduces, but the emphasis on the product and support roles increases. Meanwhile, if you are running your own infrastructure, the operations team needs more robust tooling to make sure that they stay ahead of the issues.
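On the receiving side, a webhook consumer can be as small as a function that decides which notifications deserve attention. The sketch below assumes a JSON body with `severity`, `metric`, and `scope` fields; that payload shape and the severity labels are hypothetical, so check the actual notification format before relying on it.

```python
import json
from typing import Optional

SEVERITIES_TO_PAGE = {"warning", "critical"}  # assumed severity labels


def handle_notification(body: bytes) -> Optional[str]:
    """Turn a webhook body into an alert line, or None if it can be ignored."""
    try:
        payload = json.loads(body)
    except (ValueError, UnicodeDecodeError):
        return None  # not valid JSON: ignore rather than crash the receiver
    if not isinstance(payload, dict):
        return None
    severity = str(payload.get("severity", "")).lower()
    if severity not in SEVERITIES_TO_PAGE:
        return None
    metric = payload.get("metric", "unknown metric")
    scope = payload.get("scope", "all traffic")
    return f"[{severity}] {metric} degraded for {scope}"


alert = handle_notification(
    b'{"severity": "critical", "metric": "dropped calls", "scope": "siteID=Helsinki"}'
)
ignored = handle_notification(b'{"severity": "info", "metric": "rtt"}')
```

The returned line could then be forwarded to whatever channel the operations team already watches, such as a chat room or a paging tool.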
This blog post was written by Filipe Leitão, callstats resident expert based in Germany.