At callstats.io, we combine multiple internal and open-source services into one cohesive product. In a lot of ways, this is great: it enables us to implement features and iterate much faster than if we were building everything from scratch. However, microservices come with their own set of complications.
Challenges We Faced with a Microservice Architecture
A microservice architecture comes with its own array of challenges. These are the ones we have had to face.
Organizing the Codebase
When you’re dealing with a monolithic service, it’s relatively easy to control the codebase: one repo contains all of the code, which developers can access and commit to easily. Microservices, however, change things. The developers have a choice to make: continue with one repo and simplify management, or split into multiple repos to enforce isolation. We chose to keep our services in separate repos, one per service, plus a common library of stable components shared by the majority of our internal services. Though this keeps each service neatly separated, it can be annoying to work on services collaboratively. Releasing a new feature may require modifying multiple services, so with our current setup a developer has to commit to several repositories in lockstep to properly test and implement it.
Communicating Effectively Between Services
Communicating with a monolithic service is generally done over HTTP. Added complexity comes into play when multiple services all need to communicate with each other effectively. We use several different methods to communicate between services: gRPC, REST, and asynchronous messaging through Apache Kafka and SNS/SQS.
We have had great success with gRPC, even though it adds to our development time: Protobuf files need to be kept in sync across multiple services whenever a small change is made, which is a small but recurring burden.
Alternatively, we have used REST-like APIs for a long time. Apart from small human errors such as typos and miscommunications, we have had virtually no trouble.
Apache Kafka and SNS/SQS are particularly handy for data processing and queued background tasks. While this has been fairly effective, we have had some trouble monitoring and alerting on these pipelines effectively.
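At its core, this asynchronous pattern is a producer enqueueing work that a consumer processes later. Below is a rough, broker-agnostic sketch of that flow; it uses Python’s in-process queue.Queue as a stand-in for a real Kafka topic or SQS queue, and the event names and fields are illustrative, not our actual schema.

```python
import json
import queue

# Stand-in for a Kafka topic or SQS queue (a real broker is networked).
task_queue = queue.Queue()

def enqueue_event(event_type, payload):
    """Producer side: serialize the event, hand it off, and move on."""
    task_queue.put(json.dumps({"type": event_type, "payload": payload}))

def process_pending(handlers):
    """Consumer side: drain the queue, dispatching each event to its handler."""
    processed = 0
    while not task_queue.empty():
        event = json.loads(task_queue.get())
        handler = handlers.get(event["type"])
        if handler:
            handler(event["payload"])
        task_queue.task_done()
        processed += 1
    return processed

# Hypothetical usage: another service reacts to "conference.ended" events.
results = []
enqueue_event("conference.ended", {"conference_id": "abc123"})
enqueue_event("conference.ended", {"conference_id": "def456"})
handled = process_pending(
    {"conference.ended": lambda p: results.append(p["conference_id"])}
)
```

The producer never waits on the consumer, which is exactly what makes monitoring harder: failures surface in the consumer, far from the request that caused them.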
Using Docker and Kubernetes
Docker is very easy to use, to the point where the majority of our team uses it, including individuals in other departments such as marketing.
Kubernetes, however, has a fairly high learning curve. It can be difficult to debug unless the developer has a significant amount of operational experience. In some instances, operations has to get involved in order to fix the issue. Since our team is small, this has caused some delays in bringing our services back online and has definitely caused more context switches to operations than should be necessary. This type of issue can have a big effect on the productivity of such a tight core team.
Debugging and Monitoring
Debugging can be incredibly tedious, so a priority for us has always been to make it as simple as possible. We need our team to be able to iterate quickly and effectively. When running a monolithic service, it can be fairly easy to debug everything at once: you simply look at a single request in the logs to identify the problem. Microservices make this a little more complicated. We have multiple services, each with its own high-level metrics that we need to monitor and interpret, including response latency, request count, memory usage, and CPU usage. We also rely on centralized logs to figure out the details of an error.
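Per-service metrics like request count and response latency are typically recorded at the request boundary. Here is a minimal sketch of that idea; the in-memory registry and the endpoint name are placeholders (a real setup would export to a metrics backend rather than hold values in a dict).

```python
import time
from collections import defaultdict

# Illustrative in-memory registry; production systems export these values
# to a metrics backend instead of keeping them in process memory.
metrics = {"request_count": defaultdict(int), "latency_ms": defaultdict(list)}

def instrumented(endpoint):
    """Decorator recording request count and response latency per endpoint."""
    def wrap(fn):
        def inner(*args, **kwargs):
            start = time.monotonic()
            try:
                return fn(*args, **kwargs)
            finally:
                metrics["request_count"][endpoint] += 1
                metrics["latency_ms"][endpoint].append(
                    (time.monotonic() - start) * 1000
                )
        return inner
    return wrap

# Hypothetical handler, instrumented at the boundary.
@instrumented("/v1/stats")
def get_stats():
    return {"ok": True}

get_stats()
get_stats()
```

Recording in a `finally` block ensures failed requests are counted too, which is usually what you want when the metrics exist to find problems.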
Security
Especially given recent high-profile breaches and the introduction of GDPR, security has never been more important. Our services need to tie requests to a user and ensure the communication is done securely. With a traditional monolith this was easy: every request was bound to a specific user, and only the path from the user’s browser to the database had to be protected. In contrast, microservices need to communicate information to each other securely and in a way both sides can interpret, which means explicitly passing authentication data between them. As the number of communication channels has grown, so have the security concerns. We opted to use JWTs in most places to standardize.
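To make the JWT approach concrete, here is a stripped-down HS256 sign-and-verify cycle using only the standard library. This is a sketch of the mechanism, not production code: real services should use a vetted library, and the secret and claims below are placeholders.

```python
import base64
import hashlib
import hmac
import json

def _b64url(data: bytes) -> str:
    # JWTs use unpadded base64url encoding (RFC 7515).
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def sign_jwt(claims: dict, secret: bytes) -> str:
    """Produce a compact HS256 JWT: header.payload.signature."""
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url(json.dumps(claims).encode())
    signing_input = f"{header}.{payload}".encode()
    sig = _b64url(hmac.new(secret, signing_input, hashlib.sha256).digest())
    return f"{header}.{payload}.{sig}"

def verify_jwt(token: str, secret: bytes) -> dict:
    """Check the signature in constant time, then return the claims."""
    header, payload, sig = token.split(".")
    signing_input = f"{header}.{payload}".encode()
    expected = _b64url(hmac.new(secret, signing_input, hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):
        raise ValueError("bad signature")
    padded = payload + "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))

# Placeholder secret and claims for illustration only.
secret = b"placeholder-shared-secret"
token = sign_jwt({"sub": "user-42", "scope": "stats:read"}, secret)
claims = verify_jwt(token, secret)
```

Because the token carries the user identity with it, any service that shares the secret can authenticate a request without calling back to the service that issued it.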
Our Tips for Getting the Most out of Your Microservices
In order to get the most out of your microservices, go in with a plan. Try to think ahead and address as many of these issues as possible ahead of time, so you can avoid tedious and difficult work addressing them later.
Communicate Asynchronously When Possible
A single service with a single responsibility and no one else to answer to is relatively simple to build and operate - at least until it has to update other services. Once services need to communicate, synchronous communication over a network significantly increases complexity. We found that if you take the synchronous route, you need a strategy for repairing inconsistencies between services when an error eventually occurs. Instead, we aim for asynchronous communication between services whenever they must react to external changes, which pushes our system toward eventual consistency. That brings its own troubles: we cannot rely on data existing in another service immediately after it’s written to a queue, and monitoring can be more difficult.
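That read-after-write caveat can be shown in a few lines: until the downstream consumer has drained the queue, a lookup in its local store sees nothing. A toy illustration, with an in-process queue standing in for the broker and made-up service names:

```python
import queue

events = queue.Queue()   # stand-in for Kafka/SQS between two services
billing_store = {}       # the downstream ("billing") service's local state

def end_conference(conf_id, minutes):
    """Upstream service: publish the fact as an event and return at once."""
    events.put({"conference_id": conf_id, "minutes": minutes})

def billing_consumer_tick():
    """Downstream service: process whatever has arrived so far."""
    while not events.empty():
        ev = events.get()
        billing_store[ev["conference_id"]] = ev["minutes"]

end_conference("abc", 42)
before = "abc" in billing_store   # not yet visible downstream
billing_consumer_tick()
after = "abc" in billing_store    # visible once the consumer catches up
```

Any code that reads the downstream store has to tolerate the `before` state, which is the practical cost of eventual consistency.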
Simplify and Standardize
Choose a set of tools and stick with them. It’s important to deeply understand each service within your microservice mesh, so you can ensure you’re using it as effectively as possible. Additionally, try to limit the number of languages you use unless you have a team used to operating a polyglot system. Using a cool new technology sounds great, but if only one member of the team understands it, it is going to be difficult to maintain.
Use Proper Monitoring and Update It
It’s hard to count the number of times we have been saved by having proper monitoring that led us to the root cause of a problem. Alerts can also be very useful, but they must be properly configured so the engineers are not drowning in overly sensitive alerts. For our team, alerts are almost exclusively reserved for high priority errors. When an alert is raised, it is immediately addressed, diagnosed, and fixed. On occasion, the fix is simply tuning the alert to be less sensitive so the system can self-heal.
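Tuning an alert so the system can self-heal usually means alerting on a sustained condition rather than a single spike. A sketch of that idea; the threshold and window values are arbitrary:

```python
from collections import deque

class ErrorRateAlert:
    """Fire only when the error rate stays above a threshold for a full
    window of samples, so a transient spike that self-heals pages no one."""

    def __init__(self, threshold=0.05, window=5):
        self.threshold = threshold
        self.samples = deque(maxlen=window)

    def record(self, error_rate):
        self.samples.append(error_rate)
        # Alert only on a full window of consistently bad samples.
        return (len(self.samples) == self.samples.maxlen
                and min(self.samples) > self.threshold)

alert = ErrorRateAlert(threshold=0.05, window=3)
spike = [alert.record(r) for r in [0.01, 0.40, 0.01]]      # brief blip
sustained = [alert.record(r) for r in [0.20, 0.25, 0.30]]  # real problem
```

The spike never fires because a healthy sample inside the window suppresses the alert; the sustained failure fires on its third consecutive bad sample.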
Understand How Your Services Work
The majority of our backend team has a thorough understanding of how the services we use work. As the number of services and their uses grows, it becomes practically impossible to hand a dedicated operations team an informative set of guidelines for every scenario. Make sure several developers on the team, if not all, understand how to debug, fix, and operate any single service.
Have an Operations Strategy
If you’re looking to transition from a monolithic service to microservices, make sure you have an operations strategy. You need to have a plan for how to operate, monitor, and fix your individual services and the global state when errors eventually happen - which they will.
Our pains with microservices have been relatively small, and have mostly cost us in developer time. We knew that would be a risk when we started, but were surprised at how much work it took to get to the point where we could comfortably operate our system.