At callstats.io, we use Prometheus for a number of developer-related tasks. It’s a hugely important tool for us, used by almost every member of our engineering team.
I am our resident Prometheus expert. My official role is infrastructure software engineer, and I have over ten years of experience in system administration. I spend most of my time on systems and monitoring-related tasks. I have two and a half years of experience with Prometheus. I started using it before it was really stable, when it was still at the proof-of-concept stage. Over the years, it has proven to be rock solid.
What is Prometheus?
Prometheus is a monitoring system and time series database with a powerful query language and aggregation and alerting capabilities. It is open source and community-driven, and it can be integrated with many different platforms, including Docker and Kubernetes.
Pros of Prometheus
1. Kubernetes Integration
Kubernetes is a mainstream platform for microservices. Prometheus and Kubernetes integrate quite well - to the point where, if you use Kubernetes, you almost always use Prometheus. It is the de facto standard for monitoring Kubernetes. Both are stable, straightforward to use, and simple but powerful. Prometheus is the type of tool that everyone needs, though not everyone knows they need it.
2. Powerful Query Language
The query language, PromQL, is my favorite part of Prometheus. It isn’t perfect, but it is simple and powerful. It provides aggregations, predictions, and math functions, among other calculations. You can do almost whatever you want with this language if you understand it properly.
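To give a feel for the aggregations and predictions mentioned above, here is a minimal Python sketch that wraps a few PromQL expressions and sends them to the Prometheus HTTP API endpoint /api/v1/query. The server address and the metric names (http_requests_total, node_filesystem_avail_bytes) are assumptions chosen for illustration, not metrics taken from our setup; the helper function names are hypothetical as well.

```python
import json
from urllib.parse import urlencode

# Assumed server address; replace with your own Prometheus URL.
PROMETHEUS_URL = "http://localhost:9090"

# Illustrative PromQL expressions. The metric names are assumptions.
QUERIES = {
    # Aggregation: per-second request rate over 5 minutes, summed per job.
    "requests_per_second_by_job": "sum by (job) (rate(http_requests_total[5m]))",
    # Prediction: filesystems expected to run out of space within 4 hours,
    # extrapolated linearly from the last hour of samples.
    "disk_full_within_4h": "predict_linear(node_filesystem_avail_bytes[1h], 4 * 3600) < 0",
    # Math: share of requests that returned a 5xx status code.
    "error_ratio": (
        'sum(rate(http_requests_total{status=~"5.."}[5m]))'
        " / sum(rate(http_requests_total[5m]))"
    ),
}


def build_query_url(expr, base_url=PROMETHEUS_URL):
    """Return the instant-query URL for a PromQL expression."""
    return f"{base_url}/api/v1/query?{urlencode({'query': expr})}"


def parse_instant_vector(body):
    """Turn an instant-query JSON response into (labels, value) pairs."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    # Each sample is {"metric": {...labels}, "value": [timestamp, "value"]}.
    return [(s["metric"], float(s["value"][1])) for s in data["result"]]
```

Fetching build_query_url(QUERIES["error_ratio"]) with any HTTP client and passing the response body to parse_instant_vector yields one (labels, value) pair per returned series.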
3. Simple to Operate
Prometheus is very simple to operate. That simplicity has been a huge plus for me and for the work we do. The issues you run into with Prometheus tend to be obvious ones. Architecture problems are the exception, since they are not easy to fix, but everything else is simple to avoid or quick to fix.
Cons of Prometheus
The only real con I have seen with Prometheus is that there is no built-in option for long-term storage. For our use case, this isn’t a problem, since we keep 15 days of data, which is the default retention time for Prometheus. However, for other applications it may be an issue.
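To see what a retention window means in disk terms, here is a rough Python sketch based on the capacity-planning formula in the Prometheus storage documentation: disk space is roughly retention time in seconds, times ingested samples per second, times bytes per sample. The ingestion rate below is a made-up figure, and the 2 bytes per sample is an assumption at the upper end of the 1 to 2 bytes the documentation cites as an average.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def estimate_disk_bytes(retention_days, samples_per_second, bytes_per_sample=2.0):
    """Rough on-disk size for a given retention window.

    bytes_per_sample defaults to 2.0, an assumed value at the upper end
    of the 1-2 bytes per sample the Prometheus docs cite as an average.
    """
    retention_seconds = retention_days * SECONDS_PER_DAY
    return retention_seconds * samples_per_second * bytes_per_sample


# Hypothetical workload: 10,000 samples per second at the default 15 days.
size_gb = estimate_disk_bytes(15, 10_000) / 1e9
print(f"approx. {size_gb:.1f} GB for 15 days of retention")
```

The retention window itself is set with the --storage.tsdb.retention.time flag when starting Prometheus. Extending it makes the estimate grow linearly.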
How Do We Use Prometheus?
Virtually every team in our engineering department uses Prometheus for the greater good.
Our infrastructure and operations team uses Prometheus in multiple ways. We use it to monitor resource usage and service performance, as well as data pipeline processing queues. Additionally, we keep track of Kubernetes deployment status and metrics. Using Prometheus is the responsibility of the entire team.
Our developers use Prometheus to compare performance and resource usage between service releases. This covers the smaller individual services we offer, such as our REST API collector and the dashboard.
Lastly, our analytics team uses Prometheus for several artificial intelligence-related metrics.
Over the next few engineering blog posts, we will be diving into specific experiences our engineers have had with Prometheus. Stay tuned for some interesting anecdotes!