A wealth of interesting technologies and methodologies has arisen in recent years under the “cloud-native” umbrella term, and their impact on our lives as developers has been profound.
We were once used to having big monolithic applications, hosted on enterprise application servers deployed on virtualized (and frequently expensive) hardware; now we have containers, cheap cloud computing/storage/everything, better implementations of agile methodologies, powerful architectural patterns like microservices, wonderful schedulers like Kubernetes, “unicorn-like” creatures we call DevOps engineers (or should we call them SREs? that’s another story), and so on.
All these technologies and methodologies allow us to build software that is extremely flexible, scalable, powerful, available, and maintainable in ways that we couldn’t predict years ago.
But as the saying goes, “Power is nothing without control,” and control is really something that we need. Service mesh to the rescue!
What Is a Service Mesh?
Before introducing the concept of service mesh, let’s take a small step back and talk about microservices.
This architectural pattern is founded on very solid principles that draw inspiration from service-oriented architectures (SOA), and ultimately allows developers to write software that is highly decoupled, maintainable, and resilient.
On the other hand, there is also a great challenge that teams must deal with: How to ensure robust, observable, measurable, secure, actionable communication between microservices?
While in old architectures the data flow was constrained within application tiers (for example, in a classic three-tier application: UI, business, and data layers) or completely absorbed by a monolith, the use of microservices along with containers and schedulers has brought extreme dynamism to this field: Containers (and the microservices they host) come and go, can have performance problems, can go offline or work in a networking-degraded mode, can be spawned in entirely different data centers across different cloud vendors … and all of this while we need to make our application respond within an agreed level of service, without any downtime.
Let’s face it: Making distributed applications is hard because communication happens on networks that are unreliable by their very nature.
In response to all these needs, service meshes have entered the cloud-native space to offer a solution that allows us to deeply understand, measure, and act upon the communication layer of our microservices architecture.
So, what is a service mesh? Let’s quote one definition [1]:
“A service mesh is a dedicated infrastructure layer that adds features to a network between services. It allows to control traffic and gain insights throughout the system. […] In contrast to libraries, which are used for similar functionality, a service mesh does not require code changes. Instead, it adds a layer of additional containers that implement the features reliably and agnostic to technology or programming language.”
We can go further in our definition by saying that a service mesh is composed of two layers, the data plane and the control plane.
The data plane is made of a number of service proxies deployed alongside every microservice, following what is called the “sidecar pattern” (see [2]). These proxies handle a wide range of cross-cutting concerns, like traffic control, monitoring, observability, and security, and they do so on behalf of the microservices, with almost no changes to the business code.
The control plane, on the other hand, is a separate layer that manages the configuration of the service proxies and can also gather the telemetry data they emit. It ensures that any change we apply to the mesh behavior is automatically distributed to the service proxies, which will behave accordingly (see Figure 1).
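To make the sidecar pattern more tangible, here is a minimal, purely illustrative sketch of what a “meshed” Pod could look like on Kubernetes: the business container and a service proxy container living side by side. The container names, images, and ports are hypothetical, and in practice the mesh usually injects the proxy automatically rather than you declaring it by hand:

```yaml
# Illustrative only: the proxy container is normally injected by the mesh,
# and the image/port values below are made up.
apiVersion: v1
kind: Pod
metadata:
  name: orders
  labels:
    app: orders
spec:
  containers:
  - name: orders                                # the business microservice
    image: registry.example.com/orders:1.0.0
    ports:
    - containerPort: 8080
  - name: service-proxy                         # the sidecar proxy: part of the data plane,
    image: registry.example.com/proxy:latest    # configured at runtime by the control plane
    ports:
    - containerPort: 15001
```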
Benefits and Drawbacks of Using a Service Mesh
A service mesh, as we briefly discussed in the previous paragraph, brings undeniable value to a software architecture, as it can greatly improve the control, security, observability, and reliability of the services.
This is especially true for microservices architectures, as it embraces their distributed nature and keeps networking concerns separate from business concerns.
A service mesh allows you to:
- Manage the traffic between services with advanced routing capabilities (a canary-style example is sketched right after this list)
- Set up secure mutual TLS communication by providing a dedicated CA infrastructure
- Set up RBAC authentication/authorization strategies by ensuring service identity
- Test your overall architecture’s integrity by injecting failures or delays
- Ensure resilience by using modern patterns like rate limiting, circuit breaking, and retries/time-outs
- Integrate with modern observability platforms (like Prometheus or Jaeger/Zipkin) by generating metrics and traces and setting alarms on them
- Enhance your delivery capabilities by easing the adoption of strategies like canary releases, A/B testing, or progressive delivery (with tools like Argo Rollouts, Flagger, or Iter8)
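To give a taste of what this looks like in practice, here is a minimal sketch of a weighted, canary-style routing rule expressed with Istio (the implementation we discuss later in this post). The service and subset names are hypothetical, and the subsets would be defined in a companion DestinationRule:

```yaml
# Hypothetical canary: send 90% of the traffic for "orders" to subset v1
# and 10% to subset v2.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: orders
spec:
  hosts:
  - orders
  http:
  - route:
    - destination:
        host: orders
        subset: v1
      weight: 90
    - destination:
        host: orders
        subset: v2
      weight: 10
```

Tools like Argo Rollouts or Flagger can then shift these weights progressively as part of an automated delivery pipeline.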
All these benefits come at a cost, however, as service meshes also bring some drawbacks to the equation:
- They introduce operational complexity because they require a change in your infrastructure
- They bring new technology/concepts that must be assimilated by teams and add a new layer of “cognitive” challenges
- They use proxies that introduce extra latency and additional CPU/memory consumption, which may not be negligible in your architecture
- They still require code changes if you want to properly implement tracing and logging, as every microservice is treated agnostically as a black box and cannot expose “business” data on its own
Last but not least, we should remember that service meshes do somewhat overlap with other techniques, such as plain software libraries or API gateways (see [3] for an interesting discussion on this topic).
In conclusion, it’s fair to say that the benefits of introducing a service mesh surely outweigh the drawbacks, especially in modern microservices architectures. Though there are many factors to take into account, service meshes are here to stay and are set to become long-term companions of microservices architectures.
Service Mesh Landscape
The current offering of service mesh implementations is becoming more and more diverse. A wide range of solutions exists, and each of them has its strengths and weaknesses (see [1] and [4] for detailed comparisons).
To counter the downsides of fragmentation and to ensure healthy collaboration, major actors in the service mesh space have recently joined forces to create the Service Mesh Interface (SMI) specification ([5]).
Quoting its definition, Service Mesh Interface provides:
- A standard interface for service meshes on Kubernetes
- A basic feature set for the most common service mesh use cases
- Flexibility to support new service mesh capabilities over time
- Space for the ecosystem to innovate with service mesh technology
The introduction of SMI is an important milestone that should allow the service mesh technology to evolve in a healthy way.
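For example, a weighted routing rule similar to the canary split shown earlier can be expressed in a mesh-agnostic way with SMI’s TrafficSplit resource. The sketch below uses hypothetical service names, and the exact apiVersion depends on the SMI release supported by your mesh:

```yaml
# Mesh-agnostic traffic split for the "orders" service (SMI sketch).
apiVersion: split.smi-spec.io/v1alpha2
kind: TrafficSplit
metadata:
  name: orders-split
spec:
  service: orders        # the "root" service that clients call
  backends:
  - service: orders-v1
    weight: 90
  - service: orders-v2
    weight: 10
```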
Istio, or a Tale of Service Mesh Selection
When we decided to pick a service mesh to improve our company’s microservices architecture, we considered many aspects:
- Number of features
- Performance results
- Community/technical support
- Widespread usage
- Compatibility with Service Mesh Interface
After a thorough review, we decided to pick Istio, which in our opinion was the most balanced choice across the aforementioned aspects.
Istio ([6]) is by far the most popular and feature-rich service mesh implementation. It has been jointly developed by Google and IBM and (like Kubernetes itself) is an open-source project that benefits from the expertise Google has developed running its internal infrastructure.
Istio’s architecture, visible in Figure 2, follows the definition of service mesh introduced earlier and has been designed to adapt to other types of deployments as well: It uses Envoy ([7]), a very powerful and widely adopted service proxy, as the “sidecar container” installed in every Kubernetes Pod. Together, the Envoy proxies implement the data plane, governing the flow of data between every microservice.
The control plane is instead served by a single process (called istiod) that communicates with the Envoy proxies to distribute configuration, receive recorded network traffic and telemetry data, and manage certificates issued by Istio’s own internal Certificate Authority.
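In practice, you rarely add the Envoy container to your Pods by hand: labeling a namespace is enough for Istio to automatically inject the sidecar into every Pod scheduled in it (the namespace name below is hypothetical):

```yaml
# Enable automatic Envoy sidecar injection for every Pod in this namespace.
apiVersion: v1
kind: Namespace
metadata:
  name: orders              # hypothetical application namespace
  labels:
    istio-injection: enabled
```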
Istio embodies all the great features that a service mesh should have:
- Telemetry reporting, with custom dedicated dashboards and alerting
- Tracing/logging features
- Traffic routing/mirroring features
- Resiliency features (like circuit breaking, time-outs, and retries)
- Mutual TLS (mTLS) and service identity support for authentication and authorization (see the sketches after this list)
- Platform independence, which allows it to also manage “legacy” infrastructure like virtual machines
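As a quick illustration of how such features are configured (resource names and thresholds below are hypothetical), here are two short sketches: a mesh-wide strict mTLS policy and a circuit-breaking rule that ejects misbehaving instances:

```yaml
# Require mTLS for all workloads in the mesh (applied in the istio-system namespace).
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system
spec:
  mtls:
    mode: STRICT
---
# Circuit breaking for the "orders" service: temporarily eject failing instances.
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: orders-circuit-breaker
spec:
  host: orders
  trafficPolicy:
    outlierDetection:
      consecutive5xxErrors: 5   # eject a host after 5 consecutive server errors
      interval: 30s             # how often hosts are scanned
      baseEjectionTime: 60s     # minimum time an ejected host stays out
```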
Another important area that Istio covers is infrastructure governance, as it integrates with well-established applications like:
- Prometheus, Grafana, and Alertmanager to receive, plot, and alert on telemetry received by Envoy proxies
- Jaeger backend and UI to help introduce tracing capabilities
- Kiali, a powerful service mesh management console that lets you view the generated dependency graphs of your microservices, as well as a lot of useful information like latencies, traffic rates, and overall health of the services
How We Use Istio
The adoption of Istio at our company has steadily proceeded in recent months.
We are using service meshes in our production Kubernetes clusters to measure our performance and observe traffic in our network, and we are also experimenting with how to use Istio to improve our delivery process.
We’ll publish some “deep dives” on our architecture in a future blog post, so stay tuned!
References
[1] servicemesh.es
[2] docs.microsoft.com/en-us/azure/architecture/patterns/sidecar
[3] blog.christianposta.com/microservices/do-i-need-an-api-gateway-if-i-have-a-service-mesh
[4] landscape.cncf.io/card-mode?category=service-mesh&grouping=category
[5] smi-spec.io
[6] istio.io
[7] envoyproxy.io
The content of this site was created by Radicalbit and contains attributions to material published by INNOQ at servicemesh.es and leanpub.com/service-mesh-primer.