CASE STUDY: KUBERNETES AND INDUSTRIES

This article covers how Kubernetes is used in industry and which use cases it solves.

What is Kubernetes?

Kubernetes, or K8s, is a container orchestration system. In other words, when you use Kubernetes, a container-based application can be deployed, scaled, and managed automatically.

The objective of Kubernetes is to abstract away the complexity of managing a fleet of containers that represent packaged applications and include everything needed to run wherever they’re provisioned. By interacting with the Kubernetes REST API, you can describe the desired state of your application, and Kubernetes does whatever is necessary to make the infrastructure conform. It deploys groups of containers, replicates them, redeploys if some of them fail, and so on.
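
As a minimal sketch of that declarative loop, the following kubectl session (the standard CLI front end to the REST API) shows the idea; the manifest file and pod name are hypothetical:

    # Declare the desired state stored in a manifest file (hypothetical name):
    kubectl apply -f my-app.yaml
    # Simulate a failure by deleting one of the pods (hypothetical pod name):
    kubectl delete pod my-app-7d4b9c-abcde
    # Kubernetes notices the divergence from the desired state and
    # schedules a replacement pod automatically:
    kubectl get pods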

Because it’s open source, a K8s cluster can run almost anywhere, and the major public cloud providers all provide easy ways to consume this technology. Private clouds based on OpenStack can also run Kubernetes, and bare-metal servers can be leveraged as worker nodes for it. So if you describe your application with Kubernetes building blocks, you’ll then be able to deploy it within VMs or bare-metal servers, on public or private clouds.

The Rise of Kubernetes

First released in 2014, Kubernetes is an open-source container orchestration tool that can automatically scale, distribute, and manage fault tolerance for containers. Originally created by Google and later donated to the Cloud Native Computing Foundation, Kubernetes is widely used in production environments to handle Docker containers and other container tools in a fault-tolerant manner. As an open-source product, it is available on various platforms and systems. Google Cloud, Microsoft Azure, and Amazon AWS all offer official Kubernetes support, so teams can consume managed clusters without making configuration changes to the cluster themselves.

The popularity of Kubernetes has steadily increased, with four major releases in 2017. K8s was also the most discussed project on GitHub during 2017, and the project with the second-most reviews.

Deploying Kubernetes

Kubernetes offers a new way to deploy applications using containers. It creates an abstraction layer which can be manipulated with declarative rather than imperative programming. This way, it is much simpler to deploy and upgrade services over time. The example below shows the deployment of a replication controller, which controls the creation of pods, the smallest unit available in K8s. The file is almost self-explanatory: the definition gcr.io/google_containers/elasticsearch:v5.5.1-1 indicates that an Elasticsearch Docker image will be deployed, with two replicas and persistent storage for its data.

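Since the original screenshot is not reproduced here, the following is a sketch of what such a replication controller manifest could look like; the resource names, labels, mount path, and volume claim are assumptions, while the image and replica count come from the description above:

    apiVersion: v1
    kind: ReplicationController
    metadata:
      name: elasticsearch            # hypothetical name
    spec:
      replicas: 2                    # two replicas, as described above
      selector:
        app: elasticsearch
      template:
        metadata:
          labels:
            app: elasticsearch
        spec:
          containers:
          - name: elasticsearch
            image: gcr.io/google_containers/elasticsearch:v5.5.1-1
            volumeMounts:
            - name: es-data
              mountPath: /data       # hypothetical mount path
          volumes:
          - name: es-data
            persistentVolumeClaim:
              claimName: es-data     # hypothetical claim for persistent storage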

There are many ways to deploy an application. A Deployment, for example, is an evolution of the replication controller that adds mechanisms to perform rolling updates: updating a tool while keeping it available. Moreover, it is possible to configure load balancers, subnets, and even secrets through declarations.
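
As a sketch of this declarative style, a Deployment can describe its rolling-update behavior directly in the manifest; all names and the image below are hypothetical:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-service
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: my-service
      strategy:
        type: RollingUpdate
        rollingUpdate:
          maxUnavailable: 1    # keep the service available during upgrades
          maxSurge: 1          # create at most one extra pod while updating
      template:
        metadata:
          labels:
            app: my-service
        spec:
          containers:
          - name: my-service
            image: registry.example.com/my-service:1.0.0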

Computing resources can occasionally sit idle, and the main goal is to avoid paying for excess capacity, for example to contain cloud costs. A good way to reduce idle time is to use namespaces as a form of virtual cluster inside your cluster. Each namespace is a completely isolated space inside Kubernetes, which means several environments can be created as necessary, such as production and staging environments. Services within a namespace receive a DNS name such as <service-name>.<namespace-name>.svc.cluster.local. This means that a service can reach another service in the same namespace using just its service name.
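
A minimal sketch of this workflow, with hypothetical namespace and service names:

    kubectl create namespace staging
    kubectl create namespace production
    # A service named "api" in the "staging" namespace is reachable
    # cluster-wide as api.staging.svc.cluster.local. From a pod inside
    # "staging", the short name is enough:
    curl http://api/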

K8s can be deployed in very different scenarios depending on the size of the company and its objectives:

  • In-house: Organizations can transform their own data center into a K8s cluster. In this case, companies can take full advantage of their own resources.
  • Cloud: The setup process is similar to an in-house deployment, but includes virtual machines on the cloud. This allows for the creation of a virtually infinite number of machines, depending on demand.
  • Hybrid: An organization’s data center might perform well for most of the day, but sometimes a peak occurs that local computing resources cannot handle. In this case, a hybrid solution works well. When necessary, K8s will create virtual machines on the cloud to better distribute computing resources when on-premise servers are full.
  • Managed service: Some cloud providers offer their own embedded K8s implementation. In this case, there is no need to deploy and configure Kubernetes itself; an organization just needs to manage the service. Since deploying Kubernetes can be tricky, this is a good solution for companies that do not have a big IT team capable of handling cluster configuration and maintenance.
  • Multicloud: This is the next level of a hybrid cloud solution. Computing resources are deployed across two or more cloud vendors, typically to avoid vendor lock-in and to minimize risk if one provider fails.

Kubernetes is not the only container orchestrator available. Other popular tools on the market include Docker Swarm and Apache Mesos. Swarm is an open-source container orchestrator intended to be the “big brother” of Docker and Docker Compose. Swarm uses the same command line as Docker and is not very opinionated: organizations must decide which tools to use for nearly every feature needed on their cluster. Apache Mesos is another open-source orchestrator that manages other technologies in addition to containers. Apache Mesos calls itself a “data center operating system,” which is also the name of its commercial product, Mesosphere’s Data Center Operating System (DC/OS). Apache Mesos is much less opinionated than K8s, allowing for the deployment of various types of applications besides containerized ones.

Use Cases

We have selected some common use cases to demonstrate Kubernetes’ capabilities. These use cases can be combined in different setups.

Self-Healing and Scaling Services

For simplicity, K8s process units can be described in terms of pods and services. A pod is the smallest deployment unit available in Kubernetes. A pod can contain several containers that share related resources, such as networking and storage. Services are the interface that provides access to a set of containers. These services can be for internal or public access and can load balance across several container instances.
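
For instance, a Service manifest along these lines (the names and ports are hypothetical) selects a set of pods by label and load balances traffic across them:

    apiVersion: v1
    kind: Service
    metadata:
      name: web
    spec:
      type: ClusterIP        # internal access; LoadBalancer would expose it publicly
      selector:
        app: web             # traffic is balanced across all pods with this label
      ports:
      - port: 80             # port exposed by the service
        targetPort: 8080     # hypothetical port the containers listen on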

Pods are mortal: once finished, they vanish from the cluster. Pod termination can be natural or caused by an error. A Deployment is the most modern Kubernetes object for creating and maintaining pods. Using a single description file, a developer can specify everything necessary to deploy, keep running, scale, and upgrade a pod.

The example below shows a simple Deployment. It creates an Nginx pod (version 1.7.9) with three replicas. In other words, Kubernetes will manage three Nginx instances; when an instance stops working, Kubernetes will create a new one.
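
The original figure is not reproduced here, but a Deployment matching that description would look roughly like this (the object and label names are assumptions):

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nginx-deployment
    spec:
      replicas: 3              # Kubernetes keeps three Nginx instances running
      selector:
        matchLabels:
          app: nginx
      template:
        metadata:
          labels:
            app: nginx
        spec:
          containers:
          - name: nginx
            image: nginx:1.7.9
            ports:
            - containerPort: 80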

One of the advantages of K8s is that it’s easy to understand what the platform is doing. Combined with an autoscaler, for example, the cluster can keep 10 Nginx instances running and grow to as many as 15 instances when CPU utilization exceeds 80 percent of capacity.
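
A HorizontalPodAutoscaler expressing those numbers might look like the following sketch, assuming it targets the Deployment above:

    apiVersion: autoscaling/v1
    kind: HorizontalPodAutoscaler
    metadata:
      name: nginx-autoscaler          # hypothetical name
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: nginx-deployment
      minReplicas: 10                 # never fewer than 10 instances
      maxReplicas: 15                 # scale up to at most 15 instances
      targetCPUUtilizationPercentage: 80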

Serverless, with Server

Serverless architecture has taken the world by storm since AWS launched Lambda. The principle is simple: just develop the code, and don’t worry about anything else. Servers and scalability are handled by the cloud provider, and code just has to be developed as functions that handle specific events, from HTTP requests to queue messages.

Vendor lock-in is the major disadvantage of this solution. It is almost impossible to change cloud providers without refactoring most of the code. There are some solutions, like the Serverless Framework, that seek to standardize function code across clouds. Another solution is to use a Kubernetes cluster to create a vendor-free serverless platform. As mentioned above, K8s abstracts away the differences between cloud servers. Currently, two popular frameworks virtualize the cluster as a serverless platform: Kubeless and Fission.
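
As a rough sketch of the developer experience, deploying a function with the Kubeless CLI of that era looked something like this; the function name and source file are hypothetical:

    # handler.py (hypothetical) defines: def hello(event, context): ...
    kubeless function deploy hello \
      --runtime python3.6 \
      --from-file handler.py \
      --handler handler.hello
    # Invoke the function without caring which node or pod serves it:
    kubeless function call hello --data 'world'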

Optimized Resource Usage with Namespaces

A K8s namespace is also known as a virtual cluster: namespaces create a virtually separated cluster inside the real cluster. Teams that do not use namespaces typically maintain separate test, staging, and production clusters, which usually wastes resources, because the test cluster does not run tests continuously and staging is used only from time to time to validate new features. By using a virtual cluster, or namespace, an operations team can instead use the same set of physical machines for these different environments, depending on a given workload.

Namespaces are closely related to DNS because services located within the same namespace are accessible through their names. Namespaces offer a good solution for creating similar environments that locate services through network names: instances from different namespaces will find their dependencies without having to take into account which namespace they are located in.

In addition, namespaces can have resource quotas: each virtual cluster can receive a defined allocation in order to avoid resource competition between namespaces. This is particularly useful for ensuring that a production environment never has to share computing resources with lower-priority environments. Finally, different permissions can be created with roles for each namespace in order to limit the number of individuals with access to production environments.
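
A resource quota is itself just another declaration; here is a sketch with hypothetical limits for a staging namespace:

    apiVersion: v1
    kind: ResourceQuota
    metadata:
      name: staging-quota
      namespace: staging       # hypothetical namespace
    spec:
      hard:
        pods: "20"             # at most 20 pods in this namespace
        requests.cpu: "8"      # total CPU the namespace may request
        requests.memory: 16Gi
        limits.cpu: "16"
        limits.memory: 32Gi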

Hybrid and Multiclouds

A hybrid cloud utilizes computing resources from a local, conventional data center and from a cloud provider. A hybrid cloud is normally used when a company has some servers in an on-premise data center and wants to use the cloud’s unlimited computing resources to expand or supplement company resources. A multicloud, on the other hand, refers to a setup that uses multiple cloud providers to supply computing resources. Multiclouds are generally used to avoid vendor lock-in and to reduce the risk of a cloud provider going down during mission-critical operations.

Both solutions are addressed by Kubernetes Federation. Multiple clusters are created, one for each cloud or on-premise data center, and managed by the Federation. The Federation synchronizes computing resources and even allows cross-cluster discovery: virtually any pod can communicate with a pod in another cluster without knowing the infrastructure.
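
With the Federation v1 tooling from that period, setup revolved around the kubefed command; the federation and cluster context names below are hypothetical:

    # Create the federation control plane in a host cluster:
    kubefed init myfed \
      --host-cluster-context=us-cluster \
      --dns-provider=google-clouddns \
      --dns-zone-name=example.com.
    # Join each member cluster, whether cloud or on-premise:
    kubefed join eu-cluster --host-cluster-context=us-cluster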

The Federation setup is not simple, and there is a caveat: for obvious reasons, the solution doesn’t work on managed services like Google Kubernetes Engine, Azure Container Service, or AWS EKS.

CASE STUDY: Pinterest’s Journey to a Kubernetes Platform

Challenge

After eight years in existence, Pinterest had grown to 1,000 microservices, with multiple layers of infrastructure and a diverse set of tools and platforms. In 2016 the company launched a roadmap toward a new compute platform, led by the vision of creating the fastest path from an idea to production, without making engineers worry about the underlying infrastructure.

Solution

The first phase involved moving services to Docker containers. Once these services went into production in early 2017, the team began looking at orchestration to help create efficiencies and manage them in a decentralized way. After an evaluation of various solutions, Pinterest went with Kubernetes.

Pinterest software engineers have revealed the custom tools and resources they introduced in the company’s adoption of Kubernetes. The key takeaways for other teams looking to build their own platform as a service (PaaS) and an associated developer workflow: container orchestration systems can provide a way to unify workload management, the Kubernetes workload model can be enhanced with custom resource definitions, and a robust end-to-end test pipeline is key to avoiding regressions.

Pinterest, a social media web and mobile app that allows users to save or “pin” information, has a huge user base who have collectively saved more than 200 billion pins across 4 billion boards. As a result of this volume and the associated growth of their infrastructure stack, the Pinterest team faced several challenges. They stated that their engineers didn’t have a unified experience when launching their workloads and that managing huge numbers of virtual machines was creating a heavy maintenance load for the infrastructure team. Furthermore, it was hard to build infrastructure governance tools across the separate systems and to determine which resources could be recycled. This echoes Airbnb’s experience in simplifying their Kubernetes workflow. The team addressed these problems across three key themes: service reliability, infrastructure efficiency, and developer productivity.

According to lead author Lida Li and team, the Cloud Management Platform team started their journey with Kubernetes in 2017 by dockerizing their production workloads and evaluating different container orchestration systems. The Kubernetes-native workload model covered deployments, jobs, and daemon sets, but the team needed more to model their workloads. They stated that usability issues were ‘huge blockers’ on the way to adopting Kubernetes and that it would have been difficult to support different runtime versions on the same Kubernetes cluster. Their solution was to design custom resource definitions (CRDs). At the time of writing, this was a pre-release deploy workflow available to early adopters of the new Kubernetes-based Compute Platform, and the team was integrating it into their CI/CD platform to create a cleaner service for its engineers.

Pinterest designed its CRDs to achieve various ends that may also be informative for engineers considering Kubernetes adoption. Firstly, they wanted to bundle various native Kubernetes resources to work as a single workload, which saved their engineers from assembling them piece by piece. Secondly, they wanted to inject the necessary runtime support for their applications by adding the required sidecars, init containers, environment variables, and volumes into the specification. Lastly, these definitions were used to perform life cycle management for native resources, such as reconciling specifications and updating the event record. The Pinterest team found that this evolution significantly reduced the workload on engineers and therefore the risk of error. This echoes the experience which the Shopify team shared at QCon New York last year.
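
The article does not show Pinterest’s actual CRD schema, but a purely hypothetical custom resource in that spirit could look like this:

    apiVersion: compute.example.com/v1   # hypothetical API group
    kind: PinterestService               # hypothetical kind
    metadata:
      name: my-service
    spec:
      replicas: 3
      image: registry.example.com/my-service:1.2.3
      # A controller expands this single object into the underlying native
      # resources, injecting the required sidecars, init containers,
      # environment variables, and volumes for the runtime environment.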

One consideration for engineers taking on similar problems: in order to avoid inconsistencies between applications, as well as bloated maintenance and support burdens, Pinterest found their infrastructure team needed to own the deployment of all workflow types, whether pod-level sidecars, node-level daemonsets, or VM-level daemons. Tinder, whose platform has run exclusively on Kubernetes since March 2019, took the opposite approach: its infrastructure responsibility is shared between all engineers in the organisation.

Another consideration is that the Pinterest team built an end-to-end test pipeline on top of the native Kubernetes test infrastructure, with tests deployed to all clusters. This mitigated the risks of going beyond the Kubernetes-native workload model, and the engineers stated it caught many regressions before they reached production. The Pinterest team was also integrating their deployment workflow into their new CI/CD platform.

Impact

“By moving to Kubernetes the team was able to build on-demand scaling and new failover policies, in addition to simplifying the overall deployment and management of a complicated piece of infrastructure such as Jenkins,” says Micheal Benedict, Product Manager for the Cloud and the Data Infrastructure Group at Pinterest. “We not only saw reduced build times but also huge efficiency wins. For instance, the team reclaimed over 80 percent of capacity during non-peak hours. As a result, the Jenkins Kubernetes cluster now uses 30 percent less instance-hours per-day when compared to the previous static cluster.”
