Cloud Agnostic CI/CD Pipeline and Environment


In today's world, a portable environment under proper version control is key to a healthy production setup. In this article, we discuss the advantages of cloud-agnostic design and how to implement it using Terraform, Helm and Kubernetes.


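NOTE: Agnosticism is the belief that the existence of God or the divine is unknown or unknowable. It holds that it's impossible to prove or disprove the existence of deities. In the same spirit, a cloud-agnostic setup makes no assumption about which cloud provider it runs on.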


Let's say your company uses various services of its preferred cloud provider, and the cloud team has set up a production environment using that provider's own services and configurations. What happens when the company decides to move to a completely different cloud?


Let's think it through. Say you used Azure Container Apps to serve your applications and Cosmos DB or a similar service for storage. What are the corresponding services in AWS? Perhaps S3 and ECS (Elastic Container Service) for serving applications, and DynamoDB as a NoSQL database similar to Cosmos DB.


Because those services are cloud-specific, you will need to rebuild the same logic in a completely different cloud environment, in this case moving from Azure to AWS.


What would be a logical approach to avoid such waste? You may already be thinking of the answer: why not use a service that works on all cloud providers and can be configured with code, so that we keep the infrastructure and version state wherever we want?


Then let's use VMs: every cloud provider offers them, and we can simply deploy applications with tools like Ansible. Hmm, good try. But what happens when a VM cannot handle the load? Yes, scaling is needed. Do VMs support scaling? Actually, yes, but personally I don't trust VMs :).


Using VMs can be an option, but there’s a big question: how do they handle service discovery when they scale? To manage this, you'd need a service discovery tool like Consul from HashiCorp. But here’s the catch: setting up and managing such tools can get pretty complex and require a lot of engineering effort.


As a company owner, I’d be wary of diving into a setup that’s both intricate and challenging to manage. Why? Because while a more complex environment might offer flexibility, it can also consume a lot of engineering resources. On the flip side, a simpler setup might lead to higher operational costs. Balancing complexity and cost is key to maintaining a stable and efficient infrastructure.


Cloud Agnostic Environment


What? I thought we were building a CI/CD pipeline.

Yes, we are. However, to create a cloud-agnostic CI/CD pipeline, you also need to make your environment agnostic to specific cloud services. This will become more apparent when we get to the CD (release) stage.


Hmm, what does that actually mean?

In simple terms, it means using services that are available across all cloud platforms, avoiding vendor lock-in. By adopting this approach, you can focus on application development rather than being tied down by infrastructure choices.


So what services should we use?

If we agree to avoid cloud-specific services, VMs are still a good option. The key is to use an orchestrator that continuously monitors and manages the health of the environment, ensuring it runs smoothly.


As you may have gathered, I don't like unnecessary risks. So let's just go with Kubernetes (K8s). BUT... I didn’t say it would be cheap.

Kubernetes is a powerful orchestrator that can manage containers across different environments, making it a great fit for cloud-agnostic strategies.


However, the trade-off here is cost—both in terms of infrastructure and the expertise needed to set it up and maintain it. You’ll need skilled engineers to manage Kubernetes clusters, ensure scaling, handle updates, and troubleshoot issues.


The upside? Once it's set up, Kubernetes will give you the flexibility to move between cloud providers with minimal hassle, avoiding vendor lock-in and making your infrastructure more resilient in the long run. It’s an investment in reliability and futureproofing, but it comes with a price tag—whether that’s in terms of resources or personnel. So while it may not be the cheapest option, it's certainly one that provides peace of mind when scaling your operations.


Great, now we have a portable environment that can run on any cloud provider. Hmm, I think it's time to consider how to keep the state of the infrastructure. Why don’t we use Terraform?


Terraform is a cloud-agnostic Infrastructure as Code (IaC) tool that allows you to define and manage your infrastructure across multiple cloud providers using a single configuration language. By using Terraform, you can maintain a consistent and version-controlled state of your infrastructure, automate provisioning and updates, and ensure that your setup is reproducible and maintainable. This aligns perfectly with our goal of a cloud-agnostic setup, providing flexibility and control over your infrastructure while avoiding vendor lock-in.


KubernetesInfra Github Repository


The root main.tf file is the entry point when you run terraform apply.

# main.tf file
module "kubernetes" {
  source                  = "./modules/azure"
  resource_group_name     = var.resource_group_name
  kubernetes_cluster_name = var.kubernetes_cluster_name
  location                = var.location
  node_count              = var.node_count
  vm_size                 = var.vm_size
  ARM_CLIENT_ID           = var.ARM_CLIENT_ID
  ARM_CLIENT_SECRET       = var.ARM_CLIENT_SECRET
  ARM_TENANT_ID           = var.ARM_TENANT_ID
  ARM_SUBSCRIPTION_ID     = var.ARM_SUBSCRIPTION_ID
}

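This setup demonstrates the flexibility of a cloud-agnostic environment. The core idea is that by abstracting infrastructure behind Terraform modules and variables, you can switch to another cloud provider by pointing source at a different module. Terraform requires source to be a literal string, so the switch is a one-line edit rather than a variable, and provider-specific inputs such as the ARM_* credentials give way to the new provider's equivalents.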


The specific module for Azure can be replaced or configured similarly for AWS, Google Cloud, or any other cloud provider, thereby supporting a cloud-agnostic strategy. This approach ensures that your infrastructure definitions remain portable and adaptable, fitting seamlessly into different cloud environments as needed.
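
One practical detail when running the root module shown above: the ARM_* inputs are ordinary Terraform variables, and Terraform reads a variable named X from an environment variable called TF_VAR_X. The sketch below is one possible way to feed them from the shell without writing secrets to a file. It assumes the credentials are already exported under the same names as the Terraform variables, and the helper name export_tf_vars is made up for illustration; it is not part of the repository.

```shell
#!/bin/sh
# Sketch only: copy Azure credentials from the environment into TF_VAR_*
# variables so Terraform can resolve var.ARM_CLIENT_ID and friends.
# Assumption: the credentials are already set under these exact names.
# export_tf_vars is a hypothetical helper, not taken from the repository.

export_tf_vars() {
  for name in ARM_CLIENT_ID ARM_CLIENT_SECRET ARM_TENANT_ID ARM_SUBSCRIPTION_ID; do
    # Indirect lookup: read the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing environment variable: $name" >&2
      return 1
    fi
    export "TF_VAR_${name}=${value}"
  done
}
```

A typical run would then be export_tf_vars && terraform init && terraform apply. A module for another provider would need its own list of names in place of the ARM_* ones.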


Let's check the Azure module to see what resources we are creating and what logic we're after.


modules/azure/main.tf

# The contents of each block are omitted here; see the GitHub repository.
# https://github.com/AtakanG7/KubernetesInfra/

provider "azurerm" {
  features {}
}

resource "azurerm_resource_group" "main" {}

resource "azurerm_kubernetes_cluster" "aks" {
  default_node_pool {}
  identity {}
}

provider "kubernetes" {}

resource "kubernetes_namespace" "monitoring" {}

resource "kubernetes_namespace" "production" {}

resource "kubernetes_namespace" "staging" {}

provider "helm" {
  kubernetes {}
}

resource "helm_release" "prometheus" {}

resource "kubernetes_config_map" "alertmanager_config" {}

resource "helm_release" "database" {}

resource "helm_release" "web_app" {}

resource "helm_release" "worker" {}

resource "random_password" "grafana_admin_password" {}


This setup creates a Kubernetes cluster with the necessary namespaces: staging, production, and monitoring.


Kubernetes Resources

The Terraform configuration provisions an Azure Kubernetes Service (AKS) cluster with specific configurations and deployments across different namespaces:

  1. Azure Kubernetes Service (AKS) Cluster: The azurerm_kubernetes_cluster resource sets up the Kubernetes cluster in Azure, specifying node pool details and system-assigned identity.
  2. Kubernetes Provider: Configured to connect to the AKS cluster using kubeconfig details, allowing management of Kubernetes resources.
  3. Namespaces: The following Kubernetes namespaces are created to organize resources:
     - Monitoring Namespace:
       - Prometheus: Deployed using a Helm chart (kube-prometheus-stack). This setup includes Prometheus for metrics collection and Grafana for visualization. The Helm release configuration is provided through a template file, which includes a randomly generated password for Grafana.
       - AlertManager Config: Configured via a Kubernetes ConfigMap containing settings for AlertManager, which manages and routes alerts generated by Prometheus. The configuration is loaded from YAML files (alertmanager.yml and prometheus-rules.yml).
     - Production Namespace:
       - Database: Deployed using a Helm chart from a local .tgz file. This release sets up a database service with configurations specified in a production values YAML file.
       - Web App: Managed with a Helm chart from a local .tgz file. This release deploys a web application with configurations provided in a production values YAML file.
       - Worker: Deployed using another Helm chart from a local .tgz file. This release sets up worker services for background tasks or batch processing, with configurations specified in a production values YAML file.
     - Staging Namespace: This namespace is set up for staging and testing. Currently, no specific resources are deployed in this namespace, but it can be used to stage applications before moving them to production.
  4. Helm Provider: Configured to manage Helm releases, connecting to the AKS cluster to deploy and manage applications.
  5. Random Password (Grafana Admin Password): A secure password is generated for Grafana, used in the Prometheus Helm release. This ensures that the Grafana dashboard is protected with a strong password.


Summary

The Terraform configuration creates a well-organized Kubernetes environment with specific namespaces tailored for different purposes:

  1. Monitoring: Contains observability tools such as Prometheus and Grafana, along with AlertManager configurations.
  2. Production: Hosts critical applications including the database, web app, and worker services.
  3. Staging: Reserved for testing and staging purposes, ensuring changes are vetted before production deployment.


Creating such an environment will enable us to manipulate and manage it according to our specific needs. By setting up a cloud-agnostic infrastructure, we ensure that our environment is not tied to any particular cloud provider, giving us greater flexibility and control.


So far, we've covered the essentials of establishing a cloud-agnostic environment, focusing on how to deploy and manage Kubernetes clusters and associated resources across different cloud platforms. Now, let’s shift our focus to designing a cloud-agnostic CI/CD pipeline.


Before we get into the details, there’s one important thing to consider: versioning your deployments. This step is key for tracking which versions are deployed and making sure nothing gets lost. Proper versioning helps you keep a clear record of what’s been deployed. If something goes wrong, you can easily roll back to a previous version. It’s an essential part of making sure your CI/CD pipeline works smoothly and reliably.


Meet Helm

GH-PAGES (helm github repository)

This repository contains the deployments, services, ConfigMaps and so on for all of our services. It is responsible for tracking the staging and production versions of each application separately. We will update these values from inside the Jenkins pipeline so that the repository always reflects the deployed application version.


If new services are added to the Kubernetes cluster, we could automate a separate pipeline that fetches deployments, configs and other resources from the cluster and creates the charts automatically. For simplicity, however, we create each service's chart manually under /charts. Each chart folder keeps its deployment.yaml, values-staging.yaml and values-production.yaml.

/charts/web-app/templates/deployment.yaml

# Application Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}
spec:
  replicas: {{ .Values.replicaCount }}
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
        - name: web-app
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
          ports:
            - containerPort: 3004
          env:
            - name: DB_HOST
              value: {{ .Values.database.host | quote }}
            - name: DB_PORT
              value: {{ .Values.database.port | quote }}
            - name: DB_NAME
              value: {{ .Values.database.name | quote }}
---
# Application Service
apiVersion: v1
kind: Service
metadata:
  name: {{ .Values.host }}
  namespace: {{ .Values.namespace }}
spec:
  selector:
    # Must match the pod labels in the Deployment above
    app: web-app
  ports:
    - protocol: TCP
      port: 3004
      targetPort: 3004

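As you may have noticed, the templates use many variables, such as .Values.namespace and .Values.image.repository. The chart metadata lives in Chart.yaml, and the variables are defined in the values files that follow it. image.repository does not appear in the values files shown here, so it has to be supplied elsewhere, for example in the chart's default values.yaml or with --set.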

gh-pages/charts/web-app/Chart.yaml

apiVersion: v2
name: web-app
description: A Helm chart for the web application :)
version: 0.1.22
appVersion: "1.0.0"

gh-pages/charts/web-app/values-production.yaml

replicaCount: 1
host: web-app-service
namespace: production

image:
  tag: 0.1.22

ingress:
  enabled: true
  hosts:
    - host: web-app-service
      paths: ["/"]

database:
  host: "database-service"
  port: "27017"
  name: "mydatabase"

gh-pages/charts/web-app/values-staging.yaml

replicaCount: 1
host: web-app-service-staging
namespace: staging

image:
  tag: 0.1.22

ingress:
  enabled: true
  hosts:
    - host: web-app-service-staging
      paths: ["/"]

database:
  host: "database-service-staging"
  port: "27017"
  name: "mydatabase"

As you can see, staging and production values are managed in separate files. This separation allows us to tailor configurations for each environment efficiently. With Helm, you can specify which values file to use during deployment. For example, you might use the following Helm commands in your pipeline to deploy to different environments:

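# Deploy to production (the release lives in the production namespace)
helm upgrade --install web-app ./charts/web-app -f ./charts/web-app/values-production.yaml --namespace production

# Deploy to staging (same release name, kept separate by its namespace)
helm upgrade --install web-app ./charts/web-app -f ./charts/web-app/values-staging.yaml --namespace staging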

By leveraging Helm’s flexibility, you can easily switch between configurations and ensure that your application is correctly deployed across various environments.
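
Because every helm upgrade creates a new release revision, the rollback mentioned earlier maps to two Helm commands, helm history and helm rollback. The following is a minimal sketch that wraps them in shell functions. The release name web-app and the namespace production in the usage comment are assumptions, and the run helper with its DRY_RUN switch is a hypothetical convenience so that commands are only printed unless you opt in.

```shell
#!/bin/sh
# Sketch only: inspect and roll back a Helm release.
# run, release_history and release_rollback are hypothetical helper names.

# Print the command; execute it only when DRY_RUN=0 is set.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# $1 = release name, $2 = namespace
release_history() {
  run helm history "$1" --namespace "$2"
}

# $1 = release name, $2 = namespace, $3 = revision number to return to
release_rollback() {
  run helm rollback "$1" "$3" --namespace "$2"
}

# Usage (assumed names):
#   release_history web-app production
#   DRY_RUN=0 release_rollback web-app production 3
```

Keeping the dry run as the default is a deliberate choice here: a rollback in production is easy to trigger by accident, so the sketch makes you ask for it explicitly.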


We now have a provider-independent environment and a version control system for our applications. One thing is left.


Creating Cloud Agnostic CI/CD Pipeline


We’ve established a cloud-agnostic environment, and now it’s time to apply the same principle to our CI/CD pipeline. The goal is to ensure that our pipeline remains portable and adaptable across different cloud providers.


Why focus on a cloud-agnostic CI/CD pipeline? While cloud-specific services offer convenience, they can limit portability. By keeping our CI/CD pipeline cloud-agnostic, we avoid being tied to any single provider and ensure smooth transitions and scaling across various cloud platforms. See How to Setup CI/CD Pipeline Using Azure DevOps for AKS.


With the environment already portable, we can now create a CI/CD pipeline that integrates seamlessly with any cloud, simplifying development and deployment processes.


Pipeline Configuration and Team Management


When it comes to pipeline creation, Azure DevOps is an invaluable tool. It not only lets you define your pipeline with precision but also offers a suite of features including Kanban charts, team management, comprehensive test plans, and organizational-level configurations.


While these advanced capabilities are incredibly useful, they do come with a cost. For those seeking a more budget-friendly alternative, Jenkins is a fantastic free option. Jenkins allows you to create and manage your pipeline efficiently, making it a great choice for handling AKS CI/CD logic without breaking the bank.


View The Jenkins Full Pipeline

For our CI/CD pipeline, we’re using AWS free tier machines, which are perfect for testing. Here’s a brief overview of the setup:

  1. Docker: Ensure Docker is installed to build and push container images.
  2. Azure CLI (az): Required for managing Azure resources.
  3. kubectl: The command-line tool for interacting with your Kubernetes cluster.
  4. Jenkins: Set up Jenkins with the necessary plugins to integrate with Docker, Kubernetes, and Azure.


Here’s how the pipeline works:

  1. Checkout Application: Pulls the latest code from the Git repository.
  2. Clone Helm Chart Repository: Clones the Helm chart repository and adds it to Helm.
  3. Update Chart Versions: Increments the version in the Helm chart and updates relevant files.
  4. Build and Push Docker Image: Builds the Docker image with the new version and pushes it to the registry.
  5. Mirror Production in Staging: Mirrors the production Helm charts to the staging environment.
  6. Deploy to Staging: Deploys the updated Helm chart to the staging environment and verifies the deployment.
  7. Run Tests: Executes tests on the staging environment.
  8. Approval: Waits for manual approval before deploying to production.
  9. Update Production Chart: Packages and pushes the updated Helm chart to the repository.
  10. Remove Staging Resources: Cleans up resources in the staging environment.
  11. Deploy to Production: Deploys the updated chart to the production environment.
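
The "Update Chart Versions" step is the one most easily gotten subtly wrong, so here is a minimal sketch of it in plain shell. This is not the author's Jenkins code: the function names are hypothetical, and the sketch assumes a MAJOR.MINOR.PATCH version without leading zeros, a top-level version: line in Chart.yaml, and a tag: line in the values file, as in the files shown earlier.

```shell
#!/bin/sh
# Sketch only: bump the chart version and image tag before building.
# bump_patch, chart_version and set_version are hypothetical helpers.

# Increment the patch component, e.g. 0.1.22 -> 0.1.23.
bump_patch() {
  prefix="${1%.*}"   # everything before the last dot
  patch="${1##*.}"   # everything after the last dot
  echo "${prefix}.$((patch + 1))"
}

# Print the value of the top-level "version:" line (appVersion is ignored).
chart_version() {
  sed -n 's/^version:[[:space:]]*//p' "$1"
}

# $1 = Chart.yaml path, $2 = values file path, $3 = new version
set_version() {
  sed -i.bak "s/^version:.*/version: $3/" "$1"
  # Keep the original indentation of the tag line.
  sed -i.bak "s/^\([[:space:]]*\)tag:.*/\1tag: $3/" "$2"
  rm -f "$1.bak" "$2.bak"
}

# Usage (paths assumed from the chart layout above):
#   new="$(bump_patch "$(chart_version charts/web-app/Chart.yaml)")"
#   set_version charts/web-app/Chart.yaml charts/web-app/values-staging.yaml "$new"
```

Using the same number for the chart version and the image tag mirrors the values files above, where both read 0.1.22.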


I think this is fairly self-explanatory. To see the full pipeline, check out KubernetesInfra/.jenkins on GitHub.


A key point to note about this CI/CD pipeline is that we use a single Kubernetes cluster with three distinct namespaces: production, staging, and monitoring. The staging environment is particularly noteworthy because it’s designed to be a short-lived space for testing. To prevent any impact on production, we’ve put measures in place to limit the resources available to staging. This ensures that resource usage doesn’t interfere with production workloads, maintaining stability and performance across all environments.
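
The article does not show how staging is limited. One common Kubernetes mechanism for this is a ResourceQuota on the namespace, sketched below as a shell function that prints the manifest. The quota name and every number are placeholders chosen for illustration, not values from the repository.

```shell
#!/bin/sh
# Sketch only: print a ResourceQuota manifest for the staging namespace.
# staging_quota_manifest is a hypothetical helper; all limits are placeholders.

staging_quota_manifest() {
  cat <<'EOF'
apiVersion: v1
kind: ResourceQuota
metadata:
  name: staging-quota
  namespace: staging
spec:
  hard:
    pods: "10"
    requests.cpu: "2"
    requests.memory: 4Gi
    limits.cpu: "4"
    limits.memory: 8Gi
EOF
}

# Usage:
#   staging_quota_manifest | kubectl apply -f -
```

One caveat with this approach: once a quota covers CPU and memory, pods in that namespace must declare requests and limits (or receive defaults from a LimitRange), otherwise they are rejected. The deployment template shown earlier sets neither, so it would need a resources section before a quota like this is applied.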


Conclusion

By adopting a cloud-agnostic approach, you can ensure that your infrastructure and CI/CD pipeline are flexible and adaptable to any cloud provider. This strategy avoids vendor lock-in, making it easier to scale and manage your environment efficiently. For more details on the CI/CD pipeline, check out the GitHub repository: KubernetesInfra/.jenkins.


This approach provides a robust, scalable solution for managing deployments and infrastructure, offering peace of mind as you scale your operations across different cloud platforms.







