What is a Kubernetes Operator and when should you create one?

A Kubernetes Operator is a custom controller that extends the Kubernetes API with Custom Resource Definitions (CRDs) to automate the management of complex applications (databases, messaging, etc.). It encodes the operational knowledge of a human expert: deployment, scaling, backup, failover, zero-downtime updates. You should create an Operator when a stateful application requires specific management logic that native Kubernetes objects (Deployment, StatefulSet) cannot cover on their own. For simple or stateless applications, a Helm chart is generally enough. The deciding factor is whether there are complex operational tasks (automatic failover, replication, scheduled backups) that need to be automated.

How do you secure secrets in a Helm chart without exposing them in the Git repository?

Several complementary solutions exist. Helm Secrets (a plugin based on Mozilla SOPS) encrypts values files with a GPG key, AWS KMS, GCP KMS or Azure Key Vault, so the encrypted files can be safely committed to Git. External Secrets Operator synchronizes secrets from AWS Secrets Manager, HashiCorp Vault or Azure Key Vault into native Kubernetes Secrets. Sealed Secrets (Bitnami) lets you encrypt Kubernetes Secrets with the public key of the controller running in the cluster; only that controller can decrypt them, so the sealed resources can be committed. In GitOps with ArgoCD, a common recommendation is to use External Secrets Operator or Vault Agent Injector: the Git repository never contains plaintext secrets, only references to the external secrets manager.

What is the difference between ArgoCD and Flux CD for GitOps?

ArgoCD and Flux are the two reference GitOps solutions, but they follow different philosophies. ArgoCD centers on a rich graphical interface with real-time visualization of cluster state, multi-cluster management from a central interface, and built-in RBAC: a "pull with supervision UI" model. Flux is more minimalist and CLI-driven; it has native support for SOPS and Kustomize and ships image automation controllers. It strictly follows the GitOps principle of "Git as the single source of truth" without centralized state. In practice, ArgoCD is preferred by teams that want visual supervision and multi-cluster management, while Flux is preferred by platform teams that want pure GitOps without a graphical interface. Both support Helm, Kustomize and CRDs.

How do Kubernetes NetworkPolicies work and why are they essential?

By default, Kubernetes allows all pods to communicate with each other without restriction (an "everything is allowed" model). NetworkPolicies are pod-level firewall rules that explicitly define which pods can communicate with which other pods, on which ports, over which protocols. They require a CNI (Container Network Interface) that supports them: Calico, Cilium, Weave or Antrea (Flannel does not support them). The best practice is to apply a "default deny" policy on all namespaces and then open only the necessary flows. Without NetworkPolicies, a compromised pod can move laterally to access all other pods in the cluster, including databases and secrets services. With Cilium, NetworkPolicies can be extended to layer 7 (HTTP, DNS) with rules based on HTTP methods and URL paths.
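
The "default deny, then open only what is needed" logic can be sketched in plain Go. This is a simplified, ingress-only model and not the Kubernetes API: the `Policy` type, the `ingressAllowed` function and the `app` labels are hypothetical names chosen for illustration. It mirrors two properties of real NetworkPolicies: a pod that no policy selects stays open, and policies are additive (a flow is allowed as soon as one policy selecting the destination allows it).

```go
package main

import "fmt"

// Labels is a set of pod labels or a label selector.
type Labels map[string]string

// matches reports whether every key/value of selector is present in labels.
// An empty selector matches every pod.
func matches(selector, labels Labels) bool {
	for k, v := range selector {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// Policy is a simplified, ingress-only stand-in for a NetworkPolicy (hypothetical type).
type Policy struct {
	PodSelector Labels   // pods the policy applies to
	AllowFrom   []Labels // peers allowed to connect; empty means deny all ingress
	Port        int      // port opened to the AllowFrom peers
}

// ingressAllowed decides whether src may reach dst on port.
func ingressAllowed(policies []Policy, src, dst Labels, port int) bool {
	selected := false
	for _, p := range policies {
		if !matches(p.PodSelector, dst) {
			continue
		}
		selected = true // dst is now isolated: only explicit allows pass
		for _, from := range p.AllowFrom {
			if p.Port == port && matches(from, src) {
				return true
			}
		}
	}
	return !selected // pods selected by no policy accept all traffic
}

func main() {
	policies := []Policy{
		{PodSelector: Labels{}}, // default deny for every pod
		{PodSelector: Labels{"app": "db"}, AllowFrom: []Labels{{"app": "api"}}, Port: 5432},
	}
	api, web, db := Labels{"app": "api"}, Labels{"app": "web"}, Labels{"app": "db"}
	fmt.Println("api -> db:5432", ingressAllowed(policies, api, db, 5432))
	fmt.Println("web -> db:5432", ingressAllowed(policies, web, db, 5432))
}
```

In this model, removing the first policy leaves every pod other than `db` reachable by anyone, which is why the default-deny policy comes first in practice.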

What is Pod Security Admission and how does it replace PodSecurityPolicy?

PodSecurityPolicy (PSP) was deprecated in Kubernetes 1.21 and removed in 1.25. Its native replacement is Pod Security Admission (PSA), available as stable since Kubernetes 1.25. PSA defines three standardized security levels applied per namespace: privileged (no restriction), baseline (prevents obvious privilege escalations) and restricted (full hardening following best practices). Enforcement happens via labels on namespaces: enforce blocks non-compliant pods, warn accepts them with a warning, audit logs them. For more granular and flexible policies than PSA offers, OPA Gatekeeper or Kyverno let you define custom policies via CRDs: forbidding images from unapproved registries, enforcing resource limits, requiring specific labels, etc.
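
As an illustration of the label-based mechanism, the sketch below builds the namespace labels for a given combination of modes. The label prefix `pod-security.kubernetes.io/` and the three level names are the ones Kubernetes uses; the `psaLabels` helper itself is a hypothetical name, and applying the labels to a namespace is left out.

```go
package main

import "fmt"

// validLevels lists the three Pod Security Standards levels.
var validLevels = map[string]bool{"privileged": true, "baseline": true, "restricted": true}

// psaLabels returns the namespace labels for the given modes.
// An empty level means "do not set this mode".
func psaLabels(enforce, warn, audit string) (map[string]string, error) {
	labels := map[string]string{}
	modes := map[string]string{"enforce": enforce, "warn": warn, "audit": audit}
	for mode, level := range modes {
		if level == "" {
			continue
		}
		if !validLevels[level] {
			return nil, fmt.Errorf("unknown level %q for mode %s", level, mode)
		}
		labels["pod-security.kubernetes.io/"+mode] = level
	}
	return labels, nil
}

func main() {
	// Enforce baseline now, and surface what restricted would reject.
	labels, err := psaLabels("baseline", "restricted", "restricted")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for key, level := range labels {
		fmt.Printf("%s=%s\n", key, level)
	}
}
```

Enforcing one level while warning and auditing at a stricter one is a common way to migrate a namespace gradually: nothing extra is blocked, but every pod that would fail the stricter level is reported.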

How does the Prometheus Operator simplify Kubernetes monitoring?

The Prometheus Operator introduces Kubernetes CRDs that let you configure Prometheus declaratively, without directly editing its configuration files. ServiceMonitors and PodMonitors define which services to monitor and how: the Prometheus Operator translates these resources into Prometheus configuration automatically. When a new service is deployed with the right labels, it is automatically discovered and scraped without manual intervention. PrometheusRule lets you define alerting rules in Kubernetes YAML that are loaded automatically. AlertmanagerConfig configures alert routing per namespace. The whole thing is easily deployed via the kube-prometheus-stack Helm chart, which includes Prometheus, Alertmanager, Grafana and around a hundred dashboards and alerts preconfigured for the cluster and Kubernetes workloads.

What is the App of Apps pattern in ArgoCD and why use it?

App of Apps is an ArgoCD pattern where a "root" ArgoCD Application itself manages a set of other ArgoCD Applications. The Git repository contains a Helm chart or Kustomize manifests that define the ArgoCD Application objects. When the root syncs, it automatically creates all child Applications, which in turn sync. This pattern solves the bootstrap problem: how to deploy ArgoCD and its applications from Git without a circular bootstrapping issue. It also lets you manage dozens of applications from a single entry point, apply uniform sync policies, and structure the cluster into tiers (infrastructure, platform, applications). The modern alternative recommended by ArgoCD is ApplicationSet, which offers more flexibility with generators (Git, cluster, matrix) to create Applications dynamically based on available repositories, branches or clusters.

Advanced Kubernetes: Operators, Helm Security and GitOps with ArgoCD

Prerequisites
This tutorial assumes a solid knowledge of Kubernetes (Pods, Deployments, Services, RBAC, namespaces). If you are just starting out, first read the introduction to Kubernetes guide and the advanced Docker guide. A working cluster (minikube, kind or production) is required for the examples.

Why advanced Kubernetes? Beyond basic Deployments

At its most basic, Kubernetes comes down to Deployments, Services and ConfigMaps. Most teams stop there, and stopping there creates blind spots: repetitive operations done manually, secrets exposed in Git, non-reproducible deployments, clusters with no consistent security policy, and monitoring that disappears as soon as a service is redeployed.

Advanced Kubernetes patterns address each of these problems systematically:

Operators: automate complex stateful operations (databases, messaging) via the Controller pattern
Helm Security: manage secrets without exposing them in Git, sign charts, apply RBAC
GitOps with ArgoCD: Git as the single source of truth, declarative and auto-reconciled deployments
Cluster hardening: Pod Security Admission, NetworkPolicies, OPA/Gatekeeper for zero-trust
Prometheus Operator: auto-discovering monitoring, declarative alerting per namespace

This guide is designed to be worked through in order on a real cluster, but each section is self-contained and can be adopted progressively.

1. Kubernetes Operators: automating operations

The Controller pattern and CRDs

Kubernetes is built on the reconciliation loop: a controller observes the current state of the cluster, compares it to the desired state defined in API objects, and acts to make them converge. Custom Resource Definitions (CRDs) extend the Kubernetes API with new object types. An Operator combines these two concepts: it defines a CRD for its application domain and a controller that knows how to reconcile these custom resources.
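
The reconciliation loop can be sketched without any Kubernetes dependency. The example below is a deliberately minimal model, assuming that the whole state is a replica count; `State`, `reconcileStep` and `converge` are hypothetical names, not part of any Kubernetes library. What it shows is the shape of the pattern: observe, compare, act by one step, repeat until there is no drift.

```go
package main

import "fmt"

// State is a hypothetical, simplified view of a workload: just a replica count.
type State struct {
	Replicas int
}

// reconcileStep compares observed with desired and moves one replica closer.
// It returns the new observed state and whether it had to act.
func reconcileStep(desired, observed State) (State, bool) {
	switch {
	case observed.Replicas < desired.Replicas:
		observed.Replicas++
		return observed, true
	case observed.Replicas > desired.Replicas:
		observed.Replicas--
		return observed, true
	}
	return observed, false // already converged: nothing to do
}

// converge runs the loop until no drift is left and counts the steps taken.
func converge(desired, observed State) (State, int) {
	steps := 0
	for {
		next, changed := reconcileStep(desired, observed)
		if !changed {
			return observed, steps
		}
		observed = next
		steps++
	}
}

func main() {
	final, steps := converge(State{Replicas: 3}, State{Replicas: 1})
	fmt.Printf("replicas=%d after %d steps\n", final.Replicas, steps)
}
```

Calling `reconcileStep` on an already converged state changes nothing, which is the property that lets a real controller be re-run safely on every event.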

# CRD: define a new resource type "PostgreSQLCluster"
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: postgresqlclusters.db.example.com
spec:
  group: db.example.com
  versions:
    - name: v1alpha1
      served: true
      storage: true
      subresources:
        status: {}   # enables /status, required for r.Status().Update() in the controller
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              required: [replicas, version]
              properties:
                replicas:
                  type: integer
                  minimum: 1
                  maximum: 7
                version:
                  type: string
                  pattern: '^\d+\.\d+$'
                storage:
                  type: string
                  default: "10Gi"
                backupSchedule:
                  type: string
                  description: "Cron expression for automatic backups"
            status:
              type: object
              properties:
                phase:
                  type: string
                  enum: [Pending, Running, Degraded, Failed]
                readyReplicas:
                  type: integer
                primaryEndpoint:
                  type: string
  scope: Namespaced
  names:
    plural: postgresqlclusters
    singular: postgresqlcluster
    kind: PostgreSQLCluster
    shortNames: [pgc]

# Using the custom resource
apiVersion: db.example.com/v1alpha1
kind: PostgreSQLCluster
metadata:
  name: my-postgres-cluster
  namespace: production
spec:
  replicas: 3
  version: "16.1"
  storage: "50Gi"
  backupSchedule: "0 2 * * *"   # Backup at 2 AM
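
The API server enforces the CRD schema when a PostgreSQLCluster is created or updated, so the controller never sees an invalid spec. To make those constraints concrete, here is a sketch that mirrors them in plain Go: replicas between 1 and 7, a version matching `^\d+\.\d+$`, and a storage default of "10Gi". The `Spec` type and `validate` function are hypothetical and only restate the schema; they are not how Kubernetes performs the validation.

```go
package main

import (
	"fmt"
	"regexp"
)

// Spec mirrors the validated fields of the PostgreSQLCluster spec (simplified).
type Spec struct {
	Replicas int
	Version  string
	Storage  string
}

// versionRe is the same pattern as in the CRD schema.
var versionRe = regexp.MustCompile(`^\d+\.\d+$`)

// validate applies the schema constraints and fills in the storage default.
func validate(s Spec) (Spec, error) {
	if s.Replicas < 1 || s.Replicas > 7 {
		return s, fmt.Errorf("replicas must be between 1 and 7, got %d", s.Replicas)
	}
	if !versionRe.MatchString(s.Version) {
		return s, fmt.Errorf("version %q must look like MAJOR.MINOR", s.Version)
	}
	if s.Storage == "" {
		s.Storage = "10Gi" // schema default
	}
	return s, nil
}

func main() {
	spec, err := validate(Spec{Replicas: 3, Version: "16.1"})
	fmt.Println(spec, err)

	_, err = validate(Spec{Replicas: 9, Version: "16.1"})
	fmt.Println(err)
}
```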

Operator controller in Go: basic structure

Kubebuilder is the most widely used framework for building Operators in Go. It generates the boilerplate (scaffolding) and integrates controller-runtime, the Kubernetes reconciliation library.

// controllers/postgresqlcluster_controller.go
package controllers

import (
    "context"
    "fmt"
    "time"

    appsv1 "k8s.io/api/apps/v1"
    corev1 "k8s.io/api/core/v1"
    "k8s.io/apimachinery/pkg/api/errors"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/runtime" // provides runtime.Scheme used by the reconciler
    ctrl "sigs.k8s.io/controller-runtime"
    "sigs.k8s.io/controller-runtime/pkg/client"
    "sigs.k8s.io/controller-runtime/pkg/log"

    dbv1alpha1 "github.com/example/pg-operator/api/v1alpha1"
)

// PostgreSQLClusterReconciler implements the reconciliation loop
type PostgreSQLClusterReconciler struct {
    client.Client
    Scheme *runtime.Scheme
}

// Reconcile is called on every change to PostgreSQLCluster
// or to the resources it manages (StatefulSet, Service, etc.)
func (r *PostgreSQLClusterReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    logger := log.FromContext(ctx)

    // 1. Fetch the PostgreSQLCluster resource
    pgc := &dbv1alpha1.PostgreSQLCluster{}
    if err := r.Get(ctx, req.NamespacedName, pgc); err != nil {
        if errors.IsNotFound(err) {
            // The resource was deleted, nothing to do
            return ctrl.Result{}, nil
        }
        return ctrl.Result{}, err
    }

    // 2. Check whether the StatefulSet already exists
    existingSts := &appsv1.StatefulSet{}
    err := r.Get(ctx, req.NamespacedName, existingSts)

    if errors.IsNotFound(err) {
        // 3. Create the StatefulSet if it does not exist
        sts := r.buildStatefulSet(pgc)
        if err := r.Create(ctx, sts); err != nil {
            logger.Error(err, "Unable to create the StatefulSet")
            return ctrl.Result{}, err
        }
        logger.Info("StatefulSet created", "name", pgc.Name)
        return ctrl.Result{RequeueAfter: 10 * time.Second}, nil
    } else if err != nil {
        return ctrl.Result{}, err
    }

    // 4. Reconcile: update if the specs have changed
    if *existingSts.Spec.Replicas != int32(pgc.Spec.Replicas) {
        replicas := int32(pgc.Spec.Replicas)
        existingSts.Spec.Replicas = &replicas
        if err := r.Update(ctx, existingSts); err != nil {
            return ctrl.Result{}, err
        }
        logger.Info("Replicas updated", "replicas", replicas)
    }

    // 5. Update the resource status
    pgc.Status.Phase = "Running"
    pgc.Status.ReadyReplicas = int(existingSts.Status.ReadyReplicas)
    if err := r.Status().Update(ctx, pgc); err != nil {
        return ctrl.Result{}, err
    }

    return ctrl.Result{RequeueAfter: 30 * time.Second}, nil
}

// buildStatefulSet builds the StatefulSet for PostgreSQL
func (r *PostgreSQLClusterReconciler) buildStatefulSet(pgc *dbv1alpha1.PostgreSQLCluster) *appsv1.StatefulSet {
    replicas := int32(pgc.Spec.Replicas)
    labels := map[string]string{
        "app":        "postgresql",
        "controller": pgc.Name,
    }

    return &appsv1.StatefulSet{
        ObjectMeta: metav1.ObjectMeta{
            Name:      pgc.Name,
            Namespace: pgc.Namespace,
            // OwnerReference: if the PostgreSQLCluster is deleted,
            // the StatefulSet is deleted too automatically (garbage collection)
            OwnerReferences: []metav1.OwnerReference{
                *metav1.NewControllerRef(pgc, dbv1alpha1.GroupVersion.WithKind("PostgreSQLCluster")),
            },
        },
        Spec: appsv1.StatefulSetSpec{
            Replicas:    &replicas,
            ServiceName: pgc.Name + "-headless",
            Selector: &metav1.LabelSelector{
                MatchLabels: labels,
            },
            Template: corev1.PodTemplateSpec{
                ObjectMeta: metav1.ObjectMeta{Labels: labels},
                Spec: corev1.PodSpec{
                    Containers: []corev1.Container{
                        {
                            Name:  "postgresql",
                            Image: fmt.Sprintf("postgres:%s-alpine", pgc.Spec.Version),
                            Ports: []corev1.ContainerPort{{ContainerPort: 5432}},
                            Env: []corev1.EnvVar{
                                {Name: "POSTGRES_PASSWORD", ValueFrom: &corev1.EnvVarSource{
                                    SecretKeyRef: &corev1.SecretKeySelector{
                                        LocalObjectReference: corev1.LocalObjectReference{Name: pgc.Name + "-credentials"},
                                        Key: "password",
                                    },
                                }},
                            },
                        },
                    },
                },
            },
        },
    }
}

// SetupWithManager registers the controller and declares the resources to watch
func (r *PostgreSQLClusterReconciler) SetupWithManager(mgr ctrl.Manager) error {
    return ctrl.NewControllerManagedBy(mgr).
        For(&dbv1alpha1.PostgreSQLCluster{}).   // Watch PostgreSQLClusters
        Owns(&appsv1.StatefulSet{}).             // And the StatefulSets it creates
        Owns(&corev1.Service{}).
        Complete(r)
}
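
The Reconcile function above sets Status.Phase to "Running" unconditionally, even when no replica is ready yet. One possible refinement, shown here as a sketch, derives the phase from the desired and ready replica counts using the values of the CRD's phase enum. The `phaseFor` helper is hypothetical, and it deliberately leaves out "Failed", which would need information this sketch does not have (for example pod crash states).

```go
package main

import "fmt"

// phaseFor maps replica counts to a phase from the CRD enum.
// desired is assumed to be at least 1, as the CRD schema requires.
func phaseFor(desired, ready int32) string {
	switch {
	case ready <= 0:
		return "Pending" // nothing is serving yet
	case ready < desired:
		return "Degraded" // serving, but below the requested capacity
	default:
		return "Running"
	}
}

func main() {
	for _, ready := range []int32{0, 2, 3} {
		fmt.Printf("desired=3 ready=%d phase=%s\n", ready, phaseFor(3, ready))
	}
}
```

In the controller, this would replace the hard-coded assignment with something like `pgc.Status.Phase = phaseFor(int32(pgc.Spec.Replicas), existingSts.Status.ReadyReplicas)`, assuming the Phase field is a plain string.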

Existing Operators: CloudNativePG, Strimzi, Cert-Manager

In practice, it is rarely necessary to write an Operator from scratch. The cloud-native ecosystem offers production-grade Operators maintained by specialized teams:

CloudNativePG: highly available PostgreSQL with automatic failover, streaming replication and backup to S3
Strimzi: Kafka on Kubernetes with topic management, user management and MirrorMaker
Cert-Manager: automatic issuance and renewal of TLS certificates (Let's Encrypt, Vault, self-signed)
Prometheus Operator: declarative deployment and configuration of Prometheus (covered in section 6)

# Install CloudNativePG via kubectl
kubectl apply --server-side -f \
  https://raw.githubusercontent.com/cloudnative-pg/cloudnative-pg/release-1.23/releases/cnpg-1.23.0.yaml

# Create a 3-node PostgreSQL cluster with S3 backup
cat <<EOF | kubectl apply -f -
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: postgres-prod
  namespace: database
spec:
  instances: 3
  imageName: ghcr.io/cloudnative-pg/postgresql:16.2
  storage:
    size: 50Gi
    storageClass: fast-ssd
  backup:
    barmanObjectStore:
      destinationPath: s3://my-bucket/postgres-backups
      s3Credentials:
        accessKeyId:
          name: aws-creds
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: aws-creds
          key: SECRET_ACCESS_KEY
    retentionPolicy: "30d"
  postgresql:
    parameters:
      max_connections: "200"
      shared_buffers: "256MB"
EOF

# Check the cluster status
kubectl get cluster postgres-prod -n database
kubectl get pods -n database -l cnpg.io/cluster=postgres-prod


Morgann Riu
