Docker in Production: The Mistakes I See Most Often

Root containers, unpatched images, badly mounted volumes... A rundown of the classic mistakes and how to avoid them in your Docker deployments.

For several years now, I've been helping teams deploy their applications with Docker. Whether they're startups or more established organisations, I keep running into the same mistakes. Some are harmless, others can be costly: security holes, cascading outages, lost data.

This article isn't a Docker tutorial -- if that's what you're after, head over to my complete Docker guide instead. Here, I'm sharing hands-on field experience about the seven mistakes I come across most often in production, and how to fix them concretely.

1. Containers running as root

This is by far the most common mistake. By default, Docker runs the processes inside a container as root. Many teams never change this behaviour, often simply because "it works that way".

A root container that suffers a container escape gives the attacker root access on the host machine. This is the worst-case scenario in container security.

Even without an escape, a root process inside the container can overwrite files mounted as volumes, change the container's network configuration, or consume resources without restriction.

The fix is simple: create a dedicated user in your Dockerfile and use the USER instruction.

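FROM node:20-alpine

# Create an unprivileged system user and group (BusyBox syntax on Alpine)
RUN addgroup -S appgroup && adduser -S appuser -G appgroup

WORKDIR /app
COPY --chown=appuser:appgroup . .
# Install runtime dependencies only; --omit=dev supersedes the deprecated --only=production
RUN npm ci --omit=dev

# From here on, including the running process, everything uses the unprivileged user
USER appuser
EXPOSE 3000
CMD ["node", "server.js"]

Best practice: test that your container runs correctly with a non-root user right from development. Waiting until you go to production to fix this always produces permission surprises.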
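
One way to catch a missing USER instruction early is to have the application refuse to start as root. The following is a minimal sketch, not part of the original setup: `running_as_root` and `refuse_root` are hypothetical helper names, and the check assumes a POSIX system where `os.geteuid()` is available.

```python
import os
import sys
from typing import Optional


def running_as_root(euid: int) -> bool:
    """Return True when the effective user ID is 0 (root)."""
    return euid == 0


def refuse_root(euid: Optional[int] = None) -> None:
    """Exit with an error message if the process runs as root.

    Hypothetical helper: call it at the very top of the entrypoint.
    """
    if euid is None:
        euid = os.geteuid()  # POSIX only
    if running_as_root(euid):
        sys.exit("refusing to start as root: add a USER instruction to the Dockerfile")
```

Calling `refuse_root()` first thing in development makes the permission surprises show up on a laptop rather than in production.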

2. Using the :latest tag in production

The :latest tag is a classic trap. It doesn't mean "the latest stable version" but simply "the last build pushed without an explicit tag". Two identical deployments an hour apart can produce different containers if the image was updated in the meantime.

With `:latest`, you lose all reproducibility. You can't tell which version is actually running, and you can't roll back cleanly when a regression hits.

Always use a precise tag, ideally the SHA256 digest for critical environments.

services:
  api:
    # Bad: unpredictable
    # image: myapp:latest

    # Correct: explicit version
    image: myapp:2.4.1

    # Optimal: immutable by digest
    # image: myapp@sha256:a1b2c3d4e5f6...

Build tagging into your CI/CD pipeline. Every build produces an image with a unique tag (version number, commit hash, timestamp). This is the foundation of a reliable deployment.
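
As an illustration of what such a pipeline step could compute, here is a small sketch that combines the three elements mentioned above into one tag. The function name `build_image_tag` and the `version-sha-timestamp` layout are assumptions for the example, not a convention imposed by Docker; the validation reflects Docker's documented tag rules (letters, digits, underscores, periods and dashes, at most 128 characters, not starting with a period or a dash).

```python
import re
from datetime import datetime, timezone

# Docker tag rules: ASCII letters, digits, '_', '.', '-';
# max 128 characters; must not start with '.' or '-'.
_TAG_PATTERN = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}")


def build_image_tag(version: str, commit_sha: str, built_at: datetime) -> str:
    """Build a unique tag such as 2.4.1-a1b2c3d-20240115103000."""
    short_sha = commit_sha[:7]
    # Normalise to UTC so two build agents in different time zones agree
    stamp = built_at.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    tag = f"{version}-{short_sha}-{stamp}"
    if not _TAG_PATTERN.fullmatch(tag):
        raise ValueError(f"not a valid Docker tag: {tag!r}")
    return tag
```

The CI job would then push `myapp:<tag>` and record the tag (or the resulting digest) in the deployment manifest, so a rollback is just a matter of redeploying the previous value.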

3. No healthcheck

Docker knows whether a container is running. It doesn't know whether the application inside it is actually working. Without a healthcheck, a container whose main process is stuck (deadlock, exhausted connection pool, corrupted memory) stays marked as "running" indefinitely.

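The orchestrator can't make a smart decision without this information: no automatic restart, no removal from the load balancer. Docker Swarm reads the HEALTHCHECK instruction directly; Kubernetes ignores it and relies on its own liveness and readiness probes, but the principle is the same.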

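FROM python:3.12-slim

# The slim image ships without curl, which the healthcheck below needs
RUN apt-get update \
 && apt-get install -y --no-install-recommends curl \
 && rm -rf /var/lib/apt/lists/*

# Create the non-root user referenced by USER below
RUN groupadd -r appgroup && useradd -r -g appgroup -u 1001 appuser

WORKDIR /app
COPY . /app

RUN pip install --no-cache-dir -r requirements.txt

# The trailing backslash is required: the options and CMD form a single instruction
HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
  CMD curl -f http://localhost:8000/health || exit 1

USER appuser
CMD ["gunicorn", "main:app", "-b", "0.0.0.0:8000"]

Best practice: the healthcheck endpoint should verify critical dependencies (database, cache, queue). A plain static "200 OK" isn't enough to detect a degraded state.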
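
To make that concrete, here is a framework-agnostic sketch of the logic behind a /health endpoint. The name `health_response` and the shape of the JSON body are assumptions for illustration; each check is a plain callable that you would wire to your real database, cache or queue client.

```python
import json
from typing import Callable, Dict, Tuple


def health_response(checks: Dict[str, Callable[[], bool]]) -> Tuple[int, str]:
    """Run every dependency check and return (HTTP status, JSON body).

    A check passes when it returns True; returning False or raising
    marks the dependency as unhealthy.
    """
    results = {}
    for name, check in checks.items():
        try:
            results[name] = "ok" if check() else "failing"
        except Exception as exc:  # a crashing check must not crash the endpoint
            results[name] = f"error: {exc}"
    healthy = all(state == "ok" for state in results.values())
    # 503 makes `curl -f` exit non-zero, so Docker marks the container unhealthy
    status = 200 if healthy else 503
    body = json.dumps({"healthy": healthy, "checks": results}, sort_keys=True)
    return status, body
```

Keep the individual checks cheap and give them short timeouts: the healthcheck itself is limited by the `--timeout` value, and a slow check would make a healthy container look unhealthy.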

4. Secrets in environment variables

Environment variables are handy for configuration, but they're a poor choice for secrets. They appear in plain text in docker inspect, in debug logs, in process dumps, and often in the source code via .env files committed by mistake.

I've seen cloud API tokens, database passwords and even private SSL keys exposed through a simple docker inspect on a shared server. Environment variables are not a security mechanism.

Use Docker Secrets (in Swarm) or an external secrets manager (Vault, AWS Secrets Manager). In compose, you can mount secrets as files.

services:
  api:
    image: myapp:2.4.1
    secrets:
      - db_password
      - api_key
    environment:
      # Non-sensitive configuration only
      LOG_LEVEL: info
      DB_HOST: postgres

secrets:
  db_password:
    file: ./secrets/db_password.txt
  api_key:
    file: ./secrets/api_key.txt

In the application, read the secret from the file mounted in /run/secrets/. It's a minor change in the code that dramatically improves your security posture.
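
A minimal sketch of that change might look like this. The helper name `read_secret` is hypothetical; the secret names match the compose example above, and the `secrets_dir` parameter exists only so the function can be exercised outside a container.

```python
from pathlib import Path


def read_secret(name: str, secrets_dir: str = "/run/secrets") -> str:
    """Return the content of a secret mounted as a file.

    Trailing newlines are stripped because secret files are often
    created with an editor or `echo`, which append one.
    """
    path = Path(secrets_dir) / name
    try:
        return path.read_text(encoding="utf-8").rstrip("\r\n")
    except FileNotFoundError:
        raise RuntimeError(f"secret {name!r} is not mounted in {secrets_dir}") from None
```

In the application, `read_secret("db_password")` then replaces the lookup of a password in the environment. Failing loudly when the file is missing is deliberate: a silent fallback to an empty password is harder to diagnose.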

5. Badly mounted volumes and permission problems

Docker volumes are essential for persisting data, but mounting them is a constant source of problems. The typical case: a container running as a non-root user (as recommended in point 1) but whose volume is owned by root on the host.

The result: the application can neither read nor write its own data. Worse, some people fix this with chmod 777, which amounts to opening the doors to every process on the system.

FROM python:3.12-slim

# Pin both the UID and the GID so they match the compose file
RUN groupadd -r -g 1001 appgroup && useradd -r -g appgroup -u 1001 appuser

# Create the data directory with the right owner before declaring the volume
RUN mkdir -p /app/data && chown -R appuser:appgroup /app/data
VOLUME /app/data

USER appuser
WORKDIR /app
CMD ["python", "main.py"]

The compose file reuses the same numeric UID/GID:

services:
  api:
    image: myapp:2.4.1
    user: "1001:1001"
    volumes:
      - app_data:/app/data

volumes:
  app_data:
    driver: local

Best practice: set an explicit numeric UID/GID in the Dockerfile and reuse it in the compose file. Avoid usernames that may vary from one image to another. Prepare the directories with the right permissions in an entrypoint if needed.
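
To surface an ownership mismatch at startup rather than on the first write, the application can verify its data directory before doing anything else. This is a sketch under assumptions: `check_data_dir` is a hypothetical helper, and it relies on POSIX-only calls (`os.getuid`, `os.getgid`).

```python
import os


def check_data_dir(path: str) -> None:
    """Fail fast if the current user cannot read and write the data directory."""
    if not os.path.isdir(path):
        raise RuntimeError(f"{path} does not exist or is not a directory")
    # X_OK on a directory means "may traverse it", which is needed to reach files
    if not os.access(path, os.R_OK | os.W_OK | os.X_OK):
        st = os.stat(path)
        raise RuntimeError(
            f"{path} is owned by {st.st_uid}:{st.st_gid} but the process runs as "
            f"{os.getuid()}:{os.getgid()}; fix the ownership instead of using chmod 777"
        )
```

Calling `check_data_dir("/app/data")` at the top of `main.py` turns a confusing I/O error deep in the application into an explicit message naming both sets of IDs.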

6. No resource limits

Without defined limits, a single container can consume all the memory or all the CPU of the host machine. I've seen production servers go down because a memory leak in one container triggered the kernel's OOM killer, which picks its victims by memory footprint rather than by importance and ended up killing critical processes, including other containers.

Without resource limits, a failing container can bring down the entire infrastructure. It's the equivalent of an uncontrolled "noisy neighbour".

services:
  api:
    image: myapp:2.4.1
    deploy:
      resources:
        limits:
          cpus: "1.0"
          memory: 512M
        reservations:
          cpus: "0.25"
          memory: 128M

The limits define the absolute ceiling. The reservations guarantee a minimum available to the container. Size these values based on your load tests, not on guesswork. A container that regularly hits its memory limit probably has a leak or is undersized.
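
To see how close a container runs to its ceiling, the application can read its own cgroup accounting files. The sketch below assumes cgroup v2, where the limit and the current usage are exposed as `memory.max` and `memory.current` under `/sys/fs/cgroup`; the function names are hypothetical, and hosts still on cgroup v1 use different file names.

```python
from pathlib import Path
from typing import Optional

# cgroup v2 mount point as seen from inside the container (assumption)
CGROUP_DIR = Path("/sys/fs/cgroup")


def parse_memory_max(raw: str) -> Optional[int]:
    """Parse memory.max: a byte count, or the literal 'max' when unlimited."""
    value = raw.strip()
    return None if value == "max" else int(value)


def memory_usage_ratio(current_raw: str, max_raw: str) -> Optional[float]:
    """Return usage / limit, or None when no limit is set."""
    limit = parse_memory_max(max_raw)
    if not limit:
        return None
    return int(current_raw.strip()) / limit


def read_memory_usage_ratio(cgroup_dir: Path = CGROUP_DIR) -> Optional[float]:
    """Read both files from the cgroup directory and compute the ratio."""
    current = (cgroup_dir / "memory.current").read_text()
    limit = (cgroup_dir / "memory.max").read_text()
    return memory_usage_ratio(current, limit)
```

Exporting this ratio as a metric, and alerting when it stays high for a sustained period, gives you the evidence needed to decide between raising the limit and hunting a leak. A `None` result is itself a finding: it means the container runs without a memory limit.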

7. Logs that fill up the disk

By default, Docker stores each container's logs in a JSON file on disk, with no size limit and no rotation whatsoever. A slightly chatty application can generate gigabytes of logs in a few days. I've diagnosed more than one production outage caused by a full disk because of Docker logs.

A full disk doesn't just break logging: it stops the database from writing, blocks deployments, and can corrupt data. It's a silent failure that gets progressively worse.

Always configure log rotation, either at the Docker daemon level or per container.

services:
  api:
    image: myapp:2.4.1
    logging:
      driver: json-file
      options:
        max-size: "10m"
        max-file: "5"
        compress: "true"

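For a global configuration, edit the /etc/docker/daemon.json file, then restart the Docker daemon. The new defaults only apply to containers created afterwards; existing containers keep their previous logging settings until they are recreated.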

{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}

Best practice: in serious production, ship your logs to a centralised system (Loki, ELK, Datadog). Local logs remain useful for immediate debugging, but they should not be your source of truth.
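
When sizing the rotation settings shown earlier, it helps to work out the worst-case disk footprint: max-size multiplied by max-file, multiplied by the number of containers. The sketch below does that arithmetic; the function names are hypothetical, and it assumes the k/m/g suffixes are binary multiples (1m = 1024 * 1024 bytes), which is my understanding of how Docker parses these values.

```python
import re

# Assumed binary multipliers for Docker's size suffixes
_UNITS = {"": 1, "k": 1024, "m": 1024**2, "g": 1024**3}


def parse_size(size: str) -> int:
    """Convert a value such as '10m' into a number of bytes."""
    match = re.fullmatch(r"(\d+)([kmg]?)", size.strip().lower())
    if not match:
        raise ValueError(f"unsupported size: {size!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit]


def max_log_bytes(max_size: str, max_file: int, containers: int = 1) -> int:
    """Upper bound on disk used by json-file logs with rotation enabled."""
    return parse_size(max_size) * max_file * containers
```

With the per-container settings above, `max_log_bytes("10m", 5)` gives 52,428,800 bytes, so 50 MiB per container. With the daemon-wide settings and 20 containers, `max_log_bytes("10m", 3, containers=20)` gives 600 MiB. Enabling compression shrinks the rotated files, so the real footprint sits below this bound.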

Conclusion

These seven mistakes aren't theoretical edge cases. They're problems I run into regularly, including in experienced teams. Docker simplifies deployment, but it doesn't eliminate complexity -- it just shifts it.

To recap the essential best practices:

  • Run your containers as a non-root user
  • Tag your images with explicit versions
  • Implement healthchecks that genuinely test the application's state
  • Use a secrets manager, not environment variables
  • Manage volume permissions with fixed UIDs/GIDs
  • Define CPU and memory limits for every container
  • Configure log rotation from the very first deployment

None of these fixes is complex on its own. It's their systematic application that makes the difference between a fragile Docker environment and a reliable production infrastructure. If you're just getting started with Docker, my Docker tutorial covers the fundamentals before tackling these advanced topics.

Morgann Riu

Cybersecurity and Linux administration expert. I help companies secure and optimize their critical infrastructures.
