For several years now, I've been helping teams deploy their applications with Docker. Whether they're startups or more established organisations, I keep running into the same mistakes. Some are harmless; others can be costly: security holes, cascading outages, lost data.
This article isn't a Docker tutorial; if that's what you're after, head over to my complete Docker guide instead. Here, I'm sharing hands-on field experience: the seven mistakes I come across most often in production, and how to fix them concretely.
1. Containers running as root
This is by far the most common mistake. By default, Docker runs the processes inside a container as root. Many teams never change this behaviour, often simply because "it works that way".
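The risk is twofold. If a vulnerability lets a process break out of the container, it lands on the host as root, unless user-namespace remapping is enabled. Even without an escape, a root process inside the container can overwrite files mounted as volumes, tamper with the application's own code and configuration, and use every capability Docker grants by default.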
The fix is simple: create a dedicated user in your Dockerfile and use the USER instruction.
FROM node:20-alpine
RUN addgroup -S appgroup && adduser -S appuser -G appgroup
WORKDIR /app
COPY --chown=appuser:appgroup . .
RUN npm ci --omit=dev
USER appuser
EXPOSE 3000
CMD ["node", "server.js"]
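As defence in depth, the application itself can refuse to start as root, so a forgotten USER instruction fails loudly instead of silently. Here is a minimal sketch in Python (the Dockerfile above is Node, but the idea carries over to any runtime); ensure_not_root is a hypothetical helper, and its uid parameter exists only to make it testable.

```python
import os
from typing import Optional


def ensure_not_root(uid: Optional[int] = None) -> int:
    """Raise if the process runs with effective UID 0 (root).

    `uid` can be injected for testing; by default the real effective UID is read.
    """
    effective_uid = os.geteuid() if uid is None else uid
    if effective_uid == 0:
        raise RuntimeError(
            "refusing to run as root: add a USER instruction to the Dockerfile "
            "or start the container with --user"
        )
    return effective_uid
```

Call it first thing in the entrypoint. It complements the USER instruction; it doesn't replace it.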
2. Using the :latest tag in production
The :latest tag is a classic trap. It doesn't mean "the latest stable version"; it's simply the default tag Docker applies when an image is built or pushed without an explicit tag. Two identical deployments an hour apart can produce different containers if the image was updated in the meantime.
Always use a precise tag, ideally the SHA256 digest for critical environments.
services:
  api:
    # Bad: unpredictable
    # image: myapp:latest
    # Correct: explicit version
    image: myapp:2.4.1
    # Optimal: immutable by digest
    # image: myapp@sha256:a1b2c3d4e5f6...
Build tagging into your CI/CD pipeline. Every build produces an image with a unique tag (version number, commit hash, timestamp). This is the foundation of a reliable deployment.
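Here is a minimal sketch of that tagging step as a Python helper run in CI. build_image_tag and check_deployable are hypothetical names, and version plus short commit hash plus UTC timestamp is just one of the schemes mentioned above. The pattern follows Docker's tag rule: up to 128 characters from letters, digits, _, . and -, not starting with . or -.

```python
import re
from datetime import datetime, timezone

# Docker tag grammar: first char [A-Za-z0-9_], then up to 127 of [A-Za-z0-9_.-]
_TAG_RE = re.compile(r"^[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}$")


def build_image_tag(version: str, commit_sha: str, built_at: datetime) -> str:
    """Combine version, short commit hash and UTC build time into one unique tag."""
    timestamp = built_at.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    tag = f"{version}-{commit_sha[:7]}-{timestamp}"
    if not _TAG_RE.match(tag):
        raise ValueError(f"invalid Docker tag: {tag!r}")
    return tag


def check_deployable(image_ref: str) -> None:
    """Reject image references that are untagged or use :latest."""
    if "@sha256:" in image_ref:
        return  # pinned by digest: immutable
    # Only look at the last path segment, so a "host:port/" registry prefix is ignored
    last = image_ref.rsplit("/", 1)[-1]
    if ":" not in last or last.endswith(":latest"):
        raise ValueError(f"{image_ref!r} is not pinned to an explicit tag")
```

For example, version 2.4.1 built from commit a1b2c3d4e5f6 on 1 May 2024 at 12:00 UTC yields 2.4.1-a1b2c3d-20240501T120000Z, and running check_deployable over the image references of a compose file before deploying blocks any :latest from reaching production.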
3. No healthcheck
Docker knows whether a container is running. It doesn't know whether the application inside it is actually working. Without a healthcheck, a container whose main process is stuck (deadlock, exhausted connection pool, corrupted memory) stays marked as "running" indefinitely.
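Without this information, the orchestrator can't make a smart decision: no automatic restart, no removal from the load balancer. Docker Swarm acts on the HEALTHCHECK status and replaces unhealthy tasks, whereas plain Docker only flags the container as unhealthy. Kubernetes ignores the Dockerfile HEALTHCHECK entirely and relies on its own liveness and readiness probes, which play the same role.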
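FROM python:3.12-slim
RUN groupadd -r appgroup && useradd -r -g appgroup appuser
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
# python:3.12-slim ships without curl, so the probe uses Python's standard library
HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
    CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health', timeout=4)" || exit 1
USER appuser
CMD ["gunicorn", "main:app", "-b", "0.0.0.0:8000"]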
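A HEALTHCHECK is only as good as the endpoint it calls: /health should exercise the application's dependencies, not just return 200. Here is a minimal sketch of a WSGI app that gunicorn could load as main:app. make_app and the checks mapping are hypothetical names, and the real probes (database ping, cache ping) are yours to supply.

```python
import json
from typing import Callable, Dict, Iterable


def make_app(checks: Dict[str, Callable[[], bool]]):
    """Build a WSGI app whose /health route runs every dependency check."""

    def app(environ, start_response) -> Iterable[bytes]:
        if environ.get("PATH_INFO") != "/health":
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"not found"]

        results = {}
        for name, check in checks.items():
            try:
                results[name] = bool(check())
            except Exception:
                # A probe that raises counts as a failure, not a crash
                results[name] = False

        healthy = all(results.values())
        body = json.dumps({"healthy": healthy, "checks": results}).encode()
        status = "200 OK" if healthy else "503 Service Unavailable"
        start_response(status, [
            ("Content-Type", "application/json"),
            ("Content-Length", str(len(body))),
        ])
        return [body]

    return app


# Hypothetical wiring: replace the empty mapping with real probes,
# e.g. {"database": ping_database}, so that gunicorn can load main:app
app = make_app({})
```

Returning 503 on failure is what makes the probe work: urllib.request.urlopen raises on an HTTP error status, so the python command in the HEALTHCHECK exits non-zero and Docker counts a failed check.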
4. Secrets in environment variables
Environment variables are handy for configuration, but they're a poor choice for secrets. They appear in plain text in docker inspect, in debug logs, in process dumps, and often in the source code via .env files committed by mistake.
Anyone who can run docker inspect on a shared server can read them. Environment variables are not a security mechanism. Use Docker Secrets (in Swarm) or an external secrets manager (Vault, AWS Secrets Manager). In Docker Compose, you can mount secrets as files.
services:
  api:
    image: myapp:2.4.1
    secrets:
      - db_password
      - api_key
    environment:
      # Non-sensitive configuration only
      LOG_LEVEL: info
      DB_HOST: postgres
secrets:
  db_password:
    file: ./secrets/db_password.txt
  api_key:
    file: ./secrets/api_key.txt
In the application, read the secret from the file mounted in /run/secrets/. It's a minor change in the code that dramatically improves your security posture.
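Here is a minimal Python sketch of that change, assuming the db_password secret declared in the compose file above. read_secret is a hypothetical helper, and secrets_dir is a parameter only so the function can be tested outside a container.

```python
from pathlib import Path


def read_secret(name: str, secrets_dir: str = "/run/secrets") -> str:
    """Read a secret mounted as a file, stripping the trailing newline editors add."""
    path = Path(secrets_dir) / name
    try:
        return path.read_text(encoding="utf-8").rstrip("\r\n")
    except FileNotFoundError:
        # Fail with a clear message rather than falling back to an env var
        raise RuntimeError(f"secret {name!r} not found at {path}") from None
```

In the application, db_password = read_secret("db_password") replaces the former os.environ lookup. The value never appears in docker inspect output.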
5. Badly mounted volumes and permission problems
Docker volumes are essential for persisting data, but mounting them is a constant source of problems. The typical case: a container running as a non-root user (as recommended in point 1) but whose volume is owned by root on the host.
The result: the application can neither read nor write its own data. Worse, some people fix this with chmod 777, which amounts to opening the doors to every process on the system.
FROM python:3.12-slim
RUN groupadd -r -g 1001 appgroup && useradd -r -g appgroup -u 1001 appuser
RUN mkdir -p /app/data && chown -R appuser:appgroup /app/data
VOLUME /app/data
USER appuser
WORKDIR /app
CMD ["python", "main.py"]
services:
  api:
    image: myapp:2.4.1
    user: "1001:1001"
    volumes:
      - app_data:/app/data
volumes:
  app_data:
    driver: local
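When the UIDs still don't line up, the application should fail immediately with an explicit message rather than crash later on its first write. Here is a minimal Python sketch, assuming the /app/data mount point from the example above; check_data_dir is a hypothetical helper.

```python
import os
import tempfile


def check_data_dir(path: str) -> None:
    """Fail fast if the current process cannot write to the mounted data directory."""
    st = os.stat(path)
    if not os.access(path, os.W_OK | os.X_OK):
        raise PermissionError(
            f"{path} is owned by {st.st_uid}:{st.st_gid} but the process runs as "
            f"{os.getuid()}:{os.getgid()}; fix the ownership instead of using chmod 777"
        )
    # Confirm with a real write: some mounts report permissions inaccurately
    with tempfile.NamedTemporaryFile(dir=path):
        pass
```

Calling check_data_dir("/app/data") at startup turns a vague "permission denied" deep in the code into a message that names both UIDs.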
6. No resource limits
Without defined limits, a single container can consume all the memory or all the CPU of the host machine. I've seen production servers go down because a memory leak in one container triggered the kernel's OOM killer, which then killed processes of its own choosing, not necessarily the leaking one: critical services and other containers included.
services:
  api:
    image: myapp:2.4.1
    deploy:
      resources:
        limits:
          cpus: "1.0"
          memory: 512M
        reservations:
          cpus: "0.25"
          memory: 128M
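The limits define the absolute ceiling: a container that exceeds its memory limit is OOM-killed, and one that reaches its CPU limit is throttled. The reservations set the minimum the container should be able to count on (Swarm uses them to decide which node can host a task). Size these values based on your load tests, not on guesswork. A container that regularly hits its memory limit probably has a leak or is undersized.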
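To see from the inside how close a container runs to its memory limit, a process can read the cgroup v2 accounting files. Here is a minimal Python sketch. It assumes a cgroup v2 host where Docker exposes the container's own cgroup at /sys/fs/cgroup, which is the default with a private cgroup namespace. memory_usage_ratio is a hypothetical helper.

```python
from pathlib import Path
from typing import Optional


def memory_usage_ratio(cgroup_dir: str = "/sys/fs/cgroup") -> Optional[float]:
    """Return current memory usage as a fraction of the cgroup v2 memory limit.

    Returns None when no limit is set ("max") or the files are missing,
    for example on a cgroup v1 host.
    """
    base = Path(cgroup_dir)
    try:
        limit_raw = (base / "memory.max").read_text().strip()
        current = int((base / "memory.current").read_text().strip())
    except (FileNotFoundError, ValueError):
        return None
    if limit_raw == "max":
        return None  # the container has no memory limit
    limit = int(limit_raw)
    return current / limit if limit > 0 else None
```

memory.current includes page cache, so treat the ratio as an upper bound. Logged periodically, it still helps tell a leak (steady growth) from an undersized limit (a high but flat plateau).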
7. Logs that fill up the disk
By default, Docker stores each container's logs in a JSON file on disk, with no size limit and no rotation whatsoever. A slightly chatty application can generate gigabytes of logs in a few days. I've diagnosed more than one production outage caused by a full disk because of Docker logs.
Always configure log rotation, either at the Docker daemon level or per container.
services:
  api:
    image: myapp:2.4.1
    logging:
      driver: json-file
      options:
        max-size: "10m"
        max-file: "5"
        compress: "true"
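For a global configuration, edit the /etc/docker/daemon.json file, then restart the Docker daemon. The new defaults only apply to containers created afterwards; existing containers keep their old logging settings until they are recreated: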
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}
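Before rotation is in place, it helps to know which containers are eating the disk. Here is a minimal Python sketch, assuming the default data root /var/lib/docker (reading it requires root) and json-file logs stored as <container-id>/<container-id>-json.log. largest_container_logs is a hypothetical helper.

```python
from pathlib import Path
from typing import List, Tuple


def largest_container_logs(
    root: str = "/var/lib/docker/containers", top: int = 5
) -> List[Tuple[str, int]]:
    """List the biggest json-file logs as (path, size in bytes), largest first."""
    # The trailing * also matches rotated (.1) and compressed (.2.gz) files
    logs = [
        (str(path), path.stat().st_size)
        for path in Path(root).glob("*/*-json.log*")
    ]
    logs.sort(key=lambda item: item[1], reverse=True)
    return logs[:top]
```

The directory name in each path is the container ID, which you can match against docker ps to find the culprit.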
Conclusion
These seven mistakes aren't theoretical edge cases. They're problems I run into regularly, including in experienced teams. Docker simplifies deployment, but it doesn't eliminate complexity -- it just shifts it.
To recap the essential best practices:
- Run your containers as a non-root user
- Tag your images with explicit versions
- Implement healthchecks that genuinely test the application's state
- Use a secrets manager, not environment variables
- Manage volume permissions with fixed UIDs/GIDs
- Define CPU and memory limits for every container
- Configure log rotation from the very first deployment
None of these fixes is complex on its own. It's their systematic application that makes the difference between a fragile Docker environment and a reliable production infrastructure. If you're just getting started with Docker, my Docker tutorial covers the fundamentals before tackling these advanced topics.