Let's be honest: we all have a scripts/ folder full of bash files thrown together in a hurry, with nested sed calls, dangling if statements missing their fi, and comments like "don't touch, it works". The day you need to deploy the same config across 15 servers, you copy-paste over SSH like a digital medieval craftsman. And when it breaks, nobody knows what changed, when, or why.
Ansible solves this problem. No agent to install on the target machines (SSH access and, for most modules, a Python interpreter are enough), no central database, no esoteric language to learn. Just YAML, SSH, and idempotent modules: run a well-written playbook ten times and the machines end up in the same state as after the first run. It's the tool every sysadmin should master before claiming to do DevOps.
This guide is pragmatic. We start from scratch, ramp up, and finish with a clean project layout. No abstract theory: code, concrete examples, and the classic mistakes to avoid.
The inventory: the foundation of everything
The Ansible inventory describes your infrastructure: which machines exist, how to reach them, and how to group them logically. Without a well-structured inventory, you're just looping SSH commands with extra steps.
INI format: simple and readable
The INI format remains the most common for small infrastructures. Each bracketed section defines a group:
[webservers]
web01.example.com ansible_host=192.168.1.10
web02.example.com ansible_host=192.168.1.11
[databases]
db01.example.com ansible_host=192.168.1.20 ansible_port=2222
[production:children]
webservers
databases
[production:vars]
ansible_user=deploy
ansible_python_interpreter=/usr/bin/python3
The :children directive lets you create groups of groups. The :vars directive applies variables to all members of a group. This is the basis of hierarchical organization.
YAML format: for complex infrastructures
The YAML format offers more flexibility as the inventory grows:
all:
  children:
    webservers:
      hosts:
        web01.example.com:
          ansible_host: 192.168.1.10
          http_port: 8080
        web02.example.com:
          ansible_host: 192.168.1.11
          http_port: 80
    databases:
      hosts:
        db01.example.com:
          ansible_host: 192.168.1.20
      vars:
        db_engine: postgresql
        db_port: 5432
Whichever format you choose, test your inventory with ansible-inventory --list -i inventory.yml to verify that group and variable resolution is correct.
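For comparison, the group-of-groups and group-variable mechanisms shown in the INI example can be expressed in YAML too. The sketch below mirrors the production group from the INI inventory; it is an illustration under that assumption, not a complete inventory:

```yaml
all:
  children:
    webservers:
      hosts:
        web01.example.com:
          ansible_host: 192.168.1.10
    databases:
      hosts:
        db01.example.com:
          ansible_host: 192.168.1.20
          ansible_port: 2222
    production:
      # Equivalent of [production:children]
      children:
        webservers:
        databases:
      # Equivalent of [production:vars]
      vars:
        ansible_user: deploy
        ansible_python_interpreter: /usr/bin/python3
```

Running ansible-inventory --graph -i inventory.yml should then show webservers and databases nested under production.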
Your first playbook: securing a server
An Ansible playbook is a YAML file that describes the desired state of your machines. Here's a first concrete playbook that applies basic hardening to a freshly installed Debian/Ubuntu server:
---
- name: Basic server hardening
  hosts: all
  become: true

  vars:
    admin_user: deploy
    ssh_port: 22
    allowed_ssh_keys:
      - "ssh-ed25519 AAAAC3... admin@workstation"

  tasks:
    - name: Update the APT cache and upgrade packages
      ansible.builtin.apt:
        update_cache: true
        upgrade: safe
        cache_valid_time: 3600

    - name: Install base packages
      ansible.builtin.apt:
        name:
          - ufw
          - fail2ban
          - unattended-upgrades
          - curl
          - htop
        state: present

    - name: Create the administration user
      ansible.builtin.user:
        name: "{{ admin_user }}"
        groups: sudo
        shell: /bin/bash
        create_home: true
        state: present

    - name: Deploy the authorized SSH keys
      ansible.posix.authorized_key:
        user: "{{ admin_user }}"
        key: "{{ item }}"
        state: present
      loop: "{{ allowed_ssh_keys }}"

    - name: Disable SSH password authentication
      ansible.builtin.lineinfile:
        path: /etc/ssh/sshd_config
        regexp: "^#?PasswordAuthentication"
        line: "PasswordAuthentication no"
        validate: "/usr/sbin/sshd -t -f %s"
      notify: restart sshd

    - name: Disable root login over SSH
      ansible.builtin.lineinfile:
        path: /etc/ssh/sshd_config
        regexp: "^#?PermitRootLogin"
        line: "PermitRootLogin no"
        validate: "/usr/sbin/sshd -t -f %s"
      notify: restart sshd

    # Allow SSH before enabling the firewall, so you cannot lock yourself out
    - name: Allow SSH through UFW
      community.general.ufw:
        rule: allow
        port: "{{ ssh_port }}"
        proto: tcp

    - name: Configure UFW - default policy
      community.general.ufw:
        state: enabled
        policy: deny
        direction: incoming

  handlers:
    - name: restart sshd
      ansible.builtin.service:
        # The OpenSSH unit is named ssh on Debian/Ubuntu
        name: ssh
        state: restarted
Run this playbook with the following command:
ansible-playbook -i inventory.ini harden.yml --diff --check
The --check option simulates execution without changing anything. The --diff option shows the changes that would be applied. Always test in dry-run mode before applying to production.
The essential modules
The Ansible community package bundles thousands of modules through its collections, but a handful cover most of a sysadmin's daily needs. Here are the ones you'll use most:
apt: package management
The apt module handles installing, updating and removing Debian/Ubuntu packages. The cache_valid_time option avoids running an apt update on every execution:
- name: Install a specific version of nginx
  ansible.builtin.apt:
    name: nginx=1.24.*
    state: present
    update_cache: true
    cache_valid_time: 3600
template: dynamic configuration files
The template module uses Jinja2 to generate configuration files from variables. It's far cleaner than a cascade of sed calls:
- name: Deploy the nginx configuration
  ansible.builtin.template:
    src: templates/nginx.conf.j2
    dest: "/etc/nginx/sites-available/{{ domain }}.conf"
    owner: root
    group: root
    mode: "0644"
    # No validate here: "nginx -t -c %s" expects a complete main configuration,
    # so it would reject a standalone site file. Run "nginx -t" from a handler instead.
  notify: reload nginx
copy, service, lineinfile
The copy module transfers static files. The service module manages the state of services, whatever the init system (systemd included). The lineinfile module modifies a specific line in an existing file, which is ideal for one-off adjustments without rewriting the whole file:
- name: Enable IPv4 forwarding
  ansible.builtin.lineinfile:
    path: /etc/sysctl.conf
    # Single quotes keep the backslashes intact for the regular expression
    regexp: '^#?net\.ipv4\.ip_forward'
    line: "net.ipv4.ip_forward = 1"
  notify: reload sysctl

- name: Ensure nginx is started and enabled
  ansible.builtin.service:
    name: nginx
    state: started
    enabled: true
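The copy module is mentioned above without an example, so here is a minimal sketch. The source file files/motd is hypothetical and assumed to sit next to the playbook:

```yaml
- name: Copy a static message of the day
  ansible.builtin.copy:
    src: files/motd        # hypothetical file, relative to the playbook
    dest: /etc/motd
    owner: root
    group: root
    mode: "0644"
```

As with template, the task reports a change only when the content or the permissions on the target differ from what is requested.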
Variables and facts
Ansible's power lies in its variable management. Variables let you factor out configuration and adapt behavior to each machine or group.
group_vars and host_vars
Ansible automatically loads variables from files organized by convention:
inventory/
├── hosts.yml
├── group_vars/
│ ├── all.yml # Global variables
│ ├── webservers.yml # Variables for the webservers group
│ └── databases.yml # Variables for the databases group
└── host_vars/
└── db01.example.com.yml # Host-specific variables
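Within the inventory, the precedence order is strict: host variables override group variables, which override those in all. Variables defined elsewhere rank higher still: play-level vars override everything in the inventory, and extra vars passed with -e override everything else. Keep this mechanism in mind to avoid surprises.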
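To make the inventory precedence concrete, here is a sketch with three files that all define the same variable. The http_port values and the web02 host file are assumptions chosen for the illustration:

```yaml
# inventory/group_vars/all.yml
http_port: 80
---
# inventory/group_vars/webservers.yml (overrides all.yml for the webservers group)
http_port: 8080
---
# inventory/host_vars/web02.example.com.yml (overrides both, for this host only)
http_port: 8443
```

With these files, db01 would resolve http_port to 80, web01 to 8080 and web02 to 8443, which you can confirm with ansible-inventory --host web02.example.com.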
Facts and register
Ansible automatically gathers information about each target machine (the facts). You can use them in your conditions and templates:
- name: Install the package based on the distribution
  ansible.builtin.apt:
    name: "{{ pkg_name }}"
    state: present
  when: ansible_facts['os_family'] == 'Debian'

- name: Check available disk space
  ansible.builtin.command: df -h /
  register: disk_usage
  changed_when: false

- name: Alert if disk space is critical
  ansible.builtin.debug:
    msg: "Warning: critical disk space on {{ inventory_hostname }}"
  when: disk_usage.stdout is search('9[0-9]%|100%')
The register keyword captures the output of a task into a reusable variable. Combined with when, it lets you build powerful conditional workflows. Note the use of changed_when: false to indicate that a read-only command changes nothing.
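register is not limited to command output: any module result can be captured. The sketch below uses ansible.builtin.stat on the reboot flag file that Debian/Ubuntu systems typically create; the messages are illustrative. It also shows one possible way to keep a read-only command usable with --check, on the assumption that you want it to run there: command tasks are skipped in check mode by default, which leaves the registered variable without stdout.

```yaml
- name: Check whether a reboot is pending
  ansible.builtin.stat:
    path: /var/run/reboot-required
  register: reboot_flag

- name: Report hosts that need a reboot
  ansible.builtin.debug:
    msg: "{{ inventory_hostname }} needs a reboot"
  when: reboot_flag.stat.exists

- name: Check available disk space (also in check mode)
  ansible.builtin.command: df -h /
  register: disk_usage
  changed_when: false
  check_mode: false   # read-only, so safe to run during --check
```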
Handlers and idempotency
Idempotency is Ansible's core principle: a module compares the current state with the desired state and only changes something when the two differ. This is what lets you rerun a playbook without fear.
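A short contrast may help here. The following sketch shows the same intent written in a non-idempotent and an idempotent way, plus one way to make a raw command safe to rerun. The hosts entry and the dhparam path are illustrative assumptions:

```yaml
# Not idempotent: appends a new line on every run
- name: Add a hosts entry (naive version)
  ansible.builtin.shell: echo "192.168.1.20 db01" >> /etc/hosts

# Idempotent: the line is added once, later runs report "ok"
- name: Add a hosts entry
  ansible.builtin.lineinfile:
    path: /etc/hosts
    line: "192.168.1.20 db01"

# A command becomes rerunnable with creates: it is skipped once the file exists
- name: Generate DH parameters once
  ansible.builtin.command:
    cmd: openssl dhparam -out /etc/ssl/dhparam.pem 2048
    creates: /etc/ssl/dhparam.pem
```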
Handlers: reacting to changes
A handler is a task triggered only when another task reports a change via notify. Typical case: restarting nginx after modifying its configuration, but only if the configuration actually changed:
tasks:
  - name: Deploy the virtual host
    ansible.builtin.template:
      src: vhost.conf.j2
      dest: /etc/nginx/sites-available/mysite.conf
    notify:
      - validate nginx
      - reload nginx

# Handlers run in the order they are defined here, not in the order of the notify list
handlers:
  - name: validate nginx
    ansible.builtin.command: nginx -t
    changed_when: false

  - name: reload nginx
    ansible.builtin.service:
      name: nginx
      state: reloaded
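By default, notified handlers run once, after all the tasks of the play have completed. When a later task depends on the service already being reloaded, one option is to flush the pending handlers explicitly. This sketch reuses the reload nginx handler from the example above; the URL is an assumption for the illustration:

```yaml
tasks:
  - name: Deploy the virtual host
    ansible.builtin.template:
      src: vhost.conf.j2
      dest: /etc/nginx/sites-available/mysite.conf
    notify: reload nginx

  # Run any pending handlers now instead of at the end of the play
  - name: Apply pending handlers
    ansible.builtin.meta: flush_handlers

  - name: Check that the site answers
    ansible.builtin.uri:
      url: http://localhost/     # assumed URL for the illustration
      status_code: 200
```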
Fine-grained control with changed_when and failed_when
The command and shell modules always report "changed", even when nothing moved, because Ansible cannot know what the command actually did. The changed_when and failed_when directives let you fine-tune the behavior:
- name: Check whether the SSL certificate expires soon
  ansible.builtin.command: >
    openssl x509 -checkend 2592000 -noout
    -in /etc/ssl/certs/{{ domain }}.pem
  register: cert_check
  changed_when: false
  failed_when: false

- name: Renew the certificate if needed
  ansible.builtin.command: certbot renew --cert-name {{ domain }}
  when: cert_check.rc != 0
Without changed_when: false, every playbook run would report a fictitious change on the verification task. This kind of parasitic noise makes execution reports unreadable and hides the real changes.
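failed_when: false silences every failure, including real ones. When the command distinguishes its exit codes, a narrower condition is possible. The sketch below uses grep as an example of such a command; the fstab check itself is a hypothetical use case:

```yaml
- name: Check whether swap is configured in fstab
  ansible.builtin.command: grep -q swap /etc/fstab
  register: swap_check
  changed_when: false
  # grep exits 0 on a match, 1 on no match, 2 or more on a real error
  failed_when: swap_check.rc not in [0, 1]

- name: Report hosts without swap
  ansible.builtin.debug:
    msg: "No swap entry on {{ inventory_hostname }}"
  when: swap_check.rc == 1
```

Here a missing or unreadable file still fails the task, while "no match" is treated as a normal answer.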
Roles and project organization
When a playbook grows past 200 lines, it's time to split it into roles. A role is a reusable unit that encapsulates tasks, handlers, templates, variables and files around a single responsibility.
Creating a role with ansible-galaxy
ansible-galaxy init roles/hardening
# Creates the following structure (README.md and tests/ omitted):
roles/hardening/
├── defaults/main.yml # Default variables (low priority)
├── files/ # Static files to copy
├── handlers/main.yml # Role handlers
├── meta/main.yml # Metadata and dependencies
├── tasks/main.yml # Main tasks
├── templates/ # Jinja2 templates
└── vars/main.yml # Role variables (high priority)
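As a rough idea of what goes into these files, here is a sketch of a minimal hardening role and a play that uses it. The file contents and the harden.yml playbook name are assumptions for the illustration, reusing the UFW task from the earlier playbook:

```yaml
# roles/hardening/defaults/main.yml
ssh_port: 22
---
# roles/hardening/tasks/main.yml
- name: Allow SSH through UFW
  community.general.ufw:
    rule: allow
    port: "{{ ssh_port }}"
    proto: tcp
---
# harden.yml (hypothetical playbook using the role)
- name: Harden all servers
  hosts: all
  become: true
  roles:
    - role: hardening
      vars:
        ssh_port: 2222   # overrides the default from defaults/main.yml
```

Putting ssh_port in defaults/ rather than vars/ is the design choice that keeps it easy to override from the inventory or the play.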
Structure of a complete project
Here's the recommended layout for a medium-sized Ansible project:
ansible-project/
├── ansible.cfg
├── inventory/
│ ├── production/
│ │ ├── hosts.yml
│ │ ├── group_vars/
│ │ └── host_vars/
│ └── staging/
│ ├── hosts.yml
│ └── group_vars/
├── playbooks/
│ ├── site.yml # Main playbook
│ ├── webservers.yml
│ └── databases.yml
├── roles/
│ ├── common/
│ ├── hardening/
│ ├── nginx/
│ └── postgresql/
└── requirements.yml # External Galaxy roles
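The contents of these files are not fixed by Ansible; the following is one possible sketch, assuming site.yml simply chains the per-group playbooks and requirements.yml lists the collections used by the examples in this guide:

```yaml
# playbooks/site.yml
- name: Configure the web servers
  ansible.builtin.import_playbook: webservers.yml

- name: Configure the database servers
  ansible.builtin.import_playbook: databases.yml
---
# playbooks/webservers.yml
- name: Web servers
  hosts: webservers
  roles:
    - common
    - hardening
    - nginx
---
# requirements.yml
collections:
  - name: community.general
  - name: ansible.posix
```

The collections can then be installed with ansible-galaxy collection install -r requirements.yml.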
The ansible.cfg file at the project root centralizes the local configuration:
[defaults]
inventory = inventory/production/hosts.yml
roles_path = roles
retry_files_enabled = false
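# Human-readable YAML output (ansible-core 2.13+); replaces the older stdout_callback = yaml
callback_result_format = yaml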
[privilege_escalation]
become = true
become_method = sudo
[ssh_connection]
pipelining = true
ssh_args = -o ControlMaster=auto -o ControlPersist=60s
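Enabling pipelining significantly reduces execution time: modules are piped to the remote Python interpreter over the existing SSH session instead of being copied to a temporary file first, which cuts the number of SSH operations per task. When you use become, it requires requiretty to be disabled in the sudoers configuration of the target machines.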
Common mistakes and debugging
Even with an idempotent tool, you make mistakes. Here are the classic pitfalls and the tools to diagnose them.
Verbosity levels
# Increasing verbosity
ansible-playbook site.yml -v # Shows task results
ansible-playbook site.yml -vv # Shows module parameters
ansible-playbook site.yml -vvv # Shows SSH connections
ansible-playbook site.yml -vvvv # Shows everything, including injected scripts
Check and diff mode
Always validate before applying. The --check --diff duo is your safety net:
# Simulation with the differences displayed
ansible-playbook site.yml --check --diff
# Limit to a group or a host
ansible-playbook site.yml --limit webservers --check --diff
# Run only the tasks carrying a given tag
ansible-playbook site.yml --tags "firewall" --check --diff
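The --tags filter only works if tasks carry tags. A minimal sketch, assuming the firewall tag name used in the command above:

```yaml
- name: Allow SSH through UFW
  community.general.ufw:
    rule: allow
    port: "22"
    proto: tcp
  tags:
    - firewall

- name: Configure UFW - default policy
  community.general.ufw:
    state: enabled
    policy: deny
    direction: incoming
  tags:
    - firewall
```

ansible-playbook site.yml --list-tags shows which tags a playbook defines before you run it.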
The most frequent errors
Error: "Undefined variable": You're using a variable that doesn't exist in the current context. Check the group_vars / host_vars / defaults hierarchy and use the | default('value') filter for optional variables.
Error: "Permission denied": You forgot become: true in the playbook or in the task. If sudo requires a password, add --ask-become-pass to the command.
Error: "Module not found": The module belongs to a collection that isn't installed. Install it with ansible-galaxy collection install community.general.
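The first two fixes can be sketched in a single task. The http_port variable and the fallback value of 80 are assumptions for the illustration:

```yaml
- name: Allow the HTTP port through UFW
  community.general.ufw:
    rule: allow
    # Falls back to 80 when http_port is not defined for the host
    port: "{{ http_port | default(80) }}"
    proto: tcp
  become: true   # privilege escalation for this task only
```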
One last debugging tip: the ansible.builtin.debug module is your best ally. Use it to inspect variables and facts at runtime:
- name: Inspect variables for debugging
ansible.builtin.debug:
msg: |
Hostname: {{ inventory_hostname }}
OS: {{ ansible_facts['distribution'] }} {{ ansible_facts['distribution_version'] }}
IP: {{ ansible_facts['default_ipv4']['address'] }}
RAM: {{ ansible_facts['memtotal_mb'] }} MB
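Besides msg, the debug module accepts a var parameter that dumps a whole structure, and a verbosity parameter that hides the task unless enough -v flags are given. A short sketch of both, to adapt to the variables you actually need:

```yaml
- name: Show a whole variable structure
  ansible.builtin.debug:
    var: ansible_facts['default_ipv4']

- name: Show all variables known for this host (only with -v or higher)
  ansible.builtin.debug:
    var: hostvars[inventory_hostname]
    verbosity: 1
```

Leaving such tasks in place with a verbosity threshold keeps normal runs quiet while making the information available on demand.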
Conclusion
Ansible turns artisanal system administration into reproducible engineering. Let's recap the key points:
- The inventory structures your infrastructure into logical groups with hierarchical variables.
- Playbooks describe the desired state of your machines in a declarative, idempotent way.
- The modules apt, template, service, lineinfile and copy cover most daily needs.
- Variables and facts let you adapt behavior to each machine without duplicating code.
- Handlers ensure services only restart when necessary.
- Roles break complexity down into reusable, testable modules.
- Check/diff mode is your safety net before any change in production.
Start small: a playbook that manages your SSH keys and your base configuration. Then expand gradually. Every manual task you convert into Ansible is a task you'll never have to do by hand again, nor explain to your colleague on a Friday night.
The official Ansible documentation remains the most complete and up-to-date reference for digging deeper into each module and concept covered in this article.