Ansible for sysadmins: automate without breaking everything

A practical Ansible guide for system administrators: inventory, playbooks, essential modules, roles and best practices to automate your Linux infrastructure with confidence.

Let's be honest: we all have a scripts/ folder full of bash files thrown together in a hurry, with nested sed calls, dangling if statements missing their fi, and comments like "don't touch, it works". The day you need to deploy the same config across 15 servers, you copy-paste over SSH like a digital medieval craftsman. And when it breaks, nobody knows what changed, when, or why.

Ansible solves this problem. No agent to install on the target machines (SSH access and a Python interpreter are enough), no central database, no esoteric language to learn. Just YAML, SSH, and an idempotent model: with well-written tasks, running a playbook ten times leaves the machines in the same state as running it once. It's the tool every sysadmin should master before claiming to do DevOps.

This guide is pragmatic. We start from scratch, ramp up, and finish with a clean project layout. No abstract theory: code, concrete examples, and the classic mistakes to avoid.

The inventory: the foundation of everything

The Ansible inventory describes your infrastructure: which machines exist, how to reach them, and how to group them logically. Without a well-structured inventory, you're just looping SSH commands with extra steps.

INI format: simple and readable

The INI format remains the most common for small infrastructures. Each bracketed section defines a group:

[webservers]
web01.example.com ansible_host=192.168.1.10
web02.example.com ansible_host=192.168.1.11

[databases]
db01.example.com ansible_host=192.168.1.20 ansible_port=2222

[production:children]
webservers
databases

[production:vars]
ansible_user=deploy
ansible_python_interpreter=/usr/bin/python3

The :children directive lets you create groups of groups. The :vars directive applies variables to all members of a group. This is the basis of hierarchical organization.

YAML format: for complex infrastructures

The YAML format offers more flexibility as the inventory grows:

all:
  children:
    webservers:
      hosts:
        web01.example.com:
          ansible_host: 192.168.1.10
          http_port: 8080
        web02.example.com:
          ansible_host: 192.168.1.11
          http_port: 80
    databases:
      hosts:
        db01.example.com:
          ansible_host: 192.168.1.20
      vars:
        db_engine: postgresql
        db_port: 5432

Whichever format you choose, test your inventory with ansible-inventory --list -i inventory.yml to verify that group and variable resolution is correct.
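
For comparison, here is a sketch of how the production group of groups from the INI example could be expressed in YAML. It reuses the hosts and variables already shown above; only the nesting is new, and the file name inventory.yml is an assumption.

```yaml
all:
  children:
    production:
      # Equivalent of [production:children]
      children:
        webservers:
          hosts:
            web01.example.com:
              ansible_host: 192.168.1.10
            web02.example.com:
              ansible_host: 192.168.1.11
        databases:
          hosts:
            db01.example.com:
              ansible_host: 192.168.1.20
              ansible_port: 2222
      # Equivalent of [production:vars]
      vars:
        ansible_user: deploy
        ansible_python_interpreter: /usr/bin/python3
```

Running ansible-inventory --graph -i inventory.yml is a quick way to confirm that webservers and databases appear under production.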

Your first playbook: securing a server

An Ansible playbook is a YAML file that describes the desired state of your machines. Here's a first concrete playbook that applies basic hardening to a freshly installed Debian/Ubuntu server:

---
- name: Basic server hardening
  hosts: all
  become: true
  vars:
    admin_user: deploy
    ssh_port: "{{ ansible_port | default(22) }}"  # the port Ansible itself connects on
    allowed_ssh_keys:
      - "ssh-ed25519 AAAAC3... admin@workstation"

  tasks:
    - name: Update the APT cache and upgrade packages
      ansible.builtin.apt:
        update_cache: true
        upgrade: safe
        cache_valid_time: 3600

    - name: Install base packages
      ansible.builtin.apt:
        name:
          - ufw
          - fail2ban
          - unattended-upgrades
          - curl
          - htop
        state: present

    - name: Create the administration user
      ansible.builtin.user:
        name: "{{ admin_user }}"
        groups: sudo
        append: true  # add to sudo without removing the user from other groups
        shell: /bin/bash
        create_home: true
        state: present

    - name: Deploy the authorized SSH keys
      ansible.posix.authorized_key:
        user: "{{ admin_user }}"
        key: "{{ item }}"
        state: present
      loop: "{{ allowed_ssh_keys }}"

    - name: Disable SSH password authentication
      ansible.builtin.lineinfile:
        path: /etc/ssh/sshd_config
        regexp: "^#?PasswordAuthentication"
        line: "PasswordAuthentication no"
        validate: "sshd -t -f %s"
      notify: restart sshd

    - name: Disable root login over SSH
      ansible.builtin.lineinfile:
        path: /etc/ssh/sshd_config
        regexp: "^#?PermitRootLogin"
        line: "PermitRootLogin no"
        validate: "sshd -t -f %s"
      notify: restart sshd

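    # Open SSH before enabling the firewall, otherwise a failure between
    # the two tasks could lock you out of the server.
    - name: Allow SSH through UFW
      community.general.ufw:
        rule: allow
        port: "{{ ssh_port }}"
        proto: tcp

    - name: Enable UFW with a default deny policy for incoming traffic
      community.general.ufw:
        state: enabled
        policy: deny
        direction: incoming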

  handlers:
    - name: restart sshd
      ansible.builtin.service:
        name: ssh  # the unit is named ssh on Debian/Ubuntu (sshd on RHEL-like systems)
        state: restarted

Run this playbook in dry-run mode first:

ansible-playbook -i inventory.ini harden.yml --diff --check

The --check option simulates execution without changing anything. The --diff option shows the changes that would be applied. Always test in dry-run mode before applying to production.
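
Check mode has blind spots worth knowing. Tasks that use command or shell are skipped during a dry run, and a task can fail in check mode because a package it relies on (ufw, for instance) has not really been installed yet. The sketch below shows two standard task keywords for dealing with this; the tasks themselves are illustrative and not part of the playbook above.

```yaml
# Read-only command: force it to run even when --check is passed
- name: Dump the effective sshd configuration
  ansible.builtin.command: /usr/sbin/sshd -T
  register: sshd_effective
  changed_when: false
  check_mode: false

# Skip this task in a dry run, where ufw may not be installed yet
- name: Allow SSH through UFW
  community.general.ufw:
    rule: allow
    port: "22"
    proto: tcp
  when: not ansible_check_mode
```

check_mode: false should be reserved for tasks that genuinely change nothing, since it bypasses the safety net for that task.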

The essential modules

The Ansible community package ships with thousands of modules, but a handful cover most of a sysadmin's daily needs. Here are the ones you'll use most:

apt: package management

The apt module handles installing, updating and removing Debian/Ubuntu packages. The cache_valid_time option avoids running an apt update on every execution:

- name: Install a specific version of nginx
  ansible.builtin.apt:
    name: nginx=1.24.*
    state: present
    update_cache: true
    cache_valid_time: 3600
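
Removal works the same way. As a minimal sketch, assuming you want to get rid of a package such as telnet (the package name is only an example):

```yaml
- name: Remove telnet and purge its configuration files
  ansible.builtin.apt:
    name: telnet
    state: absent
    purge: true

- name: Remove dependencies that are no longer needed
  ansible.builtin.apt:
    autoremove: true
```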

template: dynamic configuration files

The template module uses Jinja2 to generate configuration files from variables. It's infinitely cleaner than a cascade of sed calls:

- name: Deploy the nginx configuration
  ansible.builtin.template:
    src: templates/nginx.conf.j2
    dest: /etc/nginx/sites-available/{{ domain }}.conf
    owner: root
    group: root
    mode: "0644"
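    # No validate option here: "nginx -t -c %s" expects a complete nginx.conf
    # and would reject a lone site file. Validate in a handler instead.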
  notify: reload nginx

copy, service, lineinfile

The copy module transfers static files. The service module manages the state of services, whatever the init system (systemd, SysV init and others). The lineinfile module makes sure a specific line is present in an existing file, which is ideal for one-off adjustments without rewriting the whole file:

- name: Enable IPv4 forwarding
  ansible.builtin.lineinfile:
    path: /etc/sysctl.conf
    regexp: '^#?\s*net\.ipv4\.ip_forward'
    line: "net.ipv4.ip_forward = 1"
  notify: reload sysctl

- name: Ensure nginx is started and enabled
  ansible.builtin.service:
    name: nginx
    state: started
    enabled: true
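
The copy module has no example above, so here is a minimal sketch of its two common uses: pushing a file from the project, and writing short content inline. The file names files/issue.net and 60-hardening.conf are assumptions chosen for illustration.

```yaml
- name: Install a static login banner from the project's files
  ansible.builtin.copy:
    src: files/issue.net
    dest: /etc/issue.net
    owner: root
    group: root
    mode: "0644"

- name: Write a small file inline, keeping a backup of the previous version
  ansible.builtin.copy:
    content: |
      net.ipv4.conf.all.rp_filter = 1
    dest: /etc/sysctl.d/60-hardening.conf
    owner: root
    group: root
    mode: "0644"
    backup: true
```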

Variables and facts

Ansible's power lies in its variable management. Variables let you factor out configuration and adapt behavior to each machine or group.

group_vars and host_vars

Ansible automatically loads variables from files organized by convention:

inventory/
├── hosts.yml
├── group_vars/
│   ├── all.yml          # Global variables
│   ├── webservers.yml   # Variables for the webservers group
│   └── databases.yml    # Variables for the databases group
└── host_vars/
    └── db01.example.com.yml  # Host-specific variables

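The precedence order is strict: host variables override group variables, which override those in all. A child group also overrides its parent group, and variables set in a play or passed with -e rank higher still. Keep this mechanism in mind to avoid surprises.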
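
To make the precedence concrete, here is a sketch of the same variable defined at three levels. The files follow the layout above; the values are made up for the example.

```yaml
# inventory/group_vars/all.yml (lowest precedence of the three)
db_port: 5432
---
# inventory/group_vars/databases.yml (overrides all.yml for the databases group)
db_port: 5433
---
# inventory/host_vars/db01.example.com.yml (wins, but for this host only)
db_port: 6432
```

With these three files, ansible-inventory -i inventory/hosts.yml --host db01.example.com should show db_port resolved to the host-level value.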

Facts and register

Ansible automatically gathers information about each target machine (the facts). You can use them in your conditions and templates:

- name: Install the package based on the distribution
  ansible.builtin.apt:
    name: "{{ pkg_name }}"
    state: present
  when: ansible_facts['os_family'] == 'Debian'

- name: Check available disk space
  ansible.builtin.command: df -h /
  register: disk_usage
  changed_when: false

- name: Alert if disk space is critical
  ansible.builtin.debug:
    msg: "Warning: critical disk space on {{ inventory_hostname }}"
  when: disk_usage.stdout is search('9[0-9]%|100%')

The register keyword captures the output of a task into a reusable variable. Combined with when, it lets you build powerful conditional workflows. Note the use of changed_when: false to indicate that a read-only command changes nothing.
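
Parsing df output works, but the facts already contain mount information. The following sketch computes the usage of the root filesystem from ansible_facts['mounts'] instead. It assumes fact gathering is enabled and that / appears in the mount list; root_mount and used_pct are names introduced for this example.

```yaml
- name: Alert if the root filesystem is at least 90% full
  ansible.builtin.debug:
    msg: "Warning: / is {{ used_pct }}% full on {{ inventory_hostname }}"
  vars:
    # Entry of the mounts fact that describes /
    root_mount: "{{ ansible_facts['mounts'] | selectattr('mount', 'equalto', '/') | first }}"
    # Used space as a rounded percentage
    used_pct: "{{ (100 * (1 - root_mount.size_available / root_mount.size_total)) | round | int }}"
  when: used_pct | int >= 90
```

Working from facts avoids depending on the exact text layout of a command's output.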

Handlers and idempotency

Idempotency is Ansible's core principle: a module compares the current state with the desired state and only makes a change when they differ. This is what lets you rerun a playbook without fear.

Handlers: reacting to changes

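A handler is a task triggered only when another task reports a change via notify. Notified handlers run once, by default at the end of the play's tasks, in the order they are defined in the handlers section. Typical case: reloading nginx after modifying its configuration, but only if the configuration actually changed: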

tasks:
  - name: Deploy the virtual host
    ansible.builtin.template:
      src: vhost.conf.j2
      dest: /etc/nginx/sites-available/mysite.conf
    notify:
      - validate nginx
      - reload nginx

handlers:
  - name: validate nginx
    ansible.builtin.command: nginx -t
    changed_when: false

  - name: reload nginx
    ansible.builtin.service:
      name: nginx
      state: reloaded
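
Sometimes a later task needs the service to be reloaded already. A minimal sketch, assuming a site that answers on http://localhost/ (the URL is an assumption), uses the flush_handlers meta action to run pending handlers immediately:

```yaml
tasks:
  - name: Deploy the virtual host
    ansible.builtin.template:
      src: vhost.conf.j2
      dest: /etc/nginx/sites-available/mysite.conf
    notify: reload nginx

  - name: Run the notified handlers now rather than at the end of the play
    ansible.builtin.meta: flush_handlers

  - name: Check that nginx answers after the reload
    ansible.builtin.uri:
      url: http://localhost/
      status_code: 200

handlers:
  - name: reload nginx
    ansible.builtin.service:
      name: nginx
      state: reloaded
```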

Fine-grained control with changed_when and failed_when

The command and shell modules report "changed" on every run, even when nothing moved, because Ansible cannot tell what the command did. The changed_when and failed_when directives let you fine-tune the behavior:

- name: Check whether the SSL certificate expires soon
  ansible.builtin.command: >
    openssl x509 -checkend 2592000
    -in /etc/ssl/certs/{{ domain }}.pem
  register: cert_check
  changed_when: false
  failed_when: false

- name: Renew the certificate if needed
  ansible.builtin.command: certbot renew --cert-name {{ domain }}
  when: cert_check.rc != 0

Without changed_when: false, every playbook run would report a spurious change on the verification task. This kind of noise makes execution reports unreadable and hides the real changes.
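
For commands that produce a file, another option is the creates parameter of the command module, which skips the command when the file already exists. A sketch, with /etc/ssl/dhparam.pem as a hypothetical target path:

```yaml
- name: Generate Diffie-Hellman parameters only if the file does not exist yet
  ansible.builtin.command:
    cmd: openssl dhparam -out /etc/ssl/dhparam.pem 2048
    creates: /etc/ssl/dhparam.pem
```

On the first run the task reports a change; on later runs it is skipped, which keeps the report honest without any changed_when logic.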

Roles and project organization

When a playbook grows past 200 lines, it's time to split it into roles. A role is a reusable unit that encapsulates tasks, handlers, templates, variables and files around a single responsibility.

Creating a role with ansible-galaxy

ansible-galaxy init roles/hardening
# Creates the following structure:
roles/hardening/
├── defaults/main.yml     # Default variables (low priority)
├── files/                # Static files to copy
├── handlers/main.yml     # Role handlers
├── meta/main.yml         # Metadata and dependencies
├── tasks/main.yml        # Main tasks
├── templates/            # Jinja2 templates
└── vars/main.yml         # Role variables (high priority)
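
As a sketch of how the pieces fit together, the variables of the hardening playbook could move into the role's defaults, and a playbook could then apply the role and override one of them. The two YAML documents below stand for two separate files; their contents are assumptions, not the output of ansible-galaxy.

```yaml
# roles/hardening/defaults/main.yml: low-priority values, easy to override
admin_user: deploy
ssh_port: 22
---
# playbooks/site.yml: apply the role to every host
- name: Apply the baseline to all servers
  hosts: all
  become: true
  roles:
    - role: hardening
      vars:
        ssh_port: 2222  # overrides the default for this play
```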

Structure of a complete project

Here's the recommended layout for a medium-sized Ansible project:

ansible-project/
├── ansible.cfg
├── inventory/
│   ├── production/
│   │   ├── hosts.yml
│   │   ├── group_vars/
│   │   └── host_vars/
│   └── staging/
│       ├── hosts.yml
│       └── group_vars/
├── playbooks/
│   ├── site.yml            # Main playbook
│   ├── webservers.yml
│   └── databases.yml
├── roles/
│   ├── common/
│   ├── hardening/
│   ├── nginx/
│   └── postgresql/
└── requirements.yml        # External Galaxy roles
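
The requirements.yml file can list collections as well as roles. A minimal sketch, assuming the project only needs the two collections used by the examples in this article:

```yaml
# requirements.yml
collections:
  - name: community.general   # provides the ufw module
  - name: ansible.posix       # provides the authorized_key module
```

The collections can then be installed with ansible-galaxy collection install -r requirements.yml.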

The ansible.cfg file at the project root centralizes the local configuration:

[defaults]
inventory = inventory/production/hosts.yml
roles_path = roles
retry_files_enabled = false
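# YAML-formatted task results (requires ansible-core 2.13 or later)
stdout_callback = default
callback_result_format = yaml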

[privilege_escalation]
become = true
become_method = sudo

[ssh_connection]
pipelining = true
ssh_args = -o ControlMaster=auto -o ControlPersist=60s

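Enabling pipelining significantly reduces execution time: modules are piped to the remote Python interpreter over the existing SSH session instead of being copied to a temporary file first, which cuts the number of SSH operations per task. It requires requiretty to be disabled in sudoers on the target machines.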

Common mistakes and debugging

Even with an idempotent tool, you make mistakes. Here are the classic pitfalls and the tools to diagnose them.

Verbosity levels

# Increasing verbosity
ansible-playbook site.yml -v      # Shows task results
ansible-playbook site.yml -vv     # Shows module parameters
ansible-playbook site.yml -vvv    # Shows SSH connections
ansible-playbook site.yml -vvvv   # Shows everything, including injected scripts

Check and diff mode

Always validate before applying. The --check --diff duo is your safety net:

# Simulation with the differences displayed
ansible-playbook site.yml --check --diff

# Limit to a group or a host
ansible-playbook site.yml --limit webservers --check --diff

# Limit to the tasks carrying a given tag
ansible-playbook site.yml --tags "firewall" --check --diff
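
The --tags option only selects tasks that actually carry the tag. As a sketch, the firewall tasks of the hardening playbook could be tagged like this (the tag name firewall matches the command above; the ssh_port variable comes from that playbook):

```yaml
- name: Allow SSH through UFW
  community.general.ufw:
    rule: allow
    port: "{{ ssh_port }}"
    proto: tcp
  tags:
    - firewall
```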

The most frequent errors

Error: "Undefined variable" -- You're using a variable that doesn't exist in the current context. Check the group_vars / host_vars / defaults hierarchy and use the | default('value') filter for optional variables.
Error: "Permission denied" -- You forgot become: true in the playbook or in the task. If sudo requires a password, add --ask-become-pass to the command.
Error: "Module not found" -- Usually reported as "couldn't resolve module/action": the module belongs to a collection that isn't installed. Install it with ansible-galaxy collection install community.general (or ansible.posix for authorized_key).
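
For the undefined-variable case, two patterns help. The sketch below fails early with a clear message when a mandatory variable is missing, and falls back to a default for an optional one; domain and http_port are the variables used earlier in this article, and the message text is an assumption.

```yaml
- name: Stop early when a mandatory variable is missing
  ansible.builtin.assert:
    that:
      - domain is defined
      - domain | length > 0
    fail_msg: "Set 'domain' in group_vars or host_vars before running this play"

- name: Allow the web port, falling back to 80 when http_port is not set
  community.general.ufw:
    rule: allow
    port: "{{ http_port | default(80) }}"
    proto: tcp
```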

One last debugging tip: the ansible.builtin.debug module is your best ally. Use it to inspect variables and facts at runtime:

- name: Inspect variables for debugging
  ansible.builtin.debug:
    msg: |
      Hostname: {{ inventory_hostname }}
      OS: {{ ansible_facts['distribution'] }} {{ ansible_facts['distribution_version'] }}
      IP: {{ ansible_facts['default_ipv4']['address'] }}
      RAM: {{ ansible_facts['memtotal_mb'] }} MB

Conclusion

Ansible turns artisanal system administration into reproducible engineering. Let's recap the key points:

  • The inventory structures your infrastructure into logical groups with hierarchical variables.
  • Playbooks describe the desired state of your machines in a declarative, idempotent way.
  • The modules apt, template, service, lineinfile and copy cover most daily needs.
  • Variables and facts let you adapt behavior to each machine without duplicating code.
  • Handlers ensure services only restart when necessary.
  • Roles break complexity down into reusable, testable modules.
  • Check/diff mode is your safety net before any change in production.

Start small: a playbook that manages your SSH keys and your base configuration. Then expand gradually. Every manual task you convert into Ansible is a task you'll never have to do by hand again, nor explain to your colleague on a Friday night.

The official Ansible documentation remains the most complete and up-to-date reference for digging deeper into each module and concept covered in this article.

Morgann Riu

Cybersecurity and Linux administration expert. I help companies secure and optimize their critical infrastructures.
