URL: https://tech-insider.org/ansible-tutorial-it-automation-13-steps-2026/

Ansible Tutorial: Build IT Automation in 13 Steps [2026]


May 3, 2026

Ansible has quietly become the default control plane for IT operations teams who refuse to write yet another bash script. With roughly 63,000 GitHub stars, an active release train that shipped ansible-core 2.20.5 on April 21, 2026, and a 2.21 beta already in the hands of testers, the project keeps proving that agentless automation still delivers the most leverage per line of code. This Ansible tutorial walks you from a clean Ubuntu 24.04 box to a working multi-node deployment in 13 steps, using only the tools that ship with the latest stable release.

Unlike the tutorials you may have read in 2022, the 2026 toolchain has tightened up considerably. Execution environments are now first-class, ansible-navigator replaces ad-hoc ansible-playbook invocations in production, and the Galaxy collection ecosystem has consolidated into namespaced bundles for AWS, Azure, GCP, and Kubernetes. We will cover all of it, plus the pitfalls and the eight troubleshooting items that catch newcomers before they reach their first idempotent run.

Why Ansible Still Wins the Configuration Management Fight in 2026

The configuration management space looked very different five years ago. Puppet, Chef, and SaltStack split the enterprise market while Ansible chipped away at the long tail with a YAML-first, agentless model that ran over plain SSH. By April 2026 that long tail has become the head of the curve. Red Hat's acquisition cemented the project's commercial footing, the Ansible Automation Platform 2.x release line gave operators a supportable execution-environment runtime, and the community split between ansible-core (the engine) and the ansible meta-package (the engine plus a curated bundle of collections) finally made dependency management sane.

What keeps Ansible relevant against Terraform, Pulumi, and Kubernetes operators is the agentless story. There is nothing to install on a target node beyond Python, and even that requirement has been relaxed in many modules through raw command execution. For a sysadmin who needs to patch 400 RHEL boxes on Saturday morning, the equation is simple: an inventory file, an SSH key, and a playbook. No Helm chart, no provider plugin, no state file to corrupt. That simplicity is exactly why this Ansible tutorial uses the same toolchain that Fortune 500 operations teams ship to production.

The release cadence also matters. According to endoflife.date, ansible-core 2.20 became the latest stable on November 3, 2025 and is supported through May 31, 2027. Version 2.19 stays in maintained status until November 30, 2026. Version 2.18 hits end-of-life on May 31, 2026, so anyone still on it should plan an upgrade before summer. The 2.21 beta line has already produced 2.21.0b3 (April 13, 2026) and the Community Package 14.0.0 will pin to it once it ships.

Prerequisites and Versions for This Ansible Tutorial

Before we touch a single playbook, lock down a known-good baseline. The control node is the workstation that runs ansible-playbook; managed nodes are the servers it talks to. The control node has the strict requirements; managed nodes only need a Python interpreter and an SSH login.

| Component | Version Used in This Tutorial | Notes |
|---|---|---|
| Control node OS | Ubuntu 24.04 LTS or Fedora 41 | WSL2 works; native Windows control node not supported |
| ansible-core | 2.20.5 (April 21, 2026) | Install from PyPI or distro packages |
| Python on control node | 3.11, 3.12, or 3.13 | 3.10 dropped in core 2.19 |
| Python on managed nodes | 3.8 minimum, 3.11+ recommended | raw module bypasses this if needed |
| OpenSSH client | 8.0 or newer | ControlPersist required for performance |
| ansible-lint | latest version from PyPI | Catches 200+ rule violations |
| Molecule (optional) | latest version from PyPI | Role testing harness |

For the worked example you will need three Linux machines: one control node and two managed nodes. Cheap options include three Ubuntu droplets on DigitalOcean, three EC2 t3.micro instances, or three LXD containers on a single laptop. Allocate at least 1 GB of RAM per node so you do not fight the OOM killer when collections expand.

Step 1: Install ansible-core 2.20 on the Control Node

Distro packages lag the upstream release by months, sometimes more than a year. The clean approach in 2026 is a Python virtual environment with pip. That isolates ansible-core, its collections, and any Python SDKs (boto3, kubernetes, azure-mgmt) from your system Python.

# Create a clean working directory and venv
mkdir -p ~/ansible-tutorial && cd ~/ansible-tutorial
python3 -m venv .venv
source .venv/bin/activate

# Upgrade pip and install the latest ansible-core stable
pip install --upgrade pip
pip install "ansible-core==2.20.5"

# Verify the install
ansible --version

The expected output should look like this:

ansible [core 2.20.5]
 config file = None
 configured module search path = ['/home/you/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
 ansible python module location = /home/you/ansible-tutorial/.venv/lib/python3.12/site-packages/ansible
 executable location = /home/you/ansible-tutorial/.venv/bin/ansible
 python version = 3.12.3 (main, Jan 17 2026, 14:28:11) [GCC 13.2.0]
 jinja version = 3.1.4
 libyaml = True

If you prefer the curated meta-package with hundreds of collections pre-bundled, install ansible instead of ansible-core. The meta-package weighs about 600 MB unpacked and pulls in collections for AWS, Azure, VMware, and dozens more out of the box. For learning, ansible-core plus targeted collections keeps the surface area smaller.

Step 2: Configure SSH Access to Managed Nodes

Ansible pushes work over SSH. If you cannot SSH into a node by hand, ansible cannot either. Generate a dedicated key pair (do not reuse your personal key) and copy the public half to each managed node.

# Generate an Ed25519 key pair without passphrase for automation
ssh-keygen -t ed25519 -f ~/.ssh/ansible_id -N "" -C "ansible-control@$(hostname)"

# Copy to each managed node (replace IPs with yours)
for host in 10.0.0.21 10.0.0.22; do
 ssh-copy-id -i ~/.ssh/ansible_id.pub ubuntu@$host
done

# Sanity check: should print uname output without prompting for a password
for host in 10.0.0.21 10.0.0.22; do
 ssh -i ~/.ssh/ansible_id -o BatchMode=yes ubuntu@$host uname -a
done

Production deployments should use a CA-signed SSH key with short-lived certificates rather than long-lived static keys, but for this Ansible tutorial a static Ed25519 key is fine. Make sure the private key permissions are 600; the ssh client refuses to use a private key with looser permissions, and the connection then fails or falls back to password auth without a clear error from Ansible.

Step 3: Build a Static Inventory File

The inventory is Ansible's source of truth for which hosts exist. INI and YAML formats are both supported; YAML scales better once you add group variables. Create inventory.yml in your project directory:

# inventory.yml
all:
  vars:
    ansible_user: ubuntu
    ansible_ssh_private_key_file: ~/.ssh/ansible_id
    ansible_python_interpreter: /usr/bin/python3
  children:
    webservers:
      hosts:
        web01:
          ansible_host: 10.0.0.21
        web02:
          ansible_host: 10.0.0.22
    databases:
      hosts:
        db01:
          ansible_host: 10.0.0.31

Verify the inventory parses cleanly and that you can reach every host with the built-in ping module (it does not use ICMP; it runs a no-op Python check over SSH).

ansible-inventory -i inventory.yml --graph
ansible -i inventory.yml all -m ping

A successful ping returns SUCCESS => { "changed": false, "ping": "pong" } for each host. If you see UNREACHABLE, check the SSH key path, the username, and that the firewall allows port 22 from the control node.
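If you want to inspect the parsed inventory from a script rather than by eye, the JSON printed by ansible-inventory with --list can be walked directly. The sketch below assumes the usual --list layout (a _meta.hostvars map plus one entry per group); the function names and the hand-written sample data are illustrative and not part of Ansible itself.

```python
def host_addresses(inventory):
    """Map each hostname to its ansible_host value (falling back to the name)."""
    hostvars = inventory.get("_meta", {}).get("hostvars", {})
    return {name: hv.get("ansible_host", name) for name, hv in hostvars.items()}


def group_members(inventory, group):
    """Return the sorted hosts of a group, following child groups recursively."""
    entry = inventory.get(group, {})
    hosts = set(entry.get("hosts", []))
    for child in entry.get("children", []):
        hosts.update(group_members(inventory, child))
    return sorted(hosts)


# Hand-written sample in the shape of `ansible-inventory -i inventory.yml --list`.
# In practice you would load the real command output with json.loads().
sample = {
    "_meta": {"hostvars": {
        "web01": {"ansible_host": "10.0.0.21"},
        "web02": {"ansible_host": "10.0.0.22"},
        "db01": {"ansible_host": "10.0.0.31"},
    }},
    "all": {"children": ["ungrouped", "webservers", "databases"]},
    "webservers": {"hosts": ["web01", "web02"]},
    "databases": {"hosts": ["db01"]},
}

print(host_addresses(sample)["web01"])
print(group_members(sample, "all"))
```

A helper like this is handy in CI, for example to fail a job early when a group that should contain hosts comes back empty.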

Step 4: Write Your First Playbook

A playbook is a YAML file describing the desired state. Each play targets a group, runs as a specific user, and lists tasks. Tasks call modules. Most modules are idempotent: run them twice and the second invocation reports changed: false.

# site.yml
---
- name: Baseline configuration for all hosts
  hosts: all
  become: true
  tasks:
    - name: Ensure base packages are installed
      ansible.builtin.package:
        name:
          - curl
          - vim
          - htop
          - ca-certificates
        state: present
        update_cache: true

    - name: Set the system timezone to UTC
      community.general.timezone:
        name: Etc/UTC

    - name: Create a deploy user
      ansible.builtin.user:
        name: deploy
        shell: /bin/bash
        groups: sudo
        append: true
        create_home: true

Run it with:

ansible-playbook -i inventory.yml site.yml

The community.general.timezone module lives in the community.general collection, which is bundled with the ansible meta-package but ships separately for ansible-core users. Install it with ansible-galaxy collection install community.general. The first run will report several changed tasks; the second run, against the same nodes, should report all ok with zero changes. That is idempotency, the single most important property of a configuration management system.
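To make the idea concrete outside of Ansible, here is a minimal sketch of an idempotent operation in plain Python: it checks the current state first, only writes when something is missing, and reports whether it changed anything. The function name ensure_line and the file used are hypothetical; this illustrates the pattern, not how any particular Ansible module is implemented.

```python
import tempfile
from pathlib import Path


def ensure_line(path, line):
    """Make sure `line` is in the file; return True only if the file changed."""
    existing = path.read_text().splitlines() if path.exists() else []
    if line in existing:
        return False  # desired state already holds: nothing to do
    existing.append(line)
    path.write_text("\n".join(existing) + "\n")
    return True


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "hosts.allow"
    first = ensure_line(target, "sshd: 10.0.0.0/24")   # file is created
    second = ensure_line(target, "sshd: 10.0.0.0/24")  # already present
    print(first, second)
```

The first call reports a change and the second does not, which mirrors the changed and ok counts you see between the first and second playbook runs.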

Step 5: Use Variables, Facts, and Templates

Variables turn a playbook from a shell script into a configuration system. Ansible resolves variables from many sources (the official precedence list has 22 levels); the six you will meet most often are the command line, inventory group_vars, inventory host_vars, role defaults, role vars, and play vars. Precedence is documented but easy to forget; stick to group_vars/ and host_vars/ for almost everything.

# group_vars/webservers.yml
nginx_worker_connections: 1024
nginx_server_name: www.example.com
app_release: "2026.04.03"

# host_vars/web01.yml
nginx_server_name: www.example.com
deployment_role: primary
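Precedence is easier to remember as a merge where later layers overwrite earlier ones. The sketch below is a simplified model that covers only the six locations named above, ordered from lowest to highest as in the Ansible documentation (role defaults, group_vars, host_vars, play vars, role vars, command-line extra vars). The layer names and the resolve function are made up for illustration; the real engine has more levels.

```python
# Lowest precedence first; each later layer overrides the ones before it.
PRECEDENCE_LOW_TO_HIGH = [
    "role_defaults",
    "group_vars",
    "host_vars",
    "play_vars",
    "role_vars",
    "extra_vars",  # -e on the command line always wins
]


def resolve(sources):
    """Merge variable layers so that higher-precedence layers win."""
    merged = {}
    for layer in PRECEDENCE_LOW_TO_HIGH:
        merged.update(sources.get(layer, {}))
    return merged


effective = resolve({
    "role_defaults": {"nginx_worker_connections": 512, "nginx_server_name": "_"},
    "group_vars": {"nginx_worker_connections": 1024},
    "host_vars": {"nginx_server_name": "www.example.com"},
    "extra_vars": {"app_release": "2026.04.03"},
})
print(effective)
```

Here the group_vars value replaces the role default for nginx_worker_connections, and the host_vars value replaces the default server name, which is the behavior the group_vars/ and host_vars/ convention relies on.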

Facts are variables Ansible discovers about each managed node automatically. They cover everything from CPU count to network interfaces. Inspect them with ansible -i inventory.yml web01 -m setup; the output is a multi-thousand-line JSON document. Useful facts for templates include ansible_facts.distribution, ansible_facts.distribution_version, ansible_facts.processor_vcpus, and ansible_facts.default_ipv4.address.

Templates use Jinja2 syntax. Save the following as templates/nginx.conf.j2:

# templates/nginx.conf.j2
worker_processes {{ ansible_facts.processor_vcpus }};
events {
    worker_connections {{ nginx_worker_connections }};
}
http {
    server {
        listen 80;
        server_name {{ nginx_server_name }};
        location / {
            return 200 "Hello from {{ inventory_hostname }} ({{ ansible_facts.default_ipv4.address }})\n";
        }
    }
}

Render it on each web host with the ansible.builtin.template module:

- name: Render nginx config
  ansible.builtin.template:
    src: nginx.conf.j2
    dest: /etc/nginx/nginx.conf
    owner: root
    group: root
    mode: "0644"
    validate: nginx -t -c %s
  notify: Restart nginx

The validate argument runs a syntax check against the rendered file before Ansible installs it. If the validation command exits non-zero, the file is never written. That single line has saved more outages than any other feature in the entire toolchain.
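The validate-before-install pattern is worth understanding on its own. The following is a rough sketch of the idea in plain Python, not Ansible's actual implementation: write the candidate content to a temporary file, run the validation command with %s replaced by that path, and only swap the file into place when the command exits zero. The function name install_validated is hypothetical.

```python
import os
import shlex
import subprocess
import tempfile


def install_validated(content, dest, validate):
    """Install `content` at `dest` only if the validate command succeeds.

    `validate` is a command line containing %s, which is replaced by the
    path of a temporary file holding the candidate content.
    """
    dest_dir = os.path.dirname(os.path.abspath(dest))
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(content)
        command = shlex.split(validate.replace("%s", shlex.quote(tmp_path)))
        if subprocess.run(command).returncode != 0:
            return False  # validation failed: dest is left untouched
        os.replace(tmp_path, dest)  # atomic swap on the same filesystem
        return True
    finally:
        # Clean up the candidate file if it was not moved into place.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
```

Called as install_validated(rendered, "/etc/nginx/nginx.conf", "nginx -t -c %s"), a broken render never reaches the live path, which is the same guarantee the validate argument gives you.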

Step 6: Refactor Into Reusable Roles

Roles are the unit of reuse in Ansible. A role is a directory with a strict layout: tasks/, handlers/, defaults/, vars/, templates/, files/, and meta/. Generate the skeleton with the official tool:

mkdir -p roles
ansible-galaxy role init roles/nginx
ansible-galaxy role init roles/postgres

Move the nginx tasks from site.yml into roles/nginx/tasks/main.yml, the template into roles/nginx/templates/, and the variables into roles/nginx/defaults/main.yml. The top-level playbook becomes a thin wrapper:

# site.yml
---
- name: Configure web tier
  hosts: webservers
  become: true
  roles:
    - role: nginx
      tags: [web, nginx]

- name: Configure database tier
  hosts: databases
  become: true
  roles:
    - role: postgres
      tags: [db, postgres]

Tags let you target a subset on demand: ansible-playbook -i inventory.yml site.yml --tags nginx only runs the nginx role across the webserver group. Combined with --limit web01 you get surgical execution: one role, one host, with the rest of the fleet untouched.

Step 7: Manage Secrets With Ansible Vault

Hard-coded secrets in YAML are how breaches start. Ansible Vault encrypts variables (and entire files) with AES-256, so you can commit the encrypted blob to git and only decrypt at runtime. Create an encrypted variables file:

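# The directory must exist before ansible-vault can create a file in it
mkdir -p group_vars/databases
ansible-vault create group_vars/databases/vault.yml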
# Editor opens; add:
# vault_postgres_password: "S3cret!Passw0rd"
# vault_replication_password: "AnotherS3cret"

# Reference the vaulted variable from a non-vault file
cat > group_vars/databases/vars.yml <<EOF
postgres_password: "{{ vault_postgres_password }}"
replication_password: "{{ vault_replication_password }}"
EOF

# Run the playbook with vault password prompt
ansible-playbook -i inventory.yml site.yml --ask-vault-pass

For CI environments, store the vault password in a file outside the repo and pass --vault-password-file ~/.ansible_vault_pass. Even better, integrate with Ansible Automation Platform's credential store or a HashiCorp Vault lookup so the secret never sits on disk in any form.
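A cheap safety net is a pre-commit or CI check that refuses plaintext vault files. Encrypted files begin with a header of the form $ANSIBLE_VAULT;1.1;AES256, so a script can test for that prefix. The sketch below assumes your secrets live in files named vault.yml, as in this tutorial; the function names are illustrative.

```python
from pathlib import Path

VAULT_HEADER = "$ANSIBLE_VAULT;"


def is_vault_encrypted(path):
    """True if the file starts with the Ansible Vault envelope header."""
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        return handle.read(len(VAULT_HEADER)) == VAULT_HEADER


def plaintext_vault_files(root):
    """List files named vault.yml under `root` that are NOT encrypted."""
    return sorted(
        path for path in Path(root).rglob("vault.yml")
        if not is_vault_encrypted(path)
    )
```

In a hook you would call plaintext_vault_files(".") and exit non-zero if the list is not empty, so an accidentally decrypted file never reaches the remote.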

Step 8: Add Galaxy Collections for AWS, Azure, and Kubernetes

Collections are the modern packaging format. Where the old monolithic Ansible 2.9 had thousands of modules in one tree, the modern world has them split by namespace. The big three cloud collections plus Kubernetes are essential for any non-trivial automation:

# requirements.yml
---
collections:
  - name: amazon.aws
    version: ">=8.0.0"
  - name: community.aws
  - name: azure.azcollection
  - name: google.cloud
  - name: kubernetes.core
  - name: community.general
  - name: community.crypto

roles: []

# Install everything declared in requirements.yml
ansible-galaxy collection install -r requirements.yml --upgrade

# Sample task: launch an EC2 instance
- name: Launch web server in AWS
  amazon.aws.ec2_instance:
    name: web03
    instance_type: t3.micro
    image_id: ami-0c02fb55956c7d316
    region: us-east-1
    key_name: ansible_id
    network:
      assign_public_ip: true
    security_group: web-sg
    tags:
      ManagedBy: ansible
      Environment: production

Collections are versioned independently, so you can hold amazon.aws at 8.0.0 or newer (or pin it exactly with "==8.0.0") while letting community.general float. Browse the full catalog at Ansible Galaxy; every collection lists its supported ansible-core versions in the README.

Step 9: Use Dynamic Inventory for Cloud Infrastructure

A static inventory file does not scale to autoscaling groups. Dynamic inventory plugins query the cloud API at runtime and produce hosts on the fly. The AWS plugin lives in amazon.aws:

# inventory_aws.aws_ec2.yml (filename suffix matters)
plugin: amazon.aws.aws_ec2
regions:
  - us-east-1
  - us-west-2
filters:
  tag:Environment: production
  instance-state-name: running
keyed_groups:
  - prefix: tag
    key: tags
  - prefix: az
    key: placement.availability_zone
hostnames:
  - tag:Name
  - private-ip-address
compose:
  ansible_host: private_ip_address

export AWS_PROFILE=prod
ansible-inventory -i inventory_aws.aws_ec2.yml --graph
ansible -i inventory_aws.aws_ec2.yml tag_role_web -m ping

Dynamic inventory respects the same group_vars/ and host_vars/ directories, so you can layer static configuration over discovered hosts. For Kubernetes, the equivalent plugin is kubernetes.core.k8s; for Azure, azure.azcollection.azure_rm; for GCP, google.cloud.gcp_compute. All four are covered by the same dynamic inventory specification, which keeps the mental model consistent across providers.
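Group names such as tag_role_web come from the keyed_groups entries: the prefix is joined to the key and value with an underscore, and characters that are not valid in a group name are replaced with underscores. The sketch below approximates that naming rule so you can predict group names from instance metadata; it is a simplified model written for this tutorial, not the plugin's code, and the function names are made up.

```python
import re


def sanitize(name):
    """Replace anything other than letters, digits, and underscores."""
    return re.sub(r"[^A-Za-z0-9_]", "_", name)


def keyed_group_names(prefix, value, separator="_"):
    """Approximate the group names a keyed_groups entry would generate.

    A dict value (such as EC2 tags) yields one group per key/value pair;
    a plain string value yields a single group.
    """
    if isinstance(value, dict):
        raw = [f"{prefix}{separator}{k}{separator}{v}" for k, v in value.items()]
    else:
        raw = [f"{prefix}{separator}{value}"]
    return [sanitize(name) for name in raw]


print(keyed_group_names("tag", {"role": "web", "Environment": "production"}))
print(keyed_group_names("az", "us-east-1a"))
```

An instance tagged role=web in us-east-1a would therefore be expected in groups named tag_role_web and az_us_east_1a; confirm the real names with ansible-inventory --graph before targeting them.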

Step 10: Test Roles With Molecule and Containers

Molecule is the standard test harness for Ansible roles. It spins up ephemeral containers (or VMs), applies the role, runs verification, and tears everything down. The 2026 stack uses molecule-plugins[docker] with Podman as the default driver on RHEL-family hosts.

pip install molecule "molecule-plugins[docker]" ansible-lint
cd roles/nginx
molecule init scenario default

# molecule/default/molecule.yml
---
dependency:
  name: galaxy
driver:
  name: docker
platforms:
  - name: ubuntu2404
    image: geerlingguy/docker-ubuntu2404-ansible:latest
    pre_build_image: true
  - name: rocky9
    image: geerlingguy/docker-rockylinux9-ansible:latest
    pre_build_image: true
provisioner:
  name: ansible
verifier:
  name: ansible

# Full test cycle
molecule test

# Iterate quickly during development
molecule create
molecule converge
molecule verify
molecule destroy

The verify.yml file is where you assert post-conditions. Useful patterns include checking that a service is listening on a port, a config file contains an expected line, or a URL returns 200. Treat Molecule scenarios like unit tests: one happy path, one failure path, and any regression that has bitten you in production.

Step 11: Lint and Format With ansible-lint and yamllint

Style consistency matters more than you think. ansible-lint catches over 200 rule violations from naming conventions to deprecated syntax. yamllint catches the YAML problems that ansible-lint ignores. Run them in CI on every push.

pip install ansible-lint yamllint

# Lint the whole project
ansible-lint
yamllint .

# Auto-fix what is fixable
ansible-lint --fix

Configure both tools with dotfiles in the repo root:

# .ansible-lint
profile: production
exclude_paths:
  - .venv/
  - .cache/
skip_list:
  - yaml[line-length]  # handled by yamllint with custom max

# .yamllint
extends: default
rules:
  line-length:
    max: 160
  truthy:
    allowed-values: ["true", "false"]
  comments:
    min-spaces-from-content: 1

The production profile is strict; it forces fully qualified collection names (ansible.builtin.copy rather than copy), bans the bare command module when a dedicated module exists, and rejects unsafe variable interpolation. New code should pass on profile production from day one; legacy code can start at min and ratchet up.

Step 12: Run Playbooks From CI With GitHub Actions

Manual ansible-playbook runs from a developer laptop are how change history gets lost. Pipe everything through CI so every change has a reviewer, a build log, and a rollback path. The minimum useful pipeline runs lint on every PR and runs the full playbook (against a staging environment) on every merge to main.

# .github/workflows/ansible.yml
name: ansible
on:
  pull_request:
  push:
    branches: [main]

jobs:
  lint:
    runs-on: ubuntu-24.04
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install "ansible-core==2.20.5" ansible-lint yamllint
      - run: ansible-galaxy collection install -r requirements.yml
      - run: ansible-lint
      - run: yamllint .

  deploy-staging:
    needs: lint
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-24.04
    environment: staging
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install "ansible-core==2.20.5"
      - run: ansible-galaxy collection install -r requirements.yml
      - name: Run playbook
        env:
          ANSIBLE_VAULT_PASSWORD: ${{ secrets.ANSIBLE_VAULT_PASSWORD }}
          ANSIBLE_HOST_KEY_CHECKING: "False"
        run: |
          echo "$ANSIBLE_VAULT_PASSWORD" > .vault_pass
          ansible-playbook -i inventory_aws.aws_ec2.yml site.yml \
            --vault-password-file .vault_pass

For deeper coverage on the CI side, see our companion guide on GitHub Actions for production CI/CD; the same Ansible job pattern slots into GitLab CI, Jenkins, or Azure Pipelines with minimal changes.

Step 13: Deploy a Real Project: 3-Node Web Stack With PostgreSQL

Time to wire it all together. The complete project provisions two NGINX web servers and one PostgreSQL primary, registers the deploy user across all three, sets up firewall rules, and confirms reachability. Project layout:

ansible-tutorial/
├── ansible.cfg
├── inventory.yml
├── requirements.yml
├── site.yml
├── group_vars/
│   ├── all.yml
│   ├── webservers.yml
│   └── databases/
│       ├── vars.yml
│       └── vault.yml   # ansible-vault encrypted
├── host_vars/
│   ├── web01.yml
│   ├── web02.yml
│   └── db01.yml
└── roles/
    ├── common/
    ├── nginx/
    └── postgres/

# ansible.cfg
[defaults]
inventory = inventory.yml
host_key_checking = False
retry_files_enabled = False
# YAML-formatted results from the built-in default callback
# (replaces the older "stdout_callback = yaml" from community.general)
callback_result_format = yaml
forks = 20
collections_path = ./.ansible/collections
roles_path = ./roles

[ssh_connection]
pipelining = True
ssh_args = -o ControlMaster=auto -o ControlPersist=300s -o PreferredAuthentications=publickey

Run the entire stack:

ansible-galaxy collection install -r requirements.yml
ansible-playbook site.yml --ask-vault-pass

Expected output (truncated):

PLAY RECAP **************************************************
db01 : ok=14 changed=8 unreachable=0 failed=0 skipped=2 rescued=0 ignored=0
web01 : ok=22 changed=14 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0
web02 : ok=22 changed=14 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0

A second run should report changed=0 across the board. That is your proof that the playbook is fully idempotent and safe to schedule on a cron or trigger from CI without surprise side effects.
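If you want CI to enforce that second-run guarantee, the PLAY RECAP block is easy to parse. The sketch below assumes the default recap layout shown above (host, colon, then key=value counters); the function names are hypothetical, and color codes or custom callbacks would need extra handling.

```python
import re

# One recap line looks like: "web01 : ok=22 changed=0 unreachable=0 failed=0 ..."
RECAP_LINE = re.compile(r"^(?P<host>\S+)\s*:\s*(?P<counters>(?:\w+=\d+\s*)+)$")


def parse_recap(output):
    """Extract per-host counters from ansible-playbook output."""
    recap = {}
    for line in output.splitlines():
        match = RECAP_LINE.match(line.strip())
        if match:
            pairs = (item.split("=") for item in match.group("counters").split())
            recap[match.group("host")] = {key: int(value) for key, value in pairs}
    return recap


def is_clean_rerun(output):
    """True if every host reports changed=0, failed=0, and unreachable=0."""
    recap = parse_recap(output)
    return bool(recap) and all(
        counters.get("changed", 0) == 0
        and counters.get("failed", 0) == 0
        and counters.get("unreachable", 0) == 0
        for counters in recap.values()
    )
```

Run the playbook twice in the pipeline, feed the second run's output to is_clean_rerun, and fail the job when it returns False.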

Bonus: Build a Production-Ready nginx Role From Scratch

The earlier steps gave you the layout. Here is the actual content of a production-quality nginx role you can drop into roles/nginx/. Reading this carefully is the difference between a tutorial that works on day one and a role that survives in production for years. Every block is annotated with the reasoning behind it.

# roles/nginx/defaults/main.yml
---
nginx_packages:
  - nginx
  - nginx-extras
nginx_user: www-data
nginx_worker_processes: auto
nginx_worker_connections: 1024
nginx_keepalive_timeout: 65
nginx_client_max_body_size: "10m"
nginx_server_name: "_"
nginx_listen_port: 80
nginx_tls_enabled: false
nginx_tls_cert_path: "/etc/ssl/certs/site.crt"
nginx_tls_key_path: "/etc/ssl/private/site.key"
nginx_log_format_json: true
nginx_extra_repos: []

# roles/nginx/tasks/main.yml
---
- name: Add extra apt repositories
  ansible.builtin.apt_repository:
    repo: "{{ item }}"
    state: present
    update_cache: true
  loop: "{{ nginx_extra_repos }}"
  when:
    - ansible_facts.os_family == "Debian"
    - nginx_extra_repos | length > 0

- name: Install nginx packages
  ansible.builtin.package:
    name: "{{ nginx_packages }}"
    state: present

- name: Render nginx.conf
  ansible.builtin.template:
    src: nginx.conf.j2
    dest: /etc/nginx/nginx.conf
    owner: root
    group: root
    mode: "0644"
    backup: true
    validate: nginx -t -c %s
  notify: Reload nginx

- name: Render site config
  ansible.builtin.template:
    src: site.conf.j2
    dest: /etc/nginx/sites-available/default
    owner: root
    group: root
    mode: "0644"
    backup: true
  notify: Reload nginx

- name: Ensure nginx is enabled and running
  ansible.builtin.service:
    name: nginx
    state: started
    enabled: true

# roles/nginx/handlers/main.yml
---
- name: Reload nginx
  ansible.builtin.service:
    name: nginx
    state: reloaded

- name: Restart nginx
  ansible.builtin.service:
    name: nginx
    state: restarted

The two handlers cover the difference that surprises newcomers: reloaded reads new configuration without dropping connections; restarted kills the process and starts a fresh one, briefly interrupting traffic. Use reloaded for nginx, postgres, sshd, and any other long-running daemon that supports SIGHUP. Reserve restarted for cases where the binary itself changed (for example, after a package upgrade).

The backup: true option creates a timestamped copy of the previous config, named like /etc/nginx/nginx.conf.<pid>.YYYY-MM-DD@HH:MM:SS~, before overwriting. When a deploy goes wrong at 2 AM, those backups are the fastest path to a working state. Keep them around for at least 30 days, then prune older ones if disk pressure becomes a concern.
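Pruning those backups can be scripted. The sketch below finds backup files next to a config file that are older than a retention window, assuming the backups share the config's name as a prefix and end in a tilde, as the backup option produces. The function name stale_backups and the 30-day default are illustrative; the script only lists candidates and leaves deletion to you.

```python
import time
from pathlib import Path


def stale_backups(config, max_age_days=30, now=None):
    """Return backup files for `config` older than `max_age_days`.

    Backups are assumed to sit beside the config and match "<name>.*~".
    """
    config = Path(config)
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    pattern = config.name + ".*~"
    return sorted(
        path for path in config.parent.glob(pattern)
        if path.stat().st_mtime < cutoff
    )
```

Calling stale_backups("/etc/nginx/nginx.conf") from a weekly job gives you the list to archive or delete, without touching the live config or recent backups.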

Tying it together, your roles/nginx/meta/main.yml declares supported platforms and dependencies. Galaxy uses this metadata when other developers consume your role, and the platform list doubles as a checklist for which images your Molecule scenarios should cover:

# roles/nginx/meta/main.yml
---
galaxy_info:
  author: ops-team
  description: Production-ready nginx with TLS and JSON logs
  license: MIT
  min_ansible_version: "2.18"
  platforms:
    - name: Ubuntu
      versions: [jammy, noble]
    - name: EL
      versions: ["8", "9"]
    - name: Debian
      versions: [bookworm]
  galaxy_tags:
    - web
    - nginx
    - tls
dependencies: []

Common Pitfalls and How to Avoid Them

Five mistakes catch almost every newcomer to Ansible. Understanding them up front saves hours of debugging later.

Pitfall 1: Forgetting to fully qualify module names. Writing copy instead of ansible.builtin.copy works today, but the short name is ambiguous and can resolve to a different module when collections conflict. Always use the FQCN form. ansible-lint with the production profile flags this automatically.

Pitfall 2: Mutating vars/ instead of defaults/ in roles. Anything in vars/main.yml has near-top precedence and overrides almost every external definition. Anything in defaults/main.yml can be overridden by group_vars and host_vars. Default to defaults; only use vars for true constants.

Pitfall 3: Missing become: true on tasks that need root. Ansible runs as the SSH user; package installs and service restarts need privilege escalation. Set become: true at the play level and turn it off per-task with become: false only where required.

Pitfall 4: Using shell or command when a module exists. Wrapping apt install nginx -y in a shell task is the anti-pattern. The ansible.builtin.package module is idempotent, reports change correctly, and works across distributions. Reach for raw shell only when no module covers the case.

Pitfall 5: Forgetting handlers when configs change. Editing /etc/nginx/nginx.conf does not restart nginx. The notify: directive on a task triggers a handler at the end of the play, and only when the task reports a change. If you forget the notify, configuration changes silently fail to take effect until something else restarts the service.
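The handler behavior behind Pitfall 5 can be modeled in a few lines. In this simplified sketch, a handler is queued only when a task both changed something and named it in notify; each queued handler then runs once, in the order the handlers are defined. The function and data below are illustrative, not Ansible internals.

```python
def handlers_to_run(task_results, defined_handlers):
    """Return the handlers that fire at the end of a play.

    task_results: list of (changed, notify) pairs, one per task.
    defined_handlers: handler names in the order they are defined.
    """
    notified = {handler for changed, handler in task_results if changed and handler}
    # Each notified handler runs once, in definition order.
    return [name for name in defined_handlers if name in notified]


results = [
    (True, "Reload nginx"),    # nginx.conf changed
    (True, "Reload nginx"),    # site config changed too: still one reload
    (False, "Restart nginx"),  # nothing changed, so no notification
    (True, None),              # changed, but the task forgot notify
]
print(handlers_to_run(results, ["Reload nginx", "Restart nginx"]))
```

The last tuple is the pitfall itself: the task changed a file, yet no handler runs because nothing was notified.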

Eight Troubleshooting Items for Real-World Failures

| Symptom | Likely Cause | Fix |
|---|---|---|
| UNREACHABLE: ssh: connect to host ... port 22 | Firewall, wrong IP, or sshd down | Verify with nc -vz host 22; check security groups |
| Permission denied (publickey) | Wrong key path or wrong username | Test ssh -i path user@host; fix ansible_user |
| sudo: a password is required | Passwordless sudo not configured | Add user to sudoers NOPASSWD or use --ask-become-pass |
| The module ... was not found | Missing collection | ansible-galaxy collection install -r requirements.yml |
| FAILED! => YAML parsing | Inconsistent indent or tabs | Run yamllint .; configure editor to insert spaces |
| changed on every run for templates | Trailing whitespace or newline drift | Add trim_blocks: yes in template, normalize source |
| Could not find or access ... file | Wrong relative path resolution | Use {{ playbook_dir }}/path for absolute reference |
| Slow runs (10+ seconds per task) | Pipelining off, no ControlPersist | Enable both in ansible.cfg as shown in Step 13 |

For deeper diagnostics, run with -vvvv to see the SSH commands Ansible executes, the JSON returned by each module, and the connection plugin negotiation. The output is verbose but invaluable when modules silently misbehave.


Advanced Tips for Production Ansible

Once your playbook works, the next priority is keeping it fast, safe, and observable. These five practices separate hobbyist Ansible from production Ansible.

Use execution environments. An execution environment is a container image that pins ansible-core, every collection, and every Python dependency. Build one with ansible-builder, push it to your registry, and run all playbooks from inside it via ansible-navigator. Drift between developer laptops and CI disappears overnight.

Cap blast radius with serial. A play that targets 200 hosts and fails on host 1 will, by default, attempt the next 199. Set serial: 10% and max_fail_percentage: 5 to roll across the fleet in batches and abort early on widespread failure. Combined with strategy: linear (the default) this turns a runaway change into a manageable rolling deployment.

Ship structured logs. The community.general.log_plays callback writes per-host JSON to a log directory. Pipe those into your existing log aggregator (Loki, Splunk, Datadog) and you get per-task timing, change events, and failure reasons that survive past the terminal scrollback.

Adopt check mode and diff. Running with --check --diff shows what Ansible would change without applying anything. Wire it into pull-request automation so reviewers see a unified diff of every config file change before approving the merge. Most modules support check mode out of the box; the few that do not (mostly command and shell) should be wrapped in changed_when or replaced with proper modules.

Pair Ansible with Terraform for IaC. Terraform provisions infrastructure (VMs, networks, load balancers); Ansible configures it (packages, services, application code). Each tool stays in its lane and the combination scales further than either alone. See our Terraform AWS tutorial for the provisioning half of the story, or compare alternatives in Pulumi vs Terraform 2026.
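The serial and max_fail_percentage settings mentioned above are easier to reason about with the arithmetic written out. This sketch models the batching under stated assumptions: a percentage serial is truncated to a whole number of hosts with a minimum of one, and a batch aborts only when the failure percentage strictly exceeds the limit. The helper names are hypothetical.

```python
def batch_size(serial, total_hosts):
    """Hosts per batch for an integer or percentage serial value."""
    if isinstance(serial, str) and serial.endswith("%"):
        size = int(total_hosts * float(serial[:-1]) / 100)
    else:
        size = int(serial)
    return max(1, size)  # never fewer than one host per batch


def batches(hosts, serial):
    """Split the host list into consecutive rolling-update batches."""
    size = batch_size(serial, len(hosts))
    return [hosts[i:i + size] for i in range(0, len(hosts), size)]


def should_abort(failed, batch_len, max_fail_percentage):
    """Abort when the failed share of the batch exceeds the limit."""
    return failed * 100 > max_fail_percentage * batch_len


fleet = [f"host{n:03d}" for n in range(200)]
plan = batches(fleet, "10%")
print(len(plan), len(plan[0]))
print(should_abort(1, 20, 5), should_abort(2, 20, 5))
```

With 200 hosts, serial: 10% gives ten batches of 20. One failure in a batch is exactly 5 percent and is tolerated; a second failure pushes the batch over the limit and stops the rollout before the remaining batches are touched.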

Ansible vs Terraform vs Puppet: When to Use Which

| Dimension | Ansible | Terraform | Puppet |
|---|---|---|---|
| Primary use case | Configuration management, app deploy | Infrastructure provisioning | Configuration management at scale |
| Architecture | Agentless over SSH/WinRM | Agentless via APIs | Agent-based pull |
| Language | YAML + Jinja2 | HCL | Puppet DSL (Ruby-flavored) |
| State model | Push, no central state | Declarative state file | Pull, central manifest |
| Learning curve | Low | Medium | High |
| Best fit | Hybrid OS fleets, app rollout | Cloud infrastructure | Large RHEL or Windows estates |

The honest answer is that production teams use multiple tools. Terraform spins up the VPC and EC2 instances; Ansible installs and configures the application stack; ArgoCD or Flux handles the Kubernetes workloads on top. Picking a single tool for everything is the exception, not the rule. For a side-by-side on the IaC layer specifically, see Terraform vs CloudFormation.

Performance Tuning: From 5 Forks to 200

The default forks = 5 in ansible.cfg is the single biggest performance constraint most teams hit. With five SSH connections in flight, a hundred-host playbook serializes into twenty waves of work. Bumping forks to 50 turns the same run into two waves. Memory cost is around 30 MB per fork on the control node, so 50 forks costs roughly 1.5 GB of RAM, which any modern laptop or CI runner can spare.
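The wave and memory figures are simple arithmetic, and it helps to have them as a function when sizing a CI runner. The sketch below uses the article's estimate of about 30 MB per fork as a default; that number is a rough rule of thumb, and the function names are made up for this example.

```python
import math


def waves(hosts, forks):
    """Number of sequential waves needed to cover all hosts for one task."""
    return math.ceil(hosts / forks)


def fork_memory_mb(forks, mb_per_fork=30):
    """Rough control-node memory cost, assuming ~30 MB per fork."""
    return forks * mb_per_fork


for forks in (5, 20, 50):
    print(forks, waves(100, forks), fork_memory_mb(forks))
```

For 100 hosts this prints 20 waves at 5 forks, 5 waves at 20 forks, and 2 waves at 50 forks, with the 50-fork case costing about 1500 MB on the control node.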

SSH pipelining is the second lever. Without it, every task makes three round trips: a temporary directory creation, a script upload, and the execution. With pipelining = True in ansible.cfg the script streams directly to the remote Python interpreter on stdin, collapsing the three round trips into one. Combined with ControlPersist (which keeps the SSH master socket open for five minutes after the first connection) the per-task latency drops by a factor of three on links with high RTT.

For very large fleets, switch the strategy plugin from linear (the default, where every host completes a task before the next starts) to free (each host runs its own task list as fast as it can). Free strategy can cut wall-clock time in half on heterogeneous fleets where some hosts are slower than others. The trade-off is that you lose synchronized barriers; if Task 5 depends on Task 4 finishing on every host, free strategy will not give you that guarantee.

Fact gathering is the third hidden cost. By default Ansible runs setup on every host at the start of every play. On a 500-host inventory that is 500 fact gathers, each taking a couple of seconds. Set gathering = smart with fact_caching = jsonfile (plus a fact_caching_connection directory to hold the cache files) in ansible.cfg and the second run reuses cached facts. Set gather_facts: false at the play level when the play does not actually use any facts, and the cost drops to zero.

Frequently Asked Questions

Which Ansible version should I install in 2026?

For new projects, install ansible-core 2.20.5 (the latest patch as of April 21, 2026). It is supported through May 31, 2027 and works with Python 3.11, 3.12, and 3.13 on the control node. Avoid 2.18, which reaches end-of-life on May 31, 2026, and treat 2.21 as a beta target only until the GA ships.

Do I need Python on the managed nodes?

Almost always yes. The vast majority of modules are Python scripts that Ansible copies to the target and executes. The main exception is the raw module, which runs an arbitrary shell command over SSH and is useful for bootstrapping a node that does not yet have Python installed. Once Python 3 is in place, switch to proper modules.

Can Ansible manage Windows servers?

Yes, via WinRM or SSH. The control node itself cannot run natively on Windows; it needs Linux, macOS, or another Unix-like system (WSL2 works on a Windows workstation). Use the ansible.windows and community.windows collections for Windows-native modules covering services, registry keys, scheduled tasks, and Active Directory operations.

What is the difference between ansible-core and ansible?

ansible-core is the engine plus the ansible.builtin collection only. The ansible meta-package on PyPI bundles ansible-core with dozens of curated community collections. Ops teams that want full reproducibility usually pin ansible-core and declare collections explicitly in requirements.yml; learners often grab the meta-package for one-shot installation.

Is Ansible still relevant alongside Kubernetes?

Yes. Even in pure Kubernetes shops, the underlying nodes need OS hardening, kubelet configuration, container runtime setup, monitoring agents, and CIS benchmark enforcement. Ansible owns that layer. The kubernetes.core collection also lets you manage Kubernetes objects directly from playbooks when GitOps tools would be overkill. Compare orchestration approaches in Docker vs Kubernetes 2026.

How do I run Ansible against thousands of hosts?

Tune forks in ansible.cfg from the default 5 up to 50 or 100, enable SSH pipelining, use strategy: free when tasks are independent, and split the inventory into shards run in parallel CI jobs. For five-figure fleets, move to Ansible Automation Platform's controller (formerly Tower) which adds queueing, RBAC, and a UI on top of execution environments.

Where can I find official documentation and community help?

The reference manual lives at docs.ansible.com. Real-time discussion happens on the Ansible Community Forum, with the source and issue tracker on GitHub. PyPI download stats and release notes are at pypi.org/project/ansible-core.

Bottom Line: Ansible Earns Its Place in 2026

If you started 2026 wondering whether Ansible still mattered, the release cadence and ecosystem health are clear answers. ansible-core 2.20 ships every few weeks with bug fixes; 2.21 is a quarter away from GA; the collection ecosystem covers every cloud provider plus Kubernetes, Windows, and network gear. The thirteen steps in this Ansible tutorial took you from pip install to a fully automated three-node deployment with vault-encrypted secrets, dynamic inventory, role-based reuse, container-tested code, and a CI pipeline. That is more than enough to start owning the configuration management layer in any serious production environment.

The remaining work is repetition. Pick a small slice of your existing manual operations, automate it with a single role, gate it behind --check --diff, and ship it through CI. Then pick the next slice. Within a quarter your runbook will read like a series of ansible-playbook commands rather than copy-pasted shell, and your on-call burden will drop accordingly. That is the payoff every production engineer who learned Ansible discovers, and it is why this sixty-three-thousand-star project is still rewriting its own future every six months.

πŸ‘ Sofia LindstrΓΆm

Sofia Lindström

Editor-in-Chief

Sofia Lindström is the Editor-in-Chief at Tech Insider, where she leads editorial strategy and oversees coverage across AI, cybersecurity, and enterprise technology. With over a decade in Swedish tech journalism, she previously served as technology editor at Dagens Industri and covered the Nordic startup ecosystem for Breakit. Sofia holds an MSc in Media Technology from KTH Royal Institute of Technology and is a frequent speaker at Web Summit and Slush. She is passionate about making complex technology accessible to business leaders.

© 2026 Tech Insider Media AB. All rights reserved.