Ansible has quietly become the default control plane for IT operations teams that refuse to write yet another bash script. With roughly 63,000 GitHub stars, an active release train that shipped ansible-core 2.20.5 on April 21, 2026, and a 2.21 beta already in the hands of testers, the project keeps proving that agentless automation still delivers the most leverage per line of code. This Ansible tutorial walks you from a clean Ubuntu 24.04 box to a working multi-node deployment in 13 steps, using only the tools that ship with the latest stable release.
Unlike the tutorials you may have read in 2022, the 2026 toolchain has tightened a great deal. Execution environments are now first-class, ansible-navigator replaces ad-hoc ansible-playbook invocations in production, and the Galaxy collection ecosystem has consolidated into namespaced bundles for AWS, Azure, GCP, and Kubernetes. We will cover all of it, plus the common pitfalls and eight troubleshooting items that catch newcomers before they reach their first idempotent run.
Why Ansible Still Wins the Configuration Management Fight in 2026
The configuration management space looked very different five years ago. Puppet, Chef, and SaltStack split the enterprise market while Ansible chipped away at the long tail with a YAML-first, agentless model that ran over plain SSH. By April 2026 that long tail has become the head of the curve. Red Hat's acquisition cemented the project's commercial footing, the Ansible Automation Platform 2.x release line gave operators a supportable execution-environment runtime, and the community split between ansible-core (the engine) and the ansible meta-package (the engine plus a curated bundle of collections) finally made dependency management sane.
What keeps Ansible relevant against Terraform, Pulumi, and Kubernetes operators is the agentless story. There is nothing to install on a target node beyond Python, and the raw module can work around even that requirement by running plain commands over SSH. For a sysadmin who needs to patch 400 RHEL boxes on Saturday morning, the equation is simple: an inventory file, an SSH key, and a playbook. No Helm chart, no provider plugin, no state file to corrupt. That simplicity is exactly why this Ansible tutorial uses the same toolchain that Fortune 500 operations teams ship to production.
The release cadence also matters. According to endoflife.date, ansible-core 2.20 became the latest stable on November 3, 2025 and is supported through May 31, 2027. Version 2.19 stays in maintained status until November 30, 2026. Version 2.18 hits end-of-life on May 31, 2026, so anyone still on it should plan an upgrade before summer. The 2.21 beta line has already produced 2.21.0b3 (April 13, 2026) and the Community Package 14.0.0 will pin to it once it ships.
Prerequisites and Versions for This Ansible Tutorial
Before we touch a single playbook, lock down a known-good baseline. The control node is the workstation that runs ansible-playbook; managed nodes are the servers it talks to. The control node has the strict requirements; managed nodes only need a Python interpreter and an SSH login.
| Component | Version Used in This Tutorial | Notes |
|---|---|---|
| Control node OS | Ubuntu 24.04 LTS or Fedora 41 | WSL2 works; native Windows control node not supported |
| ansible-core | 2.20.5 (April 21, 2026) | Install from PyPI or distro packages |
| Python on control node | 3.12, 3.13, or 3.14 | 3.11 dropped in core 2.20 |
| Python on managed nodes | 3.9 minimum, 3.11+ recommended | raw module bypasses this if needed |
| OpenSSH client | 8.0 or newer | ControlPersist required for performance |
| ansible-lint | latest version from PyPI | Catches 200+ rule violations |
| Molecule (optional) | latest version from PyPI | Role testing harness |
For the worked example you will need a control node plus three managed Linux machines: two web servers and one database server. Your workstation can act as the control node. Cheap options for the managed nodes include three Ubuntu droplets on DigitalOcean, three EC2 t3.micro instances, or three LXD containers on a single laptop. Allocate at least 1 GB of RAM per node so you do not fight the OOM killer when collections expand.
Step 1: Install ansible-core 2.20 on the Control Node
Distro packages lag the upstream release by months, sometimes more than a year. The clean approach in 2026 is a Python virtual environment with pip. That isolates ansible-core, its collections, and any Python SDKs (boto3, kubernetes, azure-mgmt) from your system Python.
# Create a clean working directory and venv
mkdir -p ~/ansible-tutorial && cd ~/ansible-tutorial
python3 -m venv .venv
source .venv/bin/activate
# Upgrade pip and install the latest ansible-core stable
pip install --upgrade pip
pip install "ansible-core==2.20.5"
# Verify the install
ansible --version
The expected output should look like this:
ansible [core 2.20.5]
  config file = None
  configured module search path = ['/home/you/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /home/you/ansible-tutorial/.venv/lib/python3.12/site-packages/ansible
  executable location = /home/you/ansible-tutorial/.venv/bin/ansible
  python version = 3.12.3 (main, Jan 17 2026, 14:28:11) [GCC 13.2.0]
  jinja version = 3.1.4
  libyaml = True
If you prefer the curated meta-package with hundreds of collections pre-bundled, install ansible instead of ansible-core. The meta-package weighs about 600 MB unpacked and pulls in collections for AWS, Azure, VMware, and dozens more out of the box. For learning, ansible-core plus targeted collections keeps the surface area smaller.
Step 2: Configure SSH Access to Managed Nodes
Ansible pushes work over SSH. If you cannot SSH into a node by hand, ansible cannot either. Generate a dedicated key pair (do not reuse your personal key) and copy the public half to each managed node.
# Generate an Ed25519 key pair without passphrase for automation
ssh-keygen -t ed25519 -f ~/.ssh/ansible_id -N "" -C "ansible-control@$(hostname)"
# Copy to each managed node (replace IPs with yours; 10.0.0.31 is the database host used in Step 3)
for host in 10.0.0.21 10.0.0.22 10.0.0.31; do
  ssh-copy-id -i ~/.ssh/ansible_id.pub ubuntu@$host
done
# Sanity check: should print uname output without prompting for a password
for host in 10.0.0.21 10.0.0.22 10.0.0.31; do
  ssh -i ~/.ssh/ansible_id -o BatchMode=yes ubuntu@$host uname -a
done
Production deployments should use a CA-signed SSH key with short-lived certificates rather than long-lived static keys, but for this Ansible tutorial a static Ed25519 key is fine. Make sure the private key permissions are 600; the OpenSSH client ignores private keys with looser permissions, and the connection then falls back to password authentication without a clear error from Ansible.
Step 3: Build a Static Inventory File
The inventory is Ansible's source of truth for which hosts exist. INI and YAML formats are both supported; YAML scales better once you add group variables. Create inventory.yml in your project directory:
# inventory.yml
all:
  vars:
    ansible_user: ubuntu
    ansible_ssh_private_key_file: ~/.ssh/ansible_id
    ansible_python_interpreter: /usr/bin/python3
  children:
    webservers:
      hosts:
        web01:
          ansible_host: 10.0.0.21
        web02:
          ansible_host: 10.0.0.22
    databases:
      hosts:
        db01:
          ansible_host: 10.0.0.31
Verify the inventory parses cleanly and that you can reach every host with the built-in ping module (it does not use ICMP; it runs a no-op Python check over SSH).
ansible-inventory -i inventory.yml --graph
ansible -i inventory.yml all -m ping
A successful ping returns SUCCESS => { "changed": false, "ping": "pong" } for each host. If you see UNREACHABLE, check the SSH key path, the username, and that the firewall allows port 22 from the control node.
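Once the ad-hoc ping works, the same check can live in a small pre-flight playbook that you run before anything else. This is a minimal sketch, assuming the inventory above; the file name preflight.yml is hypothetical.

```yaml
# preflight.yml (hypothetical file name)
---
- name: Pre-flight connectivity check
  hosts: all
  gather_facts: false          # keep the check as light as possible
  tasks:
    - name: Wait until each host accepts connections
      ansible.builtin.wait_for_connection:
        timeout: 60            # give up after one minute per host

    - name: Confirm Ansible can run a Python module on the host
      ansible.builtin.ping:
```

Run it with ansible-playbook -i inventory.yml preflight.yml; a host that fails here will fail every later step too.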
Step 4: Write Your First Playbook
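A playbook is a YAML file describing the desired state. Each play targets a group, runs as a specific user, and lists tasks. Tasks call modules. Most modules are idempotent: run them twice and the second invocation reports changed: false. The exceptions are free-form modules such as command, shell, and raw, which run whatever you give them every time.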
# site.yml
---
- name: Baseline configuration for all hosts
  hosts: all
  become: true
  tasks:
    - name: Ensure base packages are installed
      ansible.builtin.package:
        name:
          - curl
          - vim
          - htop
          - ca-certificates
        state: present
        update_cache: true
    - name: Set the system timezone to UTC
      community.general.timezone:
        name: Etc/UTC
    - name: Create a deploy user
      ansible.builtin.user:
        name: deploy
        shell: /bin/bash
        groups: sudo
        append: true
        create_home: true
Run it with:
ansible-playbook -i inventory.yml site.yml
The community.general.timezone module lives in the community.general collection, which is bundled with the ansible meta-package but ships separately for ansible-core users. Install it with ansible-galaxy collection install community.general. The first run will report several changed tasks; the second run, against the same nodes, should report all ok with zero changes. That is idempotency, the single most important property of a configuration management system.
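Free-form commands are the usual way idempotency gets lost, because Ansible cannot know whether a command changed anything. The sketch below shows one way to keep a read-only command from reporting a change on every run; it assumes systemd hosts where timedatectl is available, and the task names are illustrative.

```yaml
- name: Read the current timezone (read-only, so never report a change)
  ansible.builtin.command: timedatectl show --property=Timezone --value
  register: tz_result
  changed_when: false

- name: Fail early if the timezone is not the one the playbook sets
  ansible.builtin.assert:
    that:
      - tz_result.stdout == "Etc/UTC"
```

Without changed_when: false the first task would show up as changed on every run and the recap would never reach zero changes.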
Step 5: Use Variables, Facts, and Templates
Variables turn a playbook from a shell script into a configuration system. Ansible has many places variables can live (the documentation lists 22 levels of precedence); the six you will meet first are the command line, inventory group_vars, inventory host_vars, role defaults, role vars, and play vars. Precedence is documented but easy to forget; stick to group_vars/ and host_vars/ for almost everything.
# group_vars/webservers.yml
nginx_worker_connections: 1024
nginx_server_name: www.example.com
app_release: "2026.04.03"
# host_vars/web01.yml
nginx_server_name: www.example.com
deployment_role: primary
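When two sources define the same variable, a throwaway play is the quickest way to see which one wins. This is a minimal sketch, assuming the inventory and variable files above; the file name and the value set-in-play-vars are hypothetical.

```yaml
# precedence-demo.yml (hypothetical file name)
---
- name: Show which definition of a variable wins
  hosts: web01
  gather_facts: false
  vars:
    # Play vars outrank inventory group_vars and host_vars
    nginx_server_name: set-in-play-vars
  tasks:
    - name: Print the effective value
      ansible.builtin.debug:
        var: nginx_server_name
```

Run as-is, the play-level value should be printed; adding -e nginx_server_name=from-cli on the command line should override it, because extra vars sit at the top of the precedence order.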
Facts are variables Ansible discovers about each managed node automatically. They cover everything from CPU count to network interfaces. Inspect them with ansible -i inventory.yml web01 -m setup; the output is a multi-thousand-line JSON document. Useful facts for templates include ansible_facts.distribution, ansible_facts.distribution_version, ansible_facts.processor_vcpus, and ansible_facts.default_ipv4.address.
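To see a handful of facts without scrolling through the full setup output, a short play can print just the ones named above. A minimal sketch, assuming facts are gathered with the default settings:

```yaml
- name: Show a few facts per host
  hosts: all
  tasks:
    - name: Print distribution, vCPU count, and primary IPv4 address
      ansible.builtin.debug:
        msg: >-
          {{ inventory_hostname }} runs
          {{ ansible_facts.distribution }} {{ ansible_facts.distribution_version }}
          with {{ ansible_facts.processor_vcpus }} vCPUs
          at {{ ansible_facts.default_ipv4.address }}
```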
Templates use Jinja2 syntax. Save the following as templates/nginx.conf.j2:
# templates/nginx.conf.j2
worker_processes {{ ansible_facts.processor_vcpus }};
events {
    worker_connections {{ nginx_worker_connections }};
}
http {
    server {
        listen 80;
        server_name {{ nginx_server_name }};
        location / {
            return 200 "Hello from {{ inventory_hostname }} ({{ ansible_facts.default_ipv4.address }})\n";
        }
    }
}
Render it on each web host with the ansible.builtin.template module:
- name: Render nginx config
  ansible.builtin.template:
    src: nginx.conf.j2
    dest: /etc/nginx/nginx.conf
    owner: root
    group: root
    mode: "0644"
    validate: nginx -t -c %s
  notify: Restart nginx
The validate argument runs a syntax check against the rendered file before Ansible installs it. If the validation command exits non-zero, the file is never written. That single line has saved more outages than any other feature in the entire toolchain.
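The template task notifies a handler called Restart nginx, so the play needs a handler with exactly that name or the run will fail. The following sketch shows one way the pieces could fit together in a single play; the install task is an assumption added so the example is complete.

```yaml
- name: Configure nginx on the web tier
  hosts: webservers
  become: true
  tasks:
    - name: Install nginx (assumed prerequisite for the template task)
      ansible.builtin.package:
        name: nginx
        state: present

    - name: Render nginx config
      ansible.builtin.template:
        src: nginx.conf.j2
        dest: /etc/nginx/nginx.conf
        owner: root
        group: root
        mode: "0644"
        validate: nginx -t -c %s
      notify: Restart nginx

  handlers:
    # Runs once at the end of the play, and only if a notifying task changed
    - name: Restart nginx
      ansible.builtin.service:
        name: nginx
        state: restarted
```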
Step 6: Refactor Into Reusable Roles
Roles are the unit of reuse in Ansible. A role is a directory with a strict layout: tasks/, handlers/, defaults/, vars/, templates/, files/, and meta/. Generate the skeleton with the official tool:
mkdir -p roles
ansible-galaxy role init roles/nginx
ansible-galaxy role init roles/postgres
Move the nginx tasks from site.yml into roles/nginx/tasks/main.yml, the template into roles/nginx/templates/, and the variables into roles/nginx/defaults/main.yml. The top-level playbook becomes a thin wrapper:
# site.yml
---
- name: Configure web tier
  hosts: webservers
  become: true
  roles:
    - role: nginx
      tags: [web, nginx]
- name: Configure database tier
  hosts: databases
  become: true
  roles:
    - role: postgres
      tags: [db, postgres]
Tags let you target a subset on demand: ansible-playbook -i inventory.yml site.yml --tags nginx only runs the nginx role across the webserver group. Combined with --limit web01 you get surgical execution: one role, one host, with the rest of the fleet untouched.
Step 7: Manage Secrets With Ansible Vault
Hard-coded secrets in YAML are how breaches start. Ansible Vault encrypts variables (and entire files) with AES-256, so you can commit the encrypted blob to git and only decrypt at runtime. Create an encrypted variables file:
# The target directory must exist before ansible-vault can create the file
mkdir -p group_vars/databases
ansible-vault create group_vars/databases/vault.yml
# Editor opens; add:
# vault_postgres_password: "S3cret!Passw0rd"
# vault_replication_password: "AnotherS3cret"
# Reference the vaulted variable from a non-vault file
# (the quoted 'EOF' keeps the shell from interpreting the file contents)
cat > group_vars/databases/vars.yml <<'EOF'
postgres_password: "{{ vault_postgres_password }}"
replication_password: "{{ vault_replication_password }}"
EOF
# Run the playbook with vault password prompt
ansible-playbook -i inventory.yml site.yml --ask-vault-pass
For CI environments, store the vault password in a file outside the repo and pass --vault-password-file ~/.ansible_vault_pass. Even better, integrate with Ansible Automation Platform's credential store or a HashiCorp Vault lookup so the secret never sits on disk in any form.
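Encrypting a secret at rest does not stop it from leaking into task output once it is decrypted. A minimal sketch of a task that consumes postgres_password while keeping it out of logs; the .pgpass path assumes the Debian/Ubuntu home directory of the postgres user, and the task itself is illustrative.

```yaml
- name: Write a .pgpass file for the postgres user
  ansible.builtin.copy:
    # Format: hostname:port:database:username:password
    content: "localhost:5432:*:postgres:{{ postgres_password }}\n"
    dest: /var/lib/postgresql/.pgpass   # assumed home directory
    owner: postgres
    group: postgres
    mode: "0600"
  no_log: true   # keep the decrypted value out of task output and logs
```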
Step 8: Add Galaxy Collections for AWS, Azure, and Kubernetes
Collections are the modern packaging format. Where the old monolithic Ansible 2.9 had thousands of modules in one tree, the modern world has them split by namespace. The big three cloud collections plus Kubernetes are essential for any non-trivial automation:
# requirements.yml
---
collections:
  - name: amazon.aws
    version: ">=8.0.0"
  - name: community.aws
  - name: azure.azcollection
  - name: google.cloud
  - name: kubernetes.core
  - name: community.general
  - name: community.crypto
roles: []

# Install everything declared in requirements.yml
ansible-galaxy collection install -r requirements.yml --upgrade

# Sample task: launch an EC2 instance
- name: Launch web server in AWS
  amazon.aws.ec2_instance:
    name: web03
    instance_type: t3.micro
    image_id: ami-0c02fb55956c7d316
    region: us-east-1
    key_name: ansible_id
    network:
      assign_public_ip: true
    security_group: web-sg
    tags:
      ManagedBy: ansible
      Environment: production
Collections are versioned independently, so you can hold amazon.aws at 8.0.0 or later, as the requirements file above does, while letting community.general float. Browse the full catalog at Ansible Galaxy; every collection lists its supported ansible-core versions in the README.
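The version field accepts either an exact release or a range. A short sketch of both forms next to an unpinned entry; the version numbers are illustrative, not recommendations.

```yaml
# requirements.yml (excerpt)
---
collections:
  - name: amazon.aws
    version: "8.0.0"            # exact pin: only this release is installed
  - name: kubernetes.core
    version: ">=5.0.0,<6.0.0"   # bounded range: floats within one major version
  - name: community.general     # no version: newest release Galaxy offers
```

Exact pins make CI runs reproducible; ranges trade a little reproducibility for automatic bug-fix updates.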
Step 9: Use Dynamic Inventory for Cloud Infrastructure
A static inventory file does not scale to autoscaling groups. Dynamic inventory plugins query the cloud API at runtime and produce hosts on the fly. The AWS plugin lives in amazon.aws:
# inventory_aws.aws_ec2.yml (filename suffix matters)
plugin: amazon.aws.aws_ec2
regions:
  - us-east-1
  - us-west-2
filters:
  tag:Environment: production
  instance-state-name: running
keyed_groups:
  - prefix: tag
    key: tags
  - prefix: az
    key: placement.availability_zone
hostnames:
  - tag:Name
  - private-ip-address
compose:
  ansible_host: private_ip_address

export AWS_PROFILE=prod
ansible-inventory -i inventory_aws.aws_ec2.yml --graph
ansible -i inventory_aws.aws_ec2.yml tag_role_web -m ping
Dynamic inventory respects the same group_vars/ and host_vars/ directories, so you can layer static configuration over discovered hosts. For Kubernetes, the equivalent plugin is kubernetes.core.k8s; for Azure, azure.azcollection.azure_rm; for GCP, google.cloud.gcp_compute. All four are covered by the same dynamic inventory specification, which keeps the mental model consistent across providers.
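Layering static configuration over discovered hosts comes down to naming a group_vars file after the generated group. A minimal sketch, assuming your instances carry a role=web tag so that the keyed_groups rule above produces a group called tag_role_web; the values are placeholders.

```yaml
# group_vars/tag_role_web.yml
# The file name must match the generated group name exactly.
nginx_worker_connections: 2048
nginx_server_name: www.example.com
```

ansible-inventory --graph shows the exact group names the plugin generated, which is the safest way to pick the file name.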
Step 10: Test Roles With Molecule and Containers
Molecule is the standard test harness for Ansible roles. It spins up ephemeral containers (or VMs), applies the role, runs verification, and tears everything down. The 2026 stack uses molecule-plugins[docker] with Podman as the default driver on RHEL-family hosts.
pip install molecule "molecule-plugins[docker]" ansible-lint
cd roles/nginx
molecule init scenario default
# molecule/default/molecule.yml
---
dependency:
  name: galaxy
driver:
  name: docker
platforms:
  - name: ubuntu2404
    image: geerlingguy/docker-ubuntu2404-ansible:latest
    pre_build_image: true
  - name: rocky9
    image: geerlingguy/docker-rockylinux9-ansible:latest
    pre_build_image: true
provisioner:
  name: ansible
verifier:
  name: ansible
# Full test cycle
molecule test
# Iterate quickly during development
molecule create
molecule converge
molecule verify
molecule destroy
The verify.yml file is where you assert post-conditions. Useful patterns include checking that a service is listening on a port, a config file contains an expected line, or a URL returns 200. Treat Molecule scenarios like unit tests: one happy path, one failure path, and any regression that has bitten you in production.
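The three patterns just mentioned could look like the following verify.yml. This is a sketch under assumptions: it expects the nginx role to have started the service inside the test container and to have rendered the default worker_connections value of 1024.

```yaml
# molecule/default/verify.yml
---
- name: Verify
  hosts: all
  gather_facts: false
  tasks:
    - name: Check that nginx is listening on port 80
      ansible.builtin.wait_for:
        port: 80
        timeout: 10

    - name: Check that the default site answers with HTTP 200
      ansible.builtin.uri:
        url: http://127.0.0.1/
        status_code: 200

    - name: Read the rendered nginx.conf
      ansible.builtin.slurp:
        src: /etc/nginx/nginx.conf
      register: nginx_conf

    - name: Assert the expected worker_connections line is present
      ansible.builtin.assert:
        that:
          - "'worker_connections 1024;' in (nginx_conf.content | b64decode)"
```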
Step 11: Lint and Format With ansible-lint and yamllint
Style consistency matters more than you think. ansible-lint catches over 200 rule violations from naming conventions to deprecated syntax. yamllint catches the YAML problems that ansible-lint ignores. Run them in CI on every push.
pip install ansible-lint yamllint
# Lint the whole project
ansible-lint
yamllint .
# Auto-fix what is fixable
ansible-lint --fix
Configure both tools with dotfiles in the repo root:
# .ansible-lint
profile: production
exclude_paths:
  - .venv/
  - .cache/
skip_list:
  - yaml[line-length]  # handled by yamllint with custom max

# .yamllint
extends: default
rules:
  line-length:
    max: 160
  truthy:
    allowed-values: ["true", "false"]
  comments:
    min-spaces-from-content: 1
The production profile is strict; it forces fully qualified collection names (ansible.builtin.copy rather than copy), bans the bare command module when a dedicated module exists, and rejects unsafe variable interpolation. New code should pass on profile production from day one; legacy code can start at min and ratchet up.
Step 12: Run Playbooks From CI With GitHub Actions
Manual ansible-playbook runs from a developer laptop are how change history gets lost. Pipe everything through CI so every change has a reviewer, a build log, and a rollback path. The minimum useful pipeline runs lint on every PR and runs the full playbook (against a staging environment) on every merge to main.
# .github/workflows/ansible.yml
name: ansible
on:
  pull_request:
  push:
    branches: [main]
jobs:
  lint:
    runs-on: ubuntu-24.04
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install "ansible-core==2.20.5" ansible-lint yamllint
      - run: ansible-galaxy collection install -r requirements.yml
      - run: ansible-lint
      - run: yamllint .
  deploy-staging:
    needs: lint
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-24.04
    environment: staging
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install "ansible-core==2.20.5"
      - run: ansible-galaxy collection install -r requirements.yml
      - name: Run playbook
        env:
          ANSIBLE_VAULT_PASSWORD: ${{ secrets.ANSIBLE_VAULT_PASSWORD }}
          ANSIBLE_HOST_KEY_CHECKING: "False"
        run: |
          echo "$ANSIBLE_VAULT_PASSWORD" > .vault_pass
          # The trailing backslash keeps the flag on the same command
          ansible-playbook -i inventory_aws.aws_ec2.yml site.yml \
            --vault-password-file .vault_pass
For deeper coverage on the CI side, see our companion guide on GitHub Actions for production CI/CD; the same Ansible job pattern slots into GitLab CI, Jenkins, or Azure Pipelines with minimal changes.
Step 13: Deploy a Real Project: 3-Node Web Stack With PostgreSQL
Time to wire it all together. The complete project provisions two NGINX web servers and one PostgreSQL primary, registers the deploy user across all three, sets up firewall rules, and confirms reachability. Project layout:
ansible-tutorial/
├── ansible.cfg
├── inventory.yml
├── requirements.yml
├── site.yml
├── group_vars/
│   ├── all.yml
│   ├── webservers.yml
│   └── databases/
│       ├── vars.yml
│       └── vault.yml          # ansible-vault encrypted
├── host_vars/
│   ├── web01.yml
│   ├── web02.yml
│   └── db01.yml
└── roles/
    ├── common/
    ├── nginx/
    └── postgres/
# ansible.cfg
[defaults]
inventory = inventory.yml
host_key_checking = False
retry_files_enabled = False
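# YAML-formatted results from the built-in default callback; no extra callback plugin needed
callback_result_format = yaml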
forks = 20
collections_path = ./.ansible/collections
roles_path = ./roles
[ssh_connection]
pipelining = True
ssh_args = -o ControlMaster=auto -o ControlPersist=300s -o PreferredAuthentications=publickey
Run the entire stack:
ansible-galaxy collection install -r requirements.yml
ansible-playbook site.yml --ask-vault-pass
Expected output (truncated):
PLAY RECAP **************************************************
db01 : ok=14 changed=8 unreachable=0 failed=0 skipped=2 rescued=0 ignored=0
web01 : ok=22 changed=14 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0
web02 : ok=22 changed=14 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0
A second run should report changed=0 across the board. That is your proof that the playbook is fully idempotent and safe to schedule on a cron or trigger from CI without surprise side effects.
Bonus: Build a Production-Ready nginx Role From Scratch
The earlier steps gave you the layout. Here is the actual content of a production-quality nginx role you can drop into roles/nginx/. Reading this carefully is the difference between a tutorial that works on day one and a role that survives in production for years. Every block is annotated with the reasoning behind it.
# roles/nginx/defaults/main.yml
---
nginx_packages:
  - nginx
  - nginx-extras
nginx_user: www-data
nginx_worker_processes: auto
nginx_worker_connections: 1024
nginx_keepalive_timeout: 65
nginx_client_max_body_size: "10m"
nginx_server_name: "_"
nginx_listen_port: 80
nginx_tls_enabled: false
nginx_tls_cert_path: "/etc/ssl/certs/site.crt"
nginx_tls_key_path: "/etc/ssl/private/site.key"
nginx_log_format_json: true
nginx_extra_repos: []

# roles/nginx/tasks/main.yml
---
- name: Add extra apt repositories
  ansible.builtin.apt_repository:
    repo: "{{ item }}"
    state: present
    update_cache: true
  loop: "{{ nginx_extra_repos }}"
  when:
    - ansible_facts.os_family == "Debian"
    - nginx_extra_repos | length > 0
- name: Install nginx packages
  ansible.builtin.package:
    name: "{{ nginx_packages }}"
    state: present
- name: Render nginx.conf
  ansible.builtin.template:
    src: nginx.conf.j2
    dest: /etc/nginx/nginx.conf
    owner: root
    group: root
    mode: "0644"
    backup: true
    validate: nginx -t -c %s
  notify: Reload nginx
- name: Render site config
  ansible.builtin.template:
    src: site.conf.j2
    dest: /etc/nginx/sites-available/default
    owner: root
    group: root
    mode: "0644"
    backup: true
  notify: Reload nginx
- name: Ensure nginx is enabled and running
  ansible.builtin.service:
    name: nginx
    state: started
    enabled: true

# roles/nginx/handlers/main.yml
---
- name: Reload nginx
  ansible.builtin.service:
    name: nginx
    state: reloaded
- name: Restart nginx
  ansible.builtin.service:
    name: nginx
    state: restarted
The two handlers cover the difference that surprises newcomers: reloaded reads new configuration without dropping connections; restarted kills the process and starts a fresh one, briefly interrupting traffic. Use reloaded for nginx, postgres, sshd, and any other long-running daemon that supports SIGHUP. Reserve restarted for cases where the binary itself changed (for example, after a package upgrade).
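Handlers normally run at the end of the play, which matters when a later task depends on the new configuration already being live. A minimal sketch of forcing them to run earlier, assuming the Reload nginx handler above; the smoke-test URL is an assumption.

```yaml
- name: Render nginx.conf
  ansible.builtin.template:
    src: nginx.conf.j2
    dest: /etc/nginx/nginx.conf
    mode: "0644"
    validate: nginx -t -c %s
  notify: Reload nginx

- name: Run any notified handlers now rather than at the end of the play
  ansible.builtin.meta: flush_handlers

- name: Smoke-test the freshly reloaded server
  ansible.builtin.uri:
    url: http://127.0.0.1/     # assumed local endpoint
    status_code: 200
```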
The backup: true option creates a timestamped copy of the previous config next to the original, named along the lines of /etc/nginx/nginx.conf.<pid>.YYYY-MM-DD@HH:MM:SS~, before overwriting. When a deploy goes wrong at 2 AM, those backups are the fastest path to a working state. Keep them around for at least 30 days; rotate them with logrotate if disk pressure becomes a concern.
Tying it together, your roles/nginx/meta/main.yml declares supported platforms and dependencies. Galaxy uses this metadata when other developers consume your role, and the platform list doubles as a checklist for which Molecule test images to maintain:
# roles/nginx/meta/main.yml
---
galaxy_info:
  author: ops-team
  description: Production-ready nginx with TLS and JSON logs
  license: MIT
  min_ansible_version: "2.18"
  platforms:
    - name: Ubuntu
      versions: [jammy, noble]
    - name: EL
      versions: ["8", "9"]
    - name: Debian
      versions: [bookworm]
  galaxy_tags:
    - web
    - nginx
    - tls
dependencies: []
Common Pitfalls and How to Avoid Them
Five mistakes catch almost every newcomer to Ansible. Understanding them up front saves hours of debugging later.
Pitfall 1: Forgetting to fully qualify module names. Writing copy instead of ansible.builtin.copy works today, but the short name is resolved through a search path and can pick up a different module when collections define the same name. Always use the FQCN form. ansible-lint with the production profile flags this automatically.
Pitfall 2: Mutating vars/ instead of defaults/ in roles. Anything in vars/main.yml has near-top precedence and overrides almost every external definition. Anything in defaults/main.yml can be overridden by group_vars and host_vars. Default to defaults; only use vars for true constants.
Pitfall 3: Missing become: true on tasks that need root. Ansible runs as the SSH user; package installs and service restarts need privilege escalation. Set become: true at the play level and turn it off per-task with become: false only where required.
Pitfall 4: Using shell or command when a module exists. Wrapping apt install nginx -y in a shell task is the anti-pattern. The ansible.builtin.package module is idempotent, reports change correctly, and works across distributions. Reach for raw shell only when no module covers the case.
Pitfall 5: Forgetting handlers when configs change. Editing /etc/nginx/nginx.conf does not restart nginx. The notify: directive on a task triggers a handler at the end of the play. If you forget the notify, configuration changes silently fail to take effect until something else restarts the service.
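Pitfalls 3 and 4 are easiest to see side by side in a single play. The sketch below is illustrative only: it escalates once at the play level, installs a package through a module instead of a shell one-liner, and opts one read-only task out of escalation.

```yaml
- name: Pitfalls 3 and 4 in one play
  hosts: webservers
  become: true                      # escalate once, at the play level
  tasks:
    - name: Install nginx with a module rather than a shell one-liner
      ansible.builtin.package:
        name: nginx
        state: present

    - name: Record which account Ansible logs in as
      ansible.builtin.command: whoami
      become: false                 # opt out where root is not needed
      register: login_user
      changed_when: false           # read-only command, never a change

    - name: Show the login account
      ansible.builtin.debug:
        var: login_user.stdout
```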
Eight Troubleshooting Items for Real-World Failures
| Symptom | Likely Cause | Fix |
|---|---|---|
| UNREACHABLE: ssh: connect to host ... port 22 | Firewall, wrong IP, or sshd down | Verify with nc -vz host 22; check security groups |
| Permission denied (publickey) | Wrong key path or wrong username | Test ssh -i path user@host; fix ansible_user |
| sudo: a password is required | Passwordless sudo not configured | Add user to sudoers NOPASSWD or use --ask-become-pass |
| The module ... was not found | Missing collection | ansible-galaxy collection install -r requirements.yml |
| FAILED! => YAML parsing | Inconsistent indent or tabs | Run yamllint .; configure editor to insert spaces |
| changed on every run for templates | Trailing whitespace or newline drift | Add trim_blocks: yes in template, normalize source |
| Could not find or access ... file | Wrong relative path resolution | Use {{ playbook_dir }}/path for absolute reference |
| Slow runs (10+ seconds per task) | Pipelining off, no ControlPersist | Enable both in ansible.cfg as shown in Step 13 |
For deeper diagnostics, run with -vvvv to see the SSH commands Ansible executes, the JSON returned by each module, and the connection plugin negotiation. The output is verbose but invaluable when modules silently misbehave.
Advanced Tips for Production Ansible
Once your playbook works, the next priority is keeping it fast, safe, and observable. These five practices separate hobbyist Ansible from production Ansible.
Use execution environments. An execution environment is a container image that pins ansible-core, every collection, and every Python dependency. Build one with ansible-builder, push it to your registry, and run all playbooks from inside it via ansible-navigator. Drift between developer laptops and CI disappears overnight.
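An execution environment starts from a definition file that ansible-builder turns into an image. A minimal sketch in the version 3 schema, assuming the requirements.yml from Step 8 sits next to it; the base image is an assumption, so substitute whatever your organisation supports.

```yaml
# execution-environment.yml
---
version: 3
images:
  base_image:
    name: quay.io/fedora/fedora:latest   # assumed base image
dependencies:
  ansible_core:
    package_pip: ansible-core==2.20.5
  ansible_runner:
    package_pip: ansible-runner
  system:
    - openssh-clients                    # SSH client inside the image
  galaxy: requirements.yml               # collections declared in Step 8
```

Building it with ansible-builder build --tag followed by an image name of your choice produces a container that ansible-navigator can then use for every run.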
Cap blast radius with serial. A play that targets 200 hosts and fails on host 1 will, by default, attempt the next 199. Set serial: 10% and max_fail_percentage: 5 to roll across the fleet in batches and abort early on widespread failure. Combined with strategy: linear (the default) this turns a runaway change into a manageable rolling deployment.
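In playbook form the two settings are play-level keywords. A minimal sketch applied to the web tier from this tutorial; the percentages are the ones suggested above.

```yaml
- name: Rolling update of the web tier
  hosts: webservers
  become: true
  serial: "10%"               # work through the group in batches of 10%
  max_fail_percentage: 5      # abort when more than 5% of a batch fails
  roles:
    - role: nginx
```

With only two web hosts each batch is a single machine, which is still enough to stop a bad change from reaching the second one.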
Ship structured logs. The community.general.log_plays callback writes per-host JSON to a log directory. Pipe those into your existing log aggregator (Loki, Splunk, Datadog) and you get per-task timing, change events, and failure reasons that survive past the terminal scrollback.
Adopt check mode and diff. Running with --check --diff shows what Ansible would change without applying anything. Wire it into pull-request automation so reviewers see a unified diff of every config file change before approving the merge. Most modules support check mode out of the box; the few that do not (mostly command and shell) should be wrapped in changed_when or replaced with proper modules.
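Read-only commands are worth running even under --check, because later tasks often depend on their output. A minimal sketch of one way to mark such a task; nginx -v is used purely as an example of a harmless command.

```yaml
- name: Read the installed nginx version
  ansible.builtin.command: nginx -v
  register: nginx_version
  changed_when: false     # the command only reads state
  check_mode: false       # run it for real even when the play uses --check
```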
Pair Ansible with Terraform for IaC. Terraform provisions infrastructure (VMs, networks, load balancers); Ansible configures it (packages, services, application code). Each tool stays in its lane and the combination scales further than either alone. See our Terraform AWS tutorial for the provisioning half of the story, or compare alternatives in Pulumi vs Terraform 2026.
Ansible vs Terraform vs Puppet: When to Use Which
| Dimension | Ansible | Terraform | Puppet |
|---|---|---|---|
| Primary use case | Configuration management, app deploy | Infrastructure provisioning | Configuration management at scale |
| Architecture | Agentless over SSH/WinRM | Agentless via APIs | Agent-based pull |
| Language | YAML + Jinja2 | HCL | Puppet DSL (Ruby-flavored) |
| State model | Push, no central state | Declarative state file | Pull, central manifest |
| Learning curve | Low | Medium | High |
| Best fit | Hybrid OS fleets, app rollout | Cloud infrastructure | Large RHEL or Windows estates |
The honest answer is that production teams use multiple tools. Terraform spins up the VPC and EC2 instances; Ansible installs and configures the application stack; ArgoCD or Flux handles the Kubernetes workloads on top. Picking a single tool for everything is the exception, not the rule. For a side-by-side on the IaC layer specifically, see Terraform vs CloudFormation.
Performance Tuning: From 5 Forks to 200
The default forks = 5 in ansible.cfg is the single biggest performance constraint most teams hit. With five SSH connections in flight, a hundred-host playbook serializes into twenty waves of work. Bumping forks to 50 turns the same run into two waves. Memory cost is around 30 MB per fork on the control node, so 50 forks costs roughly 1.5 GB of RAM, which any modern laptop or CI runner can spare.
SSH pipelining is the second lever. Without it, every task makes three round trips: a temporary directory creation, a script upload, and the execution. With pipelining = True in ansible.cfg the script streams directly to the remote Python interpreter on stdin, collapsing the three round trips into one. Combined with ControlPersist (which keeps the SSH master socket open for five minutes after the first connection) the per-task latency drops by a factor of three on links with high RTT.
For very large fleets, switch the strategy plugin from linear (the default, where every host completes a task before the next starts) to free (each host runs its own task list as fast as it can). Free strategy can cut wall-clock time in half on heterogeneous fleets where some hosts are slower than others. The trade-off is that you lose synchronized barriers; if Task 5 depends on Task 4 finishing on every host, free strategy will not give you that guarantee.
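Switching strategy is a one-line change at the play level. A minimal sketch, assuming the tasks in the play do not depend on each other across hosts; the package list is illustrative.

```yaml
- name: Baseline packages without cross-host barriers
  hosts: all
  become: true
  strategy: free            # each host proceeds through the tasks at its own pace
  tasks:
    - name: Ensure base packages are installed
      ansible.builtin.package:
        name:
          - curl
          - ca-certificates
        state: present
```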
Fact gathering is the third hidden cost. By default Ansible runs setup on every host at the start of every play. On a 500-host inventory that is 500 fact gathers, each taking a couple of seconds. Set gathering = smart with fact_caching = jsonfile in ansible.cfg and the second run reuses cached facts. Set gather_facts: false at the play level when the play does not actually use any facts, and the cost drops to zero.
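At the play level there are two knobs: turn gathering off entirely, or restrict it to the subset the play needs. A minimal sketch of both, with illustrative tasks:

```yaml
- name: Play that uses no facts at all
  hosts: all
  become: true
  gather_facts: false
  tasks:
    - name: Ensure curl is present
      ansible.builtin.package:
        name: curl
        state: present

- name: Play that only needs network facts
  hosts: webservers
  gather_subset:
    - "!all"        # skip the full fact set
    - network       # keep network facts such as default_ipv4
  tasks:
    - name: Show the primary IPv4 address
      ansible.builtin.debug:
        var: ansible_facts.default_ipv4.address
```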
Frequently Asked Questions
Which Ansible version should I install in 2026?
For new projects, install ansible-core 2.20.5 (the latest patch as of April 21, 2026). It is supported through May 31, 2027 and works with Python 3.12, 3.13, and 3.14 on the control node. Avoid 2.18, which reaches end-of-life on May 31, 2026, and treat 2.21 as a beta target only until the GA ships.
Do I need Python on the managed nodes?
Almost always yes. The vast majority of modules are Python scripts that Ansible copies to the target and executes. The exception is the raw module, which runs an arbitrary shell command and is useful for bootstrapping a node that does not yet have Python installed. Once Python 3 is in place, switch to proper modules.
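A bootstrap play built on raw might look like the sketch below. It assumes a Debian or Ubuntu target (hence apt-get); adjust the command for other distributions.

```yaml
- name: Bootstrap Python on a fresh host
  hosts: all
  become: true
  gather_facts: false      # fact gathering itself needs Python on the target
  tasks:
    # raw sends the command straight over SSH, so no interpreter is required.
    # It cannot detect changes, so this task reports "changed" on every run.
    - name: Install Python 3 if it is missing
      ansible.builtin.raw: test -e /usr/bin/python3 || (apt-get update && apt-get install -y python3)
```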
Can Ansible manage Windows servers?
Yes, via WinRM or SSH. The control node still has to run Linux (or WSL2 on Windows). Use the ansible.windows and community.windows collections for Windows-native modules covering services, registry keys, scheduled tasks, and Active Directory operations.
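For WinRM, the connection details live in inventory variables. A minimal sketch with a hypothetical host and address; it assumes the pywinrm Python package is installed on the control node and that the password is supplied from a vault or prompt rather than written here.

```yaml
# inventory_windows.yml (hypothetical file name)
all:
  children:
    windows:
      hosts:
        win01:
          ansible_host: 10.0.0.41        # hypothetical address
      vars:
        ansible_user: Administrator
        ansible_connection: winrm
        ansible_port: 5986               # WinRM over HTTPS
        ansible_winrm_transport: ntlm
        # Lab setting only: skips certificate validation for self-signed certs
        ansible_winrm_server_cert_validation: ignore
```

The Windows counterpart of the ping check is ansible -i inventory_windows.yml windows -m ansible.windows.win_ping.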
What is the difference between ansible-core and ansible?
ansible-core is the engine plus the ansible.builtin collection only. The ansible meta-package on PyPI bundles ansible-core with hundreds of curated community collections. Ops teams that want full reproducibility usually pin ansible-core and declare collections explicitly in requirements.yml; learners often grab the meta-package for one-shot installation.
Is Ansible still relevant alongside Kubernetes?
Yes. Even in pure Kubernetes shops, the underlying nodes need OS hardening, kubelet configuration, container runtime setup, monitoring agents, and CIS benchmark enforcement. Ansible owns that layer. The kubernetes.core collection also lets you manage Kubernetes objects directly from playbooks when GitOps tools would be overkill. Compare orchestration approaches in Docker vs Kubernetes 2026.
How do I run Ansible against thousands of hosts?
Tune forks in ansible.cfg from the default 5 up to 50 or 100, enable SSH pipelining, use strategy: free when tasks are independent, and split the inventory into shards run in parallel CI jobs. For five-figure fleets, move to Ansible Automation Platform's controller (formerly Tower) which adds queueing, RBAC, and a UI on top of execution environments.
Where can I find official documentation and community help?
The reference manual lives at docs.ansible.com. Real-time discussion happens on the Ansible Community Forum, with the source and issue tracker on GitHub. PyPI download stats and release notes are at pypi.org/project/ansible-core.
Bottom Line: Ansible Earns Its Place in 2026
If you started 2026 wondering whether Ansible still mattered, the release cadence and ecosystem health are clear answers. ansible-core 2.20 ships every few weeks with bug fixes; 2.21 is a quarter away from GA; the collection ecosystem covers every cloud provider plus Kubernetes, Windows, and network gear. The thirteen steps in this Ansible tutorial took you from pip install to a fully automated three-node deployment with vault-encrypted secrets, dynamic inventory, role-based reuse, container-tested code, and a CI pipeline. That is more than enough to start owning the configuration management layer in any serious production environment.
The remaining work is repetition. Pick a small slice of your existing manual operations, automate it with a single role, gate it behind --check --diff, and ship it through CI. Then pick the next slice. Within a quarter your runbook will read like a series of ansible-playbook commands rather than copy-pasted shell, and your on-call burden will drop accordingly. That is the payoff every production engineer who learned Ansible discovers, and it is why this sixty-three-thousand-star project is still rewriting its own future every six months.
Sofia Lindström
Sofia Lindström is the Editor-in-Chief at Tech Insider, where she leads editorial strategy and oversees coverage across AI, cybersecurity, and enterprise technology. With over a decade in Swedish tech journalism, she previously served as technology editor at Dagens Industri and covered the Nordic startup ecosystem for Breakit. Sofia holds an MSc in Media Technology from KTH Royal Institute of Technology and is a frequent speaker at Web Summit and Slush. She is passionate about making complex technology accessible to business leaders.