Performance tuning
This project has two performance layers:
- ansibleTUI decides which compliance jobs can safely run at the same time.
- Ansible decides how much parallel SSH/module work each job can perform.
The best wins usually come from tuning both. Measure one change at a time so it is easy to tell whether a speedup came from app scheduling, Ansible forks, SSH pipelining, or playbook changes.
Compliance job scheduling
Compliance mappings can define universal jobs and group-scoped jobs. Universal check and check+diff jobs run concurrently because they are read-only drift scans. Group-scoped jobs still serialize within their group, and apply runs stay conservative so fixes do not roll out across overlapping targets unexpectedly.
That means a mapping like this can overlap the two universal checks:
universal:
- site.yml
- xcp_guest_tools.yml
This is most useful when the playbooks spend time waiting on different remote work. It will not make one slow playbook faster by itself. If one playbook takes 72 seconds and another takes 11 seconds, overlapping them saves at most about 11 seconds.
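The arithmetic above can be checked with a small sketch. It assumes the best case, where overlapping jobs do not slow each other down. The function name is illustrative and not part of ansibleTUI.

```python
def overlap_savings(durations):
    """Return (sequential, concurrent, saved) wall-clock seconds.

    Assumes jobs run fully in parallel without contention, so the
    concurrent time is simply the longest single job.
    """
    sequential = sum(durations)
    concurrent = max(durations)
    return sequential, concurrent, sequential - concurrent


# Durations taken from the example in the text: 72 s and 11 s.
print(overlap_savings([72, 11]))  # (83, 72, 11)
```

The saving is capped by the shorter job, so overlapping only pays off when the short jobs add up to a meaningful share of the total.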
Avoid splitting a single playbook into several concurrent jobs that touch the same hosts unless the tasks are known to be independent. Package managers, service restarts, handlers, and fact gathering can contend with themselves and make runs slower or less predictable.
Focused tags are a safer way to make recurring scans cheaper:
universal:
  - playbook: site.yml
    tags: [packages, ipv6, sudo]
Keep broad all-host jobs only when they are already fast or when their coverage is worth the scan time.
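The scheduling rule from the start of this section (universal checks overlap, group-scoped jobs serialize within their group) can be illustrated with a toy model. This is a sketch only, not ansibleTUI's scheduler: jobs are simulated with sleeps, the group name and the group playbook names are hypothetical, and letting different groups overlap with each other is an assumption.

```python
import asyncio


async def run_job(name, seconds, log):
    # Stand-in for launching a playbook; it only sleeps.
    log.append(("start", name))
    await asyncio.sleep(seconds)
    log.append(("end", name))


async def run_scan(universal, grouped):
    """universal: list of (name, seconds).
    grouped: dict mapping group name -> list of (name, seconds)."""
    log = []

    async def run_group(jobs):
        # Jobs that share a group run one after another.
        for name, seconds in jobs:
            await run_job(name, seconds, log)

    # Universal checks are all started together.
    tasks = [run_job(name, seconds, log) for name, seconds in universal]
    # Assumption: each group is its own serial lane.
    tasks += [run_group(jobs) for jobs in grouped.values()]
    await asyncio.gather(*tasks)
    return log


log = asyncio.run(run_scan(
    [("site.yml", 0.02), ("xcp_guest_tools.yml", 0.01)],
    {"web": [("a.yml", 0.01), ("b.yml", 0.01)]},  # hypothetical group
))
```

In the resulting log, both universal jobs start before either finishes, while `b.yml` starts only after `a.yml` has ended.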
Ansible configuration
For a small VM fleet, start with settings like this in the playbook repo's
ansible.cfg:
[defaults]
forks = 10
[connection]
pipelining = True
forks controls how many hosts Ansible works on in parallel. The default is 5,
so an 11-host inventory can leave capacity idle. Start near the fleet size, then
adjust based on controller CPU, network behavior, SSH agent behavior, and load on
the managed hosts. Remember that overlapping compliance jobs can multiply the
number of simultaneous SSH connections.
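A rough model helps when picking a forks value. The sketch below assumes each fork holds at most one SSH connection and that hosts are processed in rounds of at most forks hosts per task. Real scheduling is more fluid than strict rounds, so treat the results as estimates. The function names are illustrative.

```python
import math


def host_rounds(hosts, forks):
    """Minimum number of rounds needed to run one task on every host."""
    return math.ceil(hosts / forks)


def peak_connections(hosts_per_job, forks, concurrent_jobs):
    """Upper bound on simultaneous SSH connections across overlapping jobs."""
    return concurrent_jobs * min(hosts_per_job, forks)


print(host_rounds(11, 5))           # 3 rounds with the default forks
print(host_rounds(11, 10))          # 2 rounds with forks = 10
print(peak_connections(11, 10, 2))  # 20 connections for two overlapping jobs
```

With 11 hosts, forks = 10 still leaves one host for a second round, which is one reason to start near the fleet size rather than below it.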
pipelining = True reduces connection round trips for module execution. If a
playbook starts failing around sudo or privilege escalation, check whether the
managed hosts require tty allocation for sudo (the requiretty option in
sudoers). In that case, either disable pipelining or remove the tty requirement
on those hosts.
Persistent fact caching can be useful, but use it carefully when playbooks mix
become: true system plays and non-become user plays. Facts such as
ansible_facts.user_dir can differ depending on whether facts were gathered as
root or as the connecting user. If gathering = smart reuses root-gathered
facts in a later user play, user tasks may incorrectly target paths such as
/root/.fonts. Enable persistent fact caching only after the playbooks have
explicit facts or variables for user-scoped paths.
Do not default to strategy = free unless the playbooks are designed for hosts
to move through tasks independently. It can improve throughput for some
workloads, but it changes ordering behavior and can surprise roles that expect
lockstep execution, handlers, or cross-host coordination.
What to measure
Look at the per-job durations in run history:
- The fleet wrapper duration shows wall-clock time for the whole scan.
- Individual job durations show which playbook is actually slow.
- A compact log can make history browsing faster, but it does not reduce Ansible execution time.
Recent local runs showed the main site.yml compliance job taking roughly 72
seconds and the VM tools job roughly 11 seconds. Concurrent universal checks can
hide the short job under the long one, while higher forks and playbook-specific
task tuning are the more likely improvements for the long job.
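When a single change needs to be timed outside the app, a small wrapper around the command is enough. This is a minimal sketch using only the Python standard library. The helper name is hypothetical, and it measures wall-clock time for whatever command it is given.

```python
import subprocess
import time


def time_command(cmd):
    """Run cmd (a list of arguments) and return (elapsed_seconds, returncode)."""
    start = time.perf_counter()
    # Capture output so it does not clutter the terminal during timing.
    result = subprocess.run(cmd, capture_output=True, text=True)
    elapsed = time.perf_counter() - start
    return elapsed, result.returncode
```

For example, calling `time_command(["ansible-playbook", "--check", "--diff", "site.yml"])` before and after changing only forks keeps the comparison to one variable at a time.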