Performance tuning

This project has two performance layers:

ansibleTUI decides which compliance jobs can safely run at the same time.
Ansible decides how much parallel SSH/module work each job can perform.

The best wins usually come from tuning both. Measure one change at a time so it is easy to tell whether a speedup came from app scheduling, Ansible forks, SSH pipelining, or playbook changes.

Compliance job scheduling

Compliance mappings can define universal jobs and group-scoped jobs. Universal check and check+diff jobs run concurrently because they are read-only drift scans. Group-scoped jobs still serialize within their group, and apply runs stay conservative so fixes do not roll out across overlapping targets unexpectedly.

That means a mapping like this can overlap the two universal checks:

universal:
  - site.yml
  - xcp_guest_tools.yml

This is most useful when the playbooks spend time waiting on different remote work. It will not make one slow playbook faster by itself. If one playbook takes 72 seconds and another takes 11 seconds, overlapping them saves at most about 11 seconds.
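As a rough check of that arithmetic, the sketch below compares back-to-back and fully overlapped wall-clock time. It assumes overlapping jobs do not slow each other down, which is optimistic when they share hosts or SSH capacity. overlap_savings is a hypothetical helper, not part of ansibleTUI.

```python
def overlap_savings(durations: list[float]) -> dict[str, float]:
    """Estimate wall-clock time for running jobs back to back vs. all at once.

    Simplified model (assumption): concurrent jobs do not slow each other
    down, so the concurrent wall-clock time is just the longest job.
    """
    if not durations:
        return {"sequential": 0.0, "concurrent": 0.0, "saved": 0.0}
    sequential = float(sum(durations))
    concurrent = float(max(durations))
    return {
        "sequential": sequential,
        "concurrent": concurrent,
        "saved": sequential - concurrent,
    }


if __name__ == "__main__":
    # Durations from the example above: site.yml ~72 s, xcp_guest_tools.yml ~11 s.
    print(overlap_savings([72, 11]))
    # {'sequential': 83.0, 'concurrent': 72.0, 'saved': 11.0}
```

Under this model the saving is always the sum of every job except the longest, so it is bounded by the shorter jobs and never shortens the long one.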

Avoid splitting a single playbook into several concurrent jobs that touch the same hosts unless the tasks are known to be independent. Package managers, service restarts, handlers, and fact gathering can contend with each other on the same host (for example, two package-manager runs competing for the same lock), which makes runs slower or less predictable.

Focused tags are a safer way to make recurring scans cheaper:

universal:
  - playbook: site.yml
    tags: [packages, ipv6, sudo]
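The sketch below shows one way a mapping entry like the one above could translate into an ansible-playbook invocation. It assumes entries are either a bare playbook name or a dict with playbook and optional tags keys, mirroring the YAML examples; build_command is hypothetical and is not necessarily how ansibleTUI builds its commands. --tags, --check, and --diff are standard ansible-playbook options.

```python
from typing import Union

Entry = Union[str, dict]


def build_command(entry: Entry, check: bool = True, diff: bool = False) -> list[str]:
    """Turn one hypothetical 'universal' entry into an ansible-playbook argv list."""
    if isinstance(entry, str):
        playbook, tags = entry, []
    else:
        playbook, tags = entry["playbook"], entry.get("tags", [])
    argv = ["ansible-playbook", playbook]
    if tags:
        # --tags takes a comma-separated list; only matching tasks (plus 'always') run.
        argv += ["--tags", ",".join(tags)]
    if check:
        argv.append("--check")  # check mode: report changes without making them
    if diff:
        argv.append("--diff")  # show what would change
    return argv


if __name__ == "__main__":
    universal = [
        "xcp_guest_tools.yml",
        {"playbook": "site.yml", "tags": ["packages", "ipv6", "sudo"]},
    ]
    for entry in universal:
        print(" ".join(build_command(entry)))
    # ansible-playbook xcp_guest_tools.yml --check
    # ansible-playbook site.yml --tags packages,ipv6,sudo --check
```

Tasks tagged always still run under --tags (implicit fact gathering is one of them), so a focused scan still pays for those.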

Keep broad all-host jobs only when they are already fast or when their coverage is worth the scan time.

Ansible configuration

For a small VM fleet, start with settings like this in the playbook repo's ansible.cfg:

[defaults]
forks = 10

[connection]
pipelining = True
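To confirm which values a given ansible.cfg actually sets, a small parser can read the two settings above. This is a sketch using Python's configparser; read_tuning is hypothetical, it only checks the [defaults] and [connection] sections, and it ignores environment variables and other config files. ansible-config dump --only-changed is the authoritative way to see Ansible's effective settings.

```python
import configparser

ANSIBLE_DEFAULT_FORKS = 5  # Ansible's default when forks is not set


def read_tuning(cfg_text: str) -> dict:
    """Read forks and pipelining from ansible.cfg text (sketch only)."""
    # interpolation=None so '%' in unrelated values (e.g. ssh_args) is left alone.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    forks = parser.getint("defaults", "forks", fallback=ANSIBLE_DEFAULT_FORKS)
    # pipelining can appear under [connection] or [defaults]; this sketch
    # prefers [connection], falls back to [defaults], then to off.
    pipelining = parser.getboolean(
        "connection",
        "pipelining",
        fallback=parser.getboolean("defaults", "pipelining", fallback=False),
    )
    return {"forks": forks, "pipelining": pipelining}


if __name__ == "__main__":
    sample = "[defaults]\nforks = 10\n\n[connection]\npipelining = True\n"
    print(read_tuning(sample))  # {'forks': 10, 'pipelining': True}
```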

forks controls how many hosts Ansible works on in parallel. The default is 5, so an 11-host inventory can leave capacity idle. Start near the fleet size, then adjust based on controller CPU, network behavior, SSH agent behavior, and load on the managed hosts. Remember that overlapping compliance jobs can multiply the number of simultaneous SSH connections.
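The fork arithmetic can be sketched as below. It is a rough model, assuming each job uses at most min(forks, hosts) parallel workers and that each worker holds roughly one SSH connection; ssh_load is a hypothetical helper.

```python
import math


def ssh_load(hosts: int, forks: int, concurrent_jobs: int = 1) -> dict:
    """Rough fork model: rounds per task and peak parallel workers across jobs."""
    if hosts <= 0 or forks <= 0:
        raise ValueError("hosts and forks must be positive")
    workers_per_job = min(forks, hosts)  # no point running more workers than hosts
    return {
        "rounds_per_task": math.ceil(hosts / forks),
        "peak_workers": workers_per_job * concurrent_jobs,
    }


if __name__ == "__main__":
    print(ssh_load(hosts=11, forks=5))  # {'rounds_per_task': 3, 'peak_workers': 5}
    print(ssh_load(hosts=11, forks=10, concurrent_jobs=2))  # {'rounds_per_task': 2, 'peak_workers': 20}
```

With 11 hosts, forks = 10 still needs two rounds per task, while forks = 11 would cover the fleet in one; either way, two overlapping universal checks roughly double the peak connection count.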

pipelining = True reduces connection round trips for module execution. If a playbook starts failing around sudo or privilege escalation, check whether the managed hosts require a tty for sudo (a Defaults requiretty entry in sudoers). In that case, either disable pipelining or remove the tty requirement on those hosts.
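Ansible's documentation notes that pipelining conflicts with the sudoers requiretty option. As a rough first pass over sudoers text copied from a host, a check like the following can flag an active requiretty default. It is a simplified sketch: requires_tty is hypothetical, it only looks at plain Defaults lines, and it ignores user- or host-scoped Defaults, include directives, and line continuations, so reviewing the host's full sudoers configuration remains the real answer.

```python
def requires_tty(sudoers_text: str) -> bool:
    """Return True if plain 'Defaults' lines leave requiretty switched on.

    Simplified (assumption): skips scoped Defaults such as 'Defaults:user',
    include directives, and backslash line continuations.
    """
    enabled = False
    for raw in sudoers_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments (and #include lines)
        parts = line.split(None, 1)
        if len(parts) < 2 or parts[0] != "Defaults":
            continue
        for option in parts[1].split(","):
            option = option.strip()
            if option == "requiretty":
                enabled = True  # later lines override earlier ones
            elif option == "!requiretty":
                enabled = False
    return enabled


if __name__ == "__main__":
    sample = "Defaults env_reset\nDefaults    requiretty\n"
    print(requires_tty(sample))  # True
```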

Persistent fact caching can be useful, but use it carefully when playbooks mix become: true system plays and non-become user plays. Facts such as ansible_facts.user_dir can differ depending on whether facts were gathered as root or as the connecting user. If gathering = smart reuses root-gathered facts in a later user play, user tasks may incorrectly target paths such as /root/.fonts. Enable persistent fact caching only after the playbooks have explicit facts or variables for user-scoped paths.
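The failure mode can be reproduced with a toy model. The sketch below is not Ansible's fact cache; it only mimics the behavior described above, where cached facts are keyed by host and a later play reuses whatever an earlier play gathered. HostFactCache, gather_facts, vm1, and alice are hypothetical.

```python
def gather_facts(host: str, user: str) -> dict:
    """Stand-in for fact gathering: user_dir depends on who gathers."""
    return {"user_dir": "/root" if user == "root" else f"/home/{user}"}


class HostFactCache:
    """Toy persistent cache keyed by host only (hypothetical, not Ansible code)."""

    def __init__(self) -> None:
        self._facts: dict[str, dict] = {}

    def facts_for(self, host: str, user: str) -> dict:
        # Like gathering = smart: gather on a cache miss, otherwise reuse.
        if host not in self._facts:
            self._facts[host] = gather_facts(host, user)
        return self._facts[host]


if __name__ == "__main__":
    cache = HostFactCache()
    cache.facts_for("vm1", "root")  # become: true system play gathers first
    user_facts = cache.facts_for("vm1", "alice")  # later non-become user play
    print(user_facts["user_dir"] + "/.fonts")  # /root/.fonts, not /home/alice/.fonts
```

Because a host-keyed cache has no notion of which user gathered the facts, the fix belongs in the playbooks: derive user-scoped paths from explicit variables rather than from a cached user_dir.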

Do not default to strategy = free unless the playbooks are designed for hosts to move through tasks independently. It can improve throughput for some workloads, but it changes ordering behavior and can surprise roles that expect lockstep execution, handlers, or cross-host coordination.
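A simplified timing model shows where free can help. It assumes enough forks to run every host at once and ignores handlers and connection overhead; linear_time and free_time are hypothetical helpers, and the per-task durations are made-up inputs.

```python
def linear_time(durations: dict[str, list[float]]) -> float:
    """linear: every host finishes task N before any host starts task N+1."""
    if not durations:
        return 0
    num_tasks = len(next(iter(durations.values())))
    # Each task costs as long as its slowest host.
    return sum(max(times[t] for times in durations.values()) for t in range(num_tasks))


def free_time(durations: dict[str, list[float]]) -> float:
    """free: each host runs its own task list; the play ends with the slowest host."""
    if not durations:
        return 0
    return max(sum(times) for times in durations.values())


if __name__ == "__main__":
    # Made-up per-task durations (seconds): each host is slow on a different task.
    durations = {"vm1": [10, 1], "vm2": [1, 10]}
    print(linear_time(durations), free_time(durations))  # 20 11
```

The gain only appears when different hosts are slow on different tasks. If one host is slowest on every task, both models give the same total, and free only adds ordering risk.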

What to measure

Look at the per-job durations in run history:

The fleet wrapper duration shows wall-clock time for the whole scan.
Individual job durations show which playbook is actually slow.
A compact log can make history browsing faster, but it does not reduce Ansible execution time.
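When comparing runs, a small summary of the history entries makes the bottleneck explicit. The sketch assumes a history record exposes a wrapper duration and per-job durations; JobRun, its field names, and summarize are hypothetical, and the numbers in the example are illustrative rather than measured.

```python
from dataclasses import dataclass


@dataclass
class JobRun:
    name: str  # hypothetical: playbook or job label from run history
    seconds: float  # hypothetical: that job's own duration


def summarize(wrapper_seconds: float, jobs: list[JobRun]) -> dict:
    """Name the slowest job and estimate how much time overlap saved."""
    slowest = max(jobs, key=lambda job: job.seconds)
    total = sum(job.seconds for job in jobs)
    return {
        "slowest": slowest.name,
        # Fraction of the scan's wall-clock time spent inside the slowest job.
        "slowest_share": slowest.seconds / wrapper_seconds,
        # Positive when jobs overlapped; negative when they ran back to back plus overhead.
        "overlap_saved": total - wrapper_seconds,
    }


if __name__ == "__main__":
    # Illustrative values only: a 75 s wrapper around two overlapping jobs.
    report = summarize(75.0, [JobRun("site.yml", 72.0), JobRun("xcp_guest_tools.yml", 11.0)])
    print(report)
```

A negative overlap_saved means the jobs effectively ran back to back and the wrapper added its own overhead.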

Recent local runs showed the main site.yml compliance job taking roughly 72 seconds and the VM tools job roughly 11 seconds. Concurrent universal checks can hide the short job under the long one, while higher forks and playbook-specific task tuning are more likely to shorten the long job itself.
