@JM1
Last active May 30, 2024 12:02

Ansible Roles with OS-specific Defaults

This Ansible guide discusses several approaches to setting role default variables depending on the host operating system (i.e. ansible_distribution / ansible_facts.distribution) or on other variables. For example, a role variable image_uri should point to the latest cloud image for the host. For CentOS 8 or Red Hat Enterprise Linux (RHEL) 8 the default value should be:

image_uri: 'https://cloud.centos.org/centos/8/x86_64/images/CentOS-8-GenericCloud-8.2.2004-20200611.2.x86_64.qcow2'

For Ubuntu 20.04 LTS (Focal Fossa) it is supposed to be:

image_uri: 'https://cloud-images.ubuntu.com/focal/20200616/focal-server-cloudimg-amd64.img'

This guide applies to Ansible 2.9 and later, up to the latest revision (as of 21.06.2020) in Ansible's devel branch on GitHub.com.

First off, Ansible loads default variables from the defaults/main.yml file in the role directory. Role default variables have a very low precedence / priority in comparison to variables defined in other places:

Anything that goes into “role defaults” (the defaults folder inside the role) is the most malleable and easily overridden

For details see Variable precedence: Where should I put a variable?.
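As a minimal sketch of what this low precedence means in practice, consider the two files below. The role path placeholder and the mirror URL are assumptions made for illustration only; on a host covered by the group variable, image_uri resolves to the group_vars value rather than the role default.

```yaml
---
# roles/<role>/defaults/main.yml
# Role default: lowest-priority value, used only if nothing else defines image_uri.
image_uri: 'https://cloud.centos.org/centos/8/x86_64/images/CentOS-8-GenericCloud-8.2.2004-20200611.2.x86_64.qcow2'
---
# group_vars/all.yml (next to the inventory or playbook)
# Hypothetical override: any group variable wins over the role default.
image_uri: 'https://mirror.example.com/images/custom.qcow2'
```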

Approach using include_vars

Put the supposed-to-be-default variables into distinct files in the role's vars/ folder and load these files with include_vars:

  • vars/Ubuntu.yml, vars/CentOS.yml etc.:
    image_uri: https://...
  • tasks/main.yml:
    - name: Load OS-specific variables
      include_vars: '{{ ansible_facts.distribution }}.yml'
    
    - name: Do something with variable
      debug:
        var: image_uri

Downsides:

The main issue is that variables loaded with include_vars have higher precedence than variables from most other places, e.g. they override variables in group_vars and host_vars. Most often, this behaviour is not wanted for role defaults.
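The following sketch illustrates the problem under stated assumptions: the mirror URL and the role path placeholder are hypothetical. Even though the user sets image_uri in group_vars, on Ubuntu hosts the debug task reports the value from vars/Ubuntu.yml, because include_vars ranks above inventory and playbook group variables.

```yaml
---
# group_vars/all.yml
# What the user wants to use (hypothetical URL).
image_uri: 'https://mirror.example.com/images/custom.img'
---
# roles/<role>/vars/Ubuntu.yml
# Intended as a mere default.
image_uri: 'https://cloud-images.ubuntu.com/focal/20200616/focal-server-cloudimg-amd64.img'
---
# roles/<role>/tasks/main.yml
- name: Load OS-specific variables
  include_vars: '{{ ansible_facts.distribution }}.yml'

- name: Show the effective value
  debug:
    var: image_uri  # the vars/Ubuntu.yml value, not the one from group_vars
```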

Approach using prefixed variables, include_vars and conditional set_fact

The intention here is to give variables from e.g. host_vars and group_vars a higher priority than default role variables. To achieve this, include_vars is combined with a conditional set_fact:

Add a prefix such as double underscores __ to the default variables defined in the vars/ folder:

  • vars/Ubuntu.yml, vars/CentOS.yml etc.:
    __image_uri: https://...

Load the variables using include_vars in tasks/main.yml but assign the non-prefixed variable only if it has not been defined yet:

  • tasks/main.yml:
    - name: Load OS-specific default variables
      include_vars: '{{ ansible_facts.distribution }}.yml'
    
    - name: Set image_uri variable to default value
      set_fact:
        image_uri: "{{ __image_uri }}"
      when: image_uri|default(None) == None
    
    - name: Do something with variable
      debug:
        var: image_uri

The conditional set_fact is a workaround for the high variable precedence of include_vars which caused unwanted side effects in the previous approach.

This approach is used by Jeff Geerling (@geerlingguy).

Downsides:

To allow multiple role executions, e.g. using import_role or include_role, non-prefixed variables may have to be "undefined" i.e. reset to !!null / none:

  • tasks/main.yml:
    - name: Load OS-specific default variables
      include_vars: '{{ ansible_facts.distribution }}.yml'
    
    - name: Set image_uri variable to default value
      set_fact:
        image_uri: "{{ __image_uri }}"
      when: image_uri|default(None) == None
    
    - name: Do something with variable
      debug:
        var: image_uri
    
    - name: Cleanup role variables
      set_fact:
        image_uri: !!null

Otherwise, subsequent role executions might be affected by previous role executions. Setting variables to !!null has side effects though. Suppose one default (prefixed) variable is a Jinja2 template that uses a non-prefixed variable, such as:

__image: '{{image_uri|urlsplit("path")|basename}}'

Later one dumps all variables with e.g.:

- name: List all known variables and facts
  debug:
    var: hostvars

Ansible will try to evaluate __image but fails because image_uri has been set to !!null / none during the variable cleanup at the end of the role. Hence any Jinja2 template in default (prefixed) variables must handle invalid and !!null values properly to avoid those 'NoneType' object errors.
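One possible way to make such a template tolerant of !!null values is sketched below. This is an assumption about how one might guard the expression, not something taken from an existing role; it falls back to an empty string when image_uri is undefined or none.

```yaml
# vars/Ubuntu.yml, vars/CentOS.yml etc.
# The filter chain is only evaluated if image_uri is defined and not none,
# because Jinja2 evaluates just one branch of an inline if/else expression.
__image: '{{ (image_uri | urlsplit("path") | basename) if (image_uri | default(none)) else "" }}'
```

Every prefixed variable that references a non-prefixed one would need a similar guard, which adds noise to the variable files.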

One might be tempted to work around this by cleaning default variables as well:

- name: Load OS-specific default variables
  include_vars: '{{ ansible_facts.distribution }}.yml'

- name: Set image_uri variable to default value
  set_fact:
    image_uri: "{{ __image_uri }}"
  when: image_uri|default(None) == None

- name: Set image variable to default value
  set_fact:
    image: "{{ __image }}"
  when: image|default(None) == None

- name: Cleanup role default variables
  set_fact:
    __image_uri: !!null
    __image: !!null

- name: Do something with variables
  debug:
    msg: '{{ image_uri }} / {{ image }}'

- name: Cleanup role variables
  set_fact:
    image_uri: !!null
    image: !!null

Unfortunately this will break subsequent role executions, because set_fact has higher precedence than include_vars. Hence, once e.g. __image has been set to !!null using set_fact, a subsequent call to include_vars won't change that nullified value back to the value defined in the vars/*.yml files.

Another drawback is that set_fact causes Ansible to immediately evaluate and template variables:

Because of the nature of tasks, set_fact will produce ‘static’ values for a variable. Unlike normal ‘lazy’ variables, the value gets evaluated and templated on assignment.
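The effect of this eager evaluation can be seen in a short sketch. The variable names base_dir and conf_file are made up for illustration. The second task stores the already rendered string, so changing base_dir afterwards does not affect conf_file.

```yaml
- name: Define a base directory
  set_fact:
    base_dir: /etc/foo

- name: Build a path from it (templated now and stored as a plain string)
  set_fact:
    conf_file: '{{ base_dir }}/foo.conf'

- name: Change the base directory afterwards
  set_fact:
    base_dir: /etc/bar

- name: Show conf_file
  debug:
    var: conf_file  # still /etc/foo/foo.conf
```

A variable defined in defaults/main.yml with the same template would instead be rendered on each use and would follow the change of base_dir.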

Approach using include_vars with os_vars dictionary and conditional set_fact

This approach is similar to the previous one. First, use include_vars to fetch default variables from the vars/ folder. But instead of making them top-level variables, assign them to a variable named os_vars. Then loop through all variables in os_vars and set them as top-level variables if no variable with the same name already exists, i.e. they have not been defined by the user:

  • vars/Ubuntu.yml, vars/CentOS.yml etc.:

    image_uri: https://...
  • tasks/main.yml:

    - name: Fetch OS dependent variables
      include_vars:
        file: '{{ item }}'
        name: 'os_vars'
      with_first_found:
        - files:
            - '{{ ansible_facts.distribution }}_{{ ansible_facts.distribution_major_version }}.yml'
            - '{{ ansible_facts.distribution }}.yml'
            - '{{ ansible_facts.os_family }}_{{ ansible_facts.distribution_major_version }}.yml'
            - '{{ ansible_facts.os_family }}.yml'
          skip: true
    
    # we only override variables with our default if they have not been specified already
    # by default, the lookup function finds all varnames containing the string, therefore
    # we add ^ and $ to denote start and end of string, so this returns only exact matches
    - name: Set OS dependent variables, if not already defined by user  # noqa var-naming
      set_fact:
        '{{ item.key }}': '{{ item.value }}'
      when: "not lookup('varnames', '^' + item.key + '$')"
      loop: '{{ os_vars|dict2items }}'
    
    - name: Do something with variables
      debug:
        var: image_uri

This approach is used in Ansible collection devsec.hardening, e.g. refer to roles/ssh_hardening/tasks/hardening.yml.

Downsides:

Using set_fact causes Ansible to immediately evaluate and template variables:

Because of the nature of tasks, set_fact will produce ‘static’ values for a variable. Unlike normal ‘lazy’ variables, the value gets evaluated and templated on assignment.

Hence variables defined in vars/, loaded with include_vars and set with set_fact cannot include references to variables from the same file, because Ansible does not lazily evaluate those variables. For example:

  • vars/Ubuntu.yml:
    conf_dir: /etc/foo
    conf_file: "{{ conf_dir }}/foo.conf"

This will fail with a 'conf_dir' is undefined error if conf_dir has not been defined outside of vars/Ubuntu.yml before calling set_fact.

Approach using custom include_defaults plugin

Ansible Plugin include_defaults has been developed by Daniele Varrazzo (@dvarrazzo). But:

Warning! unfortunately this implementation of include_defaults has an issue: because it changes some data structures in-place it doesn't work when ansible runs in parallel on many hosts, because the process forks and the modified variables get lost.

The author does suggest some workarounds though. Details

include_defaults has been proposed for inclusion in Ansible but the pull request has been rejected for these reasons:

  • we've discussed this topic on ansible-devel and cannot pin down a use case where this can't be modelled more idiomatically through other ansible-means
  • we believe introducing extra syntax for this feature would add to complexity in learning the application that we would like to solve through more idiomatic means

Approach using lookup('file', ...)

This approach uses Lookup Plugins and indirections in defaults/main.yml to load the OS-specific default variables. Hence, role default variables have the intended precedence.

Put default variables into distinct files in the role's defaults/ folder:

  • defaults/Ubuntu.yml, defaults/CentOS.yml etc.:
    image_uri: https://...

Use the file lookup to load the OS-specific variables from disk and then convert this string to a dict using the from_yaml filter:

  • tasks/main.yml:
    - name: Load OS-specific default variables
      set_fact:
        role_default_vars: |
            {{ lookup('file', '../defaults/' + ansible_facts.distribution + '.yml')|from_yaml }}
    
    - name: Do something with variable
      debug:
        var: image_uri

The role's defaults/main.yml then uses indirections to initialize default variables from the role_default_vars dictionary:

  • defaults/main.yml:
    image_uri: "{{ role_default_vars['image_uri'] }}"

Downsides:

One assumption that must hold is that the set of variables is the same across all operating systems.

The lookup('file', ...) call does not render any Jinja2 template. Hence e.g. image: '{{image_uri|urlsplit("path")|basename}}' will not evaluate to a filename; instead it will contain the raw string {{image_uri|urlsplit("path")|basename}}. The template lookup plugin would render templates inside the defaults/*.yml files immediately during load. But template evaluation is done before the from_yaml filter is executed. Hence, if a template inside defaults/*.yml uses any default variable that is defined inside the same file, Ansible may raise errors because this variable has not yet been defined.

One has to force Ansible to render those templated default variables after the indirection inside defaults/main.yml or later on their first use.

Unfortunately, Ansible does not provide any filters that render templates and a custom filter plugin does not work either: The template rendering is done in class Templar but no (?) instance of this class is available inside the FilterModule classes.

An instance of Templar is available to the LookupModule though. Hence a custom LookupModule class makes it possible to force Ansible into rendering the templates, e.g. inside defaults/main.yml. An example lookup plugin might look like this:

  • NAMESPACE/COLLECTION/plugins/lookup/template.py (irrelevant code stripped for the sake of brevity):

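    # Imports assume the module layout of Ansible 2.9-era releases.
    from ansible.errors import AnsibleError
    from ansible.module_utils.six import string_types
    from ansible.plugins.lookup import LookupBase
    from ansible.utils.unsafe_proxy import AnsibleUnsafeBytes, AnsibleUnsafeText

    class LookupModule(LookupBase):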
    
        def run(self, terms, variables=None, **kwargs):
            if variables is not None:
                self._templar.available_variables = variables
    
            ret = []
            for term in terms:
    
                if isinstance(term, AnsibleUnsafeBytes):
                    term = super(AnsibleUnsafeBytes, term).decode().encode()
                elif isinstance(term, AnsibleUnsafeText):
                    term = super(AnsibleUnsafeText, term).encode().decode()
    
                if not isinstance(term, string_types):
                    raise AnsibleError('Invalid setting identifier, "%s" is not a string, its a %s' % (term, type(term)))
    
                ret.append(self._templar.template(term, fail_on_undefined=True))
            return ret

The eagle-eyed reader might wonder about super(AnsibleUnsafeText, term).encode().decode(): Ansible marks text (i.e. bytes and strings), that is assigned using set_fact, as unsafe. In practice, Ansible wraps unsafe texts in AnsibleUnsafe objects. For example, all variables inside the role_default_vars dictionary are marked unsafe. Unsafe variables are skipped during template rendering. To remove the outer AnsibleUnsafe wrapper, strings are encoded to bytes and decoded back to strings.

Side note: Lookup plugins do provide an allow_unsafe=True argument, which skips this unsafe wrapper, but this only applies to the current evaluation context: Once task set_fact: { role_default_vars: "{{ lookup('file', ..., allow_unsafe=True)|from_yaml }}" } has been completed, all entries inside the role_default_vars dictionary are unsafe (AnsibleUnsafe) texts ultimately. One cannot simply call the custom lookup plugin inside the same evaluation context for the same reason it is not possible to use the template lookup plugin here.

Let's get back to how to use the custom template.py lookup plugin:

  • defaults/main.yml:
    image_uri: "{{ lookup('NAMESPACE.COLLECTION.template', role_default_vars['image_uri']) }}"

First, variable image_uri is extracted from the role_default_vars dict, then plugins/lookup/template.py removes the AnsibleUnsafe wrapper and uses class Templar to render the Jinja2 template. This works because Ansible delays these steps until image_uri is actually used.

NOTE: It still has to be determined whether this approach causes side effects.

Approach using modified group variable precedence merge order

Change Ansible's group variable precedence rules with configuration setting VARIABLE_PRECEDENCE as explained by George Shuklin.

Downsides:

Ansible only allows changing the merge order of group variables. It is not possible to completely override Ansible's variable precedence rules.

Changing the group variable precedence rules might cause conflicts with external Ansible content, i.e. third party roles from Ansible Galaxy which most likely assume default precedence rules.

Messing with variable precedence rules might cause confusion for external developers and might be counterintuitive even for developers working on the project.

Approach using OS-agnostic dictionaries in defaults/main.yml

Create OS-agnostic dictionaries in defaults/main.yml and assign suitable values from those dictionaries to default variables, using e.g. ansible_facts.distribution as the key:

  • defaults/main.yml:
    image_uri: |-
        {{
            {
                'CentOS': 'https://...',
                'Ubuntu': 'https://...'
            }[ansible_facts.distribution]
        }}

This approach is used in Ansible collection jm1.cloudy, e.g. refer to roles/tftpd/defaults/main.yml.

Downsides:

As before, one assumption that must hold is that the set of variables is the same across all operating systems.

With an increasing number of variables and operating systems, the syntax might become hard to read.
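One way to keep such defaults readable might be to move the mapping into a helper variable of its own. The following is only a sketch: the name image_uri_by_distribution is hypothetical and not taken from any existing role, and the elided URLs are kept as in the example above.

```yaml
# defaults/main.yml
# Hypothetical helper variable holding one value per distribution.
image_uri_by_distribution:
  CentOS: 'https://...'
  Ubuntu: 'https://...'

# Role defaults are evaluated lazily, so the key is looked up on first use.
image_uri: '{{ image_uri_by_distribution[ansible_facts.distribution] }}'
```

Users can still override image_uri directly, or replace the whole mapping, from group_vars or host_vars, because both remain ordinary role defaults. The cost is one extra variable name per default.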

Author

Jakob Meng @jm1 (github, galaxy, web)

JM1 commented May 30, 2024

@wookietreiber If you have some time and leisure, give Ansible inventories a try. Try to split configuration from code. For example, try to move host-specific configuration to the inventory. Only keep functionality and configuration which applies to all distributions and hosts in roles. If configuration applies to several hosts, try to put that configuration into groups (in the inventory) first. Try to store configuration in roles only as a last resort. It will greatly enhance reusability, composability and readability of your roles and your Ansible collection in general!
