@alikins
Created October 1, 2018 15:03
3.0-ish idea/proposals
Common problem areas
Ssh
Errors from ‘ssh’ command are obscure
But not meaningless, so they could be processed into more useful ansible error messages
Early versions of https://github.com/ansible/ansible/pull/17598
https://github.com/ansible/ansible/pull/16649
Default verbosity level hides useful troubleshooting info
Ie, no stderr
Even at -vvv or higher, the ssh stderr output is presented in an unreadable way by default callbacks
controlMaster/controlPersist also hides/obscures ssh troubleshooting info
No easy way to temporarily disable it
No easy way to collect ssh config info from ansible
‘ssh -G’
- Running ‘ssh’ from the cli is not equivalent to how it is invoked from ansible
- Even when cut & pasting the ssh command line shown at higher verbosity levels (doesn’t take env vars or user config into account)
https://github.com/ansible/ansible/pull/23241
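The "process ssh errors into more useful ansible error messages" idea could look roughly like the sketch below. The pattern table, the hint wording, and the function name are made up for illustration and are not taken from the PRs linked above; the matched strings are common OpenSSH client messages.

```python
import re

# Hypothetical table: (regex matched against ssh stderr, friendlier hint).
SSH_ERROR_HINTS = [
    (re.compile(r"Permission denied \((?P<methods>[^)]*)\)"),
     "Authentication failed (server offered: {methods}). "
     "Check the remote user, key, or password."),
    (re.compile(r"Could not resolve hostname (?P<host>\S+?):"),
     "The hostname '{host}' did not resolve. Check the inventory and DNS."),
    (re.compile(r"Connection timed out"),
     "The connection timed out. Check the network path, firewall, and port."),
    (re.compile(r"Host key verification failed"),
     "The host key does not match known_hosts for this host."),
]


def explain_ssh_stderr(stderr):
    """Return a list of hints for every recognized line of ssh stderr."""
    hints = []
    for line in stderr.splitlines():
        for pattern, template in SSH_ERROR_HINTS:
            match = pattern.search(line)
            if match:
                # Named groups from the regex fill the template placeholders.
                hints.append(template.format(**match.groupdict()))
    return hints
```

Unrecognized lines produce no hint, so the raw stderr would still need to be shown alongside whatever this returns.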
- Playbook include unpredictability/complexity
- No one understands what a ‘static’ include is or why it causes/fixes problems
- Conditionals and handlers on or in includes are very confusing
- Pretty much no one understands the interactions
- Lots of unexpected include behavior filed as issues
- Undefined var behavior
- Obtuse error messages [1]
- Variable precedence
- Figuring out why a variable ended up with a particular value is almost impossible
- Proposal: https://github.com/ansible/ansible/compare/devel...alikins:varman_show_precedence (example output: https://gist.github.com/alikins/405352d8521ce8792fc2f72fde26f9ef)
YAML errors
- Current yaml errors are pretty good, but they are often still vague, and until https://github.com/ansible/ansible/pull/24468 was merged they were often wrong.
- Jinja2/Templating debugging
- ansible var scope/precedence info [prototyped: varman_show_precedence branch]
- see var_man_changed_show_change branch
- add scope/scope_info to playbook Base (Block?)
- scope label ('role_vars', 'extra_vars', etc)
- scope info
- task/role/etc name
- file path of file var came from ('group_vars/all', 'role/myrole/defaults/main.yml', etc)
- various 'get a dict of vars from some source' bits called in vars.manager.Manager.get_vars() could
return the info
- as atts on the object?
- as internal/magic dict items? ('_ansible_scope_label', etc)
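One way to picture the scope label / scope info idea is below. This is a standalone sketch, not the code from the varman_show_precedence branch: `ScopeInfo`, `ScopedVars`, and the 'role_defaults' label are hypothetical names, and the real merge lives in the vars manager.

```python
from collections import namedtuple

# Hypothetical record: scope label, task/role/etc name, file the var came from.
ScopeInfo = namedtuple("ScopeInfo", ["label", "name", "path"])


class ScopedVars:
    """Merge var sources in precedence order and remember each value's origin."""

    def __init__(self):
        self._values = {}
        self._history = {}

    def update(self, new_vars, scope):
        # Later calls win, mirroring "lowest precedence first" merging.
        for key, value in new_vars.items():
            self._values[key] = value
            self._history.setdefault(key, []).append((scope, value))

    def get(self, key):
        return self._values[key]

    def explain(self, key):
        """Every (scope, value) pair that set this var, oldest first."""
        return list(self._history.get(key, []))


merged = ScopedVars()
merged.update({"port": 80},
              ScopeInfo("role_defaults", "myrole", "role/myrole/defaults/main.yml"))
merged.update({"port": 8080}, ScopeInfo("extra_vars", None, None))

for scope, value in merged.explain("port"):
    print("%s (%s): port=%r" % (scope.label, scope.path, value))
```

Keeping the history next to the merged values, rather than as magic dict items, avoids leaking bookkeeping keys into templates, at the cost of a second lookup structure.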
- vault extensibility
- further split envelope/wrappers out
- plugins for vault secrets
- plugins for vault ciphers
- figure out what we need to enable PKI based tools (ie, gpg)
- make password rounds configurable
- likely needs to add the number of rounds to envelope format
- vault edit inline vault
- parse yaml
- decrypt !vault-encrypted
- replace with !vault-plaintext
- parse saved yaml
- encrypted !vault-plaintext
- save with encrypted values
- can't serialize to yaml
- need to do text munging
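For the configurable rounds item, a rough sketch of carrying the round count in the envelope header. The `rounds=` field is purely hypothetical (nothing like it exists in the current `$ANSIBLE_VAULT;1.1;AES256` header), and the default of 10000 is an assumption about the current PBKDF2 setting.

```python
import hashlib

DEFAULT_ROUNDS = 10000  # assumed current default, used when the header has no rounds


def parse_envelope_header(header_line):
    """Parse '$ANSIBLE_VAULT;version;cipher[;vault_id][;rounds=N]'."""
    fields = header_line.strip().split(";")
    if fields[0] != "$ANSIBLE_VAULT" or len(fields) < 3:
        raise ValueError("not a vault envelope header: %r" % header_line)
    info = {"version": fields[1], "cipher": fields[2],
            "vault_id": None, "rounds": DEFAULT_ROUNDS}
    for extra in fields[3:]:
        if extra.startswith("rounds="):  # hypothetical extension field
            info["rounds"] = int(extra[len("rounds="):])
        else:
            info["vault_id"] = extra
    return info


def derive_key(password, salt, rounds, length=32):
    """PBKDF2-HMAC-SHA256 with the round count taken from the envelope."""
    return hashlib.pbkdf2_hmac("sha256", password, salt, rounds, dklen=length)
```

Old envelopes without the field would keep decrypting with the default, which is the main reason to put the count in the envelope rather than in config alone.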
- better serialization/dump/yaml of playbook related objects
Goal is to make the playbook objects easier to debug and troubleshoot for users and developers.
# easier, more useful parts
- for dumping parsed/compiled Playbook object as it exists before execution
- start with leaf nodes (FieldAttributes)
- get unsafe canonical yaml working
- get safe canonical yaml working
- get safe non-canonical yaml working (ie, output that looks like a playbook)
- proceed up the tree
- make sure container/list type objects know
to serialize their contents
- add debug/troubleshooting hooks for displaying/presenting/persisting this info
to users. --dump or whatever
# shouldn't be that hard actually
- start trying to dump 'mid execution' Playbook
- ie, latest values of vars
- any new vars
- any new blocks/tasks/plays/roles
# harder, but opens up arch even more
- after everything is safe and non-canonical yaml, maybe repeat with:
- json
- pickle
- repr
- maybe str
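The "start with leaf nodes, then proceed up the tree" approach, sketched with json because it is in the stdlib. `Task` and `Block` here are toy stand-ins, not the real playbook classes, and `to_data()` is a hypothetical method name.

```python
import json


class Task:
    """Toy leaf object standing in for a playbook task."""

    def __init__(self, name, action, args):
        self.name, self.action, self.args = name, action, args

    def to_data(self):
        return {"name": self.name, self.action: dict(self.args)}


class Block:
    """Toy container: it knows to serialize its contents."""

    def __init__(self, tasks):
        self.tasks = tasks

    def to_data(self):
        return {"block": [to_data(t) for t in self.tasks]}


def to_data(obj):
    """Reduce an object tree to plain dicts/lists/scalars, leaves first."""
    if hasattr(obj, "to_data"):
        return obj.to_data()
    if isinstance(obj, dict):
        return {str(k): to_data(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_data(v) for v in obj]
    if obj is None or isinstance(obj, (str, int, float, bool)):
        return obj
    return repr(obj)  # last resort so a dump never fails on an odd type


def dump(obj):
    return json.dumps(to_data(obj), indent=2, sort_keys=True)
```

Once everything reduces to plain data, swapping the final `json.dumps` for a yaml dumper is the easy part.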
- logging
- I can dream.
- at least setup a logger correctly
- and use debug()/exception()
- playbook, inv, host,group, etc serialize/deserialize support
- gh bugs for display log
- logger name
- not standard
- not useful/hard to predict/misused
- not using %(process)s
- log file level tied to cli/display level
- doesn't create logger unless using log file
- nothing else can attach a handler to the display log
- only uses two log levels
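A minimal sketch of "setup a logger correctly" with stdlib logging, addressing the points above: a predictable logger name, the process id in the format, a logger that exists without a log file, and handlers anyone can attach. The 'ansible.example' name and `configure_logging()` are hypothetical.

```python
import logging

LOG_FORMAT = "%(asctime)s %(process)d %(name)s %(levelname)s %(message)s"

# Library side: the logger always exists, with a no-op handler by default.
logging.getLogger("ansible").addHandler(logging.NullHandler())
log = logging.getLogger("ansible.example")  # hypothetical module-derived name


def configure_logging(log_file=None, level=logging.DEBUG):
    """Application side: the log level is independent of cli verbosity."""
    base = logging.getLogger("ansible")
    handler = logging.FileHandler(log_file) if log_file else logging.StreamHandler()
    handler.setFormatter(logging.Formatter(LOG_FORMAT))
    base.addHandler(handler)
    base.setLevel(level)
    return handler


def risky():
    try:
        {}["missing"]
    except KeyError:
        # exception() logs at ERROR and attaches the traceback.
        log.exception("lookup failed")
```

Because the handlers hang off the "ansible" logger, a callback plugin or an embedding app could attach its own handler without touching display.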
- ansible callbacks
- freeze current callback api
- add a new one with better versioning/introspection
- make sure new interface is more clearly display/progress callbacks
and not api hooks or entry points (ie, internals are 'ro')
- split single callback interface into smaller composable
parts
- Task callbacks
- Play callbacks
- Playbook callbacks
- Handler callbacks (see 'ansible handlers')
- ansible process/instance/run lifetime callbacks
- app startup
- inventory load
- etc
- MAYBE: try adding 'rw' hook/slot/callback API entrypoints
- yum plugin-ish
- not a great example of maintainable interfaces, but it
is/was a very flexible/powerful approach
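A sketch of what smaller composable callback parts with 'ro' internals might look like. All names here (`TaskCallbacks`, `PlayCallbacks`, `CallbackRegistry`, the method names) are invented for illustration and are not the existing callback plugin API.

```python
import copy


class TaskCallbacks:
    """Implement only the parts you care about; the rest are no-ops."""

    def on_task_start(self, task_name):
        pass

    def on_task_ok(self, task_name, result):
        pass


class PlayCallbacks:
    def on_play_start(self, play_name):
        pass


class CallbackRegistry:
    API_VERSION = (3, 0)  # hypothetical version for introspection

    def __init__(self):
        self._plugins = []

    def register(self, plugin):
        self._plugins.append(plugin)

    def emit(self, event, *args):
        for plugin in self._plugins:
            handler = getattr(plugin, event, None)
            if callable(handler):
                # Plugins get copies, so they cannot change internals ('ro').
                handler(*copy.deepcopy(args))
```

Deep copies are the blunt way to get read-only behavior; a real version would probably hand out immutable views to avoid the copy cost on large results.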
- make DataLoader pluggable
- split into DataReader and DataDeserializer
- FileReader, VaultFileReader
- yaml/json
- reuse/share some/more inventory code?
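The reader/deserializer split could be as small as this. It is a sketch only: `DecryptingReader` stands in for the VaultFileReader idea with a caller-supplied `decrypt` callable, and json stands in for yaml to keep the example stdlib-only.

```python
import json


class FileReader:
    """Reads raw bytes; knows nothing about the data format."""

    def read(self, path):
        with open(path, "rb") as handle:
            return handle.read()


class DecryptingReader:
    """Wraps another reader and decrypts what it returns."""

    def __init__(self, inner, decrypt):
        self.inner = inner
        self.decrypt = decrypt  # hypothetical callable: bytes -> bytes

    def read(self, path):
        return self.decrypt(self.inner.read(path))


class JsonDeserializer:
    """Turns bytes into data; knows nothing about where they came from."""

    def loads(self, raw):
        return json.loads(raw.decode("utf-8"))


class DataLoader:
    def __init__(self, reader, deserializer):
        self.reader = reader
        self.deserializer = deserializer

    def load(self, path):
        return self.deserializer.loads(self.reader.read(path))
```

With the two halves separate, a new source (http, a database) or a new format only needs one small class each.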
- ansible payload proposal
- make module_common and the bits of executor/ that build anziballs a separate cli
- or at least decouple the code
- build modules based on target platform/runtime/versions
- ie, python-posix-ansible-2.4 or powershell-windows-ansible-2.4 or golang-posix-ansible-2.5
- make it easier to do things like add PEX support
- move base.Base._post_validate logic into FieldAttributes subclasses?
- better serialization/dump/yaml of playbook/
# continue from above
# getting pretty darn hard
- then maybe try supporting dumping back to 'original' playbook form
- means tracking extra info
- dir/filenames of includes
# very unlikely
- then maybe try supporting dumping back to 'original, pre templating' playbook form
- means tracking the source of vars and parent templates
- means serializing jinja template objects, if that is a thing
- ansible handlers
- per block handlers
- fail/error/changed/skipped handlers
- per host handlers
- or pass some 'user data' obj/ref to handlers with
extra info (like the host name, or error info, etc)
- support a generic handler that matches all notifies
- mostly for debugging
- implicit handlers
- pre/post task
- fail/changed/skipped mentioned above
- task would always notify/emit 'task done handler' etc
- default would be no handlers
- add handler specific hooks to callback plugins
- v3_on_handler_called
- v3_on_handler_ok
- v3_on_handler_error
# this is gobject or DOM style property notifications. Non trivial, but super useful.
- let varmanager emit handler notifies
- tasks/plays/roles/etc using a set of vars could
set 'listen' for varmanager change notifies
- ie, like GObject 'properties' and prop change signals
or web browser DOM 'mutationObservers'
- set_fact: blip='foobar'
- would 'notify' a 'facts_blip.changed' handler
- if there is a handler listening for 'facts_blip.changed', it would
get notified and run at next appropriate time (idle loop-ish)
- if handlers are per block/task/play/playbook/role, then each could have
a handler listening for 'facts_blip.changed'
- block could ignore it and let it propagate
- play would catch it, handle it (say, restart a service for classic example) and
stop propagating it
- if play doesn't handle it, propagate to playbook
- ... then onto global
- ... then onto universal persistent handler? (ie, tower etc)
- handling changing vars event driven would allow for setting/changing global
semi-immutable vars (like inventory)
- ie, queue var change, idle loop, pop it, change it, queue 'changed' signal
- then next (concurrent-ish) var change is queued, idle loop, popped, change and emit 'changed'
- any block with (implicit, default) change handlers would handle changed signals before
using their local var closure
- possible impls?
- strategy checks for task result _ansible_notify
- task executor sets _ansible_notify from Task 'notify' field attribute
- strategy only does handlers on success and on 'changed'
- extract the handler running code to method (deep in strategy _process_pending_results)
- amongst other things, this is also where 'handler hierarchy resolution' is handled (ie, role or play or global)
- param the result field for handler names ('_ansible_notify')
- each task result stanza could check for its handlers. ie, failed would _get_handlers(name='_ansible_failed_notify')
- handle ok/failed/skipped/unreachable * changed/notchanged
- add_host/add_group/diff etc as internal implicit handlers?
- need a Role like HandlerDef to have a ds for handler/listen args ie
- notify:
- some_task:
    src: foo
    dest: /bar
  register: some_task_result
  notify:
    - my_blip_handler:
        host: the_other_machine
        result: some_task_result
    - restart_a_service_or_whatever:
        svc: httpd
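The varmanager change notifies described earlier (queue the change, apply it in the idle loop, then let 'facts_blip.changed' bubble from block to play to playbook) can be sketched as below. `Scope` and this toy `VarManager` are hypothetical; a handler returning True means "handled, stop propagating".

```python
from collections import deque


class Scope:
    """A block/play/playbook level that can listen for change signals."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self._listeners = {}

    def listen(self, signal, handler):
        self._listeners.setdefault(signal, []).append(handler)

    def dispatch(self, signal, payload):
        """Bubble up the scope chain; return the name of the scope that handled it."""
        scope = self
        while scope is not None:
            for handler in scope._listeners.get(signal, []):
                if handler(scope, payload):
                    return scope.name
            scope = scope.parent
        return None


class VarManager:
    def __init__(self):
        self.vars = {}
        self._queue = deque()

    def set_fact(self, scope, name, value):
        # Only queue the change; nothing is modified until the idle loop runs.
        self._queue.append((scope, name, value))

    def idle(self):
        handled_by = []
        while self._queue:
            scope, name, value = self._queue.popleft()
            self.vars[name] = value
            payload = {"name": name, "value": value}
            handled_by.append(scope.dispatch("facts_%s.changed" % name, payload))
        return handled_by
```

Applying changes one at a time from a queue is what would make the semi-immutable globals safe to change: every listener sees the 'changed' signal before the next change is applied.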
- ansible update/partial results
- see 'update_json' for one approach
- would be nice to have more connection channels for 'out of band' control/updates/partial results:
- would like to avoid:
- multiplexing multiple 'channels' to just stdout/stderr
- having to do locking around output streams to avoid corrupt messages
- having to do any sort of 'escape' from stdout stream
- for ex, if random module writes out the same format as proposed json updates
- having to do any additional parsing of stdout
- better would be to be able to get rid of some filter_non_json kind of things
- related: see module_log branch for returning log records as json
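To illustrate a separate channel for partial results (instead of multiplexing onto stdout), here is a local-only sketch using an extra pipe handed to a child process. It is POSIX-only, it is not how the 'update_json' approach works, and a real version would need the connection plugin to carry the extra channel to the remote side.

```python
import json
import os
import subprocess
import sys

# Toy "module": progress goes to the extra fd, the final result to stdout.
CHILD = r'''
import json, os, sys
with os.fdopen(int(sys.argv[1]), "w") as updates:
    for pct in (25, 50, 100):
        updates.write(json.dumps({"progress": pct}) + "\n")
print(json.dumps({"changed": True}))
'''


def run_with_update_channel():
    read_fd, write_fd = os.pipe()
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD, str(write_fd)],
        stdout=subprocess.PIPE,
        pass_fds=(write_fd,),  # keep the write end open in the child
    )
    os.close(write_fd)  # parent keeps only the read end
    with os.fdopen(read_fd) as updates:
        # One JSON document per line; EOF arrives when the child closes its end.
        partial = [json.loads(line) for line in updates]
    stdout, _ = proc.communicate()
    return partial, json.loads(stdout.decode("utf-8"))
```

Since stdout only ever carries the final result, there is no escaping, no locking around a shared stream, and no filtering of non-json noise on the update channel.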
Troubleshooting / Support tools
Better logging support
Ansible core does not really use logging. There are some bits of display that can also log to a log file, but it has a lot of problems
install/env collection tools
Ala ‘sosreport’ or similar tools
Collect
where/how ansible is installed
Python modules used
Configuration
Env
Info about external tools used
Ssh
Local and remote config
Logs if possible
Sudo/su etc
Shell type/version
Ansible related system logging
Could be playbook/role based
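A very small start on such a collection tool, as a sketch: it gathers a few of the items listed above into one dict. The report layout and function names are made up, and it skips remote config and logs entirely.

```python
import json
import os
import platform
import shutil
import subprocess
import sys


def tool_version(name, *args):
    """Path and first line of version output for an external tool, if present."""
    path = shutil.which(name)
    if path is None:
        return {"path": None, "version": None}
    try:
        proc = subprocess.run(
            [path] + list(args),
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # 'ssh -V' prints its version on stderr
            universal_newlines=True,
            timeout=10,
        )
        lines = proc.stdout.strip().splitlines()
    except (OSError, subprocess.SubprocessError):
        lines = []
    return {"path": path, "version": lines[0] if lines else None}


def collect():
    return {
        "python": {"executable": sys.executable,
                   "version": platform.python_version()},
        "platform": platform.platform(),
        "ansible": tool_version("ansible", "--version"),
        "ssh": tool_version("ssh", "-V"),
        "sudo": tool_version("sudo", "-V"),
        "shell": os.environ.get("SHELL"),
        "env": {k: v for k, v in os.environ.items() if k.startswith("ANSIBLE_")},
    }


if __name__ == "__main__":
    print(json.dumps(collect(), indent=2, sort_keys=True))
```

Per-host ssh config could be added by running 'ssh -G' for each inventory host, though env values and config output would need scrubbing before anyone pastes the report into a bug.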
End notes
1. fatal: [testhost]: FAILED! => {
"failed": true,
"msg": "the field 'args' has an invalid value, which appears to include a variable that is undefined.The error was: 'test' is undefined\n\nThe error appears to have been in '/root/ansible/test/integration/targets/any_errors_fatal/test_fatal.yml': line 7, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- shell: \"echo {{ test }}\"\n ^ here\nWe could be wrong, but this one looks like it might be an issue with\nmissing quotes. Always quote template expression brackets when they\nstart a value. For instance:\n\n with_items:\n - {{ foo }}\n\nShould be written as:\n\n with_items:\n - \"{{ foo }}\"\n\nexception type: <class 'ansible.errors.AnsibleUndefinedVariable'>\nexception: 'test' is undefined"
}. Huh?