3.0-ish idea/proposals
Gist alikins/373bcce9e283bb37bb5e78ce665e519c, created October 1, 2018 15:03
Common problem areas
Ssh
Errors from ‘ssh’ command are obscure
But not meaningless, so they could be processed into more useful ansible error messages
Early versions of https://github.com/ansible/ansible/pull/17598
https://github.com/ansible/ansible/pull/16649
Default verbosity level hides useful troubleshooting info
Ie, no stderr
Even at -vvv or higher, the ssh stderr output is presented in an unreadable way by the default callbacks
ControlMaster/ControlPersist also hides/obscures ssh troubleshooting info
No easy way to temporarily disable it
No easy way to collect ssh config info from ansible
‘ssh -G’
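`ssh -G <host>` makes the OpenSSH client (6.8 and later) print the configuration it would use for that host, after applying the system and user ssh_config files, and exit without connecting. A minimal sketch of collecting that from Python follows; `parse_ssh_g_output` and `collect_ssh_config` are hypothetical helpers, not existing Ansible APIs, and `extra_args` is assumed to carry the same `-o` options Ansible would pass so the result matches what Ansible actually runs.

```python
import shutil
import subprocess
from collections import defaultdict


def parse_ssh_g_output(text):
    """Parse `ssh -G` output ("key value" per line) into {key: [values]}.

    Keys such as identityfile can appear more than once, so every key
    maps to a list of values in the order ssh printed them.
    """
    config = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition(" ")
        config[key.lower()].append(value)
    return dict(config)


def collect_ssh_config(host, extra_args=()):
    """Ask the local ssh client which config it would use for `host`.

    Hypothetical helper: `extra_args` should hold the -o options Ansible
    would add, otherwise the output reflects plain `ssh`, not Ansible.
    """
    ssh = shutil.which("ssh")
    if ssh is None:
        raise RuntimeError("no ssh client found on PATH")
    proc = subprocess.run(
        [ssh, "-G", *extra_args, host],
        capture_output=True, text=True, check=True,
    )
    return parse_ssh_g_output(proc.stdout)
```

For example, `collect_ssh_config("web1", ["-o", "ControlMaster=auto"])["controlmaster"]` would show the effective ControlMaster setting for that host.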
- Running ‘ssh’ from the cli is not equivalent to how it is invoked from ansible
- Even when cut & pasting the ssh command line shown at higher verbosity levels (it doesn’t take env vars or user config into account)
https://github.com/ansible/ansible/pull/23241
- Playbook include unpredictability/complexity
- No one understands what a ‘static’ include is or why it causes/fixes problems
- Conditionals and handlers on or in includes are very confusing
- Pretty much no one understands the interactions
- Lots of unexpected include behavior filed as issues
- Undefined var behavior
- Obtuse error messages [1]
- Variable precedence
- Figuring out why a variable ended up with a particular value is almost impossible
- Proposal: https://github.com/ansible/ansible/compare/devel...alikins:varman_show_precedence (example output: https://gist.github.com/alikins/405352d8521ce8792fc2f72fde26f9ef)
YAML errors
- Current yaml errors are pretty good, but they are often still really vague. And until https://github.com/ansible/ansible/pull/24468 was recently merged, they were often wrong.
- Jinja2/Templating debugging
- ansible var scope/precedence info [prototyped: varman_show_precedence branch]
- see var_man_changed_show_change branch
- add scope/scope_info to playbook Base (Block?)
- scope label ('role_vars', 'extra_vars', etc)
- scope info
- task/role/etc name
- path of the file a var came from ('group_vars/all', 'role/myrole/defaults/main.yml', etc)
- the various 'get a dict of vars from some source' bits called in vars.manager.VariableManager.get_vars() could return the info
- as attrs on the object?
- as internal/magic dict items? ('_ansible_scope_label', etc)
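A minimal sketch of the "return the info" idea: each var source carries a scope label and file path, and the merge records every assignment so the winner (and what it overrode) can be shown. `VarSource` and `merge_with_provenance` are hypothetical names, and the merge is a plain last-one-wins replace; real precedence has more rules (hash merging, host vs group ordering) than this models.

```python
from dataclasses import dataclass, field


@dataclass
class VarSource:
    """One 'dict of vars from some source' plus where it came from."""
    scope_label: str      # e.g. 'role_defaults', 'group_vars', 'extra_vars'
    path: str             # file the vars were loaded from ('' if none)
    data: dict = field(default_factory=dict)


def merge_with_provenance(sources):
    """Merge sources given lowest-precedence first, tracking every assignment.

    Returns (merged_vars, provenance) where provenance[name] lists
    (scope_label, path, value) in the order applied; the last entry won.
    """
    merged, provenance = {}, {}
    for src in sources:
        for name, value in src.data.items():
            merged[name] = value
            provenance.setdefault(name, []).append(
                (src.scope_label, src.path, value))
    return merged, provenance
```

With provenance kept alongside the values, "why is http_port 8080?" becomes a lookup of `provenance["http_port"]` instead of a code-reading exercise.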
- vault extensibility
- further split envelope/wrappers out
- plugins for vault secrets
- plugins for vault ciphers
- figure out what we need to enable PKI based tools (ie, gpg)
- make password rounds configurable
- likely needs to add the number of rounds to the envelope format
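A minimal sketch of carrying the round count in the envelope so the key derivation can be tuned per file. The `1.3` version and the `rounds=` header field are made up for illustration (existing headers look like `$ANSIBLE_VAULT;1.1;AES256` or `$ANSIBLE_VAULT;1.2;AES256;<vault-id>`), `DEFAULT_ROUNDS` is an assumed fallback, and the 32/32/16 byte split of the derived key material is this sketch's own layout. The cipher itself is omitted.

```python
import hashlib

HEADER_PREFIX = "$ANSIBLE_VAULT"
FORMAT_VERSION = "1.3"   # hypothetical version that carries a rounds field
DEFAULT_ROUNDS = 10000   # assumed fallback when a header has no rounds field


def make_header(cipher="AES256", rounds=DEFAULT_ROUNDS):
    return "%s;%s;%s;rounds=%d" % (HEADER_PREFIX, FORMAT_VERSION, cipher, rounds)


def parse_header(line):
    """Return (version, cipher, rounds) from a vault header line.

    Extra fields without '=' (such as a 1.2 vault-id label) are ignored.
    """
    parts = line.strip().split(";")
    if parts[0] != HEADER_PREFIX or len(parts) < 3:
        raise ValueError("not a vault header: %r" % line)
    version, cipher = parts[1], parts[2]
    rounds = DEFAULT_ROUNDS
    for extra in parts[3:]:
        key, sep, value = extra.partition("=")
        if sep and key == "rounds":
            rounds = int(value)
    return version, cipher, rounds


def derive_keys(password, salt, rounds):
    """PBKDF2-HMAC-SHA256 with a caller-chosen round count.

    Returns (cipher_key, hmac_key, iv) as 32, 32 and 16 bytes.
    """
    material = hashlib.pbkdf2_hmac("sha256", password, salt, rounds, dklen=80)
    return material[:32], material[32:64], material[64:]
```

Reading the round count from the header keeps old files decryptable (they fall back to the default) while new files can raise the cost without a format break for every reader.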
- vault edit of inline vault values
- parse yaml
- decrypt !vault-encrypted
- replace with !vault-plaintext
- parse saved yaml
- encrypt !vault-plaintext
- save with encrypted values
- can't just serialize back to yaml (the round trip would lose comments and layout)
- need to do text munging
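A minimal sketch of the text munging step: find `key: !<tag> |` block scalars line by line, hand each dedented body to a transform, and write it back under the other tag, leaving every other line untouched. `swap_tagged_blocks` is a hypothetical helper, `!vault-plaintext` is the tag proposed above, and the `transform` callables stand in for real vault decrypt/encrypt calls. Only literal blocks on mapping values are handled.

```python
import re
import textwrap

# "key: !some-tag |" on a mapping value; group 1 = text before the tag.
TAG_RE = re.compile(r"^(\s*[^\s#][^:]*:\s*)!([A-Za-z0-9_-]+)\s*\|\s*$")


def _indent(line):
    return len(line) - len(line.lstrip())


def swap_tagged_blocks(text, from_tag, to_tag, transform):
    """Rewrite every `key: !from_tag |` block as `key: !to_tag |`.

    `transform` gets the dedented block body (e.g. decrypt or encrypt)
    and returns the new body. Line-based, so comments and layout
    elsewhere in the file are preserved.
    """
    lines = text.splitlines()
    out, i = [], 0
    while i < len(lines):
        m = TAG_RE.match(lines[i])
        if not m or m.group(2) != from_tag:
            out.append(lines[i])
            i += 1
            continue
        key_indent = _indent(lines[i])
        i += 1
        body = []
        # The block scalar runs while lines are blank or indented past the key.
        while i < len(lines) and (not lines[i].strip() or _indent(lines[i]) > key_indent):
            body.append(lines[i])
            i += 1
        # Give trailing blank lines back to the surrounding document.
        while body and not body[-1].strip():
            body.pop()
            i -= 1
        new_body = transform(textwrap.dedent("\n".join(body)))
        out.append("%s!%s |" % (m.group(1), to_tag))
        pad = " " * (key_indent + 2)
        out.extend(pad + line for line in new_body.splitlines())
    return "\n".join(out) + "\n"
```

An edit session would be `swap_tagged_blocks(text, "vault", "vault-plaintext", decrypt)`, open the editor, then `swap_tagged_blocks(edited, "vault-plaintext", "vault", encrypt)` before saving.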
- better serialization/dump/yaml of playbook related objects
Goal is to make the playbook objects easier to debug and troubleshoot for users and developers.
# easier, more useful parts
- for dumping the parsed/compiled Playbook object as it exists before execution
- start with leaf nodes (FieldAttributes)
- get unsafe canonical yaml working
- get safe canonical yaml working
- get safe non-canonical yaml working (ie, output that looks like a playbook)
- proceed up the tree
- make sure container/list type objects know how to serialize their contents
- add debug/troubleshooting hooks for displaying/presenting/persisting this info to users. --dump or whatever
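A minimal sketch of the leaf-first approach: scalars pass through, containers recurse, and each object lists the fields to dump. The `canonical` flag adds a class tag (a round-trippable form); without it the output looks like a playbook. `Dumpable`, `to_data`, `Task` and `Block` are hypothetical stand-ins, not Ansible's playbook classes or FieldAttributes, and JSON is used only because it is in the stdlib; the same plain structure could be handed to a safe yaml dumper.

```python
import json


class Dumpable:
    """Mixin for playbook-like objects: `_fields` lists attributes to dump."""
    _fields = ()


def to_data(obj, canonical=True):
    """Convert a tree of Dumpable objects into plain dicts/lists/scalars."""
    if obj is None or isinstance(obj, (str, int, float, bool)):
        return obj
    if isinstance(obj, (list, tuple)):
        return [to_data(item, canonical) for item in obj]
    if isinstance(obj, dict):
        return {str(k): to_data(v, canonical) for k, v in obj.items()}
    if isinstance(obj, Dumpable):
        data = {"__class__": type(obj).__name__} if canonical else {}
        for name in obj._fields:
            data[name] = to_data(getattr(obj, name, None), canonical)
        return data
    # Report unknown types instead of crashing the whole dump.
    return "<unserializable %s>" % type(obj).__name__


class Task(Dumpable):
    _fields = ("name", "action", "args", "when")

    def __init__(self, name, action, args=None, when=None):
        self.name, self.action = name, action
        self.args, self.when = args or {}, when


class Block(Dumpable):
    _fields = ("block",)

    def __init__(self, tasks):
        self.block = tasks
```

`print(json.dumps(to_data(Block([Task("say hi", "debug", {"msg": "hi"})])), indent=2))` would give a --dump style view of the parsed object.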
# shouldn't be that hard actually
- start trying to dump 'mid execution' Playbook
- ie, latest values of vars
- any new vars
- any new blocks/tasks/plays/roles
# harder, but opens up arch even more
- after everything is safe and non-canonical yaml, maybe repeat with:
- json
- pickle
- repr
- maybe str
- logging
- I can dream.
- at least set up a logger correctly
- and use debug()/exception()
- playbook, inv, host, group, etc serialize/deserialize support
- gh bugs for display log
- logger name
- not standard
- not useful/hard to predict/misused
- not using %(process)s
- log file level tied to cli/display level
- doesn't create logger unless using log file
- nothing else can attach a handler to the display log
- only uses two log levels
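A minimal sketch of "set up a logger correctly" with the stdlib `logging` module, addressing the display-log issues listed above: loggers named after their module under an `ansible.` prefix, a NullHandler so nothing is configured on import, the process id in the format, a handler level independent of display verbosity, and `debug()`/`exception()` in use. `attach_handler` and `run_task` are hypothetical helpers.

```python
import logging

LOG_FORMAT = "%(asctime)s pid=%(process)d %(name)s %(levelname)s: %(message)s"

# Library code should not configure output on import; a NullHandler on the
# top-level 'ansible' logger just means "nobody is listening yet".
logging.getLogger("ansible").addHandler(logging.NullHandler())


def attach_handler(handler, level=logging.DEBUG):
    """Attach any handler (file, syslog, test buffer...) to the ansible logger.

    The handler's level is independent of the cli/display verbosity, and
    callbacks, tools or tests can attach their own handlers the same way.
    """
    handler.setLevel(level)
    handler.setFormatter(logging.Formatter(LOG_FORMAT))
    top = logging.getLogger("ansible")
    top.addHandler(handler)
    # Let records at `level` reach the handlers; each handler still filters.
    top.setLevel(min(top.getEffectiveLevel(), level))
    return handler


# Each module logs through a logger named after itself, for example:
log = logging.getLogger("ansible.executor.task_executor")


def run_task(name, action):
    log.debug("starting task %s", name)
    try:
        return action()
    except Exception:
        log.exception("task %s failed", name)  # ERROR level, with traceback
        raise
```

A log file is then just `attach_handler(logging.FileHandler(path), logging.INFO)`, and nothing prevents a second handler from being attached next to it.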
- ansible callbacks
- freeze current callback api
- add a new one with better versioning/introspection
- make sure new interface is more clearly display/progress callbacks and not api hooks or entry points (ie, internals are 'ro')
- split single callback interface into smaller composable parts
- Task callbacks
- Play callbacks
- Playbook callbacks
- Handler callbacks (see 'ansible handlers')
- ansible process/instance/run lifetime callbacks
- app startup
- inventory load
- etc
- MAYBE: try adding 'rw' hook/slot/callback API entrypoints
- yum plugin-ish
- not a great example of maintainable interfaces, but it is/was a very flexible/powerful approach
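A minimal sketch of splitting the single callback interface into small composable ones: a plugin subclasses only the interfaces it cares about, the dispatcher indexes plugins by interface, each interface carries a version for introspection, and dict arguments are wrapped in a (shallow) read-only `MappingProxyType` to keep plugins display-only. All class and method names here are hypothetical.

```python
from types import MappingProxyType


class TaskCallbacks:
    """Hypothetical read-only task progress interface."""
    API_VERSION = (3, 0)

    def on_task_start(self, task): pass
    def on_task_ok(self, task, result): pass
    def on_task_failed(self, task, result): pass


class PlayCallbacks:
    API_VERSION = (3, 0)

    def on_play_start(self, play): pass


class HandlerCallbacks:
    API_VERSION = (3, 0)

    def on_handler_called(self, handler): pass


INTERFACES = (TaskCallbacks, PlayCallbacks, HandlerCallbacks)


def describe(plugin):
    """Introspection: which interfaces (and versions) a plugin implements."""
    return {iface.__name__: iface.API_VERSION
            for iface in INTERFACES if isinstance(plugin, iface)}


class CallbackDispatcher:
    def __init__(self, plugins):
        # Index plugins by interface so a play-only plugin never sees task events.
        self._by_iface = {iface: [p for p in plugins if isinstance(p, iface)]
                          for iface in INTERFACES}

    def emit(self, iface, method, *args):
        if not hasattr(iface, method):
            raise AttributeError("%s has no event %r" % (iface.__name__, method))
        # Shallow read-only views: plugins can't mutate results the executor uses.
        args = tuple(MappingProxyType(a) if isinstance(a, dict) else a for a in args)
        for plugin in self._by_iface[iface]:
            getattr(plugin, method)(*args)


class TaskCounter(TaskCallbacks):
    """Example plugin that only subscribes to task events."""

    def __init__(self):
        self.ok = 0
        self.failed = 0

    def on_task_ok(self, task, result):
        self.ok += 1

    def on_task_failed(self, task, result):
        self.failed += 1
```

Versioning per interface means a new `PlaybookCallbacks` or lifetime interface can be added later without touching plugins that never opted into it.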
- make DataLoader pluggable
- split into DataReader and DataDeserializer
- FileReader, VaultFileReader
- yaml/json
- reuse/share some/more inventory code?
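A minimal sketch of the reader/deserializer split: readers only produce bytes, a vault reader wraps another reader, and deserializers only turn bytes into data. `PluggableDataLoader` and `load_from_source` are hypothetical names (this is not Ansible's existing `DataLoader`), the `decrypt` callable stands in for the vault library, and a yaml deserializer would call a safe yaml loader in the same slot as the JSON one.

```python
import json
from abc import ABC, abstractmethod


class DataReader(ABC):
    @abstractmethod
    def read(self, source):
        """Return raw bytes for `source`."""


class FileReader(DataReader):
    def read(self, source):
        with open(source, "rb") as f:
            return f.read()


class MemoryReader(DataReader):
    """Serves bytes from a dict; handy for tests or generated inventory."""

    def __init__(self, blobs):
        self.blobs = blobs

    def read(self, source):
        return self.blobs[source]


class VaultFileReader(DataReader):
    """Wraps another reader and decrypts vaulted content (decrypt is injected)."""

    def __init__(self, inner, decrypt):
        self.inner, self.decrypt = inner, decrypt

    def read(self, source):
        data = self.inner.read(source)
        if data.startswith(b"$ANSIBLE_VAULT;"):
            data = self.decrypt(data)
        return data


class DataDeserializer(ABC):
    @abstractmethod
    def load(self, data):
        """Turn raw bytes into Python data."""


class JsonDeserializer(DataDeserializer):
    def load(self, data):
        return json.loads(data.decode("utf-8"))


class PluggableDataLoader:
    def __init__(self, reader, deserializer):
        self.reader, self.deserializer = reader, deserializer

    def load_from_source(self, source):
        return self.deserializer.load(self.reader.read(source))
```

Because each piece only knows its own job, the same readers could be shared by inventory sources.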
- ansible payload proposal
- make module_common and the bits of executor/ that build AnsiballZ payloads a separate cli
- or at least decouple the code
- build modules based on target platform/runtime/versions
- ie, python-posix-ansible-2.4 or powershell-windows-ansible-2.4 or golang-posix-ansible-2.5
- make it easier to do things like add PEX support
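A minimal sketch of selecting a payload builder from a target string shaped like the examples above (`<lang>-<platform>-ansible-<version>`). `Target`, `register_builder` and `build_payload` are hypothetical, and the registered Python builder only returns a placeholder string where a real one would produce an AnsiballZ-style zip or a PEX.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    language: str   # 'python', 'powershell', 'golang'
    platform: str   # 'posix', 'windows'
    version: str    # ansible version the payload targets

    @classmethod
    def parse(cls, spec):
        """'python-posix-ansible-2.4' -> Target('python', 'posix', '2.4')."""
        language, platform, marker, version = spec.split("-", 3)
        if marker != "ansible":
            raise ValueError("expected <lang>-<platform>-ansible-<version>: %r" % spec)
        return cls(language, platform, version)


_BUILDERS = {}


def register_builder(language, platform):
    """Decorator registering a payload builder for one runtime/platform pair."""
    def decorator(func):
        _BUILDERS[(language, platform)] = func
        return func
    return decorator


def build_payload(module_name, spec):
    target = Target.parse(spec)
    try:
        builder = _BUILDERS[(target.language, target.platform)]
    except KeyError:
        raise LookupError("no payload builder for %s" % spec) from None
    return builder(module_name, target)


@register_builder("python", "posix")
def build_python_zip(module_name, target):
    # Placeholder: a real builder would bundle the module and its module_utils.
    return "zip:%s:%s" % (module_name, target.version)
```

Adding PEX or golang support would then mean registering another builder rather than editing module_common.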
- move base.Base._post_validate logic into FieldAttributes subclasses?
- better serialization/dump/yaml of playbook related objects (continued)
# continue from above
# getting pretty darn hard
- then maybe try supporting dumping back to 'original' playbook form
- means tracking extra info
- dir/filenames of includes
# very unlikely
- then maybe try supporting dumping back to 'original, pre templating' playbook form
- means tracking the source of vars and parent templates
- means serializing jinja template objects, if that is a thing
- ansible handlers
- per block handlers
- fail/error/changed/skipped handlers
- per host handlers
- or pass some 'user data' obj/ref to handlers with extra info (like the host name, or error info, etc)
- support a generic handler that matches all notifies
- mostly for debugging
- implicit handlers
- pre/post task
- fail/changed/skipped mentioned above
- task would always notify/emit 'task done handler' etc
- default would be no handlers
- add handler specific hooks to callback plugins
- v3_on_handler_called
- v3_on_handler_ok
- v3_on_handler_error
# this is GObject or DOM style property notifications. Non trivial, but super useful.
- let varmanager emit handler notifies
- tasks/plays/roles/etc using a set of vars could set 'listen' for varmanager change notifies
- ie, like GObject 'properties' and prop change signals or web browser DOM 'MutationObserver's
- set_fact: blip='foobar'
- would 'notify' a 'facts_blip.changed' handler
- if there is a handler listening for 'facts_blip.changed', it would get notified and run at the next appropriate time (idle loop-ish)
- if handlers are per block/task/play/playbook/role, then each could have a handler listening for 'facts_blip.changed'
- block could ignore it and let it propagate
- play would catch it, handle it (say, restart a service for the classic example) and stop propagating it
- if play doesn't handle it, propagate to playbook
- ... then onto global
- ... then onto universal persistent handler? (ie, Tower etc)
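A minimal sketch of the propagation rule described above, in the style of DOM event bubbling: a notify starts at the innermost scope and walks outwards until a handler that stops propagation runs. `Scope`, `listen` and `notify` are hypothetical names, and handlers run immediately here rather than being deferred to an idle point.

```python
class Scope:
    """A block/play/playbook/global level that may handle named events."""

    def __init__(self, name, parent=None):
        self.name, self.parent = name, parent
        self.handlers = {}   # event name -> (callable, stop_propagation)

    def listen(self, event, func, stop=True):
        self.handlers[event] = (func, stop)

    def notify(self, event, payload=None):
        """Walk from this scope outwards; return names of scopes that ran a handler."""
        handled_by = []
        scope = self
        while scope is not None:
            entry = scope.handlers.get(event)
            if entry is not None:
                func, stop = entry
                func(event, payload)
                handled_by.append(scope.name)
                if stop:
                    break
            scope = scope.parent
        return handled_by
```

With a block inside a play inside a playbook, a `facts_blip.changed` notify from the block that the play handles (and stops) never reaches the playbook or global handlers.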
- handling var changes in an event-driven way would allow for setting/changing global semi-immutable vars (like inventory)
- ie, queue var change, idle loop, pop it, change it, queue 'changed' signal
- then the next (concurrent-ish) var change is queued, idle loop, popped, changed and 'changed' emitted
- any block with (implicit, default) change handlers would handle changed signals before using their local var closure
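A minimal sketch of the queue/idle-loop idea: `set_fact` only queues a change, and an idle pass applies queued changes one at a time, emitting `facts_<name>.changed` after each real change. `VarStore`, `run_idle` and the listener signature `(old, new)` are hypothetical; a real version would hook into the strategy's loop rather than being called by hand.

```python
from collections import deque

_MISSING = object()


class VarStore:
    """Hypothetical var store that queues changes and emits '<name>.changed'."""

    def __init__(self, initial=None):
        self._vars = dict(initial or {})
        self._pending = deque()      # queued (name, value) changes
        self._listeners = {}         # event name -> list of callables

    def listen(self, event, func):
        self._listeners.setdefault(event, []).append(func)

    def set_fact(self, name, value):
        # Nothing changes yet; the change is applied on the next idle pass.
        self._pending.append((name, value))

    def get(self, name):
        return self._vars.get(name)

    def run_idle(self):
        """Apply queued changes in order, emitting a signal after each real change."""
        emitted = []
        while self._pending:
            name, value = self._pending.popleft()
            old = self._vars.get(name, _MISSING)
            if old is not _MISSING and old == value:
                continue             # no change, no signal
            self._vars[name] = value
            event = "facts_%s.changed" % name
            for func in self._listeners.get(event, []):
                func(None if old is _MISSING else old, value)
            emitted.append(event)
        return emitted
```

Serializing changes through one queue is what makes "semi-immutable" globals safe to change: readers only ever see a var before or after a change, never halfway.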
- possible impls?
- strategy checks for task result _ansible_notify
- task executor sets _ansible_notify from Task 'notify' field attribute
- strategy only does handlers on success and on 'changed'
- extract the handler running code to a method (deep in strategy _process_pending_results)
- amongst other things, this is also where 'handler hierarchy resolution' is handled (ie, role or play or global)
- param the result field for handler names ('_ansible_notify')
- each task result stanza could check for its handlers. ie, failed would _get_handlers(name='_ansible_failed_notify')
- handle ok/failed/skipped/unreachable * changed/notchanged
- add_host/add_group/diff etc as internal implicit handlers?
- need a Role-like HandlerDef to have a ds for handler/listen args, ie notify entries that take arguments:
    - some_task:
        src: foo
        dest: /bar
      register: some_task_result
      notify:
        - my_blip_handler:
            host: the_other_machine
            result: some_task_result
        - restart_a_service_or_whatever:
            svc: httpd
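A minimal sketch of parameterizing the result field used for handler names: classify the result, pick the matching notify field, then resolve each name through the scopes innermost first (the role, play, global hierarchy). Only `_ansible_notify` exists today; `_ansible_ok_notify`, `_ansible_failed_notify` and the other fields, plus `result_status` and `get_handlers`, are hypothetical. "changed" keeps the existing field so current behavior is unchanged.

```python
# Hypothetical mapping from task outcome to the result key holding handler names.
NOTIFY_FIELDS = {
    "ok": "_ansible_ok_notify",
    "changed": "_ansible_notify",          # the only field that exists today
    "failed": "_ansible_failed_notify",
    "skipped": "_ansible_skipped_notify",
    "unreachable": "_ansible_unreachable_notify",
}


def result_status(result):
    if result.get("unreachable"):
        return "unreachable"
    if result.get("failed"):
        return "failed"
    if result.get("skipped"):
        return "skipped"
    if result.get("changed"):
        return "changed"
    return "ok"


def get_handlers(result, scopes):
    """Resolve handler names for this result, searching scopes innermost first.

    `scopes` is a list of {handler_name: handler} dicts, e.g. [role, play, global].
    """
    field = NOTIFY_FIELDS[result_status(result)]
    names = result.get(field) or []
    if isinstance(names, str):
        names = [names]
    found = []
    for name in names:
        for scope in scopes:
            if name in scope:
                found.append(scope[name])
                break
        else:
            raise LookupError("no handler named %r (from %s)" % (name, field))
    return found
```

A failed result that still carries its normal `_ansible_notify` list only triggers its failure handlers, which is the per-status split the notes describe.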
- ansible update/partial results
- see 'update_json' for one approach
- would be nice to have more connection channels for 'out of band' control/updates/partial results:
- would like to avoid:
- multiplexing multiple 'channels' onto just stdout/stderr
- having to do locking around output streams to avoid corrupt messages
- having to do any sort of 'escape' from the stdout stream
- for ex, if a random module writes out the same format as the proposed json updates
- having to do any additional parsing of stdout
- better would be to be able to get rid of filter_non_json kinds of things
- related: see module_log branch for returning log records as json
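A minimal local sketch of a separate update channel: the parent opens a pipe, passes the write end to the child as an extra file descriptor, and the child writes JSON-lines progress records there while stdout carries only the final result, so nothing has to be escaped or filtered out of stdout. This is POSIX-only (it relies on `pass_fds`), covers a local child process rather than a remote connection, and the `ANSIBLE_UPDATE_FD` variable name and the demo module are hypothetical.

```python
import json
import os
import subprocess
import sys
import threading

UPDATE_FD_ENV = "ANSIBLE_UPDATE_FD"   # hypothetical: tells the module which fd to use

# Demo "module": progress goes to the update fd, the result to stdout.
CHILD = r'''
import json, os
fd = int(os.environ["ANSIBLE_UPDATE_FD"])
with os.fdopen(fd, "w") as updates:
    for pct in (25, 50, 100):
        updates.write(json.dumps({"progress": pct}) + "\n")
        updates.flush()
print(json.dumps({"changed": True, "msg": "done"}))
'''


def run_module(source):
    """Run `source` in a child Python; return (final_result, updates)."""
    read_fd, write_fd = os.pipe()
    env = {**os.environ, UPDATE_FD_ENV: str(write_fd)}
    proc = subprocess.Popen(
        [sys.executable, "-c", source],
        stdout=subprocess.PIPE, env=env, pass_fds=(write_fd,), text=True,
    )
    os.close(write_fd)   # only the child holds the write end, so EOF means it's done
    updates = []

    def drain():
        with os.fdopen(read_fd) as stream:
            for line in stream:
                updates.append(json.loads(line))

    # Read updates in a thread so a full stdout pipe can't deadlock the child.
    reader = threading.Thread(target=drain)
    reader.start()
    stdout, _ = proc.communicate()
    reader.join()
    return json.loads(stdout), updates
```

Because each channel has exactly one writer and one format, no locking or escaping is needed on either stream.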
Troubleshooting / Support tools
Better logging support
Ansible core does not really use logging. There are some bits of display that can also log to a log file, but they have a lot of problems
install/env collection tools
À la ‘sosreport’ or similar tools
Collect
where/how ansible is installed
Python modules used
Configuration
Env
Info about external tools used
Ssh
Local and remote config
Logs if possible
Sudo/su etc
Shell type/version
Ansible related system logging
Could be playbook/role based
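A minimal local sketch of an sosreport-style collector covering part of the list above: interpreter and platform, where selected Python modules are installed, `ANSIBLE_*` environment variables, and the first version line from external tools. `collect_report` and `tool_version` are hypothetical, the default module list is only an example, and the redaction rule is a crude name-based heuristic; remote config and logs are left out.

```python
import importlib.util
import os
import platform
import shutil
import subprocess
import sys

SENSITIVE = ("PASS", "TOKEN", "SECRET", "KEY")   # heuristic, err on redacting


def tool_version(cmd):
    """Return the first line a tool prints for its version flag, or None."""
    exe = shutil.which(cmd[0])
    if exe is None:
        return None
    try:
        proc = subprocess.run([exe, *cmd[1:]], capture_output=True, text=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired):
        return None
    out = (proc.stdout or proc.stderr).strip()   # `ssh -V` prints to stderr
    return out.splitlines()[0] if out else None


def collect_report(modules=("ansible", "jinja2", "yaml", "paramiko", "cryptography")):
    locations = {}
    for name in modules:
        spec = importlib.util.find_spec(name)
        locations[name] = spec.origin if spec else None
    return {
        "python": {"executable": sys.executable, "version": sys.version.split()[0]},
        "platform": platform.platform(),
        "modules": locations,
        "env": {k: ("<redacted>" if any(s in k for s in SENSITIVE) else v)
                for k, v in os.environ.items() if k.startswith("ANSIBLE_")},
        "tools": {"ssh": tool_version(["ssh", "-V"]),
                  "sudo": tool_version(["sudo", "-V"])},
        "shell": os.environ.get("SHELL"),
    }
```

The result is plain data, so it could be dumped as json and attached to a bug report, or gathered from each managed host by a playbook/role.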
End notes
1. fatal: [testhost]: FAILED! => {
"failed": true,
"msg": "the field 'args' has an invalid value, which appears to include a variable that is undefined.The error was: 'test' is undefined\n\nThe error appears to have been in '/root/ansible/test/integration/targets/any_errors_fatal/test_fatal.yml': line 7, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- shell: \"echo {{ test }}\"\n ^ here\nWe could be wrong, but this one looks like it might be an issue with\nmissing quotes. Always quote template expression brackets when they\nstart a value. For instance:\n\n with_items:\n - {{ foo }}\n\nShould be written as:\n\n with_items:\n - \"{{ foo }}\"\n\nexception type: <class 'ansible.errors.AnsibleUndefinedVariable'>\nexception: 'test' is undefined"
}. Huh?