@dbarrosop
Last active December 28, 2016 10:16

Philosophy

Although this might look similar to other configuration management systems, its philosophy is a bit different. Some of the key aspects of its philosophy are:

  1. Data is key. Data should be easy to gather, process and make available to potential consumers.
  2. YAML is not a programming language; it's just a means to define simple data and glue different building blocks together.
  3. It's easier to write, troubleshoot and debug simple python than complex YAML.
  4. It's easier to write, troubleshoot and debug simple python than complex Jinja2 templates.

Inventory

The inventory is just a replaceable python script which adheres to some standards. It's supposed to be simple and provide generic information like groups/roles, devices and a mapping between the two. It also provides connectivity information (hostname, username, password, OS, etc). More specific data is gathered later on by modules (see Modules section).
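As a rough illustration of the shape such an inventory script could take, the sketch below returns groups, devices, the mapping between the two and connection details. The function names `get_inventory` and `hosts_in_group`, the dictionary layout and every host value are hypothetical; the actual standard the script must adhere to is not defined here.

```python
# Hypothetical inventory script: all names and values are illustrative only.

def get_inventory():
    """Return generic data: groups, devices and connection details."""
    return {
        # Mapping between groups/roles and device names.
        "groups": {
            "spines": ["spine01"],
            "leaves": ["leaf01", "leaf02"],
        },
        # Connectivity information per device.
        "devices": {
            "spine01": {"hostname": "192.0.2.1", "username": "admin",
                        "password": "secret", "os": "eos"},
            "leaf01": {"hostname": "192.0.2.11", "username": "admin",
                       "password": "secret", "os": "junos"},
            "leaf02": {"hostname": "192.0.2.12", "username": "admin",
                       "password": "secret", "os": "junos"},
        },
    }


def hosts_in_group(inventory, group):
    """Return the device names that belong to a group."""
    return list(inventory["groups"].get(group, []))


inventory = get_inventory()
leaves = hosts_in_group(inventory, "leaves")
```

Because the script is plain Python returning plain data, it could be swapped for one that queries a database or an API without touching the modules.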

Modules

Modules provide reusable functionality and provide the basic building blocks.

Writing Modules

To write modules, a base class will be provided with some basic functionality. A module can then implement either or both of the methods run_scoped and run_global to provide functionality.

Modes of operation

Modules can run in two modes:

  • global - Tasks run in global mode are run only once and their data is made available to everybody. Tasks in the pre and post sections are global. To provide global functionality, a module has to implement the run_global method.
  • scoped - Tasks run in scoped mode are run per device and their data is only made available to other tasks within the same scope or to other global tasks. Tasks defined in the tasks section are scoped. To provide scoped functionality, a module has to implement the run_scoped method.

A module can combine run_global and run_scoped to easily provide both functionalities. For example:

class MyModule(BaseClassYetToProvide):

    def run_scoped(self, host):
        # Per-device work; do_something is a placeholder.
        do_something(host)

    def run_global(self):
        # Reuse the scoped logic for every host in every group.
        for group in self.inventory.groups():
            for host in self.inventory.hosts_in_group(group):
                self.run_scoped(host)

A module doesn't need to overload both methods. It might make sense for some modules to only run in scoped mode (for example, a module that loads configuration on a device) or in global mode (a module that processes data).
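A minimal sketch of how a single-mode module might look, assuming a base class whose default methods simply refuse to run. `BaseModule`, `supports` and `LoadConfig` are hypothetical names; the real base class is yet to be provided.

```python
class BaseModule:
    """Hypothetical stand-in for the base class that is yet to be provided."""

    def __init__(self, inventory):
        self.inventory = inventory

    def run_scoped(self, host):
        raise NotImplementedError(f"{type(self).__name__} has no scoped mode")

    def run_global(self):
        raise NotImplementedError(f"{type(self).__name__} has no global mode")

    def supports(self, mode):
        """Report whether a subclass overrides run_scoped or run_global."""
        method = getattr(type(self), f"run_{mode}")
        return method is not getattr(BaseModule, f"run_{mode}")


class LoadConfig(BaseModule):
    """Scoped-only module: pretends to load configuration on one device."""

    def run_scoped(self, host):
        return f"configuration loaded on {host}"


module = LoadConfig(inventory={})
result = module.run_scoped("leaf01")
```

With something like `supports`, a runner could reject a scoped-only module placed in the pre or post sections before anything is executed.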

Modules

  • facts_yaml - Reads a directory structure with hostfiles and groupfiles and provides data. Provides similar functionality to other configuration management systems.
  • napalm_facts - Gather facts from live devices using napalm getters.
  • network_services - Based on a directory structure $service/$vendor/template.j2 defines a set of services that can be mapped to devices and groups.
  • napalm_configure - Provides configuration capabilities based on napalm.
  • ip_fabric - Reads a file defining a topology, correlates the definition with data from the inventory and generates all the necessary data for the deployment.
  • template - Provides generic jinja2 functionality.
  • napalm_validate - Provides the validate functionality of napalm.

Commits

Brigade doesn't have the notion of commit or dry-run. However, modules that perform changes should provide a commit argument to dictate if you want to perform the change or not.
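One way a change-making module could honor a commit argument is sketched below. The function `configure` and its list-of-lines configuration model are assumptions made for illustration only; they are not napalm's API.

```python
def configure(running, candidate, commit=False):
    """Hypothetical change module: always returns the diff and applies
    the candidate configuration only when commit is True."""
    diff = {
        "add": [line for line in candidate if line not in running],
        "remove": [line for line in running if line not in candidate],
    }
    if commit:
        # Mutate in place to simulate pushing the change to the device.
        running[:] = candidate
    return diff


running = ["hostname old", "ntp server 192.0.2.10"]
candidate = ["hostname new", "ntp server 192.0.2.10"]

preview = configure(running, candidate)               # nothing is applied
unchanged = list(running)
applied = configure(running, candidate, commit=True)  # change is applied
```

Calling the module without commit gives the effect of a dry run, even though the tool itself has no such notion.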

Data

All modules have access to all the data collected or generated previously. Because of the global/scoped nature of tasks, data availability follows these rules:

  1. All data provided by the inventory is available to everybody.
  2. Data gathered by pre tasks is available to all subsequent tasks.
  3. Data gathered by scoped tasks defined in the tasks section is available to:
    1. The subsequent tasks for that device.
    2. All the tasks in the post section.
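The rules above can be sketched as a small data store. `DataStore` and its method names are hypothetical, and nesting per-device data under a "hosts" key for post tasks is an assumption rather than a decided design.

```python
class DataStore:
    """Hypothetical store implementing the visibility rules listed above."""

    def __init__(self, inventory_data):
        # Rule 1: inventory data is visible to everybody.
        self._global = dict(inventory_data)
        # Per-device data produced by scoped tasks.
        self._scoped = {}

    def set_global(self, key, value):
        # Rule 2: data from pre tasks is published globally.
        self._global[key] = value

    def set_scoped(self, host, key, value):
        self._scoped.setdefault(host, {})[key] = value

    def view_for_host(self, host):
        """What a scoped task sees: global data plus its own device's data."""
        return {**self._global, **self._scoped.get(host, {})}

    def view_for_post(self):
        """What a post task sees: global data plus every device's data."""
        per_host = {host: dict(data) for host, data in self._scoped.items()}
        return {**self._global, "hosts": per_host}


store = DataStore({"site": "lab"})
store.set_global("fabric", "clos")               # gathered by a pre task
store.set_scoped("leaf01", "os_version", "4.17")  # gathered by a scoped task
```

Here `store.view_for_host("leaf02")` does not contain leaf01's `os_version`, while `store.view_for_post()` exposes it under `hosts`.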

Example of data flow:

  1. The inventory could provide some basic information: devices, groups, parameters to connect to the devices, etc. Something simple, generic and very fast to run.
  2. A pre task could read specialized information to deploy an IP fabric, a WAN network or something else and make it consumable by everybody. Some sanity checks could be performed here as well. In addition, further data could be gathered from live devices and made available throughout the entirety of the runbook. For example, you could read a prescriptive topology file, compare it with LLDP information and compute a fabric configuration (interfaces, IPs, BGP sessions, etc.). The idea is to generate/munge data and make it consumable.
  3. During the tasks execution phase, modules defined there could use all the data gathered so far. Individual devices could expand and collect more data in case they need it: things like OS version, interface names, etc. Things that might be useful to pick the right configuration.
  4. Finally, a post task could process the result, log the result somewhere, validate the deployment, etc...

Runbooks

Runbooks are YAML files that glue all the building blocks together. Runbooks are not code, so there are no if statements or for loops. Because all modules have access to all the data, there is no need for dynamic variables in the runbook. Modules can still register data, but that's only useful so other modules know where to find that data.

The only exception is the when clause. It accepts a string, and the task will only be executed if the string evaluates to True.
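A minimal sketch of how a when clause might be evaluated, assuming the runbook has already been parsed into dictionaries. The task layout, the `slack` module name, `should_run` and the `env` variable are all hypothetical.

```python
def should_run(task, data):
    """Return True when the task has no `when` clause or it evaluates truthy."""
    condition = task.get("when")
    if condition is None:
        return True
    # eval() runs arbitrary Python; only acceptable for trusted runbooks.
    # The collected data is exposed as local names for the expression.
    return bool(eval(condition, {"__builtins__": {}}, dict(data)))


# Tasks as they might look after loading a runbook (illustrative only).
tasks = [
    {"name": "configure", "module": "napalm_configure"},
    {"name": "notify", "module": "slack", "when": "env == 'production'"},
]

selected_testing = [t["name"] for t in tasks if should_run(t, {"env": "testing"})]
selected_production = [t["name"] for t in tasks if should_run(t, {"env": "production"})]
```

The expression only selects whether a task runs; it adds no loops or branching to the runbook itself.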

In addition, modules can also tweak their behavior via CLI options and/or the environment.

@jedelman8

I'm in agreement with this philosophy. I'd also add this: It's easier to write, troubleshoot and debug simple python than complex Jinja2 templates.

I think we all agree Ansible has a lower barrier to entry and it does a lot really well...but a few things to think about....

  • Think about making it even more usable than Ansible
  • What about using principles from behave which is more meant as human language such as this
Scenario: New lists are empty
  Given a new list
   then the list should be empty.

Scenario: Lists with things in them are not empty.
  Given a new list
   when we add an object
   then the list should not be empty.

https://pythonhosted.org/behave/philosophy.html

While BDD seems like it emerged from TDD and testing, I do not see why it couldn't be used as the core methodology for writing Brigade runbooks, even if this means adding our own sentences such as "save data as output", which is like register in Ansible. After all, the "sentence" is just a decorator built atop a Python function with Behave. We could have verbs such as get, configure, ensure and maybe configure is not idempotent and ensure is.

Scenario: deploying network configurations
Given a full configuration we want to replace
....

Back to core things / features would be good..

I'd like to see more things that Enterprise products have from a feature perspective:

  • Logging / reporting - not just the stdout in a file
  • Have brigade show log or something to show history of jobs, status, summary data, things like that.
  • Something like brigade device device01 <inventory file/script> to see all variables for a given device... in Ansible I need to use debug: var=hostvars[inventory_hostname]
  • Upon device failure / task failure - have an option to run a given set of tasks (maybe getters) on failed tasks after the runbook is done

Rather than pre and post - how about before and after - can even carve up more so there are options for a before scope and after scope...maybe before_<specific_scope> can be used for build tasks...we definitely need to be able to create directories and such. also before_ backup config files before the run....but if we are building a config snippet a single time (standard snmp configs) that could be a before_<run_once>

Regarding the points above, agree on facts....there should be a difference between static and dynamic data - and we shouldn't call them all facts like Ansible.

  • hw, os, platform, vendor - should be facts
  • others can be divided up into two buckets config_state and operational_state - would be nice to have ways to get both and the diff for these very easily

Inventory - what you have makes sense. Supporting dynamic inventory scripts however would be nice to pass in args to them.

Writing modules - using a base class - all good, makes sense.

Module names above are confusing me. I wouldn't use napalm in any module name. I read facts_yaml a few times, and my head hurts. Are you talking about a module to simply do a yaml.load() on a predefined directory structure? If so, I wouldn't use yaml in the module name...YAML is a superset of JSON and we should support JSON too. Can it be simplified to read_data and pass in a type json, yaml, xml? Or just require valid yaml or json...

napalm_configure - I kind of like install_config, but hey, I know module names matter less at this point

While having a base class is nice, what are thoughts on supporting any programming language as long as they return JSON? To be more strict, maybe there is validation of the data (keys, data types) in this data?

Could think about this from a higher level perspective in different utilities, modules, etc. around:

  • Design - out of scope for now, but down the road, using topology file to drive intent
  • Build - templating, etc.
  • Deploy - pushing configs, etc.
  • Test - validating, etc.
  • Operate - day 2 changes, etc.

Sorry for any typos/errors :)

@dbarrosop
Author

dbarrosop commented Dec 25, 2016

It's easier to write, troubleshoot and debug simple python than complex Jinja2 templates

Added!

What about using principles from behave which is more meant as human language such as this

Not sure about that, a couple of things:

  1. To support it we would need some framework for the parsing, something that does the heavy lifting. I have written language parsers in the past and it's even worse than writing compilers. Not sure if behave provides a framework for that.
  2. (personal preference disclaimer) I personally prefer formal definitions. When I have to deal with software like behave I always have the impression I must be leaving a magic word out or that I might be using the wrong one to do what I want XD Not everybody is an English speaker and having something like this might scare people that feel insecure with the language.

In any case, if (1) is solved by behave or some other framework, we can look into how this would work. Otherwise, I'd say writing one is kind of complex, at least for an MVP.

I'd like to see more things that Enterprise products have from a feature perspective:

For the MVP and for getting more contributors, I was thinking that brigade could be a simpler way of scripting around napalm and other modules. To provide fancier functionality (at least at the very beginning), we could integrate with OpenStack, Salt and/or StackStorm, etc... I think those platforms do really good jobs at scheduling and orchestrating so I'd like to leverage those and let brigade be just a python abstraction for network engineers. Later on, something like ansible tower could be added for the enterprise, that's for sure.

can even carve up more so there are options for a before scope and after scope

Given that tasks are executed linearly, what would be the use case? How is this:

pre_scoped:
   - task1
   - task2

tasks:
  - task3
  - task4

Different from:

tasks:
   - task1
   - task2
   - task3
   - task4

Inventory - what you have makes sense. Supporting dynamic inventory scripts however would be nice to pass in args to them.

Totally! That's something that's been annoying me so much on ansible. So my idea around that is the same as the modules. All data is available all the time. Given the inventory is the first thing to run, obviously data is a bit limited at that point but anything that's passed to the CLI command should be available to the inventory. I want to tweak things via the CLI/ENV so brigade is simple to containerize.

Module names above are confusing me

Yeah, don't sweat about the naming. I was just trying to dump some ideas to provide a simple example.

While having a base class is nice, what are thoughts on supporting any programming language as long as they return JSON

I thought about that and I see very little value in it, and it has huge implications. It makes it hard to implement this philosophy where data is always available to everybody because you have to start serializing/deserializing data, and when you are dealing with large objects that is very CPU/time consuming. My opinion on this is "don't do it" and if someone wants/needs to do this, wrap your ruby/golang/java module with a native module written in python. That way your wrapper has all the data and can decide what to pass on to the module. I wouldn't mind even adding some helper functions on the module class to do that if people are interested but I think modules should always be native even if they are just wrapping something else. Does it make sense? I knew this one was going to show up at some point XD
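A rough sketch of the wrapper idea, assuming the external program reads JSON on stdin and writes JSON on stdout. `ExternalWrapper` and the payload layout are hypothetical, and a short Python one-liner stands in for the ruby/golang/java program so the example stays self-contained.

```python
import json
import subprocess
import sys

# Stand-in for a module written in another language: reads JSON on stdin
# and writes JSON on stdout.
EXTERNAL_COMMAND = [
    sys.executable, "-c",
    "import json, sys; d = json.load(sys.stdin); "
    "print(json.dumps({'host': d['hostname'], 'ok': True}))",
]


class ExternalWrapper:
    """Hypothetical native module wrapping an external program."""

    def __init__(self, command, data):
        self.command = command
        self.data = data  # the wrapper itself sees all the data

    def run_scoped(self, host):
        # Serialize only what the external program needs, not everything.
        payload = {"hostname": host, "os": self.data[host]["os"]}
        completed = subprocess.run(
            self.command,
            input=json.dumps(payload),
            capture_output=True,
            text=True,
            check=True,
        )
        return json.loads(completed.stdout)


wrapper = ExternalWrapper(EXTERNAL_COMMAND, {"leaf01": {"os": "eos", "password": "secret"}})
result = wrapper.run_scoped("leaf01")
```

Only the small payload crosses the process boundary, so the cost of serializing large objects is paid just where a wrapper chooses to pay it.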

Could think about this from a higher level perspective in different utilities, modules, etc. around:

Yes, I pretty much love that and I want to add modules early on to prove this is not ansible. This is a way of abstracting operations rather than building a fancy parallel ssh. The IP fabric, for example, is a very simple-to-implement use case for the "Design" category that I want to implement very early on.

@dbarrosop
Author

dbarrosop commented Dec 25, 2016

Regarding naming. I know some people wanted to talk about that. I was thinking the following names

  • tactic. A module that provides some functionality
  • planning phase. The pre section.
  • execution phase. The tasks section.
  • analysis phase. The post section.
  • plan. The runbook/playbook, the yaml describing what to do.

And our motto: "I love it when a plan comes together" (@GGabriele is probably too young to get this one ;) )

So a plan consists of a set of tactics divided in the planning, execution and analysis phases. We could also add aliases like cleaning (alias for analysis) or intel (alias for planning) so people can categorize their tasks more accordingly. @jedelman8 mentioned something along these lines but I am not sure it's useful other than "it looks good xD"

Thoughts?

@ogenstad

I would vote for not using the Behave way of describing things. While I haven't used it, a formal definition seems clearer. It's easier just to look at it and see which parts are parameters and values. From my point of view it also makes it easier to interchange the format json/yaml or go through some other system's API.

Regarding modules in another language than Python. Does anyone here want/need that or is it just a cool feature? I would also vote for the wrapper module written in Python if needed.

Perhaps another way to have it stand out a bit would be to allow the controller to run from a Windows box. Having said that I don't know if Napalm (or all drivers) currently runs on Windows.

The name analysis implies verification of what happened. This step could also be to do things like log the job or close a change in a system like ServiceNow or something. Something like paperwork might be broader. recon -> execution -> paperwork, but it's also nice if it's fairly clear what the things mean so you don't have to read a manual just to understand. Perhaps analysis is better.

tactic sounds wrong for a module though. Something like operation would be better.

Regarding the motto, they did a remake remember? 😄

@GGabriele

GGabriele commented Dec 26, 2016

And our motto: "I love it when a plan comes together" (@GGabriele is probably too young to get this one ;) )
I had to Google that lol

Regarding Behave, it may be something nice to add later on if both your assumptions are right @dbarrosop, but I would still start with YAML/JSON. From a user perspective, netengs had to learn and deal with Python for scripting and NAPALM, then with YAML for Ansible, then with something new called Behave for Brigade? I don't know if they would see this as a simplification at first.

@ktbyers

ktbyers commented Dec 27, 2016

I agree with @dbarrosop and think that outputting JSON is a mistake. This is one of the reasons Ansible is a pain to debug/troubleshoot. It makes sense for server people where they were shipping the Ansible modules to remote machines. Probably doesn't make sense here.

I am against doing the behave-like language.

Can we get rid of the conditionals?

The only exception is the when clause. This accepts a string and the task will only be executed if the string eval is True.

The above is the road to adding more programming constructs i.e. if we add conditionals, then shortly we will need loops, exception handlers.

I am inclined to say that we should do one of two things...no programming constructs (although I don't know how we necessarily pull this off) or full programming (i.e. can just embed straight Python in a Brigade playbook and the entire section will be eval() as Python code).

@ktbyers

ktbyers commented Dec 27, 2016

You can ignore my comment on 'when'. After thinking about this some more, I am starting to like the above.

I do think we probably need to specify/consider how variables (facts/inventory) are going to work.

I also think we might want to work through a few concrete examples and see how what we proposed would actually work (and whether it is meaningfully better than other existing tools).

@dbarrosop
Author

Regarding when, I think we need at least that one to signal some tasks to be skipped or not. For example, if you want to slack/mail someone as part of your "plan" but don't want to do it when developing/testing/dry-running. I would like to keep it as simple as possible but I think we need at least a simple way of selecting which tasks to run.

I also think we might want to work through a few concrete examples and see how what we proposed would actually work (and whether it is meaningfully better than other existing tools).

I agree. I think we can start writing an MVP. I am going to start writing a POC. It will just implement the behavior we discussed here. It's not going to be modular and it's not going to be pretty but I hope I can write it relatively quickly. In the meantime, we can discuss the "naming" and once we have the POC and if we like the idea we can start deciding how to make it more modular and then how to write the inventory, modules, etc...

I created a new gist for the naming and pasted my proposal, @ogenstad, can you add your comment there as well, please?

https://gist.github.com/dbarrosop/33a2b68b1afa337c1ef9b1588a3fced0
