dzimine/st2-troubleshoot-rule.md

## st2-troubleshoot-rule.md

      
    Raw
  

              st2-troubleshoot-rule.md
            
          
    Troubleshoot a Rule

Scenario.

I set up a sensor to watch for a trigger (trigger represents an external event; sensor will fire a trigger-instance of the trigger type when the event is detected). I created a rule: if the trigger happens, and matches the criteria, it should fire an action. I see that event had happened. I expected the actions to fire. But it didn't happen.
Where does it break?
This is a long read, and may look complicated. But really, it's just three debugging steps. And it's long because I refuse to write briefly, drop bunch of hints on the way and get you distracted. But as they say in math, the thicker the math book the faster it reads. Embrace yourself.
In the example below, I'll be using a case of tweeter to slack relay that we debugged last night exactly by this "runbook". The "runbook" is generic and applies to other rule debugging just fine.
First, let's look at the chain.
TODO: picture it.
event -> SENSOR -> trigger-instance -> RULES ENGINE -> match rule -> match criteria -> form action parameters -> request action execution -> ACTION EXECUTION -> fire action.

An event happens. Sensors captures an event and emits... what? Previously we said for briefity, "emits trigger". Now it's time to get nuanced. It emits "trigger-instance". WTF? Let's see. If a tweet is an event, how many of them do we have? billions! and they are all of the same - what? type! They are tweets! So, a tweet is an event type, while each individual tweets are  instances of "tweet" event type. Good so far? Ok, now twitter.matched_tweet is a trigger that corresponds to a tweet event type. And each individual tweet, an instance of "tweet" event type, is represented by "trigger-instance". So, simply: trigger is a type, trigger-instance is an instance of this type. Therefore, when an actual tweet goes off, the sensor will emit a trigger-instance. Not Clear? Read it again. Rinse. Spit. Continue. Proceed when it's clear. Send us a note to break from infinite loop.
An event happens. Sensor captures the event, and emits a trigger-instance.
Trigger-instance goes to a message bus, and hits the rule engine.
Rule engine checks if this trigger instance is of interest to any rule, and if so, does it match the rule criteria.
The act of matching the trigger-instance against the rule is called "rule-enforcement".
If the rule matches, it requests an action execution. Execution id is created, and a request for action execution goes back into the message bus to find an action runner that picks it up and runs it, as the name implies.
Step 0. Did the external event actually happening?

Check outside of StackStorm. I go to tweeter and see that tweet.
Step 1. Sensor configured and working?

$ st2 sensor list
+-----------------------------+----------+-----------------------------+---------+
| ref                         | pack     | description                 | enabled |
+-----------------------------+----------+-----------------------------+---------+
| ...
| twitter.TwitterSearchSensor | twitter  | Sensor which monitors       | True    |
| ...

st2 trigger list --pack=twitter
+-----------------------+---------+--------------------------------------+
| ref                   | pack    | description                          |
+-----------------------+---------+--------------------------------------+
| twitter.matched_tweet | twitter | Trigger which represents a matching  |
|                       |         | tweet                                |
+-----------------------+---------+--------------------------------------+

Remember that if you reconfigure a sensor (using config files or new config options), you must reload it for the configurations to take effect: st2ctl reload-component st2sensorcontainer. It's only for sensors: actions pick any change or any new configurations without reloads.
Step 2. Did the sensor emit the trigger-instance for an event?

# st2 trigger-instance list
.... loads of output....

Oh no! This output is SO NOISY! How can I possibly find anything? How to find the niddle in a haystack here? Well look up at the rule what is the trigger type, and filter by it. It's twitter.matched_tweet, so:
st2 trigger-instance list --trigger=twitter.matched_tweet
+--------------------------+-----------------------+---------------------------+-----------+
| id                       | trigger               | occurrence_time           | status    |
+--------------------------+-----------------------+---------------------------+-----------+
| 57ae23b0d805641b8ed11de1 | twitter.matched_tweet | Fri, 12 Aug 2016 19:29:52 | processed |
|                          |                       | UTC                       |           |
| 57ae2ce2d805641b8ed12543 | twitter.matched_tweet | Fri, 12 Aug 2016 20:09:06 | processed |
|                          |                       | UTC                       |           |
|...
| 57ae834bd805641b8ed16c5d | twitter.matched_tweet | Sat, 13 Aug 2016 02:17:47 | processed |
|                          |                       | UTC                       |           |
+--------------------------+-----------------------+---------------------------+-----------+


If the trigger-instance for the event is not there, something is wrong with a sensor. It may not have captured it, or . Check the logs at /var/log/st2/st2sensorcontainer.log and debug the sensor.
If the tirgger-instance IS here, we move on to the rule.
If you're not sure, use st2 trigger-instance
Hint: form your ideal CLI output with combinations of -a and -y or -j parameters. Limit the number of records with -n E.g.
# st2 trigger-instance list -a "id" "occurence_type" "payload" -y --trigger=twitter.matched_tweet -n 5
-   id: 57ae6724d805641b8ed155c3
    payload:
        created_at: Sat Aug 13 00:19:01 +0000 2016
        favorite_count: 0
        id: 764254896379932672
        lang: en
        place: null
        retweet_count: 0
        text: '@jiangu In that case, @Stack_Storm presentation at @Brocade. #NFD12'
        url: https://twitter.com/ecbanks/status/764254896379932672
        user:
            description: 'PacketPushers dot net co-founder. Podcaster & writer covering
                data center design & network engineering. I interview nerds so you
                don''t have to. CCIE #20655.'
            location: New Hampshire
            name: Ethan Banks
         screen_name: ecbanks
...

Step 3. Did the rule get enforced, matched, and created execution?

Scenario 1: NO.

It did not get enforced. So the trigger-instance didn't reach the rule engine. Go to Step 2, tripple-check that the trigger-instance got emitted, and if it did, dive into the logs (run st2sensorcontainer with DEBUG) and troubleshoot at RabbitMQ level.
Scenario 2: YES

It does get enforced but didn't create execution. For example:
$ st2 rule-enforcement list --rule=tweeter.relay_tweet_to_slack
+--------------------------+------------------+---------------------+--------------+------------------+
| id                       | rule.ref         | trigger_instance_id | execution_id | enforced_at      |
+--------------------------+------------------+---------------------+--------------+------------------+
| 57ae7037d805641b8ed15d18 | tweeter.relay_tw | 57ae7037d805641b8ed |              | Sat, 13 Aug 2016 |
|                          | eet_to_slack     | 15d16               |              | 00:56:23 UTC     |
+--------------------------+------------------+---------------------+--------------+------------------+

O-oh...

if "execution_id" is emptly, it's TROUBLE. It's either criteria didn't match, or Jinga template is messed up. Fire up st2-rule-tester, and see, will this trigger instance match this rule? all input is conveniently at your disposal - rule.ref and trigger -nstance_id is in the above output of rule enforcement list.

HINT: when copying IDs from table output kills you, remember -y option, it may be handy!
st2 rule-enforcement list --rule=tweeter.relay_tweet_to_slack -y
-   enforced_at: '2016-08-13T00:56:23.576716Z'
    id: 57ae7037d805641b8ed15d18
    rule:
        ref: tweeter.relay_tweet_to_slack
    trigger_instance_id: 57ae7037d805641b8ed15d16
-   enforced_at: '2016-08-13T02:17:47.443764Z'
    execution_id: 57ae834bd805641b8ed16c60
    id: 57ae834bd805641b8ed16c61
    rule:
        ref: tweeter.relay_tweet_to_slack
    trigger_instance_id: 57ae834bd805641b8ed16c5d

Here we go, testing the rule!
st2-rule-tester --trigger-instance-id=57ae7037d805641b8ed15d16 --rule-ref=tweeter.relay_tweet_to_slack
2016-08-13 01:06:52,158 INFO [-] Connecting to database "st2" @ "0.0.0.0:27017" as user "None".
2016-08-13 01:06:52,224 INFO [-] Validating rule tweeter.relay_tweet_to_slack for matched_tweet.
2016-08-13 01:06:52,224 INFO [-] 1 rule(s) found to enforce for matched_tweet.
2016-08-13 01:06:52,232 ERROR [-] Failed to resolve parameters
	Original error : 'dict object' has no attribute 'errorHereForSure'
2016-08-13 01:06:52,233 INFO [-] === RULE DOES NOT MATCH ===

Aha! I've messed up the Jinga template. To fix it, I edit and update the rule. Before I update, I may want to check it. Note that st2-rule-testercan be used in both "online" mode, working against real trigger-instance and rule objects in the system, or "offline mode", using rules from the file, and trigger-instance captured to the file, or in any combinations. Like this  - here I edited the rule definition in a file, and before updating it, trying it with st2-rule-tester:
$ st2-rule-tester --trigger-instance-id=57ae7037d805641b8ed15d16 --rule=relay_tweet_to_slack.yaml
2016-08-13 01:14:07,084 INFO [-] Connecting to database "st2" @ "0.0.0.0:27017" as user "None".
2016-08-13 01:14:07,142 INFO [-] Validating rule tweeter.relay_tweet_to_slack for matched_tweet.
2016-08-13 01:14:07,142 INFO [-] 1 rule(s) found to enforce for matched_tweet.
2016-08-13 01:14:07,150 INFO [-] Action parameters resolved to:
2016-08-13 01:14:07,150 INFO [-] 	message: A tweet from @dzimine:
https://twitter.com/dzimine/status/764264543321100288
2016-08-13 01:14:07,150 INFO [-] 	channel: #twitter-relay
2016-08-13 01:14:07,150 INFO [-] === RULE MATCHES ===
-

Works! and watch you can see what kind of action parameters I'm gonna send to my action from this particular trigger-instance.
Ok, now st2 rule update tweeter.relay_tweet_to_slack relay_tweet_to_slack.yaml, rule is fixed.
If the external event is too important to miss, but now it is already happened and not gonna happen again... you may want to re-fire your automation for it - by re-emitting the trigger-instance, now that the rule is fixed:
st2 trigger-instance re-emit 57ae7037d805641b8ed15d16
Trigger instance 57ae7037d805641b8ed15d16 succesfully re-sent.

Checking... Look ma, now same trigger-instance appears twice here, and the re-emitted one did trigger the desired action!
st2 rule-enforcement list --rule=tweeter.relay_tweet_to_slack
+--------------------------+----------------------+----------------------+----------------------+----------------------+
| id                       | rule.ref             | trigger_instance_id  | execution_id         | enforced_at          |
+--------------------------+----------------------+----------------------+----------------------+----------------------+
| 57ae7037d805641b8ed15d18 | tweeter.relay_tweet_ | 57ae7037d805641b8ed1 |                      | Sat, 13 Aug 2016     |
|                          | to_slack             | 5d16                 |                      | 00:56:23 UTC         |
| 57ae834bd805641b8ed16c61 | tweeter.relay_tweet_ | 57ae834bd805641b8ed1 | 57ae834bd805641b8ed1 | Sat, 13 Aug 2016     |
|                          | to_slack             | 6c5d                 | 6c60                 | 02:17:47 UTC         |
+--------------------------+----------------------+----------------------+----------------------+----------------------+