@kapilt
Last active December 6, 2016 19:15
exploring diffs

Three Python diff libraries were evaluated for comparing resource revisions.

  • jsonpatch
  • dictdiffer
  • DeepDiff

Additionally, we considered rolling our own diff, specific to Custodian's needs.

jsonpatch

On the whole it does a good job of producing a minimal diff that matches the semantic changes. Some bugs reported on the repo still need investigation.

[{u'op': u'replace', u'path': u'/IpPermissions/0/ToPort', u'value': 80},
 {u'op': u'add',
  u'path': u'/Tags/1',
  u'value': {'Key': 'AppId', 'Value': 'SomethingGood'}}]
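To make the semantics of those two operations concrete, here is a minimal sketch that applies `add` and `replace` operations (a subset of RFC 6902 JSON Patch, with RFC 6901 JSON Pointer paths). The `apply_ops` helper and the sample `before` document are hypothetical. This is not how the jsonpatch library is implemented; the library provides its own `apply_patch`.

```python
import copy


def _unescape(token):
    # JSON Pointer (RFC 6901): "~1" decodes to "/", then "~0" decodes to "~"
    return token.replace("~1", "/").replace("~0", "~")


def apply_ops(doc, ops):
    """Return a patched copy of doc; only 'add' and 'replace' are supported.

    Hypothetical helper: it skips RFC 6902 checks such as requiring the
    target of 'replace' to already exist.
    """
    result = copy.deepcopy(doc)
    for op in ops:
        if op["op"] not in ("add", "replace"):
            raise ValueError("unsupported op: %s" % op["op"])
        tokens = [_unescape(t) for t in op["path"].split("/")[1:]]
        if not tokens:
            raise ValueError("root paths are not supported in this sketch")
        # Walk down to the container that holds the target location
        parent = result
        for token in tokens[:-1]:
            parent = parent[int(token)] if isinstance(parent, list) else parent[token]
        last = tokens[-1]
        if isinstance(parent, list):
            if op["op"] == "add":
                # 'add' inserts before the index and shifts later items; '-' appends
                index = len(parent) if last == "-" else int(last)
                parent.insert(index, op["value"])
            else:
                parent[int(last)] = op["value"]
        else:
            parent[last] = op["value"]
    return result


# Hypothetical revision shaped like the example above (Tags[0] is made up)
before = {
    "IpPermissions": [{"IpProtocol": "tcp", "FromPort": 53, "ToPort": 53}],
    "Tags": [{"Key": "Name", "Value": "web"},
             {"Key": "Origin", "Value": "Name"}],
}
ops = [
    {"op": "replace", "path": "/IpPermissions/0/ToPort", "value": 80},
    {"op": "add", "path": "/Tags/1",
     "value": {"Key": "AppId", "Value": "SomethingGood"}},
]
after = apply_ops(before, ops)
```

Because `add` at a list index inserts instead of overwriting, the existing `Origin` tag simply shifts to index 2. A single operation therefore describes the new tag.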

dictdiffer

[('change', ['IpPermissions', 0, 'ToPort'], (53, 80)),
 ('change', ['Tags', 1, 'Value'], ('Name', 'SomethingGood')),
 ('change', ['Tags', 1, 'Key'], ('Origin', 'AppId')),
 ('add', 'Tags', [(2, {'Key': 'Origin', 'Value': 'Name'})])]

The change here is correct, but it requires some semantic interpretation. dictdiffer treats position within a list as significant, so it mutates elements in place: the old Tags[1] has its Key and Value overwritten, and its original contents reappear as a new Tags[2]. In all circumstances we want the semantic delta on a list rather than an in-place mutation.
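A minimal sketch of the difference between a positional list diff and a semantic one, assuming tags are identified by their `Key`. The function names are hypothetical and this is not dictdiffer's implementation. It reproduces the same shape of result on a hypothetical tag list (Tags[0] is made up).

```python
def positional_changes(old, new):
    """Diff two lists of dicts index by index, as a positional diff does."""
    changes = []
    for i in range(min(len(old), len(new))):
        for field in sorted(set(old[i]) | set(new[i])):
            if old[i].get(field) != new[i].get(field):
                changes.append(("change", i, field,
                                old[i].get(field), new[i].get(field)))
    for i in range(len(old), len(new)):
        changes.append(("add", i, new[i]))
    for i in range(len(new), len(old)):
        changes.append(("remove", i, old[i]))
    return changes


def keyed_changes(old, new, key="Key"):
    """Diff two lists of dicts by an identity field instead of by position."""
    old_by_key = {item[key]: item for item in old}
    new_by_key = {item[key]: item for item in new}
    changes = [("add", new_by_key[k])
               for k in sorted(new_by_key.keys() - old_by_key.keys())]
    changes += [("remove", old_by_key[k])
                for k in sorted(old_by_key.keys() - new_by_key.keys())]
    changes += [("change", old_by_key[k], new_by_key[k])
                for k in sorted(old_by_key.keys() & new_by_key.keys())
                if old_by_key[k] != new_by_key[k]]
    return changes


old_tags = [{"Key": "Name", "Value": "web"},
            {"Key": "Origin", "Value": "Name"}]
new_tags = [{"Key": "Name", "Value": "web"},
            {"Key": "AppId", "Value": "SomethingGood"},
            {"Key": "Origin", "Value": "Name"}]
```

The positional version yields two field changes on index 1 plus an add at index 2. The keyed version yields the single add that matches the API call which actually happened.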

DeepDiff

{'iterable_item_added': {"root['IpPermissions'][0]": {'FromPort': 53,
                                                      'IpProtocol': 'tcp',
                                                      'IpRanges': ['10.0.0.0/8'],
                                                      'PrefixListIds': [],
                                                      'ToPort': 80,
                                                      'UserIdGroupPairs': []},
                         "root['Tags'][1]": {'Key': 'AppId',
                                             'Value': 'SomethingGood'}},
 'iterable_item_removed': {"root['IpPermissions'][0]": {'FromPort': 53,
                                                        'IpProtocol': 'tcp',
                                                        'IpRanges': ['10.0.0.0/8'],
                                                        'PrefixListIds': [],
                                                        'ToPort': 53,
                                                        'UserIdGroupPairs': []}}}

DeepDiff is fairly configurable; the only non-default parameter used here is ignore_order.
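The item-level view that ignore_order produces can be approximated by comparing the two lists as multisets. This minimal sketch assumes key-sorted JSON is an acceptable canonical form for each item. `unordered_delta` is a hypothetical name and this is not DeepDiff's implementation.

```python
import json
from collections import Counter


def _canonical(item):
    # Dicts are unhashable; a key-sorted JSON string gives a stable hashable form.
    # Lists nested inside an item (e.g. IpRanges) stay order-sensitive here.
    return json.dumps(item, sort_keys=True)


def unordered_delta(old, new):
    """Compare two lists as multisets and return (added, removed) items."""
    old_counts = Counter(_canonical(item) for item in old)
    new_counts = Counter(_canonical(item) for item in new)
    added = [json.loads(s) for s in sorted((new_counts - old_counts).elements())]
    removed = [json.loads(s) for s in sorted((old_counts - new_counts).elements())]
    return added, removed


old_perms = [{"FromPort": 53, "IpProtocol": "tcp", "IpRanges": ["10.0.0.0/8"],
              "PrefixListIds": [], "ToPort": 53, "UserIdGroupPairs": []}]
new_perms = [dict(old_perms[0], ToPort=80)]
added, removed = unordered_delta(old_perms, new_perms)
```

A one-field change surfaces as a whole-item removal plus a whole-item addition, the same shape as the DeepDiff output above.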

The semantic structure of the returned diff is quite obtuse and idiosyncratic.

Rolling our own

The issue with most of these diff libraries is that they require significant interpretation to line up with the API call semantics of any given resource. For example, a security group rule is effectively immutable, so a modification that a diff library might represent as a 'change' requires removing the original rule and adding the modified one.

Output of the test below (dictdiffer, DeepDiff and jsonpatch, in that order):

[('change', [u'Tags', 1, u'Value'], ('blue-moon', 'red-moon')),
 ('add', u'Tags', [(2, {u'Key': 'Stage', u'Value': 'production'})]),
 ('change', [u'IpPermissions', 0, u'FromPort'], (8080, 80)),
 ('change', [u'IpPermissions', 0, u'ToPort'], (8080, 80))]

{'iterable_item_added': {u"root['IpPermissions'][0]": {u'FromPort': 80,
                                                       u'IpProtocol': 'tcp',
                                                       u'IpRanges': [{u'CidrIp': '0.0.0.0/0'}],
                                                       u'PrefixListIds': [],
                                                       u'ToPort': 80,
                                                       u'UserIdGroupPairs': []},
                         u"root['Tags'][1]": {u'Key': 'App',
                                              u'Value': 'red-moon'},
                         u"root['Tags'][2]": {u'Key': 'Stage',
                                              u'Value': 'production'}},
 'iterable_item_removed': {u"root['IpPermissions'][0]": {u'FromPort': 8080,
                                                         u'IpProtocol': 'tcp',
                                                         u'IpRanges': [{u'CidrIp': '0.0.0.0/0'}],
                                                         u'PrefixListIds': [],
                                                         u'ToPort': 8080,
                                                         u'UserIdGroupPairs': []},
                           u"root['Tags'][1]": {u'Key': 'App',
                                                u'Value': 'blue-moon'}}}

[{u'op': u'replace', u'path': u'/Tags/1/Value', u'value': 'red-moon'},
 {u'op': u'add',
  u'path': u'/Tags/2',
  u'value': {u'Key': 'Stage', u'Value': 'production'}},
 {u'op': u'replace',
  u'path': u'/IpPermissions/0',
  u'value': {u'FromPort': 80,
             u'IpProtocol': 'tcp',
             u'IpRanges': [{u'CidrIp': '0.0.0.0/0'}],
             u'PrefixListIds': [],
             u'ToPort': 80,
             u'UserIdGroupPairs': []}}]
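A minimal sketch of what a Custodian-specific delta might look like. It assumes ingress rules are compared as flattened (protocol, from port, to port, CIDR) tuples and tags as a Key to Value map. `flatten_rules` and `sg_delta` are hypothetical names, only IPv4 `IpRanges` are handled, and the sample input mirrors the two revisions built in the test below.

```python
def flatten_rules(ip_permissions):
    """Expand IpPermissions into one (protocol, from, to, cidr) tuple per rule.

    Sketch only: IPv4 IpRanges are covered; Ipv6Ranges, PrefixListIds and
    UserIdGroupPairs are ignored.
    """
    rules = set()
    for perm in ip_permissions:
        for ip_range in perm.get("IpRanges", []):
            rules.add((perm["IpProtocol"], perm.get("FromPort"),
                       perm.get("ToPort"), ip_range["CidrIp"]))
    return rules


def sg_delta(old, new):
    """Express the change between two security group records as API-shaped actions."""
    old_rules = flatten_rules(old.get("IpPermissions", []))
    new_rules = flatten_rules(new.get("IpPermissions", []))
    old_tags = {t["Key"]: t["Value"] for t in old.get("Tags", [])}
    new_tags = {t["Key"]: t["Value"] for t in new.get("Tags", [])}
    return {
        # Rules are immutable: a modified rule appears as revoke + authorize
        "revoke_ingress": sorted(old_rules - new_rules),
        "authorize_ingress": sorted(new_rules - old_rules),
        # CreateTags overwrites an existing key, so new and changed tags share one call
        "create_tags": {k: v for k, v in new_tags.items() if old_tags.get(k) != v},
        "delete_tags": sorted(old_tags.keys() - new_tags.keys()),
    }


old = {
    "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
         "IpRanges": [{"CidrIp": "10.42.1.0/24"}]},
        {"IpProtocol": "tcp", "FromPort": 8080, "ToPort": 8080,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}],
    "Tags": [{"Key": "NetworkLocation", "Value": "DMZ"},
             {"Key": "App", "Value": "blue-moon"}],
}
new = {
    "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
         "IpRanges": [{"CidrIp": "10.42.1.0/24"}]},
        {"IpProtocol": "tcp", "FromPort": 80, "ToPort": 80,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}],
    "Tags": [{"Key": "NetworkLocation", "Value": "DMZ"},
             {"Key": "App", "Value": "red-moon"},
             {"Key": "Stage", "Value": "production"}],
}
delta = sg_delta(old, new)
```

Each key in the result lines up with one EC2 call (revoke, authorize, create tags, delete tags). No interpretation of positional 'change' entries is required.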
from __future__ import print_function

import pprint

import dictdiffer
from deepdiff import DeepDiff
from jsonpatch import make_patch

from common import BaseTest


class DiffLibTest(BaseTest):

    def compare_diffs(self, s1, s2):
        # dictdiffer: positional diff as (action, path, values) tuples
        print()
        pprint.pprint(list(dictdiffer.diff(s1, s2)))
        # DeepDiff: lists compared as unordered collections
        print()
        pprint.pprint(DeepDiff(s1, s2, ignore_order=True))
        # jsonpatch: RFC 6902 operations that turn s1 into s2
        print()
        pprint.pprint(list(make_patch(s1, s2)))
        print()

    def test_sg_mods(self):
        # Record/replay the EC2 API responses under this flight data name
        factory = self.record_flight_data('test_security_group_revisions_delta')
        client = factory().client('ec2')

        # Scratch VPC and security group; cleanups run in reverse order
        vpc_id = client.create_vpc(CidrBlock="10.42.0.0/16")['Vpc']['VpcId']
        self.addCleanup(client.delete_vpc, VpcId=vpc_id)
        sg_id = client.create_security_group(
            GroupName="allow-access",
            VpcId=vpc_id,
            Description="inbound access")['GroupId']
        self.addCleanup(client.delete_security_group, GroupId=sg_id)

        # First revision: two tags and two ingress rules
        client.create_tags(
            Resources=[sg_id],
            Tags=[
                {'Key': 'NetworkLocation', 'Value': 'DMZ'},
                {'Key': 'App', 'Value': 'blue-moon'}
            ])
        client.authorize_security_group_ingress(
            GroupId=sg_id,
            IpPermissions=[
                {'IpProtocol': 'tcp',
                 'FromPort': 443,
                 'ToPort': 443,
                 'IpRanges': [{'CidrIp': '10.42.1.0/24'}]},
                {'IpProtocol': 'tcp',
                 'FromPort': 8080,
                 'ToPort': 8080,
                 'IpRanges': [{'CidrIp': '0.0.0.0/0'}]}
            ])
        s1 = client.describe_security_groups(GroupIds=[sg_id])[
            'SecurityGroups'][0]

        # Second revision: overwrite App, add Stage, and "modify" the 8080
        # rule to port 80, which EC2 only allows as revoke + authorize
        client.create_tags(
            Resources=[sg_id],
            Tags=[
                {'Key': 'App', 'Value': 'red-moon'},
                {'Key': 'Stage', 'Value': 'production'}])
        client.revoke_security_group_ingress(
            GroupId=sg_id,
            IpPermissions=[
                {'IpProtocol': 'tcp',
                 'FromPort': 8080,
                 'ToPort': 8080,
                 'IpRanges': [{'CidrIp': '0.0.0.0/0'}]}])
        client.authorize_security_group_ingress(
            GroupId=sg_id,
            IpPermissions=[
                {'IpProtocol': 'tcp',
                 'FromPort': 80,
                 'ToPort': 80,
                 'IpRanges': [{'CidrIp': '0.0.0.0/0'}]},
            ])
        s2 = client.describe_security_groups(GroupIds=[sg_id])[
            'SecurityGroups'][0]

        self.compare_diffs(s1, s2)
@mandeepbal

With DeepDiff, even though the entire object gets returned, you could still use that when making a call back to AWS. When I create a rule in AWS, I need to know everything listed in that object; knowing just the line that changed doesn't help.
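One way to reconcile this with a finer-grained diff: if rules are diffed as flattened (protocol, from port, to port, CIDR) tuples, each tuple still carries everything needed to rebuild a complete `IpPermissions` entry for a call like `authorize_security_group_ingress`. A minimal sketch, with `to_ip_permissions` and the tuple shape as assumptions:

```python
from collections import defaultdict


def to_ip_permissions(rules):
    """Group (protocol, from_port, to_port, cidr) tuples into IpPermissions entries."""
    grouped = defaultdict(list)
    for protocol, from_port, to_port, cidr in sorted(rules):
        grouped[(protocol, from_port, to_port)].append({"CidrIp": cidr})
    permissions = []
    for (protocol, from_port, to_port), ranges in sorted(grouped.items()):
        perm = {"IpProtocol": protocol, "IpRanges": ranges}
        # Port fields are omitted when absent (e.g. for the all-protocols rule "-1")
        if from_port is not None:
            perm["FromPort"] = from_port
        if to_port is not None:
            perm["ToPort"] = to_port
        permissions.append(perm)
    return permissions


perms = to_ip_permissions({("tcp", 80, 80, "0.0.0.0/0"),
                           ("tcp", 80, 80, "10.0.0.0/8")})
```

The resulting list could then be passed as `IpPermissions=` alongside `GroupId`, as in the test above.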
