Skip to content

Instantly share code, notes, and snippets.

@terriyu
Created August 25, 2013 04:39
Show Gist options
  • Save terriyu/6332037 to your computer and use it in GitHub Desktop.
Save terriyu/6332037 to your computer and use it in GitHub Desktop.

22 Aug 2013

Ceilometer team meeting

The team reviewed blueprints in progress for Havana-3.

What to do when dependency in patch is merged

My patch "Adds group by statistics for MongoDB driver" https://review.openstack.org/#/c/43043/ depends on the patch "Add SQLAlchemy implementation of groupby" https://review.openstack.org/#/c/41597/. However, the SQL Alchemy patch was merged today. I see that Gerrit accounted for this and no longer lists the SQL Alchemy patch as a dependency. The question is how do I submit my next patch set for the MongoDB driver?

I asked this question on #openstack-metering and sileht responded:

rebasing should be sufficient

If you want that your next "git-review -R" only push A , just git pull --rebase origin/master should works

you can direct use pull and the remote master, in one command

git pull --rebase origin/master < this rebase the current branch with the remote master and doesn't your local master

*touch your local master

I asked sileht if I should run git review or git review -R after that.

as you like, with one commit I think it does the same thing

-R == don't do automatic rebase before push

Group by blueprint

SQL Alchemy Patch Set 10 gordc's code review

Link to Patch Set 10: https://review.openstack.org/#/c/41597/10

gordc approved the patch and it was merged.

He also gave some minor comments on Patch Set 10, which he hopes will get addressed in a later patch.

  • In get_meter_statistics() of ceilometer/storage/impl_mongodb.py, there is some code to assign a value to the attribute groupby. This is a temporary placeholder until the MongoDB driver implements group by.

    # TODO(jd) implement groupby and remove this code
    for r in results['results']:
       r['value']['groupby'] = None
    

    gordc commented:

    i think groupby can be set in FINALIZE_STATS to avoid loop but i guess this is fine if we're removing it

  • In the storage tests tests/storage/base.py, tests are defined in class StatisticsGroupByTest for checking metadata fields. However, since we aren't implementing groupby for metadata fields right now, the tests simply say pass. For example:

    def test_group_by_metadata(self):
        pass
    

    gordc suggested to:

    add TODO to tests we are skipping

    above the test put: # TODO(terriyu): <what needs to be done>

  • In get_meter_statistics() of ceilometer/storage/impl_sqlalchemy.py, there is some code that checks for invalid groupby fields:

    if groupby:
        for group in groupby:
            if group not in ['user_id', 'project_id', 'resource_id']:
                raise NotImplementedError("Unable to group by these fields")
    

    gordc said:

    this should probably be a ValueError -- the test case passes in a value that would never be right but raises notimplementederror which doesn't make sense.

    I responded by saying that there's the possibility of metadata fields, which we decided not to implement right now. So maybe NotImplementedError is still appropriate.

    gordc responded:

    not a major issue. maybe a conditional statement eventually when it's a valid value raise notimplementederror and when it's not a valueerror. throwing notimplementederror for someone trying to groupby 'wtf' seems misleading. :)

    I also noted that the above if statement could be refactored to something like:

    if (groupby and set(groupby) -
        set(['user_id', 'project_id', 'resource_id', 'source'])):
    

    jd made this suggestion on a nearly identical block of code in the MongoDB group by patch.

  • Summary of issues that should be addressed in a followup patch:

    • Add TODO comments to tests we are skipping (metadata field statistics tests) in class StatisticsGroupByTest from tests/storage/base.py

    • Refactor groupby field validity check in get_meter_statistics() method of ceilometer/storage/impl_sqlalchemy.py to use set()

    • Revise groupby field validity check to throw NotImplementedError for stuff that's not implemented (e.g. metadata fields) and to throw ValueError for invalid values (e.g. groupby=['wtf'])

Discussion with jd

  • I was worried about using a float for a key in the map function when I use map_reduce in MongoDB. The variable period_start is used as a key and it's a float, though technically in Javascript, there are only numbers, no floats or ints. It seems to work okay with the key being "period_start" without any rounding or casting to ints.

    jd said:

    It should be ok since anyway we're always using the same.

  • I noticed that the groupby parameters are passed as a list, e.g. get_meter_statistics(f, groupby=['user_id', 'resource_id'] I'm wondering if it's okay as a list or if it should be changed to a tuple.

    jd responded:

    It does not matter much for such a small list. And in the end, we just pass it as a list for test, which does not really matter. It's pretty sure our code works with tuple too.

    What may matter will be what WSME will pass as query argument, tuple or list, but honestly for such small data set, this is not something to care about.

MongoDB group by Patch Set 3 jd's code review

  • For this block of code in the get_meter_statistics() function of ceilometer/storage/impl_mongodb.py

    if groupby:
        for group in groupby:
            if group not in ['user_id', 'project_id',
                             'resource_id', 'source']:
                raise NotImplementedError("Unable to group by these fields")
    

    jd suggested a more "idiomatic" approach using set():

    if set(groubpy) - set(['user_id', 'project_id', 'resource_id', 'source']):
        raise NotImplementedError(...)
    

    I asked jd what he meant by idiomatic?

    well it avoids doing if/for/if so the code is less nested and easier to read

  • In the map functions for class Connection in ceilometer/storage/impl_mongodb.py, I make many string substitutions to avoid duplicated code. jd had a suggestion:

    To make the code more readable, I suggest you change the various %s to % (somefieldname)s

    You'll have to pass a dict after the modulo operator instad of (value1, value2…), and it would be more readable.

  • In the get_meter_statistics() method from ceilometer/storage/impl_mongodb.py, I pass the value of the groupby fields to the map function BSON code as str(groupby). For example,

    map_stats = self.MAP_STATS_GROUPBY % str(groupby)
    

    jd remarked that this is incorrect:

    While str(groupby) works because the syntax of JavaScript and Python for array looks alike, I think it would be more correct to use json.dumps (groupby).

    Reference for json.dumps() function: http://docs.python.org/2/library/json.html#json.dumps

    I noticed that MongoDB also has a similar function bson.json_util.dumps()

    Reference for bson.json_util.dumps(): http://api.mongodb.org/python/current/api/bson/json_util.html#bson.json_util.dumps

    I asked jd if I should use bson.json_util.dumps() instead of json.dumps() and he said:

    no, bson's versino is just a wrapper around json.dumps that adds support for types that are created by bson

    here we use a standard list that is not related to bson at all, so there is no need for that

  • Earlier, I had a problem getting test_group_by_unknown_field in the class StatisticsGroupByTest from tests/storage/base.py to pass. The version in the patch https://review.openstack.org/#/c/41597 works for SQL Alchemy, but not for MongoDB. So I modified the test to work with both, using a lambda() hack:

    def test_group_by_unknown_field(self):
        f = storage.SampleFilter(
            meter='instance',
        )
        self.assertRaises(
            NotImplementedError,
            lambda: list(self.conn.get_meter_statistics(f, groupby=['wtf']))
        )
    

    jd said I could avoid modifying the test, by rewriting the block of code that returns the result in the get_meter_statistics() of ceilometer/storage/impl_mongdb.py

     return sorted((models.Statistics(**(r['value']))
                    for r in results['results']),
                   key=operator.attrgetter('period_start'))
    

    jd said:

    You could change that to return iter(sorted(...)) That would avoid changing the test like you had too.

    This function should return an iterator, the sort that is done here in Python is not a good idea, it should be done in MongoDB. But that's not the point of this patch, so don't try to fix it right now. Just adding iter() would make the return more consistent with the other databases drivers.

    I tried jd's suggestion but I found that using iter() didn't work. The error is still thrown in get_meter_statistics() before you apply list() to it.

    I asked jd if I should use yield instead to make a generator instead of an iterator. He said:

    �yes that would be the solution �however there's no point in running yield since we have the result built as a list

    iter() doesn't work because the arguments passed to it are evaluated, whereas with yield the arguments are not evaluated until needed.

    forget about this comment, your solution is good enough �we'll fix that sorted() later :)

    I asked jd if my lambda() hack was too "hacky." His response:

    �it's hacky but it's justifiable �so I'm fine with it

MongoDB group by Patch Set 4 submitted

I fixed everything jd asked me to in the code review (with the exception of adding iter() to test_group_by_unknown_field) and submitted the changes as Patch Set 4:

https://review.openstack.org/#/c/43043/4/

MongoDB group by Patch Set 5 submitted

Previously, this patch was dependent on the SQL Alchemy group by patch: https://review.openstack.org/#/c/41597/

The SQL Alchemy patch has been merged, so I rebased and added a FIXME comment to get_meter_statistics() in ceilometer/storage/impl_mongodb.py reminding the team to get rid of the sorted() function when returning the results:

# FIXME(terriyu) Fix get_meter_statistics() so we don't use sorted()
# to return the results
return sorted(
    (models.Statistics(**(r['value'])) for r in results['results']),
    key=operator.attrgetter('period_start'))

Next steps

The immediate next step is to work on writing the API tests.

The major components of the group by blueprint left to do:

  • API group by tests

  • API implementation of group by

  • API group by documentation

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment