terriyu/diary-22aug2013.md

## diary-22aug2013.md

      
    Raw
  

              diary-22aug2013.md
            
          
    22 Aug 2013

Ceilometer team meeting

The team reviewed blueprints in progress for Havana-3.
What to do when dependency in patch is merged

My patch "Adds group by statistics for MongoDB driver"
https://review.openstack.org/#/c/43043/ depends on the patch "Add SQLAlchemy
implementation of groupby" https://review.openstack.org/#/c/41597/.  However,
the SQL Alchemy patch was merged today.  I see that Gerrit accounted for this
and no longer lists the SQL Alchemy patch as a dependency.  The question is
how do I submit my next patch set for the MongoDB driver?
I asked this question on #openstack-metering and sileht responded:

rebasing should be sufficient


If you want that your next "git-review -R" only push A , just git pull
--rebase origin/master should works


you can direct use pull and the remote master, in one command


git pull --rebase origin/master < this rebase the current branch with the
remote master and doesn't your local master


*touch your local master

I asked sileht if I should run git review or git review -R after that.

as you like, with one commit I think it does the same thing


-R == don't do automatic rebase before push

Group by blueprint


group by blueprint on Launchpad:
https://blueprints.launchpad.net/ceilometer/+spec/api-group-by
group by blueprint discussion on the OpenStack wiki:
https://wiki.openstack.org/wiki/Ceilometer/blueprints/api-group-by

SQL Alchemy Patch Set 10 gordc's code review

Link to Patch Set 10: https://review.openstack.org/#/c/41597/10
gordc approved the patch and it was merged.
He also gave some minor comments on Patch Set 10, which he hopes will get
addressed in a later patch.


In get_meter_statistics() of ceilometer/storage/impl_mongodb.py, there
is some code to assign a value to the attribute groupby.  This is a
temporary placeholder until the MongoDB driver implements group by.
# TODO(jd) implement groupby and remove this code
for r in results['results']:
   r['value']['groupby'] = None

gordc commented:

i think groupby can be set in FINALIZE_STATS to avoid loop but i guess
this is fine if we're removing it


In the storage tests tests/storage/base.py, tests are defined in class
StatisticsGroupByTest for checking metadata fields.  However, since we
aren't implementing groupby for metadata fields right now, the tests simply
say pass.  For example:
def test_group_by_metadata(self):
    pass

gordc suggested to:

add TODO to tests we are skipping


above the test put:
# TODO(terriyu): <what needs to be done>


In get_meter_statistics() of ceilometer/storage/impl_sqlalchemy.py,
there is some code that checks for invalid groupby fields:
if groupby:
    for group in groupby:
        if group not in ['user_id', 'project_id', 'resource_id']:
            raise NotImplementedError("Unable to group by these fields")

gordc said:

this should probably be a ValueError -- the test case passes in a value
that would never be right but raises notimplementederror which doesn't
make sense.

I responded by saying that there's the possibility of metadata fields, which
we decided not to implement right now. So maybe NotImplementedError is still
appropriate.
gordc responded:

not a major issue. maybe a conditional statement eventually when it's a
valid value raise notimplementederror and when it's not a valueerror.
throwing notimplementederror for someone trying to groupby 'wtf' seems
misleading. :)

I also noted that the above if statement could be refactored to something
like:
if (groupby and set(groupby) -
    set(['user_id', 'project_id', 'resource_id', 'source'])):

jd made this suggestion on a nearly identical block of code in the MongoDB
group by patch.


Summary of issues that should be addressed in a followup patch:


Add TODO comments to tests we are skipping (metadata field statistics
tests) in class StatisticsGroupByTest from tests/storage/base.py


Refactor groupby field validity check in get_meter_statistics()
method of ceilometer/storage/impl_sqlalchemy.py to use set()


Revise groupby field validity check to throw NotImplementedError for
stuff that's not implemented (e.g. metadata fields) and to throw
ValueError for invalid values (e.g. groupby=['wtf'])


Discussion with jd


I was worried about using a float for a key in the map function when I use
map_reduce in MongoDB.  The variable period_start is used as a key and
it's a float, though technically in Javascript, there are only numbers, no
floats or ints.  It seems to work okay with the key being "period_start"
without any rounding or casting to ints.
jd said:

It should be ok since anyway we're always using the same.


I noticed that the groupby parameters are passed as a list, e.g.
get_meter_statistics(f, groupby=['user_id', 'resource_id']  I'm wondering
if it's okay as a list or if it should be changed to a tuple.
jd responded:

It does not matter much for such a small list.  And in the end, we just
pass it as a list for test, which does not really matter. It's pretty sure
our code works with tuple too.


What may matter will be what WSME will pass as query argument, tuple or
list, but honestly for such small data set, this is not something to care
about.


MongoDB group by Patch Set 3 jd's code review


For this block of code in the get_meter_statistics() function of
ceilometer/storage/impl_mongodb.py
if groupby:
    for group in groupby:
        if group not in ['user_id', 'project_id',
                         'resource_id', 'source']:
            raise NotImplementedError("Unable to group by these fields")

jd suggested a more "idiomatic" approach using set():
if set(groubpy) - set(['user_id', 'project_id', 'resource_id', 'source']):
    raise NotImplementedError(...)

I asked jd what he meant by idiomatic?

well it avoids doing if/for/if
so the code is less nested and easier to read


In the map functions for class Connection in
ceilometer/storage/impl_mongodb.py, I make many string substitutions to
avoid duplicated code.  jd had a suggestion:

To make the code more readable, I suggest you change the various %s to %
(somefieldname)s


You'll have to pass a dict after the modulo operator instad of (value1,
value2…), and it would be more readable.


In the get_meter_statistics() method from
ceilometer/storage/impl_mongodb.py, I pass the value of the groupby fields
to the map function BSON code as str(groupby).  For example,
map_stats = self.MAP_STATS_GROUPBY % str(groupby)

jd remarked that this is incorrect:

While str(groupby) works because the syntax of JavaScript and Python for
array looks alike, I think it would be more correct to use json.dumps
(groupby).

Reference for json.dumps() function:
http://docs.python.org/2/library/json.html#json.dumps
I noticed that MongoDB also has a similar function bson.json_util.dumps()
Reference for bson.json_util.dumps():
http://api.mongodb.org/python/current/api/bson/json_util.html#bson.json_util.dumps
I asked jd if I should use bson.json_util.dumps() instead of
json.dumps() and he said:

no, bson's versino is just a wrapper around json.dumps that adds
support for types that are created by bson


here we use a standard list that is not related to bson at all, so there
is no need for that


Earlier, I had a problem getting test_group_by_unknown_field in the class
StatisticsGroupByTest from tests/storage/base.py to pass.  The version
in the patch https://review.openstack.org/#/c/41597 works for SQL Alchemy,
but not for MongoDB.  So I modified the test to work with both, using a
lambda() hack:
def test_group_by_unknown_field(self):
    f = storage.SampleFilter(
        meter='instance',
    )
    self.assertRaises(
        NotImplementedError,
        lambda: list(self.conn.get_meter_statistics(f, groupby=['wtf']))
    )

jd said I could avoid modifying the test, by rewriting the block of code
that returns the result in the get_meter_statistics() of
ceilometer/storage/impl_mongdb.py
 return sorted((models.Statistics(**(r['value']))
                for r in results['results']),
               key=operator.attrgetter('period_start'))

jd said:

You could change that to return iter(sorted(...)) That would avoid
changing the test like you had too.


This function should return an iterator, the sort that is done here in
Python is not a good idea, it should be done in MongoDB. But that's not
the point of this patch, so don't try to fix it right now. Just adding
iter() would make the return more consistent with the other databases
drivers.

I tried jd's suggestion but I found that using iter() didn't work.  The
error is still thrown in get_meter_statistics() before you apply list() to
it.
I asked jd if I should use yield instead to make a generator instead of
an iterator.  He said:

�yes that would be the solution
�however there's no point in running yield since we have the result built
as a list

iter() doesn't work because the arguments passed to it are evaluated,
whereas with yield the arguments are not evaluated until needed.

forget about this comment, your solution is good enough
�we'll fix that sorted() later :)

I asked jd if my lambda() hack was too "hacky."  His response:

�it's hacky but it's justifiable
�so I'm fine with it


MongoDB group by Patch Set 4 submitted

I fixed everything jd asked me to in the code review (with the exception of
adding iter() to test_group_by_unknown_field) and submitted the changes as
Patch Set 4:
https://review.openstack.org/#/c/43043/4/
MongoDB group by Patch Set 5 submitted

Previously, this patch was dependent on the SQL Alchemy group by patch:
https://review.openstack.org/#/c/41597/
The SQL Alchemy patch has been merged, so I rebased and added a FIXME comment
to get_meter_statistics() in ceilometer/storage/impl_mongodb.py reminding
the team to get rid of the sorted() function when returning the results:
# FIXME(terriyu) Fix get_meter_statistics() so we don't use sorted()
# to return the results
return sorted(
    (models.Statistics(**(r['value'])) for r in results['results']),
    key=operator.attrgetter('period_start'))

Next steps

The immediate next step is to work on writing the API tests.
The major components of the group by blueprint left to do:


API group by tests


API implementation of group by


API group by documentation