The team reviewed blueprints in progress for Havana-3.
My patch "Adds group by statistics for MongoDB driver" https://review.openstack.org/#/c/43043/ depends on the patch "Add SQLAlchemy implementation of groupby" https://review.openstack.org/#/c/41597/. However, the SQL Alchemy patch was merged today. I see that Gerrit accounted for this and no longer lists the SQL Alchemy patch as a dependency. The question is how do I submit my next patch set for the MongoDB driver?
I asked this question on #openstack-metering and sileht responded:
rebasing should be sufficient
If you want that your next "git-review -R" only push A , just git pull --rebase origin/master should works
you can direct use pull and the remote master, in one command
git pull --rebase origin/master < this rebase the current branch with the remote master and doesn't your local master
*touch your local master
I asked sileht if I should run git review
or git review -R
after that.
as you like, with one commit I think it does the same thing
-R == don't do automatic rebase before push
- group by blueprint on Launchpad: https://blueprints.launchpad.net/ceilometer/+spec/api-group-by
- group by blueprint discussion on the OpenStack wiki: https://wiki.openstack.org/wiki/Ceilometer/blueprints/api-group-by
Link to Patch Set 10: https://review.openstack.org/#/c/41597/10
gordc approved the patch and it was merged.
He also gave some minor comments on Patch Set 10, which he hopes will get addressed in a later patch.
-
In
get_meter_statistics()
ofceilometer/storage/impl_mongodb.py
, there is some code to assign a value to the attributegroupby
. This is a temporary placeholder until the MongoDB driver implements group by.# TODO(jd) implement groupby and remove this code for r in results['results']: r['value']['groupby'] = None
gordc commented:
i think groupby can be set in FINALIZE_STATS to avoid loop but i guess this is fine if we're removing it
-
In the storage tests
tests/storage/base.py
, tests are defined in classStatisticsGroupByTest
for checking metadata fields. However, since we aren't implementing groupby for metadata fields right now, the tests simply saypass
. For example:def test_group_by_metadata(self): pass
gordc suggested to:
add TODO to tests we are skipping
above the test put:
# TODO(terriyu): <what needs to be done>
-
In
get_meter_statistics()
ofceilometer/storage/impl_sqlalchemy.py
, there is some code that checks for invalid groupby fields:if groupby: for group in groupby: if group not in ['user_id', 'project_id', 'resource_id']: raise NotImplementedError("Unable to group by these fields")
gordc said:
this should probably be a ValueError -- the test case passes in a value that would never be right but raises notimplementederror which doesn't make sense.
I responded by saying that there's the possibility of metadata fields, which we decided not to implement right now. So maybe NotImplementedError is still appropriate.
gordc responded:
not a major issue. maybe a conditional statement eventually when it's a valid value raise notimplementederror and when it's not a valueerror. throwing notimplementederror for someone trying to groupby 'wtf' seems misleading. :)
I also noted that the above
if
statement could be refactored to something like:if (groupby and set(groupby) - set(['user_id', 'project_id', 'resource_id', 'source'])):
jd made this suggestion on a nearly identical block of code in the MongoDB group by patch.
-
Summary of issues that should be addressed in a followup patch:
-
Add TODO comments to tests we are skipping (metadata field statistics tests) in class
StatisticsGroupByTest
fromtests/storage/base.py
-
Refactor groupby field validity check in
get_meter_statistics()
method ofceilometer/storage/impl_sqlalchemy.py
to useset()
-
Revise groupby field validity check to throw
NotImplementedError
for stuff that's not implemented (e.g. metadata fields) and to throwValueError
for invalid values (e.g.groupby=['wtf']
)
-
-
I was worried about using a float for a key in the map function when I use
map_reduce
in MongoDB. The variable period_start is used as a key and it's a float, though technically in Javascript, there are only numbers, no floats or ints. It seems to work okay with the key being "period_start" without any rounding or casting to ints.jd said:
It should be ok since anyway we're always using the same.
-
I noticed that the groupby parameters are passed as a list, e.g.
get_meter_statistics(f, groupby=['user_id', 'resource_id']
I'm wondering if it's okay as a list or if it should be changed to a tuple.jd responded:
It does not matter much for such a small list. And in the end, we just pass it as a list for test, which does not really matter. It's pretty sure our code works with tuple too.
What may matter will be what WSME will pass as query argument, tuple or list, but honestly for such small data set, this is not something to care about.
-
For this block of code in the
get_meter_statistics()
function ofceilometer/storage/impl_mongodb.py
if groupby: for group in groupby: if group not in ['user_id', 'project_id', 'resource_id', 'source']: raise NotImplementedError("Unable to group by these fields")
jd suggested a more "idiomatic" approach using
set()
:if set(groubpy) - set(['user_id', 'project_id', 'resource_id', 'source']): raise NotImplementedError(...)
I asked jd what he meant by idiomatic?
well it avoids doing if/for/if so the code is less nested and easier to read
-
In the map functions for class
Connection
inceilometer/storage/impl_mongodb.py
, I make many string substitutions to avoid duplicated code. jd had a suggestion:To make the code more readable, I suggest you change the various %s to % (somefieldname)s
You'll have to pass a dict after the modulo operator instad of (value1, value2…), and it would be more readable.
-
In the
get_meter_statistics()
method fromceilometer/storage/impl_mongodb.py
, I pass the value of the groupby fields to the map function BSON code asstr(groupby)
. For example,map_stats = self.MAP_STATS_GROUPBY % str(groupby)
jd remarked that this is incorrect:
While str(groupby) works because the syntax of JavaScript and Python for array looks alike, I think it would be more correct to use json.dumps (groupby).
Reference for
json.dumps()
function: http://docs.python.org/2/library/json.html#json.dumpsI noticed that MongoDB also has a similar function
bson.json_util.dumps()
Reference for
bson.json_util.dumps()
: http://api.mongodb.org/python/current/api/bson/json_util.html#bson.json_util.dumpsI asked jd if I should use
bson.json_util.dumps()
instead ofjson.dumps()
and he said:no, bson's versino is just a wrapper around json.dumps that adds support for types that are created by bson
here we use a standard list that is not related to bson at all, so there is no need for that
-
Earlier, I had a problem getting
test_group_by_unknown_field
in the classStatisticsGroupByTest
fromtests/storage/base.py
to pass. The version in the patch https://review.openstack.org/#/c/41597 works for SQL Alchemy, but not for MongoDB. So I modified the test to work with both, using alambda()
hack:def test_group_by_unknown_field(self): f = storage.SampleFilter( meter='instance', ) self.assertRaises( NotImplementedError, lambda: list(self.conn.get_meter_statistics(f, groupby=['wtf'])) )
jd said I could avoid modifying the test, by rewriting the block of code that returns the result in the
get_meter_statistics()
ofceilometer/storage/impl_mongdb.py
return sorted((models.Statistics(**(r['value'])) for r in results['results']), key=operator.attrgetter('period_start'))
jd said:
You could change that to return iter(sorted(...)) That would avoid changing the test like you had too.
This function should return an iterator, the sort that is done here in Python is not a good idea, it should be done in MongoDB. But that's not the point of this patch, so don't try to fix it right now. Just adding iter() would make the return more consistent with the other databases drivers.
I tried jd's suggestion but I found that using
iter()
didn't work. The error is still thrown in get_meter_statistics() before you apply list() to it.I asked jd if I should use
yield
instead to make a generator instead of an iterator. He said:�yes that would be the solution �however there's no point in running yield since we have the result built as a list
iter()
doesn't work because the arguments passed to it are evaluated, whereas withyield
the arguments are not evaluated until needed.forget about this comment, your solution is good enough �we'll fix that sorted() later :)
I asked jd if my
lambda()
hack was too "hacky." His response:�it's hacky but it's justifiable �so I'm fine with it
I fixed everything jd asked me to in the code review (with the exception of
adding iter()
to test_group_by_unknown_field
) and submitted the changes as
Patch Set 4:
https://review.openstack.org/#/c/43043/4/
Previously, this patch was dependent on the SQL Alchemy group by patch: https://review.openstack.org/#/c/41597/
The SQL Alchemy patch has been merged, so I rebased and added a FIXME comment
to get_meter_statistics()
in ceilometer/storage/impl_mongodb.py
reminding
the team to get rid of the sorted()
function when returning the results:
# FIXME(terriyu) Fix get_meter_statistics() so we don't use sorted()
# to return the results
return sorted(
(models.Statistics(**(r['value'])) for r in results['results']),
key=operator.attrgetter('period_start'))
The immediate next step is to work on writing the API tests.
The major components of the group by blueprint left to do:
-
API group by tests
-
API implementation of group by
-
API group by documentation