terriyu/diary-19aug2013.md

## diary-19aug2013.md

      
    Raw
  

              diary-19aug2013.md
            
          
    19 Aug 2013

Good commit messages as recommended by OpenStack

sandywalsh recommended this OpenStack wiki page on how to write good commit
messages:
https://wiki.openstack.org/wiki/GitCommitMessages#Information_in_commit_messages
Commenting on patch sets

I was wondering if there is any convention for the author of a patch set to
leave a comment saying what the patch set does or how it's different from
previous versions.
jpich says:

Indeed it is helpful.  Some people do this but there is no convention that I'm
aware of. It's always ok to ask questions politely in a comment, though, if
you want to know/understand something.

Upstream versus downstream projects

I asked jpich to explain to me what the terms "upstream" and "downstream" mean
in the context of software development.

At a rough level, downstream are things that consume a project (that project
is then upstream).


For instance, distributions are "downstream" of OpenStack. Red Hat, Canonical,
SuSe consume OpenStack by packaging and distributing it.


OpenStack itself also has upstreams. For instance Pecan is consumed by
Ceilometer. If a bug is found in Pecan while working on Ceilometer, it's
better to fix it "upstream" i.e. in Pecan itself rather than carry what
effectively becomes a fork otherwise.

Horizon using Ceilometer API

I recall that in the 2013 July 25 Ceilometer team meeting, someone from Horizon
dropped by to ask about the Ceilometer API:

15:25:02  jd__: the ceilomter-plugin for horizon is not
completed yet and it's for havana-3.


15:25:14 <jd__> BrooklynChen: ok!


15:25:32  yeah I was gonna ask about horizon too


15:25:42  #link https://blueprints.launchpad.net/horizon/+spec/ceilometer


15:26:29  now the problem for horizon is that the usage summary
table takes a lot of time loading the data


15:27:04  BrooklynChen: is the usage summary timestamp-constrained?


15:27:21  BrooklynChen: e.g. last day/week/month


15:27:50  eglynn: the issue is, we don't have a group-by api in
ceilometer and the grouping is done by horizon. that needs a lot of time


15:28:08  BrooklynChen: a-ha I see


15:28:19  BrooklynChen: but there is broup-by on the cards for h3


15:28:42  #link https://blueprints.launchpad.net/ceilometer/+spec/api-group-by

Reference: http://eavesdrop.openstack.org/meetings/ceilometer/2013/ceilometer.2013-07-25-15.03.log.txt
jpich says that as far as she knows, the group by feature in the Ceilometer API
is not a blocker for the Horizon blueprint mentioned https://blueprints.launchpad.net/horizon/+spec/ceilometer
Group by blueprint


group by blueprint on Launchpad:
https://blueprints.launchpad.net/ceilometer/+spec/api-group-by
group by blueprint discussion on the OpenStack wiki:
https://wiki.openstack.org/wiki/Ceilometer/blueprints/api-group-by

Storage tests / SQL Alchemy group by Patch Set 10

Made a few revisions to class StatisticsGroupByTest in
tests/storage/base.py:


In test_group_by_multiple_regular, changed self.assertFalse to
self.assertNotEqual in the else clause.  The latter is more specific,
since I'm checking for equality.


Added checking for valid period_start values in
test_group_by_with_period and
test_group_by_with_query_filter_and_period, so that I can replace the
generic self.fail("Invalid groupby and period_start parameters") in the
else clause with self.assertNotEqual statements.


Changed duration checking to use self.assertEqual instead of
self.assertAlmostEqual in test_group_by_with_period and
test_group_by_with_query_filter_and_period.  I was worried about checking
equality of floating point numbers, but I decided to forget about this
issue.  For example,
self.assertAlmostEqual(r.duration, 0.0)

is changed to
self.assertEqual(r.duration, 0)


Updated blueprint and wiki


Added a list of the tests I wrote in class StatisticsGroupByTest to the
blueprint


Added some notes to the blueprint about the way I designed my test data


Copied my edits from the blueprint to the wiki, since the wiki saves
versioning information.  The wiki acts as a backup to the blueprint.


Cleaned up the whitespace in the wiki page


MongoDB group by implementation

There are three types of MongoDB aggregation commands: aggregate, mapReduce,
group
The MongoDB manual has a comparison of these three types:
http://docs.mongodb.org/manual/reference/aggregation-commands-comparison/
Apparently, MongoDB now recommends the aggregation pipeline (a new feature
since MongoDB version 2.2) when possible:

For most aggregation operations, the Aggregation Pipeline provides better
performance and more coherent interface. However, map-reduce operations
provide some flexibility that is not presently available in the aggregation
pipeline.

Reference: http://docs.mongodb.org/manual/core/map-reduce/
We are currently using the mapReduce method to calculate meter statistics in
Ceilometer.  (Although get_resources() was recently changed to use
aggregate(), which forced Ceilometer to require MongoDB version >= 2.2)
IMHO, it seems like it's still a good idea to stick with mapReduce because
it's the most flexible.  We can support non-standard aggregation operators
(operations that are not min, max, avg, etc.).
For example, in this blueprint "Improvements for API v2", it suggests an
improvement:

Provide additional statistical function (Deviation, Median, Variation,
Distribution, Slope, etc...) which could be given as multiple results for a
given data set collection

Functions like deviation and median are not standard aggregation operators in
the MongoDB aggregation pipeline aggregate().
The group aggregation command is less flexible than mapReduce, slower in
performance than aggregate, and doesn't support sharded collections (i.e.
a database distributed across multiple servers)
Also:

Additionally, map-reduce operations can have output sets that are that exceed
the 16 megabyte output limitation of the aggregation pipeline.

Reference: http://docs.mongodb.org/manual/core/aggregation-introduction/#map-reduce