Skip to content

Instantly share code, notes, and snippets.

@terriyu
Created August 22, 2013 06:15
Show Gist options
  • Save terriyu/6303769 to your computer and use it in GitHub Desktop.
Save terriyu/6303769 to your computer and use it in GitHub Desktop.

19 Aug 2013

Good commit messages as recommended by OpenStack

sandywalsh recommended this OpenStack wiki page on how to write good commit messages:

https://wiki.openstack.org/wiki/GitCommitMessages#Information_in_commit_messages

Commenting on patch sets

I was wondering if there is any convention for the author of a patch set to leave a comment saying what the patch set does or how it's different from previous versions.

jpich says:

Indeed it is helpful. Some people do this but there is no convention that I'm aware of. It's always ok to ask questions politely in a comment, though, if you want to know/understand something.

Upstream versus downstream projects

I asked jpich to explain to me what the terms "upstream" and "downstream" mean in the context of software development.

At a rough level, downstream are things that consume a project (that project is then upstream).

For instance, distributions are "downstream" of OpenStack. Red Hat, Canonical, SuSe consume OpenStack by packaging and distributing it.

OpenStack itself also has upstreams. For instance Pecan is consumed by Ceilometer. If a bug is found in Pecan while working on Ceilometer, it's better to fix it "upstream" i.e. in Pecan itself rather than carry what effectively becomes a fork otherwise.

Horizon using Ceilometer API

I recall that in the 2013 July 25 Ceilometer team meeting, someone from Horizon dropped by to ask about the Ceilometer API:

15:25:02 jd__: the ceilomter-plugin for horizon is not completed yet and it's for havana-3.

15:25:14 <jd__> BrooklynChen: ok!

15:25:32 yeah I was gonna ask about horizon too

15:25:42 #link https://blueprints.launchpad.net/horizon/+spec/ceilometer

15:26:29 now the problem for horizon is that the usage summary table takes a lot of time loading the data

15:27:04 BrooklynChen: is the usage summary timestamp-constrained?

15:27:21 BrooklynChen: e.g. last day/week/month

15:27:50 eglynn: the issue is, we don't have a group-by api in ceilometer and the grouping is done by horizon. that needs a lot of time

15:28:08 BrooklynChen: a-ha I see

15:28:19 BrooklynChen: but there is broup-by on the cards for h3

15:28:42 #link https://blueprints.launchpad.net/ceilometer/+spec/api-group-by

Reference: http://eavesdrop.openstack.org/meetings/ceilometer/2013/ceilometer.2013-07-25-15.03.log.txt

jpich says that as far as she knows, the group by feature in the Ceilometer API is not a blocker for the Horizon blueprint mentioned https://blueprints.launchpad.net/horizon/+spec/ceilometer

Group by blueprint

Storage tests / SQL Alchemy group by Patch Set 10

Made a few revisions to class StatisticsGroupByTest in tests/storage/base.py:

  • In test_group_by_multiple_regular, changed self.assertFalse to self.assertNotEqual in the else clause. The latter is more specific, since I'm checking for equality.

  • Added checking for valid period_start values in test_group_by_with_period and test_group_by_with_query_filter_and_period, so that I can replace the generic self.fail("Invalid groupby and period_start parameters") in the else clause with self.assertNotEqual statements.

  • Changed duration checking to use self.assertEqual instead of self.assertAlmostEqual in test_group_by_with_period and test_group_by_with_query_filter_and_period. I was worried about checking equality of floating point numbers, but I decided to forget about this issue. For example,

    self.assertAlmostEqual(r.duration, 0.0)
    

    is changed to

    self.assertEqual(r.duration, 0)
    

Updated blueprint and wiki

  • Added a list of the tests I wrote in class StatisticsGroupByTest to the blueprint

  • Added some notes to the blueprint about the way I designed my test data

  • Copied my edits from the blueprint to the wiki, since the wiki saves versioning information. The wiki acts as a backup to the blueprint.

  • Cleaned up the whitespace in the wiki page

MongoDB group by implementation

There are three types of MongoDB aggregation commands: aggregate, mapReduce, group

The MongoDB manual has a comparison of these three types:

http://docs.mongodb.org/manual/reference/aggregation-commands-comparison/

Apparently, MongoDB now recommends the aggregation pipeline (a new feature since MongoDB version 2.2) when possible:

For most aggregation operations, the Aggregation Pipeline provides better performance and more coherent interface. However, map-reduce operations provide some flexibility that is not presently available in the aggregation pipeline.

Reference: http://docs.mongodb.org/manual/core/map-reduce/

We are currently using the mapReduce method to calculate meter statistics in Ceilometer. (Although get_resources() was recently changed to use aggregate(), which forced Ceilometer to require MongoDB version >= 2.2)

IMHO, it seems like it's still a good idea to stick with mapReduce because it's the most flexible. We can support non-standard aggregation operators (operations that are not min, max, avg, etc.).

For example, in this blueprint "Improvements for API v2", it suggests an improvement:

Provide additional statistical function (Deviation, Median, Variation, Distribution, Slope, etc...) which could be given as multiple results for a given data set collection

Functions like deviation and median are not standard aggregation operators in the MongoDB aggregation pipeline aggregate().

The group aggregation command is less flexible than mapReduce, slower in performance than aggregate, and doesn't support sharded collections (i.e. a database distributed across multiple servers)

Also:

Additionally, map-reduce operations can have output sets that are that exceed the 16 megabyte output limitation of the aggregation pipeline.

Reference: http://docs.mongodb.org/manual/core/aggregation-introduction/#map-reduce

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment