vonconrad/gist:2373086

## gistfile1.md

      
So, I'm putting together a MongoDB database to store XML documents. There are plenty of reasons why I would want to do something seemingly insane like this; most importantly, I need to be able to see what's changed in a document and to easily access individual element and attribute values.
I want to store each individual "item" as its own record. By "item," I mean each child of the root element. Each item will have standard XML attributes and child elements.
Now, one thing to keep in mind: some of these documents are going to be small (< 100 KB), others pretty big (> 100 MB), with plenty in between. The documents are going to range from a few hundred items to potentially hundreds of thousands.
My initial thought was to have an Item model for each item, with its child elements and attributes as nested documents. Consider this XML doc:
<root>
  <item sku="123">
    <brand>Apple</brand>
    <model>iPhone 4s</model>
    <rating>5</rating>
    <price>599.95</price>
  </item>
  <item sku="456">
    <brand>Samsung</brand>
    <model>Galaxy S II</model>
    <rating>2</rating>
    <price>514.99</price>
  </item>
  <item sku="789">
    <brand>Nokia</brand>
    <model>Lumia 900</model>
    <rating>3</rating>
    <price>499.99</price>
  </item>
</root>

I thought that the item might look something like this:
#<Item _id: ..., document_id: ..., checksum: ..., attributes: <NestedDocument>, elements: <NestedDocument>>
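
To make that shape concrete, here is a rough sketch that builds one plain Hash per item from the XML above, using only Ruby's REXML and Digest libraries. The names `items_from_xml` and `sample_xml`, the `"doc-1"` identifier, and the use of SHA-1 over the serialized element as the checksum are all assumptions for illustration; nothing here talks to MongoDB.

```ruby
require "rexml/document"
require "digest"

# A shortened copy of the example XML from above.
def sample_xml
  <<~DOC
    <root>
      <item sku="123">
        <brand>Apple</brand>
        <model>iPhone 4s</model>
        <rating>5</rating>
        <price>599.95</price>
      </item>
      <item sku="456">
        <brand>Samsung</brand>
        <model>Galaxy S II</model>
        <rating>2</rating>
        <price>514.99</price>
      </item>
    </root>
  DOC
end

# Turn each child of the root element into a Hash shaped like the Item
# sketched above. document_id is a hypothetical identifier for the source
# XML file. The checksum is an assumed change-detection scheme: it hashes
# the serialized element, so whitespace changes would also alter it.
def items_from_xml(xml, document_id)
  root = REXML::Document.new(xml).root
  root.elements.map do |item|
    attributes = {}
    item.attributes.each { |name, value| attributes[name] = value }

    # Simplification: a repeated child element name would overwrite
    # the earlier value.
    elements = {}
    item.elements.each { |child| elements[child.name] = child.text }

    {
      "document_id" => document_id,
      "checksum"    => Digest::SHA1.hexdigest(item.to_s),
      "attributes"  => attributes,
      "elements"    => elements
    }
  end
end

items = items_from_xml(sample_xml, "doc-1")
puts items.first["elements"]["brand"] # prints "Apple"
```

Each Hash keeps the attributes and elements nested inside the item, which is the embedded layout being considered here.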

However, the more I think about it, the more I'm not sure whether it's a good idea to have them as nested documents or separate.
Given how I'll access the data, I'll often need to fetch all the items (together with their attributes and elements) for a single document. I can't think of a scenario where I'd need the items without their attributes and elements.
But occasionally, I'm also going to need to grab, for example, all brands within a document--basically, returning an array of ['Apple', 'Samsung', 'Nokia']. This is where I'm unsure, as I think separate MongoDB documents would help with this operation. I guess I just don't know enough about Mongo yet!
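
For what it's worth, MongoDB's dot notation can address fields inside embedded documents, so a `distinct` on something like `elements.brand`, filtered by `document_id`, looks like one candidate for this lookup even with the nested layout. The sketch below only mimics the result of such a query over plain hashes; `distinct_element`, `sample_items`, and the `"doc-1"`/`"doc-2"` identifiers are made up for illustration, and no database is involved.

```ruby
# Stand-ins for stored items, shaped like the Item sketched above
# (trimmed to the fields this lookup needs).
def sample_items
  [
    { "document_id" => "doc-1", "elements" => { "brand" => "Apple" } },
    { "document_id" => "doc-1", "elements" => { "brand" => "Samsung" } },
    { "document_id" => "doc-1", "elements" => { "brand" => "Nokia" } },
    { "document_id" => "doc-2", "elements" => { "brand" => "Apple" } }
  ]
end

# In-memory equivalent of asking for the distinct values of one nested
# element across all items that belong to a single XML document.
def distinct_element(items, document_id, name)
  items
    .select { |item| item["document_id"] == document_id }
    .map    { |item| item["elements"][name] }
    .compact
    .uniq
end

p distinct_element(sample_items, "doc-1", "brand")
```

If that holds up against a real collection, the nested layout would not rule out the "all brands" case, though how it performs on documents with hundreds of thousands of items is something I'd still need to measure.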