albertoperdomo/mongify.txt

## mongify.txt
This is a case study of how we would model parts of the spotify app in MongoDB.

MODEL
=====

users
-----
username
name
plain_txt_password
plan
list_ids: [id, id, id]
friend_ids: [id, id, id]

tracks
------
id
title
album_id
artist_ids [1,2,8]

albums
------
id
title
artist_id
published_at
produced_by
track_count
awards:[{id: new ObjectId(), title: "best pitbull featuring version", year: 2010, awarded_by: "MTV"}]

artists
-------
id
name
bio
picture: [{path: ..., filename: ..., version: "thumb", ...}]
active: true
overdose: true

lists
-----
id
user_id
title
public: true/false
track_ids: [id, id, id, id]
subscriber_count

plays (this collection should be hosted on a different host & database, because it will have a lot of frequent high volume writes)
-----
user_id
track_id
played_at
duration

SAMPLE QUERIES
==============

Your n favorite songs (the ones with the most plays by you)
-----------------------------------------------------------

One possibility is to use group but it has some limitations. We cannot sort by number of plays or limit the number of items we want to retrieve.

Group query:

db.plays.group({ query: { user_id: xxx }, key: 'track_id', reduce: function(doc, prev) { prev.n++; if (doc.played_at > prev.last_played_at) { prev.last_played_at = doc.played_at } }, initial: { n: 0, last_played_at: 0 })

result:

[ { _id: ..., value: { n: ...,  last_played_at: ...}

map reduce query
----------------

db.plays.mapReduce(
		function () { emit( this.track_id, { n: 1, t: this.played_at }) },
		function (key, values) {


			//// Prettier implementation
			var n_res = 0;
			var max_t = 0;
			values.forEach(function(v){
				n_res+=v.n;
				if(v.t > max_t) max_t = v.t;
			})
			return {n: n_res, t: max_t};


			///// Implementation with better performance
			var ac = values[0];
			for(var i = 1; i < values.length; i++) {
				ac.n += values[i].n;
				var v_t = values[i].t;
				if(v_t > ac.t) ac.t = v_t;
			}
			return ac;
		},
			out: "fooo"
		 query: { user_id: xxx })

		db.foo.find().sort(..)

How this map reduce works:

Sample data:
track_id played_at
--------------
1 100
2 101
1 102
2 100
1 200

emit(1, { n: 1, t: 100 })
emit(2, { n: 1, t: 101 })
emit(1, { n: 1, t: 102 })
emit(2, { n: 1, t: 100 })
emit(1, { n: 1, t: 200 })


reduce(1, [ { n: 1, t: 100 }, { n: 1, t: 102 }, { n: 1, t: 200 } ])
	=>				  { n: 3, t: 200 }

reduce(2, [{ n: 1, t: 100 }, { n: 1, t: 101 }])
  =>				  { n: 2, t: 101 }

...

Ideally this query should run in background and the results (e.g. your 10 most played songs) stored in the user's document for fast retrieval

## posts_and_comments.txt
Option A: comments are referencing a blogpost, like typically in ActiveRecord
=============================================================================

posts
-----
id
title
body
author_id -> user

comments
--------
id
text
author_id -> user
post_id -> post

Option B: comments are embedded in the blogpost document
========================================================

posts
-----
id
title
body
author_id -> user
comments: [{_id: ...., text: ..., author: ...., approved: true}, {...}]


An ID attribute is provided for the comment so that they can be handled invidually, as if they would be a separate entity, e.g. for approval process: Search blogpost by comment ID and set approved attribute.

posts.find({"comments._id": xxx})

It's also possible to define in the query params which parts of the document we do not want to retrieve, so we reduce the overhead when we are only interested in certain parts, e.g. the comment itself.
	This is a case study of how we would model parts of the spotify app in MongoDB.

	MODEL
	=====

	users
	-----
	username
	name
	plain_txt_password
	plan
	list_ids: [id, id, id]
	friend_ids: [id, id, id]

	tracks
	------
	id
	title
	album_id
	artist_ids [1,2,8]

	albums
	------
	id
	title
	artist_id
	published_at
	produced_by
	track_count
	awards:[{id: new ObjectId(), title: "best pitbull featuring version", year: 2010, awarded_by: "MTV"}]

	artists
	-------
	id
	name
	bio
	picture: [{path: ..., filename: ..., version: "thumb", ...}]
	active: true
	overdose: true

	lists
	-----
	id
	user_id
	title
	public: true/false
	track_ids: [id, id, id, id]
	subscriber_count

	plays (this collection should be hosted on a different host & database, because it will have a lot of frequent high volume writes)
	-----
	user_id
	track_id
	played_at
	duration

	SAMPLE QUERIES
	==============

	Your n favorite songs (the ones with the most plays by you)
	-----------------------------------------------------------

	One possibility is to use group but it has some limitations. We cannot sort by number of plays or limit the number of items we want to retrieve.

	Group query:

	db.plays.group({ query: { user_id: xxx }, key: 'track_id', reduce: function(doc, prev) { prev.n++; if (doc.played_at > prev.last_played_at) { prev.last_played_at = doc.played_at } }, initial: { n: 0, last_played_at: 0 })

	result:

	[ { _id: ..., value: { n: ..., last_played_at: ...}

	map reduce query
	----------------

	db.plays.mapReduce(
	function () { emit( this.track_id, { n: 1, t: this.played_at }) },
	function (key, values) {


	//// Prettier implementation
	var n_res = 0;
	var max_t = 0;
	values.forEach(function(v){
	n_res+=v.n;
	if(v.t > max_t) max_t = v.t;
	})
	return {n: n_res, t: max_t};


	///// Implementation with better performance
	var ac = values[0];
	for(var i = 1; i < values.length; i++) {
	ac.n += values[i].n;
	var v_t = values[i].t;
	if(v_t > ac.t) ac.t = v_t;
	}
	return ac;
	},
	out: "fooo"
	query: { user_id: xxx })

	db.foo.find().sort(..)

	How this map reduce works:

	Sample data:
	track_id played_at
	--------------
	1 100
	2 101
	1 102
	2 100
	1 200

	emit(1, { n: 1, t: 100 })
	emit(2, { n: 1, t: 101 })
	emit(1, { n: 1, t: 102 })
	emit(2, { n: 1, t: 100 })
	emit(1, { n: 1, t: 200 })


	reduce(1, [ { n: 1, t: 100 }, { n: 1, t: 102 }, { n: 1, t: 200 } ])
	=> { n: 3, t: 200 }

	reduce(2, [{ n: 1, t: 100 }, { n: 1, t: 101 }])
	=> { n: 2, t: 101 }

	...

	Ideally this query should run in background and the results (e.g. your 10 most played songs) stored in the user's document for fast retrieval
	Option A: comments are referencing a blogpost, like typically in ActiveRecord
	=============================================================================

	posts
	-----
	id
	title
	body
	author_id -> user

	comments
	--------
	id
	text
	author_id -> user
	post_id -> post

	Option B: comments are embedded in the blogpost document
	========================================================

	posts
	-----
	id
	title
	body
	author_id -> user
	comments: [{_id: ...., text: ..., author: ...., approved: true}, {...}]


	An ID attribute is provided for the comment so that they can be handled invidually, as if they would be a separate entity, e.g. for approval process: Search blogpost by comment ID and set approved attribute.

	posts.find({"comments._id": xxx})

	It's also possible to define in the query params which parts of the document we do not want to retrieve, so we reduce the overhead when we are only interested in certain parts, e.g. the comment itself.