@teddykishi
Forked from JedWatson/1-proposal.md
Created June 24, 2021 15:39
Proposal: adding reverse-relationship population to Mongoose (as implemented in KeystoneJS)

I've developed a feature in KeystoneJS that lets you populate a relationship from either side while storing the data on only one side, and I'm looking for feedback on whether it could or should be brought into mongoose itself. (It might be possible to ship it as a separate package, but I suspect that would mean rewriting too many mongoose internals to be a good idea.)

I've also raised this as an issue in mongoose for consideration (#1888), but I'm leaving this gist in place because the examples are easier to read here.

I've used Posts and Categories as a basic, contrived example to demonstrate the idea. In reality you'd rarely load all the posts for a category, but there are real-world cases where doing so is more reasonable, and Posts + Categories is an easy way to demo it.

The problem

The built-in population feature is really useful, and not just for simple examples like the ones below. If you're passing a query (promise) around in a sophisticated app, you can encapsulate both the query filters and the full dataset it should return, while still allowing another method to modify it before it is run.
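To make the "encapsulate now, modify later, run last" idea concrete, here is a minimal sketch of a deferred query. `LazyQuery` and its in-memory data are hypothetical stand-ins written for this illustration, not Mongoose's Query class, and `exec()` is synchronous here only to keep the sketch short.

```javascript
// Hypothetical stand-in for a Mongoose-style query: it records filters and
// population paths, and does nothing until exec() is called.
function LazyQuery(rows, lookups) {
    this.rows = rows;        // in-memory "collection"
    this.lookups = lookups;  // path -> function(id) returning the referenced doc
    this.filters = [];
    this.paths = [];
}

LazyQuery.prototype.where = function (predicate) {
    this.filters.push(predicate);
    return this; // chainable
};

LazyQuery.prototype.populate = function (path) {
    this.paths.push(path);
    return this; // chainable
};

LazyQuery.prototype.exec = function () {
    var self = this;
    return this.rows
        .filter(function (row) {
            return self.filters.every(function (predicate) { return predicate(row); });
        })
        .map(function (row) {
            // Work on a copy so the stored ids are left untouched.
            var copy = Object.assign({}, row);
            self.paths.forEach(function (path) {
                copy[path] = row[path].map(self.lookups[path]);
            });
            return copy;
        });
};

var categoriesById = { c1: { id: 'c1', name: 'News' } };
var allPosts = [
    { title: 'Hello', published: true, categories: ['c1'] },
    { title: 'Draft', published: false, categories: [] }
];

// One function builds the query...
function publishedPosts() {
    return new LazyQuery(allPosts, {
        categories: function (id) { return categoriesById[id]; }
    }).where(function (post) { return post.published; });
}

// ...and another decides, before execution, what it should load.
var results = publishedPosts().populate('categories').exec();
```

The point of the sketch is that `populate` is just another recorded instruction, so any code holding the query can add it before `exec()` runs.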

There are many ways to implement relationships between collections in mongo, one of the most performant (from a read perspective) being to store the relationship data on both models. Mongoose's population support also means this is one of the easiest to code against in many scenarios.

It requires a lot of management though; keeping arrays in sync means using pre/post save middleware, or piping any changes to the arrays through specific methods that keep them in sync for you.

See two-sided-relationship.js for an implementation of this (without the sync logic).
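For a sense of what that sync logic involves, here is a minimal in-memory sketch. The `setPostCategories` helper and the plain-object "collections" are assumptions made for illustration; in a real app the same bookkeeping would live in save middleware or dedicated model methods and would have to run against the database.

```javascript
// In-memory stand-ins for two collections; no database or Mongoose involved.
var posts = {};      // postId -> { id, title, categories: [categoryId] }
var categories = {}; // categoryId -> { id, name, posts: [postId] }

function addCategory(id, name) {
    categories[id] = { id: id, name: name, posts: [] };
}

function addPost(id, title) {
    posts[id] = { id: id, title: title, categories: [] };
}

// Hypothetical helper: replace a post's categories and mirror the
// change onto every affected category so both arrays stay in sync.
function setPostCategories(postId, nextCategoryIds) {
    var post = posts[postId];

    // Remove the post from categories it no longer belongs to.
    post.categories.forEach(function (categoryId) {
        if (nextCategoryIds.indexOf(categoryId) === -1) {
            var list = categories[categoryId].posts;
            var index = list.indexOf(postId);
            if (index !== -1) list.splice(index, 1);
        }
    });

    // Add the post to categories it has just joined.
    nextCategoryIds.forEach(function (categoryId) {
        var list = categories[categoryId].posts;
        if (list.indexOf(postId) === -1) list.push(postId);
    });

    post.categories = nextCategoryIds.slice();
}

addCategory('c1', 'News');
addCategory('c2', 'Tutorials');
addPost('p1', 'Hello world');

setPostCategories('p1', ['c1', 'c2']); // p1 is now listed on both categories
setPostCategories('p1', ['c2']);       // p1 is removed from c1, kept on c2
```

Every write path that touches either array has to go through something like this, which is the management overhead described above.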

I think it's better in many cases to store relationships on one side only: either a single reference or an array of references on the primary Model. There's nothing to keep in sync and there's a single 'source of truth', but it requires more code to query from the secondary Model and may not perform as well.

See one-sided-relationship.js for a (very rough) implementation of this.

The solution

I have developed a Relationship feature in Keystone that lets you populate one-sided relationships as if they were two-sided, by specifying a relationship on the secondary schema, and propose we implement it (better) in mongoose itself.

In Keystone there is a (currently undocumented) populateRelated method that is created on Lists. See keystone-relationships.js for an example of how this works.

The populateRelated method on Documents actually works in both directions, so it can populate (and sub-populate) nested relationships from either side. A full example of that is outside the scope of the ones I've written here, but it's very cool :)

The biggest downside to implementing it outside of mongoose's populate functionality is that it can't be queued before the query is executed, so it has to be used in the same way as the populate method available on Document objects. There's also a (currently horribly inefficient) method on keystone itself to run this for all documents in an array.
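One way the array case could be made cheaper is to issue a single query for all the documents and distribute the results in memory, rather than querying once per document. The sketch below assumes that approach; it is not how Keystone's method is written. `findPostsInCategories` is a hypothetical stand-in for a query such as `Post.find().where('categories').in(ids)`, and it calls back synchronously only to keep the example easy to run.

```javascript
// Hypothetical in-memory stand-in for the Post collection.
var postCollection = [
    { id: 'p1', title: 'Hello', categories: ['c1', 'c2'] },
    { id: 'p2', title: 'World', categories: ['c2'] },
    { id: 'p3', title: 'Orphan', categories: [] }
];
var queryCount = 0; // counts how many "queries" were issued

// Returns every post that references at least one of the given category ids.
function findPostsInCategories(categoryIds, callback) {
    queryCount++;
    var matches = postCollection.filter(function (post) {
        return post.categories.some(function (id) {
            return categoryIds.indexOf(id) !== -1;
        });
    });
    callback(null, matches);
}

// Populate `posts` on every category with one query, then group in memory.
function populatePostsBatched(categories, callback) {
    var ids = categories.map(function (category) { return category.id; });
    findPostsInCategories(ids, function (err, posts) {
        if (err) return callback(err);

        // Start with an empty list per category so unmatched ones get [].
        var byCategory = Object.create(null);
        ids.forEach(function (id) { byCategory[id] = []; });

        posts.forEach(function (post) {
            post.categories.forEach(function (id) {
                if (byCategory[id]) byCategory[id].push(post);
            });
        });

        categories.forEach(function (category) {
            category.posts = byCategory[category.id];
        });
        callback(null, categories);
    });
}

var loaded = [{ id: 'c1', name: 'News' }, { id: 'c2', name: 'Tutorials' }];
populatePostsBatched(loaded, function (err, categories) {
    if (err) throw err;
    console.log(categories.map(function (c) { return c.name + ': ' + c.posts.length; }));
});
```

The trade-off is that the grouping step runs in application memory, so this suits result sets that comfortably fit there.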

If we brought this across, you could call a method on the mongoose Schema telling it that another Schema holds a ref to it in a path (a single reference or an array, for one-to-many or many-to-many; this could be detected from the related Schema). The populate method would then take this relationship into account and, although the underlying queries would be different, treat it like a path storing { type: mongoose.Schema.Types.ObjectId, ref: '(primary Model)' }.

See mongoose-relationships.js and mongoose-relationships-alt.js for how I propose this would work if implemented natively in mongoose.
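As a rough illustration of the mechanics, the sketch below models a schema that records declared relationships and a populate step that chooses between a forward lookup (ids stored on the document) and a reverse lookup (ids stored on the related documents), detecting single versus array references from the related schema. `MiniSchema`, `defineModel`, `populate`, the `many` flag and the User model are all hypothetical and run entirely in memory; none of this is mongoose's API.

```javascript
// Hypothetical miniature schema: `paths` holds stored references,
// `relationships` holds references declared from the other side.
function MiniSchema(paths) {
    this.paths = paths;      // path -> { ref: 'Model', many: Boolean }
    this.relationships = {}; // path -> { ref: 'Model', refPath: 'path' }
}

MiniSchema.prototype.relationship = function (options) {
    this.relationships[options.path] = { ref: options.ref, refPath: options.refPath };
    return this;
};

var registry = {}; // model name -> { schema, docs }

function defineModel(name, schema, docs) {
    registry[name] = { schema: schema, docs: docs };
}

function populate(modelName, path) {
    var entry = registry[modelName];
    var stored = entry.schema.paths[path];
    var declared = entry.schema.relationships[path];
    if (!stored && !declared) throw new Error('Unknown path: ' + path);

    return entry.docs.map(function (doc) {
        var copy = Object.assign({}, doc);
        if (stored) {
            // Forward: the ids live on this document.
            var targets = registry[stored.ref].docs;
            var ids = stored.many ? doc[path] : [doc[path]];
            var found = ids.map(function (id) {
                return targets.find(function (target) { return target.id === id; });
            });
            copy[path] = stored.many ? found : found[0];
        } else {
            // Reverse: the ids live on the related documents. Whether refPath
            // holds one id or an array is read from the related schema.
            var related = registry[declared.ref];
            var many = related.schema.paths[declared.refPath].many;
            copy[path] = related.docs.filter(function (other) {
                var value = other[declared.refPath];
                return many ? value.indexOf(doc.id) !== -1 : value === doc.id;
            });
        }
        return copy;
    });
}

var postSchema = new MiniSchema({
    categories: { ref: 'Category', many: true },
    author: { ref: 'User', many: false }
});
var categorySchema = new MiniSchema({})
    .relationship({ path: 'posts', ref: 'Post', refPath: 'categories' });
var userSchema = new MiniSchema({})
    .relationship({ path: 'posts', ref: 'Post', refPath: 'author' });

defineModel('Post', postSchema, [
    { id: 'p1', title: 'Hello', author: 'u1', categories: ['c1', 'c2'] },
    { id: 'p2', title: 'World', author: 'u1', categories: ['c2'] }
]);
defineModel('Category', categorySchema, [
    { id: 'c1', name: 'News' },
    { id: 'c2', name: 'Tutorials' }
]);
defineModel('User', userSchema, [{ id: 'u1', name: 'Ada' }]);

var categoriesWithPosts = populate('Category', 'posts'); // reverse, array ref
var usersWithPosts = populate('User', 'posts');          // reverse, single ref
var postsWithCategories = populate('Post', 'categories'); // forward
```

From the caller's side all three calls look the same, which is the behaviour the proposal is after.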

If it's something that would be welcomed in mongoose, I'd be happy to help implement it (though having looked through the code, I might need a primer from somebody who understands how populate works better than I do). The method in Keystone is currently fairly rough: it works, but the performance (and implementation) leaves quite a bit to be desired. Whether it gets re-implemented in mongoose or stays a Keystone-specific feature, I'd also love someone with more experience wrangling performance in mongoose to help me improve it.

The actual implementation in Keystone can be found here:

// two-sided-relationship.js
// This is a contrived example to demonstrate duplicating
// data on both sides of a relationship to enable population
var mongoose = require('mongoose');

var PostSchema = new mongoose.Schema({
    title: String,
    slug: String,
    contents: String,
    categories: [{ type: mongoose.Schema.Types.ObjectId, ref: 'Category' }]
});
var Post = mongoose.model('Post', PostSchema);

var CategorySchema = new mongoose.Schema({
    name: String,
    posts: [{ type: mongoose.Schema.Types.ObjectId, ref: 'Post' }]
});
var Category = mongoose.model('Category', CategorySchema);

// now we can populate on both sides
Post.find().populate('categories').exec(function(err, posts) { /* ... */ });
Category.find().populate('posts').exec(function(err, categories) { /* ... */ });

// one-sided-relationship.js
// This is a contrived example to demonstrate
// storing data on one side of a relationship
var async = require('async');
var mongoose = require('mongoose');

var PostSchema = new mongoose.Schema({
    title: String,
    slug: String,
    contents: String,
    categories: [{ type: mongoose.Schema.Types.ObjectId, ref: 'Category' }]
});
var Post = mongoose.model('Post', PostSchema);

var CategorySchema = new mongoose.Schema({
    name: String
});
var Category = mongoose.model('Category', CategorySchema);

// easy to get the categories for a post...
Post.find().populate('categories').exec(function(err, posts) {
    // you have posts with categories
});

// but harder to get the posts for a category: Category has no 'posts'
// path to populate, so each category needs its own Post query.
Category.find().exec(function(err, categories) {
    // handle err
    async.forEach(categories, function(category, done) {
        Post.find().where('categories').in([category.id]).exec(function(err, posts) {
            category.posts = posts;
            done(err);
        });
    }, function(err) {
        // ... you have categories with posts
    });
});

// keystone-relationships.js
// This is how you could use Keystone's features to simplify
// managing the relationship. Posts have gained authors.
var keystone = require('keystone');

// Lists are created with a key as the first argument
var Post = new keystone.List('Post', {
    autokey: { path: 'slug', from: 'title', unique: true }
}).add({
    title: String,
    contents: keystone.Types.Markdown,
    author: { type: keystone.Types.Relationship, ref: 'User' },
    categories: { type: keystone.Types.Relationship, ref: 'Category', many: true }
});
Post.register();

var Category = new keystone.List('Category').add({
    name: String
});
Category.relationship({ path: 'posts', ref: 'Post', refPath: 'categories' });
Category.register();

// we can populate categories on posts using mongoose's populate
Post.model.find().populate('categories').exec(function(err, posts) { /* ... */ });

// there's one more step, but we can use keystone's populateRelated
// to achieve a similar effect for posts in categories
Category.model.find().exec(function(err, categories) {
    keystone.populateRelated(categories, 'posts', function(err) {
        // ... you have categories with posts
    });
});

// if you've got a single document you want to populate a relationship on, it's neater
Category.model.findOne().exec(function(err, category) {
    category.populateRelated('posts', function(err) {
        // posts is populated
    });
});

// if you also wanted to populate the author on the posts loaded for each category,
// you can do that too - it uses mongoose's populate method because the relationship
// is stored in a path on the Post
Category.model.findOne().exec(function(err, category) {
    category.populateRelated('posts[author]', function(err) {
        // posts is populated, and each author on each post is populated
    });
});

// mongoose-relationships.js
// This is the same as the Keystone relationships example, but written as if
// the functionality were built into mongoose itself. Much simpler to use.
var mongoose = require('mongoose');

var PostSchema = new mongoose.Schema({
    title: String,
    slug: String,
    contents: String,
    categories: [{ type: mongoose.Schema.Types.ObjectId, ref: 'Category' }]
});
var Post = mongoose.model('Post', PostSchema);

var CategorySchema = new mongoose.Schema({
    name: String
});
// proposed API: the relationship is declared on the Schema, before the model is created
CategorySchema.relationship({ path: 'posts', ref: 'Post', refPath: 'categories' });
var Category = mongoose.model('Category', CategorySchema);

// with proper integration with populate, no need for another nested function!
Category.find().populate('posts').exec(function(err, categories) {
    // ... you have categories with posts
});

// mongoose-relationships-alt.js
// An alternative way of configuring relationships on Schemas in mongoose,
// if it were implemented as a SchemaType instead of a separate method.
// Not sure if this would be better or worse to implement in mongoose!
var mongoose = require('mongoose');

var PostSchema = new mongoose.Schema({
    title: String,
    slug: String,
    contents: String,
    categories: [{ type: mongoose.Schema.Types.ObjectId, ref: 'Category' }]
});
var Post = mongoose.model('Post', PostSchema);

var CategorySchema = new mongoose.Schema({
    name: String,
    // proposed SchemaType: nothing is stored on Category for this path
    posts: { type: mongoose.Schema.Types.Relationship, ref: 'Post', refPath: 'categories' }
});
var Category = mongoose.model('Category', CategorySchema);

// with proper integration with populate, no need for another nested function!
Category.find().populate('posts').exec(function(err, categories) {
    // ... you have categories with posts
});