Skip to content

Instantly share code, notes, and snippets.

@jrnxf
Last active June 24, 2020 21:21
Show Gist options
  • Save jrnxf/a457c615e432f84b96de3552f5d7a3ff to your computer and use it in GitHub Desktop.
Save jrnxf/a457c615e432f84b96de3552f5d7a3ff to your computer and use it in GitHub Desktop.

I'm writing an application that contains an overall data model with some obvious relations. I started writing the application using MongoDB (what I'm familiar with), but decided to try and transition over to Postgres since again, my data has tons of foreign keys. For simplicity, let's consider the following models:

class GameBase {
  id: string
  entryIds: string[]
  submissionIds: string[]
}
class EntryBase {
  id: string
  description: string
  gameId: string
  userId: string // id of user who supplied entry
  submissionIds: string[] // submissions where entry is covered
}
class SubmissionBase {
  id: string
  url: string
  gameId: string
  userId: string // id of user who submitted
  entryIds: string[] // entries covered by submission
}

Now I understand if I use a tool like TypeOrm, I could retrieve these relations with something like:

const games = await gameRepository
	.find({ relations: ["entryIds", "submissionIds"] });

(Or something to that effect)

But I'm not really sure how that relates to GraphQL. What I've been doing up until now is adding @ResolveField inside my Resolvers and writing something like

// in resolver
@ResolveField(() => [SubmissionBase], { nullable: true })
  submissions(@Parent() game: GameBase) {
    return this.submissionService.getManySubmissions(game.submissionIds)
  }

// in service
async getManySubmissions(submissionIds: string[]): Promise<SubmissionBase[]> {
    if (!submissionIds) return []
    return await this.submissionRepository.find({
      where: {
        id: { $in: submissionIds },
      },
    })
  }

So this makes sense to me and has been working great. I just keep having a nagging feeling like I need to switch to a relational database. From what you all can tell, is that nagging feeling rooted in truth? For example, if the same .find method you see in my service above was instead backed by Postgres instead of MongoDB, and the appropriate foreign key relationship was established, would I see speed improvements? Although submissionIds is not a true foreign key (bc Mongo), it still acts as one in this setup. I guess I'm failing to see why MongoDB is a bad choice for relational data if you can use something like @ResolveField to grab whatever you need.

Why would YOU recommend using MongoDB over Postgres or Postgres over MongoDB given this type of model setup?

@andywgarcia
Copy link

andywgarcia commented Jun 24, 2020

From my understanding of GraphQL, you wouldn't be using foreign keys to reference another child/parent object somewhere else (at least not in the GraphQL schema). You would instead have the child object actually inside the parent object. For example, in GameBase, it would look something like this:

GameBase {
  id: string
  entries: [Entry]
  submissions: [Submission]
}

This doesn't convert perfectly to a NoSql nor a Relationship database model because the graphQL schemas are independent of how you actually store the data. GraphQL schemas are excellent at representing relational data, but it does not imply that a relational database would be better fitting.

The choice between a relational database and a NoSQL database should be made based off of a few things (off the top of my head):

  1. Read/Write ratio -- NoSQL databases optimize for your read queries but duplicating data in a form that is similar, if not exactly the same, as you would actually use it for display. If you have made different forms of displaying the data, but you only write to it every now and then, duplicating it isn't so bad. Every write actually means N writes where N is the number of places the data is duplicated. So, if you are constantly updating the data, then this is another thing to take into consideration.
  2. Query Flexbility -- Do you anticipate needing to add additional queries often more than the current ones with the same data, but in different forms? Another way of putting it is "Do you know ahead of time generally how you are going to be needing the data?" Examples based off of your model would be GetAllEntriesForSpecificSubmissions or GetGameIdsByEntries. For each one of these queries in a NoSQL world, chances are, you would just need to duplicate the data to optimize for these queries. In a relational database world, you could probably just get nifty with your query and not need to duplicate anything.
  3. Eventual Consistency Requirements -- Because of the duplication of data in NoSQL models, this naturally tends to be an eventually consistent model where immediately after a write, different services that have duplicated the data may or may not be in sync, but will eventually be in sync after all the additional copies have been updated. If you MUST have all data in sync, there are options for NoSQL models, but a relational database tends to be geared towards not even worrying about this.
  4. Storage Requirements -- Again, because of the duplication of data in NoSQL in order to optimize reads, this will naturally take more space. If space is a concern, keep in mind that relational databases are designed to not duplicate anything as best as possible
  5. Schema flexibility -- If you want to add a new field in a NoSQL database, then you add it to the model and the database will store it (unless you put restrictions on your NoSQL database, which I am sure is possible somehow, but I haven't seen it before). If you want to do the same in a relational database, then you modify the table and the model. However, when you want to add a new field that contains an object, the NoSQL databases really shine because it immediately stores it, but the SQL database needs a new table and a modified version of an existing table with a foreign key reference. Both database types can support it, but both have trade offs.

I am sure there is even more to consider, but these things come to mind for me.

Would I recommend one over the other personally? Lets see:

  1. Read/Write Ratio -- Usually, reading ends up happening WAY more than writing, so +1 for NoSQL
  2. Query Flexibility -- Unless you haven't thought about a UI that you want to use this data for, you usually have an idea of how you want to query your data, so +1 for NoSQL
  3. Eventual consistency requirements -- I haven't come across many personal projects that are not ok with eventual consistency, but i'll leave this one up to you
  4. Storage requirements -- It is 2020, storage is cheap. +1 NoSQL
  5. Schema Flexibility -- +1 for NoSQL, but this is debatable because your model could easily get out of sync with your data. But, if you are only grabbing the fields you actually need and care about, you will find out very quickly if the data is bad IMO. Just be sure to handle errors efficiently.

Yup. NoSQL. But only if you allow yourself to do it right. I see too often people trying to join across NoSQL databases/collections to get data. Nope. Just duplicate it (most of the time).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment