Skip to content

Instantly share code, notes, and snippets.

@Haegin
Last active October 3, 2017 18:35
Show Gist options
  • Save Haegin/680d4d93b4b48314608c7bd9f521147a to your computer and use it in GitHub Desktop.
Save Haegin/680d4d93b4b48314608c7bd9f521147a to your computer and use it in GitHub Desktop.
DB answers to get started on a side project

What's the difference between embedded (sqlite) vs non-embedded?

Embedded - another term for the database being actually a library (and thus running in the same process as your application) as opposed to: Non-embedded - DB is run as an application

For a web app, just use a non-embedded DB like Postgres. You'll run into the limitations of something like SQLite too quickly.

non-embedded(hosted/ ex- postgres) -> where does it need to live in order to communicate between an api? Should a database live on it's own machine or on the same machine as your server?

It needs to be accessible to your backend. This might be over the internet or it could be on a private network. Eventually you'll want to run them on separate machines so you can optimise machine resources accordingly (using a different AWS instance type for each for example). For a side project, it's fine to be on the same machine. Even better, use a free Heroku addon and don't worry about how to configure the DB.

What is a transaction / rollback?

Imagine you are writing an employee of the month app, where companies can sign up. When the company signs up you create the first user for the company. If the user's information is invalid you don't want to create anything, but when you're trying to create the user you need the company to exist already. You can wrap both the insert statements (for the company and for the user) in a transaction, and then if either fails both will be rolled back. You can also use transactions for locking which prevents race conditions when updating records.

what are the strengths / downsides of an ORM?

It means you get to write your DB queries in the language used by your app and not in raw SQL. It also guards against a plethora of security issues such as SQL injection and can often have other features, such as schema enforcement, type casting and relationship management.

what is sql injection and how do you avoid it?

If you take user input from a form and just use string interpolation to put it in an SQL query then anyone who enters carefully crafted SQL in your form can run whatever queries they like on your DB. See https://www.explainxkcd.com/wiki/index.php/Little_Bobby_Tables.

To guard against it, use parameterised queries. The easiest way to do this is to use an ORM. Using parameterised queries also gives you a performance benefit, as the DB can build a query plan once and run it each time that query is used rather than having to reconstruct the query plan each time.

what is indexing / smart indexing

Some columns are used in way more searches than others (primary keys are common, but you may also look up users by their email for example). To make these searches (much) faster you can add an index. This adds a performance cost when writing data as the index needs updating as well, but query speed is dramatically increased. Indexes also take up space, so ensuring you have the indexes you need and no unnecessary indexes is often important for good DB performance as you scale.

examples of when to use many-to-many tables AND how to query things from them

If the relationship you're modelling in the real world is a many-to-many relationship. You query things from them using JOINs. An ORM normally makes this easier.

what is a join, and when to use it

A join lets you use multiple tables in a single query to combine data across multiple tables.

what are migrations and why are they needed

When you need to change the structure of your DB you use a migration to describe the change. This migration can then be run against your machine in development to change your DB enabling you to add the feature that depends on the DB change. When the feature is ready to go to production you can use the same migration to change the production DB in the same manner.

can the contents of a db be exported / migrated?

Yes. It depends on the DB as to how. MySQL and Postgres both support dumping the DB as a series of SQL statements that can be run to recreate the DB for example.

similarly, how do you back up a db

Various ways. You could take a DB dump and store that. For large databases this can be slow to restore though. You could also take a snapshot of the machine or the filesystem the DB is using. Filesystems like ZFS can make this a very easy and fast way to take backups.

what is sharding? and do I even need to know what it is?

Sharding is splitting your DB over multiple machines. By the time you need it you should be able to afford to pay someone who understands it.

what are op logs / how do you set them up?

Op Logs seem to be a MongoDB thing. All DBs have logs though and setting them up depends on the DB you're using.

If you have multiple instances of your web api running do you also have multiple instances of your DB (cloned?)

If you're running separate instances, for example if you support on site installations for each customer then they'll probably have a separate DB. If you're just running multiple instances of the API behind a load balancer then they'll share a single DB.

If yoeu have multiple instances of your DB running, how do you keep them in sync?

Most, if not all DBMSes support some form of replication. This allows one primary DB to send changes to one or more secondary DBs. In MySQL and Postgres these secondary DBs are commonly read only so data only needs to flow one way and they can be used for read intensive workloads to take load off the primary.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment