@ZapDos7
Last active February 24, 2024 13:48

SQL

Intro

  • RDBMS = relational database management system
  • data stored in tables of related data entries (rows, columns)
  • each row is a record
  • each column is a field
  • actions are performed via SQL statements (keywords are not case sensitive)

Commands/Keywords

  1. SELECT - extracts data
  2. UPDATE - updates data (beware of missing WHERE clauses)
  3. DELETE - deletes data
  4. INSERT INTO - inserts new data
  5. CREATE DATABASE
  6. ALTER DATABASE
  7. CREATE TABLE
  8. ALTER TABLE
  9. DROP TABLE
  10. CREATE INDEX (an index is a search key)
  11. DROP INDEX
  12. COUNT e.g. SELECT COUNT(DISTINCT id) FROM Customers;
  13. WHERE specify conditions
  14. Operators: =, >, <, >=, <=, <>, BETWEEN, IN (for many possible values), AND, OR, NOT
  15. LIKE for pattern matching; standard SQL wildcards:
    • % : any sequence of zero or more characters
    • _ : exactly one character
    • 'a%' : starts with a
    • '%a' : ends with a
    • '%a%' : contains a
    • some dialects add more: * (any, Access), ? (one character, Access), # (single digit, Access), [] (character set, e.g. h[oa]t matches hot, hat), ! (negation inside [], Access; SQL Server uses ^), - (range inside [], e.g. [a-f])
  16. ORDER BY column_name sorts results; ascending (ASC) is the default, add DESC for descending
  17. NULL means no value (e.g. for optional fields); it does NOT mean 0. Test with IS NULL/IS NOT NULL
  18. TOP/LIMIT
  19. MIN()/MAX()
  20. AVG()
  21. SUM()
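The keywords above can be exercised end to end; this is a minimal sketch using Python's built-in `sqlite3` module with a hypothetical `Customers` table (table and column names are made up for illustration):

```python
import sqlite3

# Hypothetical in-memory table to exercise the keywords above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO Customers (name, city) VALUES (?, ?)",
    [("Ana", "Athens"), ("Bob", "Berlin"), ("Cara", "Athens")],
)

# COUNT with DISTINCT (item 12)
(n_cities,) = conn.execute("SELECT COUNT(DISTINCT city) FROM Customers").fetchone()

# WHERE + LIKE + ORDER BY (items 13, 15, 16)
rows = conn.execute(
    "SELECT name FROM Customers WHERE city LIKE 'A%' ORDER BY name DESC"
).fetchall()

print(n_cities)              # 2
print([r[0] for r in rows])  # ['Cara', 'Ana']
```

Note that SQLite's LIKE is case-insensitive for ASCII by default; other engines differ (e.g. PostgreSQL's LIKE is case-sensitive, with ILIKE as the insensitive variant).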

Views

Virtual tables based on the result set of an SQL statement
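A quick sketch of the idea, again via `sqlite3` (table and view names are hypothetical): the view stores the query, not the data, so reads against it re-run the underlying SELECT.

```python
import sqlite3

# A view behaves like a read-only table defined by a stored SELECT.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL, status TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, 10.0, "paid"), (2, 5.0, "open"), (3, 7.5, "paid")])
conn.execute("CREATE VIEW paid_orders AS "
             "SELECT id, amount FROM orders WHERE status = 'paid'")

# Query the view exactly like a table.
total = conn.execute("SELECT SUM(amount) FROM paid_orders").fetchone()[0]
print(total)  # 17.5
```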

MySQL Notes

Optimisations

Documentation for MySQL 8

How to Optimize many-to-many queries


  • use indexes
  • be mindful of IN subqueries

Dates

| Entity        | MySQL               | SQL Server          |
|---------------|---------------------|---------------------|
| DATE          | YYYY-MM-DD          | YYYY-MM-DD          |
| DATETIME      | YYYY-MM-DD HH:mm:ss | YYYY-MM-DD HH:mm:ss |
| TIMESTAMP     | YYYY-MM-DD HH:mm:ss | a unique number     |
| YEAR          | YYYY or YY          | N/A                 |
| SMALLDATETIME | N/A                 | YYYY-MM-DD HH:mm:ss |

PostgreSQL Notes

How to manage PostgreSQL using command line psql

  1. Install PostgreSQL (& specify a password)
  2. Verify the installation: psql --version (should return something like psql (PostgreSQL) 14.1)
  3. Connect to a db like so: psql -d database_name -U username where -d is short for --dbname and -U (note the caps) is short for --username. The default user & default db when installing PostgreSQL are named postgres so the command is psql -d postgres -U postgres
  4. Enter the password specified in step (1).

Alternatively one can use the SQL Shell (psql). There is a prompt for a server, a database, a port and a username. Pressing enter selects the default values, which are localhost, postgres, 5432, and postgres. Again the password is required.

In the psql command line

The environment looks like so:

postgres=#

Commands:

\?                              # list all commands
\h COMMAND                      # documentation of a specific command
\q                              # quit
\l                              # list all dbs
\l+                             # list all dbs & extra info
\c db_name                      # change selected db
\dt                             # list all tables from selected db
\d table_name                   # describe a table (\d+ yields more info about the table)
\e                              # open editor
\i path_to_file/file_name.sql   # read input from a file as if you had typed it into the terminal
\timing                         # measure time a query takes
\copy films(title, year, running_time) FROM 'path_to_file' DELIMITER ',' CSV HEADER;    # import data from csv file
pg_dump -U username database_name > path_to_file/filename.sql                           # backup a db (can be into an .sql or even a .tar file)
psql -U username -d database_name -f filename.sql                                       # restore db from .sql file
pg_restore -U username -d database_name path_to_file/filename.tar                       # restore db from .tar file
# create tables, delete dbs, alter tables, insert data, select like SQL

PostgreSQL Indexing


Types of indexes:

  • B-tree Index: the most commonly used type of index to efficiently store and retrieve data in PostgreSQL, and the default index type. Whenever we use the CREATE INDEX command without specifying the type of index we want, PostgreSQL will create a B-tree index on the given column(s).
CREATE INDEX index_name ON table_name (column_name);             -- single column
CREATE INDEX index_name ON table_name (column_a, column_b);      -- composite index
  • Hash Indexes: designed for fast key-value lookups. When a query condition requires equality checks on indexed columns, hash indexes can provide extremely fast retrieval, as the hash function directly determines the location of the desired data. Hash indexes are most suitable for equality comparisons, such as = or IN operations.
CREATE INDEX hash_name ON table_name USING HASH (column_name);
  • GiST and SP-GiST Indexes: GiST (Generalized Search Tree) and SP-GiST (Space-Partitioned Generalized Search Tree) indexes are advanced index types in PostgreSQL that support a wide range of data types and search operations. They are particularly useful for handling complex data structures and spatial data; GiST indexes can also speed up full-text searches.
CREATE INDEX index_geometry ON table_name USING GIST (geometry_column);  -- GiST index on a geometry column
CREATE INDEX index_points ON table_name USING SPGIST (point_column);     -- SP-GiST index, e.g. on a point column
  • BRIN Indexes: BRIN (Block Range Index) is an index type in PostgreSQL designed to provide efficient indexing for very large tables whose data correlates with physical order (e.g. append-only timestamps). A BRIN index stores the minimum and maximum values for each group of database pages.
CREATE INDEX idx_created_at ON table_name USING BRIN (created_at);
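The `USING HASH/GIST/SPGIST/BRIN` syntax above is PostgreSQL-specific, but the basic effect of an index — turning a full-table scan into a direct lookup — can be sketched with Python's `sqlite3` (SQLite only has B-tree indexes; table and index names here are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, val TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(i, f"v{i}") for i in range(1000)])

# Without an index the planner scans the whole table
# (plan detail is typically "SCAN t")...
plan_before = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM t WHERE id = 500").fetchall()

# ...after CREATE INDEX (a B-tree) it can seek directly
# (plan detail typically mentions "SEARCH ... USING INDEX idx_t_id").
conn.execute("CREATE INDEX idx_t_id ON t (id)")
plan_after = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM t WHERE id = 500").fetchall()

print(plan_before[0][3])
print(plan_after[0][3])
```

The exact plan wording varies between SQLite versions, but the scan-vs-index-search distinction is the same thing `EXPLAIN` shows in PostgreSQL.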

My Personal Notes

\l                 : shows all dbs (equivalent: SELECT datname FROM pg_database;)
\c dbname          : use this db
\dt                : show tables of public schema
\dn+               : show schemas with their access privileges
\dt *.*            : show tables of all schemas
\dt schema.*       : show tables of specific schema
\d table_name      : info of table
\d+ table_name     : as above, with extra detail
\dt *.table_name*  : list tables matching the pattern in any schema

SQL - NoSQL term equivalents

| Relational DB Management System | MongoDB            |
|---------------------------------|--------------------|
| Database                        | Database           |
| Table                           | Collection         |
| Tuple/Row                       | Document           |
| Column                          | Field              |
| Table Join                      | Embedded Documents |
| Primary Key                     | Primary Key (**)   |
| (server) mysqld/oracle          | mongod             |
| (client) mysql/sqlplus          | mongo              |

(**) Mongo's PK:

  • Default key _id provided by MDB itself
  • 12 bytes (displayed as 24 hex characters), unique
    • 4 bytes: timestamp
    • 3 bytes: machine ID
    • 2 bytes: process ID
    • 3 bytes: incremented value
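The four-part layout above can be decoded by hand; this is a small Python sketch (the ObjectId hex string below is hypothetical, and newer MongoDB versions replace the machine/process bytes with a 5-byte random value, keeping the same 12-byte size):

```python
from datetime import datetime, timezone

def parse_object_id(oid_hex: str) -> dict:
    """Split a legacy-format ObjectId into timestamp/machine/process/counter."""
    raw = bytes.fromhex(oid_hex)
    assert len(raw) == 12, "an ObjectId is exactly 12 bytes"
    return {
        "timestamp": datetime.fromtimestamp(int.from_bytes(raw[0:4], "big"),
                                            tz=timezone.utc),
        "machine": raw[4:7].hex(),               # 3 bytes: machine ID
        "process": int.from_bytes(raw[7:9], "big"),   # 2 bytes: process ID
        "counter": int.from_bytes(raw[9:12], "big"),  # 3 bytes: incremented value
    }

# Hypothetical ObjectId (timestamp bytes decode to a date in Feb 2024):
parts = parse_object_id("65dabeefaabbcc1234000007")
print(parts["machine"])  # aabbcc
print(parts["counter"])  # 7
```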

Traits:

  • schema-less (a document db whose documents may differ in structure)
  • structure of a single object is clear
  • no complex joins
  • deep query-ability
  • tuning
  • easy to scale
  • mapping of app objects to db objects not needed
  • use of internal memory for storing (windowed) working set → faster data access

Commands

$ sudo service mongodb start
$ sudo service mongodb stop
$ sudo service mongodb restart

$ mongo # → access db

Data Model Designs

  1. embedded data model (denormalized): you can embed all related data in a single document e.g.:
    {
      _id,
      personal_details: {
        fname,
        lname
        dob
      },
      contact: {
        email,
        phone
      },
      address: {
        city,
        area,
        state
      }
    }
    
  2. normalized data model: refer to subdocuments of original document using references e.g.
    Employee {_id}
    Personal_Details {id, fname, lname,  dob}
    Contact {id, email, phone}
    Address {id, city, area, state}
    

Commands

use DB_NAME                                           # creates the DB if it doesn't exist, else switches to the existing db
db.help()                                             # help
db                                                    # check currently selected db
show dbs                                              # list dbs - must have at least 1 document in db in order to display
db.dropDatabase()                                     # deletes selected db
db.createCollection(collName, optionalConfigOptions)  # options:
                                                      # capped: boolean (if true, fixed size => deletes older records - we need to specify the size param)
                                                      # autoIndexId (boolean: for the _id field - default false)
                                                      # size
                                                      # max (max number of documents)
show collections
db.collection.drop()
db.collection.insert({key: value})
db.collection.insertOne({key: value})
db.collection.insertMany([{key: value}, {key: value}])
db.collection.find()                                  # returns all fields of this collection
db.collection.find().pretty()                         # returns all fields of this collection in formatted fashion
db.collection.find().limit(NUM)                       # find until NUM
db.collection.find().skip(NUM)                        # ignore NUM first documents
                                                      # these can be chained
db.collection.findOne()
db.collection.update(CRITERIA, DATA)
db.collection.save({_id: ObjectId(), NEW_DATA})       # replace existing document
db.collection.findOneAndUpdate(CRITERIA, NEW_DATA)    # update values of existing document
db.collection.updateOne(filters, update)              # update single document based on filter
db.collection.updateMany(filters, update)             # update all documents based on filter
db.collection.remove(CRITERIA)                        # delete matching documents
db.collection.remove(CRITERIA, 1)                     # delete only the first matching document
db.collection.remove({})                              # delete all documents

Datatypes:

  • String
  • Integer
  • Boolean
  • Double
  • Min/Max keys
  • Arrays
  • Timestamp
  • Object
  • Null
  • Symbol
  • Date
  • ObjectId
  • BinaryData
  • Code
  • Regex

Equality Operations

| Name | Description                                                          |
|------|----------------------------------------------------------------------|
| $eq  | Matches values that are equal to a specified value.                  |
| $gt  | Matches values that are greater than a specified value.              |
| $gte | Matches values that are greater than or equal to a specified value.  |
| $in  | Matches any of the values specified in an array.                     |
| $lt  | Matches values that are less than a specified value.                 |
| $lte | Matches values that are less than or equal to a specified value.     |
| $ne  | Matches all values that are not equal to a specified value.          |
| $nin | Matches none of the values specified in an array.                    |
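To make the semantics of these operators concrete, here is a toy matcher in plain Python (an illustrative sketch, not MongoDB code — the documents and queries are hypothetical):

```python
# Map each Mongo comparison operator to its plain-Python meaning.
OPS = {
    "$eq":  lambda field, arg: field == arg,
    "$gt":  lambda field, arg: field > arg,
    "$gte": lambda field, arg: field >= arg,
    "$lt":  lambda field, arg: field < arg,
    "$lte": lambda field, arg: field <= arg,
    "$ne":  lambda field, arg: field != arg,
    "$in":  lambda field, arg: field in arg,
    "$nin": lambda field, arg: field not in arg,
}

def matches(doc: dict, query: dict) -> bool:
    """True if doc satisfies every {field: {op: value}} clause in query."""
    return all(
        OPS[op](doc.get(field), arg)
        for field, cond in query.items()
        for op, arg in cond.items()
    )

docs = [{"name": "John", "likes": 15}, {"name": "Jane", "likes": 5}]
hits = [d["name"] for d in docs if matches(d, {"likes": {"$gt": 10}})]
print(hits)  # ['John']
```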

Logic Operations

| Name | Syntax                                                   |
|------|----------------------------------------------------------|
| AND  | {$and: [{"likes": {$gt: 10}}, {"comments": {$lt: 25}}]}  |
| OR   | {$or: [{"name": "John"}, {"name": "JANE"}]}              |
| NOR  | {$nor: [{"name": "John"}, {"name": "JANE"}]}             |
| NOT  | {"likes": {$not: {$gt: 10}}}                             |

Tools

Intro

A MongoDB schema is created based on workload, relationships & patterns.

Workload

Identify & Quantify Entities

  • Entity = what exists in our db & is unique & independent of each other
  • Attribute = individual properties that describe an entity
  • Based on the project's specs, we can come up with the independent entities we will need (e.g. ebooks, audio books, printed books, users, reviews, authors, publishers for a bookstore db)
  • Their quantities are approximations based on internal business stakeholder data (e.g. expect 200,000 audiobooks, 25,000,000 users, 500 publishers etc). Paying extra attention to the entities largest in number is important.

Identify & Quantify Operations

Identify the read & write operations for all user types (e.g. user, admin) in a table like the one below; we quantify each operation's expected rate based on stakeholder data/estimations:

| Entity                | Operation                     | Information Needed         | Type (Read/Write) | Rate  |
|-----------------------|-------------------------------|----------------------------|-------------------|-------|
| Books                 | Fetch info                    | Book details + Rating      | Read              | 10/s  |
| Reviews               | Fetch 10 reviews of a book    | Reviews & Review rating    | Read              | 20/s  |
| Users                 | Fetch user details            | User details               | Read              | 5/s   |
| Books                 | Add/Update                    | Book details               | Write             | 1/s   |
| Printed books         | Sell copy                     | Stock level                | Write             | 2/s   |
| Reviews               | Add review                    | Review & Book rating       | Write             | 10/s  |
| Printed Books (Admin) | Fetch low stock printed books | Book details + Stock level | Read              | 2/day |

Relationships

Types

  • 1-1
  • 1-N
  • N-N

We can identify which relationship we need each time by asking these two questions:

  • Can entity A be related to more than one entity B?
  • Can entity B be related to more than one entity A?
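The answers to those two questions map directly onto the relationship type; a tiny illustrative sketch:

```python
def relationship_type(a_has_many_b: bool, b_has_many_a: bool) -> str:
    """Classify a relationship from the two cardinality questions above."""
    if a_has_many_b and b_has_many_a:
        return "N-N"
    if a_has_many_b or b_has_many_a:
        return "1-N"
    return "1-1"

# A user has one profile, a profile belongs to one user:
print(relationship_type(False, False))  # 1-1
# A book has many reviews, a review belongs to one book:
print(relationship_type(True, False))   # 1-N
# A book has many authors, an author writes many books:
print(relationship_type(True, True))    # N-N
```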

Embedding or Referencing

  • Reference: separate entities, linked using a key
  • Embedding: both entities in a single document

Data that is accessed together should be stored together

| Guideline         | Question                                                                                               | Embed | Reference |
|-------------------|--------------------------------------------------------------------------------------------------------|-------|-----------|
| Simplicity        | Would keeping the pieces of info together lead to a simpler data model & code?                          | Y     | N         |
| Go Together       | Do the pieces of info have a has-a/contains or similar relationship?                                    | Y     | N         |
| Query Atomicity   | Does the application query the pieces of info together?                                                 | Y     | N         |
| Update Complexity | Are the pieces of info updated together?                                                                | Y     | N         |
| Archival          | Should the pieces of info be archived at the same time?                                                 | Y     | N         |
| Cardinality       | Is there a high cardinality (current or growing) on the child side of the relationship?                 | N     | Y         |
| Data Duplication  | Would data duplication be too complicated to manage and undesired?                                      | N     | Y         |
| Document Size     | Would the combined size of the pieces of info take too much memory or transfer bandwidth for the app?   | N     | Y         |
| Document Growth   | Would the embedded piece grow without bound?                                                            | N     | Y         |
| Workload          | Are the pieces of info written at different times in a write-heavy workload?                            | N     | Y         |
| Individuality     | On the child side of the relationship, can the pieces exist by themselves without a parent?             | N     | Y         |

Select based on these guidelines. Sometimes one guideline outweighs the rest, so it's not always a mere matter of which method gets the most "points".

Modeling 1-1 Relationships

  • Embedding:
    • include fields for the child entity into the parent entity
    • It's better to group related fields in a subdocument within the parent entity (better organization, covers for duplicate names)
  • Referencing
    • add key field in one of the two entities (whichever entity is fetched primarily should be the one with the key)
    • or both (bidirectional)

Modeling 1-N Relationships

  • Embedding:
    • different ways:
      • embed the many side as an array of subdocuments in the one side.
      • create one subdocument within the parent document
    • good for no duplicate info
    • good for entities (children) that cannot exist without the other (parent)
  • Referencing:
    • different ways:
      • include an array of references in the parent document (preferred)
      • include a reference to the parent in each child (preferred if an array is unbounded, it's best to reference the other way around)
      • bidirectional references

Modeling N-N Relationships

  • Embedding:
    • embed the documents from the child side using an array in the parent side in each parent document
      • this can lead to data duplication (not always bad - better query performance)
    • or, use a single subdocument with key-value pairs
  • Referencing:
    • use an array of references (in either direction or both)
      • this can lead to data duplication
      • bidirectional is more expensive to manage

Patterns

Inheritance Pattern

  • Different forms of similar entities
  • Shared fields or unique?
  • Keep polymorphic documents in the same collection, differentiated by a field which indicates what (extra) fields are to be expected in this document & its shape
  • Use the aggregation framework to apply

Computed Pattern

  • Run ops when data changes
  • Store results for quick access
  • Operations
    • Mathematical (store calculated value in a document instead of constantly recalculating with each fetching)
    • Roll-up (merging data)

Approximation Pattern

  • generate statistically valid number that is not exact
  • used to reduce resource usage for data that does not need to be exact
  • use random number generator to extrapolate the new value in the application level
  • trade accuracy for better db performance

Extended Reference Pattern

  • (relational) JOIN ↔ (MongoDB) $lookup but both are expensive
  • To avoid them we use this pattern: reduce or eliminate JOINs by embedding data from multiple documents into the main document
  • also leads to faster reads
  • be mindful of how to minimize duplication (use only the fields you need, prefer fields that don't change)
  • be mindful of how to keep duplicate data updated based on source ("What is the list of dependent extended references?", "Do they need to be updated immediately?")

Schema Versioning Pattern

  • Mongo assists with schema versioning with no downtime:
    • add extra field to documents ("schema_version": 2 - if it's missing, implicit v1.0)
    • use some application support
    • existing documents:
      • update via application
      • update via background task
    • before updating schema we need to update the app
  • due to flexible document model: documents with different shape can coexist in the same collection
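The application-side support can be sketched like this (the v1→v2 migration below — splitting a single "name" field — is a made-up example): read any version, upgrade lazily on access.

```python
def normalize(doc: dict) -> dict:
    """Upgrade a document to the latest schema version on read."""
    version = doc.get("schema_version", 1)  # missing field => implicit v1
    if version == 1:
        # hypothetical v1 -> v2 migration: split a single "name" field
        first, _, last = doc.pop("name", "").partition(" ")
        doc["first_name"], doc["last_name"] = first, last
        doc["schema_version"] = 2
    return doc

# Documents of different shapes coexist in the same collection:
old = {"_id": 1, "name": "Ada Lovelace"}
new = {"_id": 2, "first_name": "Grace", "last_name": "Hopper", "schema_version": 2}

print(normalize(old)["last_name"])   # Lovelace
print(normalize(new)["first_name"])  # Grace
```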