- RDBMS = relational database management system
- data stored in tables of related data entries (rows, columns)
- each row is a record
- each column is a field
- actions are performed by SQL statements (keywords are not case sensitive)
- `SELECT` - extracts data
- `UPDATE` - updates data (beware of missing `WHERE` clauses)
- `DELETE` - deletes data
- `INSERT INTO` - inserts data
- `CREATE DATABASE`
- `ALTER DATABASE`
- `CREATE TABLE`
- `ALTER TABLE`
- `DROP TABLE`
- `CREATE INDEX` (an index is a search key)
- `DROP INDEX`
- `COUNT`, e.g. `SELECT COUNT(DISTINCT id) FROM Customers;`
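The statements above can be sketched with Python's built-in `sqlite3` (the `Customers` table and its data are invented for illustration):

```python
import sqlite3

# In-memory database; table/column names are illustrative only.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute("CREATE TABLE Customers (id INTEGER, name TEXT)")
cur.execute("INSERT INTO Customers VALUES (1, 'Ada'), (2, 'Bob'), (2, 'Bob')")
cur.execute("UPDATE Customers SET name = 'Alan' WHERE id = 1")  # WHERE guards the update
cur.execute("DELETE FROM Customers WHERE id = 99")              # matches nothing, deletes nothing

cur.execute("SELECT COUNT(DISTINCT id) FROM Customers")
print(cur.fetchone()[0])  # → 2 (ids 1 and 2)
```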
- `WHERE` specifies conditions
- Operators: `=`, `>`, `<`, `>=`, `<=`, `<>`, `BETWEEN`, `IN` (for many possible values), `AND`, `OR`, `NOT`
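A quick sketch of `BETWEEN`, `IN` and `<>` using `sqlite3` (the `Products` table is invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Products (id INTEGER, price REAL)")
cur.executemany("INSERT INTO Products VALUES (?, ?)",
                [(1, 5.0), (2, 15.0), (3, 25.0), (4, 35.0)])

# BETWEEN is inclusive on both ends
cur.execute("SELECT id FROM Products WHERE price BETWEEN 15 AND 35")
print([r[0] for r in cur.fetchall()])  # → [2, 3, 4]

# IN matches any of several possible values; <> is "not equal"
cur.execute("SELECT id FROM Products WHERE id IN (1, 3) AND price <> 5.0")
print([r[0] for r in cur.fetchall()])  # → [3]
```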
- `LIKE` for pattern search
  - `'a%'`: starts with `a`
  - `'%a'`: ends with `a`
  - `'%a%'`: contains `a`
- Wildcards:
  - `%`: any sequence of characters
  - `_`: exactly one character
  - MS Access variants: `*` (any sequence), `?` (one character), `#` (single digit), `[]` (character set, e.g. `h[oa]t` returns `hot`, `hat`), `!` (negation inside `[]`), `-` (range inside `[]`)
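The standard `%` and `_` wildcards in action via `sqlite3` (the `Words` table is invented; the Access-style wildcards above are not supported by `LIKE` in most engines):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Words (w TEXT)")
cur.executemany("INSERT INTO Words VALUES (?)",
                [("hot",), ("hat",), ("heat",), ("shot",)])

q = lambda pat: [r[0] for r in cur.execute("SELECT w FROM Words WHERE w LIKE ?", (pat,))]

print(q("h%"))   # starts with h → ['hot', 'hat', 'heat']
print(q("%t"))   # ends with t → all four words
print(q("h_t"))  # _ matches exactly one character → ['hot', 'hat']
```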
- `ORDER BY`: default = `ASC`, else `DESC` (need column name to specify)
- `NULL`: no value, e.g. for optional fields; does NOT mean 0. Check like so: `IS NULL` / `IS NOT NULL`
- `TOP` / `LIMIT`
- `MIN()` / `MAX()`
- `AVG()`
- `SUM()`
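A sketch of the `NULL` pitfall and the aggregates, again with `sqlite3` (the `Orders` table is invented; `ORDER BY ... DESC LIMIT 1` stands in for `TOP`):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Orders (id INTEGER, amount REAL)")  # amount is optional
cur.executemany("INSERT INTO Orders VALUES (?, ?)", [(1, 10.0), (2, None), (3, 30.0)])

# "= NULL" never matches; IS NULL is the correct check
print(cur.execute("SELECT COUNT(*) FROM Orders WHERE amount = NULL").fetchone()[0])   # → 0
print(cur.execute("SELECT COUNT(*) FROM Orders WHERE amount IS NULL").fetchone()[0])  # → 1

# aggregates skip NULLs: AVG is (10+30)/2, not (10+0+30)/3
print(cur.execute("SELECT MIN(amount), MAX(amount), AVG(amount), SUM(amount) "
                  "FROM Orders").fetchone())  # → (10.0, 30.0, 20.0, 40.0)
print(cur.execute("SELECT id FROM Orders ORDER BY amount DESC LIMIT 1").fetchone()[0])  # → 3
```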
- Views: virtual tables based on the result set of an SQL statement
- Performance: use indexes; be mindful of `IN` subqueries
Entity | MySQL | SQL Server |
---|---|---|
DATE | YYYY-MM-DD | YYYY-MM-DD |
DATETIME | YYYY-MM-DD HH:mm:ss | YYYY-MM-DD HH:mm:ss |
TIMESTAMP | YYYY-MM-DD HH:mm:ss | a unique number |
YEAR | YYYY or YY | N/A |
SMALLDATETIME | N/A | YYYY-MM-DD HH:mm:ss |
- Install PostgreSQL (& specify a password)
- Verify installation: `psql --version` (should return something like `psql (PostgreSQL) 14.1`)
- Connect to a db like so: `psql -d database_name -U username`, where `-d` is short for `--dbname` and `-U` (note the caps) is short for `--username`. The default user & default db when installing PostgreSQL are named `postgres`, so the command is `psql -d postgres -U postgres`
- Enter the password specified in step (1).

Alternatively one can use the SQL Shell (psql). There is a prompt for a server, a database, a port and a username. Pressing enter selects the default values, which are `localhost`, `postgres`, `5432`, and `postgres`. Again the password is required.
The environment looks like so:
postgres=#
Commands:
\? # list all commands
\h COMMAND # documentation of a specific command
\q # quit
\l # list all dbs
\l+ # list all dbs & extra info
\c db_name # change selected db
\dt # list all tables from selected db
\d table_name # describe a table (\d+ yields more info about the table)
\e # open editor
\i path_to_file/file_name.sql # read input from a file as if you had typed it into the terminal
\timing # measure time a query takes
\copy films(title, year, running_time) FROM 'path_to_file' DELIMITER ',' CSV HEADER; # import data from csv file
pg_dump -U username database_name > path_to_file/filename.sql # backup a db (can be into an .sql or even a .tar file)
psql -U username -d database_name -f filename.sql # restore db from .sql file
pg_restore -U username -d database_name path_to_file/filename.tar # restore db from .tar file
# create tables, delete dbs, alter tables, insert data, select like SQL
Types of indexes:
- B-tree Index: the most commonly used type of index to efficiently store and retrieve data in PostgreSQL. It's the default index type. Whenever we use the CREATE INDEX command without specifying the type of index we want, PostgreSQL will create a B-tree index for the table or column.
CREATE INDEX index_name ON table_name (column_name); -- B-tree is the default index type
- Hash Indexes: designed for fast key-value lookups. When a query condition requires equality checks on indexed columns, hash indexes can provide extremely fast retrieval, as the hash function directly determines the location of the desired data. Hash indexes are most suitable for equality comparisons, such as = or IN operations.
CREATE INDEX hash_name ON table_name USING HASH (column_name);
- GiST and SP-GiST Indexes: GiST (Generalized Search Tree) and SP-GiST (Space-Partitioned Generalized Search Tree) indexes are advanced index types in PostgreSQL that support a wide range of data types and search operations. They are particularly useful for handling complex data structures and spatial data; GiST indexes can also speed up full-text searches.
CREATE INDEX index_geometry ON table_name USING GIST (geometry_column); -- GiST index on a geometry column
CREATE INDEX index_points ON table_name USING SPGIST (point_column); -- SP-GiST index on a point column
- BRIN Indexes: Block Range Index (BRIN) is an index type in PostgreSQL designed to provide efficient indexing for large tables whose data correlates with its physical storage order. A BRIN index stores the minimum and maximum values for each range of database pages.
CREATE INDEX timestamp ON table_name USING BRIN (column);
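The PostgreSQL-specific types (hash, GiST, SP-GiST, BRIN) can't be demonstrated without a running server, but the default B-tree behavior can be sketched with `sqlite3`, which also builds B-tree indexes; the table and index names are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE events (id INTEGER, created_at TEXT)")
cur.execute("CREATE INDEX idx_created ON events (created_at)")  # B-tree by default

# the query planner picks up the index for an equality lookup
plan = cur.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM events WHERE created_at = '2024-01-01'"
).fetchall()
print(plan)  # the plan's detail column mentions idx_created
```

In PostgreSQL the analogous check is `EXPLAIN SELECT ...`, which reports an Index Scan once the planner decides the index is worthwhile.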
\l : shows all dbs (equivalent: SELECT datname FROM pg_database;)
\c dbname : use this db
\dt : show tables of public schema
\dn+ : list schemas with their access privileges
\dt *.* : show tables of all schemas
\dt schema.* : show tables of specific schema
\d table_name : info of table
\d+ table_name : same as above, with extra info
\dt *.table_name* : same as above
Relational DB Management System | MongoDB |
---|---|
Database | Database |
Table | Collection |
Tuple/Row | Document |
Column | Field |
Table Join | Embedded Documents |
Primary Key | Primary Key (**) |
(server) mysqld / oracle | mongod |
(client) mysql / sqlplus | mongo |
(**) Mongo's PK:
- Default key `_id` provided by MongoDB itself: 12 bytes (24 hex characters), unique
  - 4 bytes: timestamp
  - 3 bytes: machine ID
  - 2 bytes: process ID
  - 3 bytes: incremented value
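The 4-byte timestamp can be read straight out of the hex string, as in this sketch (the sample ObjectId is a made-up example value):

```python
from datetime import datetime, timezone

oid = "507f1f77bcf86cd799439011"  # example 24-hex-char ObjectId (12 bytes)

seconds = int(oid[:8], 16)        # first 4 bytes: creation timestamp (seconds since epoch)
created = datetime.fromtimestamp(seconds, tz=timezone.utc)
print(created.year)  # → 2012 (this sample id encodes an October 2012 timestamp)
```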
- schema-less (is a document db which holds documents which may differ)
- structure of a single object is clear
- no complex joins
- deep query-ability
- tuning
- easy to scale
- mapping of app objects to db objects not needed
- use of internal memory for storing (windowed) working set → faster data access
$ sudo service mongodb start
$ sudo service mongodb stop
$ sudo service mongodb restart
$ mongo # → access db
- embedded data model (denormalized): you can embed all related data in a single document
e.g.:
{ _id, personal_details: { fname, lname, dob }, contact: { email, phone }, address: { city, area, state } }
- normalized data model: refer to subdocuments of original document using references
e.g.
Employee {_id} Personal_Details {id, fname, lname, dob} Contact {id, email, phone} Address {id, city, area, state}
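The two shapes side by side as plain Python dicts (all field values are invented for illustration):

```python
# Embedded (denormalized): one document holds everything
embedded = {
    "_id": 1,
    "personal_details": {"fname": "Jane", "lname": "Doe", "dob": "1990-01-01"},
    "contact": {"email": "jane@example.com", "phone": "555-0100"},
    "address": {"city": "Athens", "area": "Center", "state": "Attica"},
}

# Normalized: subdocuments live in their own collections, linked by _id references
employee = {"_id": 1, "personal_details_id": 10, "contact_id": 20, "address_id": 30}
personal_details = {"_id": 10, "fname": "Jane", "lname": "Doe", "dob": "1990-01-01"}

# Embedded reads need no extra lookups; normalized reads follow the reference first
print(embedded["contact"]["email"])  # → jane@example.com
```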
use DB_NAME # create DB if doesn't exist, else return existing db
db.help() # help
db # check currently selected db
show dbs # list dbs - must have at least 1 document in db in order to display
db.dropDatabase() # deletes selected db
db.createCollection(collName, optionalConfigOptions) # options:
# capped: boolean (if true, fixed size => deletes older records - we need to specify size param)
# autoIndexId (boolean: for _id field - default false)
# size
# max (max number of documents)
db.collection.insert({data data})
show collections
db.collection.drop()
db.collection.insert({data data})
db.collection.insertOne({data data})
db.collection.insertMany([{data data}, {data data}])
db.collection.find() # returns all fields of this collection
db.collection.find().pretty() # returns all fields of this collection in formatted fashion
db.collection.find().limit(NUM) # find until NUM
db.collection.find().skip(NUM) # ignore NUM first documents
# these can be chained
db.collection.findOne()
db.collection.update(CRITERIA, DATA)
db.collection.save({_id: ObjectId(), NEW_DATA}) # replace existing document
db.collection.findOneAndUpdate(CRITERIA, NEW_DATA) # update values of existing document
db.collection.updateOne(filters, update) # update single document based on filter
db.collection.updateMany(filters, update) # update all documents based on filter
db.collection.remove(CRITERIA) # delete matching documents
db.collection.remove(CRITERIA, 1) # delete only 1 document
db.collection.remove({}) # delete all documents
- String
- Integer
- Boolean
- Double
- Min/Max Keys
- Arrays
- Timestamp
- Object
- Null
- Symbol
- Date
- ObjectId
- BinaryData
- Code
- Regex
Name | Description |
---|---|
$eq | Matches values that are equal to a specified value. |
$gt | Matches values that are greater than a specified value. |
$gte | Matches values that are greater than or equal to a specified value. |
$in | Matches any of the values specified in an array. |
$lt | Matches values that are less than a specified value. |
$lte | Matches values that are less than or equal to a specified value. |
$ne | Matches all values that are not equal to a specified value. |
$nin | Matches none of the values specified in an array. |
Name | Syntax |
---|---|
AND | {"likes": {$gt: 10}, $and: [{"comments": {$lt: 25}}]} |
OR | {$or: [{"name": "John"}, {"name": "JANE"}]} |
NOR | {$nor: [{"name": "John"}, {"name": "JANE"}]} |
NOT | {"name": {$not: {$eq: "JANE"}}} |
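What the AND/OR filters mean, written as plain Python predicates over a sample document (the document's values are invented):

```python
doc = {"likes": 12, "comments": 30, "name": "John"}

# {"likes": {$gt: 10}, $and: [{"comments": {$lt: 25}}]}
and_match = doc["likes"] > 10 and doc["comments"] < 25   # comments is 30, so False

# {$or: [{"name": "John"}, {"name": "JANE"}]}
or_match = doc["name"] == "John" or doc["name"] == "JANE"

print(and_match, or_match)  # → False True
```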
A MongoDB schema is created based on workload, relationships & patterns.
- Entity = what exists in our db & is unique & independent of each other
- Attribute = individual properties that describe an entity
- Based on the project's specs, we can come up with the independent entities we will need (e.g. ebooks, audio books, printed books, users, reviews, authors, publishers for a bookstore db)
- Their quantities are approximations based on internal business stakeholder data (e.g. expect 200,000 audiobooks, 25,000,000 users, 500 publishers etc). Paying extra attention to the entities largest in number is important.
Identify the read & write operations for all user types (e.g. user, admin) in a table like so (we quantify the operations' (expected) rate based on stakeholder data/estimations):
Entities | Operations | Information Needed | Type (Read/Write) | Rate |
---|---|---|---|---|
Books | Fetch info | Book details + Rating | Read | 10/s |
Reviews | Fetch 10 reviews of a book | Reviews & Review rating | Read | 20/s |
Users | Fetch user details | User details | Read | 5/s |
Books | Add/Update | Book details | Write | 1/s |
Printed books | Sell copy | Stock level | Write | 2/s |
Reviews | Add review | Review & Book rating | Write | 10/s |
Printed Books | (Admin) Fetch low stock printed books | Book details + Stock level | Read | 2/day |
- 1-1
- 1-N
- N-N
We can identify which relationship we need each time by asking these two questions:
- Can entity A be related to more than one entity B?
- Can entity B be related to more than one entity A?
- Reference: separate entities, linked using a key
- Embedding: both entities in a single document
Data that is accessed together should be stored together
Guideline | Question | Embed | Reference |
---|---|---|---|
Simplicity | Would keeping the pieces of info together lead to simpler data model & code? | Y | N |
Go Together | Do the pieces of info have a has-a / contains or similar relationship? | Y | N |
Query Atomicity | Does the application query the pieces of info together? | Y | N |
Update Complexity | Are the pieces of info updated together? | Y | N |
Archival | Should the pieces of info be archived at the same time? | Y | N |
Cardinality | Is there a high cardinality (current or growing) in the child size of the relationship? | N | Y |
Data Duplication | Would data duplication be too complicated to manage and undesired? | N | Y |
Document Size | Would the combined size of the pieces of info take too much memory or transfer bandwidth for the app? | N | Y |
Document Growth | Would the embedded piece grow without bound? | N | Y |
Workload | Are the pieces of info written at different times in a write-heavy workload? | N | Y |
Individuality | For the children size of the relationship, can the pieces exist by themselves without a parent? | N | Y |
Select based on these guidelines. Sometimes a guideline may be the most important, so it's not always a mere matter of which method gets the most "points".
- Embedding:
- include fields for the child entity into the parent entity
- It's better to group related fields in a subdocument within the parent entity (better organization, covers for duplicate names)
- Referencing
- add key field in one of the two entities (whichever entity is fetched primarily should be the one with the key)
- or both (bidirectional)
- Embedding:
  - different ways:
    - embed the `many` side as an array of subdocuments in the `one` side
    - create one subdocument within the parent document
  - good for no duplicate info
  - good for entities (children) that cannot exist without the other (parent)
- Referencing:
  - different ways:
    - include an array of references in the parent document (preferred)
    - include a reference to the parent in each child (if the array would be unbounded, it's best to reference the other way around, i.e. from each child to the parent)
    - bidirectional references
- Embedding:
  - embed the documents from the child side using an array in the parent side in each parent document
    - this can lead to data duplication (not always bad - better query performance)
  - or, use a single subdocument with key-value pairs
- Referencing:
  - use an array of references (in either direction or both)
    - this can lead to data duplication
    - bidirectional is more expensive to manage
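Resolving an array of references is an application-side "join", sketched here with plain dicts standing in for collections (all names and ids are invented):

```python
# Parent documents hold arrays of child _ids
books = {1: {"_id": 1, "title": "Dune", "author_ids": [10, 11]}}
authors = {10: {"_id": 10, "name": "Frank"}, 11: {"_id": 11, "name": "Brian"}}

# Follow the references to rebuild the relationship
book = books[1]
book_authors = [authors[a]["name"] for a in book["author_ids"]]
print(book_authors)  # → ['Frank', 'Brian']
```

In MongoDB proper this is what `$lookup` (or a second `find`) does; embedding avoids the extra step at the cost of duplication.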
- Different forms of similar entities
- Shared fields or unique?
- Keep polymorphic documents under the same collection; differentiate by a field which indicates what (extra) fields are to be expected in this document & its shape
- Use the aggregation framework to apply operations
- Run ops when data changes
- Store results for quick access
- Operations
- Mathematical (store calculated value in a document instead of constantly recalculating with each fetching)
- Roll-up (merging data)
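A minimal sketch of the mathematical case: keep the computed rating on the document and update it incrementally instead of re-averaging every review on each read (field names and values are invented):

```python
# Computed value stored on the book document itself
book = {"_id": 1, "rating": 4.0, "num_reviews": 4}

def add_review(book, new_rating):
    # incremental mean: (old_mean * n + x) / (n + 1), updated at write time
    n = book["num_reviews"]
    book["rating"] = (book["rating"] * n + new_rating) / (n + 1)
    book["num_reviews"] = n + 1

add_review(book, 5.0)
print(book["rating"], book["num_reviews"])  # → 4.2 5
```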
- generate statistically valid number that is not exact
- used to reduce resource usage for data that does not need to be exact
- use random number generator to extrapolate the new value in the application level
- trade accuracy for better db performance
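One way to sketch this: instead of writing a counter to the db on every event, write roughly 1 out of every N events and scale the increment by N (N = 100 is an arbitrary choice for the sketch):

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

N = 100          # write to the db only ~1/N of the time
db_counter = 0   # value persisted in the database

for _ in range(10_000):          # simulate 10,000 events, e.g. page views
    if random.random() < 1 / N:  # random draw in the application layer
        db_counter += N          # each write stands in for ~N events

print(db_counter)  # ≈ 10,000: accurate enough, with ~100x fewer db writes
```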
- (relational) `JOIN` ↔ (MongoDB) `$lookup`, but both are expensive
- To avoid them we use this pattern: reduce or eliminate `JOIN`s by embedding data from multiple documents into the main document
- also leads to faster reads
- be mindful of how to minimize duplication (use only the fields you need, prefer fields that don't change)
- be mindful of how to keep duplicate data updated based on source ("What is the list of dependent extended references?", "Do they need to be updated immediately?")
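A dict sketch of the pattern: the order embeds only the publisher fields it reads together with, preferring fields that rarely change (all names and values are invented):

```python
# Source document in its own collection
publisher = {"_id": 7, "name": "Acme Press", "address": "1 Main St", "phone": "555-0101"}

# Extended reference: a subset of publisher fields copied into the order
order = {
    "_id": 100,
    "publisher": {"_id": 7, "name": "Acme Press"},  # phone/address change more often, so left out
}

# Reading the order needs no second lookup for the common case
print(order["publisher"]["name"])  # → Acme Press
```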
- Mongo assists with schema versioning with no downtime:
  - add an extra field to documents (`"schema_version": 2`; if it's missing, implicitly v1)
  - use some application support
  - existing documents:
    - update via application
    - update via background task
  - before updating the schema we need to update the app
- due to the flexible document model, documents with different shapes can coexist in the same collection
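The application-support part can be sketched as a reader that branches on `schema_version`, defaulting to v1 when the field is absent (the v1/v2 field shapes here are assumptions for the sketch):

```python
def read_user(doc):
    # a missing field means the document predates versioning: implicit v1
    version = doc.get("schema_version", 1)
    if version == 1:
        # assumed v1 shape: a single "name" field
        return {"full_name": doc["name"]}
    # assumed v2 shape: name split into fname/lname
    return {"full_name": f'{doc["fname"]} {doc["lname"]}'}

print(read_user({"name": "Jane Doe"}))                                    # → {'full_name': 'Jane Doe'}
print(read_user({"schema_version": 2, "fname": "Jane", "lname": "Doe"}))  # → {'full_name': 'Jane Doe'}
```

Both shapes coexist in the same collection; a background task can migrate old documents at leisure.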