Skip to content

Instantly share code, notes, and snippets.

@kzzzr
Last active March 31, 2022 17:38
Show Gist options
  • Save kzzzr/fb49615e45e61a80638166b740657370 to your computer and use it in GitHub Desktop.
Save kzzzr/fb49615e45e61a80638166b740657370 to your computer and use it in GitHub Desktop.
Data Engineer – MongoDB practice
# define your connection string (URI) with env variable
# export MONGODB_ATLAS_URI=<your_uri>
mongo $MONGODB_ATLAS_URI
# replica set
rs.status()
db.isMaster()
# Mongo Shell Basics
db
show dbs
use examples
## 1. Populate a collection (Insert)
db.inventory.insertMany([
{ item: "journal", qty: 25, status: "A", size: { h: 14, w: 21, uom: "cm" }, tags: [ "blank", "red" ] },
{ item: "notebook", qty: 50, status: "A", size: { h: 8.5, w: 11, uom: "in" }, tags: [ "red", "blank" ] },
{ item: "paper", qty: 10, status: "D", size: { h: 8.5, w: 11, uom: "in" }, tags: [ "red", "blank", "plain" ] },
{ item: "planner", qty: 0, status: "D", size: { h: 22.85, w: 30, uom: "cm" }, tags: [ "blank", "red" ] },
{ item: "postcard", qty: 45, status: "A", size: { h: 10, w: 15.25, uom: "cm" }, tags: [ "blue" ] }
]);
## 2. Select All Documents
db.inventory.find({})
db.inventory.find({}).pretty()
## 3. Specify Equality Matches
db.inventory.find( { status: "D" } );
db.inventory.find( { qty: 0 } );
db.inventory.find( { qty: 0, status: "D" } );
db.inventory.find( { "size.uom": "in" } )
db.inventory.find( { size: { w: 21, h: 14, uom: "cm" } } )
db.inventory.find( { tags: "red" } )
db.inventory.find( { tags: [ "red", "blank" ] } ).pretty()
## 4. Specify Fields to Return (Projection)
db.inventory.find( {}, { item: 1, status: 1 } );
db.inventory.find( {}, { _id: 0, item: 1, status: 1 } );
db.inventory.drop()
# Querying documents (intermediate)
db.inventory.insertMany([
{ item: "journal", qty: 25, size: { h: 14, w: 21, uom: "cm" }, status: "A" },
{ item: "notebook", qty: 50, size: { h: 8.5, w: 11, uom: "in" }, status: "A" },
{ item: "paper", qty: 100, size: { h: 8.5, w: 11, uom: "in" }, status: "D" },
{ item: "planner", qty: 75, size: { h: 22.85, w: 30, uom: "cm" }, status: "D" },
{ item: "postcard", qty: 45, size: { h: 10, w: 15.25, uom: "cm" }, status: "A" }
]);
db.inventory.find( { status: { $in: [ "A", "D" ] } } )
db.inventory.find( { status: "A", qty: { $lt: 30 } } )
db.inventory.find( { $or: [ { status: "A" }, { qty: { $lt: 30 } } ] } )
db.inventory.find( {
status: "A",
$or: [ { qty: { $lt: 30 } }, { item: /^p/ } ]
} )
## nested documents
db.inventory.find( { size: { h: 14, w: 21, uom: "cm" } } )
db.inventory.find( { size: { w: 21, h: 14, uom: "cm" } } )
db.inventory.find( { "size.h": { $lt: 15 } } )
db.inventory.find( { "size.h": { $lt: 15 }, "size.uom": "in", status: "D" } )
db.inventory.drop()
## querying arrays
db.inventory.insertMany([
{ item: "journal", qty: 25, tags: ["blank", "red"], dim_cm: [ 14, 21 ] },
{ item: "notebook", qty: 50, tags: ["red", "blank"], dim_cm: [ 14, 21 ] },
{ item: "paper", qty: 100, tags: ["red", "blank", "plain"], dim_cm: [ 14, 21 ] },
{ item: "planner", qty: 75, tags: ["blank", "red"], dim_cm: [ 22.85, 30 ] },
{ item: "postcard", qty: 45, tags: ["blue"], dim_cm: [ 10, 15.25 ] }
]);
db.inventory.find( { tags: ["red", "blank"] } )
db.inventory.find( { tags: { $all: ["red", "blank"] } } )
db.inventory.find( { tags: "red" } )
db.inventory.find( { dim_cm: { $gt: 25 } } )
db.inventory.find( { dim_cm: { $gt: 15, $lt: 20 } } )
db.inventory.find( { dim_cm: { $elemMatch: { $gt: 22, $lt: 30 } } } )
db.inventory.find( { "dim_cm.1": { $gt: 25 } } )
db.inventory.find( { "tags": { $size: 3 } } )
db.inventory.drop()
## querying for null
db.inventory.insertMany([
{ _id: 1, item: null },
{ _id: 2 }
])
db.inventory.find( { item: null } )
db.inventory.find( { item : { $type: 10 } } )
db.inventory.find( { item : { $exists: false } } )
version: '3.7'
services:
mongodb_container:
image: mongo:3.6.23
environment:
MONGO_INITDB_ROOT_USERNAME: root
MONGO_INITDB_ROOT_PASSWORD: rootpassword
ports:
- 27017:27017
volumes:
- mongodb_data_container:/data/db
volumes:
mongodb_data_container:

Занятие Практика по NoSQL

Цели занятия

  • Изучим принципы документ-ориентированных БД на примере MongoDB;
  • Возможные варианты интеграции MongoDB в аналитические приложения;

Результаты

  • Развертывание экземпляра MongoDB Atlas
  • Простые операции манипулирования данными
  • Генерация схемы из schemaless database
  • Пакетная выгрузка из MongoDB и автоматизация на Airflow

Содержание

MongoDB Atlas + Sample Dataset

Обзор Mongo shell: CRUD, Document-oriented, ObjectId

MQL to SQL: mongosqld

Document Relational Definition Language: mongodrdl

Data Streaming + Online Analytics: Listen to Oplog, ChangeStreams

Exporting data: mongoexport + automation with Airflow

# exporting schema to .yaml files
# https://docs.mongodb.com/bi-connector/current/reference/mongodrdl/#mongodrdl
mongodrdl \
--uri=$MONGODB_ATLAS_URI \
--out=./result.yml \
--sampleSize=1000
mongoexport \
--uri=$MONGODB_ATLAS_URI \
--collection=listingsAndReviews \
--type=json \
--limit=${MONGODB_LIMIT_ROWS:-1000}
import yaml
import pandas as pd
import glob
yml = glob.glob('*.yml')
s = []
for y in yml:
with open(y, 'r') as f:
s += yaml.load(f, Loader=yaml.FullLoader)['schema']
df = pd.io.json.json_normalize(s,
record_path=['tables', 'columns'],
meta=['db', ['tables', 'table'], ['tables', 'collection']],
record_prefix='column.',
meta_prefix='database.')
df.columns = ['column.Name', 'column.MongoType', 'column.SqlName', 'column.SqlType',
'database', 'database.relational_table', 'database.collection']
column_order = ['database', 'database.relational_table', 'database.collection',
'column.Name', 'column.SqlName', 'column.MongoType', 'column.SqlType']
df = df[column_order]
# df.describe()
# df[["database", "database.collection"]].drop_duplicates()
df.to_excel("output.xlsx")
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment