This is a quick document highlighting the basics of working with MongoDB from R. I am coming at this almost entirely from a SQL mindset.
The easiest way to install, I believe, is
This quick guide describes how to create a Jupyter Notebook on AWS EC2 and then access it remotely using SSH tunneling. This method is preferred because it opens no additional ports besides 22, requires little to no configuration, and is generally more straightforward.
This version assumes basic familiarity with cloud computing, AWS services, and Jupyter Notebook, mostly because it does not include images and does not dive too deep into each individual step.
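If you want to script the tunnel itself, the sketch below shows the idea using Python's subprocess module; the key path, ports, and EC2 hostname are placeholders, not values taken from this guide.

# Minimal sketch of SSH local port forwarding; every path and host below is a placeholder.
import os
import subprocess

tunnel = subprocess.Popen([
    "ssh", "-N",                                    # -N: forward the port, run no remote command
    "-i", os.path.expanduser("~/.ssh/my-key.pem"),  # placeholder key file
    "-L", "8888:localhost:8888",                    # local 8888 -> remote 8888 (Jupyter's default port)
    "ec2-user@ec2-xx-xx-xx-xx.compute-1.amazonaws.com",  # placeholder EC2 public DNS
])
# With the tunnel up, open http://localhost:8888 in a local browser;
# call tunnel.terminate() to close it when finished.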
; Configuration for Airflow webserver and scheduler in Supervisor
[program:airflow]
command=/bin/airflow webserver
stopsignal=QUIT
stopasgroup=true
user=airflow
stdout_logfile=/var/log/airflow/airflow-stdout.log
stderr_logfile=/var/log/airflow/airflow-stderr.log
environment=HOME="/home/airflow",AIRFLOW_HOME="/etc/airflow",TMPDIR="/storage/airflow_tmp"
##
# Flask Drive Example App
#
# @author Prahlad Yeri <prahladyeri@yahoo.com>
# @date 30-12-2016
# Dependency:
# 1. pip install flask google-api-python-client
# 2. make sure you have client_id.json in this same directory.
import os
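As a rough sketch of how client_id.json gets used, the snippet below authorizes against Drive and lists a few files using the older oauth2client flow that shipped alongside google-api-python-client in that era; the scope, the storage.json cache filename, and the page size are assumptions, not part of the original gist.

# Minimal sketch (not the original app): authorize with client_id.json and list files.
from httplib2 import Http
from oauth2client import file, client, tools
from googleapiclient import discovery

SCOPES = 'https://www.googleapis.com/auth/drive.readonly'  # assumed read-only scope

store = file.Storage('storage.json')          # cached credentials (assumed filename)
creds = store.get()
if not creds or creds.invalid:
    flow = client.flow_from_clientsecrets('client_id.json', SCOPES)
    creds = tools.run_flow(flow, store)       # opens a browser for user consent

drive = discovery.build('drive', 'v3', http=creds.authorize(Http()))
files = drive.files().list(pageSize=10, fields='files(id, name)').execute()
for f in files.get('files', []):
    print(f['name'], f['id'])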
| """ | |
| Code that goes along with the Airflow tutorial located at: | |
| https://github.com/airbnb/airflow/blob/master/airflow/example_dags/tutorial.py | |
| """ | |
| from airflow import DAG | |
| from airflow.operators.python_operator import PythonOperator | |
| from airflow.operators.generic_transfer import GenericTransfer | |
| from airflow.contrib.hooks import FTPHook | |
| from airflow.hooks.mysql_hook import MySqlHook |
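A minimal sketch of how these imports typically come together, using the old-style Airflow 1.x API from that tutorial era; the DAG id, connection id, schedule, and file paths are placeholders.

# Minimal sketch (placeholder ids/paths): pull a file from FTP inside a PythonOperator task.
from datetime import datetime, timedelta

default_args = {
    'owner': 'airflow',
    'start_date': datetime(2017, 1, 1),
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
}

dag = DAG('ftp_to_mysql_example', default_args=default_args, schedule_interval='@daily')

def fetch_from_ftp(**context):
    # 'ftp_default' and both paths are placeholder values
    hook = FTPHook(ftp_conn_id='ftp_default')
    hook.retrieve_file('/remote/data.csv', '/tmp/data.csv')

fetch_task = PythonOperator(
    task_id='fetch_from_ftp',
    python_callable=fetch_from_ftp,
    provide_context=True,
    dag=dag,
)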
# see also https://github.com/wrobstory/pgshift
import gzip
from io import StringIO, BytesIO
from functools import wraps
import boto
from sqlalchemy import MetaData
from pandas import DataFrame
from pandas.io.sql import SQLTable, pandasSQL_builder
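The general shape of that approach, very roughly: dump the DataFrame to an in-memory gzipped CSV, push it to S3 with boto, and have Redshift COPY it. The bucket, key, table, and credentials below are placeholders, and this is a sketch rather than pgshift's actual code.

# Rough sketch (placeholder bucket/table/credentials), not pgshift itself:
# gzip a DataFrame as CSV in memory, upload via boto, then COPY into Redshift.
def df_to_redshift(df, engine, bucket_name, key_name, table):
    csv_buf = StringIO()
    df.to_csv(csv_buf, index=False, header=False)

    gz_buf = BytesIO()
    with gzip.GzipFile(fileobj=gz_buf, mode='wb') as gz:
        gz.write(csv_buf.getvalue().encode('utf-8'))
    gz_buf.seek(0)

    key = boto.connect_s3().get_bucket(bucket_name).new_key(key_name)
    key.set_contents_from_file(gz_buf)

    engine.execute(
        "COPY {} FROM 's3://{}/{}' "
        "CREDENTIALS 'aws_iam_role=<your-role-arn>' GZIP CSV".format(
            table, bucket_name, key_name))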
#!/bin/bash
# Capture the current OS user, the Anaconda installer URL, a default DB
# password placeholder ("la contraseña" = "the password"), and the host IP.
USUARIO_SO="$(whoami)"
ANACONDA_URL="https://repo.anaconda.com/archive/Anaconda3-5.2.0-Linux-x86_64.sh"
_DB_PASSWORD="la contraseña"
_IP=$(hostname -I | cut -d' ' -f1)
# Flags: -a <Anaconda URL>, -p <DB password>, -h prints the help text below.
while getopts "a:p:h" opt; do
  case $opt in
    a) ANACONDA_URL="$OPTARG";;
    p) _DB_PASSWORD="$OPTARG";;
    h) cat <<EOF
#!/usr/bin/python
# -*- coding: utf-8 -*-
'''Read and write gzip files directly between a Python application and S3 (Python 3).
Python 2 version - https://gist.github.com/a-hisame/f90815f4fae695ad3f16cb48a81ec06e
'''
import io
import gzip
import json
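A minimal sketch of the round trip with boto3 (the bucket and key names are placeholders): compress a JSON-serializable object into an in-memory gzip stream, upload it with put_object, and read it back.

# Minimal sketch, assuming boto3 and placeholder bucket/key names.
import boto3

s3 = boto3.client('s3')

def put_json_gz(bucket, key, obj):
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode='wb') as gz:
        gz.write(json.dumps(obj).encode('utf-8'))
    buf.seek(0)
    s3.put_object(Bucket=bucket, Key=key, Body=buf.getvalue())

def get_json_gz(bucket, key):
    body = s3.get_object(Bucket=bucket, Key=key)['Body'].read()
    with gzip.GzipFile(fileobj=io.BytesIO(body), mode='rb') as gz:
        return json.loads(gz.read().decode('utf-8'))

# Example: put_json_gz('my-bucket', 'data/sample.json.gz', {'hello': 'world'})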
--
-- This will register the "planet" table within your AWS account
--
CREATE EXTERNAL TABLE planet (
  id BIGINT,
  type STRING,
  tags MAP<STRING,STRING>,
  lat DECIMAL(9,7),
  lon DECIMAL(10,7),
  nds ARRAY<STRUCT<ref: BIGINT>>,
#!/usr/bin/env python3
#
# Query AWS Athena using SQL
# Copyright (c) Alexey Baikov <sysboss[at]mail.ru>
#
# This snippet is a basic example of querying Athena and loading the results
# into a variable.
#
# Requirements:
# > pip3 install boto3 botocore retrying
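A rough sketch of the pattern those requirements suggest (the region, database, and output location are placeholders): submit the query with boto3, poll until it finishes using retrying, then fetch the rows.

# Rough sketch: submit an Athena query, wait for completion, read the result rows.
import boto3
from retrying import retry

athena = boto3.client('athena', region_name='us-east-1')   # placeholder region

def run_query(query, database, output_location):
    response = athena.start_query_execution(
        QueryString=query,
        QueryExecutionContext={'Database': database},
        ResultConfiguration={'OutputLocation': output_location},
    )
    return response['QueryExecutionId']

# Keep polling while the query is still queued or running.
@retry(stop_max_attempt_number=30, wait_fixed=2000,
       retry_on_result=lambda state: state in ('QUEUED', 'RUNNING'))
def wait_for_query(execution_id):
    status = athena.get_query_execution(QueryExecutionId=execution_id)
    return status['QueryExecution']['Status']['State']

execution_id = run_query(
    "SELECT count(*) FROM planet",           # the table registered by the DDL above
    database='default',                      # placeholder database
    output_location='s3://my-athena-results/',  # placeholder results bucket
)
if wait_for_query(execution_id) == 'SUCCEEDED':
    results = athena.get_query_results(QueryExecutionId=execution_id)
    rows = results['ResultSet']['Rows']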