This is a quick document highlighting the basics of working with MongoDB from R. I am coming at this almost entirely from a SQL mindset.
The easiest way to install, I believe, is
This quick guide describes how to create a Jupyter Notebook on AWS EC2 and then access it remotely using SSH tunneling. This method is preferred because it opens no additional ports besides 22, requires little to no configuration, and is generally more straightforward.
This version assumes basic familiarity with cloud computing, AWS services, and Jupyter Notebook, mostly because it does not include images and does not dive too deep into each individual step.
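If you want to script the tunnel itself, the sketch below shows the idea using Python's subprocess module; the key path, ports, and EC2 hostname are placeholders, not values taken from this guide.

# Minimal sketch of SSH local port forwarding; every path and host below is a placeholder.
import os
import subprocess

tunnel = subprocess.Popen([
    "ssh", "-N",                                    # -N: forward the port, run no remote command
    "-i", os.path.expanduser("~/.ssh/my-key.pem"),  # placeholder key file
    "-L", "8888:localhost:8888",                    # local 8888 -> remote 8888 (Jupyter's default port)
    "ec2-user@ec2-xx-xx-xx-xx.compute-1.amazonaws.com",  # placeholder EC2 public DNS
])
# With the tunnel up, open http://localhost:8888 in a local browser;
# call tunnel.terminate() to close it when finished.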
; Configuration for Airflow webserver and scheduler in Supervisor
[program:airflow]
command=/bin/airflow webserver
stopsignal=QUIT
stopasgroup=true
user=airflow
stdout_logfile=/var/log/airflow/airflow-stdout.log
stderr_logfile=/var/log/airflow/airflow-stderr.log
environment=HOME="/home/airflow",AIRFLOW_HOME="/etc/airflow",TMPDIR="/storage/airflow_tmp"
##
# Flask Drive Example App
#
# @author Prahlad Yeri <prahladyeri@yahoo.com>
# @date 30-12-2016
# Dependency:
# 1. pip install flask google-api-python-client
# 2. make sure you have client_id.json in this same directory.
import os
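As a rough sketch of how client_id.json gets used, the snippet below authorizes against Drive and lists a few files using the older oauth2client flow that shipped alongside google-api-python-client in that era; the scope, the storage.json cache filename, and the page size are assumptions, not part of the original gist.

# Minimal sketch (not the original app): authorize with client_id.json and list files.
from httplib2 import Http
from oauth2client import file, client, tools
from googleapiclient import discovery

SCOPES = 'https://www.googleapis.com/auth/drive.readonly'  # assumed read-only scope

store = file.Storage('storage.json')          # cached credentials (assumed filename)
creds = store.get()
if not creds or creds.invalid:
    flow = client.flow_from_clientsecrets('client_id.json', SCOPES)
    creds = tools.run_flow(flow, store)       # opens a browser for user consent

drive = discovery.build('drive', 'v3', http=creds.authorize(Http()))
files = drive.files().list(pageSize=10, fields='files(id, name)').execute()
for f in files.get('files', []):
    print(f['name'], f['id'])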
| """ | |
| Code that goes along with the Airflow tutorial located at: | |
| https://github.com/airbnb/airflow/blob/master/airflow/example_dags/tutorial.py | |
| """ | |
| from airflow import DAG | |
| from airflow.operators.python_operator import PythonOperator | |
| from airflow.operators.generic_transfer import GenericTransfer | |
| from airflow.contrib.hooks import FTPHook | |
| from airflow.hooks.mysql_hook import MySqlHook |
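A minimal sketch of how these imports typically come together, using the old-style Airflow 1.x API from that tutorial era; the DAG id, connection id, schedule, and file paths are placeholders.

# Minimal sketch (placeholder ids/paths): pull a file from FTP inside a PythonOperator task.
from datetime import datetime, timedelta

default_args = {
    'owner': 'airflow',
    'start_date': datetime(2017, 1, 1),
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
}

dag = DAG('ftp_to_mysql_example', default_args=default_args, schedule_interval='@daily')

def fetch_from_ftp(**context):
    # 'ftp_default' and both paths are placeholder values
    hook = FTPHook(ftp_conn_id='ftp_default')
    hook.retrieve_file('/remote/data.csv', '/tmp/data.csv')

fetch_task = PythonOperator(
    task_id='fetch_from_ftp',
    python_callable=fetch_from_ftp,
    provide_context=True,
    dag=dag,
)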
# see also https://github.com/wrobstory/pgshift
import gzip
from io import StringIO, BytesIO
from functools import wraps
import boto
from sqlalchemy import MetaData
from pandas import DataFrame
from pandas.io.sql import SQLTable, pandasSQL_builder
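The general shape of that approach, very roughly: dump the DataFrame to an in-memory gzipped CSV, push it to S3 with boto, and have Redshift COPY it. The bucket, key, table, and credentials below are placeholders, and this is a sketch rather than pgshift's actual code.

# Rough sketch (placeholder bucket/table/credentials), not pgshift itself:
# gzip a DataFrame as CSV in memory, upload via boto, then COPY into Redshift.
def df_to_redshift(df, engine, bucket_name, key_name, table):
    csv_buf = StringIO()
    df.to_csv(csv_buf, index=False, header=False)

    gz_buf = BytesIO()
    with gzip.GzipFile(fileobj=gz_buf, mode='wb') as gz:
        gz.write(csv_buf.getvalue().encode('utf-8'))
    gz_buf.seek(0)

    key = boto.connect_s3().get_bucket(bucket_name).new_key(key_name)
    key.set_contents_from_file(gz_buf)

    engine.execute(
        "COPY {} FROM 's3://{}/{}' "
        "CREDENTIALS 'aws_iam_role=<your-role-arn>' GZIP CSV".format(
            table, bucket_name, key_name))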
#!/bin/bash
# Capture the current OS user, the Anaconda installer URL, a default DB
# password placeholder ("la contraseña" = "the password"), and the host IP.
USUARIO_SO="$(whoami)"
ANACONDA_URL="https://repo.anaconda.com/archive/Anaconda3-5.2.0-Linux-x86_64.sh"
_DB_PASSWORD="la contraseña"
_IP=$(hostname -I | cut -d' ' -f1)
# Flags: -a <Anaconda URL>, -p <DB password>, -h prints the help text below.
while getopts "a:p:h" opt; do
  case $opt in
    a) ANACONDA_URL="$OPTARG";;
    p) _DB_PASSWORD="$OPTARG";;
    h) cat <<EOF
#!/usr/bin/python
# -*- coding: utf-8 -*-
'''Read and write gzip files directly between a Python application and S3 (Python 3).
Python 2 version - https://gist.github.com/a-hisame/f90815f4fae695ad3f16cb48a81ec06e
'''
import io
import gzip
import json
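A minimal sketch of the round trip with boto3 (the bucket and key names are placeholders): compress a JSON-serializable object into an in-memory gzip stream, upload it with put_object, and read it back.

# Minimal sketch, assuming boto3 and placeholder bucket/key names.
import boto3

s3 = boto3.client('s3')

def put_json_gz(bucket, key, obj):
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode='wb') as gz:
        gz.write(json.dumps(obj).encode('utf-8'))
    buf.seek(0)
    s3.put_object(Bucket=bucket, Key=key, Body=buf.getvalue())

def get_json_gz(bucket, key):
    body = s3.get_object(Bucket=bucket, Key=key)['Body'].read()
    with gzip.GzipFile(fileobj=io.BytesIO(body), mode='rb') as gz:
        return json.loads(gz.read().decode('utf-8'))

# Example: put_json_gz('my-bucket', 'data/sample.json.gz', {'hello': 'world'})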
--
-- This will register the "planet" table within your AWS account
--
CREATE EXTERNAL TABLE planet (
  id BIGINT,
  type STRING,
  tags MAP<STRING,STRING>,
  lat DECIMAL(9,7),
  lon DECIMAL(10,7),
  nds ARRAY<STRUCT<ref: BIGINT>>,
#!/usr/bin/env python3
#
# Query AWS Athena using SQL
# Copyright (c) Alexey Baikov <sysboss[at]mail.ru>
#
# This snippet is a basic example of querying Athena and loading the results
# into a variable.
#
# Requirements:
# > pip3 install boto3 botocore retrying
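A rough sketch of the pattern those requirements suggest (the region, database, and output location are placeholders): submit the query with boto3, poll until it finishes using retrying, then fetch the rows.

# Rough sketch: submit an Athena query, wait for completion, read the result rows.
import boto3
from retrying import retry

athena = boto3.client('athena', region_name='us-east-1')   # placeholder region

def run_query(query, database, output_location):
    response = athena.start_query_execution(
        QueryString=query,
        QueryExecutionContext={'Database': database},
        ResultConfiguration={'OutputLocation': output_location},
    )
    return response['QueryExecutionId']

# Keep polling while the query is still queued or running.
@retry(stop_max_attempt_number=30, wait_fixed=2000,
       retry_on_result=lambda state: state in ('QUEUED', 'RUNNING'))
def wait_for_query(execution_id):
    status = athena.get_query_execution(QueryExecutionId=execution_id)
    return status['QueryExecution']['Status']['State']

execution_id = run_query(
    "SELECT count(*) FROM planet",           # the table registered by the DDL above
    database='default',                      # placeholder database
    output_location='s3://my-athena-results/',  # placeholder results bucket
)
if wait_for_query(execution_id) == 'SUCCEEDED':
    results = athena.get_query_results(QueryExecutionId=execution_id)
    rows = results['ResultSet']['Rows']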