@josesaribeiro
josesaribeiro / rmongodb-tutorial.md
Created April 14, 2018 13:27 — forked from Btibert3/rmongodb-tutorial.md
Basic Overview of using the rmongodb package for R.

rmongodb Tutorial

This is a quick document aimed at highlighting the basics of what you might want to do using MongoDB and R. I am coming at this, almost completely, from a SQL mindset.

Install

The easiest way to install, I believe, is

josesaribeiro / aws_jupyter_tunnel.md
Created June 28, 2019 01:36 — forked from jakechen/aws_jupyter_tunnel.md
Creating and connecting to Jupyter Notebooks in AWS EC2

Introduction

This quick guide describes how to create a Jupyter Notebook in AWS EC2 and then access it remotely using SSH tunneling. This method is preferred because it opens no ports other than 22, requires little to no configuration, and is generally more straightforward.
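
As a rough illustration of the tunneling idea, the sketch below only assembles the `ssh` command line that forwards a local port to the notebook port on the instance. The helper name `build_tunnel_command`, the key path, the host name, and the `ubuntu` login user are all placeholders assumed for the example, not values from the original guide; port 8888 is Jupyter's default and may differ on your instance.

```python
import shlex


def build_tunnel_command(key_path, host, user="ubuntu",
                         local_port=8888, remote_port=8888):
    """Return the argument list for an SSH local port forward.

    -i  selects the private key file for the instance
    -N  opens the tunnel without running a remote command
    -L  forwards local_port on this machine to remote_port on the instance
    """
    forward = f"{local_port}:localhost:{remote_port}"
    return ["ssh", "-i", key_path, "-N", "-L", forward, f"{user}@{host}"]


# Placeholder values: replace them with your own key file and instance address.
cmd = build_tunnel_command("/path/to/my-key.pem", "ec2-host.example.com")
print(shlex.join(cmd))
```

With the tunnel running, the notebook would be reachable in a local browser at `http://localhost:8888`, and only port 22 needs to be open on the instance.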

Pre-requisites

This version assumes basic familiarity with cloud computing, AWS services, and Jupyter Notebook, mostly because it has no images and does not dive deep into each individual step.

Steps

Spin up an EC2 instance with the "Deep Learning" AMI

  1. Log in to the EC2 console and click the "Launch Instance" button.
  2. Inside "AWS Marketplace", select the "Deep Learning AMI" from AWS. I use this AMI because most of what you'll need is already installed.
; Configuration for Airflow webserver and scheduler in Supervisor
[program:airflow]
command=/bin/airflow webserver
stopsignal=QUIT
stopasgroup=true
user=airflow
stdout_logfile=/var/log/airflow/airflow-stdout.log
stderr_logfile=/var/log/airflow/airflow-stderr.log
environment=HOME="/home/airflow",AIRFLOW_HOME="/etc/airflow",TMPDIR="/storage/airflow_tmp"
josesaribeiro / flask_drive_example.py
Created August 7, 2019 20:17 — forked from prahladyeri/flask_drive_example.py
google drive api implementation in python-flask framework
##
# Flask Drive Example App
#
# @author Prahlad Yeri <prahladyeri@yahoo.com>
# @date 30-12-2016
# Dependency:
# 1. pip install flask google-api-python-client
# 2. make sure you have client_id.json in this same directory.
import os
josesaribeiro / airflow-dag-csv-to-mysql.py
Created September 21, 2019 13:04
Airflow Ftp CSV to SQL
"""
Code that goes along with the Airflow tutorial located at:
https://github.com/airbnb/airflow/blob/master/airflow/example_dags/tutorial.py
"""
from airflow import DAG
from airflow.operators.python_operator import PythonOperator
from airflow.operators.generic_transfer import GenericTransfer
from airflow.contrib.hooks.ftp_hook import FTPHook
from airflow.hooks.mysql_hook import MySqlHook
josesaribeiro / to_redshift.py
Created September 22, 2019 17:59 — forked from TomAugspurger/to_redshift.py
to_redshift.py
# see also https://github.com/wrobstory/pgshift
import gzip
from io import StringIO, BytesIO
from functools import wraps
import boto
from sqlalchemy import MetaData
from pandas import DataFrame
from pandas.io.sql import SQLTable, pandasSQL_builder
josesaribeiro / airflowPostgresqlInstall.sh
Created September 26, 2019 02:55 — forked from cronosnull/airflowPostgresqlInstall.sh
Install Airflow on a new Ubuntu server 18.04
#!/bin/bash
USUARIO_SO="$(whoami)"
ANACONDA_URL="https://repo.anaconda.com/archive/Anaconda3-5.2.0-Linux-x86_64.sh"
_DB_PASSWORD="la contraseña"
_IP=$(hostname -I | cut -d' ' -f1)
while getopts "a:p:h" opt; do
case $opt in
a) ANACONDA_URL="$OPTARG";;
p) _DB_PASSWORD="$OPTARG";;
h) cat <<EOF
josesaribeiro / gzip_s3_and_json_py3.py
Created October 11, 2019 03:56 — forked from a-hisame/gzip_s3_and_json_py3.py
To use gzip file between python application and S3 directly for Python3
#!/usr/bin/python
# -*- coding: utf-8 -*-
'''To use gzip file between python application and S3 directly for Python3.
Python 2 version - https://gist.github.com/a-hisame/f90815f4fae695ad3f16cb48a81ec06e
'''
import io
import gzip
import json
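
The preview above stops at the imports. As a minimal sketch of what those three modules make possible, the following round-trips a JSON document through gzip entirely in memory. The S3 upload and download steps are deliberately left out, and the function names `json_to_gzip_bytes` and `gzip_bytes_to_json` are hypothetical, not taken from the original gist.

```python
import gzip
import io
import json


def json_to_gzip_bytes(obj):
    """Serialize obj as JSON and gzip-compress it into a bytes object."""
    buffer = io.BytesIO()
    with gzip.GzipFile(fileobj=buffer, mode="wb") as gz:
        gz.write(json.dumps(obj).encode("utf-8"))
    # The GzipFile is closed here, so the gzip trailer has been written.
    return buffer.getvalue()


def gzip_bytes_to_json(data):
    """Decompress gzip bytes and parse the JSON document inside."""
    with gzip.GzipFile(fileobj=io.BytesIO(data), mode="rb") as gz:
        return json.loads(gz.read().decode("utf-8"))


record = {"id": 1, "tags": ["a", "b"]}
payload = json_to_gzip_bytes(record)
restored = gzip_bytes_to_json(payload)
```

The resulting `payload` is plain bytes, so it could be passed as the body of an upload call and read back the same way, without writing a temporary file.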
josesaribeiro / 0_register_planet.sql
Created October 17, 2019 03:51 — forked from mojodna/0_register_planet.sql
Sample OSM Athena queries
--
-- This will register the "planet" table within your AWS account
--
CREATE EXTERNAL TABLE planet (
id BIGINT,
type STRING,
tags MAP<STRING,STRING>,
lat DECIMAL(9,7),
lon DECIMAL(10,7),
nds ARRAY<STRUCT<ref: BIGINT>>,
josesaribeiro / query_athena.py
Created November 26, 2019 18:48 — forked from sysboss/query_athena.py
SQL Query Amazon Athena using Python
#!/usr/bin/env python3
#
# Query AWS Athena using SQL
# Copyright (c) Alexey Baikov <sysboss[at]mail.ru>
#
# This snippet is a basic example that queries Athena and loads the results
# into a variable.
#
# Requirements:
# > pip3 install boto3 botocore retrying