@sgouda0412
sgouda0412 / prediction_example.py
Created June 13, 2022 06:56 — forked from fhuszar/prediction_example.py
This is an example solution to the London Big Data Hackathon Data Science Challenge organised by Data Science London on the weekend of 13-14 April 2013.
#!/usr/bin/python
# -*- coding: utf8 -*-
# SAMPLE SUBMISSION TO THE BIG DATA HACKATHON 13-14 April 2013 'Influencers in a Social Network'
# .... more info on Kaggle and links to go here
#
# written by Ferenc Huszár, PeerIndex
from sklearn import linear_model
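# auc_score was removed from scikit-learn; roc_auc_score takes the same (y_true, y_score) arguments
from sklearn.metrics import roc_auc_score as auc_score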
import pandas as pd
import os
import shutil
def extract(path: str = "s3://my_bucket_name/file0.parquet") -> pd.DataFrame:
    # Read the parquet file at `path` into a DataFrame
    df = pd.read_parquet(path)
    return df
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
import missingno
import warnings
warnings.filterwarnings("ignore")
%matplotlib inline
## Importing libraries
import numpy as np
import pandas as pd
import datetime as dt
import seaborn as sns
import matplotlib.pyplot as plt
#For inline Chart Display
%matplotlib inline
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from pyspark.sql import SQLContext
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
# EXTRACT: Reading parquet data
Files for Amazon Redshift SQL queries with AWS Lambda
sgouda0412 / Google Colab SSH
Created July 9, 2022 13:55 — forked from yashkumaratri/Google Colab SSH
SSH into Google Colab
#CODE
#Generate root password
import random, string
password = ''.join(random.choice(string.ascii_letters + string.digits) for i in range(20))
#Download ngrok
! wget -q -c -nc https://bin.equinox.io/c/4VmDzA7iaHb/ngrok-stable-linux-amd64.zip
! unzip -qq -n ngrok-stable-linux-amd64.zip
#Setup sshd
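The password step in the snippet above draws characters with `random`, which is a general-purpose generator rather than a cryptographic one. A minimal sketch of the same 20-character alphanumeric password using the standard-library `secrets` module follows. The function name `generate_password` is hypothetical and not part of the original gist.

```python
import secrets
import string

# Same alphabet as the original snippet: ASCII letters plus digits.
ALPHABET = string.ascii_letters + string.digits


def generate_password(length: int = 20) -> str:
    """Return a random alphanumeric password of the given length.

    Hypothetical helper, introduced here for illustration only.
    """
    # secrets.choice draws from the operating system's secure random source.
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


password = generate_password()
```

The resulting `password` can be used wherever the original snippet uses its own `password` variable.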
sgouda0412 / basic.csv
Created July 15, 2022 15:54 — forked from Tvkoushik/basic.csv
Basic Unix Commands
COMMAND,FUNCTIONALITY
ls,Lists all files and directories in the present working directory
ls -R,Lists files in sub-directories as well
ls -a,Lists hidden files as well
ls -al,Lists files and directories (including hidden ones) with detailed information
ls 'path' | more,Shows the listing one screen at a time
cd or cd ~,Navigates to the HOME directory
cd ..,Moves one level up
cd,To change to a particular directory
cd /,Moves to the root directory
sgouda0412 / advanced.csv
Created July 15, 2022 15:54 — forked from Tvkoushik/advanced.csv
Advanced Unix Commands
COMMAND,FUNCTIONALITY
grep -v '^$' filename > new_filename,Remove blank lines from a file
ls -l | grep '^-' | awk '/^-/ {if ($5 == 0) print $9 }',Display zero-byte files
sed 's/honey/pasta/n' < filename,Replace the nth occurrence of the word 'honey' with 'pasta' on each line of a file (n is a number)
echo 'string' | tr 'a-z' 'A-Z',Convert a string from lower case to upper case
grep -i 'search string' filename,Search for a given string in a file (case-insensitive search)
cal 03 2022,Display the calendar for March 2022
find . -atime n -type f,List the files under the current directory that were last accessed n days ago
find . -mtime n -type f,List the files under the current directory whose contents were last modified n days ago
find . -ctime n -type f,List the files under the current directory whose status (inode metadata) last changed n days ago
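The `find -mtime n` entry above counts age in whole 24-hour periods, as GNU find documents it: a file matches when its age divided by 24 hours, with the fraction discarded, equals n. A minimal Python sketch of that rule follows. The function name `files_modified_days_ago` and its `now` parameter are hypothetical, introduced only for illustration.

```python
import os
import stat
import time
from pathlib import Path

SECONDS_PER_DAY = 24 * 60 * 60


def files_modified_days_ago(root, n, now=None):
    """List regular files under `root` whose age in whole days equals `n`.

    Hypothetical helper mirroring `find root -mtime n -type f`.
    """
    now = time.time() if now is None else now
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            info = os.lstat(path)  # lstat: do not follow symlinks
            if not stat.S_ISREG(info.st_mode):
                continue  # keep regular files only, like -type f
            # Age in whole 24-hour periods, fractional part discarded.
            age_days = int((now - info.st_mtime) // SECONDS_PER_DAY)
            if age_days == n:
                matches.append(path)
    return sorted(matches)
```

Under this rule a file modified 2.5 days ago matches n = 2, not n = 3.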
sgouda0412 / ml_pipeline.py
Created July 22, 2022 12:59 — forked from btphan95/ml_pipeline.py
Example ML pipeline on Airflow
from datetime import timedelta
# The DAG object; we'll need this to instantiate a DAG
from airflow import DAG
# Operators; we need this to operate!
from airflow.operators.bash_operator import BashOperator
from airflow.utils.dates import days_ago
# These args will get passed on to each operator
# You can override them on a per-task basis during operator initialization
default_args = {
    'owner': 'Binh Phan',