@tomron
tomron / seasonal_decompose_plotly.py
Last active November 3, 2023 15:14
A nicer seasonal decompose chart using plotly.
from statsmodels.tsa.seasonal import seasonal_decompose
import plotly.tools as tls
def plotSeasonalDecompose(
    x,
    model='additive',
    filt=None,
    period=None,
    two_sided=True,
    extrapolate_trend=0,
@tomron
tomron / missleading_plots.py
Created September 15, 2022 10:11
Think outside of the box plot - code accompanying my talk at DataTLV about box plots
import numpy as np
import pandas as pd
import sys
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import plotly
import plotly.express as px
import matplotlib.pyplot as plt
@tomron
tomron / parquet_to_json.py
Created November 17, 2016 10:53
Converts a Parquet file to JSON using Spark
# import spark, set spark context
from pyspark import SparkContext, SparkConf
from pyspark.sql.context import SQLContext
import sys
import os
if len(sys.argv) == 1:
    # No input path given: report the problem and exit with a non-zero status
    sys.stderr.write("Must enter input file to convert\n")
    sys.exit(1)
input_file = sys.argv[1]
@tomron
tomron / csv_to_radar.py
Last active May 31, 2022 18:35
CSV to radar plot - turns a CSV file into a radar plot that can be exported or shown in the browser
import plotly.graph_objects as go
import plotly.offline as pyo
import pandas as pd
import argparse
import sys
def parse_arguments(args):
@tomron
tomron / __init__.py
Created October 22, 2021 07:31
Other pie - create a pie chart with the top `n-1` values as separate sectors and an "other" sector for the remaining values
"""
`plotly.express` is a terse, consistent, high-level wrapper around `plotly.graph_objects`
for rapid data exploration and figure generation. Learn more at https://plotly.express/
"""
from __future__ import absolute_import
from plotly import optional_imports
pd = optional_imports.get_module("pandas")
if pd is None:
    raise ImportError(
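The grouping described for the "Other pie" gist (top `n-1` values as their own sectors, everything else folded into one sector) can be sketched in plain Python. This is a minimal illustration, not the gist's actual code: the function name `top_n_with_other`, its parameters, and the `"Other"` label are assumptions made for this example.

```python
def top_n_with_other(counts, n, other_label="Other"):
    """Return at most n (label, value) pairs: the top n-1 values plus one 'other' bucket.

    `counts` is a hypothetical mapping of label -> numeric value.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # Rank labels from largest to smallest value
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) <= n:
        # Few enough sectors already: nothing to fold into "other"
        return ranked
    head = ranked[: n - 1]
    # Sum everything outside the top n-1 into a single sector
    tail_total = sum(value for _, value in ranked[n - 1:])
    return head + [(other_label, tail_total)]


sectors = top_n_with_other({"a": 50, "b": 30, "c": 10, "d": 5, "e": 5}, n=3)
print(sectors)
```

The resulting labels and values could then be passed to any pie chart function as its names and values.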
@tomron
tomron / missing_values_read_csv.py
Last active August 15, 2021 06:57
How to deal with non-trivial missing values when using pandas read_csv
import pandas as pd
import numpy as np
import time
url = "http://archive.ics.uci.edu/ml/machine-learning-databases/mammographic-masses/mammographic_masses.data"
names = ['BI-RADS', 'Age', 'Shape', 'Margin', 'Density', 'Severity']
def manual_convert():
    df = pd.read_csv(url, names=names)
@tomron
tomron / spark_aws_lambda.py
Created February 27, 2016 12:57
Example Python code that submits a Spark process as an EMR step to an AWS EMR cluster from an AWS Lambda function
import sys
import time
import boto3
def lambda_handler(event, context):
    conn = boto3.client("emr")
    # chooses the first cluster which is Running or Waiting
    # possibly can also choose by name or already have the cluster id
    clusters = conn.list_clusters()
@tomron
tomron / plotly_back_to_back_chart.py
Last active May 31, 2021 18:58
Back to back bar chart with Plotly
import numpy as np
import matplotlib.pyplot as plt
import plotly.graph_objects as go
women_pop = np.array([5., 30., 45., 22.])
men_pop = np.array([5., 25., 50., 20.])
y = list(range(len(women_pop)))
fig = go.Figure(data=[
    go.Bar(y=y, x=women_pop, orientation='h', name="women", base=0),
@tomron
tomron / plotly_bar_chart_links.py
Created November 17, 2020 07:17
Add links to Plotly bar chart
@tomron
tomron / spark_knn_approximation.py
Created November 19, 2015 16:47
A naive approximation of k-nn algorithm (k-nearest neighbors) in pyspark. Approximation quality can be controlled by number of repartitions and number of repartition
from __future__ import print_function
import sys
from math import sqrt
import argparse
from collections import defaultdict
from random import randint
from pyspark import SparkContext
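The idea in the k-nn description (trading accuracy for speed by only comparing points that land in the same partition, and repeating the partitioning several times) can be sketched without Spark. This is a pure-Python illustration under assumptions, not the gist's implementation: the function name `approx_knn`, its parameters, and the uniform random assignment of points to partitions are all hypothetical.

```python
import heapq
import random
from collections import defaultdict
from math import dist


def approx_knn(points, k, num_partitions, num_rounds, seed=0):
    """Approximate k nearest neighbours by searching only within random partitions.

    Returns {point_index: [(neighbour_index, distance), ...]} with up to k entries each.
    More rounds or fewer partitions give better quality at higher cost.
    """
    rng = random.Random(seed)
    # Best known distance to each candidate neighbour, per point index
    candidates = defaultdict(dict)
    for _ in range(num_rounds):
        # Randomly assign every point to one partition
        buckets = defaultdict(list)
        for idx in range(len(points)):
            buckets[rng.randrange(num_partitions)].append(idx)
        # Compare points only with others in the same partition
        for members in buckets.values():
            for i in members:
                for j in members:
                    if i != j:
                        candidates[i][j] = dist(points[i], points[j])
    return {
        i: heapq.nsmallest(k, candidates[i].items(), key=lambda kv: kv[1])
        for i in range(len(points))
    }


points = [(0, 0), (1, 0), (5, 0), (6, 0)]
# With a single partition every pair is compared, so the result is exact
print(approx_knn(points, k=1, num_partitions=1, num_rounds=1))
```

With `num_partitions=1` the sketch degenerates to exact brute-force k-nn, which is a convenient way to check the approximation against ground truth on small inputs.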