kingkastle / pandas_to_spark.py
Created January 28, 2020 13:47 — forked from zaloogarcia/pandas_to_spark.py
Script for converting a pandas DataFrame to a Spark DataFrame
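from pyspark.sql.types import (
    DoubleType, IntegerType, LongType, StringType, TimestampType,
)

# Auxiliary function
# pandas dtype name -> Spark type
def equivalent_type(f):
    if f == 'datetime64[ns]': return TimestampType()  # keeps the time of day, unlike DateType
    elif f == 'int64': return LongType()
    elif f == 'int32': return IntegerType()
    elif f == 'float64': return DoubleType()  # 64-bit; FloatType is only 32-bit
    else: return StringType()  # fallback for object and any other dtype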
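
The same dtype mapping can be applied to every column of a frame at once. The sketch below is one possible way to do that, not part of the original script: it builds a Spark DDL schema string rather than type objects, so it runs without Spark installed. The names `DDL_TYPES` and `ddl_schema` are hypothetical, the sample column names are made up, and column names are assumed to be plain identifiers (no spaces or special characters). It uses `double` and `timestamp` for `float64` and `datetime64[ns]` because those keep 64-bit precision and the time of day.

```python
# Hypothetical lookup table: pandas dtype name -> Spark SQL DDL type name.
DDL_TYPES = {
    "datetime64[ns]": "timestamp",
    "int64": "bigint",
    "int32": "int",
    "float64": "double",
}


def ddl_schema(dtypes):
    """Build a DDL schema string from a {column name: pandas dtype} mapping.

    Any dtype missing from DDL_TYPES falls back to "string".
    Assumes column names are plain identifiers.
    """
    return ", ".join(
        f"{name} {DDL_TYPES.get(str(dtype), 'string')}"
        for name, dtype in dtypes.items()
    )


# Illustrative input; with a real frame this could be dict(pandas_df.dtypes).
dtypes = {
    "id": "int64",
    "price": "float64",
    "seen": "datetime64[ns]",
    "label": "object",
}
schema = ddl_schema(dtypes)
print(schema)  # id bigint, price double, seen timestamp, label string
```

A string of this form can be passed as the `schema` argument of `SparkSession.createDataFrame`, which accepts a data type string as well as a `StructType`.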

Summary:

This project was built for Project 5 of the Udacity Data Analyst Nanodegree and is shared so that others can learn from it. It is a dimplejs example based on the RITA flight data, which you can find at:

http://stat-computing.org/dataexpo/2009/the-data.html

The graph plots, for each airline and each year from 1988 to 2008, the number of flights against the total distance flown that year.

Check it out on bl.ocks: http://bl.ocks.org/kingkastle/raw/7188565a417968331fda/