What follows is a technical test for this job offer at CARTO: https://boards.greenhouse.io/cartodb/jobs/705852#.WSvORxOGPUI
Build the following and make it run as fast as you possibly can using Python 3 (vanilla). The faster it runs, the more you will impress us!
Your code should:
- Download this ~2GB file: https://s3.amazonaws.com/carto-1000x/data/yellow_tripdata_2016-01.csv
- Count the lines in the file
- Calculate the average value of the tip_amount field.
All of that in the most efficient way you can come up with.
That's it. Make it fly!
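Neither method below covers the download step, so here is a minimal sketch of it using only the standard library. The helper name `download` and the `chunk_size` parameter are assumptions made for illustration; the URL is the one given in the task. The response is streamed to disk in fixed-size chunks, so the ~2GB file never has to fit in memory.

```python
import os
import shutil
import urllib.request

# URL taken from the task statement above.
DATA_URL = "https://s3.amazonaws.com/carto-1000x/data/yellow_tripdata_2016-01.csv"


def download(url, dest_path, chunk_size=1024 * 1024):
    """Stream `url` to `dest_path` chunk by chunk; return the bytes written.

    Hypothetical helper: the name and the 1 MiB default are illustrative.
    """
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        # copyfileobj reads and writes `chunk_size` bytes at a time.
        shutil.copyfileobj(response, out, chunk_size)
    return os.path.getsize(dest_path)
```

Calling `download(DATA_URL, "yellow_tripdata_2016-01.csv")` would fetch the file into the working directory. The call is left out of the block so that importing or running it does not start a 2GB transfer.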
Method 1
import pandas as pd
import time
data = pd.read_csv("2018_Yellow_Taxi_Trip_Data.csv")
t0 = time.time() # Start time (the CSV is already loaded, so read_csv is not timed)
row = data.shape[0] # Total rows
meanColumn = data["tip_amount"].sum() / row # Mean of "tip_amount" column
t1 = time.time() # End time
tiempo = round(t1 - t0, 0) # Elapsed time in seconds
print("Number of rows: \n", row)
print("Mean tip_amount: \n", meanColumn)
print("Total execution time: \n", tiempo)
Method 2
import pandas as pd
import time
t0 = time.time() # Start time
data = pd.read_csv("2018_Yellow_Taxi_Trip_Data.csv")
n = 0 # Row counter
tipAmount1 = 0 # Running sum of "tip_amount"
for index, i in data.iterrows():
    n += 1 # Count this row
    tipAmount1 += i["tip_amount"] # Add this row's tip to the running sum
meanTipAmount = tipAmount1 / n # Mean of "tip_amount"
t1 = time.time() # End time
tiempo = round(t1 - t0, 0) # Elapsed time in seconds
print("Mean tip_amount: \n", meanTipAmount)
print("Number of rows: \n", n)
print("Total execution time: \n", tiempo)
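Both methods rely on pandas, while the task asks for vanilla Python 3. As a rough sketch of the same row-by-row idea as Method 2, the following uses only the standard `csv` module and makes a single pass over the file. The function name `count_and_mean` and the tiny in-memory sample are assumptions made for illustration; the real file has many more columns.

```python
import csv
import io


def count_and_mean(lines, column="tip_amount"):
    """Return (number of data rows, mean of `column`) in one pass.

    `lines` is any iterable of text lines, e.g. an open file object.
    Hypothetical helper written for illustration.
    """
    reader = csv.reader(lines)
    header = next(reader)          # first line holds the column names
    col = header.index(column)     # position of the requested column
    rows = 0
    total = 0.0
    for record in reader:
        if not record:             # csv.reader yields [] for blank lines
            continue
        rows += 1
        total += float(record[col])
    mean = total / rows if rows else float("nan")
    return rows, mean


# Hypothetical three-row sample standing in for the real CSV.
sample = io.StringIO(
    "VendorID,fare_amount,tip_amount\n"
    "1,10.0,2.0\n"
    "2,7.5,0.0\n"
    "1,12.0,4.0\n"
)
rows, mean = count_and_mean(sample)
print(rows, mean)  # prints: 3 2.0
```

For the real file, the same function could be called as `count_and_mean(open(path, newline=""))`. It counts data rows only, so add one to get the line count that includes the header. Whether this beats pandas on the full file would have to be measured.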