Skip to content

Instantly share code, notes, and snippets.

@mollybrown
Last active January 24, 2022 07:51
Show Gist options
  • Save mollybrown/107b78def5d4d347d064ccbcd626560d to your computer and use it in GitHub Desktop.
Save mollybrown/107b78def5d4d347d064ccbcd626560d to your computer and use it in GitHub Desktop.
Week 1 Homework for DE-Zoomcamp

Question 1. Google Cloud SDK:

Install Google Cloud SDK. What's the version you have?

$ gcloud --version

> Google Cloud SDK 368.0.0
> bq 2.0.72
> core 2022.01.07
> gsutil 5.6

Google Cloud account

Create an account in Google Cloud and create a project. https://console.cloud.google.com/getting-started?project=xyz123&authuser=1

Question 2. Terraform

Now install terraform and go to the terraform directory (week_1_basics_n_setup/1_terraform_gcp/terraform)

After that, run

  • terraform init
  • terraform plan
  • terraform apply

Apply the plan and copy the output to the form

$ yes

> google_bigquery_dataset.dataset: Creating...
> google_storage_bucket.data-lake-bucket: Creating...
> google_storage_bucket.data-lake-bucket: Creation complete after 2s [id=dtc_data_lake_de-zoomcamp-xxxxxx]
> google_bigquery_dataset.dataset: Creation complete after 3s [id=projects/de-zoomcamp-xxxxxx/datasets/trips_data_all]

> Apply complete! Resources: 2 added, 0 changed, 0 destroyed.

Question 3. Count records

How many taxi trips were there on January 15?

SELECT count(*) from yellow_taxi_data
WHERE tpep_pickup_datetime BETWEEN '2021-01-15 00:00:00' AND '2021-01-15 23:23:59'
AND tpep_dropoff_datetime BETWEEN '2021-01-15 00:00:00' AND '2021-01-15 23:23:59';

Result: 52255 trips. However, this accounts for all trips starting and ending on Jan 15 2021, not necassaily the ones began and completed on Jan 15. I think I would need to create a subquery of dropoffs from the pickup data on the 15 to know how many FULL trips were completed on the 15th (and 15th only).

Question 4. Average

Find the largest tip for each day.

SELECT tpep_dropoff_datetime::date AS dropoff_date, MAX(tip_amount)
from yellow_taxi_data
WHERE tpep_dropoff_datetime::date BETWEEN '2021-01-01' AND '2021-01-31'
GROUP BY tpep_dropoff_datetime::date
ORDER BY tpep_dropoff_datetime::date ASC;

On which day it was the largest tip in January? 2021-01-20, $1140.44

Question 5. Most popular destination

What was the most popular destination for passengers picked up in central park on January 14?

SELECT "DOLocationID", count("DOLocationID") as destination_count
from yellow_taxi_data
WHERE "PULocationID" = 43
AND tpep_pickup_datetime::date = '2021-01-14'
GROUP BY "DOLocationID"
ORDER BY destination_count DESC;

DOLocationID = 237, destination_count = 97

Enter the district name (not id) Upper East Side South

Question 6.

What's the pickup-dropoff pair with the largest average price for a ride (calculated based on total_amount)?

PULocationID = 4, DOLocationID = 265, average price = 2292.4 Alphabet City / Unknown

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment