Install Google Cloud SDK. What's the version you have?
$ gcloud --version
> Google Cloud SDK 368.0.0
> bq 2.0.72
> core 2022.01.07
> gsutil 5.6
Create an account in Google Cloud and create a project. https://console.cloud.google.com/getting-started?project=xyz123&authuser=1
Now install terraform and go to the terraform directory (week_1_basics_n_setup/1_terraform_gcp/terraform
)
After that, run
terraform init
terraform plan
terraform apply
Apply the plan and copy the output to the form
$ yes
> google_bigquery_dataset.dataset: Creating...
> google_storage_bucket.data-lake-bucket: Creating...
> google_storage_bucket.data-lake-bucket: Creation complete after 2s [id=dtc_data_lake_de-zoomcamp-xxxxxx]
> google_bigquery_dataset.dataset: Creation complete after 3s [id=projects/de-zoomcamp-xxxxxx/datasets/trips_data_all]
> Apply complete! Resources: 2 added, 0 changed, 0 destroyed.
How many taxi trips were there on January 15?
SELECT count(*) from yellow_taxi_data
WHERE tpep_pickup_datetime BETWEEN '2021-01-15 00:00:00' AND '2021-01-15 23:23:59'
AND tpep_dropoff_datetime BETWEEN '2021-01-15 00:00:00' AND '2021-01-15 23:23:59';
Result: 52255 trips
. However, this accounts for all trips starting and ending on Jan 15 2021, not
necassaily the ones began and completed on Jan 15. I think I would need to create a subquery of dropoffs from the
pickup data on the 15 to know how many FULL trips were completed on the 15th (and 15th only).
Find the largest tip for each day.
SELECT tpep_dropoff_datetime::date AS dropoff_date, MAX(tip_amount)
from yellow_taxi_data
WHERE tpep_dropoff_datetime::date BETWEEN '2021-01-01' AND '2021-01-31'
GROUP BY tpep_dropoff_datetime::date
ORDER BY tpep_dropoff_datetime::date ASC;
On which day it was the largest tip in January?
2021-01-20, $1140.44
What was the most popular destination for passengers picked up in central park on January 14?
SELECT "DOLocationID", count("DOLocationID") as destination_count
from yellow_taxi_data
WHERE "PULocationID" = 43
AND tpep_pickup_datetime::date = '2021-01-14'
GROUP BY "DOLocationID"
ORDER BY destination_count DESC;
DOLocationID = 237, destination_count = 97
Enter the district name (not id)
Upper East Side South
What's the pickup-dropoff pair with the largest
average price for a ride (calculated based on total_amount
)?
PULocationID = 4, DOLocationID = 265, average price = 2292.4
Alphabet City / Unknown