Skip to content

Instantly share code, notes, and snippets.

@raymondberg
Last active July 17, 2017 17:46
Show Gist options
  • Select an option

  • Save raymondberg/24f0f2828e7ba6c7da1933852fedd6cb to your computer and use it in GitHub Desktop.

Select an option

Save raymondberg/24f0f2828e7ba6c7da1933852fedd6cb to your computer and use it in GitHub Desktop.
SQLITE Problem in Python

Project 6 - Basic SQL Querying

This project covers the basic concepts behind querying with Python.

Problem

We have a movie database that's fairly comprehensive. We'd like to explore it a bit with Python. Here are the columns:

movie_title color num_critic_for_reviews movie_facebook_likes duration director_name director_facebook_likes actor_3_name actor_3_facebook_likes actor_2_name actor_2_facebook_likes actor_1_name actor_1_facebook_likes gross genres num_voted_users cast_total_facebook_likes facenumber_in_poster plot_keywords movie_imdb_link num_user_for_reviews language country content_rating budget title_year imdb_score aspect_ratio

This data has been extracted from a Kaggle problem set, and you can set it up yourself using the setup-on-your-own guide.

Resources

You'll be using a few sections of the Python documentation:

You'll also possibly want to look at the SQLITE console syntax so you can run sqlite3 movies.sqlite3 to poke around the data, but it's not necessary to complete the project.

Before you start, create a script with the following lines in it:

import sqlite3
connection = sqlite3.connect('movies.sqlite3')
connection.row_factory = sqlite3.Row
cursor = connection.cursor()

cursor.execute('select * from movies')
row = cursor.fetchone()
print(row["movie_title"])

Problems

The goal is to dump a standard set of output from the database in one pass. See if you can reproduce the following:

The movie where Ringo Starr was actor_1 was:
 ?

The two movies where Ringo Starr was an actor were:
 ?
 ?

These movies all featured Kristen Stewart:
 ?
 ?
 ...

Bill Murray has been in a lot of movies:
 ?
 ?
 ...

I really liked Valarie Pettiford in ?.

Anthony Hopkins has been in a lot of movies: ?, ?, ?, ?

SOO many people have been in a Batman movie. Just look at this list!
 ?
 ?
 ...

The total number of movies in the database is ?.

The highest grossing movie is ? with ? total.

The person who has been in the most movies is ? with ? total movies.

There are ? Superman movies, but there are ? Batman movies!

I just figured out that the Batman movies have grossed ? while Superman movies only grossed ?

The Superman movies gross an average of ? per movie, Batman grosses ? per movie.

Setup Instructions

If you don't already have the SQLite database, here's how to make it yourself.

  1. Download Dataset (https://www.kaggle.com/deepmatrix/imdb-5000-movie-dataset)
  2. Install Sqlite3 (brew install sqlite)
  3. Start Sqlite3 (sqlite3)
  4. Open Database File (.open movies.sqlite3
  5. Setup Database (borrowed from http://www.sqlitetutorial.net/sqlite-import-csv/)
DROP TABLE IF EXISTS movies;

CREATE TABLE movies(
"color" TEXT,
"director_name" TEXT,
"num_critic_for_reviews" INTEGER,
"duration" INTEGER,
"director_facebook_likes" INTEGER,
"actor_3_facebook_likes" INTEGER,
"actor_2_name" TEXT,
"actor_1_facebook_likes" INTEGER,
"gross" INTEGER,
"genres" TEXT,
"actor_1_name" TEXT,
"movie_title" TEXT,
"num_voted_users" INTEGER,
"cast_total_facebook_likes" INTEGER,
"actor_3_name" TEXT,
"facenumber_in_poster" INTEGER,
"plot_keywords" TEXT,
"movie_imdb_link" TEXT,
"num_user_for_reviews" INTEGER,
"language" TEXT,
"country" TEXT,
"content_rating" TEXT,
"budget" INTEGER,
"title_year" TEXT,
"actor_2_facebook_likes" INTEGER,
"imdb_score" REAL,
"aspect_ratio" REAL,
"movie_facebook_likes" INTEGER
);
  1. Import Data
.mode csv
.import movie_metadata.csv movies
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment