Skip to content

Instantly share code, notes, and snippets.

@ashutoshnanda
Last active November 26, 2015 07:13
Show Gist options
  • Save ashutoshnanda/2f79d6d310dd5c1cc08f to your computer and use it in GitHub Desktop.
Save ashutoshnanda/2f79d6d310dd5c1cc08f to your computer and use it in GitHub Desktop.
This code (in both Python and R) will read from a SQLite database of Hillary Clinton's emails and display basic information about the database.
import pandas as pd
import sqlite3
#Set up connection to the SQLite database
connection = sqlite3.connect('clinton.sqlite')
#Print all tables
print("Tables")
all_tables = pd.read_sql("SELECT name FROM sqlite_master WHERE type = 'table'", connection)
print(all_tables)
#Print information about 'docs' table
docs = pd.read_sql("SELECT * FROM %s" % ("docs"), connection)
print("Column Names")
print(list(docs.columns.values))
print("Number of Rows: %d" % len(docs))
#Clean up connection to the database
connection.close()
library(dplyr)
library(RSQLite)
#Set up connection to the SQLite database
connection <- dbConnect(RSQLite::SQLite(), dbname = "clinton.sqlite")
#Print all tables
print("Tables")
all_tables <- dbListTables(connection)
print(all_tables)
#Print information about 'docs' table
docs <- dbGetQuery(connection, sprintf("SELECT * FROM %s", "docs"))
print("Column Name")
print(colnames(docs))
print(sprintf("Number of Rows: %d", nrow(docs)))
#Clean up connection to the database
dbDisconnect(connection)
Tables
name
0 classreason_doc
1 classreasons
2 docs
3 recipient_doc_cc
4 recipient_doc_from
5 recipient_doc_to
6 recipients
Column Names
['date', 'posted_date', 'pdf_link', 'doc_class', 'message_num', 'case_num', 'declass_date', 'classification', 'reason', 'author', 'id', 'subject', 'body', 'full_path', 'is_document', 'doc_date', 'pdf_url']
Number of Rows: 40737
[1] "Tables"
[1] "classreason_doc" "classreasons" "docs" "recipient_doc_cc"
[5] "recipient_doc_from" "recipient_doc_to" "recipients"
[1] "Column Name"
[1] "date" "posted_date" "pdf_link" "doc_class" "message_num"
[6] "case_num" "declass_date" "classification" "reason" "author"
[11] "id" "subject" "body" "full_path" "is_document"
[16] "doc_date" "pdf_url"
[1] "Number of Rows: 40737"
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment