Duke University - Durham, North Carolina
2024-02-28, 10:05 - 11:20 AM ET
Class visit: What is investigative data journalism?
NICAR 2024 - Baltimore, Maryland
2024-03-08, 9:00 - 10:00 AM ET
Workshop: Finding and using undocumented APIs
2023-07-19
This is my last week at The Markup. It’s been a true privilege to practice and produce impactful hypothesis-driven journalism with first-class journalists over the past four years.
In The Markup's first year of publication, Adrianne Jeffries, Sam Morris, Evelyn Larrubia, and I measured Google's self-preferential search results using a method adapted from the life sciences. Our findings were cited in a congressional hearing on Big Tech and antitrust.
Aaron Sankin, Sam Morris, Evelyn Larrubia, and I found that Google blocked advertisers from finding YouTube videos related to Black Lives Matter and other [social justice phrases](https://themarkup.org/google-the-giant/2021/04/09/google-blocks-advertisers-from-targeting-black-lives-mat).
Tow Tea @ The Tow Center - New York, New York
2023-02-17, 5:00 - 6:30 PM ET
Workshop: Finding and using undocumented APIs
Net Inclusion - San Antonio, Texas
2023-03-01, 2:30 - 3:30 PM CT
Panel: Advancing Digital Inclusion Data Quality, Tools, and Applications
Co-panelists: David Keyes, Christine Parker, and Ryan Palmer
numpy
tqdm
pdf2image
opencv-python
pytesseract
Pillow
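Taken together, these packages suggest a PDF-to-text OCR pipeline. The sketch below is one plausible way they fit together rather than a description of any particular project; the function name `ocr_pdf`, the 300 DPI rendering, and the Otsu binarization step are all illustrative assumptions.

```python
import numpy as np
import cv2
import pytesseract
from PIL import Image
from pdf2image import convert_from_path
from tqdm import tqdm

def ocr_pdf(path: str) -> list[str]:
    """OCR every page of a PDF and return the extracted text, one string per page."""
    pages = convert_from_path(path, dpi=300)  # pdf2image renders each page as a PIL Image
    texts = []
    for page in tqdm(pages, desc="OCR"):
        # Grayscale + Otsu binarization (OpenCV) tends to help Tesseract on scanned pages
        gray = cv2.cvtColor(np.array(page), cv2.COLOR_RGB2GRAY)
        _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        texts.append(pytesseract.image_to_string(Image.fromarray(binary)))
    return texts
```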
Machine Bias - Julia Angwin, Jeff Larson, Surya Mattu and Lauren Kirchner (2016)
Gender Shades - Joy Buolamwini and Timnit Gebru (2018)
Automating Inequality: How High-Tech Tools Profile, Police, and Punish the Poor - Virginia Eubanks (2018)
How We Analyzed Google's Search Results - Leon Yin and Adrianne Jeffries (2020)
import pandas as pd

def value_counts(df: pd.DataFrame,
                 col: str,
                 *args, **kwargs) -> pd.DataFrame:
    """
    For a DataFrame (`df`): display normalized (percentage)
    `value_counts(normalize=True)` and regular counts
    `value_counts()` for a given `col`.
    """
    count = df[col].value_counts(*args, **kwargs).to_frame(name='count')
    perc = df[col].value_counts(*args, normalize=True, **kwargs) \
                  .to_frame(name='percentage')
    # combine raw counts and percentages side by side
    return count.join(perc)
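A quick usage sketch, with a made-up DataFrame just for illustration:

```python
df = pd.DataFrame({"fruit": ["apple", "apple", "pear", "kiwi"]})
value_counts(df, "fruit")
# returns a DataFrame indexed by fruit value, with a raw `count` column
# and a normalized `percentage` column (apple -> 2 and 0.5, etc.)
```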
import json

fn = 'notebook.ipynb'
notebook = json.load(open(fn))
notebook.keys()  # top-level keys of the notebook JSON: 'cells', 'metadata', 'nbformat', ...

# Gather the prose from every markdown cell, skipping blank lines
prose = []
for cell in notebook['cells']:
    if cell['cell_type'] == "markdown":
        for sent in cell['source']:
            if sent == '\n':
                continue  # blank markdown lines carry no prose
            prose.append(sent)
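From here you could, for example, get a rough word count of the notebook's prose. The `prose` list and this tally are purely illustrative:

```python
word_count = sum(len(line.split()) for line in prose)
print(f"{word_count} words of markdown prose in {fn}")
```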
""" | |
A simple script to make a Markdown table for a data dictionary (assumes you just have a column name and description). | |
""" | |
import pandas as pd | |
col2description = { | |
"Name": "What you can call me", | |
"Id": "The identifier", | |
"Nickname": "Do you have to ask?" |