Duke University - Durham, North Carolina
2024-02-28, 10:05 - 11:20 AM ET
Class visit: What is investigative data journalism?
NICAR 2024 - Baltimore, Maryland
2024-03-08, 9:00 - 10:00 AM ET
Workshop: Finding and using undocumented APIs
2023-07-19
This is my last week at The Markup. It’s been a true privilege to practice and produce impactful hypothesis-driven journalism with first-class journalists over the past four years.
In The Markup's first year of publication, Adrianne Jeffries, Sam Morris, Evelyn Larrubia, and I measured Google's self-preferential search results using a method adapted from the life sciences. Our findings were cited in a congressional hearing on Big Tech and antitrust.
Aaron Sankin, Sam Morris, Evelyn Larrubia, and I found that Google blocked advertisers from finding YouTube videos related to Black Lives Matter and other [social justice phrases](https://themarkup.org/google-the-giant/2021/04/09/google-blocks-advertisers-from-targeting-black-lives-mat).
Tow Tea @ The Tow Center - New York, New York
2023-02-17, 5:00 - 6:30 PM ET
Workshop: Finding and using undocumented APIs
Net Inclusion - San Antonio, Texas
2023-03-01, 2:30 - 3:30 PM CT
Panel: Advancing Digital Inclusion Data Quality, Tools, and Applications
Co-panelists: David Keyes, Christine Parker, and Ryan Palmer
numpy
tqdm
pdf2image
opencv-python
pytesseract
Pillow
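Taken together, these packages suggest a PDF-to-text OCR pipeline. The sketch below is one plausible way they fit together rather than a description of any particular project; the function name `ocr_pdf`, the 300 DPI rendering, and the Otsu binarization step are all illustrative assumptions.

```python
import numpy as np
import cv2
import pytesseract
from PIL import Image
from pdf2image import convert_from_path
from tqdm import tqdm

def ocr_pdf(path: str) -> list[str]:
    """OCR every page of a PDF and return the extracted text, one string per page."""
    pages = convert_from_path(path, dpi=300)  # pdf2image renders each page as a PIL Image
    texts = []
    for page in tqdm(pages, desc="OCR"):
        # Grayscale + Otsu binarization (OpenCV) tends to help Tesseract on scanned pages
        gray = cv2.cvtColor(np.array(page), cv2.COLOR_RGB2GRAY)
        _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        texts.append(pytesseract.image_to_string(Image.fromarray(binary)))
    return texts
```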
Machine Bias - Julia Angwin, Jeff Larson, Surya Mattu and Lauren Kirchner (2016)
Gender Shades - Joy Buolamwini and Timnit Gebru (2018)
Automating Inequality: How High-Tech Tools Profile, Police, and Punish the Poor - Virginia Eubanks (2018)
How We Analyzed Google's Search Results - Leon Yin and Adrianne Jeffries (2020)
import pandas as pd

def value_counts(df: pd.DataFrame,
                 col: str,
                 *args, **kwargs) -> pd.DataFrame:
    """
    For a DataFrame (`df`): display normalized (percentage)
    `value_counts(normalize=True)` and regular counts
    `value_counts()` for a given `col`.
    """
    count = df[col].value_counts(*args, **kwargs).to_frame(name='count')
    perc = df[col].value_counts(*args, normalize=True, **kwargs) \
                  .to_frame(name='percentage')
    # combine raw counts and percentages side by side
    return count.join(perc)
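A quick usage sketch, with a made-up DataFrame just for illustration:

```python
df = pd.DataFrame({"fruit": ["apple", "apple", "pear", "kiwi"]})
value_counts(df, "fruit")
# returns a DataFrame indexed by fruit value, with a raw `count` column
# and a normalized `percentage` column (apple -> 2 and 0.5, etc.)
```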
import json

fn = 'notebook.ipynb'
notebook = json.load(open(fn))
notebook.keys()  # top-level keys of the notebook JSON: 'cells', 'metadata', 'nbformat', ...

# Gather the prose from every markdown cell, skipping blank lines
prose = []
for cell in notebook['cells']:
    if cell['cell_type'] == "markdown":
        for sent in cell['source']:
            if sent == '\n':
                continue  # blank markdown lines carry no prose
            prose.append(sent)
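From here you could, for example, get a rough word count of the notebook's prose. The `prose` list and this tally are purely illustrative:

```python
word_count = sum(len(line.split()) for line in prose)
print(f"{word_count} words of markdown prose in {fn}")
```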
""" | |
A simple script to make a Markdown table for a data dictionary (assumes you just have a column name and description). | |
""" | |
import pandas as pd | |
col2description = { | |
"Name": "What you can call me", | |
"Id": "The identifier", | |
"Nickname": "Do you have to ask?" |