Jonathan Soma jsoma

@jsoma
jsoma / Relatively modern relatively fancy OCR on PDFs.ipynb
Created June 28, 2024 02:24
How to use pdfminer.six, PaddleOCR and OpenAI's GPT to OCR and extract text from PDFs and save them into a CSV (or Excel) file for later analysis.
# 1. Rename the folder
Rename-Item -Path "C:\Users\nmradmin\Desktop\hands_on_classes\20240621-friday-scraping-license-and-violations-data-with-browser-automation-tools" -NewName "20240621-friday-scraping-licenses"
# 2. Download and extract the ZIP file
Invoke-WebRequest -Uri "https://github.com/jsoma/ire24-scraping/archive/refs/heads/main.zip" -OutFile "C:\Users\nmradmin\Desktop\hands_on_classes\20240621-friday-scraping-licenses\main.zip"
Expand-Archive -Path "C:\Users\nmradmin\Desktop\hands_on_classes\20240621-friday-scraping-licenses\main.zip" -DestinationPath "C:\Users\nmradmin\Desktop\hands_on_classes\20240621-friday-scraping-licenses" -Force
Remove-Item -Path "C:\Users\nmradmin\Desktop\hands_on_classes\20240621-friday-scraping-licenses\main.zip"
# 3. Move batch file to desktop
Move-Item -Path "C:\Users\nmradmin\Desktop\hands_on_classes\20240621-friday-scraping-licenses\ire24-scraping-main\license-scraping.bat" -Destination "C:\Users\nmradmin\Desktop\license-scraping.bat"
@jsoma
jsoma / Code.gs
Created March 6, 2024 11:36
Tiny little script to help you validate LLM responses in Google Sheets
function onOpen() {
  const ui = SpreadsheetApp.getUi();
  // Adds a custom menu to the Google Sheets UI
  ui.createMenu('Checking helper')
    .addItem('Create Sample', 'showStratificationPrompt')
    .addToUi();
}
function showStratificationPrompt() {
  const ui = SpreadsheetApp.getUi();
@jsoma
jsoma / index.html
Created December 2, 2023 20:12
Templates for auto-updating viz website
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>The Bad Air Website</title>
    <style>
      body {
        margin: 0;
        background-color: aliceblue;
      }
@jsoma
jsoma / README.md
Last active February 18, 2024 16:17
I promise the command line is fun! It can be awful, sure, but also fun.

The command line is fun, I promise!

Let's do some fun stuff on the command line! This is a lot of "figuring out what to do" as opposed to "applying skills we learned in class." It will definitely make you feel uncomfortable and like you don't know anything, but that's okay!

Do these in any order you want. Be sure to check out the very last one, it's crazy.

A general tip: when you're searching around on the internet, you'll see pip install written a hundred different ways. python3 -m pip install, pipx install, pip3 install: anything that vaguely looks like that can probably just be substituted with pip install.

ChatGPT can be really helpful for figuring out the specific command-line flags and arguments you need to get your CLI tools operating how you want them to. Unless you're using ffmpeg and convert every day of your life, memorizing exactly how these command-line tools work is prrrrrobably not the best use of your brainpower.

| country | continent | life_expectancy | population | gdp | gdp_per_capita |
|---|---|---|---|---|---|
| Afghanistan | Asia | 54.863 | 22856302 | 15153728226 | 663 |
| Albania | Europe | 74.2 | 3071856 | 12886435920 | 4195 |
| Algeria | Africa | 68.963 | 30533827 | 155661450046 | 5098 |
| Angola | Africa | 45.234 | 13926373 | 34063908358 | 2446 |
| Antigua and Barbuda | N. America | 73.544 | 77656 | 989182128 | 12738 |
| Argentina | S. America | 73.822 | 36930709 | 390394524839 | 10571 |
| Armenia | Europe | 71.494 | 3076098 | 6502871172 | 2114 |
| Australia | Oceania | 79.93 | 19164351 | 560384787591 | 29241 |
| Austria | Europe | 78.33 | 8004712 | 256214821696 | 32008 |
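
One thing worth noticing in the sample rows above: the gdp column looks like it is just population multiplied by gdp_per_capita. That is an observation from the numbers, not something the README states, so here is a minimal Python sketch to check it. The `rows` list copies three rows from the sample, and `gdp_matches` is a hypothetical helper name made up for this example.

```python
# Three rows copied from the sample data above:
# (country, population, gdp, gdp_per_capita)
rows = [
    ("Afghanistan", 22856302, 15153728226, 663),
    ("Albania", 3071856, 12886435920, 4195),
    ("Algeria", 30533827, 155661450046, 5098),
]


def gdp_matches(population, gdp, gdp_per_capita):
    """Return True if gdp equals population * gdp_per_capita exactly.

    Assumption: the relationship is exact integer multiplication,
    which is a guess based on the sample rows, not a documented fact.
    """
    return population * gdp_per_capita == gdp


# Print each country alongside whether the relationship holds for its row
for country, population, gdp, gdp_per_capita in rows:
    print(country, gdp_matches(population, gdp, gdp_per_capita))
```

If you save this as a file, running it with python3 prints one line per country. A mismatch on any row would suggest the columns come from different sources or were rounded differently.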

Command line fun hints and tips

Part 2: Command line data analysis

Installation

NOTE: Windows users will use scoop install instead of brew install. Even if the docs say to use choco, scoop is a better package manager than Chocolatey!!!

Documentation + examples

@jsoma
jsoma / README.md
Created April 8, 2022 13:32
Use D3/JavaScript with ai2html: animating (and un-animating)

Maybe you styled a map in Illustrator to look a certain way: the markers all have nice unique colors, opacity, etc. And then you plugged it into your scrollytelling piece. Looking good so far.

On step 2 of your piece you want to change those markers: make them all yellow! Highlight them! #FFF880 is my favorite yellow highlight color. That's fine. Normally you'd just do this to transition to the highlight color:

d3.selectAll("[data-name='africa'] path")
    .transition()
    .attr('fill', '#FFF880')