Dani Mermelstein mermelstein

  • New York, NY
@mermelstein
mermelstein / spotify_json_to_df.R
Created January 3, 2024 20:24
convert Spotify json data to a dataframe
library(jsonlite)
library(lubridate)
# specify local path to the downloaded JSON files
path <- "~/Downloads/Spotify Extended Streaming History/"
# get a list of all JSON files in the directory
json_files <- list.files(path, pattern = "\\.json$", full.names = TRUE)
# initialize an empty list to store the data
@mermelstein
mermelstein / twitter_data_to_df.R
Last active January 5, 2024 20:54
twitter data to df
library(jsonlite)
library(brio)
# where is the archive directory
dir_path <- "~/Downloads/twitter-archive/data"
# file in /data with the tweets
filename <- "tweets.js"
# get full file path
@mermelstein
mermelstein / Dockerfile
Created April 17, 2024 16:15
uv in Docker
FROM python:3.10
WORKDIR /app
COPY . /app
RUN pip install uv
RUN uv pip install --system --no-cache-dir -r requirements.txt
CMD ["python", "-u", "main.py"]
@mermelstein
mermelstein / text_to_pdf.py
Created May 3, 2024 00:46
extract text from pdf when the text isn't easy to copy
from PIL import Image
import pytesseract
from pdf2image import convert_from_path
# Convert the PDF to a list of images
images = convert_from_path('path_to_pdf.pdf')
# Process each image with Tesseract
for i, img in enumerate(images):
    text = pytesseract.image_to_string(img, lang='eng')
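The preview above stops inside the loop, so what happens to each page's text is not shown. One way the per-page results could be collected and combined is sketched below. This is a sketch under assumptions, not the gist's actual continuation: the helper names `ocr_pages` and `join_pages` and the page separator format are hypothetical. The OCR function is passed in as a parameter so the logic can be tried without Tesseract installed.

```python
from typing import Callable, Iterable, List, Optional


def ocr_pages(images: Iterable, ocr: Optional[Callable] = None, lang: str = "eng") -> List[str]:
    """Run OCR on each page image and return one string per page."""
    if ocr is None:
        # Imported lazily so the helper can be exercised with a stand-in OCR function.
        import pytesseract
        ocr = pytesseract.image_to_string
    return [ocr(img, lang=lang) for img in images]


def join_pages(texts: List[str]) -> str:
    """Join page texts under a labelled separator (hypothetical format)."""
    return "\n".join(
        f"--- page {i} ---\n{t.strip()}" for i, t in enumerate(texts, start=1)
    )
```

With the real libraries installed, passing the `images` list from `convert_from_path('path_to_pdf.pdf')` to `ocr_pages` and the result to `join_pages` would give a single string that can be printed or written to a file.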