@ihercowitz
ihercowitz / image_resize.py
Created October 23, 2010 20:19
Python Script to resize all the images, on a given directory, to a 1024x768 JPEG format.
#!/usr/bin/env python
from PIL import Image  # a bare "import Image" only works with the legacy PIL package, not Pillow
import os, sys
def resizeImage(infile, dir, output_dir="", size=(1024,768)):
    outfile = os.path.splitext(infile)[0]+"_resized"
    extension = os.path.splitext(infile)[1]
    if extension.lower()!= ".jpg":
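The preview above cuts off before the resize itself. As a rough sketch of the two calculations such a script needs, the helpers below build the "_resized" output name and work out dimensions that fit inside 1024x768. The function names `resized_name` and `fit_within` are hypothetical. Preserving the aspect ratio and never upscaling are assumptions, since the truncated preview does not show how the original handles either. Only the standard library is used, so no image is actually opened or written here.

```python
import os


def resized_name(infile, output_dir=""):
    """Return the output path: the input's base name plus "_resized", always .jpg."""
    base = os.path.splitext(os.path.basename(infile))[0]
    return os.path.join(output_dir, base + "_resized.jpg")


def fit_within(size, box=(1024, 768)):
    """Scale (width, height) to fit inside box while keeping the aspect ratio.

    Assumption: images already smaller than the box are left at their size.
    """
    width, height = size
    scale = min(box[0] / width, box[1] / height, 1.0)  # 1.0 caps the scale: no upscaling
    return max(1, round(width * scale)), max(1, round(height * scale))


if __name__ == "__main__":
    print(resized_name("holiday.PNG", "out"))
    print(fit_within((4000, 2000)))
```

The computed size could then be passed to an imaging library's resize call. Which call the original gist uses is not visible in the preview.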
@zpea
zpea / html5_video_conv.bash
Created July 21, 2012 03:00
little shell script to convert video files to the various HTML5 video formats/codecs using ffmpeg. Also generates the line for embedding the video using the videoJS plugin for wordpress.
#!/bin/bash
#
# video conversion script for publishing as HTML 5 video, via videojs (with hd button extension)
# 2011 by zpea
# feel free to use as public domain / Creative Commons CC0 1.0 (http://creativecommons.org/publicdomain/zero/1.0/)
#
FFMPEG=/usr/bin/ffmpeg
HD_SUFFIX='_hd'
@benmarwick
benmarwick / R2MALLET.r
Last active April 12, 2021 10:27
R code to operate MALLET entirely from within R. Set variables, send commands to Windows' command console and get MALLET's result back into R for further analysis.
# Set working directory
dir <- "C:\\" # adjust to suit
setwd(dir)
# configure variables and filenames for MALLET
## here using MALLET's built-in example data and
## variables from http://programminghistorian.org/lessons/topic-modeling-and-mallet
# folder containing txt files for MALLET to work on
importdir <- "C:\\mallet-2.0.7\\sample-data\\web\\en"
@benmarwick
benmarwick / HTML2DTM.r
Created February 22, 2013 08:13
Take a folder of HTML files and convert them to a document term matrix for text mining. Includes removal of non-ASCII characters and iterative removal of stopwords
# get data
setwd("C:/Downloads/html") # this folder has only the HTML files
html <- list.files()
# load packages
library(tm)
library(RCurl)
library(XML)
# get some code from github to convert HTML to text
writeChar(con="htmlToText.R", (getURL(ssl.verifypeer = FALSE, "https://raw.github.com/tonybreyal/Blog-Reference-Functions/master/R/htmlToText/htmlToText.R")))
@cdiener
cdiener / asciinator.py
Created April 13, 2014 03:11
asciinator.py now with documentation
# This line imports the modules we will need. The first is the sys module used
# to read the command line arguments. Second the Python Imaging Library to read
# the image and third numpy, a linear algebra/vector/matrix module.
import sys; from PIL import Image; import numpy as np
# This is a list of characters from low to high "blackness" in order to map the
# intensities of the image to ascii characters
chars = np.asarray(list(' .,:;irsXA253hMHGS#9B&@'))
# Check whether all necessary command line arguments were given, if not exit and show a
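The comments above describe mapping image intensities onto a character ramp ordered from low to high "blackness". A minimal sketch of that idea follows, using plain Python lists instead of numpy so it runs without dependencies. The function name `to_ascii`, the 0 to 255 grayscale range, and the integer rounding are assumptions for illustration, not necessarily what asciinator.py does.

```python
# Character ramp copied from the preview: sparse glyphs first, dense glyphs last.
CHARS = " .,:;irsXA253hMHGS#9B&@"


def to_ascii(pixels, max_value=255):
    """Map rows of grayscale intensities (0..max_value) to lines of text.

    Assumption: 0 is black and max_value is white, so the intensity is
    inverted before indexing into the low-to-high "blackness" ramp.
    """
    top = len(CHARS) - 1
    lines = []
    for row in pixels:
        lines.append("".join(CHARS[(max_value - p) * top // max_value] for p in row))
    return "\n".join(lines)


if __name__ == "__main__":
    # A tiny 2x2 checkerboard: black, white / white, black.
    print(to_ascii([[0, 255], [255, 0]]))
```

With real images the pixel rows would come from an imaging library, and the picture is usually shrunk first so that one character stands for a block of pixels.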
@benmarwick
benmarwick / tweet-edits-to-archaeology-articles.R
Last active April 3, 2023 16:35
Using R with wikipedia for various things
# get recent changes from wikipedia
library(rvest)
n_changes <- 5000
recent_changes_url <- paste0("https://en.wikipedia.org/w/index.php?title=Special:RecentChanges&limit=", n_changes , "&days=1")
# connect to website
html <- read_html(recent_changes_url)
@karpathy
karpathy / min-char-rnn.py
Last active May 10, 2024 18:13
Minimal character-level language model with a Vanilla Recurrent Neural Network, in Python/numpy
"""
Minimal character-level Vanilla RNN model. Written by Andrej Karpathy (@karpathy)
BSD License
"""
import numpy as np
# data I/O
data = open('input.txt', 'r').read() # should be simple plain text file
chars = list(set(data))
data_size, vocab_size = len(data), len(chars)
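The data I/O lines above build a character vocabulary from the input text. A character-level model then needs each character as an integer index, usually expanded to a one-hot vector. The sketch below shows one way to do that in plain Python. The names `build_vocab` and `one_hot` are hypothetical, and sorting the character set is an added assumption to make the ordering reproducible (the preview uses an unsorted `list(set(data))`).

```python
def build_vocab(text):
    """Return the distinct characters plus lookup tables in both directions."""
    chars = sorted(set(text))  # sorted only so the index assignment is deterministic
    char_to_ix = {ch: i for i, ch in enumerate(chars)}
    ix_to_char = {i: ch for i, ch in enumerate(chars)}
    return chars, char_to_ix, ix_to_char


def one_hot(index, vocab_size):
    """Return a list of zeros with a single 1 at the given index."""
    vec = [0] * vocab_size
    vec[index] = 1
    return vec


if __name__ == "__main__":
    chars, char_to_ix, ix_to_char = build_vocab("hello")
    encoded = [char_to_ix[ch] for ch in "hello"]
    print(chars, encoded, one_hot(encoded[0], len(chars)))
```

Decoding a sampled index sequence back to text is then a matter of joining `ix_to_char[i]` over the indices.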
@drjwbaker
drjwbaker / pastec-tutorial.md
Last active August 31, 2016 13:22
Getting Pastec up and running, 8 August 2016

Getting Pastec up and running

Pastec is an open source index and search engine for image recognition. This is how I got it working with lots of help from the hard work of Ryan Baumann, Shawn Graham and Matthew Lincoln.

Installation

Either install Ubuntu 14.04.5 as an operating system, or get a virtual machine from osboxes. Fire up with VirtualBox. Ensure VM is connected to the network (Settings>Network).

Install Pastec by following the documentation. Be sure to download and unzip visualWordsORB.dat into the build subdirectory of Pastec.

@duhaime
duhaime / classify_images.py
Last active July 13, 2018 12:15
Image to Vec
from __future__ import absolute_import, division, print_function
"""
This is a modification of the classify_images.py
script in Tensorflow. The original script produces
string labels for input images (e.g. you input a picture
of a cat and the script returns the string "cat"); this
modification reads in a directory of images and
generates a vector representation of the image using
@rccordell
rccordell / PoetryBot.rmd
Last active February 5, 2019 01:50 — forked from bmschmidt/words.R
---
title: "Programming Literary Bots"
author: "Ryan Cordell"
date: "3/12/2017"
output: html_document
---
## Acknowledgements
This version of my twitterbot assignment was adapted from [an original written in Python](https://www.dropbox.com/s/r1py3zazde2turk/Trendingmore.py?dl=0), which itself adapted code written by Mark Sample. That original bot tweeted (I've since stopped it) at [Quoth the Ravbot](https://twitter.com/Quoth__the). The current version owes much to advice and code borrowed from two colleagues at Northeastern University: Jonathan Fitzgerald and Benjamin Schmidt.