Shawn Graham shawngraham

## HTML2DTM.r
# get data
setwd("C:/Downloads/html") # this folder has only the HTML files
html <- list.files()

# load packages
library(tm)
library(RCurl)
library(XML)
# get some code from github to convert HTML to text
writeChar(getURL("https://raw.github.com/tonybreyal/Blog-Reference-Functions/master/R/htmlToText/htmlToText.R", ssl.verifypeer = FALSE), con = "htmlToText.R")

## pastec-tutorial.md

      
drjwbaker / pastec-tutorial.md
Last active August 31, 2016 13:22
Getting Pastec up and running, 8 August 2016

Getting Pastec up and running

Pastec is an open source index and search engine for image recognition. This is how I got it working, with a lot of help from the work of Ryan Baumann, Shawn Graham and Matthew Lincoln.

Installation

Either install Ubuntu 14.04.5 as an operating system, or get a virtual machine from osboxes and start it in VirtualBox. Ensure the VM is connected to the network (Settings > Network).
Install Pastec by following the documentation. Be sure to download and unzip visualWordsORB.dat into the build subdirectory of Pastec.

  
## classify_images.py
from __future__ import absolute_import, division, print_function

"""

This is a modification of the classify_images.py
script in TensorFlow. The original script produces
string labels for input images (e.g. you input a picture
of a cat and the script returns the string "cat"); this
modification reads in a directory of images and
generates a vector representation of the image using

## PoetryBot.rmd
---
title: "Programming Literary Bots"
author: "Ryan Cordell"
date: "3/12/2017"
output: html_document
---

## Acknowledgements

This version of my twitterbot assignment was adapted from [an original written in Python](https://www.dropbox.com/s/r1py3zazde2turk/Trendingmore.py?dl=0), which itself adapted code written by Mark Sample. That original bot tweeted (I've since stopped it) at [Quoth the Ravbot](https://twitter.com/Quoth__the). The current version owes much to advice and code borrowed from two colleagues at Northeastern University: Jonathan Fitzgerald and Benjamin Schmidt.

## splitAudio.py

#!/usr/bin/python
## Split audio files into chunks
## Daniel Pett 1/5/2020
__author__ = 'portableant'
## Tested on Python 2.7.16 - yes I know I need to upgrade.

import argparse
import os
import speech_recognition as sr

## renderSite.R
# This script builds on Aleszu Bajak's excellent
# [tutorial on building a course website using R Markdown and Github pages](http://www.storybench.org/convert-google-doc-rmarkdown-publish-github-pages/).
# I was excited about the concept but wanted to automate a few of the production steps: namely generating the HTML files
# for the site from the RMD pages (which Aleszu describes doing one-by-one) and generating the site navigation menu,
# which Aleszu handcodes in the `_site.yml` file. This script should automate both processes, though it may have some quirks
# unique to my setup that you'd want to tweak to fit your own. It's likely more loquacious than necessary as well, so feel free
# to condense as you can. Ideally, each time you make updates to your RMD files you can run this script to generate updated HTML
# pages and a new `_site.yml`. Then commit changes to GitHub and you're up and running!

# Once you've got everything configured for your own site below, you should be able to run `source('rend

## R2MALLET.r
# Set working directory
dir <- "C:\\" # adjust to suit
setwd(dir)

# configure variables and filenames for MALLET
## here using MALLET's built-in example data and
## variables from http://programminghistorian.org/lessons/topic-modeling-and-mallet

# folder containing txt files for MALLET to work on
importdir <- "C:\\mallet-2.0.7\\sample-data\\web\\en"

## asciinator.py
# These lines import the modules we will need. The first is the sys module, used
# to read the command line arguments. The second is the Python Imaging Library, used
# to read the image, and the third is numpy, a linear algebra/vector/matrix module.
import sys
from PIL import Image
import numpy as np

# This is a list of characters from low to high "blackness" in order to map the
# intensities of the image to ascii characters
chars = np.asarray(list(' .,:;irsXA253hMHGS#9B&@'))

# Check whether all necessary command line arguments were given, if not exit and show a
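The character ramp above maps pixel intensity to a glyph. A minimal sketch of that mapping in plain Python follows. It assumes intensities already normalised to the range 0.0 to 1.0; the original script is cut off at this point, so the helper names `intensity_to_char` and `render_rows` and the normalisation are illustrative assumptions rather than the author's code.

```python
# Character ramp copied from the snippet above, ordered from low to high "blackness".
CHARS = " .,:;irsXA253hMHGS#9B&@"


def intensity_to_char(value, chars=CHARS):
    """Map a blackness value in [0.0, 1.0] to one character (hypothetical helper)."""
    # Clamp out-of-range input so the index always stays inside the ramp.
    clamped = min(max(value, 0.0), 1.0)
    index = int(round(clamped * (len(chars) - 1)))
    return chars[index]


def render_rows(rows, chars=CHARS):
    """Turn a list of rows of normalised intensities into a list of text lines."""
    return ["".join(intensity_to_char(v, chars) for v in row) for row in rows]
```

Calling `render_rows` on a small grid of values gives one string per row, which can then be joined with newlines and printed.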

## image_resize.py
#!/usr/bin/env python

from PIL import Image  # bare `import Image` only works with the legacy PIL package
import os, sys

def resizeImage(infile, dir, output_dir="", size=(1024,768)):
     outfile = os.path.splitext(infile)[0]+"_resized"
     extension = os.path.splitext(infile)[1]

     if extension.lower()!= ".jpg":
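The two `os.path.splitext` calls above split a filename into its stem and extension. Here is a small sketch of how those pieces might be recombined into an output path. The function name `resized_path` and the choice to keep the original extension are assumptions, since the original function is cut off before it gets that far.

```python
import os


def resized_path(infile, output_dir=""):
    """Build an output path of the form <output_dir>/<stem>_resized<ext> (hypothetical helper)."""
    stem, extension = os.path.splitext(infile)
    # Drop any directory part of the input so the file lands in output_dir.
    filename = os.path.basename(stem) + "_resized" + extension
    return os.path.join(output_dir, filename)
```

With an empty `output_dir`, `os.path.join` returns just the filename, so the result is relative to the current working directory.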

## tweet-edits-to-archaeology-articles.R


# get recent changes from wikipedia
library(rvest)
n_changes <- 5000
recent_changes_url <- paste0("https://en.wikipedia.org/w/index.php?title=Special:RecentChanges&limit=", n_changes, "&days=1")

# connect to website
html <- read_html(recent_changes_url)
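The URL above is assembled by string concatenation. The same construction can be sketched in Python with the standard library, assuming the same three query parameters. The helper name `recent_changes_url` is hypothetical, and the sketch only builds the URL; it does not fetch or parse the page.

```python
from urllib.parse import urlencode

BASE_URL = "https://en.wikipedia.org/w/index.php"


def recent_changes_url(n_changes=5000, days=1):
    """Build the Special:RecentChanges URL used in the R snippet (hypothetical helper)."""
    params = {"title": "Special:RecentChanges", "limit": n_changes, "days": days}
    # safe=":" keeps the colon in the page title unescaped, matching the original URL.
    return f"{BASE_URL}?{urlencode(params, safe=':')}"
```

Building the query from a dictionary avoids hand-placing `&` and `=` separators when parameters are added or changed.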