Peter M. Landwehr pmlandwehr

pmlandwehr / Demo Notes.ipynb
Created September 25, 2014 02:41
IPython notebook transcription of Pandas Demo
pmlandwehr / physical_environment.md
Last active August 29, 2015 14:08
Physical Environment Codes

Physical Environment

The following information type categories all include a description of, and/or comparative data about, the hazard agent, e.g., the location of a fire or the level of a river. These categories also include information about physical aspects of the affected area, e.g., weather, geographical details, characteristics of city/county borders, and similar information.

1. General Area Information (GAI)

Tweets that provide geographic or logistical information about areas under threat or experiencing disaster.

| Disaster Data Set | Example Tweet |
| --- | --- |
| OK09 | Geo-note for our non-Oklahoma friends: the fires around Velma-Ratliff City-Fox-Loco-Meridian are in oil producing regions. #OKFires |
| RR09 | @skangus GF built permanent levees after 97 flood of the Red. Difference was they declared entire neighborhoods greenspace to do it. |
pmlandwehr / linguistics_one_sheet.md
Created November 7, 2014 00:35
Linguistics One-Sheet for Coding

Linguistics In Practice Cheat Sheet

Well, in practice in the context of Twitter, anyway. These are the primary concepts that Vieweg considers to be useful.

Pragmatics

For Vieweg, pragmatics captures the idea of “meaning in context.” Tweets are situated in a context, real or imagined, shared with the tweeter’s followers. A local tweeting to other locals may reference events and language that are esoteric and confined to their locality. A celebrity tweeting to a fan base may confine their tweets to the banal and PR-heavy. People can participate in many communities simultaneously on Twitter, and these overlapping contexts can be hard to disentangle.

Background knowledge

When speaking to each other within a particular domain, twitterers often reference information that assumes their readers are already familiar with the context and with the history of the current situation (e.g., a reference to an old flood when discussing a new one).

Markedness

pmlandwehr / transformer.py
Last active August 29, 2015 14:17
Simple text stripper
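def text_transformer(text, trim_period=True):
    """Lowercase and tidy a short string (simple text stripper)."""
    # Lowercase, trim outer whitespace, and drop the space after commas.
    stripped = text.lower().strip().replace(', ', ',')
    if trim_period:
        # Remove trailing periods and any whitespace they exposed.
        stripped = stripped.rstrip('.').rstrip()
    # A single token needs no further whitespace handling.
    if len(stripped.split()) == 1:
        return stripped
    # Collapse runs of internal whitespace to single spaces.
    return ' '.join(stripped.split())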
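A short usage sketch of the stripper. The definition is repeated so the snippet runs on its own, and the sample strings are hypothetical; it shows differently formatted versions of the same label collapsing to a single normalized form.

```python
def text_transformer(text, trim_period=True):
    # Lowercase, trim outer whitespace, and drop the space after commas.
    stripped = text.lower().strip().replace(', ', ',')
    if trim_period:
        # Remove trailing periods and any whitespace they exposed.
        stripped = stripped.rstrip('.').rstrip()
    if len(stripped.split()) == 1:
        return stripped
    # Collapse runs of internal whitespace to single spaces.
    return ' '.join(stripped.split())


# Hypothetical free-text labels that differ only in formatting.
labels = ["Colorado Springs, CO.", "  colorado springs,CO ", "COLORADO   SPRINGS, co"]
normalized = {text_transformer(label) for label in labels}
print(normalized)  # {'colorado springs,co'}
```

Note that `trim_period=False` keeps a trailing period, so `"Done."` becomes `"done."` rather than `"done"`.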
pmlandwehr / json_from_streamcorpus.py
Created March 26, 2015 23:27
Convert Streamcorpus objects saved as plain text to lists of JSON objects
import codecs
import simplejson as json
import os
import sys

def stream_entry_str_to_json(entry_str):
    entry_str = entry_str.replace(' {', ': {')
    fields = [x.strip() for x in entry_str.split('\t\n')]
    for i in range(len(fields)-1):
pmlandwehr / TREC Tweet Graphs.ipynb
Created April 12, 2015 05:43
Some graphs of the TREC Tweet Corpus (evolving)
pmlandwehr / sandy_background.md
Created August 1, 2015 22:46
Background information for labelling tweets related to Hurricane Sandy

Hurricane Sandy

Basic History

A brief timeline

  • October 24: Hurricane Sandy, south of Kingston in Jamaica, begins to move north.
  • October 29, 7 AM: Sandy reaches peak intensity
  • October 29, 6:30 PM: Sandy makes landfall near Brigantine, NJ and starts moving west-northwest.
  • October 31, 7 AM: Sandy breaks up over Pennsylvania.

Data Notes

The data in the Sandy collection come from the Northeast, primarily the area around Manhattan. The collection runs from October 25, when Hurricane Sandy was in the news and states were preparing for its onslaught, through November 3 (though data from this late period are relatively sparse). As such, the tweets in the Sandy data should cover both disaster preparation and some reports of cleanup in the storm's aftermath.
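When labelling, it can help to know whether a tweet was posted before or after landfall. A minimal sketch, assuming the timeline's times are US Eastern Daylight Time (the notes do not give a time zone) and using the October 25 to November 3 collection window and the landfall time listed above. The phase labels and the `sandy_phase` helper are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: timeline times are US Eastern Daylight Time (UTC-4).
# EDT was in effect until 2 AM on November 4, 2012, so a fixed offset covers the window.
EDT = timezone(timedelta(hours=-4), "EDT")

# Collection window from the notes: October 25 through November 3, 2012.
WINDOW_START = datetime(2012, 10, 25, tzinfo=EDT)
WINDOW_END = datetime(2012, 11, 4, tzinfo=EDT)  # exclusive, so all of Nov 3 is included

# Landfall near Brigantine, NJ, as given in the timeline.
LANDFALL = datetime(2012, 10, 29, 18, 30, tzinfo=EDT)


def sandy_phase(created_at):
    """Bucket a timezone-aware tweet timestamp into a hypothetical phase label."""
    if not WINDOW_START <= created_at < WINDOW_END:
        return "outside window"
    return "pre-landfall" if created_at < LANDFALL else "post-landfall"


print(sandy_phase(datetime(2012, 10, 27, 12, 0, tzinfo=EDT)))  # pre-landfall
print(sandy_phase(datetime(2012, 10, 29, 22, 30, tzinfo=timezone.utc)))  # post-landfall
```

Because the timestamps are timezone-aware, tweets stored in UTC compare correctly against the Eastern-time boundaries without manual conversion.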

pmlandwehr / built_environment.md
Last active September 25, 2015 02:26
Built Environment Codes

Built Environment

The following information type categories all include a description of ways in which buildings, infrastructural components such as roads and bridges, property and other structures are affected by the mass emergency.

Remember that some degree of specificity is important! When reviewing tweets in the context of a disaster, scale matters. For instance, we already know that the wildfires devastated large parts of Colorado, so tweets saying only that Colorado is being devastated by wildfires provide little benefit. In contrast, tweets saying that a particular neighborhood in Colorado Springs is on fire provide both a relatively precise location and a particular kind of damage. The more precise, the better.
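As a hypothetical illustration of the "more precise is better" rule, location mentions could be ranked on an ordinal scale and the most precise one kept. The `Specificity` levels and the `most_specific` helper are assumptions for this sketch, not part of the codebook.

```python
from enum import IntEnum


class Specificity(IntEnum):
    """Hypothetical ordering of location precision; higher means more precise."""
    STATE = 1
    COUNTY = 2
    CITY = 3
    NEIGHBORHOOD = 4
    ADDRESS = 5


def most_specific(mentions):
    """Return the (place, level) pair with the highest precision, or None if empty."""
    return max(mentions, key=lambda mention: mention[1], default=None)


mentions = [
    ("Colorado", Specificity.STATE),
    ("a neighborhood in Colorado Springs", Specificity.NEIGHBORHOOD),
]
print(most_specific(mentions)[0])  # a neighborhood in Colorado Springs
```

Using `IntEnum` keeps the levels comparable as integers, so adding a new level only requires choosing where it falls in the ordering.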

1. Damage (D)

Tweets that provide information or allude to information about a structure, facility, or property that has suffered from a hazard. This can be for public or private structures and property of any variety. Natural features, such as trees, are covered by Damage as we

pmlandwehr / haiyan_background.md
Last active September 26, 2015 20:55
Background information for labelling tweets related to Typhoon Haiyan/Yolanda

Typhoon Haiyan

Basic History

Basic timeline

  • November 2: The pressure systems that will become Haiyan are first noted by the Japan Meteorological Agency to the southeast of Micronesia.
  • November 5: Haiyan rapidly intensifies and is classified as a typhoon.
  • November 7: Haiyan continues to strengthen as it moves westward, and at 8:40 PM UTC it makes landfall at Guiuan, Eastern Samar. It makes three additional landfalls as it crosses the Philippines.
  • November 8: Haiyan leaves the islands, weakened, and still moving west.
  • November 11: Haiyan breaks up over China.
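Local tweets will describe the landfall in Philippine time, which falls on a different calendar day. A minimal sketch of the conversion, assuming the landfall time in the timeline is UTC and using Philippine Standard Time (UTC+8, with no daylight saving time):

```python
from datetime import datetime, timedelta, timezone

# Assumption: the landfall time in the timeline is given in UTC.
landfall_utc = datetime(2013, 11, 7, 20, 40, tzinfo=timezone.utc)

# Philippine Standard Time is a fixed UTC+8 offset.
PHT = timezone(timedelta(hours=8), "PHT")
landfall_local = landfall_utc.astimezone(PHT)

print(landfall_local.strftime("%B %d, %I:%M %p %Z"))  # November 08, 04:40 AM PHT
```

So tweets from the Philippines reporting the first landfall will typically refer to the early morning of November 8.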

General notes

pmlandwehr / wildfires_background.md
Last active November 22, 2015 18:51
Background information for labelling tweets related to the 2012 Colorado Wildfires

The 2012 Colorado Wildfire Season

Basic History

General Description

The 2012 wildfire season ran from March through July and is considered one of the worst that Colorado has experienced in recent memory. A number of large and small fires rampaged over the countryside; the news sources I’ve found report either twelve or sixteen large fires burning significant acreage, and the number of small fires is larger still.

The Waldo Canyon Fire, which is the most prominent fire in the data, began on June 23 about four miles northwest of Colorado Springs. As it expanded, several nearby towns were evacuated. The fire continued to grow over the next several days, and on June 26 Mayor Steve Bach ordered that Colorado Springs be evacuated. The fire spread into the city, and by the early morning of June 27 an estimated 300 homes had been destroyed. Firefighters continued to work against the blaze, and on June 29 President Obama visited Colorado to discuss the fire.

Data