Jonathan Stray jstray

## gist:fe34b6b7079c6bf15dc1

      
jstray / gist:fe34b6b7079c6bf15dc1
Last active April 26, 2016 20:05

Threat Modeling: planning digital security for your story

Journalism can be a high-risk activity, and some stories are a lot riskier than others. In part one we covered the digital security precautions that every journalist should take. If one of your colleagues uses weak passwords or clicks on a phishing link, more sophisticated efforts are wasted. But assuming that everyone you are working with is already up to speed on basic computer security practice, there's a lot more you can do to provide security for a specific, sensitive story.
This work begins with thinking through what it is you have to protect, and from whom. This is called threat modeling and is the first step in any security analysis. The goal is to construct a picture -- in some ways no more than an educated guess -- of what you're up against. There are many ways to do this, but this post is structured around four basic questions.

What do you want to keep private?
Who wants to know?
What can they do to fi


## what-to-do-with-documents.md

      
jstray / what-to-do-with-documents.md
Last active August 29, 2015 13:57, forked from kiostark/what-to-do-with-documents.md

You got the documents. Now what?

[omg documents.png]
Congratulations! Your Freedom of Information request finally yielded a big brown envelope in the mail. You are the lucky recipient of a juicy leak. You've managed to scrape all the PDFs from that stone-age government portal. Now all you have to do is the reporting.
Would that it were so easy. Your next steps depend on what you've got and what you're trying to do. You might have one page or one million pages. You could be starting with a tall stack of paper or a CSV file or anything in between. Maybe you already know exactly what you're looking for, or maybe that anonymous tip was maddeningly non-specific.
In the course of my work on the Overview document-mining software I've seen just about every problem that a journalist can have with a document-driven story. These are the tales of unreadable formats, heaps of paper, and late nights reading. This post is organized as a sort of flowchart, a series of questions you can ask

  
## what-to-do-with-documents.md

      
jstray / what-to-do-with-documents.md
Last active August 29, 2015 13:57, forked from kiostark/what-to-do-with-documents.md

You got the documents. Now what?

[omg documents.png]
Congratulations! Your Freedom of Information request finally yielded a big brown envelope in the mail. You are the proud owner of a juicy leak. You've managed to scrape all the PDFs from that stone-age government portal. Now all you have to do is the reporting.
In the course of my work on the Overview document-mining software I've seen just about every problem that journalists can have with a document-driven story. These are the stories of unreadable formats, heaps of paper, and late nights reading.
When you're the proud owner of a brand new document dump, the next steps depend on what you've got and what you're trying to do. You might have one page or one million pages. You could be starting with a tall stack of paper or a CSV file or anything in between. Maybe you already know exactly what you're looking for, or maybe that anonymous tip was maddeningly non-specific. This post is organized as a sort of flowchart, a seri

  
## what-to-do-with-documents.md

      
jstray / what-to-do-with-documents.md
Last active March 10, 2019 22:38

You've got the documents, now what?

You got the documents. Now what?

[omg documents.png]
Congratulations! Your Freedom of Information request finally yielded a big brown envelope in the mail. You are the proud owner of a juicy leak. You've managed to scrape all the PDFs from that stone-age open government portal. Now all you have to do is report.
In the course of my work on the Overview document mining software I've seen just about every problem that journalists can have with a document-driven story. These are the stories of unreadable formats, heaps of paper, and late nights reading.
When you're the proud owner of a brand new document dump, the next steps depend on what you've got and what you're trying to do. You might have one page or one million pages. You could be starting with a tall stack of paper or a CSV file or anything in between. Maybe you already know exactly what you're looking for, or maybe that anonymous tip was so non-specific you don't know where to start. This post is organized as a sort o

  
## gist:6003431

      
jstray / gist:6003431
Last active December 19, 2015 19:08

Drawing conclusions from data

The job of a data journalist is to turn data into a story. If you start with a spreadsheet of cancer rates, the story might be "people living near oil refineries had three times the rate of lung cancer." Or it might not be, because you could be misinterpreting the data in some way. This recorded talk is about how not to get fooled when you go looking for stories in your data.
<iframe width="420" height="315" src="//www.youtube.com/embed/3NuyRKNkBQg" frameborder="0" allowfullscreen></iframe>

This lecture was given as part of the 15th Annual Science Immersion Workshop for Journalists at the Metcalf Institute for Marine & Environmental Reporting, Rhode Island. The slides are here, and the GitHub repo with all the R code needed to reproduce the examples in the talk is here.

### Interpreting data
A data journalism story is usually ab

  
## gist:3741305
# -------------------------------- MDS plot ------------------------------

# d is assumed to be the distance matrix from the "Compute distances" step
fit <- cmdscale(d, eig=TRUE, k=2) # k is the number of dimensions
x <- fit$points[,1]
y <- fit$points[,2]

# plot with colors corresponding to party
parties = factor(row.names(recentvotes))
plot(x, y, xlab="Coordinate 1", ylab="Coordinate 2", main="House of Lords voting", pch=19, col=parties)
# legend uses the same color codes and point symbol as the plot
legend('topright', legend = levels(parties), col=seq_along(levels(parties)), cex = 0.8, pch = 19)
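
To see what `cmdscale` does before running it on the real vote data, here is a minimal sketch on a toy distance matrix. The four voters, their party labels `A` and `B`, and every distance value are invented for illustration; none of them come from the House of Lords data.

```r
# Toy example: four hypothetical voters, two from each of two made-up parties.
# The distances are invented for illustration, not taken from the vote data.
labels <- c("A", "A", "B", "B")
m <- matrix(c(0.0, 0.1, 0.9, 0.8,
              0.1, 0.0, 0.8, 0.9,
              0.9, 0.8, 0.0, 0.1,
              0.8, 0.9, 0.1, 0.0),
            nrow = 4, byrow = TRUE, dimnames = list(labels, labels))
toy_d <- as.dist(m)

# Classical MDS: find 2-D coordinates whose distances approximate toy_d
toy_fit <- cmdscale(toy_d, eig = TRUE, k = 2)
toy_xy <- toy_fit$points   # one row per voter, one column per dimension

# Voters from the same party should land closer together than voters
# from different parties
same_party  <- sqrt(sum((toy_xy[1, ] - toy_xy[2, ])^2))
cross_party <- sqrt(sum((toy_xy[1, ] - toy_xy[3, ])^2))
print(c(same_party = same_party, cross_party = cross_party))
```

If the layout is doing its job, the same-party distance printed at the end should be clearly smaller than the cross-party one, which is the pattern the party-colored plot is meant to reveal.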

## gist:3741280
# -------------------------------- Compute distances ------------------------------

# distance function = 1 - fraction of votes where both voted, and both voted the same
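votedist <- function(v1, v2) {
	overlap = v1!=0 & v2!=0        # votes where both members voted
	numoverlap = sum(overlap)
	match = overlap & v1==v2       # ...and both voted the same way
	nummatch = sum(match)
	if (!numoverlap)
		dist = 1                   # no votes in common: maximum distance
	# the original snippet is cut off here; the lines below are inferred
	# from the comment above the function
	else
		dist = 1 - nummatch/numoverlap
	dist
}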
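
As a rough illustration of how a distance like this turns a vote matrix into something `cmdscale` can use, the sketch below builds a full pairwise distance matrix with a plain loop instead of the `proxy` package. The helper name `vote_distance`, the toy matrix, and the vote coding (1 = aye, -1 = no, 0 = absent) are assumptions made for this example only.

```r
# Hypothetical helper implementing the definition in the comment above:
# 1 - (votes where both voted the same) / (votes where both voted).
# A 0 is assumed to mean "did not vote".
vote_distance <- function(v1, v2) {
  both_voted <- v1 != 0 & v2 != 0
  n_both <- sum(both_voted)
  if (n_both == 0) return(1)   # no shared votes: treat as maximally distant
  1 - sum(both_voted & v1 == v2) / n_both
}

# Toy vote matrix (invented): rows are members, columns are votes
toy_votes <- rbind(m1 = c( 1,  1, -1, 0),
                   m2 = c( 1,  1, -1, 1),
                   m3 = c(-1, -1,  1, 0),
                   m4 = c( 0,  0,  0, 0))

# Fill in every off-diagonal pair; the diagonal stays 0
n <- nrow(toy_votes)
dmat <- matrix(0, n, n,
               dimnames = list(rownames(toy_votes), rownames(toy_votes)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    if (i != j) dmat[i, j] <- vote_distance(toy_votes[i, ], toy_votes[j, ])
  }
}

toy_dist <- as.dist(dmat)   # a "dist" object, the form cmdscale() accepts
print(toy_dist)
```

Note that `m4`, who never voted, ends up at distance 1 from everyone, which is one reason to drop such members before computing distances.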

## gist:3741258
# -------------------------------- Take recent votes ------------------------------

# take only N most recent votes
Nvotes=100
recentvotes = votes[,1:Nvotes]

# set MP row names to party name
row.names(recentvotes) = lords[,"party"]

# remove all MPs who didn't vote at all in these recent votes

## gist:3741250
library(proxy) 	# need custom distance function capability

# -------------------------------- Load data ------------------------------

# Load in vote history
# strip out vote description, date, etc, and transpose so each row is an MP
votetable = read.csv("votematrix-lords.csv", header=T, sep=",")
votes = votetable[, 5:1047]
votes = t(votes)
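
The load step can be tried without the real file. The sketch below reads a small inline CSV with the same general shape the code above implies: four leading metadata columns followed by one column per member. The column names and values are made up, and `ncol()` stands in for the hard-coded 1047.

```r
# Invented CSV: four metadata columns, then one column per member
toy_csv <- "id,date,title,result,memberA,memberB
1,2012-01-10,First vote,passed,1,-1
2,2012-02-03,Second vote,failed,0,1
3,2012-03-21,Third vote,passed,1,1"

toy_table <- read.csv(text = toy_csv, header = TRUE)

# Drop the metadata columns, then transpose so each row is a member
toy_votes <- t(as.matrix(toy_table[, 5:ncol(toy_table)]))
print(dim(toy_votes))   # members by votes
```

Computing the last column with `ncol()` means the code keeps working if members are added to or removed from the file.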