@tommeagher
Last active December 11, 2015 17:38

#Hack the news

Good morning. Welcome. My name is Tom Meagher and I am the data editor at Digital First Media. I work in a quasi-futuristic, post-apocalyptic newsroom in Manhattan called Project Thunderdome, where I lead a team of developer journalists building data-driven, interactive news applications.

I've spent my entire career in newsrooms and in the past few years I've been sucked into this world of news development.

So we're here for the weekend, and by this time tomorrow, we're going to have built the next amazing news application. My goal right now is to talk a little about how news development works, show you some examples and offer some advice for getting the most out of the weekend.

What's really great about HackJersey is that we are able to bring so many of you from these two worlds of news and programming together. Fundamentally, reporters are great storytellers and coders are great at building software, and working together there are greater possibilities than either can achieve alone.

I know some of the journalists are eager to learn about data reporting and news development. They want to contribute to their team, and what they really want is to learn to code.

In 30 minutes or even 24 hours, I’m not going to be able to teach you how to code.

Instead, right now I want to give you some practical ideas about ways you two groups, the journos and the developers, can really help each other in this endeavor to make our news better.

It's no surprise to anyone that when it comes to technology, as an industry, we journalists are far behind where we need to be. The possible exception to this rule has been this vanguard of data journalists that until not that long ago specialized in what was called computer-assisted reporting. This beat consists of mining public databases, analyzing them and producing stories for the paper. But the name itself feels a bit archaic, right?

Ben Welsh at the LA Times Data Desk sums up the situation very well when he compares us to other industries, architecture and science and health, that are so far ahead of where journalism is.

###new slide

"Only in journalism do we continue to distinguish ourselves with Microsoft Excel."

I'll have to apologize to Ben for borrowing his joke, but I'm sure he open sourced it.

Here's the classic example of computer-assisted reporting: take the tax delinquent notices in the newspapers, cross them with the public employees database and find all the deadbeat property owners who sit on the city council or the board of freeholders.
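For the coders in the room, here is a minimal sketch of what that kind of cross-referencing might look like in Python. The field names (`owner`, `amount_owed`, `name`, `title`) and the sample rows are made up for illustration, and matching on a normalized name is only one possible approach. A name match is a lead to report out, not proof that two records describe the same person.

```python
def normalize(name):
    """Reduce a name to lowercase 'first last' so differently formatted records can match."""
    # Assumption: some records use "LAST, FIRST" ordering.
    if "," in name:
        last, first = name.split(",", 1)
        name = f"{first} {last}"
    # Keep only letters and spaces, then collapse repeated whitespace.
    cleaned = "".join(ch for ch in name.lower() if ch.isalpha() or ch.isspace())
    return " ".join(cleaned.split())


def cross_reference(delinquents, employees):
    """Return employees whose normalized name also appears in the delinquent list."""
    owed = {normalize(d["owner"]): d for d in delinquents}
    matches = []
    for emp in employees:
        key = normalize(emp["name"])
        if key in owed:
            matches.append({
                "name": emp["name"],
                "title": emp["title"],
                "amount_owed": owed[key]["amount_owed"],
            })
    return matches


# Hypothetical sample rows, not real records.
delinquents = [
    {"owner": "DOE, JANE", "amount_owed": 4200.50},
    {"owner": "Roe, Richard", "amount_owed": 980.00},
]
employees = [
    {"name": "Jane Doe", "title": "Council member"},
    {"name": "John Public", "title": "Clerk"},
]

hits = cross_reference(delinquents, employees)
for hit in hits:
    print(hit["name"], hit["title"], hit["amount_owed"])
```

Every hit still needs a phone call or a records check before it goes anywhere near a story.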

But we are still far behind. The people we cover in business and sports, for instance, are using cutting edge tools, and we're sitting in the corner with stone arrowheads trying to keep up.

Now, to our programmers in the room...have you ever looked for apps about New Jersey?

###new slide

The other day, I searched iTunes for iPhone apps with the word "Jersey."

You can do better. And we can do better together.

So I think we can all look for some inspiration in many ways to the old hacker mentality.

###new slide

Let's take something apart, figure out how it works and find a way we can make it better

Let's take the news apart. Let's take our data reporting apart. Let's take our websites apart and find ways we can improve them. In an industry that many days feels like it's falling apart, this is incredibly refreshing.

The old joke is that there are three core virtues of a programmer. First, laziness and impatience.

I don't want to do dull, repetitive tasks if I can possibly avoid them. I will invest a significant amount of time and energy on the front end to make my life easier in the long run.

###new slide Hubris

Any problem can be solved with enough code.

Compare to the newspaper world where the unofficial motto in some newsrooms could be "I will do everything the hardest way possible to avoid learning something new."

Even though this is hard stuff, learning new skills, for this weekend, let's forget about the inverted pyramid and think about new ways to communicate the news in this amazing new medium.

Even if we can't code, as journalists, we are resourceful and we bring a lot to this endeavor.

  • We know how to talk to people and get information that's otherwise impossible to find.
  • We have a good sense of what's engaging, what's significant.
  • These are all strengths that we can bring to developing news applications and it can help us work with our programmer friends to make something new.

Which brings us to this hackathon.

  • What is our goal?
  • We're going to spend 24 hours bringing our mutual strengths together to build something better. Usually this is about building something with utility or novelty, but what’s unique about Hack Jersey is that it’s about making something meaningful.

The end result should be a product that tells a great NJ story using open data, and that hopefully leads to more open data and greater transparency for NJ.

What's the raw material of news applications, software as journalism, as Scott Klein of ProPublica likes to call it?

###new slide Data + news judgment + programming

In reporting, we use data all the time. Some of it comes from an API or a CSV, if we're lucky. A lot of it comes from PDFs or from painstakingly reporting out each record by hand ourselves.

Hoboken has an Open Gov portal. NJ has lots of data in various states, but too much of it is in bad layouts in bad file formats. The best we can get are some Excel spreadsheets. More often than not, the data is stuck in PDFs. For programmers, this means a heavy chore of scraping, OCR'ing and cleaning data before you can even begin to analyze or visualize it.
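To give a sense of what that cleaning chore involves, here is a rough sketch in Python using only the standard library. The column names and the sample rows are invented for illustration. Real exports will be messy in their own ways, so treat this as one possible starting point.

```python
import csv
import io

# Hypothetical export with stray spaces, currency formatting, a blank row and missing values.
RAW = """Municipality ,Total Payroll,Employees
 Hoboken,"$1,234,567.00",312
NEWARK ,"$9,876,543.21", 1045
,,
Trenton,n/a,
"""


def parse_money(text):
    """Turn '$1,234.50' into 1234.5, or return None if it is not a number."""
    text = text.strip().replace("$", "").replace(",", "")
    try:
        return float(text)
    except ValueError:
        return None


def parse_int(text):
    """Turn ' 1045' into 1045, or return None if it is blank or not a whole number."""
    text = text.strip()
    return int(text) if text.isdigit() else None


def clean_rows(raw_csv):
    """Strip whitespace, skip blank rows and convert the numeric columns."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    cleaned = []
    for row in reader:
        # Header names and values can both carry stray whitespace.
        row = {key.strip(): (value or "").strip() for key, value in row.items()}
        if not any(row.values()):
            continue  # a row of nothing but commas
        cleaned.append({
            "municipality": row["Municipality"].title(),
            "total_payroll": parse_money(row["Total Payroll"]),
            "employees": parse_int(row["Employees"]),
        })
    return cleaned


rows = clean_rows(RAW)
for row in rows:
    print(row)
```

Values that cannot be parsed come back as `None` instead of being guessed at, so the gaps stay visible when you analyze the data.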

On our website, we compiled a list of more than 50 data sets that have some kernel of news value in them, http://www.hackjersey.com/public-data/. You'll see on there that many of them are in dire need of some love before they can be really hacked on...torn apart, fused with other data and given some meaning. Some of you have brought data, and others have found some to use. And our next speaker, Marc Pfeiffer, is going to talk a little bit about why it's so hard to get good data from government agencies.

What we really could use is some computer assisted gov't.

In the future, Hack Jersey will have events to clean data and put it in formats that are more useful, but today is about building. For today, let's take what we can use and make something new.

But once you have that data, what do you do with it?

We want to use some of the techniques of programming and web development to make our data stories better.

There are tons of examples. Matt Ericson is going to show you some of the absolute best data visualization in news being done anywhere after me, so I'll leave that to him. But I want to show a few examples of using data to tell stories in very different ways.

###new slide The Boy Scouts' 'perversion files'

  • 1300 unique records, many dupes.
  • On 26 pages.
  • No downloads, No searching, No sorting.
  • Nominally, court-ordered public data that was useful to just about no one. There was a captcha.
  • The site crashed in 15 minutes and that may be a generous estimate. But it was long enough for us to grab 3 pages and write a scraper. So when the site came back up, we scraped out all the data, grabbed links to the original reports and had it back up online for all of our papers to search and use, for reporting or on their websites.
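For those curious what a scraper like that boils down to, here is a simplified sketch in Python. It is not the code we actually wrote. It assumes each page holds a plain HTML table, and the two sample pages below are invented stand-ins for pages you would normally download first. It pulls the rows out of every page and drops the duplicates.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows made of <th> cells end up empty and are skipped
                self.rows.append(tuple(cell.strip() for cell in self._row))
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data


def scrape_pages(pages):
    """Parse every page and keep each distinct row once, in first-seen order."""
    seen = set()
    records = []
    for html in pages:
        parser = TableRowParser()
        parser.feed(html)
        for row in parser.rows:
            if row not in seen:
                seen.add(row)
                records.append(row)
    return records


# Hypothetical pages; the second repeats a record from the first.
PAGES = [
    "<table><tr><th>Name</th><th>Year</th></tr>"
    "<tr><td>Record A</td><td>1971</td></tr>"
    "<tr><td>Record B</td><td> 1984 </td></tr></table>",
    "<table><tr><td>Record B</td><td>1984</td></tr>"
    "<tr><td>Record C</td><td>1990</td></tr></table>",
]

records = scrape_pages(PAGES)
print(len(records), "unique records")
```

In practice the hard part is usually getting the pages at all, as it was here. Once they are saved locally, the parsing can be rerun as often as needed without touching the source site.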

###new slide Sitegeist, Sunlight Foundation's mobile app http://sitegeist.sunlightfoundation.com/

Census data, FEC data, weather data, polluted spots. Within 3.8 miles of my house is a Superfund site! It's location aware. It's fantastic. This is mostly scraping and hitting APIs to combine tons of data and send it back to me. They also have their OpenStates app, which scrapes state legislature websites and gives me geolocated data on the legislators for the district I'm standing in, how they vote and how they raise money.
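The "what's near me" part of an app like that comes down to measuring distance between two points. Here is a small sketch in Python using the haversine formula. I don't know how Sitegeist does it internally, so this is just one common approach, and the coordinates and site names are invented for illustration.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def miles_between(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points (haversine)."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))


def sites_within(here, sites, radius_miles):
    """Return (distance, name) pairs for sites inside the radius, nearest first."""
    nearby = []
    for site in sites:
        distance = miles_between(here[0], here[1], site["lat"], site["lon"])
        if distance <= radius_miles:
            nearby.append((round(distance, 1), site["name"]))
    return sorted(nearby)


# Hypothetical location and sites, not real Superfund records.
HERE = (40.7357, -74.1724)
SITES = [
    {"name": "Hypothetical Site A", "lat": 40.7500, "lon": -74.1600},
    {"name": "Hypothetical Site B", "lat": 41.2000, "lon": -74.6000},
]

for distance, name in sites_within(HERE, SITES, radius_miles=5):
    print(f"{name}: {distance} miles away")
```

For a few thousand points, checking every site like this is fast enough. A much larger data set would probably call for a spatial index or a database with geographic queries.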

So this helps us collect the data. But what we really want to do is automate the data collection.

Let's take this to the next step, and let me talk for a second about Ben Welsh, the guy whose joke I forked earlier. He has built what he calls his robot reporters: scripts that cull police arrest reports and alert him when newsworthy people get arrested. And when a USGS alert comes in for an earthquake, another script automatically writes the brief, posts it online and tweets it.
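To make the robot reporter idea concrete, here is a toy version in Python. It is not Ben's code. The alert is a plain dictionary with field names I made up (`magnitude`, `place`, `epoch_seconds`), and the strength labels and the publishing threshold are editorial assumptions that a newsroom would set for itself.

```python
from datetime import datetime, timezone


def describe_strength(magnitude):
    """Map a magnitude to an adjective. The cutoffs are illustrative assumptions."""
    if magnitude >= 6.0:
        return "strong"
    if magnitude >= 4.5:
        return "moderate"
    return "light"


def should_publish(quake, minimum_magnitude=3.0):
    """Only write about quakes at or above a newsroom-chosen threshold."""
    return quake["magnitude"] >= minimum_magnitude


def write_brief(quake):
    """Fill a sentence template from the alert fields."""
    when = datetime.fromtimestamp(quake["epoch_seconds"], tz=timezone.utc)
    strength = describe_strength(quake["magnitude"])
    return (
        f"A {strength} earthquake of magnitude {quake['magnitude']:.1f} struck "
        f"{quake['place']} at {when:%H:%M} UTC on {when:%B %d, %Y}, "
        "according to preliminary data."
    )


# Hypothetical alert, not a real event.
alert = {
    "magnitude": 4.7,
    "place": "10 miles from Exampleville",
    "epoch_seconds": 1359208800,
}

if should_publish(alert):
    brief = write_brief(alert)
    print(brief)
```

The template is where the news judgment lives: a journalist decides what the sentence says and when it is worth publishing, and the script only fills in the blanks.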

###new slide Ghost Factories, http://usatoday30.usatoday.com/news/nation/smelting-lead-contamination

Hundreds of factories that smelted lead, never cleaned up, just covered over and forgotten.

This came from a tip in the American Journal of Public Health 11 years ago. They did soil tests and found contamination. They overlaid Sanborn fire insurance maps on Google Maps. What is now a school can be seen with the smelting furnace on top of it.

"Alex Marciniak remembers his grandmother complaining about the dust that spewed from the smelting plant across the street from her home in Carteret, N.J. When the wind blew toward the modest row houses in their working-class neighborhood, the dust would foul laundry hanging in the yard. It coated cars, blew into houses. It was everywhere.

"We'd have to close all the windows in the house because it was hard to breathe," Marciniak, 43, recalled of his childhood.

The smelting operation for 80 years "spewed forth enormous amounts of contaminating materials," a federal judge concluded in a June 2009 ruling on a lawsuit over the impact of the plant's historical operation on parts of the site it once occupied. "Even after (pollution) controls were put in place, the controls were inadequate, defective, and often non-functional."

USA TODAY built their database by hand. They also used geo data. They invested a lot of time and research into their project and it was one of the best of the year last year.

It's really a master class in how it's done.

What's the key to each of these projects? Using programming techniques to automate parts of the reporting and publishing process, dealing with modest sized (in programming terms) data sets, making it interactive, unlocking it for readers and bringing context to it.

All this data scraping and munging and reporting has to lead to something more than a story, right?

###new slide

Build the product

Emphasize that hackathon projects are about building software, but great products are about much more. They're about clear messaging, simple interface, branding, color psychology, and ultimately the best HackJersey products will tell a great story. This is where these cross-disciplinary teams will really stand out.

You coders understand database technologies, working with open datasets, integrating all of the moving parts, and getting things up and running. That leaves so many other things to do:

  • branding
  • communication
  • color scheme
  • wire frames
  • developing the story
  • working on the presentation
  • registering the domain name
  • setting up social media accounts for the project
  • and folding feedback into the product at every step in the development cycle.

So you both have things to do, but what makes for a good news application? Accountability, Transparency, money and safety.

What is the nut graf? The "So what?" Why does this application matter? Not only how is it useful, but why should I care?

And always keep in mind these coding best practices from ProPublica -

###new slide https://gist.github.com/4075124

This can't be about Snooki and JWoww. Even though the techniques and the medium are totally different, these have to be rooted in our best journalistic values.

###new slide So some pointers for the next 26 hours.

DON'T

  • Give up. If your team isn't working, don't be afraid to find another project that might be a better fit.
  • Get overwhelmed.
  • Try to learn to code. Instead, learn about the process and the key components of a product.
  • One tip for journalists working with coders, from my friend Brian Donohue of Echo: Great coding is like great sleeping. You may sleep for 8 hours, but you hit REM sleep after a certain amount of time. If you get woken up during REM sleep it can feel like you've never slept at all. I really want to make sure journalists are not tapping me every five minutes to discuss something. Instant messaging is key when possible.

###new slide Do's

  • Communicate the roles of each team member clearly, and keep each other updated on tasks and expectations
  • Work together on telling a great story and finding open datasets
  • Collaborate on user interface, design, branding, etc

JOURNOS

  • you're a savvy news consumer. What kind of apps would you use?
  • you're a beat reporter and a subject matter expert. What are the troves of data on your beat that you've never been able to explore or never had the firepower to really mine for possibilities?
  • you know your way around Excel, or maybe even a scripting language. Can you validity check the data and clean it up to prepare it to be brought into the database powering your app?
  • You can help draw mockups and wireframes of what the apps, either web or mobile, should look like. You can weigh in on how it works.
  • You can ask questions: does this idea make sense? Are there logical fallacies underlying it or just shallow reporting that doesn't stand up to scrutiny? You can save a weak idea from itself. Being that nagging skeptical voice saying "I think this data doesn't tell the story we want our app to tell" can be a very important role.
  • You can be an expert user tester. As the prototype is coming together, does it work the way it's supposed to? As a user, try to break it.
  • Make sure that the "content," the information and chatter and explainers are both accurate and compelling.
  • You can help craft the "pitch" for the judges. Whittle it down to something manageable and attractive. Write the elevator pitch and rehearse it.

CODERS

As we all know, it can be very boring to watch someone coding if you don't have any background or experience. It's just the Matrix, but less interesting. But pair programming can be a very powerful learning tool. If you can, especially on Saturday, take your time and try to explain to your teammates who are interested what you're doing and why it matters. You don't have to explain the syntax of the language or the class you're instantiating. You don't have to teach anyone how to code, but you can walk us through the concepts, so when we get back to our newsrooms, we can anticipate the kind of issues and steps we'll want to go through when we want to partner with developers on news projects.

###new slide And perhaps most importantly, have fun. You're giving up too much time this weekend to not learn a bit and have some fun. And if you're not having fun, come see me or Debbie and we'll do whatever we can to help.

###new slide Let's take something apart, figure out how it works and find a way we can make it better and create something new and amazing.

Thank you.
