@JoshMock
Last active December 7, 2015 20:59
How to generate random tweet strings using your Twitter archive
  1. Log into Twitter and go to your settings
  2. Request your Twitter archive
  3. Download and unzip the archive file
  4. Put tweets.csv in the same directory as this gist's package.json and markov-tweets.js files
  5. Run npm install to install dependencies
  6. Run node markov-tweets.js to generate some strings!

Running the script may take a few seconds, depending on the size of your archive CSV. When it finishes, it logs 20 random strings generated from a Markov chain seeded with your tweets (retweets are filtered out).
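For anyone curious what "based on the Markov chain" means in practice, here is a minimal dependency-free sketch of a word-level chain. It is only an illustration and not how the `markov` package is implemented internally; `buildChain`, `generate`, and the sample sentences are hypothetical names and data made up for this example.

```javascript
// Minimal word-level Markov chain (order 1). Illustrative sketch only,
// not the `markov` package's actual implementation.

// Build a map from each word to the list of words observed right after it.
function buildChain(lines) {
  const chain = new Map();
  for (const line of lines) {
    const words = line.split(/\s+/).filter(Boolean);
    for (let i = 0; i < words.length - 1; i++) {
      if (!chain.has(words[i])) chain.set(words[i], []);
      chain.get(words[i]).push(words[i + 1]);
    }
  }
  return chain;
}

// Walk the chain from `start`, choosing a random successor at each step,
// until `limit` words are collected or the current word has no successor.
function generate(chain, start, limit, random = Math.random) {
  const out = [start];
  let current = start;
  while (out.length < limit && chain.has(current)) {
    const next = chain.get(current);
    current = next[Math.floor(random() * next.length)];
    out.push(current);
  }
  return out;
}

// Hypothetical sample input standing in for cleaned tweet text.
const chain = buildChain(['i love coffee', 'i love code', 'coffee makes code']);
console.log(generate(chain, 'i', 5).join(' '));
```

Words that follow a given word more often appear more often in its successor list, so they are picked more often. The optional `random` parameter exists only so the walk can be made deterministic when experimenting.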

var fs = require('fs');
var markov = require('markov');
var parse = require('csv-parse');
var transform = require('stream-transform');
var filter = require('stream-filter');

// generate seed for markov
var seed = fs.createReadStream(__dirname + '/tweets.csv')
  // parse CSV
  .pipe(parse({ columns: true }))
  // filter out RTs
  .pipe(filter(data => data.retweeted_status_id.length === 0))
  // transform CSV into plain text of tweets, doing some text cleanup
  // along the way
  .pipe(transform(function (record, callback) {
    var text = record.text
      // strip out URLs
      .replace(/[-a-zA-Z0-9@:%_\+.~#?&\/\/=]{2,256}\.[a-z]{2,4}\b(\/[-a-zA-Z0-9@:%_\+.~#?&\/\/=]*)?/gi, '')
      // clean punctuation
      .replace(/[\u2018\u2019]/g, "'")
      .replace(/[\u201C\u201D"`“”\[\]]/g, '')
      // decode HTML-escaped ampersands
      .replace(/&amp;/g, '&')
      // drop period from public mentions
      .replace(/\.@/g, '@')
      .toLowerCase();
    callback(null, text + '\n');
  }));

// parse file for markov
var m = markov();
m.seed(seed, function () {
  // generate 20 of 'em to pull out the gems
  for (var i = 0; i < 20; i++) {
    console.log(m.fill(m.pick(), Math.floor(Math.random() * (8 - 4) + 4)).join(' '));
  }
});
{
  "name": "tweets",
  "version": "1.0.0",
  "author": "Josh Mock",
  "license": "ISC",
  "dependencies": {
    "csv-parse": "^1.0.1",
    "markov": "0.0.7",
    "stream-filter": "^1.0.0",
    "stream-transform": "^0.1.1"
  }
}
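To see what the text cleanup in markov-tweets.js does to a single tweet, the replacements can be pulled into a standalone function and tried without an archive. This is a rough sketch: `cleanTweet` is a hypothetical helper that is not part of the gist, the sample tweet is made up, and it assumes the ampersand line is meant to decode the HTML entity `&amp;`.

```javascript
// Hypothetical helper mirroring the cleanup chain from markov-tweets.js.
function cleanTweet(text) {
  return text
    // strip out URLs
    .replace(/[-a-zA-Z0-9@:%_\+.~#?&\/\/=]{2,256}\.[a-z]{2,4}\b(\/[-a-zA-Z0-9@:%_\+.~#?&\/\/=]*)?/gi, '')
    // normalize curly single quotes to straight ones
    .replace(/[\u2018\u2019]/g, "'")
    // drop double quotes (straight and curly), backticks, and square brackets
    .replace(/[\u201C\u201D"`\[\]]/g, '')
    // decode HTML-escaped ampersands (assumed intent of the original line)
    .replace(/&amp;/g, '&')
    // drop period from public mentions
    .replace(/\.@/g, '@')
    .toLowerCase();
}

// Made-up sample tweet for illustration.
console.log(cleanTweet('.@Friend "Rock &amp; Roll" is great https://t.co/abc123'));
```

Note that removing a URL leaves the surrounding whitespace in place, so a URL in the middle of a tweet leaves a double space. That should be harmless for seeding if, as assumed here, the text is later split into words on whitespace.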