I've just put a large chunk of my Twitter archives up on the Talis Platform service. Talis Platform is a 'cloud'-based triplestore hosting service. More at http://n2.talis.com

A triplestore is like a database, but for graphs of RDF triples. The cool thing about RDF and the triplestore is that you basically get a completely schema-less datastore. You don't have to figure out "Oh, there are integers going in this field and strings going in that one". You just upload a big pile of RDF and the triplestore keeps it all there. This is obviously not as efficient as a conventional relational database, so if you want to grow to Google size, it may not be the best solution. But because it's cloud-based, I don't have to think about that either - that's up to Talis! ;)

Turning Twitter data into RDF is pretty easy. The approach I found easiest was to use the Twitter API, which returns either XML or JSON. I used XML because I already had an XSLT stylesheet that does most of the work.

Pre-requisites:

* a Unix-based OS
* curl
* xsltproc
* Ruby 1.8.6+ (or JRuby 1.3.0)
* nokogiri gem

I had old archive data from Twitter, from back when it used the old archive method. In that data, tweets that are at-replies to other tweets only carry the numeric ID of the other user, not their screen name. But the URI of a tweet is constructed from the screen name, so you need to look up each ID using the /users/show.xml?user_id=(val) method to get the screen name. The code to do that is in transform.rb.

transform.rb is a bit of a lazy hack. If you run it over old archive data, it WILL crash. That's because open-uri raises an exception when it gets a 404 status. Silly really, as 404 is a perfectly valid status and is semantically meaningful: returning 404 just means that user doesn't exist, i.e. there is no @tommorris on Twitter. ;) When it hit a 404, I took whatever number it returned, manually grepped for it in the file, figured out who the at-reply was to, and then added that person's details to the YAML file.

The XSLT used is below, but if you want to do this yourself, I recommend waiting a few days.
I'm planning on rewriting the XSLT soon to make it suck less. The stylesheet is twitter-rdf.xsl.

As for actually doing the transformations and loading the results into the Talis store, I used IRB (the interactive Ruby shell) to invoke xsltproc and curl:

    irb> `ls *.xml`.split.each { |i| `xsltproc ~/Code/twitter-rdf.xsl #{i} > #{i.split('.')[0]}.rdf` }
    irb> `ls *.rdf`.split.each { |i| `curl -v --digest -u "(username):(password)" --retry 10 --retry-delay 10 -H "Content-Type: application/rdf+xml" --data-binary @#{i} http://api.talis.com/stores/(storename)/meta` }

Note the --data-binary: when plain --data reads from a file, curl strips carriage returns and newlines before sending, which can corrupt the RDF/XML (for example, attributes or words separated only by a line break get run together).
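If you'd rather not paste one-liners into IRB, the same two steps can be sketched as a small standalone Ruby script. This is an illustrative sketch, not what I actually ran. The script name, the TALIS_USER/TALIS_PASSWORD environment variables and the argument order are all made up for the example. The script passes argument arrays to system, so filenames aren't re-parsed by a shell, and uses xsltproc's -o option instead of a shell redirect. It also replaces only the trailing .xml suffix, so names with extra dots survive, unlike the split('.') trick. Finally, it adds curl's --fail so that an HTTP error from the store stops the run instead of scrolling past.

```ruby
#!/usr/bin/env ruby
# Sketch: convert every *.xml Twitter archive page in the current directory
# to RDF with xsltproc, then POST each result to a Talis Platform store.
# HYPOTHETICAL: script name, env var names and argument order are illustrative.

# foo.xml -> foo.rdf (only the final .xml suffix is replaced)
def rdf_name(xml_file)
  xml_file.sub(/\.xml\z/, '.rdf')
end

# argv for: xsltproc -o foo.rdf STYLESHEET foo.xml
def xslt_command(stylesheet, xml_file)
  ['xsltproc', '-o', rdf_name(xml_file), stylesheet, xml_file]
end

# argv for the upload; --data-binary sends the file untouched,
# --fail makes curl exit non-zero when the server returns an HTTP error.
def upload_command(rdf_file, user, password, store)
  ['curl', '--fail', '--digest', '-u', "#{user}:#{password}",
   '--retry', '10', '--retry-delay', '10',
   '-H', 'Content-Type: application/rdf+xml',
   '--data-binary', "@#{rdf_file}",
   "http://api.talis.com/stores/#{store}/meta"]
end

# Run a command without a shell and stop at the first failure.
# Only the program and file are reported, so the password isn't echoed.
def run!(argv, file)
  return if system(*argv)
  abort "#{argv.first} failed on #{file}"
end

def process_all(stylesheet, store)
  user = ENV.fetch('TALIS_USER')          # hypothetical variable name
  password = ENV.fetch('TALIS_PASSWORD')  # hypothetical variable name
  stylesheet = File.expand_path(stylesheet)
  Dir.glob('*.xml').sort.each do |xml|
    run!(xslt_command(stylesheet, xml), xml)
    run!(upload_command(rdf_name(xml), user, password, store), xml)
  end
end

# Only does anything when run as: ruby twitter_to_talis.rb STYLESHEET STORE
process_all(*ARGV) if __FILE__ == $PROGRAM_NAME && ARGV.size == 2
```

From the directory holding the archive XML, that would be run as something like `TALIS_USER=me TALIS_PASSWORD=secret ruby twitter_to_talis.rb ~/Code/twitter-rdf.xsl mystore`. Unlike the IRB version, it transforms and uploads each file in turn, so a failure points at the exact file that broke.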
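Back to the 404 problem in transform.rb. Rather than letting open-uri's exception kill the run, the ID lookup can rescue OpenURI::HTTPError, treat a 404 as "no such user", and collect those IDs for fixing by hand. What follows is a minimal sketch of that idea, not the code in transform.rb. It assumes the old users/show.xml endpoint lived at http://twitter.com; that API has long since been retired, so treat the URL as illustrative. To keep it self-contained, it parses the response with REXML from the standard library rather than nokogiri. It is also written for a current Ruby, since URI.open needs 2.5+. The fetcher is a parameter so it can be swapped for a stub when testing.

```ruby
require 'open-uri'
require 'rexml/document'

# ASSUMPTION: the old (now retired) Twitter endpoint mentioned in the post.
LOOKUP_URL = 'http://twitter.com/users/show.xml?user_id=%s'

# Default fetcher: HTTP GET via open-uri, returning the response body.
# open-uri raises OpenURI::HTTPError for error statuses such as 404.
DEFAULT_FETCH = ->(url) { URI.open(url, &:read) }

# Screen name for a numeric user ID, or nil if the lookup 404s
# (the account is gone). Any other HTTP error is re-raised.
def lookup_screen_name(user_id, fetch = DEFAULT_FETCH)
  xml = fetch.call(format(LOOKUP_URL, user_id))
  node = REXML::Document.new(xml).elements['user/screen_name']
  node && node.text
rescue OpenURI::HTTPError => e
  raise unless e.io.status.first == '404'
  nil
end

# Resolve many IDs, looking each one up once. Returns [names, missing]:
# names maps ID => screen name; missing lists IDs that 404'd, to be
# sorted out by hand instead of crashing the whole run.
def resolve_ids(ids, fetch = DEFAULT_FETCH)
  names = {}
  missing = []
  ids.uniq.each do |id|
    name = lookup_screen_name(id, fetch)
    if name
      names[id] = name
    else
      missing << id
    end
  end
  [names, missing]
end
```

The sketch doesn't write the results out to the YAML file, since I haven't described that file's structure here. The `missing` list is exactly the set of numbers you would otherwise have grepped for after each crash.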