Skip to content

Instantly share code, notes, and snippets.

@cldellow
cldellow / fetch_page.py
Last active January 29, 2019 15:42 — forked from Smerity/fetch_page.py
An example of fetching a page from Common Crawl using the Common Crawl Index
import gzip
import json
import requests
try:
from cStringIO import StringIO
except:
from StringIO import StringIO
# Let's fetch the Common Crawl FAQ using the CC index
resp = requests.get('http://index.commoncrawl.org/CC-MAIN-2015-27-index?url=http%3A%2F%2Fcommoncrawl.org%2Ffaqs%2F&output=json')
@cldellow
cldellow / LDA_SparkDocs
Last active February 16, 2016 18:40 — forked from jkbradley/LDA_SparkDocs
LDA Example: Modeling topics in the Spark documentation
/*
This example uses Scala. Please see the MLlib documentation for a Java example.
Try running this code in the Spark shell. It may produce different topics each time (since LDA includes some randomization), but it should give topics similar to those listed above.
This example is paired with a blog post on LDA in Spark: http://databricks.com/blog
Spark: http://spark.apache.org/
*/
import scala.collection.mutable