@junaidpv
Created December 26, 2010 06:16
Get a list of short pages on Hindi Wikipedia
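# Python 2 script for the old pywikipedia ("compat") framework.
import wikipedia
import codecs

NEWLINE = '\r\n'  # Windows-style line endings in the output file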
siteFamily = 'wikipedia'
siteLangCode = 'hi'
wikiSite = wikipedia.Site(code=siteLangCode, fam=siteFamily)
# Opened in append mode but never written to; the logs/ directory must already exist.
log = codecs.open('logs/pages-in-cat.log', mode='a+', encoding='utf-8')
listFile = codecs.open('short_pages_in_hindi.txt', mode='w', encoding='utf-8')
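numberOfItems = 100  # number of entries to request from Special:ShortPages
shortPages = wikiSite.shortpages(numberOfItems)
# Each item is a tuple whose first element is the Page; only its title is kept.
listString = ''
for page in shortPages:
    listString = listString + page[0].title() + NEWLINE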
listFile.write(listString)
listFile.close()  # close() also flushes any buffered output
log.close()
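The script depends on the old pywikipedia `wikipedia` module, whose `shortpages()` reads Special:ShortPages. A minimal sketch of the same query without that framework follows. It assumes Python 3, the standard MediaWiki Action API (`action=query&list=querypage&qppage=Shortpages`), and the usual `https://<lang>.<family>.org/w/api.php` endpoint. The function names and the User-Agent string are hypothetical placeholders.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def short_pages_url(lang="hi", family="wikipedia", limit=100):
    """Build an Action API URL that lists Special:ShortPages entries."""
    # Assumption: the wiki serves its API at https://<lang>.<family>.org/w/api.php
    params = {
        "action": "query",
        "list": "querypage",
        "qppage": "Shortpages",
        "qplimit": limit,
        "format": "json",
    }
    return "https://%s.%s.org/w/api.php?%s" % (lang, family, urlencode(params))


def titles_from_response(data):
    """Return (title, length) pairs from a decoded querypage JSON response."""
    results = data.get("query", {}).get("querypage", {}).get("results", [])
    # "value" holds the page length; int() accepts it as a string or a number.
    return [(item["title"], int(item["value"])) for item in results]


def fetch_short_pages(lang="hi", family="wikipedia", limit=100,
                      user_agent="short-pages-sketch/0.1 (hypothetical contact)"):
    """Fetch and parse the list; needs network access."""
    # Wikimedia asks API clients to send a descriptive User-Agent header.
    request = Request(short_pages_url(lang, family, limit),
                      headers={"User-Agent": user_agent})
    with urlopen(request) as response:
        return titles_from_response(json.load(response))
```

With network access, `for title, length in fetch_short_pages(): print(title)` should print the titles that the original script writes to its output file.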
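In Python 2, `codecs.open` writes `'\r\n'` to the file exactly as given. A Python 3 sketch of the same output step is shown below. It assumes the titles are already available as strings, and the function name `write_title_list` is hypothetical. It uses `with` so the file is closed even if writing fails, and `newline=""` so the CRLF terminator is not translated a second time on Windows.

```python
import io


def write_title_list(path, titles, newline="\r\n"):
    """Write each title followed by `newline` to a UTF-8 text file."""
    # newline="" turns off newline translation, so "\r\n" is written verbatim.
    with io.open(path, mode="w", encoding="utf-8", newline="") as out:
        out.write("".join(title + newline for title in titles))
```

Building the whole string with `"".join(...)` matches the script's single `write()` call but avoids repeated string concatenation in the loop.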