@format37
Created March 10, 2021 18:52
epub word counter
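# Thanks to https://github.com/aerkalov/ebooklib
# Requires: pip install EbookLib beautifulsoup4
import ebooklib
from ebooklib import epub
from bs4 import BeautifulSoup
from collections import Counter

TOP_N = 100  # how many of the most frequent words to print

book = epub.read_epub('ready_player_two.epub')

# Collect the visible text of every XHTML document in the book.
content = ''
for item in book.get_items():
    if item.get_type() == ebooklib.ITEM_DOCUMENT:
        # Name the parser explicitly to avoid bs4's "no parser specified" warning.
        soup = BeautifulSoup(item.get_content(), 'html.parser')
        # The trailing space keeps words from adjacent documents from fusing.
        content += soup.get_text() + ' '

# split() with no arguments splits on any whitespace, newlines included.
for word, count in Counter(content.split()).most_common(TOP_N):
    print(count, word)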
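
The script above counts raw whitespace-separated tokens, so "The", "the" and "the," are tallied as three different words. A minimal sketch of one way to normalize the text before counting follows. The helper name `top_words` and its regular expression are assumptions for illustration, not part of the original gist.

```python
import re
from collections import Counter


def top_words(text, n=100):
    """Return the n most common words in text, ignoring case and punctuation.

    Hypothetical helper, not part of the original gist.
    """
    # Runs of letters, optionally joined by apostrophes, so "don't" stays whole.
    # [^\W\d_] matches any Unicode letter (a word character that is neither
    # a digit nor an underscore).
    words = re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)*", text.lower())
    return Counter(words).most_common(n)
```

With the gist's variables this could be called as `for word, count in top_words(content): print(count, word)`. Numbers are dropped and hyphenated words are split into their parts under this pattern, which may or may not be what you want for a given book.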