Examples around NLTK stemming
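# word_tokenize relies on NLTK's Punkt tokenizer data. If it is missing, download it
# once with nltk.download('punkt') (newer NLTK releases may ask for 'punkt_tab' instead).
from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer, SnowballStemmer, LancasterStemmer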
porter = PorterStemmer()
snowball = SnowballStemmer(language='english')
lanc = LancasterStemmer()
sentence_example = (
    'This is definitely a controversy as the attorney labeled the case "extremely controversial"'
)
# Porter Stemmed version of sentence example
stemmed_sentence = [
    porter.stem(word) for word in word_tokenize(sentence_example)
]
# Examples of single words used in the post:
porter.stem('cats')
porter.stem('amazing')
porter.stem('amazement')
porter.stem('amaze')
porter.stem('amazed')
porter.stem('amazon')
porter.stem('nation')
porter.stem('premonition')
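# Illustrative sketch, not part of the original gist: group the sample words above by
# their Porter stem to see which of them collapse onto the same root.
# `group_by_stem`, `sample_words` and `stem_groups` are hypothetical names introduced
# here for illustration only.
from collections import defaultdict
from nltk.stem import PorterStemmer


def group_by_stem(words, stemmer):
    """Map each stem to the list of input words that reduce to it."""
    groups = defaultdict(list)
    for word in words:
        groups[stemmer.stem(word)].append(word)
    return dict(groups)


sample_words = [
    'cats', 'amazing', 'amazement', 'amaze', 'amazed',
    'amazon', 'nation', 'premonition',
]
stem_groups = group_by_stem(sample_words, PorterStemmer())
for stem, members in stem_groups.items():
    print(f'{stem!r}: {members}')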
# Comparison between Porter and Snowball
porter.stem('loudly')
snowball.stem('loudly')
# Comparison between Snowball and Lancaster
snowball.stem('salty')
lanc.stem('salty')
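# Illustrative sketch, not part of the original gist: run the same words through all
# three stemmers and print the results side by side, so the differences between the
# algorithms are easier to spot than with one call per line.
# `compare_stemmers`, `stemmers` and `comparison` are hypothetical names introduced here.
from nltk.stem import PorterStemmer, SnowballStemmer, LancasterStemmer

stemmers = {
    'porter': PorterStemmer(),
    'snowball': SnowballStemmer(language='english'),
    'lancaster': LancasterStemmer(),
}


def compare_stemmers(words, stemmers):
    """Return {word: {stemmer_name: stem}} for every word and stemmer."""
    return {
        word: {name: stemmer.stem(word) for name, stemmer in stemmers.items()}
        for word in words
    }


comparison = compare_stemmers(['loudly', 'salty'], stemmers)
for word, stems in comparison.items():
    print(word, stems)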
# Longer example text: a definition of the European Union
eu_definition = '''
The European Union (EU) is a political and economic union of 27 member states that are located primarily in Europe.
Its members have a combined area of 4,233,255.3 km2 (1,634,469.0 sq mi) and an estimated total population of about 447 million.
The EU has developed an internal single market through a standardised system of laws that apply in all member states in those matters,
and only those matters, where members have agreed to act as one. EU policies aim to ensure the free movement of people, goods,
services and capital within the internal market; enact legislation in justice and home affairs; and maintain common policies on trade,
agriculture, fisheries and regional development. Passport controls have been abolished for travel within the Schengen Area.
A monetary union was established in 1999, coming into full force in 2002, and is composed of 19 EU member states which use the euro
currency. The EU has often been described as a sui generis political entity (without precedent or comparison).
The EU and European citizenship were established when the Maastricht Treaty came into force in 1993.
The EU traces its origins to the European Coal and Steel Community (ECSC) and the European Economic Community (EEC), established,
respectively, by the 1951 Treaty of Paris and 1957 Treaty of Rome. The original members of what came to be known as the European
Communities were the Inner Six: Belgium, France, Italy, Luxembourg, the Netherlands, and West Germany. The Communities and their
successors have grown in size by the accession of new member states and in power by the addition of policy areas to their remit.
The United Kingdom became the first member state to leave the EU on 31 January 2020. Before this, three territories of member states
had left the EU or its forerunners. The latest major amendment to the constitutional basis of the EU, the Treaty of Lisbon,
came into force in 2009.
'''
# Tokenizing and Stemming the eu_definition
tokenized_eu = word_tokenize(eu_definition)
porter_eu = [porter.stem(word) for word in tokenized_eu]
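# Illustrative sketch, not part of the original gist: one way to gauge what stemming
# does to a text is to compare the number of distinct tokens before and after stemming.
# `vocabulary_sizes` and `demo_tokens` are hypothetical names. The demo uses a small
# hand-written token list so it runs without tokenizer data; the same function could be
# called with tokenized_eu and porter from above.
from nltk.stem import PorterStemmer


def vocabulary_sizes(tokens, stemmer):
    """Return (distinct lowercased tokens, distinct stems) for a token list."""
    lowered = [token.lower() for token in tokens]
    distinct_tokens = set(lowered)
    distinct_stems = {stemmer.stem(token) for token in lowered}
    return len(distinct_tokens), len(distinct_stems)


demo_tokens = ['member', 'members', 'state', 'states', 'established', 'establish']
raw_size, stemmed_size = vocabulary_sizes(demo_tokens, PorterStemmer())
print(f'distinct tokens: {raw_size}, distinct stems: {stemmed_size}')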