@NielsMinssen
Created July 19, 2022 11:33
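import spacy
import nltk
from nltk import FreqDist

# Load the French spaCy pipeline, which provides the lemmatizer
# (the model can be installed with: python -m spacy download fr_core_news_md)
nlp = spacy.load('fr_core_news_md')
# clean_words is assumed to be a list of token strings produced by an earlier cleaning step
# Run the pipeline on the space-joined tokens and collect the lemma of each token
doc = nlp(" ".join(clean_words))
clean_words_lem = [token.lemma_ for token in doc]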
# Print the k most common n-grams in the text; here, the 100 most frequent trigrams
n = 3
k = 100
ngram = list(nltk.ngrams(clean_words_lem, n))
fdist = FreqDist(ngram)
print(fdist.most_common(k))
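
To make the counting step above easier to follow, here is a minimal standard-library sketch of what building n-grams and ranking them by frequency amounts to. It is an illustration only, not how nltk implements it: `count_ngrams` and the `sample` token list are hypothetical names introduced for this example.

```python
from collections import Counter


def count_ngrams(tokens, n):
    """Count every run of n consecutive tokens (hypothetical helper, not part of nltk)."""
    # Zipping n staggered views of the list yields each n-gram as a tuple;
    # zip stops at the shortest view, so no partial n-grams are produced.
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(grams)


# Made-up lemma list standing in for clean_words_lem
sample = ["le", "chat", "dort", "le", "chat", "mange"]
top = count_ngrams(sample, 2).most_common(1)
print(top)
```

`Counter.most_common(k)` plays the same role here as `FreqDist.most_common(k)` in the snippet: it returns the k most frequent n-grams with their counts.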