@MarshalW
Last active February 27, 2022 08:17
Splitting English text into sentences with nltk

You need to install nltk and the nltk data packages:

$ pip install --user -U nltk
$ python -m nltk.downloader popular
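
To confirm that the download step worked, a quick check along these lines may help. This is a minimal sketch, not part of the original gist: the function name `found_punkt_resources` is hypothetical, and the two resource names are an assumption (older nltk releases load the sentence model from `tokenizers/punkt`, while newer ones look for `tokenizers/punkt_tab`).

```python
import importlib.util

# Assumption: which of these is required depends on the installed nltk version.
PUNKT_RESOURCES = ("tokenizers/punkt", "tokenizers/punkt_tab")


def found_punkt_resources():
    """Return the Punkt resource names that nltk can locate on this machine."""
    # find_spec returns None when the nltk package itself is not installed.
    if importlib.util.find_spec("nltk") is None:
        return []

    import nltk

    found = []
    for resource in PUNKT_RESOURCES:
        try:
            # nltk.data.find raises LookupError when the resource is missing.
            nltk.data.find(resource)
            found.append(resource)
        except LookupError:
            pass
    return found


if __name__ == "__main__":
    print("Punkt resources found:", found_punkt_resources())
```

An empty list suggests that either nltk or its tokenizer data is missing, in which case rerunning the two commands above is the first thing to try.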

Code:

#!/usr/bin/env python3

from nltk.tokenize import sent_tokenize

# Sample passage to split; named `text` to avoid shadowing the built-in `str`.
text = """
Last week I went to the theatre. I had a very good seat. The play was very interesting. I did not enjoy it. A young man and a young woman were sitting behind me. They were talking loudly. I got very angry. I could not hear the actors. I turned round. I looked at the man and the woman angrily. They did not pay any attention. In the end, I could not bear it. I turned round again. 'I can't hear a word!' I said angrily.
"""

# sent_tokenize returns a list of sentence strings.
sentences = sent_tokenize(text)

# Print each sentence followed by a blank line.
for sent in sentences:
    print(sent)
    print('')

Output:

$ ./demo.py

Last week I went to the theatre.

I had a very good seat.

The play was very interesting.

I did not enjoy it.

A young man and a young woman were sitting behind me.

They were talking loudly.

I got very angry.

I could not hear the actors.

I turned round.

I looked at the man and the woman angrily.

They did not pay any attention.

In the end, I could not bear it.

I turned round again.

'I can't hear a word!'

I said angrily.
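
In the output above, the quoted exclamation and its attribution ("I said angrily.") come out as two separate sentences. If that is not what you want, one option is to post-process the list. The following is only a heuristic sketch: `merge_quote_attributions`, `SPEECH_VERBS`, and the `max_words` threshold are hypothetical names and values chosen for illustration, not part of nltk.

```python
import re

# Hypothetical, deliberately small list of reporting verbs; extend as needed.
SPEECH_VERBS = {"said", "asked", "replied", "shouted", "answered"}

# Matches a sentence ending in ., ! or ? followed by a closing quote mark.
CLOSING_QUOTE = re.compile(r"[.!?]['\"]$")


def merge_quote_attributions(sentences, max_words=6):
    """Join a short attribution such as "I said angrily." onto a preceding quote."""
    merged = []
    for sent in sentences:
        words = sent.rstrip(".!?").lower().split()
        # Treat short sentences containing a reporting verb as attributions.
        is_attribution = (
            len(words) <= max_words and any(w in SPEECH_VERBS for w in words)
        )
        if merged and is_attribution and CLOSING_QUOTE.search(merged[-1]):
            merged[-1] = merged[-1] + " " + sent
        else:
            merged.append(sent)
    return merged


sentences = [
    "I turned round again.",
    "'I can't hear a word!'",
    "I said angrily.",
]
for sent in merge_quote_attributions(sentences):
    print(sent)
```

A rule like this will miss attributions that use other verbs and may merge sentences that should stay apart, so treat it as a starting point to adapt to your own text.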