@pszemraj
Created January 31, 2022 05:34
How to use the rpunct pip package to restore punctuation in text (after the CPU fix has been implemented)
"""
usage example:
    from scratch.rpunct.rpunct import RestorePuncts

    rpunct_fixer = RestorePuncts()
    bot_resp = repunctuate_grammar(
        original_text, rpunct_obj=rpunct_fixer, verbose=True
    )
"""
import re
import time


def repunctuate_grammar(input_text, rpunct_obj, verbose=False):
    """
    repunctuate_grammar - uses the rpunct module to repunctuate a string after stripping all existing punctuation except apostrophes

    Args:
        input_text (str): string to be repunctuated
        rpunct_obj (RestorePuncts): rpunct RestorePuncts instance
        verbose (bool, optional): whether to print the input, the output, and the runtime. Defaults to False.

    Returns:
        str: repunctuated string
    """
    if verbose:
        print(f"repunctuating:\n\t{input_text}")
    # strip all punctuation from the input text, except for apostrophes
    input_text = re.sub(r"[^\w\s\']", "", input_text)
    # time the model call only, not the regex cleanup
    st = time.perf_counter()
    ptext = rpunct_obj.punctuate(input_text, lang="en")
    rt = time.perf_counter() - st
    if verbose:
        print(f"the new string is:\n\t{ptext}")
        print("repunctuation took {} seconds".format(rt))
    return ptext
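
The punctuation-stripping step can be checked on its own, without downloading the rpunct model. The sketch below is only an illustration: `EchoPunctuator` and `strip_punctuation` are hypothetical names introduced here, not part of rpunct. The stub mimics the `punctuate(text, lang=...)` call used above but restores nothing, so the printed text is exactly what the model would receive.

```python
import re


class EchoPunctuator:
    """Hypothetical stand-in for RestorePuncts; NOT part of rpunct."""

    def punctuate(self, text, lang="en"):
        # A real model would restore punctuation and casing here.
        # This stub returns its input unchanged so the cleaning step is visible.
        return text


def strip_punctuation(text):
    """Apply the same regex as repunctuate_grammar: keep word characters,
    whitespace, and apostrophes; drop everything else."""
    return re.sub(r"[^\w\s\']", "", text)


stripped = strip_punctuation("hello, world! don't stop... ok?")
result = EchoPunctuator().punctuate(stripped, lang="en")
print(result)  # hello world don't stop ok
```

Note that `\w` also matches underscores and digits, so those survive the cleanup along with apostrophes.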