Created December 22, 2014 03:35
# Code is copyright Timothy Clemans, released as free open-source software under the GPL
The Python code to redact every capitalized word, every word containing a digit, and any word on a list of sensitive terms (for example, terms related to a SWAT team) is:
import re

narrative = "This is a police narrative. This birthday 8/9/2014 will get redacted. This social security number 553-55-5555 will get redacted. This medication Xanax will get redacted."
words = narrative.split()
replacement_words = []
# Sensitive words to redact even when they are lowercase and contain no digits
words_to_remove = ['robot']
for word in words:
    if word[0].isupper():
        # Any capitalized word, including sentence-initial words such as "This"
        replacement_words.append('**R:capitalized**')
    elif re.search(r'\d', word):
        # Any word containing a digit (dates, social security numbers, and so on)
        replacement_words.append('**R:number**')
    elif word in words_to_remove:
        replacement_words.append('**R:restrictedword**')
    else:
        replacement_words.append(word)
# Parenthesized form works under both Python 2 and Python 3
print(" ".join(replacement_words))
Output:
**R:capitalized** is a police narrative. **R:capitalized** birthday **R:number** will get redacted. **R:capitalized** social security number **R:number** will get redacted. **R:capitalized** medication **R:capitalized** will get redacted.
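
One limitation of the script above is that the restricted-word check compares each whitespace-separated token exactly, so a listed word followed by punctuation (such as "robot.") would pass through unredacted. The following is a minimal sketch of one way to handle that, not part of the original code: it wraps the same three rules in a function and strips surrounding punctuation before the restricted-word comparison. The function name `redact`, the parameter `restricted_words`, and the case-insensitive matching are assumptions made for illustration.

```python
import re
import string


def redact(narrative, restricted_words=()):
    """Apply the same three redaction rules, tolerating punctuation on restricted words.

    `redact` and `restricted_words` are hypothetical names for this sketch.
    """
    # Assumption: restricted words are matched case-insensitively.
    restricted = {w.lower() for w in restricted_words}
    redacted = []
    for word in narrative.split():
        # Strip leading/trailing punctuation so "robot." still matches "robot".
        core = word.strip(string.punctuation)
        if word[0].isupper():
            redacted.append('**R:capitalized**')
        elif re.search(r'\d', word):
            redacted.append('**R:number**')
        elif core.lower() in restricted:
            redacted.append('**R:restrictedword**')
        else:
            redacted.append(word)
    return " ".join(redacted)


print(redact("Officers deployed the robot. The robot entered at 9:15.", ["robot"]))
# -> **R:capitalized** deployed the **R:restrictedword** **R:capitalized** **R:restrictedword** entered at **R:number**
```

As in the original script, a redacted token replaces the whole word, so any punctuation attached to it is dropped along with it.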