@numpde
Last active November 9, 2017 03:56
Segment extraction and expansion by words
#!/usr/bin/python3
# CC-BY-4.0
import sys, argparse
from random import shuffle, randrange, choice
# Parse arguments first, so that -h prints the help without waiting for standard input
parser = argparse.ArgumentParser(description='Generate a segment of certain word length from standard input.')
parser.add_argument('--length', type=int, help='contiguous segment length in words (all by default)')
parser.add_argument('--resample', type=int, help='resample from segment to get a text of this length')
parser.add_argument('--shuffle', action="store_true", help='shuffle output')
args = parser.parse_args()
# Collate lines, separating them by a space
s = ' '.join(sys.stdin.readlines())
# Keep only alphanumeric characters and spaces (drops punctuation)
s = "".join(c for c in s if (c.isalnum() or (c == ' ')))
# Split into a list of words
s = s.split()
if args.length:
    L = args.length
    assert (L <= len(s)), "Not enough words to generate segment of desired length."
    # Pick a uniformly random start index for the contiguous segment
    i = randrange(0, len(s)-L+1)
    s = s[i:(i+L)]
if args.resample:
    # Sample words from the segment with replacement
    s = [choice(s) for _ in range(args.resample)]
if args.shuffle:
    shuffle(s)
# Collate the list of words, separating by a space
s = ' '.join(s)
print(s)

numpde commented Apr 12, 2017

Help
python3 rndseg.py -h

Extract words
echo "A, B? C D." | python3 rndseg.py
A B C D
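
The word extraction step can be sketched as a standalone function. This is a minimal illustration of the filtering done in the script; the name `extract_words` is hypothetical and not part of the gist. Note that punctuation is deleted rather than replaced by a space, so a word with internal punctuation is fused into one token.

```python
def extract_words(text):
    """Return the words of `text`, keeping only alphanumeric characters."""
    # Drop everything except letters, digits and spaces, as the script does
    cleaned = "".join(c for c in text if c.isalnum() or c == " ")
    # split() without arguments also collapses runs of spaces
    return cleaned.split()


print(extract_words("A, B? C D."))  # ['A', 'B', 'C', 'D']
print(extract_words("don't stop"))  # ['dont', 'stop']
```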

Extract and shuffle words
echo "A, B? C D." | python3 rndseg.py --shuffle
D B C A

Extract a random contiguous segment of 3 words
echo "A, B? C D." | python3 rndseg.py --length=3
B C D

Extract a random contiguous segment of 3 words; shuffle words
echo "A, B? C D." | python3 rndseg.py --length=3 --shuffle
B C A

Extract a random contiguous segment of 3 words; shuffle words (same as above, chaining two invocations)
echo "A, B? C D." | python3 rndseg.py --length=3 | python3 rndseg.py --shuffle
B C A

Extract a random contiguous segment of 3 words; generate a text of 10 words by sampling words from the segment
echo "A, B? C D." | python3 rndseg.py --length=3 --resample=10
B B B B B A C A C C
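
The combination of `--length` and `--resample` can be sketched as two small functions. This is only an illustration of the two steps under stated assumptions: the names `random_segment` and `resample` are hypothetical, and the fixed seed is there only to make a run repeatable; the script itself uses the unseeded module-level generator.

```python
from random import Random


def random_segment(words, length, rng):
    """Return a uniformly chosen contiguous run of `length` words."""
    if length > len(words):
        raise ValueError("Not enough words for a segment of this length.")
    # There are len(words) - length + 1 possible start positions
    start = rng.randrange(0, len(words) - length + 1)
    return words[start:start + length]


def resample(words, n, rng):
    """Draw `n` words from `words` independently, with replacement."""
    return [rng.choice(words) for _ in range(n)]


rng = Random(0)  # fixed seed only to make the run repeatable (assumption)
segment = random_segment(["A", "B", "C", "D"], 3, rng)
text = resample(segment, 10, rng)
print(" ".join(segment))
print(" ".join(text))
```

Because the words are drawn with replacement, the resampled text can be longer than the segment, and it need not contain every word of the segment.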
