Skip to content

Instantly share code, notes, and snippets.

@sidequestboy
Last active August 29, 2015 13:56
Show Gist options
  • Save sidequestboy/9205174 to your computer and use it in GitHub Desktop.
Save sidequestboy/9205174 to your computer and use it in GitHub Desktop.
Little script that takes user input, and on the fly prints the corresponding LZ78 codeword pairs
#!/usr/bin/env python3
"""
LZ78 toy, reads characters from stdin and encodes on the fly.
"""
from sys import stdout
"""import getch, or define my own"""
try:
from msvcrt import getch
except ImportError:
"""Not on Windows, try Unix-like approach"""
def getch():
import sys, tty, termios
fd = sys.stdin.fileno()
old_settings = termios.tcgetattr(fd)
try:
tty.setraw(fd)
ch = sys.stdin.read(1)
finally:
termios.tcsetattr(fd, termios.TCSADRAIN, old_settings)
return ch
ch = ""
current_str = ""
# s is a set of seen strings
s = {""}
# d is dict of seen strings to their index
d = {"": 0}
i = 1
while True:
# loop until we see new pattern
while current_str in d:
# add another character
ch = getch()
current_str += ch
# new pattern
if ch not in ('', ''):
s.add(current_str)
d[current_str] = i
print("({}, {})".format(d[current_str[:-1]], ch), end='')
stdout.flush()
current_str = ""
i += 1
else:
# terminate
if ch == '':
print("({}, EOF)".format(d[current_str[:-1]]), end='')
else: # ch == ''
print("({}, ^C)".format(d[current_str[:-1]]), end='')
print("")
exit(0)
@sidequestboy
Copy link
Author

Usage:

python lz78.py

then type some characters, and watch the LZ78 codepairs be generated :D

Note:

  • This script uses the utf-8 characters for EOT and ETX to handle Ctrl-D, Ctrl-C (Github Gist doesn't render them...)
  • It also uses Python 3 syntax for the end kwarg to the print function, so use Python 3.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment