@brsc2909
Created July 17, 2022 14:40
State-of-the-art compression based on emojis
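import gzip
import re

import emoji


def gzip_str(string_: str) -> bytes:
    """Gzip-compress the UTF-8 encoding of a string."""
    return gzip.compress(string_.encode("utf-8"))


def ezip_str(string_: str) -> bytes:
    """Replace every word that names an emoji with that emoji, then gzip.

    Splitting on non-word characters discards punctuation and line breaks,
    and matched words are lowercased, so the result is lossy.
    """
    compressed = []
    for word in re.split(r"\W+", string_):
        # emojize() leaves ":name:" unchanged when no emoji has that name.
        e = emoji.emojize(f":{word.lower()}:")
        if emoji.is_emoji(e):
            compressed.append(e)
        else:
            compressed.append(word)
    return gzip_str(" ".join(compressed))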
mystr = """
A ZIP file is correctly identified by the presence of an end of central directory record which is located at the end of the archive structure in order to allow the easy appending of new files.
If the end of central directory record indicates a non-empty archive, the name of each file or directory within the archive should be specified in a central directory entry,
along with other metadata about the entry, and an offset into the ZIP file, pointing to the actual entry data.
This allows a file listing of the archive to be performed relatively quickly, as the entire archive does not have to be read to see the list of files.
The entries within the ZIP file also include this information, for redundancy, in a local file header. Because ZIP files may be appended to, only files specified in
the central directory at the end of the file are valid. Scanning a ZIP file for local file headers is invalid (except in the case of corrupted archives), as the central
directory may declare that some files have been deleted and other files have been updated.
"""
orig_compressed = gzip_str(mystr)
emoji_compressed = ezip_str(mystr)
print(f"Gzip: {len(orig_compressed)} bytes")
print(f"Emoji Zip: {len(emoji_compressed)} bytes")
# Gzip: 504 bytes
# Emoji Zip: 483 bytes
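
The byte counts above compare a lossless encoding with a lossy one, because the emoji path drops punctuation and case before compressing. The following is a minimal round-trip sketch of that effect, not the gist's own code. It assumes a small hand-written table (`WORD_TO_EMOJI`, hypothetical) in place of the `emoji` package so that it runs with the standard library only; `encode` and `decode` are illustrative names.

```python
import gzip
import re

# Hypothetical stand-in for the emoji package's name table (assumption:
# three hand-picked words, not the library's real data).
WORD_TO_EMOJI = {"file": "📁", "key": "🔑", "book": "📖"}
EMOJI_TO_WORD = {e: w for w, e in WORD_TO_EMOJI.items()}


def encode(text: str) -> bytes:
    """Swap known words for emojis, then gzip the space-joined tokens."""
    tokens = [WORD_TO_EMOJI.get(t.lower(), t) for t in re.split(r"\W+", text)]
    return gzip.compress(" ".join(tokens).encode("utf-8"))


def decode(blob: bytes) -> str:
    """Gunzip and map each emoji back to the word it replaced."""
    tokens = gzip.decompress(blob).decode("utf-8").split(" ")
    return " ".join(EMOJI_TO_WORD.get(t, t) for t in tokens)


sample = "The File, the key; a Book."
restored = decode(encode(sample))

# The punctuation is gone and the replaced words come back lowercased,
# so the round trip does not reproduce the input.
print(repr(sample))
print(repr(restored))
print("lossless:", restored == sample)
```

A fairer size comparison would gzip the same normalized text (lowercased, punctuation stripped) on both sides, so that any difference comes from the emoji substitution alone.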