@Tblue
Last active July 16, 2024 09:50
MozLz4a compression/decompression utility
#!/usr/bin/env python3
# vim: sw=4 ts=4 et tw=100 cc=+1
#
####################################################################################################
# DESCRIPTION #
####################################################################################################
#
# Decompressor/compressor for files in Mozilla's "mozLz4" format. Firefox uses this file format to
# compress e. g. bookmark backups (*.jsonlz4).
#
# This file format is in fact just plain LZ4 data with a custom header (magic number [8 bytes] and
# uncompressed file size [4 bytes, little endian]).
#
####################################################################################################
# DEPENDENCIES #
####################################################################################################
#
# - Tested with Python 3.10
# - LZ4 bindings for Python, version 4.x: https://pypi.python.org/pypi/lz4
#
####################################################################################################
# LICENSE #
####################################################################################################
#
# Copyright (c) 2015-2022, Tilman Blumenbach
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without modification, are permitted
# provided that the following conditions are met:
#
# 1. Redistributions of source code must retain the above copyright notice, this list of conditions
# and the following disclaimer.
# 2. Redistributions in binary form must reproduce the above copyright notice, this list of
# conditions and the following disclaimer in the documentation and/or other materials provided
# with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR
# IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND
# FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER
# IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
# OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
import argparse
import sys

import lz4.block


class BinFileArg:
    def __init__(self, mode):
        self._mode = mode

    def __call__(self, arg):
        objs = {
            "r": sys.stdin.buffer,
            "w": sys.stdout.buffer,
        }

        if arg == "-":
            return objs[self._mode]

        try:
            return open(arg, self._mode + "b")
        except OSError as e:
            raise argparse.ArgumentTypeError(
                "failed to open file for %s: %s" % (
                    "reading" if self._mode == "r" else "writing",
                    e
                )
            )


def decompress(file_obj):
    if file_obj.read(8) != b"mozLz40\0":
        raise ValueError("Invalid magic number")

    return lz4.block.decompress(file_obj.read())


def compress(file_obj):
    compressed = lz4.block.compress(file_obj.read())
    return b"mozLz40\0" + compressed


def get_argparser():
    p = argparse.ArgumentParser(
        description="MozLz4a compression/decompression utility"
    )

    p.add_argument(
        "-d", "--decompress", "--uncompress",
        action="store_true",
        help="Decompress the input file instead of compressing it."
    )

    p.add_argument(
        "in_file",
        type=BinFileArg("r"),
        help="Path to input file. `-' means standard input."
    )

    p.add_argument(
        "out_file",
        type=BinFileArg("w"),
        nargs="?",
        default="-",
        help="Path to output file. `-' means standard output (and is the default)."
    )

    return p


def main():
    args = get_argparser().parse_args()

    try:
        with args.in_file as fh:
            if args.decompress:
                data = decompress(fh)
            else:
                data = compress(fh)
    except Exception as e:
        print(
            "Could not compress/decompress file `%s': %s" % (
                args.in_file.name,
                e
            ),
            file=sys.stderr
        )
        sys.exit(4)

    try:
        with args.out_file as fh:
            fh.write(data)
    except Exception as e:
        print(
            "Could not write to output file `%s': %s" % (
                args.out_file.name,
                e
            ),
            file=sys.stderr
        )
        sys.exit(5)


if __name__ == "__main__":
    sys.exit(main())
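
The header layout described in the script's DESCRIPTION comment (8-byte magic number, then a 4-byte little-endian uncompressed size, then plain LZ4 data) can be inspected with the standard library alone. The following is a minimal sketch, not part of the script: the helper name `split_mozlz4` and the synthetic sample bytes are assumptions made for illustration.

```python
import struct

MAGIC = b"mozLz40\0"  # 8-byte magic, the same value the script checks


def split_mozlz4(data):
    """Return (declared_size, lz4_block) for a mozLz4 byte string.

    Hypothetical helper: it only parses the header and does not decompress.
    """
    if len(data) < 12:
        raise ValueError("Too short to hold a mozLz4 header")
    if data[:8] != MAGIC:
        raise ValueError("Invalid magic number")
    # Uncompressed size: 4 bytes, little endian, right after the magic
    (declared_size,) = struct.unpack("<I", data[8:12])
    return declared_size, data[12:]


# Synthetic input: a header declaring 5 bytes, followed by 6 payload bytes
sample = MAGIC + struct.pack("<I", 5) + b"\x50hello"
size, block = split_mozlz4(sample)
print(size, len(block))  # prints: 5 6
```

Note that the script itself strips only the 8-byte magic and hands the remaining bytes, size field included, to `lz4.block.decompress`.
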
@richhaynes-zz

You, sir, are a lifesaver! I recently wrote a script to clean up the search engines used by our organisation's users and found to my horror that this is no longer possible because of the compression Mozilla has implemented. Thanks to your lovely script I can now decompress the file, parse the JSON and recompress it, saving me hours of work! :)
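
The decompress, parse, recompress workflow described here could look roughly like the sketch below. It is an illustration under stated assumptions, not the script's own code: the function name `edit_mozlz4_json`, the `edit` callback, and the JSON key used in the demo are all hypothetical, and the codec functions are passed in so that the sketch can be run without the lz4 package installed.

```python
import json

MAGIC = b"mozLz40\0"  # same magic number the script checks


def edit_mozlz4_json(raw, edit, block_decompress, block_compress):
    """Decompress a mozLz4 byte string, let `edit` modify the parsed JSON
    in place, and return the recompressed bytes (hypothetical helper)."""
    if raw[:8] != MAGIC:
        raise ValueError("Invalid magic number")
    doc = json.loads(block_decompress(raw[8:]).decode("utf-8"))
    edit(doc)
    return MAGIC + block_compress(json.dumps(doc).encode("utf-8"))


if __name__ == "__main__":
    # Stand-in codec (no real compression) so the demo runs without lz4
    def identity(payload):
        return payload

    raw = MAGIC + b'{"example": []}'  # "example" is a made-up key
    out = edit_mozlz4_json(raw, lambda d: d["example"].append("x"),
                           identity, identity)
    print(out[8:].decode("utf-8"))
```

With the real bindings, the two codec arguments would be `lz4.block.decompress` and `lz4.block.compress`, the same calls the script uses. Whether Firefox accepts a file rewritten this way is not verified here.
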

@ray-mints

Can you please write readme? I'm not familiar with python.
When I try to run "python mozlz4a.py" or "python mozlz4a.py -d search.json.mozlz4 search.json" in my terminal - the output is

  File "mozlz4a.py", line 84
    print("Could not open input file `%s' for reading: %s" % (parsed_args.in_file, e), file=sys.stderr)
                                                                                           ^
SyntaxError: invalid syntax

Maybe I do something wrong. I've installed lz4 with pip and my python version is 2.7.12.

P.S. When I tried with python 3 on a different computer I had a different error: "module 'lz4' has no attribute 'decompress'". :(

@YawarRaza7349

YawarRaza7349 commented Jan 9, 2018

@ray-mints

"module 'lz4' has no attribute 'decompress'"

This is because the lz4 library has since updated its API. The quick-and-dirty solution for users of this script is to change the import lz4 line of code to import lz4.block as lz4.
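
A slightly more defensive variant of that fix, as a minimal sketch: try the newer `lz4.block` module first and fall back to the top-level `lz4` module only if it still exposes `compress`/`decompress`. The helper name `load_lz4_block` is an assumption made for illustration, and the fallback assumes older releases offered both functions at the top level, as the error message above suggests.

```python
import importlib


def load_lz4_block():
    """Return a module exposing compress()/decompress(), or None if no
    usable lz4 bindings are installed (hypothetical helper)."""
    for name in ("lz4.block", "lz4"):
        try:
            module = importlib.import_module(name)
        except ImportError:
            continue
        if hasattr(module, "compress") and hasattr(module, "decompress"):
            return module
    return None


codec = load_lz4_block()
if codec is None:
    print("lz4 bindings not installed or not usable")
else:
    print("using", codec.__name__)
```
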

@alanialgor

I'm using Fedora 26. Python2 and Python3 are installed. It seems the default for python was python2.

If I entered "python mozlz4.py -d search.json.mozlz4 search.json" it would error out.
Using "python3 mozlz4.py -d search.json.mozlz4 search.json" works as expected.

plus didn't have to mod the import lz4 line.

@ray-mints

@YawarRaza7349
Thank you so much, good person. Now I can edit my search engines in Firefox again.
I wish Mozilla hadn't compressed it in the first place.

@lilydjwg

I've made a decompression tool with Rust too.

@serj-kzv

There's a Firefox add-on to read or compress .mozlz4 text files: mozlz4-edit

@Grossdm

Grossdm commented Apr 29, 2018

@serj-kzv A better link for the AMO addon mozlz4-edit is: https://addons.mozilla.org/firefox/addon/mozlz4-edit, as it is not language specific.

@kaefer3000

The lz4 folks changed their API, so you may want to have a look at my fork. I would have opened a pull request, but you cannot do that with gists.

@sebma

sebma commented Aug 28, 2018

@Tblue I get this error :

$ ./mozlz4a.py -d recovery.jsonlz4 recovery.json
Could not compress/decompress file `recovery.jsonlz4': module 'lz4' has no attribute 'decompress'

Can you please update your script to be compatible with the new lz4 API ?

@Ruslan0Dev

@tmonjalo

@ATRescue

ATRescue commented May 26, 2019

LZ4 is not bad in itself, but LZ4 for Firefox certainly is a bad idea.
https://www.archiveteam.org/index.php?title=Talk:Mozilla_Firefox#Criticising_mozLZ4.

@vthriller

LZ4 is not bad in itself, but LZ4 for Firefox certainly is a bad idea.
https://www.archiveteam.org/index.php?title=Talk:Mozilla_Firefox#Criticising_mozLZ4.

@ATRescue, what you're missing is the fact that in order to be up-to-date, session files need to be constantly updated. That's a lot of writes if you have a significant number of tabs open! Without compression, session storage would be regularly outdated on HDDs and would wear out consumer SSDs faster.

Sure, LZ4 in standard frame format would be better. SQLite db in WAL mode (provided it could be updated incrementally) would be much better. But you can't say for sure that "there is absolutely no need for LZ4" here.

And modern file systems (including ext2, ext3, ext4, btrFS, NTFS, ZFS) offer transparent compression functionality.

Nope. ext* don't have it, XFS doesn't have it, HFS on older OS X required use of Apple-specific APIs for compression to work (i.e. it wasn't transparent). Besides, apps generally can't just tell OS to transparently compress arbitrary files, and asking end user to go X and toggle Y to make browser responsive or less write-intensive would definitely not help its market share.

@CrendKing

If LZ4 is meant to save writes, then why is search.json.mozlz4 also LZ4? It is certainly not frequently updated.

@ingvar-lynn

it's 6 am and i can not understand why simple { echo 0x184D2204 | xxd -r ; tail -c+9 previous.jsonlz4 ; } | lz4 -dc or anything like that would not work

@jrw

jrw commented Mar 1, 2021

it's 6 am and i can not understand why simple { echo 0x184D2204 | xxd -r ; tail -c+9 previous.jsonlz4 ; } | lz4 -dc or anything like that would not work

God, if I could upvote this a hundred times I would. Why would anyone design a NEW file format (jsonlz4) for something that we already have so many STANDARD, well-defined formats for? Why does Mozilla require end-users to download, compile, test different potential hacks to get their data out of a custom format? Just use a STANDARD format!

@Profpatsch

I have to agree, I’m pretty taken aback by this.

@biorpg

biorpg commented Mar 15, 2022

The effective reason is simple: They have no obligation to provide you a reason.
The real reason is then also simple: They want the vast majority of users, including fellow developers, to use a specific subset of search engines that suit their plans for your activity.

@danuker

danuker commented Mar 18, 2022

If you are trying to extract your search engines, it's very hard: even after decompressing, there is a ton of junk in the file (images). I found a web service here, but it cannot modify the engines. It looks like the most hostile format possible for open-source software.

@Tblue
Author

Tblue commented Mar 19, 2022

Script updated:

  • Now works with the latest 4.x release of the Python lz4 package.
  • Compression/decompression to/from stdout/stdin is now supported.

@simurq

simurq commented Dec 4, 2022

Can anyone explain how to use this script to add a search engine to the lz4 file, compress and use it in Firefox? Thanks!
I have all dependencies installed, including pip and lz4, yet the only thing I get after running ./mozlz4a.py search.json.mozlz4 is this

@danuker

danuker commented Dec 6, 2022

@simurq I'd say the script is not enough for parsing the search engines.

Here is an HTML page under a free license you can save that does it for you (but also save the linked JS files):

https://www.jeffersonscher.com/ffu/searchjson.html

@lilydjwg

lilydjwg commented Dec 6, 2022

FYI I have a tool that can process the search engines correctly: https://github.com/lilydjwg/mozlz4-tool

@drelephant

drelephant commented Mar 14, 2024

it's 6 am and i can not understand why simple { echo 0x184D2204 | xxd -r ; tail -c+9 previous.jsonlz4 ; } | lz4 -dc or anything like that would not work

God, if I could upvote this a hundred times I would. Why would anyone design a NEW file format (jsonlz4) for something that we already have so many STANDARD, well-defined formats for? Why does Mozilla require end-users to download, compile, test different potential hacks to get their data out of a custom format? Just use a STANDARD format!

The reason is because they started using it before the lz4 standard was finished. So that's why it's non-standard lz4. Lz4 seems like an OK choice, other than their use of it in a non-standard way. It's very fast to compress/decompress. And why not update it to use standard lz4 now?

I definitely agree it's a PITA to work with though. I wanted to do it in java but was unable to, so now I have to use python.

@biorpg

biorpg commented Mar 22, 2024

The reason is because they started using it before the lz4 standard was finished. So that's why it's non-standard lz4. Lz4 seems like an OK choice, other than their use of it in a non-standard way. It's very fast to compress/decompress. And why not update it to use standard lz4 now?

Yes, a mundane reason does exist, acting as the primary dependency for sustaining the real reason for a questionable practice. In a similarly redundant fashion, the questionable nature of the practice is, in itself, purely a thought exercise because the answer would always be the mundane reason, unless you're a disgruntled employee that still represents your employer, but this isn't a likely candidate for such a scenario to involve, because the majority of the public would stop listening about 5 words past "mozlz4 file format". In short, you are doing Mozilla a service by offering the mundane reason externally.

Now, if you consider that the only way Mozilla generates a positive revenue from Firefox is by making Google the default search engine, and the fact that this would be paid not based upon the simple fact that a Google employee downloads Firefox to confirm that it is the default, but rather by users with a Firefox user agent conducting Google searches, it becomes quite clear that profit is gained by making it difficult to change the default search engine, because this is enough to dissuade the majority of users who ever have a moment where they feel like using a different search engine.

Also, they use this non-standard text compression format for this to effectively compress a 10kb file into a 5kb file sitting in a folder alongside 19 JSON files, 1 LZ4 file, 7 txt files, 10 sqlite files, 6 db files, and 2 js files where this file is the one and only mozlz4 file among them.

You can call this circumstantial evidence, but you might want to ask yourself:

  1. Are you a courtroom?
  2. What else are you allowing to slide this easily?
