benchonaut/README.md

## README.md

      
    FckUfox ( Firefox ) session jsonlz4 recovery from sessionstore-backup

note: Firefox will be subsequently called FckUfox in this doc
edit 2022: TRY TO AVOID FCKUFOX AS MUCH AS POSSIBLE ; FCKUFOX WILL waste


your SSD/HDD https://www.servethehome.com/firefox-is-eating-your-ssd-here-is-how-to-fix-it/
your privacy since it calls ~50 domains UNASKED
your nerves, since from about FckUfox v100 it silently eats your about:sessionrestore when killed with e.g. killall -9 fckufox

if any firefox accountable person ever reads this:

congrats , over the years a widely accepted
open sourced product became a complete nightmare ,

forcing people to use AppImages of firefox rip-offs that are rarely updated ,
did not even whitelist i2p and onion domains
( so everybody is googling these domains until they set browser.fixup.domainwhitelist.i2p , browser.fixup.domainwhitelist.onion and network.dns.blockDotOnion in about:config )
has default telemetry
does not even ask if it is okay to pull data from up to 50 Firefox-internal domains
( etc etc)


Session storage is robust, but only up to a point: the resulting jsonlz4 can be nested (when about:sessionrestore was left open in another restored session),
yielding monster files (an 80 Mbyte jsonlz4 is ~300 Mbyte+ uncompressed).
Firefox sometimes refuses to eat these files, either through the method of enabling/disabling "restore previous session" in settings and putting the file in PROFILE/previous.jsonlz4, or through killing a running instance and placing the file in PROFILE/sessionstore-backups/recovery.[jsonlz4|baklz4].
The method here is therefore to extract all URLs from that JSON and deduplicate them.
The only status you might see is pv (how many uncompressed Mbytes flow per input file).
#### ATTENTION: machine load ahead, keep 1 Gbyte+ RAM free for jsonlz4 sizes over 50 Mbyte
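
A minimal sketch of the "get all URLs deduplicated" idea, using only the Python standard library instead of nested_lookup. The name `collect_urls` and the rule "any key whose name contains `url`, case-insensitively" are assumptions made for illustration, not something taken from the scripts in this repo. The walk uses an explicit stack, so sessions nested inside sessions do not run into Python's recursion limit while walking (parsing the JSON itself can still be heavy).

```python
import json


def collect_urls(document, needle="url"):
    """Return all string values stored under keys containing `needle`,
    deduplicated, first occurrence wins."""
    found = []
    stack = [document]
    while stack:
        node = stack.pop()
        if isinstance(node, dict):
            children = []
            for key, value in node.items():
                if isinstance(value, (dict, list)):
                    children.append(value)
                elif isinstance(value, str) and needle in key.lower():
                    found.append(value)
            # reversed() so that children are visited in their original order
            stack.extend(reversed(children))
        elif isinstance(node, list):
            stack.extend(reversed(node))
    # dict.fromkeys keeps insertion order on Python 3.7+
    return list(dict.fromkeys(found))


if __name__ == "__main__":
    import sys
    # expects already decompressed session JSON on stdin
    print(json.dumps(collect_urls(json.load(sys.stdin))))
```

Only string values are collected, so odd entries where a `url`-like key holds an object or list are walked into rather than returned.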

HowTo


kill firefox
find the PROFILE folder (e.g. on linux: ~/.mozilla/firefox/b3efc4fe.default )
install: pip python pv jq awk, plus the python lz4 module that mozlz4.py imports
get https://github.com/russellballestrini/nested-lookup/ somehow
copy the PROFILE/sessionstore-backups folder content into another one
go to that folder (e.g. cd sessionstore-backup-extraction)
softlink (ln -s /where/this/repo/is/*.py ./) the python files
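
The "find the profile" and "copy the backups" steps above can be sketched in Python as follows. This is only an illustration: the function names `find_backup_dirs` and `copy_backups` are made up here, and the layout `<firefox root>/<profile>/sessionstore-backups` is assumed from the paths mentioned in this README.

```python
import shutil
from pathlib import Path


def find_backup_dirs(firefox_root="~/.mozilla/firefox"):
    """List every <profile>/sessionstore-backups directory below firefox_root."""
    root = Path(firefox_root).expanduser()
    return sorted(p for p in root.glob("*/sessionstore-backups") if p.is_dir())


def copy_backups(backup_dir, work_dir):
    """Copy the session files into a scratch folder and return their names.

    Working on a copy means the profile itself is never touched.
    """
    work = Path(work_dir)
    work.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(Path(backup_dir).iterdir()):
        if src.is_file():
            shutil.copy2(src, work / src.name)
            copied.append(src.name)
    return copied


if __name__ == "__main__":
    for backup_dir in find_backup_dirs():
        print(backup_dir)
```

Something like `copy_backups(find_backup_dirs()[0], "sessionstore-backup-extraction")` would then prepare the working folder, assuming the first profile found is the one you want.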

for dest in *jsonlz4* *baklz4 ;do python mozlz4.py -d < "$dest" |pv |python unnest-firefox-json.py |jq -c '.[]'  ;done |awk '!x[$0]++'  |grep -v '^""$' |grep -v ^$  > sessionsave-urls.txt


all your URLs are now in sessionsave-urls.txt (along with about: and data:image entries and some strange things like \000)


you might import it with URLs List https://addons.mozilla.org/en-US/firefox/addon/urls-list/ , Tab-List https://addons.mozilla.org/de/firefox/addon/tab-list/ or sessionbuddy (chrome)


you might filter it with stuff like


cat sessionsave-urls.txt |grep -e file: -e ftp -e http 


cat sessionsave-urls.txt|grep -v -e '^"data:image' -e '^"about:preferenc'  -e '^"about:newtab'  -e '^"about:settings' -e '^"about:addons' -e '^"about:logins' -e http://192.168 -e https://192.168 > sessionsave-urls.txt.filtered.txt
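
The same kind of filtering can be sketched in Python. Note that every line in sessionsave-urls.txt is a JSON string including its quotes (that is how `jq -c '.[]'` prints it), so the sketch decodes each line first. The function name `filter_lines` and the two prefix tuples are assumptions for illustration; this version is a bit stricter than the grep lines, since it only keeps entries that start with a wanted scheme.

```python
import json

# schemes worth keeping, and local-network prefixes to drop again
WANTED = ("http://", "https://", "ftp://", "file:")
UNWANTED = ("http://192.168", "https://192.168")


def filter_lines(lines):
    """Decode JSON-quoted lines and keep only the interesting URLs."""
    kept = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            url = json.loads(line)
        except ValueError:
            # not valid JSON (truncated or garbled entry): skip it
            continue
        if not isinstance(url, str):
            continue
        if url.startswith(WANTED) and not url.startswith(UNWANTED):
            kept.append(url)
    return kept


if __name__ == "__main__":
    import sys
    for url in filter_lines(sys.stdin):
        print(url)
```

Because the output is decoded, the result has one plain URL per line without the surrounding quotes.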


still too much ? → use tagging


mkdir -p tags ;for tag in graylog ;do grep -- "$tag" sessionsave-urls.txt.filtered.txt > "tags/$tag";grep -v -- "$tag" sessionsave-urls.txt.filtered.txt > sessionsave-urls.txt.filtered.txt.tmp;mv sessionsave-urls.txt.filtered.txt.tmp sessionsave-urls.txt.filtered.txt;done
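
What the tagging loop does can be sketched in Python: every URL goes to the first tag it contains, and whatever matches no tag stays in the remaining list. The function name `split_by_tags` and the example tags are made up for illustration.

```python
def split_by_tags(urls, tags):
    """Sort URLs into per-tag buckets; return (buckets, untagged rest).

    Like the shell loop, a URL that matches an earlier tag is removed
    from the pool and is never seen by later tags.
    """
    buckets = {tag: [] for tag in tags}
    rest = []
    for url in urls:
        for tag in tags:
            if tag in url:
                buckets[tag].append(url)
                break
        else:
            # no tag matched, keep it in the remaining list
            rest.append(url)
    return buckets, rest


if __name__ == "__main__":
    buckets, rest = split_by_tags(
        ["https://graylog.example/search", "https://example.org/"],
        ["graylog"],
    )
    print(buckets)
    print(rest)
```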


of course there is a forensic approach as well ...


what exactly is that alien line doing ?

##↓ process all jsonlz4 files↓ #####  ↓ uncompress it ↓↓ ## status (pv)##↓↓flatten json/get url↓↓#↓↓1 per line↓#####↓↓deduplicate↓↓### ↓↓ no empty lines ↓↓###### ↓↓ save result ##
for dest in *jsonlz4* *baklz4 ;do python mozlz4.py -d < "$dest" |pv |python unnest-firefox-json.py |jq -c '.[]'  ;done |awk '!x[$0]++'  |grep -v '^""$' |grep -v ^$  > sessionsave-urls.txt
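
The tail of that line (`awk '!x[$0]++'` plus the two `grep -v` calls) prints each line only the first time it shows up and drops empty lines and empty JSON strings. A rough Python equivalent, with the hypothetical name `dedupe_lines`, could look like this:

```python
def dedupe_lines(lines):
    """Yield each line once, in first-seen order, skipping '' and '""'."""
    seen = set()
    for line in lines:
        line = line.rstrip("\n")
        if line in ("", '""'):
            continue
        if line not in seen:
            seen.add(line)
            yield line


if __name__ == "__main__":
    import sys
    for line in dedupe_lines(sys.stdin):
        print(line)
```

Like the awk version it keeps every distinct line in memory, which is part of why the RAM warning above applies.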

END


first variant/playground/testing space:
sessionCheckpoints.json
## extract the inputs
for dest in *jsonlz4 *baklz4 ;do python mozlz4.py -d < "$dest" > "$dest.jsontmp" ;done;


##then we build ONE large file containing SINGLE LINE ENTRIES
#for file in *.jsontmp;do cat $file |jq -c '.[]' ;done |awk '!x[$0]++'  > deduped.json
## remove tmp files
#rm *.jsontmp

## beware of jq/python json max depth , the following lines do not recover all data
#for lines in $(seq 1 $(cat deduped.json|wc -l ));do 
#	head -n${lines} deduped.json |tail -n1|sed 's/<stripped: exceeds max depth>/"<stripped: exceeds max depth>"/g' > oneline$lines;
#   done

#grep '"tabs":' oneline* -l |while read tabfile;do  cat $tabfile |jq .  > pretty.${tabfile}.json ;done


## mozlz4.py
#!/usr/bin/env python
from sys import stdin, stdout, argv, stderr
import os
try:
    import lz4.block as lz4
except ImportError:
    import lz4

stdin = os.fdopen(stdin.fileno(), 'rb')
stdout = os.fdopen(stdout.fileno(), 'wb')

if argv[1:] == ['-c']:
    stdout.write(b'mozLz40\0' + lz4.compress(stdin.read()))
elif argv[1:] == ['-d']:
    assert stdin.read(8) == b'mozLz40\0'
    stdout.write(lz4.decompress(stdin.read()))
else:
    stderr.write('Usage: %s -c|-d < infile > outfile\n' % argv[0])
    stderr.write('Compress or decompress Mozilla-flavor LZ4 files.\n\n')
    stderr.write('Examples:\n')
    stderr.write('\t%s -d < infile.json.mozlz4 > outfile.json\n' % argv[0])
    stderr.write('\t%s -c < infile.json > outfile.json.mozlz4\n' % argv[0])
    exit(1)
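
Since big files need a lot of RAM, it can help to check how large a file will become before decompressing it. The sketch below assumes the usual mozLz4 layout: the 8-byte magic `mozLz40\0` checked by the script above, followed by a 4-byte little-endian uncompressed size (the size prefix that `lz4.block` stores by default). The function name `peek_mozlz4` is made up for this example, and it needs no lz4 module.

```python
import struct

MAGIC = b"mozLz40\0"


def peek_mozlz4(header):
    """Return the declared uncompressed size from the first 12 bytes."""
    if len(header) < 12 or header[:8] != MAGIC:
        raise ValueError("not a mozLz4 file")
    # 4-byte little-endian unsigned int right after the magic
    (size,) = struct.unpack("<I", header[8:12])
    return size


if __name__ == "__main__":
    import sys
    for path in sys.argv[1:]:
        with open(path, "rb") as handle:
            try:
                size = peek_mozlz4(handle.read(12))
            except ValueError as err:
                print("%s: %s" % (path, err))
                continue
        print("%s: %.1f Mbyte uncompressed" % (path, size / 1e6))
```

The size comes from the file header only, so a damaged file may still claim a size it cannot deliver.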

## unnest-firefox-json.py
from sys import stdin, stdout, argv, stderr
import os
import json
from nested_lookup import nested_lookup
##sourced here https://github.com/russellballestrini/nested-lookup/
#import pprint


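# read the decompressed session JSON as raw bytes from stdin
stdin = os.fdopen(stdin.fileno(), 'rb')
f=stdin.read()
#with open('json_sample.txt', 'r') as f:

data = json.loads(f)
results = nested_lookup(
    key = 'url',
    document = data,
    wild = True
)
# deduplicate (first-seen order is kept on Python 3.7+)
results= list( dict.fromkeys(results) )

# print() with parentheses works on both Python 2 and Python 3
print(json.dumps(results))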