Arpit Bharti - GSoC 2019 Progress Report
Project - Ship the Public Suffix List (PSL) over Remote Settings
Organization - Mozilla
Project Mentor - Mathieu Leplatre
An update to the public suffix list (PSL) is shipped bundled with Firefox releases. The current system creates the DAFSA binary file at build time. The list has to be manually updated with every new release and cannot be updated for people running old versions of Firefox. I will be working to create an update system using the remote-settings API in Firefox to deliver an updated list to the client periodically, as soon as a new version is published (using polling). The list will be published on the Mozilla remote-settings servers and be shipped as a binary. This file format allows for shipping the list quickly over the network as it is orders of magnitude smaller in size an can be parsed much faster than raw text.
For more in depth details, view the Blueprint Document.
Bugzilla - 1083971
Publishing Public Suffix List as a DAFSA binary
This first pull request involved commits to the remote-settings-lambdas repository, I created a new python script, publish_dafsa.py, that is supposed to run periodically to check for any new commits made to the publicsuffix/list repository, particularly just the public_suffix_list.dat file. If we observe any changes, we will download the required resources and compile a new updated dafsa (using the modified script mentioned in the following merge request below). This new file, along with last commit hash for the public_suffix_list.dat file, is then uploaded to the Remote Settings server, to be downloaded by Firefox clients all over the world. This requests also has tests to ensure correct behavior.
Bugzilla - 1083971
Bug 1083971 - Add an option to output a binary file for the PSL data
The resources used for compiling a dafsa binary are pulled directly from mozilla-central, namely prepare_tlds.py and make_dafsa.py, however they could only output a hex array and had no options to output a binary representation of the data. I modified the scripts for the above purpose and ensured compatibility across python versions, since the remote-settings-lambdas uses Python 3.7 while the Firefox build system is still using the now EOL Python 2.7.
Bugzilla - 1083971
Links for prepare_tlds.py & make_dafsa.py updated to pull from mozilla-central
A very small pull request to update the links that will be used to pull prepare_tlds.py and make_dafsa.py from a certain revision in mozilla-central. This could only have been done after the previous patch for generating binary file had landed.
Bugzilla - 1563226
Bug 1563226 - Download the Public Suffix List using Remote Settings
Using the Remote Settings API we download the latest binary file (`dafsa.bin`) from the Remote Settings server and write it to the user profile folder. This happens when the user is idle so we can avoid disk I/O at firefox startup. We also check for updates while firefox is running. In both cases the newly downloaded dafsa binary is written to disk and we send a signal to C++ with object reference to the file, an identifier string for the signal and a handle to the file (although this will be of use only in the following merge request). This patch also has tests for asserting correct behavior, ie. signal is correctly sent in every event and fails loudly. For the purpuses of tests written in this patch and the next patch, at build time we create a fake dafsa binary file, a reference object to this file and full path is sent in the signal for asserting correct behavior.
Bugzilla - 1563246
Bug 1563246 - Reload the Public Suffix List when data is updated in profile folder
Finally the reloading of dafsa is handled in this patch, nsEffectiveTLDService.h is modified to act as a observer for signals sent to the C++ code. After successfully receiving the signal we take the file and memory map it, ie. dump the bytes into memory. This is done considering the multiprocess execution of Firefox. If we memory map it from the parent process the child processes will have access to the data for free and we avoid unneccesary file reading operations that would have taken place in child processes. We also have to consider the safety of data in when mutiple processes can read and modify the data. So we use RWLock to ensure there is a lock on the memory while reading and writing so it doesn't change midway if a another process attempts to access it. We also reset the cache here to ensure new cache is created with the new dafsa data. For tests we use the same fake dafsa binary we created in the previous patch, that binary was built with a much smaller fake suffix list which contained a fake suffix of our choice, in the test we send a signal with reference to this file and assert that our fake suffix is known after reloading the fake dafsa.
- Convert a binary file into a C++ array of hexadecimal numbers, this was a short exercise to get comfortable working with C++ and understand firsthand how the binary will be converted into a hex array, the first few iterations of my last patch were following the approach here but on sugesstion of the other developers I used another method that didn't involve filestream operations.
Issues I opened
- Fix to make prepare_tlds.py run on Python 3, this was minor patch that I needed in order to ensure identical behavior with Python 2 and Python 3, I reported the issue and provided a solution as such the issue was marked as a good first bug and was solved by a friend of mine whom I introduced to firefox development.
- Cannot push binary file with moz-phab This a was bug I encountered with
moz-phab, the utility used for submitting patches to phabricator. The program would freeze when I attempted to push a binary file in my revision. The issues remains unresolved and I found a work around by creating a binary file at build time.