@snoack
Created December 29, 2018 18:25
--- README.md 2018-12-29 12:54:04.223316431 -0500
+++ README.rst 2018-12-29 13:14:12.985081423 -0500
@@ -1,5 +1,5 @@
-
-# python-abp
+python-abp
+==========
This repository contains a library for working with Adblock Plus filter lists,
a script for rendering diffs between filter lists, and the script that is used
@@ -7,18 +7,10 @@
into the format suitable for consumption by the adblocking software (aka
rendering).
-## Table of Contents
-
-- [Installation](#installation)
-- [Rendering of filter lists](#rendering)
-- [Generating diffs](#diffs)
-- [Library API](#library)
-- [Testing](#testing)
-- [Development](#development)
-- [Using the library with R](#r)
+.. contents::
-<a id="installation"></a>
-## Installation
+Installation
+------------
Prerequisites:
@@ -26,58 +18,64 @@
* Python (2.7 or 3.5+),
* pip.
-To install:
+To install::
$ pip install --upgrade python-abp
-<a id="rendering"></a>
-## Rendering of filter lists
+
+Rendering of filter lists
+-------------------------
The filter lists are originally authored in relatively smaller parts focused
on particular types of filters, related to a specific topic or relevant for a
particular geographical area.
-We call these parts _filter list fragments_ (or just _fragments_) to
+We call these parts *filter list fragments* (or just *fragments*) to
distinguish them from full filter lists that are consumed by the adblocking
software such as Adblock Plus.
Rendering is a process that combines filter list fragments into a filter list.
It starts with one fragment that can include other ones and so forth.
-The produced filter list is marked with a [version and a timestamp][1].
+The produced filter list is marked with a `version and a timestamp <https://adblockplus.org/filters#special-comments>`_.
-Python-abp contains a script that can do this called `flrender`:
+Python-abp contains a script that can do this called ``flrender``::
$ flrender fragment.txt filterlist.txt
-This will take the top level fragment in `fragment.txt`, render it and save it
-into `filterlist.txt`.
-The `flrender` script can also be used by only specifying `fragment.txt`:
-
+This will take the top level fragment in ``fragment.txt``, render it and save it
+into ``filterlist.txt``.
+
+The ``flrender`` script can also be used by only specifying ``fragment.txt``::
+
$ flrender fragment.txt
-
-in which case the rendering result will be sent to `stdout`. Moreover, when
-it's run with no positional arguments:
+
+
+in which case the rendering result will be sent to ``stdout``. Moreover, when
+it's run with no positional arguments::
$ flrender
-it will read from `stdin` and send the results to `stdout`.
+
+it will read from ``stdin`` and send the results to ``stdout``.
Fragments might reference other fragments that should be included into them.
-The references come in two forms: http(s) includes and local includes:
+The references come in two forms: http(s) includes and local includes::
%include http://www.server.org/dir/list.txt%
%include easylist:easylist/easylist_general_block.txt%
+
The http include contains a URL that will be fetched and inserted at the point
of reference.
The local include contains a path inside the easylist repository.
-`flrender` needs to be able to find a copy of the repository on the local
-filesystem. We use `-i` option to point it to to the right directory:
+``flrender`` needs to be able to find a copy of the repository on the local
+filesystem. We use the ``-i`` option to point it to the right directory::
$ flrender -i easylist=/home/abc/easylist input.txt output.txt
+
Now the local include referenced above will be resolved to:
-`/home/abc/easylist/easylist/easylist_general_block.txt`
+``/home/abc/easylist/easylist/easylist_general_block.txt``
and the fragment will be loaded from this file.
Directories that contain filter list fragments that are used during rendering
@@ -86,22 +84,23 @@
fragments.
Each source is identified by a name: that's the part that comes before ":" in
the include instruction and it should be the same as what comes before "=" in
-the `-i` option.
+the ``-i`` option.
Commonly used sources have generally accepted names. For example the main
-EasyList repository is referred to as `easylist`.
+EasyList repository is referred to as ``easylist``.
If you don't know all the source names that are needed to render some list,
-just run `flrender` and it will report what it's missing:
+just run ``flrender`` and it will report what it's missing::
$ flrender easylist.txt output/easylist.txt
Unknown source: 'easylist' when including 'easylist:easylist/easylist_gener
al_block.txt' from 'easylist.txt'
-You can clone the necessary repositories to a local directory and add `-i`
+
+You can clone the necessary repositories to a local directory and add ``-i``
options accordingly.
-<a id="diffs"></a>
-## Generating diffs
+Generating diffs
+----------------
A diff allows a client running ad blocking software such as Adblock Plus to
update the filter lists incrementally, instead of downloading a new copy of a
@@ -110,41 +109,46 @@
consumption, etc.), allowing clients to update their lists more frequently
using less resources.
-python-abp contains a script called `fldiff` that will find the diff between
-the latest filter list, and any number of previous filter lists:
+python-abp contains a script called ``fldiff`` that will find the diff between
+the latest filter list, and any number of previous filter lists::
$ fldiff -o diffs/easylist/ easylist.txt archive/*
-where `-o diffs/easylist/` is the (optional) output directory where the diffs
-should be written, `easylist.txt` is the most recent version of the filter
-list, and `archive/*` is the directory where all the archived filter lists are.
-When called like this, the shell should automatically expand the `archive/*`
+
+where ``-o diffs/easylist/`` is the (optional) output directory where the diffs
+should be written, ``easylist.txt`` is the most recent version of the filter
+list, and ``archive/*`` is the directory where all the archived filter lists are.
+When called like this, the shell should automatically expand the ``archive/*``
directory, giving the script each of the filenames separately.
-In the above example, the output of each archived `list[version].txt` will be
-written to `diffs/diff[version].txt`. If the output argument is omitted, the
+In the above example, the output of each archived ``list[version].txt`` will be
+written to ``diffs/diff[version].txt``. If the output argument is omitted, the
diffs will be written to the current directory.
-The script produces three types of lines, as specified in the [technical
-specification][5]:
+The script produces three types of lines, as specified in the `technical
+specification <https://docs.google.com/document/d/1SoEqaOBZRCfkh1s5Kds5A5RwUC_nqbYYlGH72sbsSgQ/>`_:
+
-* Special comments of the form `! <name>:[ <value>]`
-* Added filters of the form `+ <filter-text>`
-* Removed filters of the form `- <filter-text>`
+* Special comments of the form ``! <name>:[ <value>]``
+* Added filters of the form ``+ <filter-text>``
+* Removed filters of the form ``- <filter-text>``
-<a id="library"></a>
-## Library API
+Library API
+-----------
python-abp can also be used as a library for parsing filter lists. For example
to read a filter list (we use Python 3 syntax here but the API is the same):
+.. code-block:: python
+
    from abp.filters import parse_filterlist
    with open('filterlist.txt') as filterlist:
        for line in parse_filterlist(filterlist):
            print(line)
-If `filterlist.txt` contains this filter list:
+
+If ``filterlist.txt`` contains this filter list::
[Adblock Plus 2.0]
! Title: Example list
@@ -153,8 +157,11 @@
abc.com/ad$image
@@/abc\.com/
+
the output will look something like:
+.. code-block:: python
+
Header(version='Adblock Plus 2.0')
Metadata(key='Title', value='Example list')
EmptyLine()
@@ -162,69 +169,70 @@
Filter(text='abc.com/ad$image', selector={'type': 'url-pattern', 'value': 'abc.com/ad'}, action='block', options=[('image', True)])
Filter(text='@@/abc\\.com/', selector={'type': 'url-regexp', 'value': 'abc\\.com'}, action='allow', options=[])
-The `abp.filters` module also exports a lower-level function for parsing
-individual lines of a filter list: `parse_line`. It returns a parsed line
-object just like the items in the iterator returned by `parse_filterlist`.
-For further information on the library API use `help()` on `abp.filters` and
+The ``abp.filters`` module also exports a lower-level function for parsing
+individual lines of a filter list: ``parse_line``. It returns a parsed line
+object just like the items in the iterator returned by ``parse_filterlist``.
+
+For further information on the library API use ``help()`` on ``abp.filters`` and
its contents in an interactive Python session, read the docstrings, or look at
the tests for some usage examples.
-<a id="testing"></a>
-## Testing
+Testing
+-------
-Unit tests for `python-abp` are located in the `/tests` directory. [Pytest][2]
-is used for quickly running the tests during development. [Tox][3] is used for
+Unit tests for ``python-abp`` are located in the ``/tests`` directory. `Pytest <http://pytest.org/>`_
+is used for quickly running the tests during development. `Tox <https://tox.readthedocs.org/>`_ is used for
testing in different environments (Python 2.7, Python 3.5+ and PyPy) and code
quality reporting.
In order to execute the tests, first create and activate a development
-virtualenv:
+virtualenv::
$ python setup.py devenv
$ . devenv/bin/activate
-With the development virtualenv activated use pytest for a quick test run:
+
+With the development virtualenv activated use pytest for a quick test run::
(devenv) $ pytest tests
-and tox for a comprehensive report:
+
+and tox for a comprehensive report::
(devenv) $ tox
-<a id="development"></a>
-## Development
+
+Development
+-----------
When adding new functionality, add tests for it (preferably first). If some
code will never be reached on a certain version of Python, it may be exempted
-from coverage tests by adding a comment, e.g. `# pragma: no py2 cover`.
+from coverage tests by adding a comment, e.g. ``# pragma: no py2 cover``.
All public functions, classes and methods should have docstrings compliant with
-[NumPy/SciPy documentation guide][4]. One exception is the constructors of
+`NumPy/SciPy documentation guide <https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt>`_. One exception is the constructors of
classes that the user is not expected to instantiate (such as exceptions).
-<a id="r"></a>
-## Using the library with R
+Using the library with R
+------------------------
Clone the repo to your local machine. Then create a virtualenv and install
-python abp there:
+python abp there::
+
+ $ cd python-abp
+ $ virtualenv env
+ $ pip install --upgrade .
- $ cd python-abp
- $ virtualenv env
- $ pip install --upgrade .
-Then import it with `reticulate` in R:
+Then import it with ``reticulate`` in R:
- > library(reticulate)
- > use_virtualenv("~/python-abp/env", required=TRUE)
- > abp <- import("abp.filters.rpy")
+.. code-block:: R
-Now you can use the functions with `abp$functionname`, e.g.
-`abp.line2dict("@@||g.doubleclick.net/pagead/$subdocument,domain=hon30.org")`
+ > library(reticulate)
+ > use_virtualenv("~/python-abp/env", required=TRUE)
+ > abp <- import("abp.filters.rpy")
- [1]: https://adblockplus.org/filters#special-comments
- [2]: http://pytest.org/
- [3]: https://tox.readthedocs.org/
- [4]: https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt
- [5]: https://docs.google.com/document/d/1SoEqaOBZRCfkh1s5Kds5A5RwUC_nqbYYlGH72sbsSgQ/
+Now you can use the functions with ``abp$functionname``, e.g.
+``abp$line2dict("@@||g.doubleclick.net/pagead/$subdocument,domain=hon30.org")``.
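
As a side note to the patch, the include resolution that the README describes (a source name before ":" mapped to a directory given with ``-i name=path``) can be sketched in a few lines of Python. This is only an illustration of the rule, not how ``flrender`` is implemented: ``resolve_include`` and the ``sources`` dict are hypothetical names, and the error message is a simplified stand-in for the one shown in the README.

```python
import posixpath


def resolve_include(line, sources):
    """Resolve an include instruction to a URL or a local path.

    `sources` is a hypothetical dict that mirrors the `-i name=path`
    options: it maps a source name to a local directory.
    """
    body = line.strip()
    if not (body.startswith('%include ') and body.endswith('%')):
        raise ValueError('not an include instruction: {!r}'.format(line))

    # Drop the leading '%include ' and the trailing '%'.
    target = body[len('%include '):-1].strip()

    # The part before the first ':' is either a URL scheme or a source name.
    name, _, path = target.partition(':')
    if name in ('http', 'https'):
        # http(s) includes are fetched from the URL as given.
        return target
    if name not in sources:
        raise KeyError("Unknown source: '{}'".format(name))

    # Local includes are paths inside the directory registered for the source.
    return posixpath.join(sources[name], path)


sources = {'easylist': '/home/abc/easylist'}
resolved = resolve_include(
    '%include easylist:easylist/easylist_general_block.txt%', sources)
print(resolved)
```

With the mapping above, the local include from the README resolves to ``/home/abc/easylist/easylist/easylist_general_block.txt``, and an http(s) include is returned unchanged as a URL.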