Gist snoack/6ba2ab7dec6d40bde47f423988cb271d, created December 29, 2018 18:25
--- README.md 2018-12-29 12:54:04.223316431 -0500
+++ README.rst 2018-12-29 13:14:12.985081423 -0500
@@ -1,5 +1,5 @@
-
-# python-abp
+python-abp
+==========
 This repository contains a library for working with Adblock Plus filter lists,
 a script for rendering diffs between filter lists, and the script that is used
@@ -7,18 +7,10 @@
 into the format suitable for consumption by the adblocking software (aka
 rendering).
-## Table of Contents
-
-- [Installation](#installation)
-- [Rendering of filter lists](#rendering)
-- [Generating diffs](#diffs)
-- [Library API](#library)
-- [Testing](#testing)
-- [Development](#development)
-- [Using the library with R](#r)
+.. contents::
-<a id="installation"></a>
-## Installation
+Installation
+------------
 Prerequisites:
@@ -26,58 +18,64 @@
 * Python (2.7 or 3.5+),
 * pip.
-To install:
+To install::
     $ pip install --upgrade python-abp
-<a id="rendering"></a>
-## Rendering of filter lists
+
+Rendering of filter lists
+-------------------------
 The filter lists are originally authored in relatively smaller parts focused
 on particular types of filters, related to a specific topic or relevant for a
 particular geographical area.
-We call these parts _filter list fragments_ (or just _fragments_) to
+We call these parts *filter list fragments* (or just *fragments*) to
 distinguish them from full filter lists that are consumed by the adblocking
 software such as Adblock Plus.
 Rendering is a process that combines filter list fragments into a filter list.
 It starts with one fragment that can include other ones and so forth.
-The produced filter list is marked with a [version and a timestamp][1].
+The produced filter list is marked with a `version and a timestamp <https://adblockplus.org/filters#special-comments>`_.
-Python-abp contains a script that can do this called `flrender`:
+Python-abp contains a script that can do this called ``flrender``::
     $ flrender fragment.txt filterlist.txt
-This will take the top level fragment in `fragment.txt`, render it and save it
-into `filterlist.txt`.
-The `flrender` script can also be used by only specifying `fragment.txt`:
-
+This will take the top level fragment in ``fragment.txt``, render it and save it
+into ``filterlist.txt``.
+
+The ``flrender`` script can also be used by only specifying ``fragment.txt``::
+
     $ flrender fragment.txt
-
-in which case the rendering result will be sent to `stdout`. Moreover, when
-it's run with no positional arguments:
+
+
+in which case the rendering result will be sent to ``stdout``. Moreover, when
+it's run with no positional arguments::
     $ flrender
-it will read from `stdin` and send the results to `stdout`.
+
+it will read from ``stdin`` and send the results to ``stdout``.
 Fragments might reference other fragments that should be included into them.
-The references come in two forms: http(s) includes and local includes:
+The references come in two forms: http(s) includes and local includes::
     %include http://www.server.org/dir/list.txt%
     %include easylist:easylist/easylist_general_block.txt%
+
 The http include contains a URL that will be fetched and inserted at the point
 of reference.
 The local include contains a path inside the easylist repository.
-`flrender` needs to be able to find a copy of the repository on the local
-filesystem. We use `-i` option to point it to to the right directory:
+``flrender`` needs to be able to find a copy of the repository on the local
+filesystem. We use the ``-i`` option to point it to the right directory::
     $ flrender -i easylist=/home/abc/easylist input.txt output.txt
+
 Now the local include referenced above will be resolved to:
-`/home/abc/easylist/easylist/easylist_general_block.txt`
+``/home/abc/easylist/easylist/easylist_general_block.txt``
 and the fragment will be loaded from this file.
 Directories that contain filter list fragments that are used during rendering
@@ -86,22 +84,23 @@
 fragments.
 Each source is identified by a name: that's the part that comes before ":" in
 the include instruction and it should be the same as what comes before "=" in
-the `-i` option.
+the ``-i`` option.
 Commonly used sources have generally accepted names. For example the main
-EasyList repository is referred to as `easylist`.
+EasyList repository is referred to as ``easylist``.
 If you don't know all the source names that are needed to render some list,
-just run `flrender` and it will report what it's missing:
+just run ``flrender`` and it will report what it's missing::
     $ flrender easylist.txt output/easylist.txt
     Unknown source: 'easylist' when including 'easylist:easylist/easylist_gener
     al_block.txt' from 'easylist.txt'
-You can clone the necessary repositories to a local directory and add `-i`
+
+You can clone the necessary repositories to a local directory and add ``-i``
 options accordingly.
-<a id="diffs"></a>
-## Generating diffs
+Generating diffs
+----------------
 A diff allows a client running ad blocking software such as Adblock Plus to
 update the filter lists incrementally, instead of downloading a new copy of a
@@ -110,41 +109,46 @@
 consumption, etc.), allowing clients to update their lists more frequently
 using less resources.
-python-abp contains a script called `fldiff` that will find the diff between
-the latest filter list, and any number of previous filter lists:
+python-abp contains a script called ``fldiff`` that will find the diff between
+the latest filter list, and any number of previous filter lists::
     $ fldiff -o diffs/easylist/ easylist.txt archive/*
-where `-o diffs/easylist/` is the (optional) output directory where the diffs
-should be written, `easylist.txt` is the most recent version of the filter
-list, and `archive/*` is the directory where all the archived filter lists are.
-When called like this, the shell should automatically expand the `archive/*`
+
+where ``-o diffs/easylist/`` is the (optional) output directory where the diffs
+should be written, ``easylist.txt`` is the most recent version of the filter
+list, and ``archive/*`` is the directory where all the archived filter lists are.
+When called like this, the shell should automatically expand the ``archive/*``
 directory, giving the script each of the filenames separately.
-In the above example, the output of each archived `list[version].txt` will be
-written to `diffs/diff[version].txt`. If the output argument is omitted, the
+In the above example, the output of each archived ``list[version].txt`` will be
+written to ``diffs/diff[version].txt``. If the output argument is omitted, the
 diffs will be written to the current directory.
-The script produces three types of lines, as specified in the [technical
-specification][5]:
+The script produces three types of lines, as specified in the `technical
+specification <https://docs.google.com/document/d/1SoEqaOBZRCfkh1s5Kds5A5RwUC_nqbYYlGH72sbsSgQ/>`_:
+
-* Special comments of the form `! <name>:[ <value>]`
-* Added filters of the form `+ <filter-text>`
-* Removed filters of the form `- <filter-text>`
+* Special comments of the form ``! <name>:[ <value>]``
+* Added filters of the form ``+ <filter-text>``
+* Removed filters of the form ``- <filter-text>``
-<a id="library"></a>
-## Library API
+Library API
+-----------
 python-abp can also be used as a library for parsing filter lists. For example
 to read a filter list (we use Python 3 syntax here but the API is the same):
+.. code-block:: python
+
     from abp.filters import parse_filterlist
     with open('filterlist.txt') as filterlist:
         for line in parse_filterlist(filterlist):
             print(line)
-If `filterlist.txt` contains this filter list:
+
+If ``filterlist.txt`` contains this filter list::
     [Adblock Plus 2.0]
     ! Title: Example list
@@ -153,8 +157,11 @@
     abc.com/ad$image
     @@/abc\.com/
+
 the output will look something like:
+.. code-block:: python
+
     Header(version='Adblock Plus 2.0')
     Metadata(key='Title', value='Example list')
     EmptyLine()
@@ -162,69 +169,70 @@
     Filter(text='abc.com/ad$image', selector={'type': 'url-pattern', 'value': 'abc.com/ad'}, action='block', options=[('image', True)])
     Filter(text='@@/abc\\.com/', selector={'type': 'url-regexp', 'value': 'abc\\.com'}, action='allow', options=[])
-The `abp.filters` module also exports a lower-level function for parsing
-individual lines of a filter list: `parse_line`. It returns a parsed line
-object just like the items in the iterator returned by `parse_filterlist`.
-For further information on the library API use `help()` on `abp.filters` and
+The ``abp.filters`` module also exports a lower-level function for parsing
+individual lines of a filter list: ``parse_line``. It returns a parsed line
+object just like the items in the iterator returned by ``parse_filterlist``.
+
+For further information on the library API use ``help()`` on ``abp.filters`` and
 its contents in an interactive Python session, read the docstrings, or look at
 the tests for some usage examples.
-<a id="testing"></a>
-## Testing
+Testing
+-------
-Unit tests for `python-abp` are located in the `/tests` directory. [Pytest][2]
-is used for quickly running the tests during development. [Tox][3] is used for
+Unit tests for ``python-abp`` are located in the ``/tests`` directory. `Pytest <http://pytest.org/>`_
+is used for quickly running the tests during development. `Tox <https://tox.readthedocs.org/>`_ is used for
 testing in different environments (Python 2.7, Python 3.5+ and PyPy) and code
 quality reporting.
 In order to execute the tests, first create and activate a development
-virtualenv:
+virtualenv::
     $ python setup.py devenv
     $ . devenv/bin/activate
-With the development virtualenv activated use pytest for a quick test run:
+
+With the development virtualenv activated use pytest for a quick test run::
     (devenv) $ pytest tests
-and tox for a comprehensive report:
+
+and tox for a comprehensive report::
     (devenv) $ tox
-<a id="development"></a>
-## Development
+
+Development
+-----------
 When adding new functionality, add tests for it (preferably first). If some
 code will never be reached on a certain version of Python, it may be exempted
-from coverage tests by adding a comment, e.g. `# pragma: no py2 cover`.
+from coverage tests by adding a comment, e.g. ``# pragma: no py2 cover``.
 All public functions, classes and methods should have docstrings compliant with
-[NumPy/SciPy documentation guide][4]. One exception is the constructors of
+`NumPy/SciPy documentation guide <https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt>`_. One exception is the constructors of
 classes that the user is not expected to instantiate (such as exceptions).
-<a id="r"></a>
-## Using the library with R
+Using the library with R
+------------------------
-Clone the repo to you local machine. Then create a virtualenv and install
+Clone the repo to your local machine. Then create a virtualenv and install
-python abp there:
+python abp there::
+
+    $ cd python-abp
+    $ virtualenv env
+    $ pip install --upgrade .
-    $ cd python-abp
-    $ virtualenv env
-    $ pip install --upgrade .
-Then import it with `reticulate` in R:
+Then import it with ``reticulate`` in R:
-    > library(reticulate)
-    > use_virtualenv("~/python-abp/env", required=TRUE)
-    > abp <- import("abp.filters.rpy")
+.. code-block:: R
-Now you can use the functions with `abp$functionname`, e.g.
-`abp.line2dict("@@||g.doubleclick.net/pagead/$subdocument,domain=hon30.org")`
+    > library(reticulate)
+    > use_virtualenv("~/python-abp/env", required=TRUE)
+    > abp <- import("abp.filters.rpy")
-    [1]: https://adblockplus.org/filters#special-comments
-    [2]: http://pytest.org/
-    [3]: https://tox.readthedocs.org/
-    [4]: https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt
-    [5]: https://docs.google.com/document/d/1SoEqaOBZRCfkh1s5Kds5A5RwUC_nqbYYlGH72sbsSgQ/
+Now you can use the functions with ``abp$functionname``, e.g.
+``abp$line2dict("@@||g.doubleclick.net/pagead/$subdocument,domain=hon30.org")``.
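The local include resolution that the README describes for ``flrender -i`` can be illustrated with a short Python sketch. It is a hypothetical illustration, not flrender's code: the helper names `parse_source_options` and `resolve_include`, and the `%include <target>%` regex, are assumptions derived only from the examples in the README text, and targets containing whitespace are not handled.

```python
import os.path
import re

# Assumed syntax, taken from the README examples: "%include <target>%".
INCLUDE = re.compile(r'^%include\s+(\S+)%$')


def parse_source_options(options):
    """Turn -i style values such as 'easylist=/home/abc/easylist' into a dict.

    Hypothetical helper: maps each source name to its local directory.
    """
    sources = {}
    for option in options:
        name, sep, path = option.partition('=')
        if not sep:
            raise ValueError('expected NAME=PATH, got {!r}'.format(option))
        sources[name] = path
    return sources


def resolve_include(line, sources):
    """Return ('url', url) or ('file', path) for one include instruction.

    Hypothetical helper: http(s) targets are returned unchanged, local
    targets of the form '<source>:<relative path>' are joined onto the
    directory registered for <source>.
    """
    match = INCLUDE.match(line.strip())
    if match is None:
        raise ValueError('not an include instruction: {!r}'.format(line))
    target = match.group(1)
    if target.startswith(('http://', 'https://')):
        return 'url', target  # to be fetched and inserted at this point
    name, sep, relpath = target.partition(':')
    if not sep:
        raise ValueError('local include needs a source name: {!r}'.format(target))
    if name not in sources:
        # The sketch raises here; the README shows flrender reporting
        # a missing source instead.
        raise KeyError('unknown source: {!r}'.format(name))
    return 'file', os.path.join(sources[name], relpath)
```

With the README's own example, `resolve_include('%include easylist:easylist/easylist_general_block.txt%', parse_source_options(['easylist=/home/abc/easylist']))` returns `('file', '/home/abc/easylist/easylist/easylist_general_block.txt')` on a POSIX system, matching the path the README gives.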
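The three line types that the README lists for ``fldiff`` output can be illustrated with a plain set difference between an older and a newer filter list. This is a minimal sketch, not fldiff's algorithm: `split_list` and `render_diff` are hypothetical names, the sketch assumes that any comment shaped like `! name: value` is a special comment copied from the newer list, and the order of the output lines (special comments, then additions, then removals) is a choice made here, not something taken from the technical specification.

```python
import re

# Assumption for this sketch: any "! name: value" comment is a special comment.
SPECIAL_COMMENT = re.compile(r'^!\s*(\w[\w -]*?)\s*:\s*(.*)$')


def split_list(lines):
    """Split filter list lines into (special_comments, filters)."""
    special, filters = [], []
    for raw in lines:
        line = raw.strip()
        if not line or (line.startswith('[') and line.endswith(']')):
            continue  # blank line or a header such as "[Adblock Plus 2.0]"
        match = SPECIAL_COMMENT.match(line)
        if match:
            special.append((match.group(1), match.group(2)))
        elif not line.startswith('!'):
            filters.append(line)  # ordinary comments are dropped
    return special, filters


def render_diff(old_lines, new_lines):
    """Return diff lines: special comments, then '+' additions, then '-' removals."""
    _, old_filters = split_list(old_lines)
    new_special, new_filters = split_list(new_lines)
    old_set, new_set = set(old_filters), set(new_filters)
    out = []
    for name, value in new_special:
        # "! <name>:[ <value>]": the value part is omitted when it is empty.
        out.append('! {}: {}'.format(name, value) if value else '! {}:'.format(name))
    # dict.fromkeys de-duplicates while keeping the original order.
    out.extend('+ ' + f for f in dict.fromkeys(new_filters) if f not in old_set)
    out.extend('- ' + f for f in dict.fromkeys(old_filters) if f not in new_set)
    return out
```

For example, diffing `['[Adblock Plus 2.0]', '! Version: 1', 'abc.com/ad$image', '@@/abc\\.com/']` against `['[Adblock Plus 2.0]', '! Version: 2', 'abc.com/ad$image', 'example.com##.ad']` yields `['! Version: 2', '+ example.com##.ad', '- @@/abc\\.com/']`. Using sets keeps the comparison linear in the list size, which matters for lists the size of EasyList.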