You'll want to keep all of your local repos in the same folder. I like to use ~/projects/ for this, but others like ~/workspace/. Choose a name you like typing.
$ cd ~
$ mkdir projects/
import codecs
import math
import sklearn.cluster
import matplotlib.pyplot as plt
x = set()
c = 0
path = '/home/amir/Downloads/featuresetsforclustering/ptwiki.features_reverted.20k.tsv'
with codecs.open(path, 'r', 'utf-8') as f:
    for line in f:
$ python demonstrate_extractor.py
Extracting features for http://en.wikipedia.org/wiki/?oldid=626489778&diff=prev
<added_badwords_ratio>: 211.95999999999998
<added_misspellings_ratio>: 1.4638121546961327
<badwords_added>: 3
<bytes_changed>: 133
<chars_added>: 145
<day_of_week_in_utc>: 6
<hour_of_day_in_utc>: 15
<is_custom_comment>: True
#!/usr/bin/python
# -*- coding: utf-8 -*-
"""
@ Author: [[Usuário:Danilo.mac]]
@ License: GNU General Public License 3.0 (GPL V3) and Creative Commons Attribution/Share-Alike (CC-BY-SA)
Description: Script for finding references in the page-history dump of the Portuguese-language Wikipedia.
"""
>>> import revscores
>>> dir(revscores)
['__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__']
>>> from revscores import languages
>>> dir(languages)
['Language', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', 'english', 'language', 'portuguese']
Notice that the first "dir()" doesn't list languages. This is because the languages subpackage is not imported by default when you import revscores.
But when we run dir() on languages, we can see "english", "portuguese" and "language". This is because languages imports these modules itself as soon as it is loaded.
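
The same behaviour can be reproduced without revscores installed. The sketch below builds a throwaway package on disk (the names `demo_pkg`, `build_demo_package`, `public_names` and `run_demo` are made up for this illustration) whose top-level `__init__.py` imports nothing, while its `languages` subpackage imports its own modules. It is only meant to show the general Python import rule, not how revscores itself is laid out.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_demo_package(root):
    """Create a hypothetical package 'demo_pkg' with a 'languages' subpackage."""
    pkg = Path(root) / "demo_pkg"
    langs = pkg / "languages"
    langs.mkdir(parents=True)
    # The top-level __init__.py imports nothing, so 'languages' is not bound yet.
    (pkg / "__init__.py").write_text("")
    # The subpackage's __init__.py eagerly imports its own modules.
    (langs / "__init__.py").write_text("from . import english, portuguese\n")
    (langs / "english.py").write_text("NAME = 'english'\n")
    (langs / "portuguese.py").write_text("NAME = 'portuguese'\n")


def public_names(module):
    """Return the names from dir(module) that do not start with an underscore."""
    return [name for name in dir(module) if not name.startswith("_")]


def run_demo():
    with tempfile.TemporaryDirectory() as root:
        build_demo_package(root)
        sys.path.insert(0, root)
        try:
            importlib.invalidate_caches()
            demo_pkg = importlib.import_module("demo_pkg")
            before = public_names(demo_pkg)   # subpackage not imported yet
            languages = importlib.import_module("demo_pkg.languages")
            after = public_names(demo_pkg)    # importing it binds the attribute
            inside = public_names(languages)
        finally:
            # Clean up so the demo leaves no trace in the interpreter.
            sys.path.remove(root)
            for name in [m for m in sys.modules if m.split(".")[0] == "demo_pkg"]:
                del sys.modules[name]
    return before, after, inside


before, after, inside = run_demo()
print(before)  # []
print(after)   # ['languages']
print(inside)  # ['english', 'portuguese']
```

Importing a subpackage binds it as an attribute on its parent, which is why `languages` only shows up in the second listing.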
$ python
Python 3.4.0 (default, Apr 11 2014, 13:05:11)
[GCC 4.8.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from revscoring.languages import english
>>> english.is_badword("foobar")
False
>>> english.is_badword("shitty")
True
* edittools[ResourceLoader]|edittools.js|edittools.css|default
$ python
Python 3.4.3 (default, Jul 28 2015, 18:20:59)
[GCC 4.8.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from revscoring.languages import portuguese
>>> from revscoring.datasources import revision
>>> from revscoring.dependencies import solve
>>> solve(portuguese.revision.badwords, cache={revision.text: "potential badword"})
0
>>> solve(portuguese.revision.badwords, cache={revision.text: "puta"})
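
To make the `cache=` argument in the session above less mysterious, here is a minimal sketch of cache-based dependency solving. It is not revscoring's implementation: the `Dependent` class, this `solve` function, and the placeholder word list `PLACEHOLDER_BADWORDS` are all assumptions invented for the illustration. The idea it shows is that a value is computed from the things it depends on, unless the cache already supplies it, which is how a revision's text can be injected by hand instead of being fetched.

```python
class Dependent:
    """A named value computed from other Dependents (hypothetical helper)."""

    def __init__(self, name, process=None, depends_on=()):
        self.name = name
        self.process = process            # function of the solved dependencies
        self.depends_on = tuple(depends_on)

    def __repr__(self):
        return "<{0}>".format(self.name)


def solve(dependent, cache=None):
    """Resolve a Dependent, preferring values already present in the cache."""
    cache = {} if cache is None else cache
    if dependent in cache:
        return cache[dependent]           # injected or previously computed
    if dependent.process is None:
        raise RuntimeError("No value available for {0!r}".format(dependent))
    args = [solve(dep, cache) for dep in dependent.depends_on]
    value = dependent.process(*args)
    cache[dependent] = value              # remember it for later lookups
    return value


# Placeholder word list, an assumption for this sketch only.
PLACEHOLDER_BADWORDS = {"darn", "heck"}

# A root datasource with no process: its value must come from the cache.
revision_text = Dependent("revision.text")
words = Dependent("words", lambda text: text.lower().split(), [revision_text])
badwords = Dependent(
    "badwords",
    lambda ws: sum(1 for w in ws if w in PLACEHOLDER_BADWORDS),
    [words],
)

clean_count = solve(badwords, cache={revision_text: "potential badword"})
bad_count = solve(badwords, cache={revision_text: "well darn it"})
print(clean_count)  # 0
print(bad_count)    # 1
```

Calling `solve(badwords)` with no cache raises `RuntimeError`, because nothing in this sketch knows how to fetch `revision_text` on its own.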
requests