Helder Geovane Gomes de Lima he7d3r
@Ladsgroup
Ladsgroup / Cluster.py
Created August 27, 2015 18:59
Clustering reverted edits in Wikipedia
import codecs
import math
import sklearn.cluster
import matplotlib.pyplot as plt
x = set()
c = 0
path = '/home/amir/Downloads/featuresetsforclustering/ptwiki.features_reverted.20k.tsv'
with codecs.open(path, 'r', 'utf-8') as f:
    for line in f:
$ python demonstrate_extractor.py
Extracting features for http://en.wikipedia.org/wiki/?oldid=626489778&diff=prev
<added_badwords_ratio>: 211.95999999999998
<added_misspellings_ratio>: 1.4638121546961327
<badwords_added>: 3
<bytes_changed>: 133
<chars_added>: 145
<day_of_week_in_utc>: 6
<hour_of_day_in_utc>: 15
<is_custom_comment>: True
#!/usr/bin/python
# -*- coding: utf-8 -*-
"""
@ Author: [[Usuário:Danilo.mac]]
@ License: GNU General Public License 3.0 (GPL V3) and Creative Commons Attribution/Share-Alike (CC-BY-SA)
Description: Script for finding references in the page-history dump of the Portuguese-language Wikipedia.
"""
>>> import revscores
>>> dir(revscores)
['__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__']
>>> from revscores import languages
>>> dir(languages)
['Language', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', 'english', 'language', 'portuguese']
Notice that the first dir() call doesn't list "languages". This is because the languages subpackage is not imported by default when the top-level package is imported.
But when we run dir() on languages, we can see "english", "portuguese" and "language". This is because these modules are imported by default when the languages subpackage itself is imported.
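The same behavior can be reproduced without the library installed. This is only a minimal sketch, assuming a throwaway package named demo_pkg (a hypothetical name, not part of revscores) whose top-level __init__.py is empty and whose languages subpackage imports its own modules in its __init__.py:

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_demo_package(root):
    """Write a throwaway package 'demo_pkg' (hypothetical name) under root."""
    pkg = Path(root) / "demo_pkg"
    sub = pkg / "languages"
    sub.mkdir(parents=True)
    # The top-level __init__.py is empty, so it imports no subpackages.
    (pkg / "__init__.py").write_text("")
    # The subpackage's __init__.py imports its own modules.
    (sub / "__init__.py").write_text("from . import english, portuguese\n")
    (sub / "english.py").write_text("NAME = 'english'\n")
    (sub / "portuguese.py").write_text("NAME = 'portuguese'\n")


tmp = tempfile.mkdtemp()
build_demo_package(tmp)
sys.path.insert(0, tmp)
importlib.invalidate_caches()

# Importing the parent package alone does not import the subpackage.
demo_pkg = importlib.import_module("demo_pkg")
before = "languages" in dir(demo_pkg)

# Importing the subpackage binds it as an attribute of the parent package.
languages = importlib.import_module("demo_pkg.languages")
after = "languages" in dir(demo_pkg)

# Modules imported by the subpackage's __init__.py show up in its dir().
submodules = sorted(n for n in dir(languages) if not n.startswith("_"))

print(before, after, submodules)
```

Whether a name appears in dir() of a package depends only on what has been imported so far, not on which files exist on disk.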
$ python
Python 3.4.0 (default, Apr 11 2014, 13:05:11)
[GCC 4.8.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from revscoring.languages import english
>>> english.is_badword("foobar")
False
>>> english.is_badword("shitty")
True
@yuvipanda
yuvipanda / Gadget-definitions
Created September 19, 2012 16:51
Gadgetization of edittools
* edittools[ResourceLoader]|edittools.js|edittools.css|default
$ python
Python 3.4.3 (default, Jul 28 2015, 18:20:59)
[GCC 4.8.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from revscoring.languages import portuguese
>>> from revscoring.datasources import revision
>>> from revscoring.dependencies import solve
>>> solve(portuguese.revision.badwords, cache={revision.text: "potential badword"})
0
>>> solve(portuguese.revision.badwords, cache={revision.text: "puta"})

Step 1: Make a project directory

You'll want to keep all of your local repos in the same folder. I like to use ~/projects/ for this, but others like ~/workspace/. Choose a name you like typing.

$ cd ~
$ mkdir projects/

Step 2: Get the repos

@turicas
turicas / requirements.txt
Created November 10, 2013 00:50
L² Hackathon WikiMedia
requests