Helder Geovane Gomes de Lima (he7d3r)
he7d3r /
Last active May 22, 2020
Comparison of articlequality models for ptwiki, depending on the dataset used

Copied from

Full period; Bots included

Model Information:
- type: GradientBoosting
- version: 0.8.0
- params: {'min_samples_split': 2, 'label_weights': None, 'max_depth': 7, 'min_impurity_split': None, 'learning_rate': 0.01, 'verbose': 0, 'max_features': 'log2', 'center': True, 'subsample': 1.0, 'n_estimators': 300, 'warm_start': False, 'multilabel': False, 'min_samples_leaf': 1, 'labels': ['1', '2', '3', '4', '5', '6'], 'scale': True, 'presort': 'auto', 'population_rates': None, 'loss': 'deviance', 'random_state': None, 'max_leaf_nodes': None, 'init': None, 'n_iter_no_change': None, 'criterion': 'friedman_mse', 'min_impurity_decrease': 0.0, 'validation_fraction': 0.1, 'tol': 0.0001, 'min_weight_fraction_leaf': 0.0}

Environment:
- revscoring_version: '2.6.9'
- platform: 'Linux-4.9.0-11-amd64-x86_64-with-debian-9.12'
- machine: 'x86_64'
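The parameter dump above mixes revscoring-level options ('center', 'scale', 'labels', 'label_weights', 'population_rates', 'multilabel') with the underlying scikit-learn estimator's parameters. A minimal sketch of the wrapped estimator, assuming the model card's values map directly onto scikit-learn's `GradientBoostingClassifier` (the `loss` parameter is omitted because its accepted values changed across scikit-learn versions):

```python
from sklearn.ensemble import GradientBoostingClassifier

# Sketch only: revscoring wraps an estimator like this one and additionally
# centers/scales features and maps predictions onto the labels '1'..'6'.
clf = GradientBoostingClassifier(
    learning_rate=0.01,    # small step size, compensated by many trees
    n_estimators=300,
    max_depth=7,
    max_features='log2',   # consider log2(n_features) features per split
    subsample=1.0,
    min_samples_split=2,
    min_samples_leaf=1,
    validation_fraction=0.1,
    tol=0.0001,
)
```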

he7d3r / ptwiki.labelings.20200301.json.user.ipynb
Created May 10, 2020
Quality assessments by bots by year on ptwiki
he7d3r / ptwiki.labelings.20200301.json.ipynb
Created May 10, 2020
Evolution of the assessments extracted from ptwiki
he7d3r /
Created May 9, 2020
Compare articlequality datasets before and after proposed changes
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# Before and after (the second file name is truncated in the source;
# a placeholder is used here)
file_names = ['ptwiki.labelings.20200301.json',
              'ptwiki.labelings.<after>.json']  # placeholder for the "after" dataset
sets = []
for file_name in file_names:
    # Assuming newline-delimited JSON: one labeled revision per line
    sets.append(pd.read_json(file_name, lines=True))
he7d3r /
Last active Dec 26, 2019
Generate statistics from Moodle logs


  1. Open the Moodle course of interest and go to Administration > Course Administration > Reports > Logs.
  2. Click on "Get these logs".
  3. Download the table data as comma-separated values (e.g. input1.csv). The script will use this as one of its input files.
  4. Create a file "videos.csv" with a column "title" (containing the titles that are present in the logs) and a column "length", giving the length (in minutes) of each video.
  5. Run the script, passing the names of the CSV files used as input and output:
$ python --logs input1.csv --videos videos.csv --stats output.csv --aggregated output_agg.csv
  6. Check out the resulting two files:
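The aggregation the steps above describe can be sketched in pandas. The column names and the merge key ('title') are assumptions for illustration, not necessarily the columns the actual script uses:

```python
import pandas as pd

def video_stats(logs: pd.DataFrame, videos: pd.DataFrame) -> pd.DataFrame:
    """Count log events per video and attach each video's length in minutes.

    Assumes the log export has a 'title' column whose values match
    videos['title']; both names are illustrative.
    """
    return (logs.merge(videos, on='title')
                .groupby(['title', 'length'])
                .size()
                .reset_index(name='views'))

# Tiny made-up example in place of input1.csv / videos.csv
logs = pd.DataFrame({'title': ['Intro', 'Intro', 'Loops']})
videos = pd.DataFrame({'title': ['Intro', 'Loops'], 'length': [10, 15]})
stats = video_stats(logs, videos)
```

In the real workflow the two frames would come from `pd.read_csv('input1.csv')` and `pd.read_csv('videos.csv')`, and the result would be written with `stats.to_csv('output.csv', index=False)`.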
he7d3r /
Last active Aug 20, 2018
Fix typos in .tex files
# Copyright © 2018 He7d3r <>
import argparse
import re
import fileinput
from pathlib import Path
# Matches typo rules of the form <Typo word="..." find="..." replace="..."/>
# (raw string avoids invalid-escape warnings from \s and \/)
re_rule = re.compile(r"<(?:Typo)?\s+(?:word=\"(.*?)\"\s+)?find=\"(.*?)\"\s+replace=\"(.*?)\"\s*/?>")

def fix_typos(typos, filename):
    # Body truncated in the source; minimal sketch: apply each (word, find,
    # replace) rule to every line of the file, editing it in place.
    with fileinput.input(filename, inplace=True) as lines:
        for line in lines:
            for word, find, replace in typos:
                line = re.sub(find, replace, line)
            print(line, end="")
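As an illustration of what the `re_rule` pattern above captures, here it is applied to a made-up rule in the `<Typo .../>` format (the rule itself is hypothetical):

```python
import re

# Same pattern as above, compiled from a raw string
re_rule = re.compile(r"<(?:Typo)?\s+(?:word=\"(.*?)\"\s+)?find=\"(.*?)\"\s+replace=\"(.*?)\"\s*/?>")

# Hypothetical typo rule: the three groups are word, find, replace
m = re_rule.search(r'<Typo word="teh" find="\bteh\b" replace="the"/>')
print(m.groups())  # ('teh', '\\bteh\\b', 'the')
```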
#!/usr/bin/perl -w
# Code : Dake
use strict;
use Parse::MediaWikiDump;
use utf8;
my $dump = shift(@ARGV) or die "Please specify a dump file";
my $pages = Parse::MediaWikiDump::Pages->new($dump);
my $page;

# Loop truncated in the source; minimal sketch: print the title of each page
binmode(STDOUT, ':utf8');
while (defined($page = $pages->next)) {
    print $page->title, "\n";
}
he7d3r / wmgrep.js
// Based on
/**
 * This script provides an extra Special-page action called "WMGrep" which
 * allows searching over all WMF wikis.
 * After enabling the script, the tool is accessible from [[Special:BlankPage/wmgrep]].
 * @source
 * @revision 4 (2014-10-08)
 * @stats [[File:Krinkle_Global_SUL.js]]
 */
he7d3r / informal-words.txt
Last active Aug 29, 2015
Bad words of Wikipedia (ptwiki)
# The raw original list is at
# Caveats:
he7d3r /
Created Jan 31, 2015
Prints a graph in graphviz syntax showing the dependencies between features and data sources of revscoring
from revscoring.features import *
from revscoring.datasources import *
features = [added_badwords_ratio, added_misspellings_ratio, badwords_added,
bytes_changed, chars_added, day_of_week_in_utc, hour_of_day_in_utc,
is_content_namespace, is_custom_comment, is_mainspace,
is_previous_user_same, is_section_comment, longest_repeated_char_added,
longest_token_added, markup_chars_added, misspellings_added,
numeric_chars_added, page_age_in_seconds, prev_badwords,
prev_misspellings, prev_words, proportion_of_badwords_added,
proportion_of_markup_added, proportion_of_misspellings_added,