Helder Geovane Gomes de Lima he7d3r

he7d3r / ptwiki.wp10-v1-full-period-bots-included.md
Last active May 22, 2020
Comparison of articlequality models for ptwiki, depending on the dataset used

Copied from https://github.com/wikimedia/articlequality/blob/5a7b0054af8f018d5569acc35648fc0293d0ed40/model_info/ptwiki.wp10.md

Full period; Bots included

Model Information:
- type: GradientBoosting
- version: 0.8.0
- params: {'min_samples_split': 2, 'label_weights': None, 'max_depth': 7, 'min_impurity_split': None, 'learning_rate': 0.01, 'verbose': 0, 'max_features': 'log2', 'center': True, 'subsample': 1.0, 'n_estimators': 300, 'warm_start': False, 'multilabel': False, 'min_samples_leaf': 1, 'labels': ['1', '2', '3', '4', '5', '6'], 'scale': True, 'presort': 'auto', 'population_rates': None, 'loss': 'deviance', 'random_state': None, 'max_leaf_nodes': None, 'init': None, 'n_iter_no_change': None, 'criterion': 'friedman_mse', 'min_impurity_decrease': 0.0, 'validation_fraction': 0.1, 'tol': 0.0001, 'min_weight_fraction_leaf': 0.0}

Environment:
- revscoring_version: '2.6.9'
- platform: 'Linux-4.9.0-11-amd64-x86_64-with-debian-9.12'
- machine: 'x86_64'
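Most of the params above map directly onto scikit-learn's gradient boosting estimator, which revscoring wraps. A minimal sketch of instantiating an equivalent estimator with the sklearn-level subset of those settings; the toy dataset and the choice to drop revscoring-specific keys ('labels', 'center', 'scale', 'population_rates', 'multilabel') are mine:

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.datasets import make_classification

# Only the keys that scikit-learn itself understands; the rest of the
# dict above is handled by revscoring's wrapper, not the estimator.
clf = GradientBoostingClassifier(
    learning_rate=0.01,
    n_estimators=300,
    max_depth=7,
    max_features='log2',
    min_samples_split=2,
    min_samples_leaf=1,
    subsample=1.0,
)

# Toy 6-class problem standing in for the six ptwiki quality labels
X, y = make_classification(n_samples=200, n_features=10, n_informative=5,
                           n_classes=6, random_state=0)
clf.fit(X, y)
print(clf.predict(X[:3]).shape)
```

The low learning rate combined with 300 estimators is the usual gradient-boosting trade-off: many small steps rather than a few large ones.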

he7d3r / ptwiki.labelings.20200301.json.user.ipynb
Created May 10, 2020
Quality assessments by bots by year on ptwiki
he7d3r / ptwiki.labelings.20200301.json.ipynb
Created May 10, 2020
Evolution of the assessments extracted from ptwiki
he7d3r / compare-datasets.py
Created May 9, 2020
Compare articlequality datasets before and after proposed changes
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
# Before and after https://github.com/wikimedia/articlequality/pull/127
file_names = ['ptwiki.labelings.20200301.json',
              'ptwiki.labelings.20200301.extractor-and-reverts.json']
sets = []
for file_name in file_names:
    # Each labelings file contains one JSON object per line
    sets.append(pd.read_json(file_name, lines=True))
he7d3r / README.md
Last active Dec 26, 2019
Generate statistics from Moodle logs

Usage

  1. Open the Moodle course of interest and go to Administration > Course Administration > Reports > Logs.
  2. Click on "Get these logs".
  3. Download the table data as comma-separated values (e.g. input1.csv). The script will use this as one of its input files.
  4. Create a file "videos.csv" with a column title (containing the titles that are present in the logs) and a column length, with the length (in minutes) of each video.
  5. Run the script, passing the names of the CSV files used as input and output:
$ python process-moodle-logs.py --logs input1.csv --videos videos.csv --stats output.csv --aggregated output_agg.csv
  6. Check out the two resulting files:
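The core of such a script can be sketched as a join between the two CSV inputs. This is a hypothetical sketch, not the real process-moodle-logs.py (which is not shown in this preview); the column names 'title' and 'length' come from step 4, while 'Event context' and the one-event-per-viewing assumption are mine:

```python
import pandas as pd

def video_stats(logs: pd.DataFrame, videos: pd.DataFrame) -> pd.DataFrame:
    """Count log events per video and estimate minutes watched."""
    # Number of log rows whose event context mentions each video title
    events = [
        int(logs['Event context'].str.contains(title, regex=False).sum())
        for title in videos['title']
    ]
    stats = videos.copy()
    stats['events'] = events
    # Rough estimate, assuming each event corresponds to one full viewing
    stats['minutes'] = stats['events'] * stats['length']
    return stats
```

With the CSVs from steps 3 and 4 this would be called as `video_stats(pd.read_csv('input1.csv'), pd.read_csv('videos.csv'))`.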
he7d3r / fix.py
Last active Aug 20, 2018
Fix typos in .tex files
#!/usr/bin/python3
# Copyright © 2018 He7d3r <http://he7d3r.mit-license.org>
import argparse
import re
import fileinput
from pathlib import Path
re_rule = re.compile(r"<(?:Typo)?\s+(?:word=\"(.*?)\"\s+)?find=\"(.*?)\"\s+replace=\"(.*?)\"\s*\/?>")


def fix_typos(typos, filename):
    # Apply each (word, find, replace) rule to every line, rewriting in place
    with fileinput.input(filename, inplace=True) as lines:
        for line in lines:
            for _word, find, replace in typos:
                line = re.sub(find, replace, line)
            print(line, end='')
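The pattern above matches Wikipedia's AutoWikiBrowser-style typo rules, capturing the optional word plus the find and replace expressions. A quick self-contained check of what the groups capture (the sample rule is invented for illustration):

```python
import re

# Same pattern as in fix.py above, written as a raw string
re_rule = re.compile(r"<(?:Typo)?\s+(?:word=\"(.*?)\"\s+)?"
                     r"find=\"(.*?)\"\s+replace=\"(.*?)\"\s*\/?>")

# A made-up rule in the AutoWikiBrowser typo-rule format
rule = '<Typo word="teh" find="\\bteh\\b" replace="the" />'
match = re_rule.search(rule)
print(match.groups())  # → ('teh', '\\bteh\\b', 'the')
```

Because the word attribute is optional, its group is None for rules that omit it.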
he7d3r / wp-words-by-frequency.pl
#!/usr/bin/perl -w
# Code : Dake
use strict;
use Parse::MediaWikiDump;
use utf8;
my $dump = shift(@ARGV) or die "Please specify a dump file";
my $pages = Parse::MediaWikiDump::Pages->new($dump);
my $page;
my %freq;
while (defined($page = $pages->next)) {
    # text() returns a reference to the page's wikitext
    my $text = ${$page->text};
    $freq{lc $1}++ while ($text =~ /(\w+)/g);
}
print "$freq{$_}\t$_\n" for sort { $freq{$b} <=> $freq{$a} } keys %freq;
he7d3r / wmgrep.js
// Based on https://meta.wikimedia.org/wiki/User:Krinkle/Tools/Global_SUL.js?oldid=10130488
/**
* This script provides an extra Special-page action called "WMGrep" which
* allows searching over all WMF wikis.
* After enabling the script, the tool is accessible from [[Special:BlankPage/wmgrep]].
*
* @source meta.wikimedia.org/wiki/User:Krinkle/Tools/Global_SUL
* @revision 4 (2014-10-08)
* @stats [[File:Krinkle_Global_SUL.js]]
*/
he7d3r / informal-words.txt
Last active Aug 29, 2015
Bad words of Wikipedia (ptwiki)
# The raw original list is at
# https://gist.github.com/Ladsgroup/cc22515f55ae3d868f47/c7343517b47b6908ac89e3bd2df68f9cee6cd188#file-ptwiki
# Caveats: https://meta.wikimedia.org/w/index.php?title=Research_talk:Revision_scoring_as_a_service&oldid=12148651#Badwords
adoro
aki
amo
bla
blablabla
coco
he7d3r / print_dependency_graph.py
Created Jan 31, 2015
Prints a graph in graphviz syntax showing the dependencies between features and data sources of revscoring
from revscoring.features import *
from revscoring.datasources import *
features = [added_badwords_ratio, added_misspellings_ratio, badwords_added,
            bytes_changed, chars_added, day_of_week_in_utc, hour_of_day_in_utc,
            is_content_namespace, is_custom_comment, is_mainspace,
            is_previous_user_same, is_section_comment, longest_repeated_char_added,
            longest_token_added, markup_chars_added, misspellings_added,
            numeric_chars_added, page_age_in_seconds, prev_badwords,
            prev_misspellings, prev_words, proportion_of_badwords_added,
            proportion_of_markup_added, proportion_of_misspellings_added]

print("digraph dependencies {")
for feature in features:
    # Each revscoring Dependent lists the features/datasources it needs
    for dependency in feature.dependencies:
        print('  "%s" -> "%s";' % (dependency, feature))
print("}")