Helder he7d3r

@he7d3r
he7d3r / fix.py
Last active Aug 20, 2018
Fix typos in .tex files
View fix.py
#!/usr/bin/python3
# Copyright © 2018 He7d3r <http://he7d3r.mit-license.org>
import argparse
import re
import fileinput
from pathlib import Path
re_rule = re.compile(r'<(?:Typo)?\s+(?:word="(.*?)"\s+)?find="(.*?)"\s+replace="(.*?)"\s*/?>')
def fix_typos(typos, filename):
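The preview stops at the function signature, so the following is only a minimal sketch of how a rule matched by `re_rule` could be parsed and applied to a line of text. The helper names `parse_rules` and `apply_rules`, the sample rule, and the conversion of `$1`-style backreferences to Python syntax are all assumptions for illustration, not the gist's actual implementation.

```python
import re

# Same pattern as re_rule in fix.py.
RE_RULE = re.compile(
    r'<(?:Typo)?\s+(?:word="(.*?)"\s+)?find="(.*?)"\s+replace="(.*?)"\s*/?>'
)


def parse_rules(text):
    """Return (word, compiled_find, replacement) tuples for each rule in text."""
    rules = []
    for word, find, replace in RE_RULE.findall(text):
        # Assumption: rules use $1, $2, ... for backreferences;
        # Python's re.sub expects \g<1>, \g<2>, ...
        replace = re.sub(r'\$(\d+)', r'\\g<\1>', replace)
        rules.append((word, re.compile(find), replace))
    return rules


def apply_rules(rules, line):
    """Apply every rule to the line, in order."""
    for _word, pattern, replacement in rules:
        line = pattern.sub(replacement, line)
    return line


# Hypothetical rule and input line, for illustration only.
sample = '<Typo word="teh" find="\\b([Tt])eh\\b" replace="$1he" />'
rules = parse_rules(sample)
fixed = apply_rules(rules, "Teh proof of teh lemma")
print(fixed)
```

The optional `word` attribute is kept in each tuple only as a label; rules without it yield an empty string in that position.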
View wp-words-by-frequency.pl
#!/usr/bin/perl -w
# Code : Dake
use strict;
use Parse::MediaWikiDump;
use utf8;
my $dump = shift(@ARGV) or die "Please specify a dump file";
my $pages = Parse::MediaWikiDump::Pages->new($dump);
my $page;
View wmgrep.js
// Based on https://meta.wikimedia.org/wiki/User:Krinkle/Tools/Global_SUL.js?oldid=10130488
/**
* This script provides an extra Special-page action called "WMGrep" which
* allows searching over all WMF wikis.
* After enabling the script, the tool is accessible from [[Special:BlankPage/wmgrep]].
*
* @source meta.wikimedia.org/wiki/User:Krinkle/Tools/Global_SUL
* @revision 4 (2014-10-08)
* @stats [[File:Krinkle_Global_SUL.js]]
*/
@he7d3r
he7d3r / informal-words.txt
Last active Aug 29, 2015
Bad words of Wikipedia (ptwiki)
View informal-words.txt
# The raw original list is at
# https://gist.github.com/Ladsgroup/cc22515f55ae3d868f47/c7343517b47b6908ac89e3bd2df68f9cee6cd188#file-ptwiki
# Caveats: https://meta.wikimedia.org/w/index.php?title=Research_talk:Revision_scoring_as_a_service&oldid=12148651#Badwords
adoro
aki
amo
bla
blablabla
coco
@he7d3r
he7d3r / print_dependency_graph.py
Created Jan 31, 2015
Prints a graph in graphviz syntax showing the dependencies between features and data sources of revscoring
View print_dependency_graph.py
from revscoring.features import *
from revscoring.datasources import *
features = [added_badwords_ratio, added_misspellings_ratio, badwords_added,
bytes_changed, chars_added, day_of_week_in_utc, hour_of_day_in_utc,
is_content_namespace, is_custom_comment, is_mainspace,
is_previous_user_same, is_section_comment, longest_repeated_char_added,
longest_token_added, markup_chars_added, misspellings_added,
numeric_chars_added, page_age_in_seconds, prev_badwords,
prev_misspellings, prev_words, proportion_of_badwords_added,
proportion_of_markup_added, proportion_of_misspellings_added,
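The preview is cut off inside the feature list, so how the script walks revscoring's objects is not visible. As a rough sketch of the output side only, a dependency mapping can be printed in Graphviz syntax as below. The function `to_dot` and the data source names in the example mapping are hypothetical placeholders; only `chars_added` and `bytes_changed` come from the list above.

```python
def to_dot(dependencies):
    """Render {node: [nodes it depends on]} as a Graphviz digraph string."""
    lines = ["digraph dependencies {"]
    for node in sorted(dependencies):
        for dep in sorted(dependencies[node]):
            # One edge per dependency, pointing from the dependent node.
            lines.append('    "{}" -> "{}";'.format(node, dep))
    lines.append("}")
    return "\n".join(lines)


# Hypothetical mapping; the real one would be derived from revscoring.
example = {
    "chars_added": ["revision_text", "previous_revision_text"],
    "bytes_changed": ["revision_bytes", "previous_revision_bytes"],
}
dot = to_dot(example)
print(dot)
```

The printed text can be saved to a file and rendered with the Graphviz `dot` command.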
@he7d3r
he7d3r / commands.sh
Last active Aug 29, 2015
Testing ORES
View commands.sh
# Create some folders
mkdir models datasets
# Generate a file with a new model
./new_model revscores.scorers.LinearSVCModel \
revscores.features.added_badwords_ratio \
revscores.features.added_misspellings_ratio \
revscores.features.day_of_week_in_utc \
revscores.features.hour_of_day_in_utc \
revscores.features.is_custom_comment \
@he7d3r
he7d3r / demonstrate_GridSearchCV.py
Created Dec 24, 2014
Test GridSearchCV using a dataset obtained from a tsv file
View demonstrate_GridSearchCV.py
"""
Test GridSearchCV using a dataset obtained from a tsv file
"""
import csv
from sklearn import svm
from sklearn import metrics
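# sklearn.cross_validation and sklearn.grid_search were removed in
# scikit-learn 0.20; sklearn.model_selection provides both names.
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV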
#from revscores.scorers import LinearSVC
from revscores.features import (added_badwords_ratio, added_misspellings_ratio,
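The preview ends before the dataset is read, so the TSV layout is unknown. A minimal sketch of the loading step, assuming each row holds numeric feature values followed by an integer class label in the last column (the function name `load_tsv` and the sample rows are made up for illustration):

```python
import csv
import io


def load_tsv(handle, label_column=-1):
    """Split tab-separated rows into feature vectors X and labels y."""
    X, y = [], []
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        values = [float(v) for v in row]
        # Assumption: the label sits in the last column.
        y.append(int(values.pop(label_column)))
        X.append(values)
    return X, y


# Hypothetical two-row dataset standing in for the real file.
sample = io.StringIO("0.10\t3\t0\n0.75\t12\t1\n")
X, y = load_tsv(sample)
print(X, y)
```

Lists in this shape can be passed to `train_test_split` and then to the `fit` method of a `GridSearchCV` instance.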
@he7d3r
he7d3r / classification_report.txt
Last active Aug 29, 2015
Test the scorer on recent changes
View classification_report.txt
== Classification Report ==
             precision    recall  f1-score   support

          0       0.85      0.95      0.90      1617
          1       0.56      0.30      0.39       379

avg / total       0.80      0.82      0.80      1996
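As a sanity check on the report, the f1-score column can be recomputed from the rounded precision and recall values, since F1 is their harmonic mean. The sketch below also assumes the "avg / total" row is a support-weighted average; because the inputs are already rounded to two decimals, small discrepancies in other columns are possible.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, support) per class, copied from the report.
rows = {"0": (0.85, 0.95, 1617), "1": (0.56, 0.30, 379)}

f1 = {label: f1_score(p, r) for label, (p, r, _n) in rows.items()}
total = sum(n for _p, _r, n in rows.values())
# Assumption: the summary row weights each class by its support.
weighted_f1 = sum(f1[label] * rows[label][2] for label in rows) / total

for label in rows:
    print(label, round(f1[label], 2))
print("avg / total", round(weighted_f1, 2), total)
```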
View Output for direction="newer"
$ python demonstrate_rc.py
39753618 (0 chars): 9a233f038c5f692efb3f0fbff7f4ced8a8c22cb0
40663045 (0 chars): 1f0c550dceb5c542dfb304e5d6337c063aaa3c48
34693351 (0 chars): 479dc3b4d6397134ce9d53e84c2fea0f451c1ae9
34764900 (0 chars): f23ac498773c3a74037ba7f91e68653fa8fa5809
40663042 (0 chars): 4416ce970a3a1bc1b2e1f1638cb716f9bf91c9fa
0 (0 chars):
36949986 (0 chars): 0f3919a4a56b1071087a09dce780a75374e216c6
0 (0 chars):
40659670 (0 chars): 9c1a70783ec78719d16533106816c76887fd139f
@he7d3r
he7d3r / stemToMostFrequentWord.py
Created Dec 14, 2014
For each stem, prints out the most frequent word which is matched by some regex rule and has that stem
View stemToMostFrequentWord.py
#!/usr/bin/env python
# -*- coding: UTF-8 -*-
# Copyright © 2014 He7d3r
# License: http://he7d3r.mit-license.org/
"""
For each stem, prints out the most frequent word which is matched by some regex rule and has that stem
Example:
python stemToMostFrequentWord.py SALEBOT.TXT SALEBOT-STEMS-WORDS-STATS.TXT BADWORDSLIST.TXT
"""