Helder Geovane Gomes de Lima he7d3r

@he7d3r
he7d3r / conj-pt-er.js
Created September 4, 2014 14:19
JSON for "conj.pt.er"
// https://pt.wiktionary.org/wiki/Template:conj.pt
// https://pt.wiktionary.org/wiki/Template:conj.pt.er
// https://pt.wiktionary.org/wiki/Special:WhatLinksHere/Template:conj.pt?limit=500&namespace=10
/*var text = $('#wpTextbox1').val();
jsMsg( text.replace( /<noinclude>[\s\S]+?<\/noinclude>/g, '' ) );
$('#wpTextbox1').val().replace(/<noinclude>[\s\S]+?<\/noinclude>/g, '').match( /\{\{\s*conj\.pt\s*\|\s*título\s*=\s*([\s\S]+?)\s*\|\s*\{\{\{1\|?\}\}\}(.+?)\s*\|/ );
he7d3r / TemplateScript.test.js
Created September 15, 2014 13:06
QUnit tests for TemplateScript
/**
* QUnit tests for TemplateScript.js
*/
/*jslint browser: true, white: true*/
/*global jQuery, mediaWiki, QUnit */
( function ( $, mw /* , undefined */ ) {
'use strict';
function myTests(){
he7d3r / DisableCodeEditorAutoPairing.js
Created September 15, 2014 13:07
Disable CodeEditor auto pairing
/*global jQuery, mediaWiki */
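( function ( mw, $ ) {
	'use strict';
	// Only act on pages where the editor is loaded
	if ( $.inArray( mw.config.get( 'wgAction' ), [ 'edit', 'submit' ] ) === -1 ) {
		return;
	}
	mw.hook( "codeEditor.configure" ).add( function( editorSession ) {
		// Disable automatic pairing of brackets and quotes
		editorSession.setBehavioursEnabled( false );
	} );
}( mediaWiki, jQuery ) );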
he7d3r / LanguageConverter.test.js
Created September 15, 2014 13:20
LanguageConverter tests
/**
* MediaWiki JavaScript library test suite
*
* Available on Special:BlankPage?action=lctest&debug=true
* @source Adapted from
* https://www.mediawiki.org/wiki/Special:Code/MediaWiki/87360
*/
/*jslint browser: true, white: true, evil: true, plusplus: true, vars: true, forin: true */
/*global jQuery, mediaWiki */
( function( mw, $ ) {
he7d3r / SALEBOT-STATS.TXT
Last active August 29, 2015 14:07
Bad words from Salebot config on ptwiki (4 files: SALEBOT-STATS.TXT, SALEBOT-STEMS.TXT, SALEBOT-WORDS.TXT, SALEBOT.TXT)
# This list was generated like this:
# 1. Replace each regex by a list of words it matches (and its stems), limiting
# "infinite modifiers" such as "+", "*" and "{n,}" to just a few matches, using
# https://gist.github.com/he7d3r/34f332d0c0523a1bd438/f3805975fec2513f821f4286429998128171c6b2
#
# python invertSalebotRegexes.py SALEBOT.TXT SALEBOT-WORDS.TXT SALEBOT-STEMS.TXT
#
# 2. Remove stems which users never had to remove from pages. Detected by
# https://gist.github.com/he7d3r/f99482f4f54f97895ccb/9205f3271fe8daa2f694f4ce3ba9b29213dbad6c
#
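Step 1 above (replacing a regex with the words it matches, with unbounded modifiers capped) can be illustrated with a minimal sketch. This is not the code from invertSalebotRegexes.py, which builds on pyparsing. The function name `expand` and the `max_repeat` parameter are hypothetical, and only a tiny regex subset is handled: literals, `[abc]` classes, and `?`, `*`, `+` on the preceding atom.

```python
from itertools import product


def expand(pattern, max_repeat=2):
    """Expand a tiny regex subset into the sorted list of strings it matches.

    Supported: literal characters, [abc] classes, and the quantifiers
    ?, * and + applied to the preceding atom. The unbounded quantifiers
    (* and +) are capped at max_repeat repetitions.
    """
    atoms = []  # each entry: (choices, min_count, max_count)
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "[":
            end = pattern.index("]", i)
            choices = list(pattern[i + 1:end])
            i = end + 1
        else:
            choices = [ch]
            i += 1
        lo, hi = 1, 1
        if i < len(pattern) and pattern[i] in "?*+":
            quantifier = pattern[i]
            lo = 1 if quantifier == "+" else 0
            hi = 1 if quantifier == "?" else max_repeat
            i += 1
        atoms.append((choices, lo, hi))

    # For each atom, list every string it can produce within the cap.
    per_atom = []
    for choices, lo, hi in atoms:
        options = []
        for n in range(lo, hi + 1):
            options.extend("".join(c) for c in product(choices, repeat=n))
        per_atom.append(options)

    # Combine one option per atom, dropping duplicates.
    return sorted({"".join(parts) for parts in product(*per_atom)})


print(expand("ab+c?"))                 # ['ab', 'abb', 'abbc', 'abc']
print(expand("[xy]z*", max_repeat=1))  # ['x', 'xz', 'y', 'yz']
```

Capping the repetitions keeps the output finite, at the cost of missing longer variants that the original regex would also match.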
he7d3r / BadWordsCounter.py
Last active August 29, 2015 14:07
Prints out badword stems found in a list of XML dumps (by number of removals)
#!/usr/bin/env python
# -*- coding: UTF-8 -*-
# Copyright © 2014 He7d3r
# License: http://he7d3r.mit-license.org/
"""
Prints out badword stems (by number of removals) in a dump.xml.
Example:
python BadWordsCounter.py bad.txt bad-stats.txt dump1.xml dump2.xml
"""
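The idea of counting stems "by number of removals" can be sketched as follows. This is an illustration under assumptions, not the script's actual logic: revisions are taken to be plain strings in chronological order, a removal is a word occurrence present in one revision and absent from the next, and the `stem` callable is a hypothetical stand-in for a real stemmer.

```python
import re
from collections import Counter

WORD_RE = re.compile(r"\w+")


def removed_words(old_text, new_text):
    """Return a Counter of word occurrences in old_text that are gone from new_text."""
    old = Counter(w.lower() for w in WORD_RE.findall(old_text))
    new = Counter(w.lower() for w in WORD_RE.findall(new_text))
    return old - new  # Counter subtraction keeps only positive counts


def count_removals(revisions, bad_stems, stem):
    """Count, per bad stem, how many word occurrences were removed between revisions."""
    totals = Counter()
    for old_text, new_text in zip(revisions, revisions[1:]):
        for word, n in removed_words(old_text, new_text).items():
            s = stem(word)
            if s in bad_stems:
                totals[s] += n
    return totals


# Toy stemmer (assumption): the first three letters of the word.
revisions = ["um texto", "um texto bobo bobagem bobo", "um texto"]
print(count_removals(revisions, {"bob"}, lambda w: w[:3]))
```

In this toy run, the second revision adds three words sharing the stem `bob` and the third revision removes them, so that stem is credited with three removals.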
he7d3r / StemsToWords.py
Last active August 29, 2015 14:07
Prints out the frequency of the words (found in a dump.xml) that correspond to each stem in a file
#!/usr/bin/env python
# -*- coding: UTF-8 -*-
# Copyright © 2014 He7d3r
# License: http://he7d3r.mit-license.org/
"""
Prints out words (found in a dump.xml) that correspond to each stem in a file.
Example:
python StemsToWords.py stems.txt words.txt dump1.xml dump2.xml
"""
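Grouping words under their stems, with per-word frequencies, might look like the sketch below. It is an assumption-laden illustration rather than the script itself: `words_by_stem`, `format_rows`, and the toy `stem` callable are hypothetical names, and the tab-separated row layout (stem, total frequency, Counter) is only a guess at the output format.

```python
from collections import Counter, defaultdict


def words_by_stem(words, stems, stem):
    """Map each stem of interest to a Counter of the words that reduce to it."""
    groups = defaultdict(Counter)
    for word in words:
        s = stem(word)
        if s in stems:
            groups[s][word] += 1
    return groups


def format_rows(groups):
    """Build one row per stem, most frequent stem first."""
    rows = []
    ordered = sorted(groups.items(), key=lambda kv: -sum(kv[1].values()))
    for s, counter in ordered:
        rows.append("%s\t%d\t%r" % (s, sum(counter.values()), counter))
    return rows


# Toy stemmer (assumption): the first three letters of the word.
words = "com como com comida sem".split()
groups = words_by_stem(words, {"com"}, lambda w: w[:3])
for row in format_rows(groups):
    print(row)
```

A real run would replace the toy stemmer with a proper Portuguese stemmer and would read the words from the dump files instead of a hard-coded string.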
he7d3r / WordsMatchingSalebotRules.py
Last active August 29, 2015 14:07
Prints out words (found in a dump.xml) that correspond to some rule in a file.
#!/usr/bin/env python
# -*- coding: UTF-8 -*-
# Copyright © 2014 He7d3r
# License: http://he7d3r.mit-license.org/
"""
Prints out words (found in a dump.xml) that correspond to some rule in a file.
Example:
python WordsMatchingSalebotRules.py salebot.txt words.txt dump1.xml dump2.xml
"""
he7d3r / invertSalebotRegexes.py
Created October 22, 2014 15:11
Create a list of words and a list of stems for each regex in the Salebot config
# Adapted from http://utilitymill.com/edit/Regex_inverter
# License: GPL/GFDL
# Extracted from invRegex.py, at http://pyparsing.wikispaces.com
from pyparsing import (Literal, oneOf, printables, ParserElement, Combine,
                       SkipTo, operatorPrecedence, ParseFatalException, Word,
                       nums, opAssoc, Suppress, ParseResults, srange)
from nltk.stem.snowball import SnowballStemmer
import sys
import re
he7d3r / SALEBOT-STEMS-WORDS-STATS.TXT
Last active August 29, 2015 14:08
Words matching each stem in the badwords list from salebot config on ptwiki
# Obtained from
# https://gist.github.com/he7d3r/1285f6b52e2782d96b9e#file-salebot-stats-txt
# using
# https://gist.github.com/he7d3r/82eefda254d416292141/ea2d8f01a9b6530149c056a88da9c47172a91a58
# python StemsToWords.py SALEBOT-STATS.TXT SALEBOT-STEMS-WORDS-STATS.TXT ptwiki-20141015-pages-meta-history1.xml.7z ptwiki-20141015-pages-meta-history2.xml.7z ptwiki-20141015-pages-meta-history3.xml.7z ptwiki-20141015-pages-meta-history4.xml.7z
STEM FREQUENCY WORDS WITH THIS STEM, BY FREQUENCY
com 666632139 Counter({'com': 462043226, 'como': 197059812, 'comando': 3303334, 'come': 1091918, 'comida': 861471, 'comer': 703099, 'coma': 382041, 'comes': 262435, 'comidas': 189295, 'comeu': 140186, 'comendo': 119056, 'comem': 73333, 'comido': 61933, 'comia': 43458, 'comas': 37185, 'comi': 31208, 'comiam': 28096, 'comerem': 25816, 'comê': 24333, 'comar': 23143, 'comeram': 22629, 'comidos': 20561, 'comemos': 15184, 'comesse': 12021, 'comam': 10678, 'comos': 9780, 'comei': 4283, 'comessem': 3716, 'comares': 3238, 'comé': 2687,