@adamwight
Created July 27, 2017 23:46
Makefile for ORES Flagged Revisions experiment
# See https://phabricator.wikimedia.org/T166235
# From https://quarry.wmflabs.org/query/20200
datasets/fiwiki.flaggedrevs_approved_raw.50k_2017.json:
	wget -qO- 'https://quarry.wmflabs.org/run/192057/output/0/json-lines?download=true' > $@
datasets/fiwiki.flaggedrevs_approved.50k_2017.json: \
    datasets/fiwiki.flaggedrevs_approved_raw.50k_2017.json
	python ~/revscoring/revscoring/utilities/normalize.py < $< > $@
datasets/fiwiki.flaggedrevs_training.65k.json: \
    datasets/fiwiki.labeled_revisions_training.15k_2016.json \
    datasets/fiwiki.flaggedrevs_approved.50k_2017.json
	python ~/revscoring/revscoring/utilities/deduplicate_revs.py $^ > $@
datasets/fiwiki.flaggedrevs_training.w_cache.65k.json: \
    datasets/fiwiki.flaggedrevs_training.65k.json
	cat $< | \
	revscoring extract \
		editquality.feature_lists.fiwiki.reverted \
		editquality.feature_lists.fiwiki.damaging \
		editquality.feature_lists.fiwiki.goodfaith \
		--host https://fi.wikipedia.org \
		--verbose > $@
# FIXME: --observations is working around an old bug in the docopt
models/fiwiki.damaging_w_flaggedrevs_wo_testinfo.gradient_boosting.model: \
    datasets/fiwiki.flaggedrevs_training.w_cache.65k.json
	cat $< | \
	revscoring train_model \
		revscoring.scorer_models.GradientBoosting \
		editquality.feature_lists.fiwiki.damaging \
		damaging \
		--observations "<stdin>" \
		-p 'learning_rate=0.01' \
		-p 'max_features="log2"' \
		-p 'max_depth=5' \
		-p 'n_estimators=700' \
		--balance-sample-weight \
		--version 0.0.1 \
		--center --scale > $@
models/fiwiki.damaging_w_flaggedrevs.gradient_boosting.model: \
    datasets/fiwiki.labeled_revisions_testing.w_cache.5k_2016.json \
    models/fiwiki.damaging_w_flaggedrevs_wo_testinfo.gradient_boosting.model
	revscoring test_model \
		models/fiwiki.damaging_w_flaggedrevs_wo_testinfo.gradient_boosting.model \
		damaging \
		--observations=datasets/fiwiki.labeled_revisions_testing.w_cache.5k_2016.json > $@
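
# Optional additions, a sketch only and not part of the original experiment.
# Every recipe above redirects its output into $@, so a command that fails
# partway (for example an interrupted wget) can leave a truncated file that
# make would treat as up to date on the next run. GNU make's
# .DELETE_ON_ERROR special target removes the target when its recipe fails.
# This assumes GNU make; other make implementations may ignore it.
.DELETE_ON_ERROR:

# Hypothetical convenience target (the name is made up for illustration)
# that builds the final tested model and everything it depends on:
#   make fiwiki_flaggedrevs_experiment
.PHONY: fiwiki_flaggedrevs_experiment
fiwiki_flaggedrevs_experiment: \
    models/fiwiki.damaging_w_flaggedrevs.gradient_boosting.model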