@halfak
Last active March 28, 2018 22:01
name: huwiki
label: Hungarian Wikipedia
host: hu.wikipedia.org
external_samples:
  sampled_revisions.40k_2016:
    quarry_url: "http://quarry.wmflabs.org/run/79645/output/0/json-lines?download=true"
  human_labeled_revisions.raw.5k_2016:
    labeling_campaign: "https://labels.wmflabs.org/campaigns/huwiki/12/"
autolabeled_samples:
  trusted_edits: 1000
  trusted_groups:
    - sysop
    - oversight
    - trusted
    - bot
    - rollbacker
    - checkuser
    - abusefilter
    - bureaucrat
    - editor
    - templateeditor
    - interface-editor
  labeled_samples:
    autolabeled_revisions.40k_2016: sampled_revisions.40k_2016
balanced_5k_samples:
  revisions_for_review.5k_2016: autolabeled_revisions.40k_2016
merged_samples:
  labeled_revisions.40k_2016:
    - autolabeled_revisions.40k_2016
    - human_labeled_revisions.5k_2016
extracted_samples:
  labeled_revisions.w_cache.40k_2016:
    sample: labeled_revisions.40k_2016
    features_for: [damaging, goodfaith]
models:
  damaging:
    sample: labeled_revisions.w_cache.40k_2016
    label: damaging
    pop_rate_true: 0.01
    tune: {}
    cv_train:
      algorithm: GradientBoosting
      label_weight: 10
      parameters:
        max_depth: 7
        learning_rate: 0.01
        max_features: log2
        n_estimators: 700
  goodfaith: {...}
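
Each section in the config above names the sample it produces and the sample(s) it consumes, so a mistyped sample name breaks the chain silently. The following is a minimal sketch of a consistency check, not part of any existing tool. It assumes the config has already been loaded into a plain Python dict, and it assumes which keys produce samples and which consume them, based only on how the file reads. The function names and `example_config` are hypothetical, and the `20k` reference in the example is deliberately mismatched for illustration.

```python
def defined_samples(config):
    """Collect every sample name that some section of the config produces."""
    names = set(config.get("external_samples", {}))
    auto = config.get("autolabeled_samples", {})
    names |= set(auto.get("labeled_samples", {}))
    for section in ("balanced_5k_samples", "merged_samples", "extracted_samples"):
        names |= set(config.get(section, {}))
    return names


def referenced_samples(config):
    """Yield (location, sample_name) for every sample a section consumes."""
    auto = config.get("autolabeled_samples", {})
    for out, src in auto.get("labeled_samples", {}).items():
        yield f"autolabeled_samples.{out}", src
    for out, src in config.get("balanced_5k_samples", {}).items():
        yield f"balanced_5k_samples.{out}", src
    for out, parts in config.get("merged_samples", {}).items():
        for part in parts:
            yield f"merged_samples.{out}", part
    for section in ("extracted_samples", "models"):
        for out, spec in config.get(section, {}).items():
            # Skip entries that are elided or carry no "sample" key.
            if isinstance(spec, dict) and "sample" in spec:
                yield f"{section}.{out}", spec["sample"]


def dangling_references(config):
    """Return references to sample names that no section defines."""
    defined = defined_samples(config)
    return [(where, name) for where, name in referenced_samples(config)
            if name not in defined]


# Trimmed, hypothetical config; the 20k name is intentionally undefined.
example_config = {
    "external_samples": {
        "sampled_revisions.40k_2016": {},
        "human_labeled_revisions.5k_2016": {},
    },
    "autolabeled_samples": {
        "labeled_samples": {
            "autolabeled_revisions.40k_2016": "sampled_revisions.40k_2016",
        },
    },
    "merged_samples": {
        "labeled_revisions.40k_2016": [
            "autolabeled_revisions.40k_2016",
            "human_labeled_revisions.5k_2016",
        ],
    },
    "extracted_samples": {
        "labeled_revisions.w_cache.40k_2016": {
            "sample": "labeled_revisions.20k_2016",
        },
    },
    "models": {
        "damaging": {"sample": "labeled_revisions.w_cache.40k_2016"},
    },
}

for where, name in dangling_references(example_config):
    print(f"{where} references undefined sample {name}")
```

A check like this only compares names; it does not know whether a sample is produced by a step that the config leaves implicit, so a reported name is a prompt to look, not proof of an error.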