Sumit Asthana codez266

@codez266
codez266 / revertslabel
Last active October 28, 2020 05:24
Reverts labeling from dump - involves loading revids from db and storing back, but that part is trivial
import mwreverts
from models import RevRevert, Page, Revision
import mwxml
import pdb
from collections import deque
from mwapilib import get_revs_for_revert_labeling
import sys
# This script processes edits from the dump to detect reverts and stores
# the revert status in a revert table. Edits for the pages from the page table
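The preview above is cut off before the labeling logic, so the following is only a minimal sketch of how identity reverts can be labeled from a stream of revision checksums. It is plain Python and is not the gist's code or the mwreverts implementation; the function name `find_identity_reverts`, the `radius` default, and the sample revision IDs and checksums are all assumptions made for illustration.

```python
from collections import deque


def find_identity_reverts(revisions, radius=15):
    """Return (reverting_id, reverted_ids, reverted_to_id) tuples.

    `revisions` is an iterable of (rev_id, checksum) pairs for one page,
    in chronological order. A revision counts as a revert when its checksum
    matches an earlier revision within `radius` edits and at least one
    revision lies in between.
    """
    # Keep the candidate "reverted to" revision plus up to `radius` later ones.
    recent = deque(maxlen=radius + 1)
    reverts = []
    for rev_id, checksum in revisions:
        history = list(recent)
        # Scan from the most recent revision backwards for a matching checksum.
        for i in range(len(history) - 1, -1, -1):
            if history[i][1] == checksum:
                reverted_ids = [r for r, _ in history[i + 1:]]
                if reverted_ids:  # an immediate duplicate reverts nothing
                    reverts.append((rev_id, reverted_ids, history[i][0]))
                break
        recent.append((rev_id, checksum))
    return reverts


# Hypothetical page history: revision 104 restores the content of 101.
example = [(101, "aaa"), (102, "bbb"), (103, "ccc"), (104, "aaa")]
reverts = find_identity_reverts(example)
print(reverts)
```

In a pipeline like the one the description mentions, each returned tuple would be written back to the revert table keyed by the reverted revision IDs.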
Max depth - 6, learning_rate - 0.1, max_features - log2
Estimators: 50
real 1m20.208s
user 1m19.112s
sys 0m1.3s
Estimators: 75
real 1m18.758s
user 1m17.748s
Max depth - 3, learning_rate - 0.1, max_features - log2
Estimators: 50
real 1m29.432s
user 1m28.176s
sys 0m1.632s
Estimators: 75
real 1m23.595s
user 1m22.288s
sys 0m1.484s
2018-03-16 03:47:33,577 WARNING:revscoring.scoring.statistics.classification.micro_macro_stats -- Could not generate micro-average of f1: unsupported operand type(s) for *: 'NoneType' and 'int'
2018-03-16 03:47:33,577 WARNING:revscoring.scoring.statistics.classification.micro_macro_stats -- Could not generate macro-average of f1: unsupported operand type(s) for +: 'float' and 'NoneType'
2018-03-16 03:47:52,831 DEBUG:revscoring.utilities.tune -- Cross-validated GradientBoosting with n_estimators=50, max_depth=5, max_features="log2", learning_rate=0.01 in 48.394 minutes: pr_auc.macro=0.6543
2018-03-16 03:48:04,537 WARNING:revscoring.scoring.statistics.classification.micro_macro_stats -- Could not generate micro-average of precision: unsupported operand type(s) for *: 'NoneType' and 'int'
2018-03-16 03:48:04,537 WARNING:revscoring.scoring.statistics.classification.micro_macro_stats -- Could not generate macro-average of precision: unsupported operand type(s) for +: 'float' and 'NoneType'
2018-03-16 03:48:04,538
"""
These meta-datasources operate on :class:`revscoring.Datasource` instances
that return lists of items, and produce vectors from those items.
.. autoclass:: revscoring.datasources.meta.vectors
"""
import os.path
import logging
from gensim.models.keyedvectors import KeyedVectors
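As a rough illustration of what "producing vectors from a list of items" can mean, the sketch below looks up each word in an embedding table and averages the results. It is an assumption about the general idea, not the module's code: a plain dict stands in for the gensim `KeyedVectors` object imported above, and the names `vectorize_words` and `mean_vector` are hypothetical.

```python
def vectorize_words(words, vectors, dim):
    """Map each word to its embedding, using a zero vector for unknown words."""
    zero = [0.0] * dim
    return [vectors.get(word, zero) for word in words]


def mean_vector(rows, dim):
    """Average a list of equal-length vectors; return zeros for an empty list."""
    if not rows:
        return [0.0] * dim
    return [sum(column) / len(rows) for column in zip(*rows)]


# Toy two-dimensional embeddings standing in for a trained model.
toy_vectors = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}
rows = vectorize_words(["cat", "dog", "unknown"], toy_vectors, dim=2)
averaged = mean_vector(rows, dim=2)
print(averaged)
```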
PID USER PRI NI VIRT RES SHR S CPU% MEM% TIME+ Command
5814 codezee 20 0 1271M 399M 27464 S 92.0 2.5 0:59.55 python buggy.py
5834 codezee 20 0 1271M 398M 27464 R 92.0 2.5 0:54.93 python buggy.py
5828 codezee 20 0 1417M 836M 6592 S 0.0 5.2 0:08.23 python buggy.py
5829 codezee 20 0 1417M 836M 6592 S 0.0 5.2 0:08.28 python buggy.py
5827 codezee 20 0 1416M 836M 6592 S 0.0 5.2 0:08.41 python buggy.py
5848 codezee 20 0 19532 3884 2872 R 0.7 0.0 0:00.17 htop
5835 codezee 20 0 1271M 398M 27464 S 0.7 2.5 0:00.12 python buggy.py
5645 redis 20 0 40860 2264 1304 S 0.0 0.0 1h44:44 /usr/bin/redis-server 0.0.0.0:6379
5826 codezee 20 0 1416M 836M 6592 S 0.0 5.2 0:08.56 python buggy.py
codez266 / random_forest_drafttopic
Created January 14, 2018 14:57
RandomForest statistics on drafttopic full dataset
Statistics:
counts (n=93415):
label                                     n          TP     FN     FP     TN
------------------------------------  -----  ---  -----  -----  -----  -----
'STEM.Time'                            2382  -->   1904    478   4702  86331
'STEM.Physics'                         2633  -->   2411    222   7577  83205
'STEM.Space'                           2522  -->   2381    141   2824  88069
'STEM.Mathematics'                     1659  -->   1462    197   5666  86090
'Culture.Crafts and hobbies'           2150  -->   1754    396   2419  88846
'History_And_Society.Transportation'   4276  -->   3711    565   2520  86619
codez266 / gradient_boosting_drafttopic
Last active January 14, 2018 14:57
GradientBoosting statistics on drafttopic full dataset
counts (n=93415):
label                                     n          TP     FN     FP     TN
------------------------------------  -----  ---  -----  -----  -----  -----
'STEM.Time'                            2382  -->   1515    867    116  90917
'STEM.Physics'                         2633  -->   1498   1135    378  90404
'STEM.Space'                           2522  -->   2135    387    101  90792
'STEM.Mathematics'                     1659  -->   1090    569     74  91682
'Culture.Crafts and hobbies'           2150  -->   1236    914     67  91198
'History_And_Society.Transportation'   4276  -->   3091   1185    339  88800
'Geography.Maps'                       2552  -->   1374   1178     73  90790
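To compare the RandomForest and GradientBoosting counts, per-label precision and recall can be derived from each row. The sketch below does this for the 'STEM.Time' rows of both tables. It assumes the four count columns after the arrow are read as TP, FN, FP, TN, which is the reading under which the first two counts sum to n (1904 + 478 = 2382 and 1515 + 867 = 2382). The helper name `precision_recall` is hypothetical.

```python
def precision_recall(tp, fn, fp):
    """Compute precision and recall from confusion counts for one label."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# (TP, FN, FP) for 'STEM.Time', copied from the two tables above
# under the assumed column reading.
rows = {
    "RandomForest": (1904, 478, 4702),
    "GradientBoosting": (1515, 867, 116),
}

scores = {name: precision_recall(*counts) for name, counts in rows.items()}
for name, (precision, recall) in scores.items():
    print(f"{name}: precision={precision:.3f} recall={recall:.3f}")
```

Under that reading, the RandomForest row has the higher recall for this label and the GradientBoosting row has the higher precision.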
codez266 / predictions.json
Last active December 28, 2017 16:56
First 500 predictions with one classifier per label
{"title": "List_of_fish_on_stamps_of_Madeira", "actual": ["Culture.Crafts and hobbies", "Geography.Europe", "Assistance.Maintenance", "STEM.Biology"], "predicted": ["Culture.Crafts and hobbies", "Geography.Countries", "Culture.Language and literature", "Geography.Europe"]}
{"title": "Arne_Tumyr", "actual": ["Culture.Language and literature", "Geography.Europe"], "predicted": ["Culture.Media", "History_And_Society.History and society", "Culture.Language and literature", "Geography.Europe", "History_And_Society.Politics and government"]}
{"title": "Irradiation", "actual": ["STEM.Technology", "STEM.Physics", "STEM.Medicine"], "predicted": ["STEM.Physics", "STEM.Biology", "STEM.Technology", "STEM.Engineering", "STEM.Medicine", "STEM.Chemistry"]}
{"title": "Wesbank,_Western_Cape", "actual": ["Geography.Countries"], "predicted": ["Geography.Countries", "History_And_Society.History and society", "Geography.Europe"]}
{"title": "60_Cycle", "actual": ["Culture.Language and literature", "Geography.Countries", "Culture.P
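Each complete line above is a JSON object with `title`, `actual`, and `predicted` keys, so a single prediction can be scored by comparing the two label sets. The sketch below does that for the "Wesbank,_Western_Cape" record copied from the listing. The function name `label_overlap` is an assumption, and per-record precision and recall are only one of several reasonable ways to score multilabel output.

```python
import json


def label_overlap(record):
    """Return (precision, recall) of predicted labels against actual labels."""
    actual = set(record["actual"])
    predicted = set(record["predicted"])
    hits = actual & predicted
    precision = len(hits) / len(predicted) if predicted else 0.0
    recall = len(hits) / len(actual) if actual else 0.0
    return precision, recall


# One complete record copied from the predictions above.
line = (
    '{"title": "Wesbank,_Western_Cape", "actual": ["Geography.Countries"], '
    '"predicted": ["Geography.Countries", '
    '"History_And_Society.History and society", "Geography.Europe"]}'
)

precision, recall = label_overlap(json.loads(line))
print(f"precision={precision:.3f} recall={recall:.3f}")
```

For this record the single actual label is among the three predicted labels, so recall is 1.0 while precision is one third.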
codez266 / drafttopic_10k_scored_stats
Created December 28, 2017 16:29
Statistics on 10k randomly shuffled wikiproject samples with model aggregations
counts (n=10000):
label                                     n          TP     FN     FP     TN
------------------------------------  -----  ---  -----  -----  -----  -----
'STEM.Time'                             270  -->    200     70    155   9575
'STEM.Physics'                          284  -->    245     39    554   9162
'STEM.Space'                            251  -->    229     22    111   9638
'STEM.Mathematics'                      164  -->    133     31    283   9553
'Culture.Crafts and hobbies'            232  -->    156     76     44   9724
'History_And_Society.Transportation'    469  -->    389     80    155   9376
'Geography.Maps'                        287  -->    217     70    564   9149