Aaron Halfaker (halfak)

(3.4)[halfak@graphite: ~]
$ R
R version 3.0.1 (2013-05-16) -- "Good Sport"
Copyright (C) 2013 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
==> default: Checking for guest additions in VM...
default: The guest additions on this VM do not match the installed version of
default: VirtualBox! In most cases this is fine, but in rare cases it can
default: prevent things such as shared folders from working properly. If you see
default: shared folder errors, please make sure the guest additions within the
default: virtual machine match the version of VirtualBox you have installed on
default: your host and reload your VM.
default:
default: Guest Additions Version: 4.3.10
default: VirtualBox Version: 4.2
from: Marc-André Pelletier via RT <ops-requests@wikimedia.org>
reply-to: ops-requests@wikimedia.org
to: aotto@wikimedia.org
cc: ahalfaker@wikimedia.org
date: Wed, Sep 3, 2014 at 3:59 PM
subject: [wikimedia #8278] libbz2-dev and virtualenv on stat servers
mailed-by: rt.wikimedia.org
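# Collect items from `vector`, beginning at index `start`, up to and including
# the first item for which `split(item)` is TRUE. Assumes start <= length(vector).
consume_chunk = function(vector, split, start=1){
  items = c()
  location = start
  for(item in vector[start:length(vector)]){
    items = c(items, item)
    location = location + 1  # R has no `+=` operator
    if(split(item)){
      break
    }
  }
  # Assumed return value: the chunk plus the index where the next chunk starts.
  list(items=items, location=location)
}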
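# Return the index of the first element at or after `start` that satisfies
# `condition`. If none does, this returns length(vector), which cannot be
# distinguished from a match on the last element.
first = function(vector, condition, start=1){
  for(i in start:length(vector)){
    if(condition(vector[i])){
      return(i)
    }
  }
  return(length(vector))
}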
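For comparison, here is a minimal Python sketch of the same chunking idea: repeatedly call a `first`-style search from the current position and cut the sequence just after the matching item. Indexes are 0-based here, and the `chunks` helper is hypothetical (not part of the original gist); it keeps the R version's convention of returning the last index when nothing matches, so a trailing run without a split item still becomes its own chunk.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def first(items: Sequence[T], condition: Callable[[T], bool], start: int = 0) -> int:
    """Index of the first item at or after `start` satisfying `condition`,
    or the last index if none does (mirrors the R `first`, 0-based)."""
    for i in range(start, len(items)):
        if condition(items[i]):
            return i
    return len(items) - 1


def chunks(items: Sequence[T], split: Callable[[T], bool]) -> List[List[T]]:
    """Hypothetical helper: split `items` into consecutive chunks, each ending
    with an item for which `split` is true (the last chunk may not)."""
    result: List[List[T]] = []
    start = 0
    while start < len(items):
        end = first(items, split, start)
        result.append(list(items[start:end + 1]))
        start = end + 1
    return result


print(chunks([1, 2, 0, 3, 0, 4], lambda x: x == 0))  # [[1, 2, 0], [3, 0], [4]]
```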
mysql:research@analytics-store.eqiad.wmnet [staging]> CREATE UNIQUE INDEX wiki_rev_id ON events_sandbox_edit (wiki, rev_id);
ERROR 1062 (23000): Duplicate entry 'enwiki-625851283' for key 'wiki_rev_id'
mysql:research@analytics-store.eqiad.wmnet [staging]> select * from events_sandbox_edit where wiki = "enwiki" and rev_id = 625851283;
+--------+-----------+----------------+----------+---------------+----------------+------------------+
| wiki | rev_id | rev_timestamp | rev_user | rev_user_text | page_namespace | page_title |
+--------+-----------+----------------+----------+---------------+----------------+------------------+
| enwiki | 625851283 | 20140916191849 | 22216559 | Olive875 | 2 | Olive875/sandbox |
+--------+-----------+----------------+----------+---------------+----------------+------------------+
1 row in set (1.36 sec)
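Before creating a UNIQUE index, it can help to list every key that would collide. The GROUP BY ... HAVING COUNT(*) > 1 query below is standard SQL, so it should also run against the MySQL table above; the sketch uses an in-memory SQLite database with a trimmed-down, hypothetical copy of `events_sandbox_edit` and made-up rows purely for illustration.

```python
import sqlite3


def find_duplicate_keys(conn: sqlite3.Connection) -> list:
    """Return (wiki, rev_id, n) for each (wiki, rev_id) pair occurring more
    than once; such rows make CREATE UNIQUE INDEX fail with a duplicate-entry error."""
    return conn.execute(
        """
        SELECT wiki, rev_id, COUNT(*) AS n
        FROM events_sandbox_edit
        GROUP BY wiki, rev_id
        HAVING COUNT(*) > 1
        ORDER BY wiki, rev_id
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
# Hypothetical minimal schema: only the two columns of the unique key.
conn.execute("CREATE TABLE events_sandbox_edit (wiki TEXT, rev_id INTEGER)")
# Hypothetical sample rows: the pair ('enwiki', 1) is duplicated.
conn.executemany(
    "INSERT INTO events_sandbox_edit VALUES (?, ?)",
    [("enwiki", 1), ("enwiki", 1), ("enwiki", 2)],
)
print(find_duplicate_keys(conn))  # [('enwiki', 1, 2)]
```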
$ python demonstrate_extractor.py
Extracting features for http://en.wikipedia.org/wiki/?oldid=626489778&diff=prev
<added_badwords_ratio>: 211.95999999999998
<added_misspellings_ratio>: 1.4638121546961327
<badwords_added>: 3
<bytes_changed>: 133
<chars_added>: 145
<day_of_week_in_utc>: 6
<hour_of_day_in_utc>: 15
<is_custom_comment>: True
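Time-of-edit features like `day_of_week_in_utc` and `hour_of_day_in_utc` can be derived from a revision's 14-digit MediaWiki timestamp (YYYYMMDDHHMMSS, stored in UTC), such as the `rev_timestamp` value shown in the MySQL output above. This is only a sketch: it assumes Python's `weekday()` numbering (Monday is 0, Sunday is 6), which may not be the encoding `demonstrate_extractor.py` actually uses, and `time_features` is a hypothetical name.

```python
from datetime import datetime, timezone


def time_features(mw_timestamp: str) -> dict:
    """Hypothetical helper: derive UTC day-of-week and hour-of-day from a
    MediaWiki timestamp such as '20140916191849'."""
    dt = datetime.strptime(mw_timestamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    return {
        # Assumed convention: datetime.weekday(), Monday == 0 ... Sunday == 6.
        "day_of_week_in_utc": dt.weekday(),
        "hour_of_day_in_utc": dt.hour,
    }


# 2014-09-16 was a Tuesday, so weekday() == 1.
print(time_features("20140916191849"))  # {'day_of_week_in_utc': 1, 'hour_of_day_in_utc': 19}
```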
Host bastion.wmf
    HostName bast1001.wikimedia.org

Host stat3.wmf
    HostName stat1003.wikimedia.org
    ProxyCommand ssh -a -W %h:%p bastion.wmf
-----
mysql:research@analytics-store.eqiad.wmnet [enwiki]> explain SELECT
-> DATABASE() AS wiki,
-> user_id,
-> revision.rev_id,
-> page_namespace,
-> CAST(revision.rev_len AS INT) -
-> CAST(IFNULL(parent.rev_len, 0) AS INT) AS bytes_changed,
-> IFNULL(parent.rev_len, 0) AS previous_bytes
-> FROM staging.tr_experimental_user
-> INNER JOIN revision FORCE INDEX (user_timestamp) ON