Dan Andreescu milimetric

  • Wikimedia Foundation
  • New York, NY
@milimetric
milimetric / dag_gen.py
Created Mar 9, 2022
Thoughts on DAG generation
View dag_gen.py
projectview_ready = HiveTriggeredHQLTaskFactory(
'run_hql_and_archive',
default_args=default_args,
...
)
archive = ArchiveTaskFactory(...)
projectview_ready.sensors() >> projectview_ready.etl() >> archive()
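The chaining on the last line relies on Python's `>>` operator being overloaded on task objects. A minimal standalone sketch of that idea follows; the `Task`, `HQLTaskFactorySketch`, and `ArchiveTaskFactorySketch` classes and their task-id naming are hypothetical stand-ins, not the factories from the gist or any real scheduler API.

```python
class Task:
    """Hypothetical task node; `a >> b` records b as downstream of a."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.downstream = []

    def __rshift__(self, other):
        self.downstream.append(other)
        # Returning the right-hand task lets a >> b >> c chain left to right.
        return other


class HQLTaskFactorySketch:
    """Hypothetical factory that hands out the tasks of one job."""

    def __init__(self, name):
        self.name = name

    def sensors(self):
        return Task(f"{self.name}_sensors")

    def etl(self):
        return Task(f"{self.name}_etl")


class ArchiveTaskFactorySketch:
    """Hypothetical factory that is called directly to produce its task."""

    def __init__(self, name):
        self.name = name

    def __call__(self):
        return Task(f"{self.name}_archive")


def build_chain():
    ready = HQLTaskFactorySketch("projectview")
    archive = ArchiveTaskFactorySketch("projectview")
    first = ready.sensors()
    first >> ready.etl() >> archive()
    return first
```

Calling `build_chain()` returns the sensor task, with the ETL task and then the archive task reachable through `downstream`.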
milimetric / query.sql
Created Feb 22, 2019
example query to mess around with
View query.sql
use wmf;
-- new data
select coalesce(c.country, g.country_code) as country,
sum(edit_count) as edits,
sum(namespace_zero_edit_count) as namespace_zero_edits
from geoeditors_edits_monthly g
inner join
(select distinct dbname
View sanitize query test.sql
select ar_id, ar_namespace, ar_title, NULL as ar_text, NULL as ar_comment, NULL as ar_comment_id,
case when ar_deleted&4 != 0 then null when ar_actor = 0
then ar_user else COALESCE( actor_user, 0 ) END AS ar_user,
case when ar_deleted&4 != 0 then null when ar_actor = 0
then ar_user_text else actor_name END AS ar_user_text,
if(ar_deleted&4 <> 0,0,ar_actor) as ar_actor, ar_timestamp, ar_minor_edit, NULL as ar_flags, ar_rev_id,
case when ar_deleted&1 != 0 then null when content_id is NULL then ar_text_id
else content_id end as ar_text_id,
ar_deleted, if(ar_deleted&1 <> 0,null,ar_len) as ar_len,
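The `ar_deleted&4` and `ar_deleted&1` tests in the preview above are bitmask checks. The same per-row logic can be sketched in Python, assuming bit 1 marks suppressed text and bit 4 a suppressed user (the constant names below are labels chosen for this sketch) and assuming each row arrives as a dict already joined with the actor and content columns. The function name is hypothetical.

```python
DELETED_TEXT = 1  # assumed meaning of bit 1 in ar_deleted
DELETED_USER = 4  # assumed meaning of bit 4 in ar_deleted


def sanitize_archive_row(row):
    """Mirror the CASE/IF expressions of the query preview for one row."""
    flags = row["ar_deleted"]
    user_hidden = flags & DELETED_USER != 0
    text_hidden = flags & DELETED_TEXT != 0

    if user_hidden:
        ar_user, ar_user_text = None, None
    elif row["ar_actor"] == 0:
        # No actor row: fall back to the legacy user columns.
        ar_user, ar_user_text = row["ar_user"], row["ar_user_text"]
    else:
        # COALESCE(actor_user, 0)
        ar_user = row["actor_user"] if row["actor_user"] is not None else 0
        ar_user_text = row["actor_name"]

    if text_hidden:
        ar_text_id = None
    elif row["content_id"] is None:
        ar_text_id = row["ar_text_id"]
    else:
        ar_text_id = row["content_id"]

    return {
        "ar_user": ar_user,
        "ar_user_text": ar_user_text,
        "ar_actor": 0 if user_hidden else row["ar_actor"],
        "ar_text_id": ar_text_id,
        "ar_len": None if text_hidden else row["ar_len"],
    }
```

Only the columns whose value depends on `ar_deleted` are returned; the pass-through and always-NULL columns of the query are left out of the sketch.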
milimetric / basic signal timeout.py
Last active Jul 30, 2018
Set a timeout for executing python code in a with statement
View basic signal timeout.py
import signal
import re
class TimeoutError(Exception):
pass
class timeout:
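The preview stops at the class definition. One way such a with-statement timeout could work is sketched below, assuming a Unix system and the main thread, since it relies on `SIGALRM`. The method bodies and the `seconds` and `message` parameters are assumptions, not the gist's actual code.

```python
import signal
import time


class TimeoutError(Exception):
    # Same name as in the preview; note that it shadows the builtin.
    pass


class timeout:
    def __init__(self, seconds, message="block timed out"):
        self.seconds = seconds
        self.message = message

    def _handle(self, signum, frame):
        raise TimeoutError(self.message)

    def __enter__(self):
        # Install the handler and start a one-shot real-time timer.
        self._previous = signal.signal(signal.SIGALRM, self._handle)
        signal.setitimer(signal.ITIMER_REAL, self.seconds)
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Cancel the timer and restore whatever handler was there before.
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM, self._previous)
        return False  # never swallow exceptions from the block


def sleeps_too_long():
    """Return True if the timeout interrupted the sleeping block."""
    try:
        with timeout(0.1):
            time.sleep(2)
    except TimeoutError:
        return True
    return False
```

Because the timer is signal based, this approach does not work on Windows or from a non-main thread.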
milimetric / survive-1.sql
Created Jun 13, 2018
history query example
View survive-1.sql
with users_with_revisions as (
select event_user_id,
event_timestamp
from mediawiki_history
where event_entity = 'revision'
and event_type = 'create'
and snapshot = '2018-05'
and wiki_db = 'enwiki'
)
milimetric / OojsUiCheckBoxInputWidget.vue
Created May 24, 2017
This is a quick example that shows how to wrap an oojs-ui component in a Vue component. It's nasty because of the lack of componentization of oojs-ui, but it's just a proof of concept.
View OojsUiCheckBoxInputWidget.vue
<template>
<!-- In Vue, $el is this root element defined in the template section -->
<span></span>
</template>
<script>
// the script-loader webpack plugin has to be used to hack the oojs-ui files directly into script tags
// because they have no modularization whatsoever (AMD, ES6, etc.)
import 'script-loader!oojs/dist/oojs.jquery'
import 'oojs-ui/dist/oojs-ui-core'
View pageview_tranquility_conf.json
{
"dataSources" : [
{
"spec" : {
"dataSchema" : {
"dataSource" : "pageviews-hourly",
"metricsSpec" : [
{
"name" : "view_count",
"type" : "longSum",
View labsusage.py
# NOTE: required for the following to work:
# !pip install pymysql
# !git clone https://gerrit.wikimedia.org/r/p/operations/mediawiki-config
# !cd mediawiki-config && git pull origin master
import pymysql
import ipaddress
import os
connection = pymysql.connect(
host='analytics-store.eqiad.wmnet',
View squish.py
# download /srv/reportupdater/output/metrics/sessions in the working folder as sessions.old, then run:
# python squish.py
import csv
from path import glob
from collections import OrderedDict, defaultdict
from datetime import datetime, timedelta
milimetric / get daily edits and pages created.sql
Last active Sep 27, 2016
Queries to get simple metrics from mediawiki_history
View get daily edits and pages created.sql
select substring(event_timestamp, 0, 8) day,
count(*) `All namespaces`,
sum(if( page_namespace_latest = 0
,1, 0)) `Namespace Zero`,
sum(if( page_namespace_latest = 0
and revision_deleted_timestamp is null
,1, 0)) `Namespace Zero not Deleted`
from milimetric.mediawiki_history
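The conditional sums in the preview above amount to counting rows per day under progressively stricter filters. A rough in-memory equivalent in Python is sketched below, assuming `event_timestamp` is a `yyyyMMddHHmmss` string so that its first eight characters identify the day, and assuming the truncated query groups by that day. The function name and the dict-based rows are hypothetical.

```python
from collections import defaultdict


def daily_edit_counts(events):
    """Count events per day, mirroring the three columns of the query."""
    counts = defaultdict(lambda: {
        "All namespaces": 0,
        "Namespace Zero": 0,
        "Namespace Zero not Deleted": 0,
    })
    for event in events:
        day = event["event_timestamp"][:8]  # assumed yyyyMMdd prefix
        row = counts[day]
        row["All namespaces"] += 1
        if event["page_namespace_latest"] == 0:
            row["Namespace Zero"] += 1
            if event["revision_deleted_timestamp"] is None:
                row["Namespace Zero not Deleted"] += 1
    return dict(counts)
```

Each `sum(if(cond, 1, 0))` in the SQL corresponds to one guarded increment here.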