Skip to content

Instantly share code, notes, and snippets.

@nettrom
Created March 14, 2018 22:30
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save nettrom/deee34b0dd4f4cfcb1e0f4ebe0e73e66 to your computer and use it in GitHub Desktop.
Save nettrom/deee34b0dd4f4cfcb1e0f4ebe0e73e66 to your computer and use it in GitHub Desktop.
Recent Draft publishing rate
Examining the number of pages created in the Draft namespace between 2017-09-15 and 2018-02-15 that currently (2018-03-14) exists
as pages in the article (Main) namespace.
1: Get a dataset of creations in the Draft namespace between those dates from the `log` database:
SELECT page_id
FROM mediawiki_page_create_2
WHERE `database`='enwiki'
AND rev_timestamp >= '2017-09-15 00:00:00'
AND rev_timestamp < '2018-02-15 00:00:00'
AND page_namespace = 118
AND page_is_redirect = 0;
2: Loaded that data into a scratch table:
SELECT count(*) FROM nettrom_temp;
+----------+
| count(*) |
+----------+
| 34115 |
+----------+
1 row in set (0.02 sec)
3: Join with the live page table on enwiki:
SELECT count(*)
FROM nettrom_temp
JOIN enwiki.page p
USING (page_id)
WHERE p.page_namespace=0;
+----------+
| count(*) |
+----------+
| 3771 |
+----------+
1 row in set (9.35 sec)
4: Calculate proportion:
SELECT ROUND(100*3771/34115, 1) AS recent_prop_published;
+-----------------------+
| recent_prop_published |
+-----------------------+
| 11.1 |
+-----------------------+
1 row in set (0.00 sec)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment