Skip to content

Instantly share code, notes, and snippets.

@rdmpage
Created August 25, 2014 09:57
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save rdmpage/8c2e2b85e3242f5f773d to your computer and use it in GitHub Desktop.
Save rdmpage/8c2e2b85e3242f5f773d to your computer and use it in GitHub Desktop.
BioStor geotagging stats
How many articles?
SELECT COUNT(reference_id) FROM rdmp_reference WHERE PageID <> 0;
+---------------------+
| COUNT(reference_id) |
+---------------------+
| 106617 |
+---------------------+
1 row in set (0.08 sec)
How many BHL pages?
SELECT COUNT(DISTINCT PageID) FROM rdmp_reference_page_joiner;
+------------------------+
| COUNT(DISTINCT PageID) |
+------------------------+
| 1484050 |
+------------------------+
1 row in set (15.50 sec)
How many localities?
SELECT COUNT(locality_id) FROM rdmp_locality;
+--------------------+
| count(locality_id) |
+--------------------+
| 45452 |
+--------------------+
1 row in set (0.00 sec)
How many BHL pages in BioStor articles are geotagged?
SELECT COUNT(DISTINCT PageID) FROM rdmp_locality_page_joiner WHERE locality_id <> 0;
+------------------------+
| COUNT(DISTINCT PageID) |
+------------------------+
| 15860 |
+------------------------+
1 row in set (0.46 sec)
How many articles are geotagged?
SELECT COUNT(DISTINCT reference_id)
FROM rdmp_reference_page_joiner
INNER JOIN rdmp_locality_page_joiner USING(PageID)
WHERE locality_id <> 0;
+------------------------------+
| COUNT(DISTINCT reference_id) |
+------------------------------+
| 5675 |
+------------------------------+
1 row in set (15.42 sec)
Which articles have the most geotags?
SELECT reference_id, COUNT(locality_id) AS c
FROM rdmp_reference_page_joiner
INNER JOIN rdmp_locality_page_joiner USING(PageID)
WHERE locality_id <> 0
GROUP BY reference_id
ORDER BY c DESC
LIMIT 10;
+--------------+------+
| reference_id | c |
+--------------+------+
| 140883 | 2421 |
| 111325 | 1822 |
| 111685 | 1015 |
| 110244 | 805 |
| 13052 | 803 |
| 116583 | 801 |
| 1781 | 732 |
| 137035 | 665 |
| 65902 | 647 |
| 13132 | 636 |
+--------------+------+
10 rows in set (0.52 sec)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment