🌴
On vacation
Tim Robertson
timrobertson100
Hadoop ecosystem (HBase, Hive, YARN, Spark, SOLR, ES etc).
Committer on Apache Beam
SELECT
  occ1.k, occ1.cnt, occ2.cnt, occ2.cnt - occ1.cnt AS increase
FROM
  (SELECT COALESCE(kingdom, 'UNKNOWN') AS k, count(*) AS cnt
   FROM occurrence_20140908 GROUP BY kingdom) occ1
JOIN
  (SELECT COALESCE(kingdom, 'UNKNOWN') AS k, count(*) AS cnt
Reducing occurrence download widths to match content
Optimizing the downloads for users
GBIF.org delivers very wide tables, which many users find unmanageable and slow to work with. If a download returned only the columns that actually hold values in the records matching the query, users would get narrower tables that are easier to manage.
Currently occurrence_hdfs has 441 fields. Across all records, only 347 of them are populated in at least one record.
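The populated-field count above can be obtained with one aggregate query, because COUNT(column) ignores NULLs. The following is a minimal sketch of that idea, not the query actually used: it runs against an in-memory SQLite table rather than Hive, and the table occurrence_demo, its columns, and the helper populated_columns are all hypothetical names chosen for illustration.

```python
import sqlite3


def populated_columns(conn, table):
    """Return the columns of `table` that are non-NULL in at least one row."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
    # COUNT(col) skips NULLs, so a count of zero means the column is never populated.
    counts_sql = ", ".join(f'COUNT("{c}")' for c in columns)
    counts = conn.execute(f'SELECT {counts_sql} FROM "{table}"').fetchone()
    return [c for c, n in zip(columns, counts) if n > 0]


# Hypothetical four-column stand-in for the much wider occurrence table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE occurrence_demo (gbifid INTEGER, kingdom TEXT, bed TEXT, locality TEXT)"
)
conn.executemany(
    "INSERT INTO occurrence_demo VALUES (?, ?, ?, ?)",
    [
        (1, "Animalia", None, "Copenhagen"),
        (2, None, None, None),
        (3, "Plantae", None, None),
    ],
)

print(populated_columns(conn, "occurrence_demo"))  # bed is never populated
```

Note that this treats only NULL as unpopulated. If empty strings should also count as missing, the aggregate would need an extra condition.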
We could consider:
creating occurrence_hdfs only as wide as it needs to be, e.g. skipping terms that are never populated (which would also speed up the download MR jobs)
running the same populated-columns query against the records of each download before running the download query itself, which will likely reduce the width further, depending on the biases in the data
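The second option could look roughly like the sketch below: a first pass counts non-NULL values per column within the filtered records, and a second pass selects only the columns that survived. This is an illustration under assumptions, using in-memory SQLite in place of Hive. The table download_demo, its columns, and the function narrow_download are hypothetical, and the filter is interpolated as trusted SQL, which a real download service would not do.

```python
import sqlite3


def narrow_download(conn, table, where, params=()):
    """Run a filtered download that returns only columns populated in the matching rows."""
    columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
    # Pass 1: count non-NULL values per column, restricted to the download filter.
    counts_sql = ", ".join(f'COUNT("{c}")' for c in columns)
    counts = conn.execute(
        f'SELECT {counts_sql} FROM "{table}" WHERE {where}', params
    ).fetchone()
    keep = [c for c, n in zip(columns, counts) if n > 0]
    if not keep:
        # No matching rows, or every column is NULL for them: nothing to return.
        return [], []
    # Pass 2: the actual download, selecting only the populated columns.
    select_sql = ", ".join(f'"{c}"' for c in keep)
    rows = conn.execute(
        f'SELECT {select_sql} FROM "{table}" WHERE {where}', params
    ).fetchall()
    return keep, rows


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE download_demo (gbifid INTEGER, kingdom TEXT, locality TEXT, depth REAL)"
)
conn.executemany(
    "INSERT INTO download_demo VALUES (?, ?, ?, ?)",
    [
        (1, "Animalia", "Copenhagen", None),
        (2, "Plantae", None, None),
        (3, "Plantae", None, 12.5),
    ],
)

header, rows = narrow_download(conn, "download_demo", "kingdom = ?", ("Plantae",))
print(header)  # locality is dropped: no Plantae record has one
```

The trade-off is an extra scan per download. Whether that pays off depends on how much narrower the typical result becomes.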