Tim Robertson timrobertson100

View pipelines-demo.md

Example running standalone GBIF pipelines

This example shows how to convert a DwC-A into interpreted Avro files.

git clone https://github.com/gbif/pipelines.git
cd pipelines
mvn clean package -DskipTests
timrobertson100 / ids.md
Created Nov 18, 2019
Diagnosing id duplication
View ids.md

Using the lookup tool on c5gateway-vh.gbif.org we can get the keys for the id 1668748136:

12:06:39 UTC c5gateway-vh /usr/local/bin $ ./lookup-occurrence-key 1668748136
Lookup 1668748136 with dataset key from API 97bd086a-cf43-11e2-a9b3-00145eb45e9a
 27:97bd086a-cf43-11e2-a9b3-00145eb45e9a|JMRC|JMRCfungicoll|JMRC:FSU:02570 / 14837 / 750|null column=o:i, timestamp=1553909664771, value=\x00\x00\x00\x00cw\x13h
 73:97bd086a-cf43-11e2-a9b3-00145eb45e9a|http://id.snsb.info/ext/14837/14837/5004 column=o:i, timestamp=1563244584180, value=\x00\x00\x00\x00cw\x13h
 74:97bd086a-cf43-11e2-a9b3-00145eb45e9a|http://id.snsb.info/ext/14837/14837/5005 column=o:i, timestamp=1563244586420, value=\x00\x00\x00\x00cw\x13h
 75:97bd086a-cf43-11e2-a9b3-00145eb45e9a|http://id.snsb.info/ext/14837/14837/5006 column=o:i, timestamp=1553909265952, value=\x00\x00\x00\x00cw\x13h
 76:97bd086a-cf43-11e2-a9b3-00145eb45e9a|http://id.snsb.info/ext/14837/14837/5007 column=o:i, timestamp=1563244589868, value=\x00\x00\x00\x00cw\x13h
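The `value=\x00\x00\x00\x00cw\x13h` shown on every row can be checked against the id by hand. The following is a minimal sketch, assuming the value is an 8-byte big-endian long printed in the HBase shell's escaped form (printable bytes shown as characters, everything else as `\xHH`). The function names are hypothetical and not part of the lookup tool.

```python
def hbase_escaped_to_bytes(escaped: str) -> bytes:
    """Turn an HBase-shell style escaped string into raw bytes.

    Assumes only \\xHH escapes and plain printable characters occur.
    """
    out = bytearray()
    i = 0
    while i < len(escaped):
        if escaped.startswith("\\x", i):
            # Two hex digits follow the \x marker.
            out.append(int(escaped[i + 2:i + 4], 16))
            i += 4
        else:
            # A printable byte was shown as its character.
            out.append(ord(escaped[i]))
            i += 1
    return bytes(out)


def decode_hbase_long(escaped: str) -> int:
    """Decode the escaped cell value as a big-endian 8-byte integer (assumed layout)."""
    raw = hbase_escaped_to_bytes(escaped)
    if len(raw) != 8:
        raise ValueError(f"expected 8 bytes, got {len(raw)}")
    return int.from_bytes(raw, "big")


if __name__ == "__main__":
    # Raw string so the backslashes stay literal, as copied from the shell output.
    print(decode_hbase_long(r"\x00\x00\x00\x00cw\x13h"))  # prints 1668748136
```

All five lookup rows carry the same cell value, so under this assumption they all resolve to the one occurrence key 1668748136.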
View stats.json
This file has been truncated, but you can view the full file.
{
"_shards": {
"total": 812,
"successful": 812,
"failed": 0
},
"_all": {
"primaries": {},
"total": {}
View format.txt
occurrenceCount,
// verbatim fields in records
v_kingdom,
v_phylum,
v_class,
v_order,
v_family,
v_genus,
v_scientificName,
View error.log
13:43:23.690 [main] INFO hive.metastore - Trying to connect to metastore with URI thrift://c4hivemetastore.gbif-uat.org:9083
13:43:23.738 [main] INFO hive.metastore - Connected to metastore.
13:43:23.742 [main] DEBUG org.apache.beam.sdk.Pipeline - Adding SqlTransform to Pipeline#2021601975
Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: Property 'org.apache.beam.sdk.extensions.sql.impl.planner.BeamRelDataTypeSystem' not valid for plugin type org.apache.calcite.rel.type.RelDataTypeSystem
at org.apache.beam.repackaged.sql.org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:159)
at org.apache.beam.repackaged.sql.org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:114)
at org.apache.beam.repackaged.sql.org.apache.calcite.prepare.PlannerImpl.ready(PlannerImpl.java:143)
at org.apache.beam.repackaged.sql.org.apache.calcite.prepare.PlannerImpl.parse(PlannerImpl.java:170)
at org.apache.beam.repackaged.sql.org.apache.calcite.tools.Planner.parse(Planner.java
View BackbonePreRelease.java
@NoArgsConstructor(access = AccessLevel.PRIVATE)
public class BackbonePreRelease {
private static final String SELECT_SQL = "SELECT kingdom, count(*) AS c FROM `hive`.`%s`.`%s`";
public static void main(String[] args) {
PipelineOptionsFactory.register(BackbonePreReleaseOptions.class);
BackbonePreReleaseOptions options = PipelineOptionsFactory.fromArgs(args).as(BackbonePreReleaseOptions.class);
options.setRunner(SparkRunner.class);
Pipeline p = Pipeline.create(options);
View delme.txt
Actions
1) Verify the correct mailing list is in place (TR / DM)
2) Ensure the participants in the Kilkenny Accord are happy for it to be finalised (DM mail to list)
- with the change that the 5,000€ is not a "hard limit"
3) Draft a communication to be sent to the mailing list covering (DM)
- The Kilkenny Accord status and share it
View csv-duplicate.sql
ADD JAR /tmp/hadoop-compress-1.3-SNAPSHOT.jar;
ADD JAR /tmp/occurrence-hive-0.89-20181017.084448-7.jar;
ADD JAR /tmp/brickhouse-0.6.0.jar;
ADD JAR /tmp/occurrence-common-0.89-20181017.084442-7.jar;
ADD JAR /tmp/gbif-api-0.72-20181012.105547-3.jar;
SET io.seqfile.compression.type=BLOCK;
SET mapred.output.compression.codec=org.gbif.hadoop.compress.d2.D2Codec;
SET io.compression.codecs=org.gbif.hadoop.compress.d2.D2Codec;
timrobertson100 / taxstatus-top100.csv
Created Oct 1, 2018
Top 100 by occurrence count of taxonomicStatus in occurrence data
View taxstatus-top100.csv
taxonomicstatus c
NULL 987614012
accepted 21892669
Aceptado 3513846
accepted name 1213782
Accepted 956810
valid 675813
ACCEPTED 336521
Temporal 317255
válido 277952
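Several of the rows above differ only in letter case (`accepted`, `Accepted`, `ACCEPTED`). As a rough sketch, assuming case-insensitive grouping is what is wanted and using only the nine rows visible in this preview, the counts can be folded together as follows. The function name is hypothetical, and translations such as `Aceptado` are deliberately left as separate values.

```python
from collections import Counter

# The rows visible in the preview above, value and count separated by whitespace.
ROWS = """\
NULL 987614012
accepted 21892669
Aceptado 3513846
accepted name 1213782
Accepted 956810
valid 675813
ACCEPTED 336521
Temporal 317255
válido 277952
"""


def fold_case_counts(text: str) -> Counter:
    """Sum counts per status, treating values that differ only by case as equal."""
    totals = Counter()
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the last run of whitespace so "accepted name" stays intact.
        status, count = line.rsplit(None, 1)
        totals[status.casefold()] += int(count)
    return totals


totals = fold_case_counts(ROWS)

if __name__ == "__main__":
    for status, count in totals.most_common():
        print(status, count)
```

With these rows the three case variants of `accepted` combine to 23186000, while `accepted name` and `aceptado` remain separate entries.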
timrobertson100 / top.csv
Created Sep 19, 2018
Top 100 dynamic properties
View top.csv
v_dynamicproperties count
NULL 967996798
{"Activity":"Forage"} 2922013
"{'coverScaleCode':'+'}" 2440492
"{'coverScaleCode':'r'}" 1456845
"{'coverScaleCode':'1'}" 1428278
{} 1075352
{"Activity":"Display/Song"} 870676
{"Activity":"Resting"} 674730
"{'coverScaleCode':'3'}" 648481