Tim Robertson timrobertson100

3:43:23.690 [main] INFO hive.metastore - Trying to connect to metastore with URI thrift://c4hivemetastore.gbif-uat.org:9083
13:43:23.738 [main] INFO hive.metastore - Connected to metastore.
13:43:23.742 [main] DEBUG org.apache.beam.sdk.Pipeline - Adding SqlTransform to Pipeline#2021601975
Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: Property 'org.apache.beam.sdk.extensions.sql.impl.planner.BeamRelDataTypeSystem' not valid for plugin type org.apache.calcite.rel.type.RelDataTypeSystem
	at org.apache.beam.repackaged.sql.org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:159)
	at org.apache.beam.repackaged.sql.org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:114)
	at org.apache.beam.repackaged.sql.org.apache.calcite.prepare.PlannerImpl.ready(PlannerImpl.java:143)
	at org.apache.beam.repackaged.sql.org.apache.calcite.prepare.PlannerImpl.parse(PlannerImpl.java:170)
	at org.apache.beam.repackaged.sql.org.apache.calcite.tools.Planner.parse(Planner.java
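The message names the unshaded plugin type `org.apache.calcite.rel.type.RelDataTypeSystem`, but every frame is in the repackaged `org.apache.beam.repackaged.sql.org.apache.calcite` namespace. One possible explanation, not confirmed here, is a second, unshaded copy of Calcite on the classpath (for example, one arriving with a Hive dependency). A quick check is to print which jar supplies each class and whether one type is assignable to the other. `ClassOrigin` is a hypothetical helper that uses only JDK reflection. The repackaged class name in `main` is an assumption based on the prefix seen in the stack trace.

```java
import java.security.CodeSource;

/** Hypothetical diagnostic: reports where a class is loaded from, without initializing it. */
public class ClassOrigin {

  private static Class<?> load(String className) throws ClassNotFoundException {
    // initialize=false avoids running static initializers of the inspected class.
    return Class.forName(className, false, ClassOrigin.class.getClassLoader());
  }

  /** Returns the jar or directory that supplied the class, or a marker string. */
  public static String locate(String className) {
    try {
      CodeSource source = load(className).getProtectionDomain().getCodeSource();
      // JDK classes from the bootstrap loader have no CodeSource.
      if (source == null || source.getLocation() == null) {
        return "<bootstrap or unknown>";
      }
      return source.getLocation().toString();
    } catch (ClassNotFoundException e) {
      return "<not on classpath>";
    }
  }

  /** True if an instance of implName can be assigned to a variable of type typeName. */
  public static boolean isAssignable(String typeName, String implName) {
    try {
      return load(typeName).isAssignableFrom(load(implName));
    } catch (ClassNotFoundException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    String[] names = {
      "org.apache.calcite.rel.type.RelDataTypeSystem",
      "org.apache.beam.repackaged.sql.org.apache.calcite.rel.type.RelDataTypeSystem",
      "org.apache.beam.sdk.extensions.sql.impl.planner.BeamRelDataTypeSystem"
    };
    for (String name : names) {
      System.out.println(name + " -> " + locate(name));
    }
    // A false here, with both classes present, is consistent with "not valid for plugin type".
    System.out.println("unshaded type accepts Beam class: " + isAssignable(names[0], names[2]));
  }
}
```

Run it with the same classpath as the failing pipeline. If the unshaded `RelDataTypeSystem` resolves to a jar other than the Beam SQL one, that jar is the first thing to exclude or relocate.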
import lombok.AccessLevel;
import lombok.NoArgsConstructor;
import org.apache.beam.runners.spark.SparkRunner;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

@NoArgsConstructor(access = AccessLevel.PRIVATE)
public class BackbonePreRelease {
  private static final String SELECT_SQL =
      "SELECT kingdom, count(*) AS c FROM `hive`.`%s`.`%s` GROUP BY kingdom";

  public static void main(String[] args) {
    PipelineOptionsFactory.register(BackbonePreReleaseOptions.class);
    BackbonePreReleaseOptions options =
        PipelineOptionsFactory.fromArgs(args).as(BackbonePreReleaseOptions.class);
    options.setRunner(SparkRunner.class);
    Pipeline p = Pipeline.create(options);
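`SELECT_SQL` is a `String.format` template. Its two `%s` placeholders take a Hive database name and a table name, which together form a three-part name under the `hive` catalog. The following is a minimal sketch of that step. `KingdomCountQuery`, `forTable`, `demo_db` and `occurrence_demo` are hypothetical names. The template here includes `GROUP BY kingdom`, because a query that mixes `kingdom` with `count(*)` needs one.

```java
/** Hypothetical helper: fills the Hive catalog query template for one table. */
public class KingdomCountQuery {

  // Same shape as SELECT_SQL: the catalog is fixed, database and table are filled in.
  static final String SELECT_SQL =
      "SELECT kingdom, count(*) AS c FROM `hive`.`%s`.`%s` GROUP BY kingdom";

  /** Formats the query, rejecting empty names or names that would break the backtick quoting. */
  public static String forTable(String database, String table) {
    for (String name : new String[] {database, table}) {
      if (name == null || name.isEmpty() || name.indexOf('`') >= 0) {
        throw new IllegalArgumentException("Invalid identifier: " + name);
      }
    }
    return String.format(SELECT_SQL, database, table);
  }

  public static void main(String[] args) {
    System.out.println(forTable("demo_db", "occurrence_demo"));
  }
}
```

The formatted string is what would then be passed to `SqlTransform.query(...)`, the transform that the log shows being added to the pipeline.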
Actions
1) Verify that the correct mailing list is in place (TR / DM)
2) Ensure the participants in the Kilkenny Accord are happy for it to be finalised (DM to mail the list)
- with the change, the €5,000 figure is not a "hard limit"
3) Draft a communication to be sent to the mailing list (DM), covering:
- the status of the Kilkenny Accord (and share the Accord itself)
ADD JAR /tmp/hadoop-compress-1.3-SNAPSHOT.jar;
ADD JAR /tmp/occurrence-hive-0.89-20181017.084448-7.jar;
ADD JAR /tmp/brickhouse-0.6.0.jar;
ADD JAR /tmp/occurrence-common-0.89-20181017.084442-7.jar;
ADD JAR /tmp/gbif-api-0.72-20181012.105547-3.jar;
SET io.seqfile.compression.type=BLOCK;
SET mapred.output.compression.codec=org.gbif.hadoop.compress.d2.D2Codec;
SET io.compression.codecs=org.gbif.hadoop.compress.d2.D2Codec;
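These statements register the jars for the Hive session and configure block-compressed SequenceFile output using GBIF's `org.gbif.hadoop.compress.d2.D2Codec`. The jar names embed snapshot timestamps, so generating the preamble can be easier than editing it by hand. The sketch below is a hypothetical generator (`HivePreamble` is not part of any GBIF project). It only builds the text and does not connect to Hive.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical generator for a HiveQL session preamble (ADD JAR and SET lines). */
public class HivePreamble {

  public static String build(List<String> jars, Map<String, String> settings) {
    StringBuilder sql = new StringBuilder();
    for (String jar : jars) {
      sql.append("ADD JAR ").append(jar).append(";\n");
    }
    // Output order follows the map's iteration order; a LinkedHashMap keeps insertion order.
    for (Map.Entry<String, String> e : settings.entrySet()) {
      sql.append("SET ").append(e.getKey()).append('=').append(e.getValue()).append(";\n");
    }
    return sql.toString();
  }

  public static void main(String[] args) {
    Map<String, String> settings = new LinkedHashMap<>();
    settings.put("io.seqfile.compression.type", "BLOCK");
    settings.put("mapred.output.compression.codec", "org.gbif.hadoop.compress.d2.D2Codec");
    settings.put("io.compression.codecs", "org.gbif.hadoop.compress.d2.D2Codec");
    System.out.print(build(Arrays.asList("/tmp/brickhouse-0.6.0.jar"), settings));
  }
}
```

`mapred.output.compression.codec` is the older Hadoop 1 property name. Hadoop 2 still accepts it as a deprecated alias of `mapreduce.output.fileoutputformat.compress.codec`.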
timrobertson100 / taxstatus-top100.csv
Created October 1, 2018 12:12
Top 100 by occurrence count of taxonomicStatus in occurrence data
taxonomicstatus  c
NULL             987614012
accepted         21892669
Aceptado         3513846
accepted name    1213782
Accepted         956810
valid            675813
ACCEPTED         336521
Temporal         317255
válido           277952
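Most of the top values are spelling, case and language variants of a few statuses: `accepted`, `Accepted`, `ACCEPTED`, `accepted name` and the Spanish `Aceptado`. Below is a minimal normalization sketch. The synonym table is a hypothetical mapping built only from values in this list. It is not GBIF's vocabulary or interpretation code. Whether `valid` / `válido` (the term used in zoological nomenclature) should fold into `accepted` is a modelling decision, so the sketch keeps it separate.

```java
import java.text.Normalizer;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical normalizer for free-text taxonomicStatus values. */
public class TaxonomicStatusNormalizer {

  // Illustrative entries only, keyed by the normalized form produced by key().
  private static final Map<String, String> SYNONYMS = new HashMap<>();

  static {
    SYNONYMS.put("accepted", "ACCEPTED");
    SYNONYMS.put("accepted name", "ACCEPTED");
    SYNONYMS.put("aceptado", "ACCEPTED");
    SYNONYMS.put("valid", "VALID");
    SYNONYMS.put("valido", "VALID"); // the Spanish value with its accent stripped
  }

  /** Trims, strips diacritics, lower-cases and collapses internal whitespace. */
  static String key(String raw) {
    String decomposed = Normalizer.normalize(raw.trim(), Normalizer.Form.NFD);
    String noMarks = decomposed.replaceAll("\\p{M}", "");
    return noMarks.toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
  }

  /** Returns the normalized status, or empty for null or unrecognized input. */
  public static Optional<String> normalize(String raw) {
    if (raw == null) {
      return Optional.empty();
    }
    return Optional.ofNullable(SYNONYMS.get(key(raw)));
  }
}
```

Unrecognized values such as `Temporal` come back empty rather than being guessed, so they can be counted and reviewed separately.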
timrobertson100 / top.csv
Created September 19, 2018 11:15
Top 100 dynamic properties
v_dynamicproperties          count
NULL                         967996798
{"Activity":"Forage"}        2922013
"{'coverScaleCode':'+'}"     2440492
"{'coverScaleCode':'r'}"     1456845
"{'coverScaleCode':'1'}"     1428278
{}                           1075352
{"Activity":"Display/Song"}  870676
{"Activity":"Resting"}       674730
"{'coverScaleCode':'3'}"     648481
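The values mix valid JSON objects (`{"Activity":"Forage"}`) with single-quoted maps wrapped in an extra pair of double quotes (`"{'coverScaleCode':'+'}"`), which a strict JSON parser rejects. The JDK has no JSON parser, so this sketch uses a regular expression instead. It handles only flat maps of quoted string keys and values without escaped quotes, and it also accepts mismatched quote pairs. `DynamicPropertiesSketch` is a hypothetical helper for profiling these values, not a general parser.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical, lenient parser for flat dynamicProperties maps. */
public class DynamicPropertiesSketch {

  // A key and a value, each wrapped in single or double quotes, separated by a colon.
  private static final Pattern PAIR =
      Pattern.compile("['\"]([^'\"]+)['\"]\\s*:\\s*['\"]([^'\"]*)['\"]");

  /** Returns the key/value pairs found, in order; empty for null or "{}". */
  public static Map<String, String> parse(String raw) {
    Map<String, String> result = new LinkedHashMap<>();
    if (raw == null) {
      return result;
    }
    Matcher m = PAIR.matcher(raw);
    while (m.find()) {
      result.put(m.group(1), m.group(2));
    }
    return result;
  }
}
```

With both styles reduced to the same `Map`, keys such as `Activity` and `coverScaleCode` can be tallied regardless of how each publisher quoted them.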
timrobertson100 / elasticsearch-pr-notes.md
Last active August 13, 2018 19:41
ElasticsearchIOIT notes to self

Elasticsearch

Notes to self while reviewing the PR for BEAM-5107.

Maven instructions

The ElasticsearchIOITCommon Javadoc references mvn (Maven) commands. To work around this quickly, I did the following hack(!).

I added this to elasticsearch-tests-common/build.gradle.

timrobertson100 / OperationDecoder.java
Last active June 13, 2018 06:33
Decoder of Kudu operations (UNTESTED!!!)
/**
 * Decodes the protobuf bytes into {@link Operation} instances.
 *
 * <p>The encoded format is defined as follows:
 *
 * <ol>
 *   <li>"rows" is a byte array encoding of:
 *       <ol>
 *         <li>The operation type (e.g. Upsert) encoded as a byte
 *         <li>The "isSet" bitSet encoded as one or more bytes
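The first two fields of each encoded row can be read with a `ByteBuffer`. This is a minimal sketch, and it rests on assumptions not verified here against the Kudu wire format:

- the "isSet" bitset has one bit per schema column;
- it is packed into `(columnCount + 7) / 8` bytes;
- column `i` sits at bit `i % 8` of byte `i / 8`.

That is the little-endian layout `java.util.BitSet.valueOf(byte[])` expects. `RowHeaderSketch`, `RowHeader` and `readHeader` are hypothetical names. The column count would come from the table schema.

```java
import java.nio.ByteBuffer;
import java.util.BitSet;

/** Hypothetical reader for the leading fields of one encoded row. */
public class RowHeaderSketch {

  /** Operation type byte plus the columns flagged as set. */
  public static final class RowHeader {
    public final byte operationType;
    public final BitSet isSet;

    RowHeader(byte operationType, BitSet isSet) {
      this.operationType = operationType;
      this.isSet = isSet;
    }
  }

  /** Bytes needed to hold one bit per column. */
  static int bitSetBytes(int columnCount) {
    return (columnCount + 7) / 8;
  }

  /** Reads the type byte and the isSet bitset, advancing the buffer past both. */
  public static RowHeader readHeader(ByteBuffer rows, int columnCount) {
    byte operationType = rows.get();
    byte[] bits = new byte[bitSetBytes(columnCount)];
    rows.get(bits);
    // BitSet.valueOf sets bit n when (bits[n / 8] >> (n % 8)) & 1 is 1.
    return new RowHeader(operationType, BitSet.valueOf(bits));
  }
}
```

The buffer is left positioned at the first column value, so any further fields described in the Javadoc would be read from there.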
timrobertson100 / ContainsExampleTest.java
Created May 21, 2018 11:30
Example for Tim Cook of the apache/beam channel
package com.opencore.demo;
import com.google.common.collect.ImmutableList;
import org.apache.beam.sdk.testing.PAssert;
import org.apache.beam.sdk.testing.TestPipeline;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;
import org.junit.Rule;
timrobertson100 / AuthorizedSolrClient.java
Created March 27, 2018 18:39
AuthorizedSolrClient update
void processWithRetry(
    String collection, UpdateRequest request, int numberAttempts, Duration maxDuration)
    throws IOException, InterruptedException {
  // TODO: sanitize those params
  request.setBasicAuthCredentials(username, password);
  Sleeper sleeper = Sleeper.DEFAULT;
  // Note: FluentBackoff counts retries, excluding the original attempt, whereas we count
  // attempts to remove any ambiguity (hence the -1)
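The comment separates attempts from retries. `numberAttempts` counts every call, including the first. Beam's `FluentBackoff.withMaxRetries` counts only the calls after the first, so one value is the other minus one. The sketch below shows that bookkeeping using only the JDK. It is not the Beam or Solr code: `RetrySketch`, its `Sleeper` interface and the doubling backoff are hypothetical stand-ins. `java.time.Duration` replaces the Joda-Time `Duration` that `FluentBackoff` uses. Here `maxDuration` caps the total time spent sleeping, not wall-clock time.

```java
import java.time.Duration;
import java.util.concurrent.Callable;

/** Hypothetical retry helper that counts attempts (the first call included), not retries. */
public class RetrySketch {

  /** Abstracts sleeping so tests can avoid real delays. */
  public interface Sleeper {
    void sleep(long millis) throws InterruptedException;
  }

  public static <T> T callWithRetry(
      Callable<T> call, int numberAttempts, Duration initialBackoff, Duration maxDuration,
      Sleeper sleeper) throws Exception {
    if (numberAttempts < 1) {
      throw new IllegalArgumentException("numberAttempts must be >= 1");
    }
    int maxRetries = numberAttempts - 1; // retries exclude the original attempt
    long backoffMillis = initialBackoff.toMillis();
    long sleptMillis = 0;
    for (int retry = 0; ; retry++) {
      try {
        return call.call();
      } catch (Exception e) {
        // Give up once retries are used up or the next sleep would exceed the sleep budget.
        if (retry >= maxRetries || sleptMillis + backoffMillis > maxDuration.toMillis()) {
          throw e;
        }
      }
      sleeper.sleep(backoffMillis);
      sleptMillis += backoffMillis;
      backoffMillis *= 2; // simple exponential backoff, no jitter
    }
  }
}
```

With `numberAttempts = 3`, the call runs at most three times and sleeps at most twice, which matches `withMaxRetries(numberAttempts - 1)`.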