@btiernay
btiernay / PreProcessLine.java
Last active May 17, 2017 09:13
Example of extracting information from HDFS paths in a Spark transformation
package org.icgc.dcc.etl.staging.function;
import static com.google.common.base.Stopwatch.createStarted;
import static com.google.common.collect.Iterables.toArray;
import static org.icgc.dcc.common.core.util.FormatUtils.formatCount;
import static org.icgc.dcc.common.core.util.FormatUtils.formatPercent;
import static org.icgc.dcc.common.core.util.Splitters.TAB;
import java.io.Serializable;
import java.util.Iterator;
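
The preview above is cut off before the path handling, so here is a rough, self-contained sketch of the idea named in the description: pulling metadata out of an HDFS path string before processing a line. It is not the gist's actual code. The class name `HdfsPathInfo`, the path layout `.../<projectName>/<fileType>.<ext>`, and the example URI are all assumptions made for illustration, and the sketch uses only the JDK rather than Spark.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: not part of the original gist.
final class HdfsPathInfo {

    // Assumed layout: <anything>/<projectName>/<fileType>[.<extension>]
    private static final Pattern LAYOUT =
            Pattern.compile(".*/([^/]+)/([^/.]+)(?:\\.[^/]*)?$");

    final String projectName;
    final String fileType;

    private HdfsPathInfo(String projectName, String fileType) {
        this.projectName = projectName;
        this.fileType = fileType;
    }

    /** Parses the last two path segments; rejects paths that do not fit the assumed layout. */
    static HdfsPathInfo parse(String path) {
        Matcher matcher = LAYOUT.matcher(path);
        if (!matcher.matches()) {
            throw new IllegalArgumentException("Unexpected path layout: " + path);
        }
        return new HdfsPathInfo(matcher.group(1), matcher.group(2));
    }

    public static void main(String[] args) {
        // Example URI is made up for this sketch.
        HdfsPathInfo info = parse("hdfs://namenode:8020/staging/project1/donor.txt");
        System.out.println(info.projectName + " / " + info.fileType);
    }
}
```

In a Spark job, a parser like this would typically be called inside the function applied to each record, once the path of the originating file is known; how the path is obtained depends on the input format and is not shown here.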
@btiernay
btiernay / facet-filter-with-nested-filter-fails.sh
Last active December 21, 2015 22:58
Example that shows how using a `facet_filter` with a `nested` filter fails to return `nested` facet terms.
#!/bin/bash
#
# Description:
# Example that shows how using a `facet_filter` with a `nested` filter fails to return `nested` facet terms.
# Remove index if it exists already
curl -XDELETE http://localhost:9200/nested
# Create a fresh index
curl -XPOST http://localhost:9200/nested
@btiernay
btiernay / search-null-value-array-object-field
Last active December 20, 2015 19:49 — forked from anonymous/search-null-value-array-object-field
This query demonstrates that null values are not reverse translated when they are mapped with `null_value`
#!/bin/bash
# Clean
curl -XDELETE 'http://localhost:9200/null?pretty'
# Create index
curl -XPOST 'http://localhost:9200/null?pretty'
# Create mapping
curl -XPOST 'http://localhost:9200/null/type/_mapping?pretty' -d '
@btiernay
btiernay / undesired-match-all-nested-behavior.sh
Last active December 20, 2015 04:49
Example that shows how applying a `match_all` `nested` query will exclude documents that do not have nested documents.
#!/bin/bash
#
# Description:
# Example that shows how applying a `match_all` `nested` query will exclude hits that do not have nested documents.
# This may be related to ElasticSearch's `NestedQueryParser` and its use of Lucene's `ToParentBlockJoinQuery`
#
# See:
# - https://github.com/elasticsearch/elasticsearch/blob/0.90/src/main/java/org/elasticsearch/index/query/NestedQueryParser.java
# - http://lucene.apache.org/core/4_3_0/join/org/apache/lucene/search/join/package-summary.html
# - http://lucene.apache.org/core/4_3_0/join/org/apache/lucene/search/join/ToParentBlockJoinQuery.html
@btiernay
btiernay / Main.java
Last active March 29, 2018 08:57
Example of how one can issue a `splitVector` command in MongoDB 2.4.0 using the Java client when `--auth` is enabled and the requesting user exists in the `admin` db.
import java.net.UnknownHostException;
import com.mongodb.BasicDBObject;
import com.mongodb.BasicDBObjectBuilder;
import com.mongodb.CommandResult;
import com.mongodb.DB;
import com.mongodb.DBObject;
import com.mongodb.Mongo;
import com.mongodb.MongoClient;
import com.mongodb.MongoClientURI;
@btiernay
btiernay / TupleEntrySerialization.java
Created January 11, 2013 15:40
Custom TupleEntrySerialization implementation that complements the behavior of TupleSerialization. It works by (de)serializing the Fields field and then delegates to a supplied TupleSerialization instance for the Tuple field. Thus, it is possible to nest TupleEntries / Tuples into arbitrary trees and serialize them using Hadoop. This is an exten…
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.util.Comparator;
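
The description above outlines a two-step pattern: write the field names first, then hand the values to a supplied delegate serializer. The sketch below illustrates only that pattern with plain JDK streams. It is not the gist's implementation and does not use the Cascading or Hadoop APIs; `EntryCodecSketch`, `ValuesCodec`, and `STRINGS` are hypothetical names, and values are simplified to strings.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of "header first, then delegate".
final class EntryCodecSketch {

    /** Stand-in for the delegate that (de)serializes the values. */
    interface ValuesCodec {
        void write(DataOutputStream out, List<String> values) throws IOException;
        List<String> read(DataInputStream in) throws IOException;
    }

    /** Length-prefixed string list codec; used for field names and as the example delegate. */
    static final ValuesCodec STRINGS = new ValuesCodec() {
        @Override
        public void write(DataOutputStream out, List<String> values) throws IOException {
            out.writeInt(values.size());
            for (String value : values) {
                out.writeUTF(value);
            }
        }

        @Override
        public List<String> read(DataInputStream in) throws IOException {
            int count = in.readInt();
            List<String> values = new ArrayList<>(count);
            for (int i = 0; i < count; i++) {
                values.add(in.readUTF());
            }
            return values;
        }
    };

    static byte[] serialize(List<String> fieldNames, List<String> values, ValuesCodec delegate)
            throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        STRINGS.write(out, fieldNames); // step 1: field names
        delegate.write(out, values);    // step 2: values via the delegate
        out.flush();
        return bytes.toByteArray();
    }

    /** Returns a two-element list: field names, then values. */
    static List<List<String>> deserialize(byte[] data, ValuesCodec delegate) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        List<String> fieldNames = STRINGS.read(in); // same order as serialize
        List<String> values = delegate.read(in);
        return Arrays.asList(fieldNames, values);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = serialize(Arrays.asList("id", "name"), Arrays.asList("1", "alice"), STRINGS);
        System.out.println(deserialize(data, STRINGS));
    }
}
```

Because the reader consumes the stream in the same order the writer produced it, a delegate that itself writes a nested entry would round-trip as well, which is the property the description relies on for arbitrary trees.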