Eric Pugh epugh

@epugh
epugh / Bertrand NDCG@10
Created February 13, 2020 22:49
Quepid NDCG@10 developed by Bertrand
// Wrap the Quepid objects and API in a namespace.
// Simple pass-through required by our NDCG scorer.
let quepidApi = {};
(function(context) {
  // `docs` is the result list that Quepid exposes to custom scorers.
  context.getDocs = function() {
    return docs;
  };
})(quepidApi);
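
For readers unfamiliar with the metric named in the gist title, here is a minimal sketch of NDCG@10 in plain JavaScript. It is not Bertrand's scorer and does not use the Quepid API. The function names `dcgAtK` and `ndcgAtK` are hypothetical, the input is assumed to be an array of graded relevance ratings in the order the results were returned, and the gain (`2^rating - 1`) and the ideal ordering (the same ratings sorted from best to worst) are one common convention among several.

```javascript
// Hypothetical helper names; not taken from the gist or from Quepid's API.

// Discounted cumulative gain over the first k ratings.
// Assumes gain = 2^rating - 1 and a log2(position + 1) discount,
// where positions start at 1 (so index i has discount log2(i + 2)).
function dcgAtK(ratings, k) {
  return ratings.slice(0, k).reduce(
    (sum, rating, i) => sum + (Math.pow(2, rating) - 1) / Math.log2(i + 2),
    0
  );
}

// NDCG@k: DCG of the returned order divided by the DCG of the ideal order.
// Assumes the ideal order is the same ratings sorted from best to worst.
function ndcgAtK(ratings, k = 10) {
  const ideal = [...ratings].sort((a, b) => b - a);
  const idealDcg = dcgAtK(ideal, k);
  // With no relevant documents at all, report 0 rather than dividing by zero.
  return idealDcg === 0 ? 0 : dcgAtK(ratings, k) / idealDcg;
}
```

Under these assumptions, `ndcgAtK([3, 2, 1, 0])` evaluates to 1 because the results are already in ideal order, and any list with a relevant document ranked below a less relevant one scores lower.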
Future of Big Data: Philadelphia

These are notes for following along on the talk I am giving.

This builds on the gist: https://gist.github.com/epugh/5729071c3b8aab81636d422c391aa716, but is meant to be stand alone!

  1. This gist uses the latest version of Zeppelin. Replace the IP address 192.168.99.100 with your Docker machine IP, which you can get by running docker-machine ip.
  2. Fire up Zeppelin + Spark Master and a Spark Worker via: docker run -d --name zeppelin -p 8080:8080 dylanmei/zeppelin
  3. If it doesn't work, go back to the specific "stable" version of Zeppelin. There is a 1 GB layer in there, watch out!
epugh / zeppelin_solr_spark_oh_my_meetup_notes.md
Last active October 9, 2018 03:30
Steps for following along with Eric's Zeppelin talk.

The below steps all assume you have installed Docker. I used the Kitematic tool for OSX, and it worked great. Everything is mapped to your "localhost" domain name.

  1. Let's set up Zeppelin

    I am using this Docker image https://github.com/dylanmei/docker-zeppelin to fire up Zeppelin and Spark. Note that it's slow because there are so many processes (Spark Master, Spark Worker, Zeppelin) to start! The image is now up to Zeppelin 0.7.0.

    docker run -d --name zeppelin -p 8080:8080 dylanmei/zeppelin
    
classifier = ClassifierReborn::LSI.new #:auto_rebuild => false
strings = [
["n/a OSC Retreat.", :missing],
["LOOKING FOR SPEAKER", :missing],
["Need speaker", :missing],
["Elizabeth Solr Search", :present],
["Matt Datastax", :present],
["Scott Roll your own user analytics with Zeppelin", :present],
["Brandon Rose Spark and Elasticsearch", :present],

Hortonworks User Group

These are notes for following along on the talk I am giving at http://www.meetup.com/Washington-DC-Hortonworks-User-Group-Meetup/events/230394067/

This builds on the gist: https://gist.github.com/epugh/5729071c3b8aab81636d422c391aa716, but is meant to be stand alone!

  1. This gist uses the latest stable version of Zeppelin, not the latest version. Replace the IP address 192.168.99.101 with your Docker machine IP, which you can get by running docker-machine ip.
  2. Fire up Zeppelin + Spark Master and a Spark Worker via:
epugh / pretty_print_xml.rb
Created March 16, 2016 17:49
Convert a big blob of XML into pretty printed XML in Ruby/Rails
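# Oh dear god this was a pain to figure out! REXML had parsing issues, so instead
# I parse with Nokogiri, then dump it out and feed it to REXML to use the pretty printer.
require "nokogiri"
require "rexml/document"

# Parse the raw XML string leniently with Nokogiri.
nokogiri_doc = Nokogiri::XML(xml_string)
# Re-parse Nokogiri's serialized output with REXML.
rexml_doc = REXML::Document.new(nokogiri_doc.to_xml)
# Pretty printer with a 2-space indent.
formatter = REXML::Formatters::Pretty.new(2)
# Mutable string that receives the formatted XML.
@doc = String.new
formatter.write(rexml_doc, @doc)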
// Configure the connection pool and redirect handling for the underlying HttpClient.
ModifiableSolrParams params = new ModifiableSolrParams();
params.set(HttpClientUtil.PROP_MAX_CONNECTIONS, 128);
params.set(HttpClientUtil.PROP_MAX_CONNECTIONS_PER_HOST, 32);
params.set(HttpClientUtil.PROP_FOLLOW_REDIRECTS, false);
HttpClient httpClient = HttpClientUtil.createClient(params);
// Wrap the client so it sends BASIC credentials over SSL.
httpClient = new InsecureHttpClient(httpClient, username, password);
SolrServer solrServer = new HttpSolrServer(url, httpClient);
epugh / gist:6691303
Last active December 23, 2015 20:39
InsecureHttpClient lets you access an HTTP server over SSL using BASIC authentication. I am using it in conjunction with SolrJ. Blog at http://www.opensourceconnections.com/2013/09/24/using-solrj-wi…l-wrapped-solr/
package com.o19s.http;
import java.io.IOException;
import java.security.cert.CertificateException;
import java.security.cert.X509Certificate;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLException;
import javax.net.ssl.SSLSession;
import javax.net.ssl.SSLSocket;