Eric Pugh epugh

@epugh
epugh / Bertrand NDCG@10
Created February 13, 2020 22:49
Quepid NDCG@10 developed by Bertrand
// Wrap the Quepid objects and API in a namespace.
// Simple pass-through required by our NDCG scorer.
let quepidApi = {};
(function(context) {
context.getDocs = function() {
return docs;
}
pdf:docinfo:producer Adobe PDF Library 11.0
pdf:docinfo:created 2014-07-14T19:27:34Z
page
For release on delivery
10:00 a.m. EDT
July 15, 2014
classifier = ClassifierReborn::LSI.new #:auto_rebuild => false
strings = [
["n/a OSC Retreat.", :missing],
["LOOKING FOR SPEAKER", :missing],
["Need speaker", :missing],
["Elizabeth Solr Search", :present],
["Matt Datastax", :present],
["Scott Roll your own user analytics with Zeppelin", :present],
["Brandon Rose Spark and Elasticsearch", :present],

Hortonworks User Group

These are notes for following along on the talk I am giving at http://www.meetup.com/Washington-DC-Hortonworks-User-Group-Meetup/events/230394067/

This builds on the gist: https://gist.github.com/epugh/5729071c3b8aab81636d422c391aa716, but is meant to stand alone!

  1. This gist uses the latest stable version of Zeppelin, not the latest version. Replace the IP address 192.168.99.101 with your Docker machine's IP, which you can get by running docker-machine ip.
  2. Fire up Zeppelin + Spark Master and a Spark Worker via:

Future of Big Data: Philadelphia

These are notes for following along on the talk I am giving.

This builds on the gist: https://gist.github.com/epugh/5729071c3b8aab81636d422c391aa716, but is meant to stand alone!

  1. This gist uses the latest version of Zeppelin. Replace the IP address 192.168.99.100 with your Docker machine's IP, which you can get by running docker-machine ip.
  2. Fire up Zeppelin + Spark Master and a Spark Worker via: docker run -d --name zeppelin -p 8080:8080 dylanmei/zeppelin
  3. If it doesn't work, go back to the specific "stable" version of Zeppelin. Watch out: there is a 1 GB layer in there!
epugh / zeppelin_solr_spark_oh_my_meetup_notes.md
Last active October 9, 2018 03:30
Steps for following along with Eric's Zeppelin talk.

The below steps all assume you have installed Docker. I used the Kitematic tool for OSX, and it worked great. Everything is mapped to your "localhost" domain name.

  1. Let's Set up Zeppelin

    I am using this Docker image https://github.com/dylanmei/docker-zeppelin to fire up Zeppelin and Spark. Note that it's slow because there are so many processes (Spark Master, Spark Worker, Zeppelin) to start! This is now up to Zeppelin 0.7.0.

    docker run -d --name zeppelin -p 8080:8080 dylanmei/zeppelin
    
epugh / pretty_print_xml.rb
Created March 16, 2016 17:49
Convert a big blob of XML into pretty printed XML in Ruby/Rails
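# Oh dear god this was a pain to figure out! REXML had parsing issues, so instead
# I parse with Nokogiri, then dump it out and feed it to REXML to use the pretty printer.
require 'nokogiri'
require 'rexml/document'
require 'rexml/formatters/pretty'

nokogiri_doc = Nokogiri::XML(xml_string)              # parse the raw XML blob
rexml_doc = REXML::Document.new(nokogiri_doc.to_xml)  # re-parse Nokogiri's serialized output
formatter = REXML::Formatters::Pretty.new(2)          # indent by 2 spaces
@doc = ""                                             # string buffer the formatter appends to
formatter.write(rexml_doc, @doc)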
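
The pretty-printing half of the snippet above can be tried on its own. The following is a minimal sketch, not the gist author's code: it swaps out the Nokogiri parsing step and uses REXML alone, so it assumes the input is well-formed enough for REXML to parse directly. The helper name `pretty_print_xml` and its `indent` parameter are hypothetical, introduced only for illustration.

```ruby
require 'rexml/document'
require 'rexml/formatters/pretty'

# Hypothetical helper (not part of the original gist): pretty-print an XML
# string using REXML only. Assumes REXML can parse the input; the gist falls
# back to Nokogiri for parsing precisely because that was not always true.
def pretty_print_xml(xml_string, indent = 2)
  doc = REXML::Document.new(xml_string)
  formatter = REXML::Formatters::Pretty.new(indent)
  formatter.compact = true # keep short text nodes on the same line as their tags
  output = +""             # unfrozen string buffer the formatter appends to
  formatter.write(doc, output)
  output
end

puts pretty_print_xml("<a><b>1</b><c><d>2</d></c></a>")
```

If REXML raises a parse error on real-world input, the Nokogiri round trip from the gist is the workaround to reach for.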
Eric Pugh co-wrote the book on Solr (Apache Solr 3 Enterprise Search Server), leads OpenSource Connections, the 2014 DataStax Engagement Partner of the Year, and blogs at www.opensourceconnections.com. His new love is Spark.
OR
Eric Pugh has been involved in the open source world as a developer, committer, and consultant for the past 15 years. He is a member of the Apache Software Foundation and for the past decade has been focused on building discovery and analytic solutions based on search engines.
In biotech, financial services and defense IT, he has helped European and American companies develop coherent strategies for embracing open source software to build their rich data analytics systems. He has spoken widely about the lessons learned in implementing big data systems.
Eric leads OpenSource Connections, the DataStax Engagement Partner of the Year for 2014. He has worked with both Federal and commercial customers to evaluate Cassandra for the specific use cases they have undergon
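import org.apache.http.client.HttpClient;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpClientUtil;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.params.ModifiableSolrParams;

// Build a pooled HttpClient, wrap it in the project's own InsecureHttpClient, and hand it to SolrJ.
ModifiableSolrParams params = new ModifiableSolrParams();
params.set(HttpClientUtil.PROP_MAX_CONNECTIONS, 128);
params.set(HttpClientUtil.PROP_MAX_CONNECTIONS_PER_HOST, 32);
params.set(HttpClientUtil.PROP_FOLLOW_REDIRECTS, false);
HttpClient httpClient = HttpClientUtil.createClient(params);
httpClient = new InsecureHttpClient(httpClient, username, password);
SolrServer solrServer = new HttpSolrServer(url, httpClient);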