
Eric Pugh epugh

Eric Pugh co-wrote the book on Solr (Apache Solr 3 Enterprise Search Server), leads OpenSource Connections (the 2014 DataStax Engagement Partner of the Year), blogs at www.opensourceconnections.com, and has a new love: Spark.
OR
Eric Pugh has been involved in the open source world as a developer, committer, and consultant for the past 15 years. He is a member of the Apache Software Foundation and for the past decade has been focused on building discovery and analytic solutions based on search engines.
In biotech, financial services, and defense IT, he has helped European and American companies develop coherent strategies for embracing open source software to build their rich data analytics systems. He has spoken widely about the lessons learned in implementing big data systems.
Eric leads OpenSource Connections, the DataStax Engagement Partner of the Year for 2014. He has worked with both Federal and commercial customers to evaluate Cassandra for the specific use cases they have undergon
require 'singleton'
begin
  require 'daemon_controller'
rescue LoadError
  raise('FATAL: sudo gem install FooBarWidget-daemon_controller -s http://gems.github.com')
end
##
# sudo port install memcached
class DaemonBackgroundrb
[
{
"artist": {
"name": "Enigma",
"image_url": "http://thm-a01.yimg.com/image/238379f10d683a30",
"updated_at": "2009-04-21T16:46:00Z",
"group_type": "2",
"id": 116,
"release_date": "2006-09-22T04:00:00Z",
"created_at": "2009-04-21T00:33:06Z"
epugh / gist:1627515
Created January 17, 2012 17:05
SolrInputDocument Avro Schema
{
"type": "record",
"name": "SolrInputDocument",
"namespace": "org.apache.solr.common",
"fields": [
{
"name": "_fields",
"type": {
"type": "map",
"values": {
epugh / gist:6691303
Last active December 23, 2015 20:39
InsecureHttpClient lets you access an HTTP server over SSL using BASIC authentication. I am using it in conjunction with SolrJ. Blog post at http://www.opensourceconnections.com/2013/09/24/using-solrj-wi…l-wrapped-solr/
package com.o19s.http;
import java.io.IOException;
import java.security.cert.CertificateException;
import java.security.cert.X509Certificate;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLException;
import javax.net.ssl.SSLSession;
import javax.net.ssl.SSLSocket;
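import org.apache.http.client.HttpClient;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpClientUtil;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.params.ModifiableSolrParams;
import com.o19s.http.InsecureHttpClient;
ModifiableSolrParams params = new ModifiableSolrParams();
params.set(HttpClientUtil.PROP_MAX_CONNECTIONS, 128);
params.set(HttpClientUtil.PROP_MAX_CONNECTIONS_PER_HOST, 32);
params.set(HttpClientUtil.PROP_FOLLOW_REDIRECTS, false);
HttpClient httpClient = HttpClientUtil.createClient(params);
// Wrap the pooled client so requests use BASIC auth over SSL.
httpClient = new InsecureHttpClient(httpClient, username, password);
SolrServer solrServer = new HttpSolrServer(url, httpClient);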
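For reference, BASIC authentication (RFC 7617) adds an `Authorization` header whose value is `Basic ` followed by the Base64 encoding of `username:password`. The sketch below shows that header using only Ruby's standard `Net::HTTP`. It does not reproduce InsecureHttpClient's SSL handling, and it sends nothing over the network. The URL, credentials, and helper names (`basic_auth_request`, `expected_basic_header`) are hypothetical placeholders.

```ruby
require 'net/http'
require 'uri'

# Hypothetical Solr endpoint, for illustration only.
SOLR_URL = 'https://solr.example.com/solr/collection1/select'

# Build (but do not send) a GET request carrying BASIC auth credentials.
def basic_auth_request(url, username, password)
  request = Net::HTTP::Get.new(URI(url))
  request.basic_auth(username, password) # sets the Authorization header
  request
end

# What the header should contain: "Basic " + Base64("username:password").
def expected_basic_header(username, password)
  'Basic ' + ["#{username}:#{password}"].pack('m0')
end

req = basic_auth_request(SOLR_URL, 'solr', 's3cret')
puts req['Authorization']
```

Because the credentials are only Base64-encoded, not encrypted, BASIC auth is normally paired with SSL, as in the gist above.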
epugh / gist:5f4b40be01aa12ff9f5c
Created March 16, 2016 17:47
Convert a big blob of XML into pretty printed XML in Ruby/Rails
require 'nokogiri'
require 'rexml/document'

# Oh dear god this was a pain to figure out! REXML had parsing issues, so instead
# I parse with Nokogiri, then dump it out and feed it to REXML to use the pretty printer.
nokogiri_doc = Nokogiri::XML xml_string
rexml_doc = REXML::Document.new nokogiri_doc.to_xml
formatter = REXML::Formatters::Pretty.new(2) # indent nested elements by 2 spaces
@doc = +""                                   # mutable buffer the formatter appends to
formatter.write(rexml_doc, @doc)
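If the input is already well-formed XML that REXML parses without trouble, you can drop the Nokogiri round-trip. The sketch below makes that assumption and uses only the standard library, wrapping the same REXML pretty printer in a reusable method. `pretty_xml` and its `indent:` keyword are hypothetical names. Setting `compact = true` is an added choice that keeps short text-only elements on one line. For the messy input that motivated the gist, keep the Nokogiri step.

```ruby
require 'rexml/document'

# Stdlib-only variant: skips Nokogiri, so it assumes the input is
# well-formed XML that REXML can parse on its own.
def pretty_xml(xml_string, indent: 2)
  doc = REXML::Document.new(xml_string)
  formatter = REXML::Formatters::Pretty.new(indent)
  formatter.compact = true # keep short text-only elements on one line
  out = +""                # mutable buffer for the formatter to append to
  formatter.write(doc, out)
  out
end

puts pretty_xml('<response><lst name="params"><str name="q">solr</str></lst></response>')
```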
epugh / pretty_print_xml.rb
Created March 16, 2016 17:49
Convert a big blob of XML into pretty printed XML in Ruby/Rails

Hortonworks User Group

These are notes for following along with the talk I am giving at http://www.meetup.com/Washington-DC-Hortonworks-User-Group-Meetup/events/230394067/

This builds on the gist https://gist.github.com/epugh/5729071c3b8aab81636d422c391aa716, but is meant to be standalone!

  1. This gist uses the latest stable version of Zeppelin, not the very latest version. Replace the IP address 192.168.99.101 with your Docker Machine IP, which you can get by running docker-machine ip.
  2. Fire up Zeppelin + Spark Master and a Spark Worker via:
require 'classifier-reborn'
classifier = ClassifierReborn::LSI.new # or ClassifierReborn::LSI.new(auto_rebuild: false)
strings = [
["n/a OSC Retreat.", :missing],
["LOOKING FOR SPEAKER", :missing],
["Need speaker", :missing],
["Elizabeth Solr Search", :present],
["Matt Datastax", :present],
["Scott Roll your own user analytics with Zeppelin", :present],
["Brandon Rose Spark and Elasticsearch", :present],