Eric Pugh epugh

@tteofili
tteofili / NNFreqScoringSimilarity.java
Created January 23, 2018 13:47
Using index, term, doc frequencies to teach a neural network to rank docs
package com.github.tteofili.looseen.dl4j;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.similarities.BasicStats;
import org.apache.lucene.search.similarities.SimilarityBase;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.nd4j.linalg.api.buffer.FloatBuffer;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;
@epugh
epugh / zeppelin_solr_spark_oh_my_meetup_notes.md
Last active October 9, 2018 03:30
Steps for following along with Eric's Zeppelin talk.

The steps below all assume you have installed Docker. I used the Kitematic tool for OS X, and it worked great. Everything is mapped to your "localhost" domain name.

  1. Let's Set up Zeppelin

    I am using this Docker image https://github.com/dylanmei/docker-zeppelin to fire up Zeppelin and Spark. Note that it is slow to start because there are so many processes (Spark Master, Spark Worker, Zeppelin) to launch. The image is now up to Zeppelin 0.7.0.

    docker run -d --name zeppelin -p 8080:8080 dylanmei/zeppelin
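
Because the container takes a while to start, it can help to poll the mapped port before opening the notebook. The following is a minimal sketch, not part of the original notes: it assumes Zeppelin answers plain HTTP on http://localhost:8080 (the port mapped by the `docker run` command above), and the helper names `is_up` and `wait_until_up` are hypothetical.

```python
import time
import urllib.error
import urllib.request


def is_up(url, timeout=5.0):
    """Return True only if an HTTP GET to `url` succeeds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timed out, or an HTTP error status:
        # treat all of these as "not ready yet".
        return False


def wait_until_up(url, attempts=60, delay=5.0, probe=is_up, sleep=time.sleep):
    """Poll `url` until `probe` succeeds.

    Returns the number of attempts used, or None if every attempt failed.
    `probe` and `sleep` are injectable so the loop can be exercised offline.
    """
    for attempt in range(1, attempts + 1):
        if probe(url):
            return attempt
        if attempt < attempts:
            sleep(delay)
    return None
```

Calling `wait_until_up("http://localhost:8080")` blocks until the page loads or the attempts run out; the default of 60 attempts, 5 seconds apart, is an arbitrary choice, not a measured startup time.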
    
#!/usr/bin/env python -x
import pysolr
import sys
from nltk.corpus import wordnet as wn
class Indexer:
    """
@cb372
cb372 / gist:1d7b1abbbf0c643f2903
Last active June 29, 2018 18:40
Using Elasticsearch as a Spark data source

Install the essentials.

$ brew update && brew install elasticsearch && brew install apache-spark

Start ES.

$ elasticsearch
-verbose:gc \
-XX:+PrintGCDetails \
-XX:+PrintGCDateStamps \
-XX:+PrintGCTimeStamps \
-XX:+PrintHeapAtGC \
-XX:+PrintTenuringDistribution \
-XX:+PrintGCApplicationStoppedTime \
-XX:+PrintGCApplicationConcurrentTime \
-XX:+UseGCLogFileRotation \
-XX:NumberOfGCLogFiles=5 \
@ngauthier
ngauthier / scraping.rb
Last active March 1, 2017 13:29
Scraping the Web with Ruby Code
#!/usr/bin/env ruby
# From: http://ngauthier.com/2014/06/scraping-the-web-with-ruby.html
require 'capybara'
require 'capybara/poltergeist'
require 'csv'
require 'gdbm'
class NickBot
  include Capybara::DSL
@epugh
epugh / gist:6691303
Last active December 23, 2015 20:39
InsecureHttpClient lets you access an HTTP server over SSL using BASIC authentication. I am using it in conjunction with SolrJ. Blog at http://www.opensourceconnections.com/2013/09/24/using-solrj-wi…l-wrapped-solr/
package com.o19s.http;
import java.io.IOException;
import java.security.cert.CertificateException;
import java.security.cert.X509Certificate;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLException;
import javax.net.ssl.SSLSession;
import javax.net.ssl.SSLSocket;