John Miedema johnmiedema

@johnmiedema
johnmiedema / lila_0_2.php
Last active August 6, 2017 16:17
Lila 0.2 - Word Count Analysis of WordPress Posts - PHP & Google Charts
<html>
<head>
<title>Lila Prototype 0.2 - johnmiedema.com</title>
</head>
<body>
<?php
//References
//https://gist.github.com/chasewoodford/51e185ed1d49862bf988
//https://developers.google.com/chart/interactive/docs/gallery/linechart
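The preview above stops before the counting logic, so the following is only a minimal sketch of the word-count step, not the gist's actual code. It is written in Java (the language of most gists listed here) rather than PHP, and it assumes a post arrives as an HTML string. The class name `WordCountSketch`, the method `countWords`, and the tag-stripping regex are all hypothetical.

```java
// Hypothetical sketch: count the words in one post body.
// Assumes the post is a String that may contain simple HTML tags.
class WordCountSketch {

    // Replace anything that looks like an HTML tag with a space,
    // then count the whitespace-separated tokens that remain.
    static int countWords(String postContent) {
        String text = postContent.replaceAll("<[^>]*>", " ").trim();
        if (text.isEmpty()) {
            return 0; // splitting an empty string would yield one empty token
        }
        return text.split("\\s+").length;
    }

    public static void main(String[] args) {
        // Sample input invented for illustration only.
        String post = "<p>Call me Ishmael.</p>";
        System.out.println(countWords(post));
    }
}
```

Per-post counts produced this way could then be handed to a charting library as a series; the regex is a rough approximation and will miscount posts with embedded scripts or unusual markup.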
johnmiedema / EvernoteRandom
Last active June 9, 2021 03:59
Redirect to a random note link from Evernote
<?php
/*
--------------------------------------------------------------------------
EVERNOTE RANDOM
Use with IFTTT.com to get a daily random evernote note sent to your email
When link is opened, view it in your Evernote app
Edit a note daily to keep up on them all
--------------------------------------------------------------------------
Requirements:
johnmiedema / demoExtractSolrQueryResponseData
Last active August 29, 2015 14:03
Extract SolrQuery Response Data
//Extract SolrQuery response data
//johnmiedema.com
package demoCrawlIndexQuery;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
johnmiedema / TestCustomOpenNlpModel
Last active March 11, 2019 22:23
Test a custom OpenNLP model
//Test a custom OpenNLP model for NER of book titles
//See https://gist.github.com/johnmiedema/4020deea875ce306971e
package demoModelTrainer;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import opennlp.tools.namefind.NameFinderME;
johnmiedema / OpenNlpModelNERBookTItles
Last active June 20, 2021 15:02
Create an OpenNLP model for Named Entity Recognition of Book Titles
//Create an OpenNLP model for Named Entity Recognition of Book Titles
//See tester at https://gist.github.com/johnmiedema/7e7330e1b9263267bdfc
package demoModelTrainer;
import java.io.File;
import java.io.FileOutputStream;
import java.util.Collections;
import opennlp.tools.namefind.NameFinderME;
johnmiedema / RecognizeNamesOpenNLPNameFinder
Last active August 9, 2018 08:42
Recognize names using OpenNLP NameFinder
package demoNameFind;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.tokenize.Tokenizer;
import opennlp.tools.tokenize.TokenizerME;
johnmiedema / extractNounPhrasesOpenNLP
Last active August 15, 2019 20:11
Extract noun phrases from a single sentence using OpenNLP
package demoParseNounPhrases;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.HashSet;
import java.util.Set;
import opennlp.tools.cmdline.parser.ParserTool;
import opennlp.tools.parser.Parse;
johnmiedema / ApacheTikaSolrIndexSearch
Last active April 30, 2022 05:40
Use Apache Tika and Solr to index and search documents
//Use Apache Tika and Solr to crawl, index and search documents
//John Miedema http://johnmiedema.com
//-----------------------------------------------------------
//referenced libraries:
//Apache Tika 1.5
//Apache Solr 4.7.2
//Apache HttpClient 4.3.3 reqd to connect to Solr server
//Noggit json parser reqd for Solr commands
//-----------------------------------------------------------
//after Solr is downloaded, start it using the following commands
johnmiedema / ApacheTikaMetadataConvertPlainText
Last active August 29, 2015 13:59
Use Apache Tika to extract metadata and convert different content types into plain text
//Use Apache Tika to extract metadata and convert different content types into plain text
//'Whatson' blog series at johnmiedema.com
//http://johnmiedema.com/?tag=whatson
//source documents include different content types
processDocument("resources/mobydick.htm");
processDocument("resources/robinsoncrusoe.txt");
processDocument("resources/callofthewild.pdf");
private static void processDocument(String pathfilename) {
johnmiedema / tokenizeUsingOpenNLP
Last active August 15, 2019 20:12
Tokenize content using OpenNLP
//Tokenizing content using OpenNLP
//'Whatson' blog series at johnmiedema.com
//http://johnmiedema.com/?tag=whatson
//select tokenizer model, in this case a pre-trained model from OpenNLP
//custom models can be built for unique whitespace handling requirements
InputStream modelIn = new FileInputStream("en-token.bin");
try {
//load the model