
@johnlaudun
johnlaudun / lcm.txt
Created February 6, 2024 15:46
"The Layton Court Mystery" by Anthony Berkeley (1925)
The Layton Court Mystery
by Anthony Berkeley
Contents
I. Eight o’Clock in the Morning
II. An Interrupted Breakfast
@johnlaudun
johnlaudun / sentiments.py
Last active March 11, 2019 22:22
Python script to compare Sentiment Analyses available in Python
#! /usr/bin/env python
'''
sentiments.py compares the outputs of the sentiment-analysis modules listed below.
Functionality to be added: normalization and smoothing.
(I haven't implemented the NLTK solution because I don't have classified texts.)
'''
# Imports
import matplotlib.pyplot as plt
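The normalization flagged above as future functionality could be sketched as simple min-max scaling, so that scores from different modules land on a shared 0-to-1 range. This is my own sketch, not part of the original script; the function name is invented for illustration:

```python
def normalize(scores):
    """Min-max scale a list of sentiment scores onto [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # A flat series carries no shape to compare; map everything to 0.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]
```

With the outputs of each module rescaled this way, their curves can be plotted on the same axes for comparison.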
\documentclass[12pt, letter]{article}
\usepackage[margin=1in]{geometry}
\setlength{\parindent}{0em}
\setlength{\parskip}{0.5em}
\newif\ifdraft
\drafttrue % or \draftfalse
\begin{document}
What I've been working on for the past few days is in preparation for attempting a topic model using the more established LDA instead of the NMF, to see how well they compare -- with the understanding that, since there is rarely a one-to-one matchup within either method, there will be no such match across them.
Because LDA does not filter out common words on its own, the way the NMF method does, you have to start with a stoplist. I know we can begin with Blei's and a few other established lists, but I would also like to be able to compare those against our own results. My first thought was to build a dictionary of words and their frequency within the corpus. For convenience's sake, I am using NLTK.
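The comparison between a corpus-derived stoplist and an established one could be sketched like this. The toy corpus, the frequency threshold, and the tiny "established" set are all invented stand-ins (a real run would use the talks and a list like Blei's); I use `collections.Counter` here rather than NLTK so the sketch is self-contained:

```python
from collections import Counter

# Toy corpus standing in for the talks; real code would load them from the CSV.
docs = [
    "the model learns topics from the corpus",
    "the corpus contains many common words",
    "topic models need a stoplist of common words",
]

# Count word frequencies across the whole corpus.
counts = Counter(w for d in docs for w in d.split())

# Words above a frequency threshold become stoplist candidates.
candidates = {w for w, c in counts.items() if c >= 2}

# Compare against an established list (a tiny stand-in for Blei's).
established = {"the", "a", "of", "and"}
overlap = candidates & established
```

The interesting words are the ones in `candidates - established`: corpus-specific high-frequency terms that a generic stoplist would miss.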
Just as a record of what I've done, here's the usual code for loading the talks from the CSV with everything in it:
```python
import pandas
import re

# The CSV's path is not given here; "talks.csv" stands in as a placeholder.
talks = pandas.read_csv("talks.csv")
```
Here's the text as I wrote it in Markdown and as it sits in the WP editing pane:
We still need to identify which talks have floats for values and determine what impact, if any, it has on the project.
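One way to spot those float-valued talks could be sketched with plain Python; the sample values below are invented for illustration (the real ones would come from the CSV):

```python
# Invented sample of a values column; the real data comes from the talks CSV.
values = [100, 99.5, 42, 3.25, 7]

# Indices of entries that are floats with a genuine fractional part.
float_rows = [i for i, v in enumerate(values)
              if isinstance(v, float) and not v.is_integer()]
```

The resulting indices identify which talks would need inspection before deciding whether the floats matter for the project.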
```python
import nltk

tt_tokens = nltk.word_tokenize(all_words)  # all_words: the talks joined into one string
tt_freq = {}
for word in tt_tokens:
    tt_freq[word] = tt_freq.get(word, 0) + 1
```

In a recent post on ProfHacker I described a classroom technique I have used when teaching fiction. The roll-your-own dramatic interpretation, mixed in with some reality-television competition, is one way I have found to plunge students into immersive encounters with texts and with each other. The exercise hacks conventional classroom dynamics, but perhaps ProfHacker readers yearn for something, well, hackier. And what if I told you that the hacks I have pursued reverse engineer text mining, so that it becomes a lived process in the classroom?

As perhaps a lot of people who have tried to get undergraduates to read texts as

@johnlaudun
johnlaudun / lda.py
Created July 16, 2016 16:22 — forked from aronwc/lda.py
Example using GenSim's LDA and sklearn
""" Example using GenSim's LDA and sklearn. """
import numpy as np
from gensim import matutils
from gensim.models.ldamodel import LdaModel
from sklearn import linear_model
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
0.6,0.25,0.5,0.5,-0.85,-3.05,0.25,0,0,1,-1.3,0,0.1,0.6,-0.25,0.8,0.25,2,0,0,0.6,0.75,1.5,1,-1.2,-2.55,2.5,1.5,0.7,3.55,-1.2,1.5,1.45,0,-0.6,1.1,-0.4,0.25,0.1,1.2,0.5,-3.55,-0.6,0,-0.5,-0.35,0,0.5,-2.45,2.35,-1.5,0.75,0,0.5,0.8,-1.35,0,1.05,0,-0.75,1.85,0.25,1.25,0,-3.5,0,0,-0.2,-1,1.7,0.65,-2,0,-1.5,0.75,0.5,-3.45,0,0.5,2.4,0,-0.75,-1.5,0,-3,1,0,0.5,-0.75,-1.05,-0.75,0,-2,0,0.5,0.75,0.5,0.5,0.2,-0.5,0.25,0,-0.75,0.5,1.25,1.3,0,0.8,0,0,0,1.2,0,0.5,0,2.15,-0.75,0.5,0.1,-0.35,0,1.5,0,-0.25,-0.25,-2.1,-1.25,0.25,0,0,-0.5,-0.75,0.9,0,0,0,0.8,0.8,1.25,0.75,0.8,0.5,0,0.5,0,0.1,-0.5,-0.25,0.55,0.25,0.85,1.6,-2.3,-2.05,-0.5,1.05,0,-0.65,-2.35,-1.25,-0.6,-1.75,0,0.75,1.5,0.55,0,-1.25,-0.5,0,0,-0.7,-1.35,-0.15,0.45,0,0,0.85,2.6,0,0,-0.75,-0.25,0,2.25,0,-0.5,0.5,0,1.5,0,1.1,0,0.8,-1,-0.5,1.55,-0.25,0.25,0,0,-0.2,0,-0.5,0.5,0,-0.75,0.25,-1.25,0,-0.25,0,0,0.4,1.3,0.05,-0.5,0,0.6,0.75,-0.25,0,2.3,1.55,-0.25,0.25,0,0,1.65,0,0.5,0.75,0,-1,-0.5,-0.25,0,-0.75,0,0.6,1.5,0.25,0,0,0.8,1.25,-0.15,0,0,0,0,0.75,-0.5,0,0,1.25,1.35,2.1
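The long comma-separated line above reads like a per-sentence sentiment series, which connects to the smoothing mentioned earlier as work to be done. A minimal sketch of that smoothing, using a simple moving average (the window size is my own choice, not from the source):

```python
def moving_average(values, window=3):
    """Smooth a sentiment series with a centered moving average."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        # Clamp the window at the edges so the output has the same length.
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        chunk = values[lo:hi]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

# The first few values from the series above.
series = [0.6, 0.25, 0.5, 0.5, -0.85, -3.05]
smoothed = moving_average(series)
```

Widening the window trades local detail for a clearer overall emotional arc.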
# Syuzhet of a Novel
```R
library(syuzhet)
library(readr)

# Load the novel and split it into sentences.
pog_v <- get_sentences(read_file("../texts/banks/Player_of_Games.txt"))
```