### IPAM - UCLA
Aurelia Moser, @auremoser, aurelia@mozillafoundation.org
May 30th, 2016
Instagram @chazhutton
Find this document here:
The Layton Court Mystery
by Anthony Berkeley

Contents

I. Eight o’Clock in the Morning
II. An Interrupted Breakfast
```latex
\documentclass[12pt, letterpaper]{article}
\usepackage[margin=1in]{geometry}
\setlength{\parindent}{0em}
\setlength{\parskip}{0.5em}
\newif\ifdraft
\drafttrue % or \draftfalse
\begin{document}
% Assumed continuation: the draft switch gates draft-only material.
\ifdraft
  \textbf{[Draft]}
\fi
\end{document}
```
What I've been working on for the past few days is preparation for attempting a topic model using the more established LDA instead of NMF, to see how well the two compare -- with the understanding that, since there is rarely a one-to-one match between topics even within a single method, there will be no such match across them.
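To make the shape of that comparison concrete, here is a minimal sketch using scikit-learn for both models; the two-line corpus, topic count, and vectorizer settings are stand-ins, not our actual pipeline:

```python
from sklearn.decomposition import LatentDirichletAllocation, NMF
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = ["the quick brown fox jumps over the lazy dog",
        "a slow green turtle naps under the warm sun"]   # stand-in corpus

def top_words(model, feature_names, n=5):
    """The n highest-weighted words in each of a model's topics."""
    return [[feature_names[i] for i in comp.argsort()[::-1][:n]]
            for comp in model.components_]

count_vec = CountVectorizer(stop_words='english')    # LDA works on raw counts
tfidf_vec = TfidfVectorizer(stop_words='english')    # NMF is usually fed tf-idf

lda = LatentDirichletAllocation(n_components=2).fit(count_vec.fit_transform(docs))
nmf = NMF(n_components=2).fit(tfidf_vec.fit_transform(docs))

# Topics won't line up one-to-one, so just eyeball the two lists side by side.
print(top_words(lda, count_vec.get_feature_names_out()))
print(top_words(nmf, tfidf_vec.get_feature_names_out()))
```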
Because LDA does not filter out common words on its own, the way the NMF method does, you have to start with a stoplist. I know we can begin with Blei's and a few other established lists, but I would also like to be able to compare those against our own results. My first thought was to build a dictionary of words and their frequencies within the corpus. For convenience's sake, I am using the NLTK.
Just as a record of what I've done, here's the usual code for loading the talks from the CSV with everything in it:

```python
import pandas
import re
```
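Sketched out, the load and cleanup look something like this; `talks.csv` and the `text` column are stand-in names, and the goal is the single `all_words` string that the tokenizing step below expects:

```python
import re
import pandas

# Stand-in names: 'talks.csv' and its 'text' column are assumptions.
talks = pandas.read_csv('talks.csv')

# Lowercase each talk, strip non-letter characters, and join everything
# into one long string for the tokenizer.
all_words = ' '.join(re.sub(r'[^a-z\s]', ' ', str(t).lower())
                     for t in talks['text'])
```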
We still need to identify which talks have floats for their values and determine what impact, if any, this has on the project.
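A quick way to surface those rows (a sketch, reusing the stand-in names from above):

```python
import pandas

talks = pandas.read_csv('talks.csv')   # stand-in filename, as above

# Cells pandas read as floats rather than strings are usually NaN
# from empty rows in the CSV.
float_rows = talks[talks['text'].apply(lambda v: isinstance(v, float))]
print(len(float_rows), "talks have float values in the text column")
```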
With `all_words` in hand, the frequency dictionary is straightforward:

```python
import nltk

tt_tokens = nltk.word_tokenize(all_words)

# Count how often each token occurs in the corpus
tt_freq = {}
for word in tt_tokens:
    tt_freq[word] = tt_freq.get(word, 0) + 1
```
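From there, one way to turn the counts into a corpus-specific stoplist to set against Blei's and the other established lists (a sketch; the 100-word cutoff is arbitrary):

```python
from nltk.corpus import stopwords   # requires nltk.download('stopwords')

# Take the most frequent words as stoplist candidates; 100 is an
# arbitrary stand-in for whatever cutoff we settle on.
candidates = sorted(tt_freq, key=tt_freq.get, reverse=True)[:100]

# Words frequent in our corpus but absent from an established list
# are the interesting ones to inspect by hand.
print(set(candidates) - set(stopwords.words('english')))
```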
```python
#! /usr/bin/env python
'''
sentiments.py compares the outputs of the sentiment-analysis modules
listed below. Functionality to be added: normalization and smoothing.
(I haven't implemented the NLTK solution because I don't have classified texts.)
'''

# Imports
import matplotlib.pyplot as plt
```
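For the normalization and smoothing still to be added, a minimal sketch of one approach: z-score each module's output so the scales are comparable, then take a simple moving average. The window size and the `scores` input are stand-ins.

```python
import numpy as np

def normalize(scores):
    """Z-score a sequence of sentiment values so modules share a scale."""
    scores = np.asarray(scores, dtype=float)
    return (scores - scores.mean()) / scores.std()

def smooth(scores, window=10):
    """Simple moving average; the window size is an arbitrary stand-in."""
    return np.convolve(scores, np.ones(window) / window, mode='valid')

# e.g. plt.plot(smooth(normalize(module_scores))) for each module's output,
# where module_scores is a hypothetical list of per-sentence values
```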
""" Example using GenSim's LDA and sklearn. """ | |
import numpy as np | |
from gensim import matutils | |
from gensim.models.ldamodel import LdaModel | |
from sklearn import linear_model | |
from sklearn.datasets import fetch_20newsgroups | |
from sklearn.feature_extraction.text import CountVectorizer |
*(Output: a long comma-separated vector of sentence-level sentiment scores, ranging roughly from -3.5 to 3.5.)*
# Syuzhet of a Novel

```R
library(syuzhet)
library(readr)

# Load the file and split it into sentences
pog_v <- get_sentences(read_file("../texts/banks/Player_of_Games.txt"))

# Score each sentence (assumed continuation; "syuzhet" is the default method)
pog_sentiment <- get_sentiment(pog_v, method = "syuzhet")
```
# Syuzhet Outputs

## Sentiment by Sentence for 4 Small Texts

```R
# Load libraries
library(syuzhet)
library(readr)
```