@bbzzzz
bbzzzz / gist:47eca1d7f0a6190f9967
Last active November 10, 2015 20:59
Data Analysis HW05 Bohan Zhang
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1 High Dimensional Model"
]
},
{
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This code is for the ZestFinance modeling team interview homework assignment. ML algorithms including Regularized Logistic Regression, Elastic Net, Random Forest, and Gradient Boosting (xgboost) are applied."
]
},
{
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
@bbzzzz
bbzzzz / download_report
Created April 16, 2015 23:59
Webscrape all XBRL files given stock ticker
import urllib2  # Python 2 only
from bs4 import BeautifulSoup

def get_list(ticker):
    # Pieces of the EDGAR company-browse URL; the ticker presumably follows
    # "CIK=" and a paging offset follows "start=".
    base_url_part1 = "http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK="
    base_url_part2 = "&type=&dateb=&owner=&start="
    base_url_part3 = "&count=100&output=xml"
    href = []
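The preview is cut off before the URL pieces are joined. A minimal sketch of how they might be combined, assuming the ticker fills the `CIK=` parameter and a numeric offset fills `start=`. `build_edgar_url` is a hypothetical helper written for Python 3, not a function from the gist, and it makes no network request:

```python
# Hypothetical helper: the original gist is truncated before this step,
# so the way the parts are joined here is an assumption.
BASE_URL_PART1 = "http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK="
BASE_URL_PART2 = "&type=&dateb=&owner=&start="
BASE_URL_PART3 = "&count=100&output=xml"


def build_edgar_url(ticker, start=0):
    """Join the three URL pieces around the ticker and the paging offset."""
    return BASE_URL_PART1 + ticker + BASE_URL_PART2 + str(start) + BASE_URL_PART3


if __name__ == "__main__":
    # Second page of results for an example ticker.
    print(build_edgar_url("AAPL", start=100))
```

Since `count=100` is fixed in the URL, stepping `start` by 100 would presumably walk through successive pages of filings.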
@bbzzzz
bbzzzz / 1.0 README
Last active May 4, 2021 21:00
IMDB Sentiment Analysis using Naive Bayes
Sentiment Analysis using Naive Bayes
====================================
* Naive Bayes
* Add-1 smoothing
* 10-fold cross validation
* regular expression for detecting negation words
Besides the standard method, the code also implements:
* Boolean Naive Bayes
* Naive Bayes with stop words
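The README names the techniques but the preview does not show the code. As an illustration only, here is a minimal sketch of multinomial Naive Bayes with add-1 smoothing and an optional Boolean variant. The function names `train_nb` and `predict_nb` and the toy reviews are assumptions, not taken from the gist; negation handling and cross validation are left out:

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, boolean=False):
    """docs: list of (tokens, label). Returns (log priors, per-class counts, vocabulary)."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        if boolean:
            tokens = set(tokens)  # Boolean variant: count each word once per document
        class_docs[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    total = sum(class_docs.values())
    priors = {c: math.log(n / total) for c, n in class_docs.items()}
    return priors, word_counts, vocab


def predict_nb(model, tokens, boolean=False):
    """Return the label with the highest smoothed log-probability."""
    priors, word_counts, vocab = model
    if boolean:
        tokens = set(tokens)
    best_label, best_score = None, float("-inf")
    for label, prior in priors.items():
        # Add-1 smoothing: every vocabulary word gets one extra count per class.
        denom = sum(word_counts[label].values()) + len(vocab)
        score = prior
        for tok in tokens:
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    toy = [
        ("good great fun".split(), "pos"),
        ("great acting".split(), "pos"),
        ("bad boring".split(), "neg"),
        ("boring plot bad".split(), "neg"),
    ]
    model = train_nb(toy)
    print(predict_nb(model, "great fun".split()))
```

Working in log space avoids underflow when many small probabilities are multiplied, which matters for full-length reviews.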
@bbzzzz
bbzzzz / README
Last active August 29, 2015 14:19
IMDB review Sentiment Analysis based on Support Vector Machine
Sentiment Analysis using sklearn
=================================
* sklearn LinearSVC
* 10-fold cross validation
* accuracy 88.45%
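The gist relies on sklearn for both the classifier and the evaluation. To show what a 10-fold split does, here is a plain-Python sketch that partitions sample indices into folds. `k_fold_indices` is a hypothetical name, the split is unshuffled for simplicity, and the sketch does not attempt to reproduce the reported accuracy:

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs that partition range(n_samples) into k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # The first (n_samples % k) folds get one extra sample so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


if __name__ == "__main__":
    for train_idx, test_idx in k_fold_indices(25, k=10):
        print(len(train_idx), len(test_idx))
```

Each sample lands in exactly one test fold, so averaging the per-fold accuracy uses every review for evaluation once. In practice the indices would usually be shuffled first so that each fold mixes positive and negative reviews.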
@bbzzzz
bbzzzz / README
Last active August 29, 2015 14:18 — forked from larsmans/README
Sentiment analysis experiment using scikit-learn
================================================
The script sentiment.py reproduces the sentiment analysis approach from Pang,
Lee and Vaithyanathan (2002), who tried to classify movie reviews as positive
or negative, with three differences:
* tf-idf weighting is applied to terms
* the three-fold cross validation split is different
* regularization is tuned by cross validation
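For readers unfamiliar with the first difference, tf-idf scales a term's count in a document by how rare the term is across the collection. A minimal sketch of the textbook form, tf × log(N / df), follows. The `tf_idf` function is illustrative and not taken from sentiment.py; scikit-learn's own vectorizer applies a smoothed, normalized variant by default:

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in docs:
        tf = Counter(tokens)
        weights.append({
            term: count * math.log(n_docs / doc_freq[term])
            for term, count in tf.items()
        })
    return weights


if __name__ == "__main__":
    reviews = [["good", "movie"], ["bad", "movie"]]
    for doc_weights in tf_idf(reviews):
        print(doc_weights)
```

A term that occurs in every document, such as "movie" in the toy reviews, gets weight zero, while terms that separate the documents keep a positive weight.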
@bbzzzz
bbzzzz / Word Similarity
Last active August 29, 2015 14:16
Cosine Similarity, for NLP class presentation - ipython notebook version: http://nbviewer.ipython.org/gist/bozhang0504/5f67575d1397416b0f3d
import nltk
from nltk.corpus import wordnet as wn
### Synsets and lemmas
# An arbitrary word, e.g. 'dog', may have several senses; we can look up its synsets.
wn.synsets('dog')
# Once you have a synset, there are functions to retrieve information about it;
# we will start with "lemma_names", "lemmas", "definitions" and "examples".
# For the first synset, 'dog.n.01' (the first noun sense of 'dog'), we first find all of its words/lemma names.
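The gist description mentions cosine similarity, but the preview stops before that part. As a hedged illustration, cosine similarity between two sparse word vectors can be computed as below. The `cosine_similarity` helper and the toy context counts are assumptions made for this sketch and do not come from the gist or from WordNet:

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors given as {key: value} mappings."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for empty vectors
    return dot / (norm_a * norm_b)


# Toy context-count vectors (made-up numbers, for illustration only).
dog = Counter({"bark": 3, "pet": 2, "tail": 1})
cat = Counter({"meow": 3, "pet": 2, "tail": 1})

if __name__ == "__main__":
    print(cosine_similarity(dog, cat))
```

Because the measure divides by both vector lengths, it compares the direction of the two count vectors rather than their magnitude, so a frequent word and a rare word with similar contexts can still score as similar.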
@bbzzzz
bbzzzz / WordNet Interface
Created March 2, 2015 16:21
Natural Language Processing - Word Meaning and Word Similarity
{
"metadata": {
"name": "Wordnet Interface"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{