@smartinsightsfromdata
smartinsightsfromdata / TDA_resources.md
Created October 8, 2017 18:48 — forked from calstad/TDA_resources.md
List of resources for TDA

Quick List of Resources for Topological Data Analysis with Emphasis on Machine Learning

This is a quick list of resources on TDA that I put together for @rickasaurus after he asked on Twitter for links to papers, books, etc. It is by no means an exhaustive list.

Survey Papers

Both Carlsson's and Ghrist's survey papers offer a very good introduction to the subject.

Other Papers and Web Resources

@smartinsightsfromdata
smartinsightsfromdata / attention_w2v-2017-02-25-1321.py
Created September 5, 2017 22:26
Legal RTE - Attention model w/ w2v*tfidf
# coding: utf-8
import sys
import re
import datetime
import MeCab
import random
import numpy as np
# random.seed(1234)
# np.random.seed(1234)
@smartinsightsfromdata
smartinsightsfromdata / gensim2projector_tf.py
Created September 3, 2017 14:55 — forked from lampts/gensim2projector_tf.py
how to convert/port gensim word2vec to tensorflow projector board.
# requires tensorflow 0.12
# requires gensim 0.13.3+ for the new API model.wv.index2word; otherwise use model.index2word
from gensim.models import Word2Vec
import tensorflow as tf
from tensorflow.contrib.tensorboard.plugins import projector
# load your gensim model
model = Word2Vec.load("YOUR-MODEL")
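The preview above stops after loading the model. As a rough sketch of a different, simpler route than the TensorFlow checkpoint approach this gist uses, word vectors can also be written to two tab-separated files (one row of numbers per vector, one label per row), which is the plain-text format the Embedding Projector can load. The function name `write_projector_tsv`, the file names, and the toy vectors are hypothetical and are not taken from the gist.

```python
import os
import tempfile


def write_projector_tsv(vectors, out_dir):
    """Write vectors.tsv and metadata.tsv for a {word: vector} mapping.

    Hypothetical helper: 'vectors' is any mapping from a word to a sequence
    of floats. With a gensim model it could presumably be built from
    model.wv.index2word (the API named in the gist), but that is untested here.
    """
    vec_path = os.path.join(out_dir, "vectors.tsv")
    meta_path = os.path.join(out_dir, "metadata.tsv")
    with open(vec_path, "w", encoding="utf-8", newline="") as vec_file, \
            open(meta_path, "w", encoding="utf-8", newline="") as meta_file:
        for word, vec in vectors.items():
            # One embedding per line, components separated by tabs.
            vec_file.write("\t".join(f"{x:.6f}" for x in vec) + "\n")
            # Single-column metadata: one label per line, no header row.
            meta_file.write(word + "\n")
    return vec_path, meta_path


# Toy data standing in for real word2vec vectors.
demo_dir = tempfile.mkdtemp()
demo_vec_path, demo_meta_path = write_projector_tsv(
    {"cat": [0.1, 0.2], "dog": [0.3, 0.4]}, demo_dir
)
```

The two files stay aligned row by row, so the n-th label in the metadata file names the n-th vector.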
@smartinsightsfromdata
smartinsightsfromdata / mice_imp.R
Created April 14, 2017 14:59 — forked from mick001/mice_imp.R
Imputing missing data with R; MICE package: Full article at http://datascienceplus.com/imputing-missing-data-with-r-mice-package/
# Using airquality dataset
data <- airquality
data[4:10,3] <- rep(NA,7)
data[1:5,4] <- NA
# Removing categorical variables
data <- data[-c(5,6)]  # drop columns from 'data' so the NAs inserted above are kept
summary(data)
#-------------------------------------------------------------------------------
@smartinsightsfromdata
smartinsightsfromdata / dplyr-postgres-sessionizing.R
Created October 4, 2016 13:17 — forked from randyzwitch/dplyr-postgres-sessionizing.R
Sessionizing Log File Data Using dplyr
###Sessionization using dplyr
library(dplyr)
#Open a localhost connection to Postgres
#Use table 'single_col_timestamp'
#group by uid and sort by timestamp for window function
#Do minutes calculation, working around missing support for extract(epoch from timestamp)
#Calculate event boundary and unique id via cumulative sum window function
sessions <-
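The preview is cut off at this point. The steps listed in its comments (group by user, sort by timestamp, measure the gap to the previous event, flag a boundary, and number sessions with a running sum) can be sketched in plain Python as follows. This is an illustration of the general technique, not the gist's dplyr code; the function name `sessionize`, the 30-minute timeout, and the sample events are assumptions.

```python
from datetime import datetime, timedelta
from itertools import groupby
from operator import itemgetter


def sessionize(events, timeout=timedelta(minutes=30)):
    """Assign a per-user session number to (uid, timestamp) events.

    The 30-minute default timeout is an assumption for illustration.
    """
    # Sort by uid, then timestamp, mirroring "group by uid, sort by timestamp".
    ordered = sorted(events, key=itemgetter(0, 1))
    result = []
    for uid, group in groupby(ordered, key=itemgetter(0)):
        session_id = 0
        previous = None
        for _, ts in group:
            # A boundary is the user's first event or a gap above the timeout.
            # Incrementing here is the running (cumulative) sum of boundaries.
            if previous is None or ts - previous > timeout:
                session_id += 1
            result.append((uid, ts, session_id))
            previous = ts
    return result


# Hypothetical sample log: user "a" pauses for 50 minutes before the third event.
events = [
    ("a", datetime(2016, 10, 4, 9, 0)),
    ("a", datetime(2016, 10, 4, 9, 10)),
    ("a", datetime(2016, 10, 4, 10, 0)),
    ("b", datetime(2016, 10, 4, 9, 5)),
]
sessions = sessionize(events)
```

Each output row carries the user, the timestamp, and a session number that restarts at 1 for every user.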
@smartinsightsfromdata
smartinsightsfromdata / Sankey.R
Created April 18, 2016 12:41 — forked from aaronberdanier/Sankey.R
Program for creating sankey diagrams in R
SankeyR <- function(inputs, losses, unit, labels, format="plot"){
########################
# SankeyR version 1.01 (updated August 10, 2010)
# is a function for creating Sankey Diagrams in R.
# See http://www.sankey-diagrams.com for excellent examples of Sankey Diagrams.
#
# OPTIONS:
# 'inputs' is a vector of input values
# 'losses' is a vector of loss values
# 'unit' is a string of the unit
@smartinsightsfromdata
smartinsightsfromdata / create_doc2vec.py
Created March 28, 2016 20:10 — forked from vierja/create_doc2vec.py
Create Doc2Vec using Elasticsearch (while processing the data in parallel)
from lxml import etree
from elasticsearch.helpers import scan
from elasticsearch import Elasticsearch
from multiprocessing import Pool
import bz2
import gensim
import itertools
import logging
import nltk
import os
@smartinsightsfromdata
smartinsightsfromdata / nltk-notebook.ipynb
Created February 28, 2016 16:14 — forked from MHenderson/nltk-notebook.ipynb
NLTK IPython Notebook
@smartinsightsfromdata
smartinsightsfromdata / readme.md
Created February 13, 2016 16:11 — forked from baraldilorenzo/readme.md
VGG-19 pre-trained model for Keras

## VGG19 model for Keras

This is the Keras model of the 19-layer network used by the VGG team in the ILSVRC-2014 competition.

It has been obtained by directly converting the Caffe model provided by the authors.

Details about the network architecture can be found in the following arXiv paper:

Very Deep Convolutional Networks for Large-Scale Image Recognition

K. Simonyan, A. Zisserman

@smartinsightsfromdata
smartinsightsfromdata / gensim_workflow.py
Created January 8, 2016 07:44 — forked from clemsos/gensim_workflow.py
How to calculate TF-IDF similarity matrix of a complete corpus with Gensim
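The idea named in this title can be sketched without Gensim: weight each term by its count times a log inverse document frequency, normalize each document vector to unit length, and take dot products to get pairwise cosine similarities. This is a minimal standard-library illustration under assumed choices (whitespace tokenization, base-2 logarithm); the function name `tfidf_similarity` and the sample documents are hypothetical, and the gist's own Gensim workflow may differ.

```python
import math
from collections import Counter


def tfidf_similarity(docs):
    """Return the pairwise cosine similarity matrix of TF-IDF vectors."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Weight = term count * log2(N / document frequency).
        vec = {t: c * math.log2(n_docs / doc_freq[t]) for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        # Unit-length vectors make the dot product equal to the cosine.
        vectors.append({t: w / norm for t, w in vec.items()} if norm else {})

    return [
        [sum(w * b.get(t, 0.0) for t, w in a.items()) for b in vectors]
        for a in vectors
    ]


# Hypothetical three-document corpus.
docs = ["cat sat mat", "cat sat hat", "dog barked loudly"]
sim = tfidf_similarity(docs)
```

Documents that share no terms score 0, and each document scores 1 against itself (up to floating-point rounding).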
#!/usr/bin/env python
# -*- coding: utf-8 -*-
'''
This script shows the basic workflow for computing a TF-IDF similarity matrix with Gensim
OUTPUT :