@zdepablo
zdepablo / dynet-tagger.py
Created December 22, 2017 20:21 — forked from neubig/dynet-tagger.py
A small sequence labeler in DyNet
"""
DyNet implementation of a sequence labeler (POS tagger).
This is a translation of this tagger in PyTorch: https://gist.github.com/hal3/8c170c4400576eb8d0a8bd94ab231232
Basic architecture:
- take words
- run through bidirectional GRU
- predict labels one word at a time (left to right), using a recurrent neural network "decoder"
The decoder updates hidden state based on:
- most recent word
@zdepablo
zdepablo / install_submodules.R
Last active December 5, 2017 21:00
R: how to Install submodules from git
install_submodule_git <- function(x, ...) {
  install_dir <- tempfile()
  system(paste("git clone --recursive", shQuote(x), shQuote(install_dir)))
  devtools::install(install_dir, ...)
}
install_submodule_git("https://github.com/jonkeane/mocapGrip")
@zdepablo
zdepablo / brand-sentiment.py
Created March 3, 2014 15:19
Analyze Twitter sentiment for your brand using Textalytics Media Analysis API
import smaclient
from TwitterAPI import TwitterAPI
import matplotlib.pyplot as plt
# Go to http://dev.twitter.com and create an app.
# The consumer key and secret will be generated for you after
consumer_key = "<consumer-key>"  # placeholder: replace with your own key
consumer_secret = "<consumer-secret>"  # placeholder: replace with your own secret
@zdepablo
zdepablo / gist:daf71447c82391c1b4311ffcceec2ebe
Last active June 21, 2016 09:05
Running remote debugger
# java -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=12605 Main # Name of .class program
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/pr/cloudera/parcels/CDH-5.5.2-1.cdh5.5.2.p0.4/lib/hadoop/lib/native
java -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=12611 -cp target/da_record_linkage-0.0.1-SNAPSHOT-jar-with-dependencies.jar da_record_linkage.TestSnappy
netstat -plten | grep LISTEN | grep ':12' # Check for listening ports starting with 12 (e.g. the debugger ports above)
@zdepablo
zdepablo / split_strat_scale.r
Last active August 29, 2015 14:26 — forked from multidis/split_strat_scale.r
Stratified sampling: training / test data split preserving class distribution (caret functions) and scaling (standardize) the data. Stratified folds for CV.
library(caret)
## select training indices preserving class distribution
in.train <- createDataPartition(yclass, p=0.8, list=FALSE)
summary(factor(yclass))
ytra <- yclass[in.train]; summary(factor(ytra))
ytst <- yclass[-in.train]; summary(factor(ytst))
## standardize features: the scaling parameters learned on the training part are reused for the test part
Xtra <- scale(X[in.train,])
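For comparison, the same two steps (a class-preserving split, then standardizing with parameters taken from the training part only) can be sketched in plain Python. This is only an illustration and not part of the gist: `stratified_split` and `standardize` are hypothetical helper names, the toy `labels` and `X` are made up, and the per-class rounding is a simplification that need not match what caret's `createDataPartition` does internally.

```python
import random
from collections import defaultdict
from statistics import mean, stdev


def stratified_split(labels, p=0.8, seed=0):
    """Return (train_idx, test_idx); each class contributes about a share p to train."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        cut = round(p * len(idxs))  # number of training items for this class
        train_idx.extend(idxs[:cut])
        test_idx.extend(idxs[cut:])
    return sorted(train_idx), sorted(test_idx)


def standardize(train_rows, test_rows):
    """Center and scale both parts using column statistics of the training rows only."""
    cols = list(zip(*train_rows))
    mus = [mean(c) for c in cols]
    # Sample standard deviation; fall back to 1.0 for constant columns.
    sds = [stdev(c) or 1.0 for c in cols]

    def scale(rows):
        return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]

    return scale(train_rows), scale(test_rows)


# Toy data (made up): 10 items of class "a", 5 of class "b", two features each.
labels = ["a"] * 10 + ["b"] * 5
X = [[float(i), float(i % 3)] for i in range(15)]

train_idx, test_idx = stratified_split(labels, p=0.8)
X_train, X_test = standardize([X[i] for i in train_idx], [X[i] for i in test_idx])
```

The test rows are scaled with the training means and standard deviations, so no information from the test part leaks into the preprocessing.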
@zdepablo
zdepablo / hive-receipts
Last active August 29, 2015 14:24
Hive recipes
# Overwrite a non-partitioned table with its own contents
CREATE table xx_COPY LIKE xx;
INSERT OVERWRITE TABLE xx
SELECT * FROM xx;
# Overwrite a partitioned table with its own contents
CREATE table xx_COPY LIKE xx;
SHOW PARTITIONS ABC;
@zdepablo
zdepablo / hadoop-fs-receipts
Last active August 29, 2015 14:24
Quick recipes for the Hadoop filesystem
# Reference: http://hadoop.apache.org/docs/r2.7.0/hadoop-project-dist/hadoop-common/FileSystemShell.html
# Show disk usage in human format
hadoop fs -du -s -h /user/hive/warehouse/da_cdepablo*
# Show permissions
hadoop fs -getfacl /user/hive/warehouse/da_cdepablo*
# Change permissions
hadoop fs -setfacl -R -m other::rwx /user/hive/warehouse/da_cdepablo
@zdepablo
zdepablo / gist:3587a6755b080b85136c
Last active August 29, 2015 14:13
textalytics-queries per use
# Number of active users per service, with a cutoff
SELECT `service`, COUNT(*) num_users
FROM
(
SELECT `service`, `hash_key`, COUNT(*) num_requests
FROM `log`
WHERE `date_operation` > '2014-12-01'
GROUP BY `service`, `hash_key`
ORDER BY num_requests DESC
@zdepablo
zdepablo / 0_reuse_code.js
Last active August 29, 2015 14:13
Here are some things you can do with Gists in GistBox.
// Use Gists to store code you would like to remember later on
console.log(window); // log the "window" object to the console
@zdepablo
zdepablo / extractranks.py
Last active August 29, 2015 14:13
Extract UEFA rankings for football team ranks from an HTML table
#!/usr/bin/python
# -*- coding: utf-8 -*-
from lxml import html, etree
import requests
import unicodecsv
def group(iterator, count):
    itr = iter(iterator)
    while True: