@zdepablo
zdepablo / tag-sentiment-textalytics.py
Created March 3, 2014 15:34
Tag a tweet using the Textalytics Media Analysis API
import smaclient
license_key = '<textalytics-license-key>'  # placeholder: replace with your own license key
textalytics = smaclient.SmaClient(license_key)
doc = smaclient.Document('0', 'Italia se indigna por la negativa de #Barilla a hacer anuncios con gays')
doc.language = 'es'
doc.source = 'TWITTER'
@zdepablo
zdepablo / search-twitter.py
Created March 3, 2014 15:36
Search tweets using TwitterAPI
from TwitterAPI import TwitterAPI
# Go to http://dev.twitter.com and create an app.
# The consumer key and secret will be generated for you after
consumer_key = '<consumer-key>'
consumer_secret = '<consumer-secret>'
# After the step above, you will be redirected to your app's page.
# Create an access token under the "Your access token" section
access_token_key = '<access-token-key>'
# Credit http://stackoverflow.com/a/2514279
for branch in `git branch -r | grep -v HEAD`;do echo -e `git show --format="%ci %cr" $branch | head -n 1` \\t$branch; done | sort -r
@zdepablo
zdepablo / extractranks.py
Last active August 29, 2015 14:13
Extract UEFA football team rankings from an HTML table
#!/usr/bin/python
# -*- coding: utf-8 -*-
from lxml import html,etree
import requests
import unicodecsv
def group(iterator, count):
    itr = iter(iterator)
    while True:
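The preview of `group` is cut off at `while True:`. Judging only by its name and signature, it presumably yields the input in fixed-size chunks, which is handy for turning a flat list of table cells into rows. The following is a minimal sketch under that assumption, not the gist's actual body; the name `group_chunks`, the short final chunk, and the sample cells are all choices made for this illustration.

```python
from itertools import islice


def group_chunks(iterable, count):
    """Yield successive tuples of up to `count` items from `iterable`."""
    if count < 1:
        raise ValueError("count must be at least 1")
    itr = iter(iterable)
    while True:
        chunk = tuple(islice(itr, count))
        if not chunk:
            return  # input exhausted
        yield chunk


# Flat table cells regrouped into (rank, team, points) rows; the values are made up.
cells = ["1", "Team A", "100", "2", "Team B", "90"]
rows = list(group_chunks(cells, 3))
```

Using `islice` avoids calling `next()` inside a generator, where an escaping `StopIteration` would be turned into a `RuntimeError` on Python 3.7 and later.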
@zdepablo
zdepablo / 0_reuse_code.js
Last active August 29, 2015 14:13
Here are some things you can do with Gists in GistBox.
// Use Gists to store code you would like to remember later on
console.log(window); // log the "window" object to the console
@zdepablo
zdepablo / gist:3587a6755b080b85136c
Last active August 29, 2015 14:13
textalytics-queries per use
#Number of active users per service - with a cutoff
SELECT `service`, COUNT(*) num_users
FROM
(
SELECT `service`, `hash_key`, COUNT(*) num_requests
FROM `log`
WHERE `date_operation` > '2014-12-01'
GROUP BY `service`, `hash_key`
ORDER BY num_requests DESC
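The query preview stops inside the subquery, so the cutoff itself is not visible. One plausible way to express "active users per service with a cutoff" is a `HAVING` clause on the per-user request count; the sketch below runs that idea against an in-memory SQLite table. The table and column names come from the preview, while the `HAVING` threshold, the `min_requests` parameter, and the sample rows are assumptions made for illustration.

```python
import sqlite3

# Count users per service, keeping only users with at least `min_requests`
# requests after the given date. The HAVING cutoff is an assumption.
QUERY = """
SELECT service, COUNT(*) AS num_users
FROM (
    SELECT service, hash_key, COUNT(*) AS num_requests
    FROM log
    WHERE date_operation > '2014-12-01'
    GROUP BY service, hash_key
    HAVING COUNT(*) >= ?
) AS active
GROUP BY service
ORDER BY num_users DESC, service
"""


def active_users_per_service(conn, min_requests):
    """Return [(service, num_users), ...] for users at or above the cutoff."""
    return conn.execute(QUERY, (min_requests,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (service TEXT, hash_key TEXT, date_operation TEXT)")
sample = (
    [("sentiment", "u1", "2014-12-02")] * 3    # active user
    + [("sentiment", "u2", "2014-12-03")] * 1  # below the cutoff
    + [("sentiment", "u4", "2014-12-04")] * 2  # active user
    + [("topics", "u1", "2014-12-05")] * 2     # active user
    + [("topics", "u3", "2014-11-30")] * 5     # before the date filter
)
conn.executemany("INSERT INTO log VALUES (?, ?, ?)", sample)
result = active_users_per_service(conn, 2)
```

Comparing dates as strings only works here because they are stored in ISO `YYYY-MM-DD` form.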
@zdepablo
zdepablo / hadoop-fs-receipts
Last active August 29, 2015 14:24
Quick recipes for the Hadoop filesystem
# Reference: http://hadoop.apache.org/docs/r2.7.0/hadoop-project-dist/hadoop-common/FileSystemShell.html
# Show disk usage in human format
hadoop fs -du -s -h /user/hive/warehouse/da_cdepablo*
# Show permissions
hadoop fs -getfacl /user/hive/warehouse/da_cdepablo*
# Change permissions
hadoop fs -setfacl -R -m other::rwx /user/hive/warehouse/da_cdepablo
@zdepablo
zdepablo / hive-receipts
Last active August 29, 2015 14:24
Hive recipes
# Overwrite a non-partitioned table with its own contents
CREATE TABLE xx_COPY LIKE xx;
INSERT OVERWRITE TABLE xx
SELECT * FROM xx;
# Overwrite a partitioned table with its own contents
CREATE TABLE xx_COPY LIKE xx;
SHOW PARTITIONS ABC;
@zdepablo
zdepablo / split_strat_scale.r
Last active August 29, 2015 14:26 — forked from multidis/split_strat_scale.r
Stratified sampling: training / test data split preserving class distribution (caret functions) and scaling (standardize) the data. Stratified folds for CV.
library(caret)
## select training indices preserving class distribution
in.train <- createDataPartition(yclass, p=0.8, list=FALSE)
summary(factor(yclass))
ytra <- yclass[in.train]; summary(factor(ytra))
ytst <- yclass[-in.train]; summary(factor(ytst))
## standardize features: fit the scaling parameters on the training part and reuse them for the test part
Xtra <- scale(X[in.train,])
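For readers who do not use R, the same two steps (a class-preserving split, then scaling the test part with parameters fitted on the training part only) can be sketched in plain Python. This is an illustration using only the standard library, not a port of caret: `stratified_split`, `fit_scaler`, `apply_scaler`, and the toy data are all hypothetical names and values, and the per-class rounding is a simplification of what `createDataPartition` does.

```python
import random
from collections import defaultdict
from statistics import mean, stdev


def stratified_split(labels, train_fraction=0.8, seed=0):
    """Return (train_idx, test_idx), splitting each class in the same proportion."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    rng = random.Random(seed)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(len(indices) * train_fraction)
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)


def fit_scaler(rows):
    """Column-wise mean and sample standard deviation (as R's scale() uses)."""
    columns = list(zip(*rows))
    # Fall back to 1.0 for constant columns to avoid dividing by zero.
    return [mean(c) for c in columns], [stdev(c) or 1.0 for c in columns]


def apply_scaler(rows, means, sds):
    """Standardize rows with previously fitted means and deviations."""
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


# Toy data: 10 samples of class "a", 5 of class "b", two features each.
labels = ["a"] * 10 + ["b"] * 5
X = [[float(i), float(i % 3)] for i in range(15)]

train_idx, test_idx = stratified_split(labels)
means, sds = fit_scaler([X[i] for i in train_idx])  # fitted on training rows only
X_train = apply_scaler([X[i] for i in train_idx], means, sds)
X_test = apply_scaler([X[i] for i in test_idx], means, sds)
```

Fitting the scaler on the training rows alone keeps information about the test set from leaking into the preprocessing.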
@zdepablo
zdepablo / gist:daf71447c82391c1b4311ffcceec2ebe
Last active June 21, 2016 09:05
Running remote debugger
# java -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=12605 Main # Name of .class program
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/pr/cloudera/parcels/CDH-5.5.2-1.cdh5.5.2.p0.4/lib/hadoop/lib/native
java -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=12611 -cp target/da_record_linkage-0.0.1-SNAPSHOT-jar-with-dependencies.jar da_record_linkage.TestSnappy
netstat -plten | grep LISTEN | grep ':126' # Check whether a debug port (12605, 12611) is listening
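Where `netstat` is not available, a rough equivalent of the port check can be done from Python by attempting a TCP connection. This is a sketch, not part of the gist: `is_port_listening` is a hypothetical helper, and unlike `netstat` it really opens a connection, which a JVM waiting with `suspend=y` may notice, so it is best used before attaching the debugger.

```python
import socket


def is_port_listening(port, host="127.0.0.1", timeout=0.5):
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise.
        return sock.connect_ex((host, port)) == 0


# Demonstration against a throwaway local listener instead of a real JVM.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
server.listen(1)
demo_port = server.getsockname()[1]
listening_while_open = is_port_listening(demo_port)
server.close()
```

For the debugger session above, the call would be `is_port_listening(12611)` on the machine running the JVM.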