@telvis07
telvis07 / ngram_prune.R
Last active July 1, 2016 14:18
NLP - prune ngrams by finding the minimum number of ngrams that cover X percent of the word instances
prune_ngram_df_by_cover_percentage <- function(df, percentage) {
  # assumes df contains columns (word, freq)
  # assumes df is sorted by freq in descending order
  # prune ngrams by finding the minimum number of ngrams that cover X percent of the word instances
  sums <- cumsum(df$freq)
  cover <- which(sums >= sum(df$freq) * percentage)[1]
  print(sprintf("%s of %s (%s%%) cover %s%% of word instances",
                cover,
                nrow(df),
                cover/nrow(df)*100,
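The preview above is cut off partway through the `sprintf` call, so the rest of the function is not shown. As a rough, self-contained sketch of the same cumulative-coverage idea (not the gist's actual remainder), the following assumes that `percentage` is a fraction between 0 and 1 and that the caller wants the pruned rows back. The function name `prune_by_cover_sketch` and the sample data are hypothetical.

```r
# Hypothetical helper (not taken from the gist): keep the smallest prefix of a
# frequency-sorted table whose cumulative count reaches the requested share.
prune_by_cover_sketch <- function(df, percentage) {
  # Assumption: df has columns (word, freq); percentage is a fraction in (0, 1].
  # Sort defensively so the cumulative sum runs from most to least frequent.
  df <- df[order(df$freq, decreasing = TRUE), ]
  sums <- cumsum(df$freq)
  # Index of the first row at which the running total reaches the target.
  cover <- which(sums >= sum(df$freq) * percentage)[1]
  # Assumption: return the covering rows (the preview does not show a return value).
  df[seq_len(cover), , drop = FALSE]
}

# Made-up sample data for illustration only.
ngrams <- data.frame(
  word = c("the", "of", "and", "zebra", "quokka"),
  freq = c(50, 30, 15, 4, 1),
  stringsAsFactors = FALSE
)

pruned <- prune_by_cover_sketch(ngrams, 0.9)
print(pruned$word)
```

With this sample, the running totals are 50, 80, 95, 99, 100, so the first three rows are enough to cover 90 percent of the 100 word instances, and the two rare words are dropped.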
{
  "facets": {
    "terms": {
      "facet_filter": {
        "fquery": {
          "query": {
            "filtered": {
              "filter": {
                "bool": {
                  "must": [
# index the information for user with id 2, specifically, its friends
curl -XPUT localhost:9200/users/user/2 -d '{
  "friends" : ["1", "3"]
}'
# index a tweet from user with id 1 (a friend of user 2)
curl -XPUT localhost:9200/tweets/tweet/1 -d '{
  "user" : "1",
  "tweet" : "hi i am user 1 "
}'
telvis07 / multi_field_nested_test.sh
Created December 11, 2012 16:53
multi-field in a nested document
curl -XDELETE localhost:9200/test
curl -XPOST localhost:9200/test -d '
{
  "mappings": {
    "type1": {
      "properties": {
        "message": {
          "index": "analyzed",
          "type": "string"
telvis07 / elasticsearch_multi_field_geo_point.sh
Created December 6, 2012 19:17
multi-field with geo_point in a nested document
curl -XPOST localhost:9200/test -d '
{
  "mappings": {
    "type1": {
      "properties": {
        "message": {
          "index": "analyzed",
          "type": "string"
        },
        "depart": {