Skip to content

Instantly share code, notes, and snippets.

View mcavdar's full-sized avatar

Mahmut CAVDAR mcavdar

View GitHub Profile
@thatguysimon
thatguysimon / standoff2corenlp.py
Last active October 4, 2023 12:04
A python script to convert annotated data in standoff format (brat annotation tool) to the formats expected by Stanford NER and Relation Extractor models
# A python script to turn annotated data in standoff format (brat annotation tool) to the formats expected by Stanford NER and Relation Extractor models
# - NER format based on: http://nlp.stanford.edu/software/crf-faq.html#a
# - RE format based on: http://nlp.stanford.edu/software/relationExtractor.html#training
# Usage:
# 1) Install the pycorenlp package
# 2) Run CoreNLP server (change CORENLP_SERVER_ADDRESS if needed)
# 3) Place .ann and .txt files from brat in the location specified in DATA_DIRECTORY
# 4) Run this script
@ipbastola
ipbastola / jq to filter by value.md
Last active June 21, 2024 14:29
JQ to filter JSON by value

JQ to filter JSON by value

Syntax: cat <filename> | jq -c '.[] | select( .<key> | contains("<value>"))'

Example: To get json record having _id equal 611

cat my.json | jq -c '.[] | select( ._id | contains(611))'

Remember: if JSON value has no double quotes (eg. for numeric) to do not supply in filter i.e. in contains(611)

@edsu
edsu / replies.py
Last active December 7, 2022 18:59
Try to get replies to a particular set of tweets, recursively.
#!/usr/bin/env python
"""
Twitter's API doesn't allow you to get replies to a particular tweet. Strange
but true. But you can use Twitter's Search API to search for tweets that are
directed at a particular user, and then search through the results to see if
any are replies to a given tweet. You probably are also interested in the
replies to any replies as well, so the process is recursive. The big caveat
here is that the search API only returns results for the last 7 days. So
import org.apache.spark.ml.feature.{CountVectorizer, RegexTokenizer, StopWordsRemover}
import org.apache.spark.mllib.clustering.{LDA, OnlineLDAOptimizer}
import org.apache.spark.mllib.linalg.Vector
import sqlContext.implicits._
val numTopics: Int = 100
val maxIterations: Int = 100
val vocabSize: Int = 10000
@jkbradley
jkbradley / LDA_SparkDocs
Created March 24, 2015 23:56
LDA Example: Modeling topics in the Spark documentation
/*
This example uses Scala. Please see the MLlib documentation for a Java example.
Try running this code in the Spark shell. It may produce different topics each time (since LDA includes some randomization), but it should give topics similar to those listed above.
This example is paired with a blog post on LDA in Spark: http://databricks.com/blog
Spark: http://spark.apache.org/
*/
import scala.collection.mutable