Mahmut CAVDAR mcavdar

## standoff2corenlp.py
# A python script to turn annotated data in standoff format (brat annotation tool) to the formats expected by Stanford NER and Relation Extractor models
# - NER format based on: http://nlp.stanford.edu/software/crf-faq.html#a
# - RE format based on: http://nlp.stanford.edu/software/relationExtractor.html#training

# Usage:
# 1) Install the pycorenlp package
# 2) Run CoreNLP server (change CORENLP_SERVER_ADDRESS if needed)
# 3) Place .ann and .txt files from brat in the location specified in DATA_DIRECTORY
# 4) Run this script
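
For orientation, step 3 relies on brat's standoff layout, where each text-bound annotation in a `.ann` file is a tab-separated line such as `T1<TAB>Person 0 5<TAB>Alice`. The following is a minimal sketch of reading those lines, not the script's own code: the `Entity` type, the `parse_entities` name, and the sample data are hypothetical, and discontinuous spans are skipped for brevity.

```python
from typing import List, NamedTuple


class Entity(NamedTuple):
    """One text-bound annotation (a 'T' line) from a brat .ann file."""
    id: str
    label: str
    start: int
    end: int
    text: str


def parse_entities(ann_text: str) -> List[Entity]:
    """Parse contiguous text-bound annotations from brat standoff text.

    Hypothetical helper for illustration; not part of standoff2corenlp.py.
    """
    entities = []
    for line in ann_text.splitlines():
        if not line.startswith("T"):
            continue  # skip relations (R), events (E), notes (#), and so on
        ann_id, span, text = line.split("\t", 2)
        if ";" in span:
            continue  # discontinuous spans are out of scope for this sketch
        label, start, end = span.split(" ")
        entities.append(Entity(ann_id, label, int(start), int(end), text))
    return entities


# Made-up sample: two entities and one relation line.
sample = (
    "T1\tPerson 0 5\tAlice\n"
    "T2\tOrganization 15 19\tAcme\n"
    "R1\tWorksFor Arg1:T1 Arg2:T2"
)
entities = parse_entities(sample)
```

The character offsets (`start`, `end`) index into the matching `.txt` file, which is why both files need to sit together in `DATA_DIRECTORY`.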

## jq to filter by value.md

      
ipbastola / jq to filter by value.md (last active June 21, 2024 14:29)

JQ to filter JSON by value

Syntax: cat <filename> | jq -c '.[] | select( .<key> | contains("<value>"))'
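Example: to get the JSON record whose _id equals 611:
cat my.json | jq -c '.[] | select( ._id | contains(611))'
Remember: if the JSON value has no double quotes (e.g. a number), do not put quotes around it in the filter, i.e. write contains(611). For string values, contains matches substrings; use == when an exact match is needed.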
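
To make the matching behavior concrete, here is a rough Python approximation of the filter above. It assumes the file holds a JSON array of objects. The names `jq_contains` and `filter_by_value` and the sample records are invented for illustration. One deliberate simplification: jq raises an error when the two types differ, whereas this sketch just treats that case as a non-match.

```python
import json


def jq_contains(value, needle):
    """Approximate jq's `contains` for strings and numbers only."""
    if isinstance(value, str) and isinstance(needle, str):
        return needle in value  # substring match, as in jq
    if isinstance(value, (int, float)) and isinstance(needle, (int, float)):
        return value == needle  # numbers match only when equal
    return False  # simplification: jq would raise an error here


def filter_by_value(records, key, needle):
    """Keep the records whose `key` field contains `needle`."""
    return [r for r in records if jq_contains(r.get(key), needle)]


# Illustrative data standing in for my.json.
records = json.loads(
    '[{"_id": 611, "name": "a"},'
    ' {"_id": 6110, "name": "b"},'
    ' {"_id": 7, "name": "ab"}]'
)

numeric_matches = filter_by_value(records, "_id", 611)   # equality on numbers
string_matches = filter_by_value(records, "name", "a")   # substring on strings
```

The numeric filter keeps only the record with `_id` 611 and ignores 6110, while the string filter keeps both `"a"` and `"ab"`, which is the substring behavior to watch for when an exact match is intended.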

  
## replies.py
#!/usr/bin/env python

"""

Twitter's API doesn't allow you to get replies to a particular tweet. Strange
but true. But you can use Twitter's Search API to search for tweets that are
directed at a particular user, and then search through the results to see if
any are replies to a given tweet. You probably are also interested in the
replies to any replies as well, so the process is recursive. The big caveat
here is that the search API only returns results for the last 7 days. So
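
The recursive search described in this docstring could be sketched roughly as follows. This is not the gist's code and makes no real API calls: `search_mentions` is a hypothetical stand-in for a Search API query for tweets directed at a user, and the dictionary fields (`id`, `user`, `in_reply_to_status_id`, `mentions`) are assumptions chosen for the example.

```python
from typing import Any, Callable, Dict, Iterator, List

Tweet = Dict[str, Any]


def get_replies(
    tweet: Tweet, search_mentions: Callable[[str], List[Tweet]]
) -> Iterator[Tweet]:
    """Yield replies to `tweet`, then replies to those replies, depth-first.

    `search_mentions(user)` is a hypothetical callable returning tweets
    directed at `user`; it stands in for the search step.
    """
    for candidate in search_mentions(tweet["user"]):
        if candidate["in_reply_to_status_id"] == tweet["id"]:
            yield candidate
            # A reply can itself have replies, so recurse on it.
            yield from get_replies(candidate, search_mentions)


# Made-up in-memory data so the sketch runs without network access.
TWEETS = [
    {"id": 1, "user": "alice", "in_reply_to_status_id": None, "mentions": []},
    {"id": 2, "user": "bob", "in_reply_to_status_id": 1, "mentions": ["alice"]},
    {"id": 3, "user": "carol", "in_reply_to_status_id": 2, "mentions": ["bob", "alice"]},
    {"id": 4, "user": "dave", "in_reply_to_status_id": None, "mentions": ["alice"]},
]


def fake_search(user: str) -> List[Tweet]:
    """Return every sample tweet that mentions `user`."""
    return [t for t in TWEETS if user in t["mentions"]]


reply_ids = [t["id"] for t in get_replies(TWEETS[0], fake_search)]
```

Tweet 4 mentions alice but is not a reply to tweet 1, so the filter on `in_reply_to_status_id` drops it; only tweets 2 and 3 are collected.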

## Spark_OnlineLDA_wikipedia_example.scala
import org.apache.spark.ml.feature.{CountVectorizer, RegexTokenizer, StopWordsRemover}
import org.apache.spark.mllib.clustering.{LDA, OnlineLDAOptimizer}
import org.apache.spark.mllib.linalg.Vector

import sqlContext.implicits._

val numTopics: Int = 100
val maxIterations: Int = 100
val vocabSize: Int = 10000

## LDA_SparkDocs
/*
This example uses Scala.  Please see the MLlib documentation for a Java example.

Try running this code in the Spark shell.  It may produce different topics each time (since LDA includes some randomization), but it should give topics similar to those listed above.

This example is paired with a blog post on LDA in Spark: http://databricks.com/blog
Spark: http://spark.apache.org/
*/

import scala.collection.mutable