Merouane MerouaneBen

tamizhgeek / spark-yarn-emr-client.rb
Created February 10, 2018 12:55
Chef Recipe for remote spark-submit setup to YARN running on Amazon EMR
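# Set up the Hadoop + Spark environment for Airflow
remote_file "/home/airflow/spark.tgz" do
  source "/remote/spark/download/url"
  owner "airflow"
  group "airflow"
  mode '0755'
  # Skip the download when the tarball is already present
  # (File.exist? replaces File.exists?, which was removed in Ruby 3.2)
  not_if { File.exist?("/home/airflow/spark.tgz") }
end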
MerouaneBen / Nielsen2012Python_case.py
Created November 5, 2015 20:32 — forked from fnielsen/Nielsen2012Python_case.py
Text mining example in Python
# $Id: Nielsen2012Python_case.py,v 1.2 2012/09/02 16:55:25 fn Exp $
# Define a url as a Python string (note we are only getting 100 documents)
url = "http://wikilit.referata.com/" + \
      "wiki/Special:Ask/" + \
      "-5B-5BCategory:Publications-5D-5D/" + \
      "-3FHas-20author%3DAuthor(s)/-3FYear/" + \
      "-3FPublished-20in/-3FAbstract/-3FHas-20topic%3DTopic(s)/" + \
      "-3FHas-20domain%3DDomain(s)/" + \
      "format%3D-20csv/limit%3D-20100/offset%3D0"
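
The URL above asks for CSV output limited to 100 rows, so the natural next step is to parse that CSV text. The preview stops before that point, so the following is only a minimal sketch of how it might be done, not the gist's own code. The helper name `parse_wikilit_csv`, the `Title` column, and the one-row sample are hypothetical placeholders; the remaining column labels are taken from the query string in the URL.

```python
import csv
import io


def parse_wikilit_csv(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    reader = csv.DictReader(io.StringIO(text))
    return [dict(row) for row in reader]


# Hypothetical sample standing in for the downloaded text (not real data).
# Quoted fields may contain commas, which the csv module handles.
sample = (
    'Title,Author(s),Year,Published in,Abstract,Topic(s),Domain(s)\n'
    'Example title,"A. Author,B. Author",2010,Example venue,'
    '"An abstract, with a comma.",Example topic,Example domain\n'
)

rows = parse_wikilit_csv(sample)
print(len(rows), rows[0]["Year"])
```

Using `csv.DictReader` rather than splitting on commas matters here because the author and abstract fields can themselves contain commas.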