Devi Prasad Khatua wolframalpha

wolframalpha / data_extraction.py
Last active February 22, 2016 14:26
SAMPLE DATA EXTRACTION - custom pos tagger !!
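# this should work - confirm it !!
def get_lists(filename='evaluation3_without\n.txt'):
    # filename of the file w/o '\n'
    # NOTE: in a normal string literal '\n' is a newline character; if the file
    # name contains a literal backslash-n, use a raw string (r'...') instead.
    with open(filename, 'r') as infile:
        lines = infile.readlines()
    # keep every second line (2nd, 4th, ...) and evaluate it as a Python expression;
    # eval() executes arbitrary code, so use it only on trusted files
    data_list = [eval(line.strip('\n')) for index, line in enumerate(lines) if (index + 1) % 2 == 0]
    return data_list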
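The snippet above keeps every second line and evaluates it. A minimal sketch of the same idea using `ast.literal_eval`, which only accepts Python literals and so does not execute arbitrary code, might look like the following. The helper name `parse_even_lines` and the sample lines (a sentence followed by a list of (word, tag) tuples) are hypothetical; the real file layout is not shown in the gist.

```python
import ast


def parse_even_lines(lines):
    """Parse the 2nd, 4th, ... lines (1-based) as Python literals."""
    parsed = []
    for index, line in enumerate(lines):
        if (index + 1) % 2 == 0:
            # literal_eval raises ValueError/SyntaxError on anything
            # that is not a plain literal (list, tuple, str, number, ...)
            parsed.append(ast.literal_eval(line.strip()))
    return parsed


# Hypothetical sample: odd lines hold raw text, even lines hold tagged tokens.
sample = [
    "sentence one\n",
    "[('foo', 'NN'), ('bar', 'VB')]\n",
    "sentence two\n",
    "[('baz', 'JJ')]\n",
]
print(parse_even_lines(sample))
```

If the even lines contain anything other than literals, `literal_eval` will raise instead of running it, which is usually the safer failure mode.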
wolframalpha / ExperienceTagger.py
Last active March 24, 2017 07:00
Please extract the NP using this !
# coding: utf-8
import pickle
import re
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
import numpy as np
import pandas as pd
from nltk.tag import pos_tag, pos_tag_sents
from nltk.tokenize import word_tokenize
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer

dateextractor

A basic example is given in the DateFinder.py file itself!

from DateExtractor import DateFinder

df = DateFinder()

# `find_dates(text)` takes a string/buffer and returns a list of datetime objects
date_list = df.find_dates('<foobar> (oct 2013 to december 2014) </foobar>')
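How `DateFinder` works internally is not shown here. As a rough illustration of the kind of result described above (a list of datetime objects pulled from free text), here is a minimal regex-based sketch. It is not the DateExtractor implementation: the function name `find_month_year_dates`, the "month name plus four-digit year" pattern, and defaulting the day to 1 are all assumptions made for this example.

```python
import re
from datetime import datetime

# Three-letter month prefixes mapped to month numbers.
MONTHS = {
    'jan': 1, 'feb': 2, 'mar': 3, 'apr': 4, 'may': 5, 'jun': 6,
    'jul': 7, 'aug': 8, 'sep': 9, 'oct': 10, 'nov': 11, 'dec': 12,
}

# A word (first three letters captured) followed by whitespace and a 4-digit year.
MONTH_YEAR = re.compile(r'\b([A-Za-z]{3})[A-Za-z]*\.?\s+(\d{4})\b')


def find_month_year_dates(text):
    """Return datetime objects for 'month year' mentions; the day defaults to 1."""
    dates = []
    for match in MONTH_YEAR.finditer(text):
        month = MONTHS.get(match.group(1).lower())
        if month is not None:  # skip words that do not start like a month name
            dates.append(datetime(int(match.group(2)), month, 1))
    return dates


print(find_month_year_dates('<foobar> (oct 2013 to december 2014) </foobar>'))
```

The prefix match is deliberately loose, so a word such as "decimal" followed by a year would be misread as December; a real extractor would need stricter matching.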
{% load staticfiles %}<!--
<script type="text/javascript" src="http://google-maps-utility-library-v3.googlecode.com/svn/trunk/markerwithlabel/src/markerwithlabel.js"></script> -->
<script type="text/javascript">
function inherits(childCtor, parentCtor) {
/** @constructor */
function tempCtor() {};
tempCtor.prototype = parentCtor.prototype;
childCtor.superClass_ = parentCtor.prototype;
childCtor.prototype = new tempCtor();
> Official Google news API - deprecated
Alternative:
- https://newsapi.org/
- Free to use, but the usage limit (and when it is breached) is not clearly documented
- Multiple, large news sources; one can be selected explicitly from https://newsapi.org/sources
- Key/Pair auth
- response type: JSON
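As a rough sketch of the points above (pick one source, authenticate with a key, read JSON), a request could be put together like this. The endpoint path `/v2/top-headlines` and the query parameter names `sources` and `apiKey` are assumptions that should be checked against the current newsapi.org documentation; `build_headlines_url`, `fetch_headlines`, and the placeholder key are hypothetical names for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

NEWSAPI_BASE = 'https://newsapi.org'


def build_headlines_url(source, api_key, base=NEWSAPI_BASE):
    """Build a headlines URL for one source (assumed endpoint and parameter names)."""
    query = urlencode({'sources': source, 'apiKey': api_key})
    return f'{base}/v2/top-headlines?{query}'


def fetch_headlines(source, api_key):
    """Fetch and decode the JSON response; needs network access and a valid key."""
    with urlopen(build_headlines_url(source, api_key)) as response:
        return json.load(response)


# Only the URL is built here; no request is sent.
print(build_headlines_url('bbc-news', 'YOUR_API_KEY'))
```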
> Google finance API - deprecated / Yahoo finance API - deprecated
Alternative:
wolframalpha / generate_result.py
Last active September 18, 2017 13:52
Append the script to the bottom of the notebook
import itertools
import pandas as pd
from pyspark.ml.linalg import Vectors
from pyspark.ml.feature import VectorAssembler
from pyspark.ml import Pipeline
from pyspark.ml.regression import LinearRegression
results = []
# START ----- VARIABLES
mandatory_columns = set(['Log_Price'])
from pyspark.sql.functions import udf
from pyspark.sql.types import *
schema = StructType([
StructField("foo", FloatType(), False),
StructField("bar", FloatType(), False)
])
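# `function` is a placeholder for a Python callable returning a (foo, bar) pair;
# pass the callable itself rather than the result of calling it
udf(function, schema)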
from pyspark import SparkConf,SparkContext
from pyspark.sql.functions import *
from pyspark.sql import *
from pyspark.sql.types import *
configs = [('spark.eventLog.enabled', 'true'),
('spark.dynamicAllocation.minExecutors', '8'),
('spark.executor.instances', '1000'),
('spark.driver.host', '10.142.0.3'),
('spark.yarn.am.memory', '640m'),
wolframalpha / vmops.sh
Last active January 3, 2018 05:03
Bash script to STOP/START VMs in a cluster
#!/bin/bash
VM=$1
OPS=$2
NODES=$4
ZONE=$3
vms=($VM-m)
for (( i=0 ; i<$NODES-1; i++ ))
do