Tested with Apache Spark 2.1.0, Python 2.7.13 and Java 1.8.0_112
For older versions of Spark and IPython, please see the previous version of this text.
"""Information Retrieval metrics | |
Useful Resources:
http://www.cs.utexas.edu/~mooney/ir-course/slides/Evaluation.ppt
http://www.nii.ac.jp/TechReports/05-014E.pdf
http://www.stanford.edu/class/cs276/handouts/EvaluationNew-handout-6-per.pdf
http://hal.archives-ouvertes.fr/docs/00/72/67/60/PDF/07-busa-fekete.pdf
Learning to Rank for Information Retrieval (Tie-Yan Liu)
"""
import numpy as np
<!--To center an image-->
<p align="center">
![alt]()
</p>
<!--To right align an image-->
<p align="right">
![alt]()
</p>
import numpy as np
import pylab as pl
from numpy import fft

def fourierExtrapolation(x, n_predict):
    n = x.size
    n_harm = 10                  # number of harmonics in model
    t = np.arange(0, n)
    p = np.polyfit(t, x, 1)      # find linear trend in x
    x_notrend = x - p[0] * t     # detrended x
require(caret)
# load some data
data(USArrests)
### Prepare Data (positive observations)
# add a column to be the strata. In this case it is states; it can be sites or other locations
# the original data has 50 rows, so this adds a state label to 10 consecutive observations
USArrests$state <- c(rep(c("PA","MD","DE","NY","NJ"), each = 10))
# this replaces the existing rownames (states) with a simple numerical index
import math

def rgb_to_hsv(r, g, b):
    r = float(r)
    g = float(g)
    b = float(b)
    high = max(r, g, b)
    low = min(r, g, b)
    h, s, v = high, high, high
minmax_scaler <- function(x, a, b) {
  "
  x: data. numeric vector of values to be scaled
  a: desired minimum after scaling takes place
  b: desired maximum after scaling takes place
  e.g. minmax_scaler(c(1,2,3,4), 1, 17)
  [1] 1.000000 6.333333 11.666667 17.000000
  "
  (((b - a)*(x - min(x))) / (max(x) - min(x))) + a
}
The dplyr package in R makes data wrangling significantly easier. The beauty of dplyr is that, by design, the options available are limited. Specifically, a set of key verbs forms the core of the package. Using these verbs, you can solve a wide range of data problems effectively in a shorter timeframe.
While transitioning to Python, I have greatly missed the ease with which I can think through and solve problems using dplyr in R. The purpose of this document is to demonstrate how to execute the key dplyr verbs when manipulating data using Python (with the pandas package).
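As a rough illustration of the idea, here is a minimal sketch of how a few dplyr-style operations (filter, select, arrange, mutate, and group_by with summarise) are commonly expressed in pandas. The toy data frame and its column names are hypothetical and chosen only for this example; this is one possible mapping, not the only one.

```python
import pandas as pd

# Hypothetical toy data, invented purely for illustration.
df = pd.DataFrame({
    "species": ["a", "a", "b", "b"],
    "length": [1.0, 3.0, 2.0, 6.0],
    "width": [0.5, 1.5, 1.0, 2.0],
})

# filter(): keep rows that satisfy a condition
filtered = df[df["length"] > 1.5]

# select(): keep a subset of columns
selected = df[["species", "length"]]

# arrange(desc(length)): sort rows by a column, descending
arranged = df.sort_values("length", ascending=False)

# mutate(): add a derived column without modifying df in place
mutated = df.assign(area=df["length"] * df["width"])

# group_by() + summarise(): one aggregated row per group
summarised = (
    df.groupby("species")["length"]
      .mean()
      .reset_index(name="mean_length")
)

print(summarised)
```

Each step returns a new object rather than modifying `df`, which keeps the steps easy to chain in the same way dplyr verbs are piped together.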
dplyr is organised around six key verbs:
#!/usr/bin/env python
# coding=utf-8
import numpy as np
"""
1: Procedure Policy_Iteration(S,A,P,R)
2: Inputs
3: S is the set of all states
4: A is the set of all actions
5: P is state transition function specifying P(s'|s,a)
# -*- coding: UTF-8 -*-
import sys
from wit import Wit
from pyowm import OWM
from telegram import InlineKeyboardButton, InlineKeyboardMarkup
from telegram.ext import Updater, CommandHandler, MessageHandler, Filters, CallbackQueryHandler
import logging

DEFAULT_MAX_STEPS = 5
# Enable logging