Sudeep Raja sudeepraja

@sudeepraja
sudeepraja / online-newton-step.ipynb
Created March 4, 2023 21:16
Online Newton Step.ipynb
@sudeepraja
sudeepraja / smooth-prediction.ipynb
Created January 7, 2023 04:00
Smooth Prediction.ipynb
@sudeepraja
sudeepraja / exponentiated-gradient.ipynb
Created December 6, 2022 23:21
Exponentiated Gradient.ipynb
@sudeepraja
sudeepraja / cover-s-two-stock.ipynb
Last active December 6, 2022 21:00
cover-s-two-stock.ipynb
@sudeepraja
sudeepraja / ftl.ipynb
Last active December 6, 2022 21:00
ftl.ipynb
@sudeepraja
sudeepraja / icp.md
Last active February 21, 2018 05:18

Let $X = \{x_1, x_2, \ldots, x_n\}$ and $Y = \{y_1, y_2, \ldots, y_n\}$ be two sets of points, where $x_i, y_i \in \mathbb{R}^{k}$ for $i \in [n]$. The goal is to find a permutation $\pi : [n] \to [n]$, an orthogonal (rotation) matrix $R \in \mathbb{R}^{k \times k}$, and a translation vector $t \in \mathbb{R}^k$ such that the following error is minimized:

$$\sum_{i=1}^n \left\| R x_{\pi(i)} + t - y_i \right\|_2^2$$

Another equivalent error is:

$$\sum_{i=1}^n \left\| R x_i + t - y_{\pi(i)} \right\|_2^2$$

These errors can be written in matrix form:
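Collecting the points as columns of matrices $X = [x_1 \; \cdots \; x_n]$ and $Y = [y_1 \; \cdots \; y_n]$ in $\mathbb{R}^{k \times n}$, and encoding $\pi$ as a permutation matrix $P \in \{0,1\}^{n \times n}$, the first error becomes a Frobenius norm:

$$\left\| R X P + t \mathbf{1}_n^\top - Y \right\|_F^2$$

where $\mathbf{1}_n$ is the all-ones vector. A small numerical sketch of this matrix form, with $P$, $R$, and $t$ generated at random purely to illustrate the formula (this is not an ICP solver):

```python
import numpy as np

rng = np.random.default_rng(0)
k, n = 3, 5
X = rng.standard_normal((k, n))                     # n points as columns
P = np.eye(n)[:, rng.permutation(n)]                # permutation matrix for pi
R = np.linalg.qr(rng.standard_normal((k, k)))[0]    # random orthogonal matrix
t = rng.standard_normal((k, 1))                     # translation vector

Y = R @ X @ P + t                                   # targets with zero error
error = np.linalg.norm(R @ X @ P + t - Y, ord="fro") ** 2
print(error)  # ~0 by construction
```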

This blog post is about the Multi-Armed Bandit (MAB) problem and the exploration-exploitation dilemma faced in reinforcement learning. MABs find applications in areas such as advertising, drug trials, website optimization, packet routing, and resource allocation.

A Multi-Armed Bandit consists of $K$ arms, $K \ge 2$, numbered from $1$ to $K$. Each arm $i$ is associated with an unknown probability distribution $P_i$ whose mean is $\mu_i$. Pulling the $i$th arm produces a reward $r$ sampled from $P_i$. An agent has a budget of $T$ arm pulls, and its task is to maximise the accumulated reward after $T$ pulls.

$$\text{Maximise} \quad \sum_{t=1}^T r_{i_t} \quad \text{where } r_{i_t} \sim P_{i_t} \text{ and } i_t \text{ is the arm pulled at round } t$$
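As a minimal sketch of this setup, the following models a $K$-armed bandit whose arms happen to be Gaussian; the Gaussian choice and all names here are illustrative assumptions:

```python
import numpy as np

class GaussianBandit:
    """K-armed bandit whose arm i pays N(mu_i, 1) rewards (an assumed P_i)."""

    def __init__(self, means, rng=None):
        self.means = np.asarray(means)            # unknown arm means mu_i
        self.rng = rng or np.random.default_rng()

    def pull(self, i):
        # Reward for arm i, sampled from its distribution P_i
        return self.rng.normal(self.means[i], 1.0)

bandit = GaussianBandit(means=[0.1, 0.5, 0.9])
reward = bandit.pull(2)                           # sample from the best arm
```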

The arm with the highest mean reward is called the optimal arm. Let $i^*$ be the optimal arm and $\mu^*$ be its mean reward. Another way of maximising the cumulative reward is to minimise the cumulative expected regret.

$$\text{Regret} = \sum_{t=1}^T \left( r_{i^*} - r_{i_t} \right)$$
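Taking expectations, and writing $\Delta_i = \mu^* - \mu_i$ for the gap of arm $i$ and $n_i(T)$ for the number of times arm $i$ is pulled in $T$ rounds (notation introduced here), the expected regret admits the standard gap decomposition:

$$\mathbb{E}[\text{Regret}] = T\mu^* - \mathbb{E}\left[\sum_{t=1}^T r_{i_t}\right] = \sum_{i=1}^K \Delta_i \, \mathbb{E}[n_i(T)]$$

so minimising expected regret amounts to keeping the pull counts of suboptimal arms small.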

@sudeepraja
sudeepraja / bandits.py
Last active June 18, 2023 23:04
Code to test different exploration strategies for multi-armed bandits
import numpy as np
import matplotlib.pyplot as plt
import math

number_of_bandits = 10    # independent bandit instances to average over
number_of_arms = 10       # arms per bandit
number_of_pulls = 30000   # total pulls (horizon T) per run
epsilon = 0.3             # exploration probability for epsilon-greedy
temperature = 10.0        # initial temperature for softmax exploration
min_temp = 0.1            # lower bound on the decayed temperature
decay_rate = 0.999        # multiplicative temperature decay per pull
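As a minimal sketch of one of these strategies, here is an epsilon-greedy loop using the parameters above; the Bernoulli arms and the variable names below are illustrative assumptions, not the full comparison script:

```python
rng = np.random.default_rng(0)
true_means = rng.random(number_of_arms)     # unknown Bernoulli arm means
counts = np.zeros(number_of_arms)           # pulls per arm
estimates = np.zeros(number_of_arms)        # running mean reward per arm

for t in range(number_of_pulls):
    if rng.random() < epsilon:
        arm = int(rng.integers(number_of_arms))   # explore: random arm
    else:
        arm = int(np.argmax(estimates))           # exploit: best estimate
    reward = float(rng.random() < true_means[arm])
    counts[arm] += 1
    estimates[arm] += (reward - estimates[arm]) / counts[arm]

print("best arm:", int(np.argmax(true_means)),
      "most pulled:", int(np.argmax(counts)))
```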
---
layout: post
title: Backpropagation in Matrix Form
published: true
---

Backpropagation is an algorithm used to train neural networks, together with an optimization routine such as gradient descent. To perform a weight update that reduces the loss, gradient descent needs the gradient of the loss function with respect to every weight in the network; backpropagation computes these gradients in a systematic way. Backpropagation combined with gradient descent is arguably the single most important algorithm for training deep neural networks, and could be said to be the driving force behind the recent emergence of deep learning.

Any layer of a neural network can be considered as an affine transformation followed by the application of a non-linear function: an input vector $x$ is multiplied with a weight matrix $W$, a bias vector $b$ is added, and the result is passed through an activation function $f$ to produce the layer's output $f(Wx + b)$.
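A minimal numpy sketch of this single-layer computation (the shapes, the names $W$ and $b$, and the choice of sigmoid as the non-linearity are illustrative assumptions):

```python
import numpy as np

def sigmoid(z):
    # An example non-linear activation function
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))   # weight matrix: 3 inputs -> 4 outputs
b = rng.standard_normal(4)        # bias vector
x = rng.standard_normal(3)        # input vector

z = W @ x + b                     # affine transformation
a = sigmoid(z)                    # non-linear activation: layer output
```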