zerolocker

## Study Notes on *Prediction, Learning, and Games*
Prediction, Learning, and Games
========================
+ Idea from p. 14: when training Algorithm 2, could we select the weight vectors $w$ that achieve smaller loss values and average them, *in some manner to be designed*, to obtain a more accurate weight vector $w$?
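One possible "manner" for this idea is shown below as a hypothetical sketch. It is not taken from the book, and Algorithm 2 itself is not reproduced here. The sketch keeps the `k` stored weight vectors with the smallest losses and averages them with coefficients proportional to `exp(-eta * loss)`, borrowing the idea behind the book's exponentially weighted average forecaster. The function name `loss_weighted_average` and the parameters `eta` and `k` are illustrative assumptions.

```python
import math
from typing import List, Optional, Sequence


def loss_weighted_average(
    weights: Sequence[Sequence[float]],
    losses: Sequence[float],
    eta: float = 1.0,
    k: Optional[int] = None,
) -> List[float]:
    """Hypothetical combiner: average the k lowest-loss weight vectors.

    Vector w_i gets coefficient exp(-eta * loss_i), normalized to sum to 1.
    eta = 0 gives a plain average; a large eta approaches picking the best one.
    """
    if not weights or len(weights) != len(losses):
        raise ValueError("need one loss value per weight vector")
    if k is not None and k < 1:
        raise ValueError("k must be at least 1")
    # Keep only the k vectors with the smallest loss (all of them if k is None).
    order = sorted(range(len(losses)), key=lambda i: losses[i])
    chosen = order if k is None else order[:k]
    # Shift by the smallest loss so exp() cannot overflow or underflow to all zeros.
    best = losses[chosen[0]]
    coeffs = {i: math.exp(-eta * (losses[i] - best)) for i in chosen}
    total = sum(coeffs.values())
    dim = len(weights[0])
    avg = [0.0] * dim
    for i, c in coeffs.items():
        for j in range(dim):
            avg[j] += (c / total) * weights[i][j]
    return avg


ws = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
ls = [0.1, 0.2, 3.0]
print(loss_weighted_average(ws, ls, eta=0.0, k=2))  # → [0.5, 0.5]
```

With `eta=0.0` the two lowest-loss vectors are averaged uniformly, and the high-loss vector `[5.0, 5.0]` is excluded by `k=2`. Whether such an average actually generalizes better would have to be checked experimentally.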


> Written with [StackEdit](https://stackedit.io/).

## FuncAnal_Week1_Prob2
Proof of Problem 2 in Week 1
=====================
Claim: Let $(X, T)$ be a topological space and $A \subseteq X$. Then $A = \bar A$ if and only if $A$ is closed.
The proof consists of three parts:

+ Part 1: prove that $A \subseteq \bar A$ for any $A$.
+ Part 2: prove **Problem 3**: given a topological space $(X, T)$, a set $A \subseteq X$ is open if and only if every point $x \in A$ has an open neighborhood contained in $A$.
+ Part 3: use the result of Part 2 to complete the proof.
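A compact sketch of how Parts 1 to 3 fit together is given below. It assumes the closure is defined as $\bar A = \{x \in X : \text{every open neighborhood of } x \text{ meets } A\}$; if the course defines $\bar A$ differently (for example as the intersection of all closed supersets of $A$), the steps would need adjusting.

```latex
% Assumed definition: x \in \bar A  iff  every open U with x \in U satisfies U \cap A \neq \emptyset.
\textbf{Part 1.} If $x \in A$, every open $U \ni x$ contains the point $x \in A$,
so $U \cap A \neq \emptyset$. Hence $A \subseteq \bar A$.

\textbf{Part 3 ($\Rightarrow$).} Let $A$ be closed, so $X \setminus A$ is open.
For $x \notin A$, the set $U = X \setminus A$ is an open neighborhood of $x$ with
$U \cap A = \emptyset$, so $x \notin \bar A$. Thus $\bar A \subseteq A$, and with
Part 1, $A = \bar A$.

\textbf{Part 3 ($\Leftarrow$).} Let $A = \bar A$. For $x \in X \setminus A$ we have
$x \notin \bar A$, so some open $U \ni x$ satisfies $U \cap A = \emptyset$, i.e.
$U \subseteq X \setminus A$. By Problem 3 (Part 2) applied to $X \setminus A$,
the set $X \setminus A$ is open, so $A$ is closed.
```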

Part 1

## lazy update

### 2014-9-12
#### **lazy update**
+ This is a solution to the issue described in the **2014-8-28** entry.
+ It is described in Section 6, "Efficient Implementation in High Dimensions", of the paper *Efficient Online and Batch Learning Using Forward Backward Splitting*.
+ The solution is quite simple, so I describe it in just a few lines.
First, let's state the problem. The cost function to be minimized typically has the form $loss + regularization$. SGD considers one example $\vec x$ at a time, so the per-example cost is $J = l(\vec x) + \lambda r(\vec w)$. SGD then computes the partial derivative of $J$ with respect to every component $w_j$ of $\vec w$ and performs the update $w_j \leftarrow w_j - \eta\frac{\partial J}{\partial w_j}$, where $\frac{\partial J}{\partial w_j} = \frac{\partial l(\vec x)}{\partial w_j} + \lambda \frac{\partial r(\vec w)}{\partial w_j}$.
The first term $\frac{\partial l(\vec x)}{\partial w_
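The lazy-update idea can be illustrated with a small sketch. It assumes (none of this is stated in the notes) a linear model with squared loss `0.5 * (w·x - y)**2`, the regularizer `r(w) = 0.5 * ||w||^2` so that its gradient component is `w_j`, a constant step size `eta`, and sparse examples stored as `{feature_index: value}` dicts. Under these assumptions, a step in which feature `j` is absent only shrinks `w_j` by the factor `1 - eta * lam`, so `k` such steps can be deferred and applied at once as `(1 - eta * lam) ** k` the next time feature `j` appears. The class `LazyL2SGD` and the reference function `eager_sgd` are hypothetical names; this is a simplified variant, not the exact closed-form updates of the paper's Section 6.

```python
from typing import Dict, List, Sequence, Tuple

Example = Tuple[Dict[int, float], float]  # (sparse features, target)


class LazyL2SGD:
    """SGD on squared loss with lazily applied L2 shrinkage (hypothetical sketch)."""

    def __init__(self, dim: int, eta: float = 0.1, lam: float = 0.01):
        self.w = [0.0] * dim
        self.last = [0] * dim  # step count up to which w_j is up to date
        self.t = 0             # number of completed SGD steps
        self.eta = eta
        self.decay = 1.0 - eta * lam  # effect of one regularization-only step

    def _catch_up(self, j: int) -> None:
        # Apply all shrinkage steps that w_j missed while feature j was zero.
        k = self.t - self.last[j]
        if k > 0:
            self.w[j] *= self.decay ** k
        self.last[j] = self.t

    def step(self, x: Dict[int, float], y: float) -> None:
        # Only the active features are touched: cost O(nnz(x)), not O(dim).
        for j in x:
            self._catch_up(j)
        g = sum(self.w[j] * v for j, v in x.items()) - y  # d loss / d prediction
        for j, v in x.items():
            # Same as w_j - eta * (g * x_j + lam * w_j).
            self.w[j] = self.decay * self.w[j] - self.eta * g * v
            self.last[j] = self.t + 1
        self.t += 1

    def weights(self) -> List[float]:
        # Bring every coordinate up to date before reading the full vector.
        for j in range(len(self.w)):
            self._catch_up(j)
        return list(self.w)


def eager_sgd(dim: int, data: Sequence[Example], eta: float = 0.1, lam: float = 0.01) -> List[float]:
    """Reference SGD that updates every coordinate at every step."""
    w = [0.0] * dim
    for x, y in data:
        g = sum(w[j] * v for j, v in x.items()) - y
        w = [w[j] - eta * (g * x.get(j, 0.0) + lam * w[j]) for j in range(dim)]
    return w


data = [({0: 1.0, 3: 2.0}, 1.0), ({1: -1.0}, 0.5), ({0: 0.5, 2: 1.0}, -1.0), ({3: 1.0}, 2.0)]
model = LazyL2SGD(4, eta=0.1, lam=0.05)
for x, y in data:
    model.step(x, y)
print(model.weights())
print(eager_sgd(4, data, eta=0.1, lam=0.05))
```

Both printed vectors should agree up to floating-point rounding, while each lazy `step` costs time proportional to the number of nonzero features rather than `dim`. For an L1 regularizer, the deferred steps do not reduce to a single multiplication, and a different catch-up rule (for example one that truncates at zero) would be needed.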