@petered
Created August 29, 2017 12:26
2017-08-29 Parameter Tuning
# Distributed Low-Bit Computation
Suppose we're trying to communicate a scalar parameter $\theta$ from a worker $W$ to a server $S$.
$\theta$ changes with time $t$. The worker communicates bits of $\theta$ asynchronously: if it sends a bit $b \in \{0, 1\}$ at time $t \in \mathbb I^+$, we say that the worker communicated a message $(b, t)$. If the worker sends $M$ messages between times $t_1$ and $t_2$, we write $N_{t_1}^{t_2} = M$.
The server takes in these bits and uses them to build a distribution $p(\hat \theta)$ over the current value of $\theta$.
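To make the notation concrete, here is a minimal sketch of the message model. The names `Message` and `count_messages` are hypothetical, and the counting convention $t_1 < t \le t_2$ is an assumption, since the note does not say whether the endpoints are included.

```python
from typing import List, NamedTuple


class Message(NamedTuple):
    bit: int   # b in {0, 1}
    time: int  # positive integer time step t


def count_messages(messages: List[Message], t1: int, t2: int) -> int:
    """Return N_{t1}^{t2}, counting messages with t1 < time <= t2 (assumed convention)."""
    return sum(1 for m in messages if t1 < m.time <= t2)


# A worker that sent four bits at irregular (asynchronous) times.
log = [Message(1, 1), Message(0, 2), Message(1, 5), Message(1, 9)]
n_2_9 = count_messages(log, 2, 9)  # counts the messages sent at t = 5 and t = 9
```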
**Can we create an encoding with the following properties?**
1. When $\theta$ stops changing, the server's estimate of $\theta$ converges to the correct value, i.e.
IF: $\theta_t = \theta_T \ \forall t > T$
THEN: $\lim_{t\rightarrow \infty} p_t(\hat\theta) = \delta(\hat\theta - \theta_T)$
2. When $\theta$ stops changing, the communication per unit time approaches 0, i.e.
IF: $\theta_t= \theta_T \forall t>T$
THEN: $\lim_{t\rightarrow \infty} \frac{1}{t-T} {N}_T^t = 0$
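As a sanity check that the two properties are compatible, here is a sketch of one possible scheme for the static case only. It is not proposed in the note; it assumes that $\theta \in [0, 1)$ is known to both sides, that $\theta$ is already fixed at time $T$, and that the worker sends the $k$-th bisection bit at time $T + 2^k$. The server's belief is taken to be uniform on an interval that halves with every bit, so its width goes to 0, while the number of messages grows only like $\log_2(t - T)$. All function names are hypothetical.

```python
from typing import List, Tuple


def worker_bits(theta: float, n_bits: int) -> List[int]:
    """Bisection bits of theta in [0, 1): each bit says which half contains theta."""
    lo, hi, bits = 0.0, 1.0, []
    for _ in range(n_bits):
        mid = (lo + hi) / 2
        bit = int(theta >= mid)
        bits.append(bit)
        lo, hi = (mid, hi) if bit else (lo, mid)
    return bits


def server_interval(bits: List[int]) -> Tuple[float, float]:
    """Support [lo, hi) of the server's uniform belief after reading the bits."""
    lo, hi = 0.0, 1.0
    for bit in bits:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if bit else (lo, mid)
    return lo, hi


def simulate(theta: float, T: int, t_end: int):
    """Send bit k at time T + 2**k; return (messages, server interval at t_end)."""
    send_times = []
    k = 0
    while T + 2 ** k <= t_end:
        send_times.append(T + 2 ** k)
        k += 1
    bits = worker_bits(theta, len(send_times))
    messages = list(zip(bits, send_times))  # list of (b, t) pairs
    return messages, server_interval(bits)


messages, (lo, hi) = simulate(theta=0.3, T=10, t_end=10 + 1024)
rate = len(messages) / 1024  # N_T^t / (t - T), shrinks as t grows
width = hi - lo              # width of the server's belief, halves per bit
```

This sketch does not address what happens when $\theta$ starts changing again; a real encoding would need some way to widen the server's belief in that case.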
**Can we mix this with the notion of elasticity from EASGD, so that we have the following properties?**
1. When $\hat \theta$ is broadly distributed, the effect of this value on the parameter server (the elasticity $\alpha$) is weak, i.e.
$\lim_{\mathbb V[{\hat \theta}] \rightarrow \infty} \alpha = 0$
2. When $\theta$ is known exactly, the network parameters are in perfect sync, i.e.
IF: $p(\hat \theta) = \delta(\hat \theta - \theta)$
THEN: $\theta = \theta_{\text{Server}}$
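One way to satisfy both limits is to make $\alpha$ a decreasing function of the variance of $\hat \theta$. The sketch below uses $\alpha = 1 / (1 + \mathbb V[\hat \theta])$, which is purely an illustrative assumption and not something taken from EASGD or from this note. The update rule pulls the server parameter toward the mean of the estimate with strength $\alpha$, in the spirit of an elastic update; the function names are hypothetical.

```python
def elasticity(variance: float) -> float:
    """Assumed form alpha = 1 / (1 + variance): 1 when certain, tends to 0 as variance grows."""
    return 1.0 / (1.0 + variance)


def server_update(theta_server: float, theta_hat_mean: float, variance: float) -> float:
    """Pull the server parameter toward the estimate's mean with strength alpha."""
    alpha = elasticity(variance)
    return theta_server + alpha * (theta_hat_mean - theta_server)


# Exactly known theta (zero variance): the server snaps to the worker's value.
synced = server_update(theta_server=0.0, theta_hat_mean=2.0, variance=0.0)

# Very uncertain estimate: the server barely moves.
barely_moved = server_update(theta_server=0.0, theta_hat_mean=2.0, variance=1e12)
```

With zero variance the update gives $\theta_{\text{Server}} = \theta$, matching the second property, and as the variance grows the pull vanishes, matching the first.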