@myleott
Created April 2, 2020 22:23
text: a b c </s> d e f g </s>
Suppose the model is trained with a context length of 4.
Then the most favorable way to evaluate your model's perplexity is:
batch 1: a b c </s>
         |--------| <-- count perplexity of this
batch 2: b c </s> d
                 |-| <-- count perplexity of this
batch 3: c </s> d e
                 |-| <-- count perplexity of this
batch 4: </s> d e f
                 |-| <-- count perplexity of this
batch 5: d e f g
              |-| <-- count perplexity of this
batch 6: e f g </s>
               |--| <-- count perplexity of this
myleott commented Apr 2, 2020

To see why this is correct, consider that we typically decompose the probability of a sequence y autoregressively:

p(y) = \prod_i p(y_i | y_{i-1}, ..., y_0)

But in practice we often evaluate the model over fixed-length windows with a stride equal to the context length (i.e., non-overlapping chunks), so you get something like:

p(y_7|y_6,y_5,y_4) * p(y_6|y_5,y_4) * p(y_5|y_4) * p(y_4) * p(y_3|y_2,y_1,y_0) * p(y_2|y_1,y_0) * p(y_1|y_0) * p(y_0)

This is unfavorable since tokens near the start of each window are predicted with little or no context. With a stride of 1, every token after the first window is conditioned on the full context length, so you'll get more favorable (lower) perplexities.
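
To make the contrast concrete, here is a small self-contained sketch. The `log_prob(token, context)` call is a hypothetical stand-in for querying the real model (the toy definition below exists only so the snippet runs); it compares the non-overlapping-chunk context policy against the stride-1 policy:

```python
import math

def perplexity(tokens, log_prob, context_fn):
    """exp of the average negative log-likelihood, scoring each token exactly once.
    `log_prob(token, context)` is a hypothetical stand-in for a real model call
    returning log p(token | context); `context_fn(i)` picks token i's context."""
    nll = -sum(log_prob(tok, context_fn(i)) for i, tok in enumerate(tokens))
    return math.exp(nll / len(tokens))

tokens = ["a", "b", "c", "</s>", "d", "e", "f", "g", "</s>"]
L = 4  # context length

def chunked_context(i):
    # Non-overlapping windows (stride = L): context resets at every chunk
    # boundary, so e.g. "d" at position 4 is predicted with no context at all.
    return tokens[(i // L) * L : i]

def stride1_context(i):
    # Stride 1: every token beyond the first window is conditioned on the
    # full L - 1 preceding tokens.
    return tokens[max(0, i - L + 1) : i]

def toy_log_prob(token, context):
    # Toy stand-in, purely so the sketch runs: pretend longer context always
    # helps a little. A real evaluation would query the trained model here.
    return math.log(0.1 + 0.02 * len(context))

print(perplexity(tokens, toy_log_prob, chunked_context))   # higher (worse)
print(perplexity(tokens, toy_log_prob, stride1_context))   # lower (better)
```

The trade-off is cost: stride 1 needs one forward pass per token, whereas non-overlapping chunks need only one pass per context_length tokens, which is typically why the chunked setup is used in the first place.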
