Few-Shot (FS) is the term we will use in this work to refer to the setting where the model is given a few
demonstrations of the task at inference time as conditioning [RWC+19], but no weight updates are allowed.
As shown in Figure 2.1, an example in a typical dataset has a context and a desired completion (for example,
an English sentence and its French translation). Few-shot works by giving K examples of context and
completion, followed by one final context for which the model is expected to provide the completion.
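The prompt layout described above can be sketched as follows. This is a minimal illustration, not the paper's actual formatting code: the function name build_few_shot_prompt, the " => " separator, the newline delimiter, and the example pairs are all assumptions chosen for readability.

```python
from typing import List, Tuple


def build_few_shot_prompt(
    demonstrations: List[Tuple[str, str]],
    query_context: str,
    task_description: str = "",
    arrow: str = " => ",  # assumed separator between context and completion
) -> str:
    """Concatenate K (context, completion) pairs, then one final open context."""
    lines = []
    if task_description:
        lines.append(task_description)
    # Each demonstration shows the model a full context/completion pair.
    for context, completion in demonstrations:
        lines.append(f"{context}{arrow}{completion}")
    # The final context has no completion; the model is expected to supply it.
    # Trailing whitespace is stripped so the prompt ends right after the separator.
    lines.append(f"{query_context}{arrow}".rstrip())
    return "\n".join(lines)


# Illustrative English-to-French demonstrations (K = 2).
demos = [("sea otter", "loutre de mer"), ("cheese", "fromage")]
prompt = build_few_shot_prompt(
    demos, "peppermint", task_description="Translate English to French:"
)
print(prompt)
```

Here K is simply len(demonstrations). The resulting string is passed to the model as conditioning text only, so no weights change. Passing an empty list leaves just the final context in the prompt.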
GPT-3 uses the training method described in the paper "Language Models are Unsupervised Multitask Learners": https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf
All models were trained on V100 GPUs on part of a high-bandwidth cluster provided by Microsoft.