A recent paper titled "Better & Faster Large Language Models via Multi-token Prediction" (arXiv:2404.19737v1) introduces a simple but effective modification to the standard language modeling training loss that significantly improves performance, inference speed, and reasoning capabilities of large language models, especially for code-related tasks.
The authors propose training language models to predict multiple future tokens at once, using a shared model trunk and independent output heads for each future token position. This multi-token prediction approach is compared to the standard next-token prediction loss through comprehensive experiments on both synthetic and natural datasets. A brief sketch of the architecture follows, and the key findings are then summarized in the fact table below.
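To make the shared-trunk/multi-head setup concrete, here is a minimal PyTorch sketch of a model with one independent output head per future token offset and a training loss that sums one cross-entropy term per head. The names (`MultiTokenPredictor`, `multi_token_loss`, `n_future`) and the architecture details (a toy transformer trunk, plain linear heads) are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiTokenPredictor(nn.Module):
    """Shared trunk followed by one independent output head per future token offset."""

    def __init__(self, vocab_size: int, d_model: int = 512, n_future: int = 4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Stand-in trunk: a small causal transformer encoder (illustrative only).
        self.trunk = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True),
            num_layers=4,
        )
        # Independent heads: head k predicts the token k steps ahead.
        self.heads = nn.ModuleList(
            [nn.Linear(d_model, vocab_size) for _ in range(n_future)]
        )

    def forward(self, tokens: torch.Tensor) -> list[torch.Tensor]:
        seq_len = tokens.size(1)
        causal_mask = nn.Transformer.generate_square_subsequent_mask(seq_len).to(tokens.device)
        hidden = self.trunk(self.embed(tokens), mask=causal_mask)  # (batch, seq, d_model)
        return [head(hidden) for head in self.heads]               # one logits tensor per head


def multi_token_loss(model: MultiTokenPredictor, tokens: torch.Tensor) -> torch.Tensor:
    """Sum of per-head cross-entropy losses, one term per future offset."""
    total = torch.zeros((), device=tokens.device)
    for k, logits in enumerate(model(tokens), start=1):
        # Head k targets the token k positions ahead; trim positions with no target.
        pred = logits[:, :-k, :]                 # (batch, seq - k, vocab)
        target = tokens[:, k:]                   # (batch, seq - k)
        total = total + F.cross_entropy(
            pred.reshape(-1, pred.size(-1)), target.reshape(-1)
        )
    return total
```

With `n_future = 1` this reduces to ordinary next-token training, which is why the comparison against the standard loss is a clean ablation: the trunk and data pipeline stay the same and only the number of heads changes.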
| Fact | Details/Context | Results/Metrics |