Can you give me a very clear explanation of the core assertions, implications, and mechanics elucidated in this paper?
Certainly! The paper "Let’s Think Dot by Dot: Hidden Computation in Transformer Language Models" by Jacob Pfau, William Merrill, and Samuel R. Bowman investigates whether transformer language models (LMs) can use meaningless filler tokens (e.g., repeated dots "......") in place of an explicit chain of thought, performing useful computation that is hidden from the visible output. Here’s a breakdown of the core assertions, implications, and mechanics discussed in the paper: