Xun Liang, Shichao Song, Zifan Zheng, Hanyu Wang, Qingchen Yu, Xunkai Li, Rong-Hua Li, Feiyu Xiong, Zhiyu Li
Large language models (LLMs) are expected to respond accurately but often exhibit deficient reasoning or generate hallucinatory content. To address these issues, studies prefixed with "Self-", such as Self-Consistency, Self-Improve, and Self-Refine, have been initiated. They share a commonality: involving LLMs evaluating and updating themselves to mitigate the issues. Nonetheless, these efforts lack a unified perspective on summarization, as existing surveys predominantly focus on categorization without examining the motivations behind these works. In this paper, we summarize a theoretical framework, termed Internal Consistency, which offers unified explanations for phenomena such as the lack of reasoning and the presence of hallucinations. Internal Consistency assesses the coherence among LLMs' latent layer, decoding layer, and response layer based on sampling methodologies. Expanding upon the Internal Consistency framework, we introduce a streamlined yet effective theoretical framework capable of mining Internal Consistency, named Self-Feedback. The Self-Feedback framework consists of two modules: Self-Evaluation and Self-Update. This framework has been employed in numerous studies. We systematically classify these studies by tasks and lines of work; summarize relevant evaluation methods and benchmarks; and delve into the concern, "Does Self-Feedback Really Work?" We propose several critical viewpoints, including the "Hourglass Evolution of Internal Consistency", the "Consistency Is (Almost) Correctness" hypothesis, and "The Paradox of Latent and Explicit Reasoning". Furthermore, we outline promising directions for future research. We have open-sourced the experimental code, reference list, and statistical data, available at this https URL.
URL: https://huggingface.co/papers/2407.14507
▲ Please explain the paper to a reader with no relevant technical or research background; be clear and concise.
This paper explores the problem of large language models (LLMs) like ChatGPT sometimes making mistakes or generating inaccurate information. The authors argue that these issues stem from a lack of "internal consistency" within the model.
Think of the model as a person trying to answer a question. If that person is confused about their own thoughts or reasoning, they are likely to give a wrong or inconsistent answer. Similarly, LLMs can struggle with internal consistency, leading to errors.
The paper proposes a framework called "Self-Feedback" to address this issue. This framework involves the LLM evaluating its own output and then using that feedback to improve its responses. It's like a human thinking about their answer, realizing a mistake, and then correcting it.
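To make this loop concrete, here is a minimal sketch in Python. The `llm` argument is a hypothetical prompt-to-completion function standing in for any LLM API, and the prompts and stopping rule are illustrative assumptions, not the paper's exact method:

```python
from typing import Callable

# A minimal sketch of a Self-Feedback loop with its two modules,
# Self-Evaluation and Self-Update. `llm` is a hypothetical function
# mapping a prompt string to a completion string.

def self_feedback(llm: Callable[[str], str], question: str,
                  max_rounds: int = 3) -> str:
    answer = llm(question)
    for _ in range(max_rounds):
        # Self-Evaluation: the model critiques its own answer.
        critique = llm(
            f"Question: {question}\nAnswer: {answer}\n"
            "Point out any errors, or reply exactly 'OK' if the answer is correct."
        )
        if critique.strip() == "OK":
            break  # the model judges its own answer consistent; stop refining
        # Self-Update: the model revises the answer using its own critique.
        answer = llm(
            f"Question: {question}\nAnswer: {answer}\nFeedback: {critique}\n"
            "Rewrite the answer so it addresses the feedback."
        )
    return answer
```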
The paper then dives into different ways LLMs can self-evaluate and update themselves. It explores various techniques, including:
- Reasoning Topologically: Helping the model find the best reasoning path to reach a correct answer.
- Refining with Responses: Having the model generate multiple answers and then selecting the best one (a minimal sketch of this idea appears after this list).
- Multi-Agent Collaboration: Using multiple LLMs to work together and debate to reach a consensus.
- Refining the Response Iteratively: Having the model refine its own answer step-by-step until it's accurate.
- Mitigating Hallucination while Generating: Helping the model avoid generating false information as it creates text.
- Decoding Truthfully: Improving the model's ability to choose the most accurate words when generating text.
- Activating Truthfulness: Enhancing the model's internal representation of truthfulness to make it more likely to generate accurate information.
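As a toy example of the "Refining with Responses" idea, here is a sample-and-vote sketch in the style of Self-Consistency. The `sample_answer` function is a hypothetical stand-in for an LLM call with nonzero temperature; majority voting over the samples is one simple way to pick the most internally consistent answer:

```python
from collections import Counter
from typing import Callable

# Sample several answers and keep the one the model produces most often.
# Per the paper's "Consistency Is (Almost) Correctness" hypothesis, the
# most consistent answer is also the most likely to be correct.

def most_consistent_answer(sample_answer: Callable[[str], str],
                           question: str, n_samples: int = 5) -> str:
    answers = [sample_answer(question) for _ in range(n_samples)]
    # Majority vote over the sampled answers.
    return Counter(answers).most_common(1)[0][0]
```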
The paper concludes by discussing the challenges and future directions for this research, emphasizing the need for further work on improving LLMs' internal consistency to make them more reliable and accurate.