@yoavg
Created September 9, 2024 20:23

Is telling a model to "not hallucinate" absurd?

Can you tell an LLM "don't hallucinate" and expect it to work? My gut reaction was "oh, this is so silly", but upon some reflection, it really isn't. There is actually no reason why it shouldn't work, especially if the model was preference-fine-tuned on instructions containing "don't hallucinate" — and if it is a recent commercial model, it likely was.

What does an LLM need in order to follow an instruction? It needs two things:

  1. an ability to perform the task. Something in its parameters/mechanisms should be indicative of the task objective, in a way that can be influenced. (In our case, it should "know" when it hallucinates, and/or should be able to change or adapt its behavior to reduce the chance of hallucination.)
  2. an ability to ground the instruction: the model should be able to associate the requested behavior with its parameters/mechanisms. (In our case, the model should associate "don't hallucinate" with the behavior described in (1).)

Number (2) is easy to achieve with fine-tuning, assuming (1) exists. Does (1) exist? There is evidence that yes, it does. Presumably, "retrieving from memory" and "improvising an answer" are two different model behaviors, which use different internal mechanisms. Indeed, we can probe a model's inner layers and infer whether it is "lying"1 or whether "the question is unanswerable"2. These are very much related to hallucinations. And if we can do it from the outside, why can't the model use this signal internally when fine-tuned on contrastive examples, as happens in preference fine-tuning? Another trainable behavior that can reduce hallucinations is making the output distribution sharper, to reduce the chance of sampling a wrong continuation (only when the answer is in the parameters, of course).
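For concreteness, here is a rough sketch of the kind of probing this refers to: training a small classifier on a model's hidden states to flag inputs where the model is likely to improvise rather than retrieve. The model name, layer choice, and toy labels below are my own placeholders, not the setup of the cited papers.

```python
# Minimal sketch: fit a linear probe on hidden states to predict whether the
# model "knows" the answer. Model, layer index, and labels are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sklearn.linear_model import LogisticRegression

model_name = "gpt2"  # any causal LM that can return hidden states
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True)
model.eval()

def hidden_features(prompt: str, layer: int = -1):
    """Hidden state of the last token at a chosen layer."""
    with torch.no_grad():
        out = model(**tok(prompt, return_tensors="pt"))
    return out.hidden_states[layer][0, -1].numpy()

# Hypothetical labeled prompts: 1 = answer plausibly "in memory",
# 0 = the model would have to improvise.
prompts = [
    "What is the capital of France?",
    "Who wrote Hamlet?",
    "What did I eat for breakfast today?",
    "What is my neighbor's middle name?",
]
labels = [1, 1, 0, 0]

X = [hidden_features(p) for p in prompts]
probe = LogisticRegression(max_iter=1000).fit(X, labels)
# probe.predict(...) can now flag new inputs where the model is likely to improvise.
```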

So, given these pieces of information: yes, an LLM can be trained to reduce hallucinations upon request. And given the prominence and popularity of the term, strong new models were likely trained for exactly that. This is not an absurd or silly instruction. Maybe it is absurd in the sense that you have to explicitly request it, rather than the model being trained to always reduce hallucinations. On the other hand, maybe always trying to avoid hallucinations has other undesired consequences that model trainers and product managers would like to avoid. (Actually, if you are in a position to know of such undesired consequences, and are free to tell the world about them, I would be really curious to learn more!)

Footnotes

  1. https://arxiv.org/abs/2304.13734

  2. https://arxiv.org/abs/2310.11877

@shauray8

I agree with @impredicative's statement. Larger models handle uncertainty better, and allowing them the freedom to reject flawed inputs significantly reduces hallucinations. Also, using the "h-word" explicitly might push the model into hallucinating more, as evidenced by https://arxiv.org/abs/2402.07896. So it's often better to instruct the model in different ways.

@impredicative

impredicative commented Sep 14, 2024

I now have two public-interest projects, podgenai (using gpt-4-0125-preview) [prompts] and newssurvey (using gpt-4o-2024-08-06) [prompts], both of which use reasonably tailored prompts to almost completely eliminate hallucinations from long text outputs. At the very least, you may struggle to find any in their outputs.

It requires some iterative updating of the prompts, improving them based on the results until things stabilize.

@impredicative

impredicative commented Nov 21, 2024

I keep getting asked in job interviews, "But how do you get the LLM to not hallucinate?", as if the interviewer even knows what the correct answer is. As a practitioner, to get a pretrained model to hallucinate less, I use "P.A.I.R.":

  • P: prompt engineering and tuning, with an example if necessary. It goes without saying that one must use a very good instruction-tuned model.
  • A: augmented generation, on the assumption that the model is less likely to hallucinate if it has to depend less on its internal knowledge and more on the externally provided knowledge added to the prompt context.
  • I: iterative refinement of the model's output using a second prompt. I don't always do this, and it's not always necessary, but if the quality of the output could use further improvement, I feed the output back in as the input and have the model review it for accuracy and everything else. I do this until convergence, meaning until the output stops changing, while also instructing the model not to nitpick. It is useful to add a maximum iteration count to prevent an infinite loop; a minimal sketch of this loop follows the list.
  • R: rejection of the output as a valid, explicit choice for the model if it deems the input unworkable. Sometimes the input is just bad, for any number of reasons, and it's best to let the model reject it, optionally with an explanation, instead of forcing it to work with inconsistencies that could risk a poor-quality output.
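Here is a minimal sketch of the iterative-refinement ("I") step, assuming an OpenAI-style chat client; the model name, prompt wording, and convergence check are illustrative, not the exact ones from the projects above.

```python
# Minimal sketch of the "I" step: feed the output back as input and have the
# model review it, stopping when the text stops changing or after max_iters.
# The client, model name, and prompt wording are assumptions for illustration.
from openai import OpenAI

client = OpenAI()

def refine(draft: str, max_iters: int = 3, model: str = "gpt-4o") -> str:
    for _ in range(max_iters):
        resp = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system",
                 "content": ("Review the text for factual accuracy and clarity. "
                             "Fix real problems only; do not nitpick. "
                             "Return the corrected text.")},
                {"role": "user", "content": draft},
            ],
        )
        revised = resp.choices[0].message.content.strip()
        if revised == draft:  # converged: the output stopped changing
            return revised
        draft = revised
    return draft  # max iteration count prevents an infinite loop
```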
