willf/llm-reading-list.md

## llm-reading-list.md

      
    Raw
  

              llm-reading-list.md
            
          
    The Walugui Effect:  After you train an LLM to satisfy a desirable property P, then it's easier to elicit the chatbot into satisfying the exact opposite of property P.
On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? In this paper, we take a step back and ask: How big is too big? What are the possible risks associated with this technology and what paths are available for mitigating those risks? We provide recommendations including weighing the environmental and financial costs first, investing resources into curating and carefully documenting datasets rather than ingesting everything on the web, carrying out pre-development exercises evaluating how the planned approach fits into research and development goals and supports stakeholder values, and encouraging research directions beyond ever larger language models.
# ChatGPT Is a Blurry JPEG of the Web For us to have confidence in [Large language models], we would need to know that they haven’t been fed propaganda and conspiracy theories—we’d need to know that the jpeg is capturing the right sections of the Web. But, even if a large language model includes only the information we want, there’s still the matter of blurriness. There’s a type of blurriness that is acceptable, which is the re-stating of information in different words. Then there’s the blurriness of outright fabrication, which we consider unacceptable when we’re looking for facts. It’s not clear that it’s technically possible to retain the acceptable kind of blurriness while eliminating the unacceptable kind, but I expect that we’ll find out in the near future.
Will AI Become the New McKinsey? Another article by Ted Chaing. Like McKinsey, the consulting firm, LLM will act as the "capital's willing executioners," providing an "escape from responsibility." Has some interesting notes about why the productivity gains of the personal computer and internet age haven't led to improvements for the middle class, and how LLMs will only exacerbate the problem unless "we" working on getting LLMs to work on behalf of workers.
Concrete Problems in AI Safety. Specific examples of misalignments between human goals and AI. Written by people deeply embedded in large language model development and marketing (Google Brain, OpenAI, Stanford, Berkeley).
Wikipedia article on AI Alignment. In the field of artificial intelligence (AI), AI alignment research aims to steer AI systems towards their designers’ intended goals and interests. An aligned AI system advances the intended objective; a misaligned AI system is competent at advancing some objective, but not the intended one. Alignment problem perhaps first discussed by Norbert Wiener # Some Moral and Technical Consequences of Automation in 1960!
Prompt Injection Explained and why AI won't defeat it. (In security, 99% is a failing grade!)