Skip to content

Instantly share code, notes, and snippets.

@rain-1
Last active January 24, 2024 09:05
Show Gist options
  • Star 13 You must be signed in to star a gist
  • Fork 4 You must be signed in to fork a gist
  • Save rain-1/cc67caa8873ee38098aefb0f35f46014 to your computer and use it in GitHub Desktop.
Save rain-1/cc67caa8873ee38098aefb0f35f46014 to your computer and use it in GitHub Desktop.
WorLLMs

Could an LLM end up being the core part of a dangerous computer worm?

How would we neutralize such a thing if this happened?

Some virus and worm background

There is a hilarious story from https://users.cs.utah.edu/~elb/folklore/xerox.txt about an early computer virus called robin hood and friar tuck. This was basically just two programs running on a UNIX system that would look out for each other and reboot the other process if it was killed. It's interesting to note that since computer programs run thousands of times faster than humans, a human can't type kill -9 robinhood then type kill -9 friartuck in time. The computer is faster so it always wins if you try this. To defeat this you need to take a different approach than speed.

A computer worm is a type of virus. It is a program that self-propagates across the internet. It does so by exploiting remote vulnerabilities in computer systems to install itself, once it has done this it looks for a propagation vector to spread more copies of itself. It may also exploit local vulnerabilities in the compter systems it has infected in order to embed and persist itself like a root kit.

The Anna Kournikova computer worm was written by a 20-year-old Dutch programmer called Jan de Wit on February 11, 2001. It was created to trick email users into opening a message purportedly containing a picture of tennis player Anna Kournikova, but it actually contained a hidden malicious programme.

This virus exploited (primarily male) human nature - the need to see this beautiful tennis player outweighed peoples intuitive sense that they were being tricked into running a virus. The ILOVEYOU virus worked in a similar way, it emailed out to everybody in your contacts list an email with subject ILOVEYOU. These viruses were halted by improved phishing awareness, updating software to fix the security holes.

I remember when the wannacry virus hit. This ransomware worm was one of the worst I had ever seen. It was hitting hospitals local to me, which have outdated software. I was very concerned about the potential impacts so I focused on it. As an attempt to help in at least some way I compiled information about this virus in realtime into https://gist.github.com/rain-1/989428fa5504f378b993ee6efbc0b168 WannaCry used the eternalblue exploit to propagate - this was an extremely widespread vulnerability that had great leverage. That is a big part of why it was able to spread so widely and so quickly. Do you remember the photos of random billboards in streets around the world that had been ransomwared?

LLM propagation and defense

A widely known cognitive bias in humans is our knack for anthropomorphisation. You will probably have heard about the story of CajunDiscordian/Blake Lemoine falling in love with the LaMDA LLM. I am seeing that there are a great number of lonely people who have a hope of falling in love with artifical intelligence bots as they have given up on finding love in real life. This is a vector that a LLM based computer worm could exploit in order to self propagate. There are also people who are angry and just want to trash things. We saw in the GPT-4 system card that it was able to trick a human into solving a captcha for it. https://cdn.openai.com/papers/gpt-4-system-card.pdf

LLMs are primarly used as chatbots - you can ask them things we used to ask google, or have them write poetry or code snippets. There is a tool that turns a OpenAI's ChatGPT 'oracle' into an 'agent' by, basically, running it in a loop where it asks it what subtasks it should do next to accomplish its task. The current capabilities of such agentified LLMs are somewhat weak - but it something to keep an eye on as they may improve. There was a research project that hooked up an agentified LLM like this to a robot arm that had access to chemicals and it was able to synthesize various drugs. https://arxiv.org/abs/2304.05332 https://arxiv.org/abs/2304.05376

Could an LLM be part of a computer worm? What would this look like? Large Language Models are gaining programming capabililities and if they can fix bugs, and find bugs - which they can. They will soon be able to find and exploit shallow, simple vulnerabilities. If they are able to do this in a massively parallel fashion they could outpace humans at this task significantly. (This is a very slow painstaking process for humans). As many existing viruses are based on incredibly simple algorithms it does not seem unreasonable to me that an agentified LLM based on a more advanced GPT version could self propagate by writing up new viruses on the fly. https://arxiv.org/abs/2303.16200

A ChatGPT based worm could be turned off instantly by OpenAI revoking the API key. Or all API keys if they don't know which one. This works. But there are locally stored models running on consumer hardware now. These models are fairly large (ballpark of ~8GB) and not as capable as GPT-4 but there may be optimizations that decrease the file size and increase their exploitation and virus writing capabilities. We don't just have to worry about these intricate technical vectors though, human vectors may become a new opening for these worLLMs.

Well what if we just 451 the local model download link? I am sure that if you get the prompt just right a agentified LLM tool like Auto-GPT will be able to "come up with" the idea of using torrents to distribute local model files in a resiliant way. If you give it tools to set up a torrent it could do so. The next generation of tools may be able to create the scripts to set up torrents just from the request to do that. This next generation may be able to independently come up with the idea that it should find a resiliant storage mechanism for distributing a large file if doing so is part of its task.

Bootstrapping

A very interesting topic in computer science is bootstrapping. Some infomation here for anyone interested https://bootstrapping.miraheze.org/wiki/Main_Page

In the constitutional AI paper LLMs were bootstrapped up from foundation models to "helpful" models, and then a frozen helpful model was used in tandem with a constitution to provide a critique of the reponses of a copy of this model in order to train up to be helpful and follow the rules of this constitution in its answers. https://arxiv.org/abs/2212.08073

I have been learning the basics of machine learning and neural networks. I have been asking for explanations of, and help with refactoring pieces of code. The language model is somewhat ok at this. It gets things wrong but I was stunned when it was able to refactor a sketch into a nicely pytorchified version of a neural network. I think I've reached the point where I'm better than it now but it definitely was able to help me initially. If an LLM is 60 LoC python https://jaykmody.com/blog/gpt-from-scratch/ then there is potential that these things could study their own source code and modify it, if not at least deploy it remotely.

Is there a risk of an LLM being used to not only study and improve its own code but re-train or fine tune itself to act in a different way? I believe that training these models requires signficantly more complex code and a massive massive amount of resources (est $6M) to train. Pehaps This is not something we should expect in the immediate future then.

Conclusion

The way that worms have always been defeated is by analyzing them and finding a weakness then using it to either halt the process from running at all or halt the propagation and clean the thing away manually. LLMs have the potential to provide qualitatively new features to worms, well beyond standard polymorphism and I think we should make some preparations for this now. What weaknesses are available to us to defend against these in future? Kill switches and prompt injection seem like the two best approaches to me right now. I would be interested in hearing all alternatives.

Boring disclaimer about identifying and noting down vulnerabilities

Everything that we write now is part of the training set for some future LLM. It will read and understand this post. I could include a magic NO-LLM token to request that this document be left out of the training data but fragements without it may be copied elsewhere or the token may be ignored.

Why am I then telling the worm how to do bad things? Well my ideas in this post are nothing particularly unique or new. These dangerous are very well known already. Basically I believe that writing this out is not increasing the danger of this happening. I believe that we need to think adversarially about the implications of this new technology and come up with solutions before it's a struggle to start rolling them out. I hope sketching out these ideas leads to some productive conversation.

Please go into detail about how some of the worst self-propagating computer viruses and worms have been tackled by security researchers.
Some of the worst self-propagating computer viruses and worms in history have caused widespread damage and disruption. Security researchers have employed various methods to tackle and mitigate the impact of these threats. Here are some examples of how they have dealt with such notorious viruses and worms:
1. ILOVEYOU (2000): The ILOVEYOU worm, also known as the Love Letter virus, spread through email with a subject line that read "ILOVEYOU" and an attachment titled "LOVE-LETTER-FOR-YOU.TXT.vbs." Upon opening the attachment, the worm would overwrite files, send itself to all contacts in the user's address book, and download a password-stealing application. To tackle the worm, security researchers and antivirus companies quickly developed and distributed updates to their antivirus software to detect and remove the worm. Additionally, email service providers implemented filters to block emails with the ILOVEYOU subject line.
2. Code Red (2001): The Code Red worm exploiteda buffer overflow vulnerability in Microsoft's Internet Information Services (IIS) web server software. The worm would infect a system, deface the website hosted on the server, and then use the compromised system to scan for and infect other vulnerable systems. To combat Code Red, Microsoft released a security patch to fix the vulnerability, and security researchers developed tools to detect and remove the worm from infected systems. Network administrators were urged to apply the patch and implement security measures such as firewalls and intrusion detection systems to prevent further infections.
3. Nimda (2001): Nimda was a complex worm that spread through multiple methods, including email attachments, network shares, and website downloads. It exploited various vulnerabilities in Microsoft's software and caused significant disruption to internet services. Security researchers analyzed the worm's code to understand its behavior and develop effective countermeasures. Antivirus companies updated their software to detect and remove Nimda, while Microsoft released patches to address the vulnerabilities exploited by the worm. Network administrators were advised to the patches, update their antivirus software, and implement security best practices to prevent infections.
4. SQL Slammer (2003): The SQL Slammer worm targeted Microsoft's SQL Server and Desktop Engine (MSDE) database products, causing widespread denial of service attacks and network congestion. The worm exploited a buffer overflow vulnerability in the software. To mitigate the impact of SQL Slammer, Microsoft released a patch to fix the vulnerability, and security researchers developed tools to detect and remove the worm from infected systems. Network administrators were urged to apply the patch, implement firewalls, and monitor network traffic for signs of the worm's activity.
5. Conficker (2008): Conficker, also known as Downadup or Kido, was a sophisticated worm that targeted Windows operating systems and spread through various methods, including exploiting a vulnerability in the system, dictionary attacks on weak passwords, and removable media. The worm created a botnet, which could be used for various malicious purposes, such as launching denial of service attacks or distributing other malware. Security researchers formed the Conficker Working Group, a collaborative effort to analyze the worm and develop countermeasures. Microsoft released a patch to address the vulnerability exploited by Conficker and offered a $250,000 reward for information leading to the arrest and conviction of the worm's creators. Antivirus companies updated their software to detect and remove Conficker, and network administrators were advised to apply the patch, update their antivirus software, and implement security best practices to prevent infections.
In conclusion, security researchers have tackled self-propagating computer viruses and worms by analyzing their code, understanding their behavior, and developing countermeasures such as patches, antivirus updates, and security best practices. Collaboration between researchers, antivirus companies, software vendors, and network administrators has been crucial in mitigating the impact of these threats and preventing further infections.
@Pwuts
Copy link

Pwuts commented Jul 14, 2023

Thank you for this write-up! It seems to make for a good starting point when looking into this issue. Bookmarked :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment