
Last active July 29, 2023 19:32
A transcript of an interview I did for The Verge on March 6, 2023 about LLaMA, Facebook's new 65-billion-parameter language model that was recently leaked to the internet:

The Verge: "Meta’s powerful AI language model has leaked online — what happens now?"

Could you confirm that you downloaded the LLaMA series from 4chan? Were you able to get it running yourself, or did you just repackage the download? (I was a bit confused reading your tweets about exactly what you'd done there, so if you're able to explain that, it'd be great.)

I downloaded it from Facebook, actually. You can find some details here.

Basically, the sequence of events was:

  1. I heard about llama, and wanted to try it out for myself.

  2. I started downloading it via the torrent, which originated from 4chan.

  3. The torrent was going to take more than 16 hours to complete. I didn't want to wait that long, so I started looking at the partially downloaded files the torrent had already fetched. One of those files turned out to contain a fully working download link: the original link that Facebook sent to the person who leaked llama on 4chan. That person had accidentally included it in the torrent, and it downloaded llama directly from Facebook at 220MB/s. So I was very happy to be able to start experimenting with llama within ~30 minutes, and decided to tell everyone about this "secret cheat code" to access llama.

Within 2 hours of telling everyone about the download link, Facebook shut it off. Someone tweeted to me "Hey, it's down!" and I replied "Yeah, I figured that'd happen. Oh well, I'm just happy it helped a few of you."

... Then I realized that I didn't have to accept defeat.

One of the biggest challenges with distributing very large models (as we discovered when we tried to distribute GPT-2 1.5B for our GPT Chess work) is the sheer quantity of data involved. Normally, every provider charges you an "egress fee": a certain number of dollars per GB transferred. So if millions of people download a model that I want to distribute, that would usually result in thousands of dollars of fees.

But Cloudflare launched a service called R2, which has zero egress fees. I'd seen someone use it before for distributing a model, so I thought I'd try the same trick.
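The egress trade-off above is simple multiplication: downloads × size × price per GB. Here is a minimal sketch of that arithmetic. The 250 GB download size and the $0.09/GB price are assumed, hypothetical inputs, not figures from the interview. The sketch also ignores storage and request charges, which a zero-egress service like R2 still bills separately.

```python
def egress_cost_usd(downloads: int, gb_per_download: float, usd_per_gb: float) -> float:
    """Bandwidth bill for serving a file set: downloads x size x per-GB egress price."""
    return downloads * gb_per_download * usd_per_gb


# Hypothetical inputs for illustration only (not figures from the interview):
GB_PER_DOWNLOAD = 250.0        # assumed rough size of a full set of model weights
TYPICAL_USD_PER_GB = 0.09      # assumed egress price at a conventional provider
ZERO_EGRESS_USD_PER_GB = 0.0   # a zero-egress service such as R2 (storage/requests ignored)

for downloads in (100, 1_000, 10_000):
    typical = egress_cost_usd(downloads, GB_PER_DOWNLOAD, TYPICAL_USD_PER_GB)
    zero = egress_cost_usd(downloads, GB_PER_DOWNLOAD, ZERO_EGRESS_USD_PER_GB)
    print(f"{downloads:>6} downloads: ${typical:,.0f} vs ${zero:,.0f} with zero egress")
```

Because the cost is linear in the number of downloads, a popular model's bandwidth bill grows with its popularity. Zero egress removes exactly that term.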

When Facebook shut off the download link, it was around 2am and I was partying with my friends. But the idea of pulling this off was too enticing, so I ran off to the basement and pulled up my Cloudflare dashboard. 90 minutes later, I had successfully mirrored everything to R2 and updated the download script to use my mirror instead of Facebook's.
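The interview doesn't show the updated download script. The core change, though, is pointing each file request at a new host while keeping the same file paths. The sketch below is a minimal Python illustration of that idea under stated assumptions. The hosts and file names are hypothetical. It also assumes the original query string carried only per-recipient access parameters, so the mirror URL drops it.

```python
from urllib.parse import urlsplit, urlunsplit


def to_mirror(original_url: str, mirror_base: str) -> str:
    """Point a download URL at a mirror host while keeping the same file path.

    The original query string is dropped on the assumption that it only carried
    per-recipient access parameters, which a public mirror does not need.
    """
    original = urlsplit(original_url)
    mirror = urlsplit(mirror_base)
    path = mirror.path.rstrip("/") + original.path
    return urlunsplit((mirror.scheme, mirror.netloc, path, "", ""))


# Hypothetical hosts and file names, for illustration only.
ORIGIN = "https://origin.example.com"
MIRROR = "https://mirror.example.com/models"
files = ["7B/params.json", "tokenizer.model"]

for name in files:
    print(to_mirror(f"{ORIGIN}/{name}?token=abc123", MIRROR))
```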

After I did that, my announcement tweet went viral (>140k views and counting) and I assume thousands of people were able to download llama for themselves, which was very gratifying.

I saw some speculation that the way the model had been shared meant it was traceable, and some suggested it was being shared by a Meta employee. Could you explain what's possibly happening there?

Ok, if I understand correctly, there are two parts to this question. One, did the model come from a Meta employee? And two: Can Facebook track down whoever leaked it?

I don't think it came from a Meta employee. Facebook was sharing llama with select, hand-picked research engineers and groups. You could basically "apply for access to llama." Then once Facebook looked you over, they'd send you a download link.

Unfortunately for them (and very fortunately for the world), they sent a download link to someone who took it upon themselves to leak the model.

Can Facebook track down whoever leaked it?

I think so. The way the approval process was set up probably looked like this: Someone would apply for access to llama, Facebook would OK them, and then they'd go into a dashboard somewhere and generate a download link specifically for that person. So they certainly have a list of all the download links that they've sent out to people. I'd be shocked if they don't simply go through that list, look at the URLs one by one, and find the one that matched the URL included in the torrent.
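The matching step guessed at above is essentially a reverse lookup from each issued URL back to its recipient. Here is a minimal sketch, assuming Facebook kept a mapping from applicant to generated link. The interview only speculates that such a list exists, and the applicant names and URLs here are hypothetical.

```python
from typing import Dict, Optional


def build_link_index(issued_links: Dict[str, str]) -> Dict[str, str]:
    """Map each issued download URL back to the applicant it was generated for."""
    return {url.strip(): applicant for applicant, url in issued_links.items()}


def find_leaker(issued_links: Dict[str, str], leaked_url: str) -> Optional[str]:
    """Return the applicant whose personal link appears in the leak, or None."""
    return build_link_index(issued_links).get(leaked_url.strip())


# Hypothetical applicants and per-person URLs, for illustration only.
issued = {
    "lab-a": "https://example.com/llama?token=aaa111",
    "lab-b": "https://example.com/llama?token=bbb222",
}
print(find_leaker(issued, "https://example.com/llama?token=bbb222"))
```

Building the index once makes each lookup a dictionary access, rather than a scan of every issued link.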

How easy is this model to use? I assume it takes a bit of expertise to go from downloading the torrent to generating text?

Ok, so there's an interesting story here -- and I mean from a reporter's perspective, not just a nerd's perspective, so I'll try to explain it fully. The situation was very surprising; to me, at least.

Over the past several days, the consensus on Twitter was that llama sucks. Example:

I've seen this kind of thing before, where some research group makes grandiose claims, but they're very cagey about showing their results. Either there aren't any publicly available results, or (as in Facebook's case) they try to control who gets access to it.

So until yesterday, I was looking at all these tweets saying "llama sucks, look at this horrible output" and thinking "Oh gosh, here we go again. It's a lemon, not a llama -- it doesn't work."

So it took very little expertise to go from downloading the torrent to generating text, because Facebook provided some code that you're supposed to use to generate the text:

Turns out, that code sucks. I really don't want to be too harsh on them, since it's easy to underestimate just how important it is to get the default settings exactly right. But their defaults were all screwed up. They didn't use "Top K". They used Top P, which I never got good results from (either identical to top k or slightly worse). Their default temperature was 0.8, which was way too high. And worst of all, they didn't have a repetition penalty -- so by default, this thing would just yammer on and on about exactly the same thing.

E.g. this is what happens when you don't have a repetition penalty:


So it's like ... duh, of course the outputs would suck. But this is kind of specialized knowledge, y'know? It's not like most people realize "Oh, 0.8 temperature is way too high, but 0.7 is perfect, and it makes a seriously huge difference if you use one or the other." So nobody on Twitter realized any of this.
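Temperature T rescales the model's scores (logits) z_i before they are turned into probabilities. The worked example below uses two hypothetical logits to show the mechanism. Lowering T sharpens the distribution toward the top token, and that shift compounds over every token of a long generation. This illustrates the effect only; it does not by itself show that 0.7 is the best value.

```latex
p_i(T) = \frac{\exp(z_i / T)}{\sum_j \exp(z_j / T)}

% Two candidate tokens with hypothetical logits z_1 = 1, z_2 = 0:
p_1(T) = \frac{e^{1/T}}{e^{1/T} + e^{0}} = \frac{1}{1 + e^{-1/T}}

p_1(0.8) = \frac{1}{1 + e^{-1.25}} \approx 0.777
\qquad
p_1(0.7) = \frac{1}{1 + e^{-1/0.7}} \approx \frac{1}{1 + e^{-1.429}} \approx 0.807
```

In this toy case, the runner-up token's chance per step drops from about 22% to about 19%.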

I sat down and added those things one by one (the code is at, and presto: right away, it was clear that llama was something incredible.
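The settings described above (a repetition penalty, a lower temperature, and top-k filtering) can be sketched in plain Python. This is a minimal illustration, not Facebook's sampling code or the author's patched version. The penalty uses the CTRL-style rule, as in Hugging Face's `RepetitionPenaltyLogitsProcessor`: for tokens already generated, positive logits are divided by the penalty and negative ones multiplied by it. The 0.7 temperature follows the interview. `top_k=40` and `repetition_penalty=1.2` are assumed placeholder values.

```python
import math
import random
from typing import List, Optional, Sequence


def apply_repetition_penalty(logits: List[float], previous_tokens: Sequence[int],
                             penalty: float) -> List[float]:
    """Lower the score of every token that has already been generated."""
    out = list(logits)
    for tok in set(previous_tokens):
        # With penalty > 1, dividing a positive logit or multiplying a
        # negative one always makes that token less likely.
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out


def softmax(logits: List[float], temperature: float) -> List[float]:
    """Temperature-scaled softmax (subtracting the max for numerical stability)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_filter(probs: List[float], k: int) -> List[float]:
    """Keep only the k most likely tokens and renormalize."""
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in keep)
    kept = set(keep)
    return [probs[i] / total if i in kept else 0.0 for i in range(len(probs))]


def sample_next_token(logits: List[float], previous_tokens: Sequence[int], *,
                      temperature: float = 0.7, top_k: int = 40,
                      repetition_penalty: float = 1.2,
                      rng: Optional[random.Random] = None) -> int:
    """Penalize repeats, apply temperature, truncate to top-k, then sample."""
    rng = rng or random.Random()
    penalized = apply_repetition_penalty(logits, previous_tokens, repetition_penalty)
    probs = top_k_filter(softmax(penalized, temperature), top_k)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

Without the penalty step, a token that was just emitted keeps its high score, which is how a model ends up repeating itself.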

This model feels very close to DaVinci (OpenAI's GPT-3 model).

For example, I prompted it with "I am Lieutenant Commander Data, and I am an android." It spit out a full autobiography of Data from Star Trek.

The amount of detail that llama gets right is just incredible. I had it dump Star Trek related outputs for a couple hours, and it got most of the details correct -- down to saying "in 'The Visitor', Captain Sisko was sent forward in time due to an accident...." which was right.

And it's like that with everything. It's not infallible, but it's clear that an experienced prompt engineer will be able to get very impressive outputs from it. Probably as impressive as anything you'd get from GPT-3.

I haven't measured how good it is at generating code, but if the model has lots of programming knowledge, this might be a turning point in the history of GPT: for the first time, everybody has a GPT-3 on their home computer, with all the same kinds of knowledge that GPT-3 has. And that means it's extremely close to "everyone has ChatGPT on their home computer", because ChatGPT is just a fine-tuned version of GPT-3.

So everyone is very excited, because the moment someone figures out how to apply RLHF (the technique that OpenAI used to turn GPT-3 into ChatGPT), we'll all have our very own custom ChatGPTs, which we can do whatever we want with.

In my opinion as an experienced ML researcher, this will probably happen within two years at most. It's hard to estimate these things precisely, but an upper bound is that it's 100% guaranteed to happen within a decade, so I'm optimistic a clever research group will figure it out within a year or so.

So, imagine it. You'll have a ChatGPT on your laptop -- your very own, that you can use for whatever purposes you want. Personally, I'll be hooking it up to read my emails and let me know if anything comes in that I need to pay attention to, or hook it up to the phone so that it can schedule doctor's appointments for me, or deal with AT&T billing department, or a million other things. The tech exists right now, and I'd be shocked if no one turns it into a startup idea over the next few years. (There's already a service called GhostWrite, where you can let GPT write your emails on your behalf. So having one talk on the phone on your behalf isn't far behind.)

What do you think the effect of a leak of a model like this will be? Is it dangerous to have it "out in the wild" as it were, or possibly beneficial?

I think the benefits will outweigh the negatives over the long term. In general, it's hard to use a language model for evil. It takes talent, dedication, and time, which are things that bad actors tend not to have. So I think it's unlikely that someone is going to use this thing to prey on people. But there are certainly ways. I mentioned that I wanted to hook it up to a phone to schedule my doctor's appointments. You can imagine hooking it up to a phone and having it automatically try to scam grandmas out of their money by calling them up and tricking them into revealing their bank info. People like Kitboga try to fight against that kind of thing right now because humans do it: there's a human on the other end, and if Kitboga wastes their afternoon, that's an afternoon the scammer wasn't able to use to hurt anyone else. But if GPTs start doing that, they can do it all day, every day.

So you might look at that angle and (quite reasonably) think, oh gosh, we're all doomed! We're going to be plagued by robots forever now, and people are going to use them for all kinds of evil purposes.

But I don't think it'll be so bad. Technology has been causing problems since the beginning of humanity. Every time someone invents something new, it has profound effects that are hard to predict. The telephone, radio, the internet, etc. can all be used for evil too. But on the whole, they've made our lives much better. Would anyone choose to go back to the pre-internet era? Or pre-telephone, or pre-electricity? Sure, some people choose that way of life, but most of us like being able to microwave our food.

I think the proliferation of AI tech will be similar. I don't think that it's going to become dangerous -- at least not as quickly as everyone tends to believe. And look at all of the incredible things you can do now. It's a matter of time before children are able to have personalized teachers -- one teacher per child -- and not just a teacher, but an expert. They'll be able to learn at their own pace. And they'll be able to choose what they want to learn, too! Imagine having a personalized guitar tutor that never gets tired. That's something that will benefit all of us.

So I think there will be massive gains from this tech which will make our lives significantly easier, and that the gains will outweigh whatever badness comes from this by at least 10 to 1, if not a hundred or a thousand.

Thank you so much for all that detail Shawn - that's incredibly interesting and useful! (Though, a fair warning: probably not all of that will make it into the piece hahah... It's great context for me writing it, but yeah, I'm writing for a much more general audience!)

Of course! I would've been shocked if all of that made it into the article. :)

Super interesting also on the temperature settings and what it takes to get a good result out of this. I guess it's fair to say that it's a much rougher experience than ChatGPT etc., which will be most of our readers' nearest point of comparison?

Correct. Right now, most people expect to be able to say "You are now Data from Star Trek. What's your day like?" But that's ChatGPT. You can't think of this as a person that you're talking to. It's a completion engine, so you have to think of it like autocomplete. You've probably seen Gmail automatically suggesting things to say based on what you've typed so far; this is exactly the same thing, just a much more advanced version of it.

So you'd have to say "Simply put, the theory of relativity is" rather than ask it "What's the theory of relativity?"
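Because a completion engine only continues text, one option is to frame the prompt so that the most natural continuation is the answer you want. Below is a minimal sketch of the prefix style described in the interview. It also shows few-shot Q/A framing, a common alternative that is not mentioned in the interview. The example pairs are hypothetical. Both functions only build strings; they don't call any model.

```python
from typing import List, Tuple


def prefix_prompt(topic: str) -> str:
    """Start the sentence you want written, as in 'Simply put, X is'."""
    return f"Simply put, {topic} is"


def few_shot_prompt(question: str, examples: List[Tuple[str, str]]) -> str:
    """Show a few Q/A pairs so the most natural continuation is another answer."""
    shots = "".join(f"Q: {q}\nA: {a}\n\n" for q, a in examples)
    return f"{shots}Q: {question}\nA:"


examples = [  # hypothetical demonstration pairs
    ("What is the capital of France?", "Paris."),
    ("Who wrote Hamlet?", "William Shakespeare."),
]
print(prefix_prompt("the theory of relativity"))
print(few_shot_prompt("What's the theory of relativity?", examples))
```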

But within a year or two, I bet it'll be as smooth and as polished as ChatGPT is right now. And it'll be all yours, rather than one company's.

Something I'm curious about is whether you think this specific release will be a big milestone in the way that Stable Diffusion was: putting this power onto people's computers, as you say, and allowing for all these fine-tuning innovations, from DreamBooth to ControlNet. I guess I'm thinking that we have had some open source LLMs already from places like Eleuther... but is it just that LLaMA is better? That it's smaller? Or do you think there'll probably need to be new, better models out in the wild before we get our Stable Diffusion moment for text?

Yes, I think LLaMA is much better than previous open source model releases. When I created the books3 training dataset, that was actually when I was helping to start Eleuther, so I know firsthand just how hard it is to do this kind of work independently. It's extremely challenging. Most of us had infinite free time, and were working as hard as we could (just because it was such a cool project), and it still took us several months just to organize the training data, let alone train anything.

llama's benchmarks from their paper show that it goes toe-to-toe with GPT-3 DaVinci. And I have no reason to doubt those benchmarks. So I think it's very likely that this model release will be a huge milestone. The largest public language model so far is 20B. So imagine going from 20B to 175B (davinci), except it's only 65B, almost 3x smaller. That's a huge leap. For some concrete context: an A100 costs around $2/hr to run yourself, and you can run the largest llama model on one of those. Usually it requires 8 A100s, and an 8xA100 server is extraordinarily expensive for hobbyists to get their hands on. Whereas most of us either have access to an A100 or know someone who can let us use one for a bit.
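The hardware comparison above comes down to weight memory: parameters × bytes per parameter. The sketch below works through that arithmetic. It assumes an 80 GB A100 (a 40 GB variant also exists) and counts weights only. Activations and the KV cache need extra room, so the card counts are lower bounds. Under these assumptions, the 65B weights fit on a single card only at 8 bits per parameter or fewer.

```python
import math


def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Memory to hold the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


A100_GB = 80  # assumes the 80 GB A100; a 40 GB variant also exists

for name, n_params in [("LLaMA 65B", 65e9), ("GPT-3 175B", 175e9)]:
    for bits in (16, 8, 4):
        gb = weight_memory_gb(n_params, bits)
        cards = math.ceil(gb / A100_GB)  # lower bound: weights only
        print(f"{name} at {bits}-bit: {gb:.1f} GB of weights, at least {cards} x A100-80GB")

print(f"175B / 65B = {175 / 65:.2f}x")
```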

also — just to check again — how would you like to be referred to, as an independent AI researcher?

That works :)

And are there any other big projects you've been involved with? I've seen your name come up in the context of Books3, as I mentioned, but just wondered if there was anything else.

I post most of my work to Twitter (@theshawwn) so that's the most reliable way to see what I'm up to. Here are a few highlights I'm proud of:

But mostly I'm just happy to do the stuff you've seen: stuff that makes the rounds on Twitter and has a real impact on people.
