@msaroufim
Created March 12, 2024 02:45

How to build a Discord community

TL;DR: Be responsive, have a bold raison d’être, make sure people have both low-effort and high-effort things to do, impact the real world with as many artifacts as possible, and share the impact with external partners.

A lot of the leading applied research in ML these days is happening on Discord, so a common question I get asked is “Hey Mark, which Discord group should I join?”. That’s an easy enough question to answer these days: just subscribe to https://buttondown.email/ainews. But then I always make sure to remind people: “You should probably create your own Discord community”. I feel like people don’t quite like it when I say this because, well, how do you create a Discord community from scratch?

I’ve created 3 communities so far and each one has grown larger more quickly than the last so hopefully some of these lessons apply to you as well.

• Robot Overlords: Took about a year to reach 450 people.
• NeurIPS LLM Efficiency Competition: Took about 6 months to reach 1,300 people. Learn more here
• CUDA MODE: Took about 2 months to reach 4,500 people. Learn more here

If you’re working on creating a community and need some consulting advice, please let me know. In the meantime, the notes below capture most of what I’ve learnt building communities.

Robot Overlords

Before my Meta days I had written a few spicy blogs on my Substack, and at one point I was really overwhelmed by the number of emails I had to answer. Some of those discussions got quite repetitive too. I can’t tell you how many times I had to give people advice on whether they should get a PhD or not.

So I started to redirect people towards my first Discord channel The Robot Overlord Manual and that first community was very much centered around me. Whenever I wrote a new blog or did a new technical stream, people would show up and ask me questions.

Over time the brand of the channel became something akin to “Casual conversations with a senior engineer on your team” and I enjoyed it because I myself didn’t have many technical mentors until I was much older so it felt like I was doing something virtuous.

However, over time my output was decreasing. I wasn't as proud of some of the blogs I was writing, got busy at work, and started to write less, and very quickly things started to decay because I was a single point of failure. The internet favors consistency, but I had lost my motivation for spending more time on this particular community. I didn’t want to be the “hot takes” guy, and I learnt first-hand how a community dies: it’s not with a bang.

NeurIPS LLM Efficiency

The second time around we were building the NeurIPS LLM Efficiency Competition, where the goal was for competitors to finetune 1 LLM on 1 GPU in 1 day. We started a Discord server without too much thought, as a forum to clarify rules. However, because the competition was interesting and had a bold goal that resonated with a broader ethos in OSS, it attracted a bunch of attention. Some of that attention came from tail users and some from mega influencers like Sebastian Raschka, who wrote guides for a strong baseline entry.

So now, with a few hundred people in the server and the competition deadline drawing closer, many people were asking me for rules clarifications. The debates around whether my clarifications were correct got quite intense at times, with hundreds of comments, some demanding that I allow ChatGPT-generated datasets and others saying they would abandon the competition if I did. In these moments I became quite stressed but decided that I would do what I thought would make the competition more interesting. I started to learn why, for example, Jerome Powell needs to be so careful with his wording. However, the debates in the open were engaging and drew in more people.

What made the competition go into overdrive, though, was that Luca Antiga (a PyTorch OG) was very intrigued by it and wanted to test an early cloud offering Lightning.ai was building. He made it possible to submit evaluation jobs directly from a Discord channel, so now people could share results and discuss them even if they had no access to GPUs. We had turned Discord into a cloud service where we could directly interact with our users instead of working via proxies like cloud vendors.

This was when my understanding of communities started to evolve, and a lot of the lessons also apply if you’re trying to host a good party. You want to be nice and responsive (I was personally on call all day). The goal of the community needs to be bold and interesting (finetune on 1 GPU), not just chilling and especially not selling. And people need to have something to do (run evals on Discord). Interesting communities attract not only tail users but also hardcore engineers, because the effort itself is interesting. We had over 25 engineers spend significant amounts of their time making sure all the infra for the competition was solid.

If you apply these 3 lessons I can promise you will have an interesting community.

As a sidenote, when the competition ended the group again quickly decayed. I was burnt out from answering so many messages and couldn’t keep the group on life support, but the difference now was that I was very much OK with the group dying. Memento mori: the competition ended, so the intensity naturally had to go down and eventually die, and that’s OK. We learnt a lot from this experience and used it to make big strategic bets in PyTorch. When we resume v2 of the competition this year we will bring it back to life stronger.

We also had a physical workshop at NeurIPS which was covered again and again in the media. The talks, recorded by NeurIPS, featured a lineup of the top people in the finetuning community. One of the speakers, Tim Dettmers, was describing his process for authoring CUDA kernels, where he turns off lights, music and internet and just codes in what he called “CUDA MODE”, and that was the origin of the 3rd community I worked on.

CUDA MODE

The main thing I wasn’t too happy about with the NeurIPS competition was that by the end of it the competitors had not created useful artifacts. The outcome was “here are a few models that did well on some eval datasets”, not “here are some popular models that others are now downloading”. While I was stewing on this I watched Yannic Kilcher present https://github.com/LAION-AI/Open-Assistant, a dataset created by a community that is now widely used and cited. That was an aha moment for me: I wanted the next competition to be closer to a collaboration.

I wanted to bounce ideas off the authors, so I messaged the first author, Andreas Kopf, on Twitter. We got to talking about many things, but one thing that came up was that we both needed to get better at CUDA and couldn’t find good ways to go about it. So we created a small private Discord group with the genuine intent of just inviting experts and sharing notes.

As soon as rumors of the group’s existence spread, the FOMO intensified and I was getting hundreds of DMs from people asking to join, so we opened things up. Working with Andreas was a pleasure too, because it was my first time working with someone who was a true peer at building communities.

Somewhat serendipitously, Jeremy Howard, one of our keynote speakers from the NeurIPS competition, had started his own company, https://www.answer.ai/, where the majority of the hardware was consumer GPUs like the 3090. He was running into performance issues since most frameworks optimize for enterprise GPUs like the A100, and he signed up to give an intro to CUDA lesson, which blew the server up from about 1K to 3K people in less than a week.

I lied, it wasn’t totally serendipitous. The goal was interesting: many people wanted to learn CUDA, NVIDIA is on its way to becoming the largest company in the world, CUDA is hard, and more people are writing kernels and enjoying it, à la ggml. There was a lot of latent energy for kernel authors to hang out somewhere, but no such forum existed quite yet. So if you wish there was a forum to discuss exactly X, please create it; the demand is likely broader than you think.

Back to the lesson of having something to do: we always have something to do in the form of a weekly reading group where we go over both beginner and advanced performance topics, with the number of live attendees ranging from 50 to 200+. We have a lot more artifacts in the real world too: our lectures on YouTube, lecture resources on GitHub, working groups developing useful kernels, PyTorch core engineers getting feedback on their features, and people recruiting talented kernel authors to the group.

CUDA MODE is still ongoing and we will resume the NeurIPS competition in a few months. My main lesson from CUDA MODE is to dramatically increase the number of useful artifacts that a group creates in the world, and also to provide low-effort channels for people to be a part of the community, since writing a kernel is much more challenging than watching a lecture. I’m also much more willing to share moderation duties with external stakeholders, since the right ones have helped me scale this community much more quickly than I could have done alone.

I genuinely enjoy hanging out in these communities, but they also serve a real benefit to Meta in that we can observe trends in machine learning very quickly. The real problem becomes an abundance of feedback, and if you’ve ever worked with a cagey customer you’ll appreciate why an abundance of feedback is a problem I love to have.

Hopefully this was a useful read. It will have done its job if it encourages you to spend time creating and nurturing a new community, and if you are attempting to do so, I would love to hear from you.
