And if you care about the response, you have to transfer a new state back. It's possible with postMessage to transfer a buffer entirely by reference, and that solves some subsets of problems, but you still can't work on the same data at the same time. This really is a bottleneck, a coordination bottleneck, and it's something that we want to address. So while looking at different possible ways of handling this, it sort of became apparent that obviously native has handled this already. The operating system libraries have a slew of constructs that allow us to solve concurrent problems. Solve them well. And solve them in ways that each individual application can tailor for its own needs.
So that helped us come up with some design considerations for what our ideal solution may look like. Of course, we want native-like performance. That's the Holy Grail benchmark. Let's be as fast as we can be. We don't want to be dependent on the main event loop, if possible. Because everyone uses the event loop, and that -- as Simon pointed out -- can lead to some jankiness. Implementation versatility. What I mean by that is... every problem probably has an ideal answer. And so you want to allow the developer to come up with an answer that makes the most sense for them, instead of being constrained by a really high level set of utilities that force them to talk in a particular vocabulary.
We want to support -- this one may seem a little arbitrary, but we'll explain more later -- we want to support the algorithms and applications that are based on threads and pthreads. Threads have been around for a long time, and there's a lot of good computer science research on them, on how to use them really effectively. And all those algorithms have a lot of value. Not every single application, not every single problem can benefit from those algorithms. But where it is possible to directly apply them with minimal changes, it does seem like a win for the user. So it's something that we would like to support. And finally, support the extensible web philosophy.
Anyone out there familiar with the extensible web philosophy? Or has read the extensible web manifesto? A couple of people? It's worth explaining a little bit here. I'm sure every single person will have their own take on it, and I certainly invite you to go to the extensible web manifesto website to read about it in more detail. But at a high level, the thought is that the best innovation is going to come from developers -- from you guys. Sometimes standards bodies and browser developers can spend too much time iterating on an idea, trying to come up with something perfect, and by the time it comes out to where developers can use it, we didn't really hit the mark. So it seems like a better model for us to provide low level primitives that you can then build on and iterate on quickly in JS, instead of providing rich high level implementations where you don't have a lot of flexibility to tailor them to the way you want to work.
So... first we're going to do this with postMessage, and what I did here was build a sequence diagram so you can really understand exactly what I'm trying to do. Create a worker. Wait for that worker to tell us it's ready, and then we'll start working. And the goal of this incrementing worker -- really all it's going to do is talk over some kind of message channel back to the master thread, passing an integer each time and incrementing it when it gets an integer from the master. They're going to count up to some total, in this case I used 100,000, and when that message passing is done, the master will display some results. So the simplest scenario here is... the master says -- hey, start at zero. Go for 100,000 iterations. At the end, the master will have a variable that says 100,000. Not rocket science, but hopefully enough to illustrate some points. Here's an implementation of the master side. It's code and it's small, and I think it's worth calling out some lines here.
So we agree on 100,000 iterations, and the master posts to the worker to start at zero. The worker and the master will talk inside this callback, and the worker's data is going to come back as event data. It's going to be unpacked. When the count meets the terminal condition, it will end. Otherwise the master will pass the integer right back to the worker. On the worker's side, it's even simpler. All the worker does is unpack the master's integer, increment it, and send it right back on its way. So how does that work? How fast does that work? Well, for postMessage in this implementation on this laptop, I get about 54,000 messages per second.
I think the example is pretty simple. There's probably a variety of these that you use all the time, just for message passing between the worker and the master. Let's do that again, using shared memory, specifically using a shared Int32Array. So there are going to be some new constructs here, coming out of the shared memory document, that I would like to call out. This is actually too small for even me to read. Let me make that bigger. I want to make sure I'm on the same page with all of you. So the first line I'm highlighting here is SharedArrayBuffer. This is one of the core pieces of the concurrency technique that we've come up with, which allows us to share memory.
Basically you specify a size. And here is a Synchronic object, which is actually layered on top of what exists in the spec. What's in the spec is a little too low level to cleanly show in a presentation. The purpose of this is basically -- it's a waitable object. It's going to allow the master and the worker to coordinate. Using postMessage -- the only use of postMessage in this example -- we're passing the shared buffer to the worker, but because it's a shared buffer, the master will be able to use it as well. So this is a shared Int32Array. That's a view on the shared buffer. We're basically saying -- at this address, inside the shared memory region, it's an integer. Treat it like an integer. Give me integer semantics on it. And here, inside the callback, using the Synchronic, after incrementing the buffer, the master is saying -- all right, I'm done. So the worker will wake up. Now, in this example, unlike the example from before, the master is also incrementing the count. The only real reason for doing that is that inside this loop, the master would otherwise do nothing other than saying I'm done, without doing any real work. So I wanted it to do something to justify the existence of that loop, other than waiting for the worker to continue. Otherwise the worker would count to 1,000 instantly and be done without the master and the worker going back and forth. So here on the worker side we have once again that waitable object, the Synchronic, and the shared Int32Array view on the SharedArrayBuffer.
And on the loop side, we wait for the master to be done, increment the buffer, and then say we're done. So it looks nothing like the postMessage version. It's incredibly specific -- integers only. With postMessage you can pass in anything. But it does function, and it functions incredibly fast. Where before we had 54,000 messages per second, this time we have over 6 million messages per second. I think we can all agree that I cheated a little bit. Right? Unlike the postMessage model, where I can pass anything in and do anything I want with it, where it's synchronized for me, and where I have constructs so I don't have to worry about locking, all that nitty-gritty low level stuff is exposed in the shared memory version, so it's not really a drop-in replacement for postMessage. I think I can do better, and I'm going to try to do better in this second version. Because performance isn't everything. Sometimes the ergonomics of the implementation matter.
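The primitives named above can be shown in a few lines. This is a minimal, single-threaded sketch of my own, not the demo code: a `SharedArrayBuffer` as the shared region, an `Int32Array` view giving integer semantics, and `Atomics` operations for the atomic access (the Synchronic wrapper in the talk is layered on top of `Atomics.wait`/`Atomics.notify`).

```javascript
// A SharedArrayBuffer is raw shared memory; you specify a size in bytes.
const sab = new SharedArrayBuffer(4);       // room for one 32-bit integer

// An Int32Array view says: at this address in the shared region,
// treat the memory as an integer -- give me integer semantics on it.
const counter = new Int32Array(sab);

Atomics.store(counter, 0, 0);               // master: start at zero
Atomics.add(counter, 0, 1);                 // an increment that is atomic
                                            // even across threads
console.log(Atomics.load(counter, 0));      // → 1
```

In the real demo the buffer is handed to the worker via postMessage, and because it's a shared buffer, both sides then read and write the same memory rather than copies.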
So here what I do -- and I'm hiding a lot of the details -- is create a new object that wraps a channel sender and a channel receiver, and all of this is built on top of the same primitives that you saw before. And if we look inside our loop now, it's a lot clearer. And it maps much, much more directly, one to one, with postMessage. We can basically treat the send like the post to the worker, and the receive like the post from the worker. There are no locks exposed here. There are no waitable objects that you can screw up by signaling at the wrong time. And the worker side is just as clean, just as simple. And I think very, very easy to read. Arguably even easier to read than the original postMessage implementation, with receive, increment, and send. So how does this perform? Still a lot better than postMessage, though there is a trade-off for using these high level constructs versus the shared Int32Array. But the choice is yours. Right? The choice between high level constructs and low level performance is something that this implementation allows you to make.
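A channel wrapper like the one described could be sketched as follows. This is a hypothetical reconstruction, not the talk's actual library: the class names, the one-slot flag/data layout, and the spin-then-wait loop are all my assumptions about how such a wrapper might be built on `Atomics`. (Note that browsers disallow `Atomics.wait` on the main thread, so in a real page the blocking side would live in a worker.)

```javascript
// A hypothetical one-slot blocking channel over a SharedArrayBuffer.
// Slot layout (assumption): index 0 = full/empty flag, index 1 = the value.
const FLAG = 0, DATA = 1;

class ChannelSender {
  constructor(sab) { this.ia = new Int32Array(sab); }
  send(value) {
    while (Atomics.load(this.ia, FLAG) !== 0) {
      Atomics.wait(this.ia, FLAG, 1);  // sleep while the slot is full
    }
    Atomics.store(this.ia, DATA, value);
    Atomics.store(this.ia, FLAG, 1);   // mark the slot full
    Atomics.notify(this.ia, FLAG);     // wake a waiting receiver, if any
  }
}

class ChannelReceiver {
  constructor(sab) { this.ia = new Int32Array(sab); }
  receive() {
    while (Atomics.load(this.ia, FLAG) !== 1) {
      Atomics.wait(this.ia, FLAG, 0);  // sleep while the slot is empty
    }
    const value = Atomics.load(this.ia, DATA);
    Atomics.store(this.ia, FLAG, 0);   // mark the slot empty
    Atomics.notify(this.ia, FLAG);     // wake a waiting sender, if any
    return value;
  }
}

// Single-threaded smoke test: send before receive, so nothing blocks.
const sab = new SharedArrayBuffer(8);
const tx = new ChannelSender(sab), rx = new ChannelReceiver(sab);
tx.send(41);
console.log(rx.receive() + 1); // → 42
```

The point of a wrapper like this is exactly what the talk argues: the waitable object and the flag protocol are hidden, so the calling code reads as plain send/receive, while the speed still comes from the shared memory underneath.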
So the next demo I would like to show... something I'm really, really excited by. And in fact, something we only just got completed. So hopefully this works. Like I said, I'm running a private build of Firefox. So I'll take a sip of water for good luck. This is the Unity WebGL benchmark. People out there familiar with Unity Technologies as a company? Pretty good percentage of the audience. For those who are not, Unity Technologies makes a 3D game engine called Unity. It's one of the most popular licensed 3D game engines in the world, if not the most popular one, and it's a company that I think does really, really exciting things. So for us to be able to work with them on enabling their benchmark, using our shared memory, was really exciting. Like I said, it's something we only got done Friday of last week. So we'll see how that goes.
Back to my Nightly. And let's start this up. This is the default of four cores. So... as you can see right away, we have a text rendering bug on the first screen, which we'll ignore. I'm also going to uncheck the first two demos, because they're graphically maybe not as interesting as some of the other ones. Let this go. So this benchmark is much, much more than a WebGL benchmark. What it's actually testing is the entire Unity 3D engine, and it's all running in the browser. Basically automatically ported from C++, with very minimal changes on the Unity side. All the changes were in our shared memory work and in Firefox itself. Here we have a bunch of dancing bears. And what's more exciting than skinned dancing bears?
So like I said, it's much more than WebGL. They have AI running. Physics. Particles. Skinning. And also their job system, which allows for threaded execution of content. I think it was much prettier than any demo I could write. I'll let it go one more... Oh, yes, have to wait for the snowman in the middle of the flurries. Like I said, the reason why I'm really excited by this is that the Unity 3D engine is a massive code base. Enabling that -- basically automatically ported over directly from C++ -- on top of this is an incredible validation of the technology we put together. And that it's working at all is amazing. But I think... I'll show here... let me stop this. Oops. Too many.
I think games are not just fun to play. They're fun to develop, and they really challenge software developers and hardware developers. So I think it's a fantastic test of any infrastructure. And we were able to make all this work -- the benchmark is not just functional, we were able to make it work fast. Jukka Jylänki did most of the implementation work on making the Unity benchmark functional. He worked on the (inaudible) side, and he worked on the JS side. He's a Mozillian. I asked the team why this particular thing was significant to them, and he offered up this quote, which I would like to share with everyone. "With shared memory, the web lifts an important limitation in shared execution that it had compared to native. Shared memory is not comparable to a library call that can be emulated or polyfilled." I'm sorry for all the people who are asking this question -- can we polyfill this? We cannot. "But it's a fundamental concept of parallel execution architectures." I'm not sure if it can get any bigger than this. He's a very emphatic guy and he's very excited, but I think he really did capture the significance of what we accomplished here. So what's next for shared memory and for the work that we've done here?
Well, shared memory is in Nightly right now, for asm.js. Can I get another show of hands -- are people familiar with asm.js? Most of the room. So it's a low level subset of JS that we put together that you can really optimize for, and it's fantastic for certain workloads, especially cross compilation ones. So it's not going to move out of Nightly until we standardize it. But we are in talks to standardize it. The API -- the set of documents you saw that Lars put together -- is still subject to change. We're getting feedback from a lot of sources, inside Mozilla and out. But the good news is that Google has actually announced fairly recently that they're going to start implementing this too. So hopefully a year from now we're going to see this everywhere, with applications built against it. I have a bunch of links that nobody can read, but I think there'll be some opportunities to share this presentation later. So hopefully you can catch up on any details here that you may have missed. And you're always welcome to find me. Thank you very much for your time.