@Raynos
Created January 7, 2013 18:30

What it means to be non-blocking in node.

From October 2011. Outdated.

In reply to Wilcox's research paper, there seems to be major confusion about what it means to be non-blocking and why node is non-blocking.

Now let's start with why node runs in a single process and why it's non-blocking.

What's the alternative?

Let's assume it's one thread per request.

Why would we not want one thread per request?

Well, creating a thread is far, far from cheap. It's expensive and carries a large overhead. So if you're doing 1ns of computation, having a whole thread for it is madness.

Ok, so what's the solution?

No threads! Just one process, remove all the overhead. Great, we now have no overhead at all. Everything is nice and pure.

But, ehm, we are doing blocking IO everywhere. We are waiting for something to happen and then for it to come back to us. Why should we be sitting around waiting for it when we can do more computation in parallel?

Ok, so what's the solution?

Make all that blocking IO asynchronous. So, what just happened? We made it parallel! How did it become parallel? Well, when you send a request to some other HTTP server, we are no longer blocked waiting for it to come back to us; we are recovering our CPU cycles and putting them to good use doing other things! Blocking HTTP is bad because while you block on your HTTP request you are doing nothing but waiting.

What about the file system? Surely the file system is doing something useful while we are waiting, it's crunching data on our own PC!

Yes, when you make a call to the file system you're not wasting CPU cycles as such. However, what if you could parallelize file system access without having to faff around with threads yourself? See, if we make file system access asynchronous it basically means that someone is accessing the file system for us in the background (possibly in parallel on the internal thread pool, or via IOCP magic on Windows!) and once it is done it will come back to us.

But where does this event loop business come into play?

Well, we have an event loop. On the first turn of the event loop we do a whole bunch of things. Some of these things are asynchronous. With asynchronous things we basically tell the event loop, "Go and tell someone to do this, then run this when they've finished doing it." Once we're done with the first turn of the event loop, we just sit there idly waiting.

Whoa, but idling is bad!

Idling is only bad if we could be doing something useful instead. Can we do anything useful instead? Oh wait, we just got a message from the event loop. The event loop says "here's an incoming HTTP request and here's that code you wanted me to run when it came in", so now we can go and do something useful and handle that HTTP request. Once we are done handling the HTTP request we give control back to the event loop.

Now the event loop says "Whilst you were doing that two file system requests finished, I'll give you file1 now and once you give control back to me again, I'll give you file2".

So basically the event loop is a global handler for message passing between IO calls.

Eh.., yes something like that.
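The whole exchange can be sketched in a few lines; the `order` array records who ran when:

```javascript
// First turn of the event loop: schedule some asynchronous work,
// then finish our synchronous code and hand control back.
var order = [];

setTimeout(function () {
  order.push('callback: the event loop handed the finished work back to us');
}, 0);

order.push('first turn: synchronous code runs to completion');

// By the time this later timer fires, the earlier callback has run.
setTimeout(function () {
  console.log(order.join('\n'));
}, 20);
```

The synchronous push always lands first: callbacks only run once we give control back to the loop.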

So everything should be asynchronous and non-blocking in node! Ok, when I do a bit of computation I will make it non-blocking by calling process.nextTick and giving control back to the event loop!

WHOA! Wait, WHAT? Now ask yourself: why would you want to make that computation async and non-blocking? Is it because you're sitting there waiting for something to happen remotely?

No it's not; you're doing the computation on your own CPU. Is it because doing it asynchronously would make it parallel without faffing around with threads? No, because JavaScript is single threaded; you can't just magically make it parallel.

Now what are you actually doing? You're saying: I have these things happen in sequence and I want to make it look like they happen in parallel, so I will time-share them using process.nextTick.
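Here's what that time-sharing looks like in code: a sum chopped into chunks with process.nextTick. (`sumChunked` is a made-up name for illustration.) Note the total CPU work is exactly the same as a plain loop; it's just interleaved with the event loop:

```javascript
// Time-sharing a computation: each chunk does some work, then yields
// to the event loop via process.nextTick. Nothing runs in parallel.
function sumChunked(n, acc, done) {
  var chunk = Math.min(n, 1000);
  for (var i = 0; i < chunk; i++) acc += 1;
  n -= chunk;
  if (n === 0) return done(acc);
  process.nextTick(function () { sumChunked(n, acc, done); });
}

sumChunked(10000, 0, function (total) {
  // Same answer, same total work as a plain synchronous loop.
  console.log('total:', total);
});
```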

But I'm doing some expensive computation and I'm hogging the event loop!

I see. Well, a manual load-balancing system using process.nextTick and the event loop is not the solution. What you want instead is a real load balancer in front of your node server.

Ideally you have N node instances behind the load balancer. Each node instance has its own event loop, and all actions on a given node instance take roughly the same time. If you split your node instances up by fast, average and slow actions, then relatively speaking none of these actions is "hogging" the event loop more than the others.

That's not good enough, I want it to be magically parallel!

It doesn't quite work like that, if you want it to be parallel create a new child process. Of course a child process has overhead so you have to think about whether you want to do this. For example, spawning a child process to do video encoding using a C library would be a good use-case.

What you were doing before was time sharing.

Now go read up on Time-sharing and Co-operative multi-tasking

What you really want is for the operating system to use pre-emption on your multiple processes. I.e. the OS can do this time sharing better than you can.

Now hopefully you understand why we only care about non-blocking IO, and that abusing process.nextTick while claiming you can do better process scheduling than the OS makes no sense.

But isn't running one process bad for my node server when I have 8 cores?

Meh, node's cluster support gives you free cluster-based load balancing to scale to your 8 cores.
