When we read about Node.js on the web, everybody says "In Node, everything runs in parallel except your code". We have a hard time understanding this. What runs in parallel and what does not? How does Node get additional threads to run something in parallel? Who manages these threads? Once we have more than one thread, how does Node synchronize access to shared objects?
This is a slightly tricky part of understanding Node's architecture. For all practical purposes, Node.js is single threaded and runs in a single process. That is, the code you write is executed in a single thread. So if you do a while(block for 1 second), the entire process will be blocked. But this obviously isn't performant: if you had to wait on a couple of I/O requests for each HTTP request, that would block all further execution until they complete.
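A minimal sketch of that blocking effect, using only Node's built-in timers. The names blockFor and measureTimerDelay are illustrative and not part of any API. A timer scheduled for 0 ms cannot fire until the busy loop gives the thread back:

```javascript
// Busy-wait: occupies the single JavaScript thread for `ms` milliseconds.
function blockFor(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // spin; nothing else in this process can run meanwhile
  }
}

// Resolves with how long (in ms) a 0 ms timer actually took to fire
// when the thread was blocked right after scheduling it.
function measureTimerDelay(blockMs) {
  return new Promise((resolve) => {
    const scheduledAt = Date.now();
    setTimeout(() => resolve(Date.now() - scheduledAt), 0);
    blockFor(blockMs); // the timer callback has to wait for this to return
  });
}

measureTimerDelay(1000).then((delay) => {
  console.log(`0 ms timer fired after about ${delay} ms`);
});
```

The reported delay should be at least as long as the blocking loop, which is the whole point: while your code holds the thread, no callback gets a turn.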
Here is where Node's async bit comes into play. Anything I/O bound is handed off to libuv (https://github.com/libuv/libuv) so that it proceeds in the background. For network sockets, libuv uses the platform's non-blocking event notification mechanism (epoll, kqueue, etc.); for operations that have no non-blocking equivalent, such as file system access, it allocates and manages a thread pool. So an async function like fs.readFile is queued in the background until a thread can execute it. Once done, the callback function is invoked with the result.
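As a rough illustration of fs.readFile returning immediately and delivering its result later, here is a small sketch. The temporary file name and the order array are assumptions made up for this example:

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Records the order in which things happen.
const order = [];

// Create a throwaway file so the example is self-contained.
const file = path.join(os.tmpdir(), `readfile-demo-${process.pid}.txt`);
fs.writeFileSync(file, 'hello');

// The read is handed off to libuv; this call returns right away.
fs.readFile(file, 'utf8', (err, data) => {
  if (err) throw err;
  order.push(`callback got: ${data}`);
  fs.unlinkSync(file); // clean up the throwaway file
  console.log(order);
});

// Runs before the callback above, because the callback can only be
// invoked once the current synchronous code has finished.
order.push('readFile returned');
```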
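Unlike Java, where you have to manage threads and synchronization yourself, all of that is handled by libuv. Your callbacks always run one at a time on the single JavaScript thread, so ordinary objects shared between them need no locking.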
When does a user request join the event loop? As soon as it reaches the server socket? Or only when the current request encounters an expensive I/O-bound operation?
Certain operations are queued into the event loop. For example, the callback of any async function is added to the event loop, to be executed on a subsequent iteration of the loop. Since you seem to be asking about an HTTP request, and since sockets are I/O bound, the incoming request will be added to the event loop as well.
In Node, when are events emitted? Only when there is an expensive I/O-bound operation, or are there a lot of custom events that libraries generate? What is the ideal scenario for a library to generate a custom event?
So since all I/O operations are async, there is a need to report progress when the event loop executes that particular function. Node.js has the EventEmitter abstraction that helps you handle that. Usually, most libraries tend to stick to the convention of supporting events that the standard library supports - data, finish, error, etc. However, using the EventEmitter, you are free to support custom events. And many libraries do emit custom events.
AFAIK, there is no ideal scenario as such. Usually, async code which cannot just return a value can either return an EventEmitter or a Promise.
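To make the EventEmitter point concrete, here is a small sketch. ChunkSource and its 'stats' event are hypothetical names invented for this example; only EventEmitter, on, and emit come from Node's events module:

```javascript
const { EventEmitter } = require('events');

// Hypothetical class: reports progress with the conventional 'data' and
// 'finish' events, plus a custom 'stats' event of its own.
class ChunkSource extends EventEmitter {
  constructor(chunks) {
    super();
    this.chunks = chunks;
  }

  start() {
    // Emit on a later turn of the event loop so callers can attach
    // listeners right after calling start().
    setImmediate(() => {
      let chars = 0;
      for (const chunk of this.chunks) {
        chars += chunk.length;
        this.emit('data', chunk);
      }
      this.emit('stats', { chunks: this.chunks.length, chars }); // custom event
      this.emit('finish');
    });
    return this;
  }
}

const source = new ChunkSource(['ab', 'cde']).start();
source.on('data', (chunk) => console.log('data:', chunk));
source.on('stats', (stats) => console.log('stats:', stats));
source.on('finish', () => console.log('finish'));
```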
In Java, we use ThreadLocal to bind some information to the running thread and make use of that info throughout the thread's flow. For example, basic user details attached to a ThreadLocal are available anywhere in that request's processing flow. How do we achieve this kind of behaviour in Node.js?
For example, if you take the case of express.js, each incoming request is captured in the req object, which is passed onto each middleware handler. That becomes the context of execution.
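The idea can be sketched without express itself. The runPipeline helper below is a simplified stand-in written for this example, not express's actual implementation, and the handlers and the x-user-id header are hypothetical:

```javascript
// Runs handlers in order; each receives the same `req` object and calls
// next() to pass control on, similar in spirit to express middleware.
function runPipeline(handlers, req, res) {
  let index = 0;
  function next() {
    const handler = handlers[index++];
    if (handler) handler(req, res, next);
  }
  next();
}

// Hypothetical handler: attaches user details to the request object.
function authenticate(req, res, next) {
  req.user = { id: req.headers['x-user-id'], name: 'demo-user' };
  next();
}

// Hypothetical handler: reads the details back from the same object,
// with no global or thread-local state involved.
function greet(req, res) {
  res.body = `Hello, ${req.user.name} (${req.user.id})`;
}

const req = { headers: { 'x-user-id': '42' } };
const res = {};
runPipeline([authenticate, greet], req, res);
console.log(res.body); // Hello, demo-user (42)
```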
Which is the preferred ORM library, one that supports transactions and all of the JPA functionality in Java? We started looking into Sequelize.
Although I've not worked with Sequelize, it does seem to be one of the better ORM libraries.
How can we write a promise-based API?
The default node convention for callbacks is function(err, result). So any async function takes the callback function as the last argument. As long as you stick to this convention, you can use any library like Q, or Bluebird to avoid callback hell (which is a very real thing and quite frustrating!)
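As a sketch of why the convention matters, a function that follows it can be wrapped into a promise-returning one mechanically. The toPromise helper and the divide function are hypothetical names written for this example; they are not how Q or Bluebird are implemented:

```javascript
// Wraps a function that follows the Node convention (callback last,
// called as callback(err, result)) so that it returns a Promise instead.
function toPromise(fn) {
  return (...args) =>
    new Promise((resolve, reject) => {
      fn(...args, (err, result) => (err ? reject(err) : resolve(result)));
    });
}

// Hypothetical callback-style function used only for this example.
function divide(a, b, callback) {
  setImmediate(() => {
    if (b === 0) return callback(new Error('division by zero'));
    callback(null, a / b);
  });
}

const divideAsync = toPromise(divide);

divideAsync(10, 2)
  .then((result) => console.log('result:', result))
  .catch((err) => console.error('failed:', err.message));
```

Newer Node versions ship util.promisify, which does a similar wrapping for functions that follow this convention.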
In this context, I would recommend cspjs (https://github.com/srikumarks/cspjs) written by Sri. It gives you the abstraction of working with channels, which is then compiled down to vanilla JS using Sweet.js. I find it indispensable when working with Node.
Is there any Flyway-like library for Node (DB migration)?
I use a thin wrapper around https://github.com/tj/node-migrate, although it cannot be compared to Flyway. AFAIK, Sequelize has Django-style migrations support.
Some more reading:
https://nodesource.com/blog/understanding-the-nodejs-event-loop
http://venodesigns.net/2011/04/16/emitting-custom-events-in-node-js/
Adding a few cents to Srinath's good explanations -
It is useful to look at the system as a whole. A machine on which the node server runs has a few subsystems -
No matter how many threads you run on the processor, all activities requiring access to the RAM must go through the single channel between the processor and the RAM. Same goes for disk access. Also, there is (typically) only one network port, so all I/O must be multiplexed on to it.
If you don't have any compute-intensive tasks, this is the maximum parallelism you can get out of such a system design. A useful mental model is to think of the event-loop approach as letting one thread and its associated event loop service each of these subsystems. A further simplification to this scenario that Node imposes is that all responses to events generated by these threads are processed in a single compute thread. This mental model gives you some idea about how to squeeze more performance out of Node-based systems. For example, if you have disk related work as well as network related work, you may want to split them into two different Node processes and have them communicate via memory. Whether this actually gets you a performance gain is to be measured in your specific application, but this mental model makes such options available to you when you need them.
The "event loop" is just a while loop doing this (warning: simplification) -
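A runnable sketch of such a loop, under loud assumptions: the real loop lives inside libuv and blocks in the operating system until an event arrives, whereas here a pre-filled array stands in for the event source so that the loop terminates. The event objects are invented for illustration:

```javascript
// Simplified model only. A pre-filled array plays the role of the
// single queue that every subsystem feeds.
const queue = [
  { type: 'network', payload: 'GET /' },
  { type: 'timer', payload: 'timeout #1' },
  { type: 'disk', payload: 'read done' },
];

// Stand-in for blocking until the next event is available.
function wait_next_event() {
  return queue.shift(); // undefined once the queue is empty
}

// Stand-in for running your callback for that event.
const handled = [];
function do_something(event) {
  handled.push(`${event.type}: ${event.payload}`);
}

let event;
while ((event = wait_next_event()) !== undefined) {
  do_something(event); // one event at a time, on one thread
}

console.log(handled);
```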
So a user's request can enter your code only after wait_next_event provides it to you ... and all subsystems mentioned above pump their events into this single queue. If you want to give supreme attention to network events, for example, minimize the chance of other event types (e.g. timers, disk I/O, IPC) coming down your queue, so that you'll quickly be able to do_something with your network events.
As for ThreadLocal: you don't do this in Node. In Java, you might have one thread servicing one request at a time. In Node, you have the process main thread servicing all requests in a multiplexed manner, so such a "thread local" corresponds to a "global".
Instead, as Srinath said, simply pass around the request object throughout the pipeline that processes the request. That way, there can be multiple request/response objects simultaneously in flight while the system is doing the necessary I/O and compute to satisfy the request. Javascript's first class functions and closures make this easy.
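A small sketch of two requests in flight at once, each keeping its own context through a closure. The loadProfile function is a hypothetical stand-in for a database or network call, and the delays are arbitrary:

```javascript
// Hypothetical async step standing in for a database or network call.
function loadProfile(userId, callback) {
  const delay = userId === 'alice' ? 20 : 5;
  setTimeout(() => callback(null, { userId, plan: 'free' }), delay);
}

const log = [];

// The callback closes over this call's own `req`, so each in-flight
// request keeps its context without any thread-local or global lookup.
function handleRequest(req, done) {
  loadProfile(req.user, (err, profile) => {
    if (err) return done(err);
    log.push(`request ${req.id} for ${req.user} got profile of ${profile.userId}`);
    done(null, profile);
  });
}

let pending = 2;
function finished() {
  pending -= 1;
  if (pending === 0) console.log(log);
}

// Both requests are started before either one completes.
handleRequest({ id: 1, user: 'alice' }, finished);
handleRequest({ id: 2, user: 'bob' }, finished);
```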
If you're using cspjs, a task will feel like a thread and you can declare variables that will be visible throughout the many steps a task may involve.