Example Performance Enhancement Module
======================================
This module has been created to teach you how to make things awesome.
Notes about what this module is going to do:
- Have an HTTP API.
- The example will be like LinkedIn's case, where the API is accessed via
a controlled environment and we always expect the request headers to
follow a specific format.
- Connect with some remote service that will feed back data that will be
used in the return response to the HTTP request.
- The returned data will be fed into doT to template the data we are
sending back to the client.
- Log events/data to disk.
- Use plenty of modules to complete these tasks.
- Initially use express for the http server. Then progress away from it.
- Consider using winston for logging. (Depends on how it performs for the
way I use it. Either move towards it or away from it.)
- Use doT for content page templates.
(http://olado.github.io/doT/index.html)
What stability improvements will be possible:
- Lack of type checking.
- This will lead to spontaneous errors that arise as we begin
implementing the performance improvements. We'll show that having good
tests in place is very necessary.
- Lack of range checks.
- This will result in incorrect values passed to our internal API as we
begin to implement some performance features. These will look fine at
first because we won't have realized the implications elsewhere.
What performance improvements will be possible:
- Logging process events.
- Start by writing files using fs.appendFileSync().
- First we'll make the very obvious switch to fs.appendFile().
- Next move to fs.createWriteStream(). (A short sketch of this
progression follows this list.)
- Once the module is running in multiple processes then move to piping
log data to the cluster's parent via a Pipe. Have the parent process
write out file data in proper order.
- The module will start off as a single running process.
- Introduce the cluster module to use the machine better.
- How to recover crashed processes.
- Sending HTTP responses
- Start by generically writing out each response as individual strings
and not setting Content-Length.
- First make sure each response knows ahead of time how long the
Content-Length will be. (Make sure everyone understands that the
Content-Length is the BYTE SIZE of the data, not the character length,
and that it is the byte size of the body of the return message only; it
doesn't include the headers. See the sketch at the end of these notes.)
- Use stream.cork() to internally buffer all responses until the data is
ready to be sent.
- Cache strings as Buffers where you can.
- *Use proper character encoding for writes.
- Hammer the GC.
- Find bits of buffer data that we want to hang on to and cache their
slice.
- Then move to creating a new SlowBuffer and writing out the data to it.
- Hammer v8.
- Create lots of function closures that cause v8 to recompile each new
instance of the function.
- Ultimate awesome performance improvements.
- The HTTP API expects very specific headers. So instead of using the
http module, which is made to handle everything, instead setup a custom
TCP server and parse the incoming requests yourself. (Some of the http
logic will come in handy here. Like knowing the max header size. e.g.
http_parser only allows 80KB through.)
- Quickly check for malformed requests and immediately close the
connection.
* (Used in multiple places. So cover the topic then go through the module
looking for ways to fix that one issue.)
Misc. Notes:
- Each commit will be an improvement. This way we can make sure to go
through the steps in proper order and it'll be easy to make sure each
step works smoothly. (Optimally it might be best to have the improvements
committed in reverse, so the most optimal code is the first commit. Then
we would only need to `git checkout HEAD^` for each step.)
- Each step should have a note about what tool to use in order to find the
performance improvement that will be found in the next step.
- Each step should have pre-recorded the individual benchmarks that result
from the change.
- Each step should include an appropriate benchmark for what we're testing.
- Show not only the performance gains but also the memory usage gains.
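
To illustrate the response notes above, here is a rough sketch (not the
module's actual code) of setting Content-Length from the byte size of the
body and caching a hot response as a Buffer. The route and payloads are
made up for the example:

    const http = require('http');

    // Pre-build a common response body once and reuse the Buffer on every request.
    const notFound = Buffer.from('{"error":"not found"}');

    http.createServer((req, res) => {
      if (req.url !== '/hello') {
        res.writeHead(404, {
          'Content-Type': 'application/json',
          'Content-Length': notFound.length   // a Buffer's length is already bytes
        });
        return res.end(notFound);
      }
      const body = 'héllo wörld';              // 11 characters, 13 bytes in UTF-8
      res.writeHead(200, {
        'Content-Type': 'text/plain; charset=utf-8',
        'Content-Length': Buffer.byteLength(body, 'utf8')  // byte size, not body.length
      });
      res.end(body);
    }).listen(8080);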

Syllabus

Introduction to the three key areas of performance

The most important thing for a well-performing site isn't just that the code runs fast. In fact, that's not even the top priority. Here we are going to take a quick look into the three key areas of making your site fast.

Stability

These are just a few basics, but you'd be surprised how often I don't see them followed. Hence the quick overview.

If your site can't hold itself together then developer time will be spent rushing hot patches out the door to keep it afloat. Creating solid code from day one allows developers the time and psychological freedom to experiment with new ways of doing things. This is how I was able to come up with many of the performance improvements in Node core. While there's the occasional bug that needs to be fixed in a hurry, most days I'm able to spend time trying new ways of doing things, assembling benchmarks, analyzing code, etc.

Here are some quick practices you can implement from day one:

  • Make no assumptions about how the API will be used. Type and range check every variable that comes in from the user facing API. I recommend always leaving in these checks.
  • For internal API calls do all the type checks, but add them in a way that can be stripped when the code is ready for production. Having macros in JavaScript can look strange, but creates a good balance of safety and performance.
  • Try not to make APIs that allow function parameters to bubble down. Usage docs that look like function runMe([object[, string[, number]]][string[, object]][number]) are best avoided completely. If you need to do more than one type check per function parameter then rethink your API. This type of API also makes for code that v8 will DEOPTIMIZE.
  • Create a standard way to coerce. This is something that Node core currently suffers from in order to keep backwards compatibility. For argument coercion, here are a couple examples:
    • function run(string[, number]): The first argument is required, so any Primitive passed as the first argument will be coerced to a String.
    • function run([object][string]): Really try not to do this. If working with legacy code then I'd treat all Primitives as a string and everything else as an Object.
  • Create a standard for when Errors will be thrown or passed. This is another thing Node core currently suffers from for backwards compatibility. If the function doesn't accept a callback then there's little choice but to throw. When a callback is accepted then I recommend throwing immediately if argument types are incorrect and cannot be coerced. Otherwise pass the error to the callback. I also recommend having a documented standard set of errors that can be expected so they're easy to check at runtime.
  • UNIT TEST. Make sure there are unit tests for every user facing API. You can easily deduce, at the least, the set of parameters that should cause the function to throw. Start there. If a number is expected then enter all the incorrect ranges. Make sure your functions fail when and how you expect. Allowing bad input in without you knowing is far worse than keeping good input out. (A small sketch of these checks follows.)
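
As an illustration of the checks above, here is a hypothetical user-facing call. The function name, limits, and error messages are invented for the example. Types that can't be coerced throw immediately; range problems go to the callback:

    function readRecord(id, timeoutMs, callback) {
      // Argument types that cannot be sensibly coerced: throw immediately.
      if (typeof callback !== 'function')
        throw new TypeError('callback must be a function');
      if (typeof id !== 'string')
        throw new TypeError('id must be a string');

      // Coerce, then range check; range problems are passed to the callback.
      timeoutMs = timeoutMs >>> 0;
      if (timeoutMs === 0 || timeoutMs > 60000)
        return process.nextTick(() => callback(new RangeError('timeoutMs out of range')));

      // ... do the actual (async) work, then call callback(null, result)
    }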

Note: Nothing special, but here is my Primitive coercion guide (a usage sketch follows the list):

  • Number (e.g. double): +arg
  • Int32: ~~arg or arg|0
  • Uint32: arg >>> 0
  • String: ''+arg
  • Boolean: !!arg
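
Applied to a hypothetical options object (the property names are invented for the example), the guide looks like this:

    function normalizeOptions(opts) {
      return {
        ratio:   +opts.ratio,        // Number (double)
        offset:  opts.offset | 0,    // Int32
        size:    opts.size >>> 0,    // Uint32
        name:    '' + opts.name,     // String
        verbose: !!opts.verbose      // Boolean
      };
    }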

Scalability

At any moment a barrage of users is about to visit your site. They want to sign up for your cool new service, but alas, the site has stopped responding to the flocking users. Can you quickly fire up more servers at a moment's notice without needing to do a bunch of configuration and last minute code changes?

Fortunately Node lends itself to scalable design. Being stuck single threaded (usually) forces users to think about how they can get their application running on multiple processes. From here it's usually easy to figure out how to run on multiple machines.

Proper stress testing can show you if the application's architecture is solid enough to handle the unanticipated loads. Here is a short list of ways to stress test your system.

  • Rush of users (i.e. many requests coming from different IPs)
  • Hacking attempt (e.g. continue to run skipfish instances until your boxes become unresponsive)
  • DoS (many requests coming from the same IPs)
  • If running in multiple locations, test data center outages. (e.g. what happens to East Coast response times if your Atlanta data center goes down?)
  • Use services to test response time from different locations around the world.

Much of this will come from experience in designing scalable systems. There are some basic guidelines that developers can follow and questions they can ask themselves when architecting a platform:

  • If your Node process takes more than a second (literally) to start up and be ready for use, rethink how things are working.
  • How long does it take to stand up a new server?
  • Can a server maintain itself once it's running? (e.g. if a process fails, will a new one be launched automatically? Will log files be delivered to the correct location if errors occur? etc.) A minimal cluster sketch follows this list.
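
A minimal sketch of that last point using the cluster module: use every core and relaunch workers that die. A real deployment would add restart backoff, logging, and graceful shutdown; the port and response body are placeholders:

    const cluster = require('cluster');
    const http = require('http');
    const os = require('os');

    if (cluster.isMaster) {
      // One worker per core.
      for (let i = 0; i < os.cpus().length; i++) cluster.fork();

      // Recover crashed processes automatically.
      cluster.on('exit', (worker, code, signal) => {
        console.error('worker %d died (%s), restarting', worker.process.pid, signal || code);
        cluster.fork();
      });
    } else {
      http.createServer((req, res) => {
        res.end('handled by pid ' + process.pid + '\n');
      }).listen(8080);
    }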

These tests will truly show if our efforts in the next section are working. Once you get the hang of it, a lot of micro optimizations are easy to see, but are the micro performance gains costing something at the macro level?

Making your code faster while making the application slower is possible, and easy to overlook. As we begin to explore the intricacies of improving source performance always keep in the back of your mind that the reason for these improvements is to make the application run faster. Not some micro benchmark.

Source

Your mantra should be "hardware is the limitation, not my application."

This will be the focus of the remainder of the training. Please continue to the next section.

Areas of Application Performance

Measuring your application's performance at the micro level can be done in many ways. I generally see a 2x2 grid of computationally vs I/O intensive and my code vs the world.

I/O intensive

Here we go over how data is passed through Node, and what Streams and the EventEmitter actually are. We'll strip away some of the abstraction and reveal some of Node's guts.
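
As a tiny illustration (the file path is just an example), a Readable stream is an EventEmitter underneath: the same object exposes pipe() alongside plain events.

    const fs = require('fs');

    const src = fs.createReadStream('/etc/hosts');

    // Stream as EventEmitter: listen to the events it emits while reading.
    src.on('data', (chunk) => console.log('read %d bytes', chunk.length));
    src.on('end', () => console.log('done'));

    // Stream as Stream: pipe the same data somewhere else.
    src.pipe(process.stdout);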

Computationally intensive

Brief discussion on why doing computationally expensive things is not great for Node, and how developers can use native modules to accomplish some of these tasks.

The world

The number of modules in npm is increasing nearly exponentially, so there are two things that a developer needs to look out for:

Selecting a module is a commitment. For many tasks you can probably find at least half a dozen modules that do the same thing. Just keep one thing in mind: you don't have a contract for continued support. If the developer decides to stop supporting that module, you either have to fork and maintain it yourself, or rewrite however many lines of code to use a new module. There are plenty of good modules out there maintained by the open source community. Unless you're just hacking together a prototype, make sure you pick one that has a good track record.

Module performance affecting my code. Even if I adhere to every best practice and make the fastest code v8 can chew, if the module I'm using sucks then it will be the bottleneck.

It's important we learn how to differentiate the performance of our code vs the performance of module code. This will be covered in more detail in a later section.

My code

A lot of good performance is just knowing the API you're using and how to create an API using best practices.

(continue to fill this out...)

Notes:

  • Scoped functions (v8 needing to recompile each new instance of the function, and show that not naming functions can lead to pain).
  • Keep things monomorphic.
    • Predefine object properties on constructors for values set later. (See the sketch after this list.)
  • Find your hot paths (what's the best way to profile a program so you know how many times functions run?)
  • When profiling, it may be painful separating your code from that of modules being used.
  • How to keep methods in hot paths local (need to write some test cases to see if this is still an issue).
  • Demonstrate what v8 will do when function parameters change during runtime.
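
A small sketch of the monomorphism notes above; the constructor and helper are invented for the example:

    // Give every instance the same "shape" by initializing all properties in
    // the constructor, even the ones that are only assigned later.
    function Request(url) {
      this.url = url;
      this.status = 0;
      this.body = null;
    }

    function statusOf(req) {
      return req.status;   // always sees a Request object -> stays monomorphic
    }

    // By contrast, passing objects of varying shapes to statusOf(), or calling
    // a function with a string one time and a number the next, pushes v8
    // towards polymorphic/megamorphic code and eventual deoptimization.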

Tools to use:

  • Start with v8's built-in options to pick the low-hanging fruit

    • --trace_inlining (trace inlining decisions)
    • --trace_opt_verbose (extra verbose compilation tracing)
    • --code_comments (emit comments in code disassembly)
    • --trace_deopt (trace optimize function deoptimization) (remember this should be used with --code_comments)
  • Now we want to get an overall picture of what the flow looks like

    • Use --prof and the linux-tick-processor (we want a good picture of where time is being spent)
    • Use strace
    • Use prof

Bringing it all together: begin to analyze the module and explain what the different outputs from the above flags mean. Show how to quickly check the v8 source to understand a given code comment better.

Should we step into the Hydra and/or use ll_prof?

Sites for reference:

https://mkw.st/p/gdd11-berlin-v8-performance-tuning-tricks/

http://mrale.ph/blog/2011/12/18/v8-optimization-checklist.html

Into the Beast

This is where the majority of time will be spent. We will begin to walk through a module I will have created. We will analyze the module from the different points of view already discussed while stepping through many of the different tools developers can use.
