@mgan59
Last active December 29, 2015 00:09
my notes from the node philly hackon-ops 2013

Leadnomics on Monitoring

  • Presenter: Tom Shawver, @tomfrost
  • Company: Leadnomics

They are an ad server platform that was originally built on PHP/MySQL, which did not scale.

Greatest Hits (aka great fails)

"Node is single-threaded": this is false. Shows a libuv code sample:

static uv_thread_t default_threads[4];

There are actually 4 threads in the pool. Networking calls are async, are handled by the kernel's networking stack, and don't consume pool threads. File system calls do consume a thread. So some operations still tie up a thread and some do not.

Node 0.10.x brought garbage collection improvements.

setImmediate() vs process.nextTick()

Before 0.10.0 the event loop worked like this:

empty nextTick queue -> callbacks for queued tasks -> empty nextTick queue… so you wouldn't know whether the code would execute now or on the next tick.

If you need to schedule execution immediately, within the current tick: process.nextTick()

If you need to schedule execution for the next tick: setImmediate()
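A minimal sketch of that ordering on a current Node.js release. The helper name `traceOrder` is made up for illustration; only `process.nextTick()` and `setImmediate()` come from the talk.

```javascript
// Records the order in which the three scheduling styles run.
function traceOrder() {
  return new Promise((resolve) => {
    const order = [];
    // Runs on a later turn of the event loop (the "check" phase).
    setImmediate(() => {
      order.push('setImmediate');
      resolve(order);
    });
    // Runs as soon as the current operation finishes, before the loop continues.
    process.nextTick(() => order.push('nextTick'));
    // Plain synchronous code runs first.
    order.push('sync');
  });
}

traceOrder().then((order) => console.log(order.join(' -> ')));
// prints "sync -> nextTick -> setImmediate"
```

Despite the names, `process.nextTick()` is the one that fires first, which is probably what the "WTF" is about.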

WTF

Harsh reality: things will break.

So you need to monitor it; use Splunk or New Relic.

But you need much smarter monitoring.

  • Use metrics more suited to Node.js
  • Monitor application-specific values alongside system values; 'free-memory' and similar system values are probably useless on their own.
  • Know your danger zones
  • Auto-repair unforeseen issues when possible
  • Log issues at non-alert level if the monitor can auto-repair

Tick monitoring: we measure how long it takes to do a group of 1000 ticks.

{
    tick: 7.23,
    avgMs: 5.87,
    maxMS: 9.00,
    perSec: 1234
}
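A rough sketch of how such a measurement could be taken. This is not the vitalsigns implementation: the function `measureTicks` and the shape of its result are assumptions made for illustration, and a "tick" here means one `setImmediate` turn of the event loop.

```javascript
// Hypothetical helper: times `count` consecutive event-loop turns.
function measureTicks(count) {
  return new Promise((resolve) => {
    const start = process.hrtime.bigint();
    let last = start;
    let maxNs = 0n;
    let seen = 0;

    function onTick() {
      const now = process.hrtime.bigint();
      const delta = now - last; // time spent since the previous turn
      if (delta > maxNs) maxNs = delta;
      last = now;
      seen += 1;
      if (seen < count) {
        setImmediate(onTick);
        return;
      }
      const totalMs = Number(now - start) / 1e6;
      resolve({
        ticks: seen,
        totalMs,
        avgMs: totalMs / seen,
        maxMs: Number(maxNs) / 1e6,
        perSec: totalMs > 0 ? (seen / totalMs) * 1000 : Infinity,
      });
    }

    setImmediate(onTick);
  });
}

measureTicks(1000).then((stats) => console.log(stats));
```

When something blocks the event loop, `maxMs` jumps and `perSec` drops, which is the kind of Node-specific signal the bullets above ask for.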

Code demo of Express with vitalsigns (github.com/Leadnomics/node-vitalsigns).

He attaches vitalsigns to the Express app and then tells it to record various metrics [cpu, mem, tick, uptime].

Shows that vitalsigns can be configured to watch for unhealthy behavior, at which point the Express server returns a 503 (Service Unavailable). Most load balancers will then remove the server from their round-robin rotation, retry it later, and add it back once it is returning 200 again.
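The 200/503 behavior can be sketched with Node's built-in `http` module alone. Nothing here is the vitalsigns API: the thresholds, the metric names (`avgTickMs`, `heapUsedBytes`), and the helpers `isHealthy` and `healthHandler` are all hypothetical.

```javascript
const http = require('http');

// Hypothetical "danger zone" thresholds; real values depend on the application.
const limits = { maxAvgTickMs: 10, maxHeapUsedBytes: 512 * 1024 * 1024 };

// True when every monitored value is inside its limit.
function isHealthy(metrics, lim) {
  return metrics.avgTickMs <= lim.maxAvgTickMs &&
    metrics.heapUsedBytes <= lim.maxHeapUsedBytes;
}

// Builds a request handler that answers 200 when healthy and 503 otherwise.
function healthHandler(getMetrics, lim) {
  return (req, res) => {
    const metrics = getMetrics();
    const status = isHealthy(metrics, lim) ? 200 : 503;
    res.writeHead(status, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify(metrics));
  };
}

// avgTickMs is a placeholder here; a real monitor would feed in measured values.
const server = http.createServer(healthHandler(
  () => ({ avgTickMs: 0, heapUsedBytes: process.memoryUsage().heapUsed }),
  limits
));
// server.listen(8080); // uncomment to expose the health check
```

A load balancer polling this endpoint would see the 503 and could take the instance out of rotation until it returns 200 again.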

Controlling and Load Balancing Complex Applications with Node.js

  • Presenter: Bryan Paluch, @bryanpaluch
  • Company: Comcast

Does Telephony / IP Communications, tech lead on the WebRTC initiative

Shows a crazy diagram of a complex system: legacy telephony components that are stateful and hard to load balance. Most of the system requires system-level programming rather than web development. Node is great because it is cross-platform and comes without the bulk of the JVM. Node provides incredible real-time system communication.

At Comcast they were outsourcing all of their conference call communication systems, which cost them several million dollars a year. They decided to bring conference calling in house; they are a telephony company, after all.

Load Balancing Conference Servers

  • Calls come in, and get routed to a conferencing server (these servers have to be sticky)
  • Every user in the conference must be routed to the same server
  • Multiple servers are needed to support the service
  • Load needs to be spread equally across the cluster

Problem 1

Hot spotting: load builds up unevenly on one server, for example 100 people on a conference call talking for 2 hours vs. 2 people for 2 minutes. Calls can't be moved in real time, so round robin is incapable of solving this problem.
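A small illustration of why round robin hot-spots, using the numbers from the example. "Participant-minutes" as a load measure and the function name `roundRobinLoad` are assumptions for this sketch, not something from the talk.

```javascript
// Assigns conferences to servers in strict rotation and totals the
// participant-minutes that land on each server.
function roundRobinLoad(conferences, serverCount) {
  const load = new Array(serverCount).fill(0);
  conferences.forEach((conf, i) => {
    load[i % serverCount] += conf.participants * conf.minutes;
  });
  return load;
}

const conferences = [
  { participants: 100, minutes: 120 }, // big, long call
  { participants: 2, minutes: 2 },     // tiny, short call
  { participants: 100, minutes: 120 },
  { participants: 2, minutes: 2 },
];

console.log(roundRobinLoad(conferences, 2)); // prints [ 24000, 8 ]
```

Both servers received the same number of conferences, yet one carries almost all of the load.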

Solution 1

Comcast uses FreeSWITCH, an open source conference server. FreeSWITCH exposes an interface that node-esl can communicate with.

Node acts as a central brain that collects load and server statistics (by analyzing the FreeSWITCH servers).

Create a lock on the conference location in the central brain, and make sure the central brain does not go down.

So how do we do this? Redis acts as the state storage/central brain.
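A sketch of the "lock the conference location" idea, with an in-memory `Map` standing in for Redis so the example runs on its own. The class `CentralBrain`, its methods, and the server names are hypothetical; in a real deployment the lock would have to be taken atomically in the shared store (Redis's `SET` with the `NX` option is one way to do that).

```javascript
// In-memory stand-in for the shared state the talk kept in Redis.
class CentralBrain {
  constructor(serverIds) {
    this.load = new Map(serverIds.map((id) => [id, 0])); // server -> active callers
    this.locks = new Map(); // conferenceId -> server
  }

  // Server with the fewest active callers; ties go to the first one listed.
  leastLoaded() {
    let best = null;
    for (const [id, load] of this.load) {
      if (best === null || load < this.load.get(best)) best = id;
    }
    return best;
  }

  // Routes one caller: reuse the conference's lock if it exists,
  // otherwise lock the conference to the least-loaded server.
  route(conferenceId) {
    let server = this.locks.get(conferenceId);
    if (server === undefined) {
      server = this.leastLoaded();
      this.locks.set(conferenceId, server);
    }
    this.load.set(server, this.load.get(server) + 1);
    return server;
  }
}

const brain = new CentralBrain(['fs1', 'fs2']);
const firstCaller = brain.route('standup');   // 'fs1'
const secondCaller = brain.route('standup');  // 'fs1' again (sticky)
const otherCall = brain.route('all-hands');   // 'fs2' (fs1 already has 2 callers)
console.log(firstCaller, secondCaller, otherCall);
```

Every caller of the same conference lands on the same server, while new conferences go wherever the measured load is lowest rather than wherever the rotation points.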

Problem 2

Load Balancing Transcoders

How transcoding differs from conferencing:

  • state doesn't need to be distributed to all users
  • only the service needs to know the stickiness (which transcoding gateway)

Solution 2

Use a peer-to-peer service architecture. Diagram showing how the various services message one another their state information instead of using a central DB server.

Gossip is the term for how the servers share this state: any given server should know which server is least loaded, and servers can contact the load balancer to indicate that they should receive jobs/processes.

Node libraries:

  • GossipMonger by Tristan Slominski
  • Gossiper
  • Vines by Paolo Fragomeni (nodejitsu)

Demo of his p2pdemo-arch using GossipMonger. Shows his GitHub demo of this p2p messaging system. Talks about the downside of this approach being scale.
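A toy version of the gossip idea, not the API of any gossip library. Each node holds a view of `nodeId -> { load, version }`; when two nodes exchange views, the entry with the higher version wins. The names `mergeViews` and `leastLoaded` and the versioning scheme are assumptions for this sketch.

```javascript
// Merges a peer's view into ours, keeping the newest entry per node.
function mergeViews(mine, theirs) {
  const merged = new Map(mine);
  for (const [id, entry] of theirs) {
    const known = merged.get(id);
    if (!known || entry.version > known.version) merged.set(id, entry);
  }
  return merged;
}

// Picks the node with the lowest reported load in a view.
function leastLoaded(view) {
  let best = null;
  for (const [id, entry] of view) {
    if (best === null || entry.load < view.get(best).load) best = id;
  }
  return best;
}

const viewA = new Map([
  ['a', { load: 7, version: 3 }],
  ['b', { load: 9, version: 1 }], // stale information about b
]);
const viewB = new Map([
  ['b', { load: 2, version: 4 }], // b's own, newer report
  ['c', { load: 5, version: 2 }],
]);

const merged = mergeViews(viewA, viewB);
console.log(leastLoaded(merged)); // prints "b"
```

After enough pairwise exchanges every node converges on the same picture without any central store, at the cost of more chatter as the cluster grows.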

StrongLoop - NodeFly/Ops

  • Presenter: Glen Lougheed, @glougheed
  • Company: StrongLoop

Demo of the StrongLoop dashboard for Node server performance.

Basic architecture: initially, the API, web, and DB machines all communicated with a collector service using WebSockets.

They tried using Express and Socket.io to communicate with their collector. They found Express to be awesome, but Socket.io is terrible for machine-to-machine communication, specifically around reconnection. Socket.io requires a dual handshake, and a load balancer can mismatch the two halves of the handshake, which invalidates the authentication. They tried SockJS and Axon; both failed. So they built Uhura, a small event emitter that routes communications. The goal is to keep it small so that people can swap in their own networking pieces, TCP or UDP.
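The "event emitter over a swappable transport" idea might look roughly like this. It is not Uhura's actual API: the `Endpoint` class, the transport shape (`write` / `onData`), and the in-memory `memoryPair` transport are all invented for illustration.

```javascript
const { EventEmitter } = require('events');

// Wraps any transport that offers write(raw) and onData(fn).
class Endpoint extends EventEmitter {
  constructor(transport) {
    super();
    this.transport = transport;
    // Incoming messages are re-emitted as ordinary local events.
    transport.onData((raw) => {
      const { event, payload } = JSON.parse(raw);
      this.emit(event, payload);
    });
  }

  // Sends a named event to the remote side.
  send(event, payload) {
    this.transport.write(JSON.stringify({ event, payload }));
  }
}

// In-memory transport pair; a TCP or UDP transport would offer the same two methods.
function memoryPair() {
  const handlers = { a: () => {}, b: () => {} };
  return {
    a: { write: (raw) => handlers.b(raw), onData: (fn) => { handlers.a = fn; } },
    b: { write: (raw) => handlers.a(raw), onData: (fn) => { handlers.b = fn; } },
  };
}

const pair = memoryPair();
const collector = new Endpoint(pair.a);
const agent = new Endpoint(pair.b);

const received = [];
collector.on('metrics', (m) => received.push(m));
agent.send('metrics', { cpu: 0.42 });
console.log(received); // prints [ { cpu: 0.42 } ]
```

Because `Endpoint` only depends on `write` and `onData`, the networking layer can be replaced without touching the code that emits and listens for events.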

Journey Through LevelDB

  • Presenter: Jarrett Cruger, @jcrugzz
  • Company: Nodejitsu

LevelDB is a basic key-value storage system. It uses an LSM tree (log-structured merge-tree).

Demo of a Twitter stream system that pipes into LevelDB and then pipes a readable stream from LevelDB to a WebSocket (Primus).

Websocket system called Primus
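The shape of that demo can be sketched with Node's core streams. Everything external is replaced by a stand-in so the example runs by itself: a `Map` read back in key order plays the role of LevelDB, an array plays the role of the WebSocket, and the tweets are made up. None of the helper names come from the talk.

```javascript
const { Readable, Writable, pipeline } = require('stream');
const { promisify } = require('util');

const pipe = promisify(pipeline);

// Writable that stores each tweet under its id (stand-in for a LevelDB write stream).
function storeWriter(store) {
  return new Writable({
    objectMode: true,
    write(tweet, _encoding, done) {
      store.set(tweet.id, tweet.text);
      done();
    },
  });
}

// Readable that yields entries sorted by key, as a LevelDB read stream would.
function storeReader(store) {
  const keys = [...store.keys()].sort();
  return Readable.from(keys.map((key) => ({ key, value: store.get(key) })));
}

// Writable that "sends" each entry (stand-in for a WebSocket connection).
function socketSink(sent) {
  return new Writable({
    objectMode: true,
    write(entry, _encoding, done) {
      sent.push(JSON.stringify(entry));
      done();
    },
  });
}

async function run(tweets) {
  const store = new Map();
  const sent = [];
  await pipe(Readable.from(tweets), storeWriter(store)); // tweets -> store
  await pipe(storeReader(store), socketSink(sent));      // store -> socket
  return sent;
}

run([
  { id: 't2', text: 'second' },
  { id: 't1', text: 'first' },
]).then((sent) => console.log(sent));
// prints [ '{"key":"t1","value":"first"}', '{"key":"t2","value":"second"}' ]
```

The entries come back out in key order regardless of arrival order, which is the property that makes a sorted key-value store convenient for replaying a feed.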
