Random Notes on Redis/MongoDB/Memcached/Caching (mostly on Twitter and hasMany through relations)

Garbage collection

Rails and Garbage Collection (and inserting data faster - Identity Map)

The problem in Rails: given post.comments.first.update_attribute('postId', null) followed by Post.destroy(post.id), the destroy still cascades to the comment, even though its postId was just set to null and it should no longer belong to the post. To fix this, the comment must be removed from the in-memory comments array as soon as its postId changes. That means keeping a map from each comment to the associations it appears in (the cursors): whenever a property in the comment's observableFields changes, iterate through its cursors, and if the comment no longer matches a cursor's conditions, remove it from that in-memory array. Then when Post.destroy() finds the post again (returning the in-memory post), the post.comments association will still exist, but the comment won't be in it, so dependent-destroy has no effect on it. (Arguably dependent-destroy shouldn't even be touching this case; it should realize comment.postId is null now.)
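
Here's a minimal sketch of that bookkeeping in plain JavaScript; the registry and the cursor's observableFields/matches/data members are illustrative assumptions, not Tower's actual API:

// Hypothetical registry: record id -> [cursor, cursor, ...]
var cursorsByRecordId = {};

function recordChanged(record, changedField) {
  (cursorsByRecordId[record.id] || []).forEach(function (cursor) {
    // Only react to fields this cursor's conditions actually depend on.
    if (cursor.observableFields.indexOf(changedField) === -1) return;
    // If the record no longer matches (e.g. comment.postId became null),
    // drop it from the in-memory array so dependent-destroy never sees it.
    if (!cursor.matches(record)) {
      var index = cursor.data.indexOf(record);
      if (index !== -1) cursor.data.splice(index, 1);
    }
  });
}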

Answer: a global identity map scoped to the current request. Attach the request to the controller and vice versa. Calling App.Post.with(@) initializes an identity map on the request, and that map keeps track of every cursor and model instantiated during the request. After the controller responds, and after any after-callbacks run, everything in the identity map is cleared from memory with Ember.Object#destroy. If you have some async callback after the response has been written (say, a streaming operation or a progress indicator), it's up to you to fetch the records again. Instead of doing that, you should create a background job and pass it the current user's socket id, so you can send them messages through the already-instantiated web socket. This frees up the controller and everything in the identity map for garbage collection, making room for the next request.
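
A rough sketch of that lifecycle, assuming a per-request identity map object (the method names are made up, not Tower's shipped API):

function IdentityMap() {
  this.records = {}; // guid -> model instance
  this.cursors = []; // every cursor built during this request
}

// Called after the controller responds and all after-callbacks have run.
IdentityMap.prototype.flush = function () {
  this.cursors.forEach(function (cursor) { cursor.destroy(); });
  for (var guid in this.records)
    this.records[guid].destroy(); // Ember.Object#destroy tears down meta
  this.records = {};
  this.cursors = [];
};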

Rails identity map is cleared when a request is closed. rails/rails#6524

Some ideas for garbage collecting Ember in Node

You can keep a global hash pointing to all of the instantiated controllers, and destroy any controller or model that hasn't been accessed within some interval of time. Store all the Ember guids in a global list (['__ember__guid_1232181', '__ember__guid_1232132', ...]) and refresh a timer whenever one of the objects is accessed within the interval; otherwise, iterate through the list, find each object by its guid, and destroy it.
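
Something like this sweeper, assuming the objects are Ember.Objects registered by guid (the interval and registry names are invented):

var IDLE_MS = 60 * 1000;
var objectsByGuid = {};    // '__ember__guid_1232181' -> object
var lastAccessByGuid = {}; // guid -> timestamp of last access

function touch(object) {
  var guid = Ember.guidFor(object);
  objectsByGuid[guid] = object;
  lastAccessByGuid[guid] = Date.now();
}

setInterval(function () {
  var now = Date.now();
  for (var guid in lastAccessByGuid) {
    if (now - lastAccessByGuid[guid] > IDLE_MS) {
      objectsByGuid[guid].destroy(); // tear down meta so v8 can reclaim it
      delete objectsByGuid[guid];
      delete lastAccessByGuid[guid];
    }
  }
}, IDLE_MS);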

Perhaps we could also keep a global object pool of instantiated models of each type, so the server only needs to swap the attributes out. It might be cheaper to just delete them and start over, but maybe it's better to keep something like a million objects in memory and swap their data.
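
For example, a per-type pool might look like the sketch below; whether this actually beats plain allocation is exactly the open question, and the names are hypothetical:

var pool = { Post: [], Comment: [] };

function checkout(type, attributes) {
  // Reuse an idle instance if one exists, otherwise allocate.
  var record = pool[type].pop() || App[type].create();
  record.setProperties(attributes); // same object, new data
  return record;
}

function checkin(type, record) {
  pool[type].push(record); // return it to the pool instead of destroying it
}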

The computed properties are what we need to worry about on the server. If a computed property returns another model instance, is there a memory leak when those records aren't explicitly destroyed? If two records only reference each other circularly and nothing else points to them, they are unreachable, and V8's mark-and-sweep collector does reclaim cycles; the real risk is Ember's meta/observer bookkeeping holding extra references that keep them reachable. Need to test.

What about variables in the controller, do they need to be garbage collected?

How about the .instance() property for the current controller on the server? (don't think we're even using that).

Need to set up some sort of debugger/logger for the properties watched in Ember (or all the event listeners) on the server.

Need to clear out the cursor.data property.

You want your requests to return as quickly as possible so the JavaScript can be garbage collected, and run processor-intensive functions in a separate process. But how do you then do things like streaming back progressive file-upload data? Maybe in this case you have a deallocate function you run when you start your long-running process. Or you can get access to the user's socket from a background job: Tower.connections[job.data.socketId]. To make this work we'll have to message the socket.io server via the command line and hook.io, unless there's some way to run the worker alongside the job.

If this happens in the controller, will the controller be garbage collected (and all properties on it), even if the function it calls internally is long-running?

class App.AttachmentsController extends App.Controller
  create: ->
    App.Attachment.create @params, (error, attachment) =>
      # Say this is non-blocking but takes about a minute, will everything except the attachment be garbage collected?
      # Probably not, which is why you want to start up background processes.
      # So, this function should create a background job, passing the currentUser id, which we can use to search the sockets
      # for the socket, which we can use to send data back, all in a separate process so the 
      # request/response cycle can be freed up and garbage collected.
      attachment.processAndUploadInBackground()
      @render json: attachment
  • There is no explicit garbage collection code for the current HTTP request, so it must be getting cleaned up.

You can set the Ember guid to the database record's id!

record[Ember.GUID_KEY] = databaseRecord._id.toString()

Then whenever Ember.guidFor returns something matching the database object id, you can use that guid as the key into the identity map.
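
A sketch of that lookup (the identityMap object is hypothetical, and record/databaseRecord are as in the snippet above):

var identityMap = {};

record[Ember.GUID_KEY] = databaseRecord._id.toString();
identityMap[Ember.guidFor(record)] = record;

// A later find for the same _id returns the single in-memory instance:
var cached = identityMap[databaseRecord._id.toString()];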

Ember.destroy

  • Ember.destroy: Tears down the meta on an object so that it can be garbage collected.
  • Ember.Object.create().destroy(): Destroys an object by setting the isDestroyed flag and removing its metadata, which effectively destroys observers and bindings.
  • Ember.Object#willDestroy: called the frame before it will actually be destroyed.
  • Ember.Object#didDestroy: called the next frame, just after all metadata for it has been destroyed.
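
For example, a server-side object could hook into that teardown like this (a sketch against the Ember 1.x-era API):

var TempRecord = Ember.Object.extend({
  willDestroy: function () {
    // Last chance to release external references (timers, sockets, caches).
    this._super();
  }
});

var record = TempRecord.create();
record.destroy(); // sets isDestroyed, tears down meta/observers/bindings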

Debugging/Profiling

A better way to remove an item from an array (without feeding the garbage collector by allocating new arrays):

// Shift everything after `index` left one slot, then truncate; no new
// array is allocated, so nothing extra is handed to the garbage collector.
// e.g. var list = [1, 2, 3]; removeAt(list, 1); // list is now [1, 3]
function removeAt(arr, index) {
  for (var i = index, len = arr.length - 1; i < len; i++)
    arr[i] = arr[i + 1];
  arr.length = len;
}

Caching

MongoDB

Redis vs. Memcached

Redis

Use MongoDB to store the details (membership.createdAt, membership.role, etc.) but use Redis just to map the ids (user.membership_ids, user.group_ids).
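
A sketch of that split with the node redis client (key names are hypothetical):

var redis  = require('redis');
var client = redis.createClient();

function addMembership(userId, groupId, membershipId) {
  // The full membership document (createdAt, role, ...) goes in MongoDB;
  // Redis only mirrors the ids, as cheap-to-read sets.
  client.sadd('user:' + userId + ':group_ids', groupId);
  client.sadd('user:' + userId + ':membership_ids', membershipId);
}

function groupIdsFor(userId, callback) {
  // A "which groups is this user in?" read never touches MongoDB.
  client.smembers('user:' + userId + ':group_ids', callback);
}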

You want to store all of these ids in Redis so you can do fast writes as well. Every time a user posts a tweet, you can instantly grab all the users following the author (a pure Redis query) and push that tweet id into each of their feeds; even with 1 million followers Redis can do that in about 10 seconds. Twitter also probably only pushes it into the timelines of recently active users, so if you come back after a month away, you have to wait for your timeline to be rebuilt: fetch the ids of everyone you follow from Redis, grab the latest tweet ids from each of their user timelines, and merge them into your home timeline. A sketch of the write-time fan-out follows the link below.

  • twitter stream algorithm
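
Here's a minimal sketch of that write-time fan-out, again assuming the node redis client and invented key names:

var redis  = require('redis');
var client = redis.createClient();

function fanOutTweet(authorId, tweetId, callback) {
  // One Redis set read gets every follower id for the author.
  client.smembers('followers:' + authorId, function (error, followerIds) {
    if (error) return callback(error);
    var remaining = followerIds.length;
    if (!remaining) return callback();
    followerIds.forEach(function (followerId) {
      // Push the tweet id onto each follower's home timeline, capped at
      // the most recent 800 entries (the cap is an assumption).
      client.lpush('timeline:' + followerId, tweetId, function () {
        client.ltrim('timeline:' + followerId, 0, 799, function () {
          if (--remaining === 0) callback();
        });
      });
    });
  });
}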

Redis Search

Scaling

News Feed

Determining what data to store depends on your front-end (including what activities your users participate in) and your back-end. I'll describe some general information you can store; some fields are special, optional information you might want or need depending on your schema.

Activity(id, user_id, source_id, activity_type, edge_rank, parent_id, parent_type, data, time)

user_id - user who generated the activity
source_id - record the activity is related to
activity_type - type of activity (photo album, comment, etc.)
edge_rank - the rank for this particular activity
parent_type - the parent activity type (particular interest, group, etc.)
parent_id - primary key id for the parent type
data - serialized object with meta-data
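
As a concrete (entirely hypothetical) example row from that schema:

var activity = {
  id: 1,
  user_id: 42,                      // who generated the activity
  source_id: 1337,                  // the record the activity is about
  activity_type: 'comment',
  edge_rank: 0.87,
  parent_id: 7,
  parent_type: 'group',
  data: '{"excerpt":"Nice shot!"}', // serialized meta-data
  time: '2012-07-25T02:49:00Z'
};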

To support relevance filtering and personalization, we needed three types of signals.

These servers use a specialized ranking function that combines relevance signals and the social graph to compute a personalized relevance score for each Tweet.

Twitter is a complex yet elegant distributed network of queues, daemons, caches, and databases.
