Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Save wangjingyi/4635625 to your computer and use it in GitHub Desktop.
Save wangjingyi/4635625 to your computer and use it in GitHub Desktop.
/**
* Automatic Reference Counting (ARC) Garbage Collection technique.
*
* This diff describes Reference counting GC technique.
* See also previous lesson on Mark and Sweep GC:
* https://gist.github.com/4391763
*
* Covered topics:
*
* 1. Reference value in ECMAScript (JS specific)
* 2. ARC algorimths from generic computer science.
*
* by Dmitry Soshnikov <dmitry.soshnikov@gmail.com>
* MIT Style License
*/
// ******************************************************************
// I. Preamble: heap and references
// ******************************************************************
// ------------------------------------------------------------------
// 1. Heap
// ------------------------------------------------------------------
// For simplicity, we represent the "heap" (our memory where all objects
// are stored) as a JS array. Address at which an object is stored is
// the array index.
var heap = [];
// ------------------------------------------------------------------
// 2. References
// ------------------------------------------------------------------
// To *refer* objects on the heap, we need special values which
// store *addresses* of the objects. These are *pointers*, or in our
// cases -- *references*.
//
// For example, if "x" reference stores value 1, it means that at index
// 1 of the heap array there is an object that "x" points to.
//
// For global variables we could store addresses directly in the variables
// (like var x = 1), however, some properties of objects may be the references
// themselves and refer another objects.
//
// To handle both cases in the same way, we introduce small
// helper class, the "Reference" class.
//
// In JS world, we achieve it through two components of a reference object:
// the "base" object and the "offset" in borders of this object.
// (The "offset" is a property name of this object in JS)
//
// Examples:
//
// Global variable (reference): the "base" is the global object,
// the "offset" (the property name) is "a":
//
// var a = {name: 'a'}; // Reference(global, "a")
//
// A reference within another object, the "base" is "a" object,
// the offset (property name) is "x":
//
// a.x = {name: 'x'}; // Reference(a, "x");
//
// The value stored at these coordinates (base:offset) is exactly
// the *address* on the heap (the index of the heap array). Having
// the address, we then can get the value stored at it.
function Reference(base, offset) {
this._base = base;
this._offset = offset;
}
Reference.prototype = {
// gets the reference value (the address)
getAddress: function () {
return this._base[this._offset];
},
// Gets the actual contents stored by this address.
// This operation is known as *dereferencing* of
// a reference (of a pointer).
getContents: function () {
return heap[this.getAddress()];
},
// Rebinds the reference to another address.
setAddress: function (address) {
this._base[this._offset] = address;
}
};
// (BTW, exactly this implementation of references is used and specified
// in ECMAScript specification).
// ******************************************************************
// II. Reference counting GC
// ******************************************************************
// Automatic reference counting (ARC) is a garbage collection technique
// with a very simple algorithm: every object stores a special value (reference
// count value) which is increased every time a new reference is assigned the
// address of this object, and decreased each time the address is removed from
// some reference to this object. The whole "magic" happens at assignment to
// a reference, rebinding the reference to another address and removing
// objects from the heap.
// (Note, JS's GC is not specified. However, in practice usually can be found
// Mark and Sweep GC (since ARC has some disadvantages described bellow).
// ARC is used in some languages though, like Objective-C).
// ------------------------------------------------------------------
// 3. Objects allocation
// ------------------------------------------------------------------
// To manage this reference counting special value, we introduce a handy
// "allocate" function which initializes reference counting flag to zero,
// and also provides actual allocation of this object on the heap (at first
// available free address). As a result, the function returns the address
// at which object was allocated.
function allocate(obj) {
var allocAddress = heap.length;
obj.__rc__ = 0;
heap.push(obj);
return allocAddress;
}
// ------------------------------------------------------------------
// 4. Assignment to references, ARC algorithm
// ------------------------------------------------------------------
// That's said, the main algorithm actually happens at assignment operation.
//
// 1. We get the *previous (current) object* to which the assigning reference
// points to and *decrease* its reference counting.
//
// 2. Then we get the object stored at *new address* and *increase*
// its reference count (but, only if the new address is not null:
// in this case we just reset this reference at first step).
//
// 3. Now, if reference count flag of the *previously referred object*
// is zero, we immediately remove the object from the heap.
//
// 4. Finally, we do actual assignment to the reference, i.e. set it
// to the new passed address.
function assign(ref, address) {
// Previously (currently) stored object
// at the address of this reference.
var valueOfRef = ref.getContents();
// Object at new address.
var valueAtAddress = heap[address];
// Decrement the reference count of the previous object
// (if its reference was not null-reference of course)
valueOfRef && valueOfRef.__rc__--;
// And if the new address is not null, increase the
// reference count of the new object.
if (address != null) {
valueAtAddress.__rc__++;
}
// Check and remove old object from the heap, if there no reference to
// it anymore (i.e. its reference count value is zero).
recycleObjectAtAddressIfNoRef(ref.getAddress());
// Finally, assign the reference to the new address.
ref.setAddress(address);
// And return the reference back.
return ref;
}
// ------------------------------------------------------------------
// 5. Recycling non-referenced objects, ARC algorithm
// ------------------------------------------------------------------
// As we said, if reference counting value becomes zero,
// the object may be removed from the heap (since no one refers
// it anymore). However, at destructuring the object, we should
// deallocate all its properties too. And if some of its properties
// are reference to objects too, we should decrease reference counting
// value of these sub-objects as well, and potentially check them
// for recycling.
function recycleObjectAtAddressIfNoRef(address) {
// Object stored at address.
var valueAtAddress = heap[address];
// If it's not null and no one refers it anymore, it's time
// to remove it from the heap.
if (valueAtAddress && valueAtAddress.__rc__ == 0) {
// But, before recycling the object itself, first all its reference
// sub-properties, decreasing reference count of object they refer.
for (var k in valueAtAddress) if (k != '__rc__' && isAddress(valueAtAddress[k])) {
// The address on the heap of this sub-object.
var subObjectAddress = valueAtAddress[k];
// Actual object stored by this address.
var subObject = heap[subObjectAddress];
// Decrease its reference count.
subObject.__rc__--;
// And also check for possible releasing from the heap.
recycleObjectAtAddressIfNoRef(subObjectAddress);
}
// Now we can remove our object itself from the heap.
heap.splice(address, 1);
}
}
// For simplicity, use this helper funciton checking whether
// a property of some object is also the reference to another
// object, i.e. stores *address* (just the numed, the heap's array
// indecies).
function isAddress(value) {
return typeof value == 'number';
}
// ------------------------------------------------------------------
// 6. Testing
// ------------------------------------------------------------------
// global variables (references) have the global
// object as their "base" part.
var global = this;
// -- 6.1 One reference to an object: -------------------------------
// Allocate "a" object and assign it to "aRef" reference
// (this is rougly equivalent to var aRef = {name: 'a'} in JS).
// aRef ---> {name: 'a'}
var aRef = assign(new Reference(global, 'a'), allocate({name: 'a'}));
// The same with "b" object, which is assigned to "bRef" reference,
// (var bRef = {name: 'b'} in JS).
// bRef ---> {name: 'b'}
var bRef = assign(new Reference(global, 'b'), allocate({name: 'b'}));
// Check the heap state: currently "a" and "b" objects
// are referenced by one reference per each object
// (i.e. __rc__ flag of both objects is 1).
// [{name: 'a', __rc__: 1}, {name: 'b', __rc__: 1}]
console.log(heap); // "a" and "b" objects
// -- 6.2 Two references points to the same object: -----------------
// Now set the "bRef" reference to the *same address* as "aRef" has.
assign(bRef, aRef.getAddress());
// This causes decreasing reference count of "b" object on the heap and
// increasing reference count of the "a" object. Now "b" object doesn't
// have any reference to it, which makes it automatically garbage collected:
// aRef ---> {name: 'a', __rc__: 2} <--- bRef
// {name: 'b', __rc__: 0}
// [{name: 'a', __rc__: 2}]
console.log(heap); // only "a" object is left and "b" object is removed.
// -- 6.3 Reset references (set to null): ---------------------------
// As we implemented in "assign", if the assigning address is null,
// it means the __rc__ of the object the reference points to is
// just decreased:
assign(aRef, null); // reset "aRef" reference
console.log(heap); // [{name: 'a', __rc__: 1}]
// Still there is one reference to "a" object via "bRef",
// reset it as well. It automatically removes "a" object
// from the heap, since __rc__ becomes 0.
assign(bRef, null);
console.log(heap); // [], empty.
// -- 6.4 Issue: reference cycles -----------------------------------
// There is one problem with ARC which cannot be handled in obvious way
// (although, there are some workarounds) -- this is *reference cycles*.
//
// Consider a use case (in JS-code example):
//
// var xRef = {name: 'x'};
// x.next = {name: 'x:next'};
// x.next.backToX = x;
//
// In our implementation, it's represented as follows:
// Allocate "x" object, x ---> {name: 'x'}
var xRef = assign(new Reference(global, 'x'), allocate({name: 'x'}));
// Allocate "next" object which is referenced as x.next:
// x ---> {name: 'x', next: ---> {name: 'x:next'}}
var xNextRef = assign(new Reference(xRef.getContents(), 'next'), allocate({name: 'x:next'}));
// And now property "backToX" of the "next" object refers back to the
// "x" object. Set "x.next.backToX" to the same address as "x" reference has.
var backToXRef = assign(new Reference(xNextRef.getContents(), 'backToX'), xRef.getAddress());
// Check the heap state, it should be 2 objects: "x" is referred twice (from the
// global "xRef" and from the "backToX", and "x:next" is referred once from "x.next").
// Notice how "next" and "backToX" properties stores addresses (heaps array indices).
//
// [{name: 'x', next: 1, __rc__: 2}, {name: 'x:next', backToX: 0, __rc__: 1}]
console.log(heap); // "x" and "x:next"
// Now the actual problem happens if we reset global "xRef" reference setting it to null:
assign(xRef, null);
// This decreased __rc__ of the "x" object by one and from the user code we cannot reach
// this object anymore. But it *cannot be garbage collected* by ARC, since its __rc__ is
// still not zero! -- there is still one reference to it from "x.next.backToX" property!
// [{name: 'x', next: 1, __rc__: 1}, {name: 'x:next', backToX: 0, __rc__: 1}]
console.log(heap);
// To avoid this situation, *before setting xRef to null* we should set *manually* reset
// "x.next.backToX" reference. I.e. set it manually to null too.
// (Restore "xRef" for our experiment, from the "backToRef" now)
assign(xRef, backToXRef.getAddress());
// Check that "x" object is again referenced by two pointers:
// [{name: 'x', next: 1, __rc__: 2}, {name: 'x:next', backToX: 0, __rc__: 1}]
console.log(heap); // "x" and "x:next"
// Now do it right, and reset first "backToRef":
assign(backToXRef, null);
// Check the heap, see "x" is correctly referenced by one reference:
// [{name: 'x', next: 1, __rc__: 1}, {name: 'x:next', backToX: 0, __rc__: 1}]
console.log(heap);
// And now we can safely reset the global "xRef" pointer, which automatically
// will garbage collect both "x" object and "x.next" as one of its sub-objects:
assign(xRef, null);
console.log(heap); // [], empty.
// ------------------------------------------------------------------
// 7. Conclusion
// ------------------------------------------------------------------
// ARC GC technique has its own advantages and disadvantages as we saw:
//
// Advantages:
//
// 1. Simple at implementation
// 2. No pauses for GC cycles as e.g. in Mark and Sweep GC.
//
// Disadvantages:
//
// 1. Reference cycles which cannot be handled in obvious way.
// In the example above we used manual resetting sub-objects
// to null. Alternatively, a runtime may use both ARC and
// Mark and Sweep GC (first removes objects immediately, the second
// cleans up objects which ARC couldn't handle).
//
// 2. Performance issues. The assignment operation besides actual assignment
// should additionally and every time handle __rc__ value of two objects.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment