Skip to content

Instantly share code, notes, and snippets.

@jed
Created September 3, 2011 06:34
Show Gist options
  • Star 1 You must be signed in to star a gist
  • Fork 1 You must be signed in to fork a gist
  • Save jed/1190733 to your computer and use it in GitHub Desktop.
Save jed/1190733 to your computer and use it in GitHub Desktop.
testing namespace collisions for 2-byte abbreviated keys for the document object

experimenting with a more general way of hashing a namespace, for future javascript golf. the idea is that existing browser objects can be hashed so that wasteful strings like createDocumentFragment can be avoided entirely, and keys from several objects (such as document, window, String, etc.) can be hashed into one namespace.

for example:

d = document
d.body.appendChild(d.createElement("script")).src="URL"

could be shortened to

d = document
d[rk][XU](d[Ou]("script")).src="URL"

this specific approach takes every key on the document, turns it into a large number by multiplying each codepoint with an index and then summing and modding by 3456 (54 * 64), which is the maximum number of items representable with a two-byte javascript identifier:

/[a-z_$][0-9a-z_$]/i

this yields decent results. in an empty chrome page, there were only 3 collisions. the idea is that the upper bound can be lowered as necessary, until the collisions are acceptable.

var namespace = function(a,b,c,d,e) {
for(c in document){
for(d=e=c.length;d;e%=3456)e+=d*c.charCodeAt(--d)
e = a.slice(10)[e>>6] + a[e%64]
b[e] || (b[e] = [])
b[e].push(c)
}
return b
}("0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_$",{})
{
"Dp": ["attribution-img"],
"IP": ["images"],
"jS": ["uid"],
"pF": ["head"],
"k0": ["all", "createNodeIterator"],
"W2": ["doctype"],
"f6": ["nodeType"],
"uk": ["linkColor"],
"rk": ["body"],
"IB": ["embeds"],
"hw": ["URL"],
"PY": ["parentElement"],
"Ug": ["bgColor"],
"bO": ["styleSheets"],
"qc": ["localName"],
"RZ": ["ownerDocument"],
"Ai": ["forms"],
"h3": ["referrer"],
"kY": ["defaultCharset"],
"sO": ["nodeValue"],
"S$": ["documentURI"],
"Jf": ["height"],
"Hl": ["designMode"],
"K2": ["readyState"],
"qV": ["lastModified"],
"ez": ["webkitVisibilityState"],
"LJ": ["preferredStylesheetSet"],
"J_": ["prefix"],
"zi": ["width", "close"],
"$0": ["xmlEncoding"],
"sY": ["characterSet"],
"W9": ["anchors"],
"K9": ["previousSibling"],
"Wg": ["plugins"],
"Uk": ["fgColor"],
"g6": ["namespaceURI", "createTextNode"],
"Pl": ["activeElement"],
"ry": ["lastChild"],
"Pa": ["xmlStandalone"],
"dx": ["textContent"],
"a8": ["nextSibling"],
"IQ": ["domain"],
"Wu": ["applets"],
"VK": ["charset"],
"c3": ["nodeName"],
"IH": ["cookie"],
"IA": ["childNodes"],
"MI": ["baseURI"],
"OD": ["inputEncoding"],
"pU": ["implementation"],
"HU": ["compatMode"],
"zU": ["links"],
"zg": ["title"],
"Ie": ["firstChild"],
"OK": ["attributes"],
"bs": ["defaultView"],
"KU": ["vlinkColor"],
"MZ": ["xmlVersion"],
"aH": ["selectedStylesheetSet"],
"Kz": ["alinkColor"],
"Ik": ["parentNode"],
"qf": ["webkitHidden"],
"gO": ["location"],
"X7": ["scripts"],
"JU": ["documentElement"],
"kf": ["dir"],
"qW": ["open"],
"zA": ["write"],
"VM": ["writeln"],
"yJ": ["clear"],
"Tb": ["captureEvents"],
"QY": ["releaseEvents"],
"fH": ["hasFocus"],
"Ou": ["createElement"],
"Je": ["createDocumentFragment"],
"PM": ["createComment"],
"NB": ["createCDATASection"],
"PG": ["createProcessingInstruction"],
"LF": ["createAttribute"],
"YL": ["createEntityReference"],
"co": ["getElementsByTagName"],
"x1": ["createElementNS"],
"ze": ["createAttributeNS"],
"cy": ["getElementsByTagNameNS"],
"$x": ["getElementById"],
"af": ["createEvent"],
"YE": ["createRange"],
"gf": ["evaluate"],
"ZE": ["execCommand"],
"Dk": ["queryCommandEnabled"],
"rY": ["queryCommandIndeterm"],
"F$": ["queryCommandState"],
"gk": ["queryCommandSupported"],
"Fh": ["queryCommandValue"],
"zr": ["getElementsByName"],
"hP": ["elementFromPoint"],
"Lu": ["caretRangeFromPoint"],
"vs": ["getSelection"],
"Mf": ["getCSSCanvasContext"],
"uI": ["getElementsByClassName"],
"SS": ["querySelector"],
"dr": ["querySelectorAll"],
"Jr": ["importNode"],
"sd": ["adoptNode"],
"cN": ["createTreeWalker"],
"iJ": ["getOverrideStyle"],
"nA": ["createExpression"],
"fq": ["createNSResolver"],
"sQ": ["insertBefore"],
"nM": ["replaceChild"],
"Z0": ["removeChild"],
"XU": ["appendChild"],
"Kx": ["hasChildNodes"],
"rc": ["cloneNode"],
"w8": ["normalize"],
"dy": ["isSupported"],
"U9": ["hasAttributes"],
"wA": ["lookupPrefix"],
"eT": ["isDefaultNamespace"],
"Q6": ["lookupNamespaceURI"],
"iu": ["addEventListener"],
"Sf": ["removeEventListener"],
"FB": ["isSameNode"],
"Y6": ["isEqualNode"],
"C5": ["compareDocumentPosition"],
"Pq": ["dispatchEvent"],
"Np": ["ELEMENT_NODE"],
"si": ["ATTRIBUTE_NODE"],
"aA": ["TEXT_NODE"],
"Ru": ["CDATA_SECTION_NODE"],
"ek": ["ENTITY_REFERENCE_NODE"],
"AR": ["ENTITY_NODE"],
"EG": ["PROCESSING_INSTRUCTION_NODE"],
"NR": ["COMMENT_NODE"],
"bC": ["DOCUMENT_NODE"],
"WT": ["DOCUMENT_TYPE_NODE"],
"Ih": ["DOCUMENT_FRAGMENT_NODE"],
"bB": ["NOTATION_NODE"],
"o2": ["DOCUMENT_POSITION_DISCONNECTED"],
"tX": ["DOCUMENT_POSITION_PRECEDING"],
"G8": ["DOCUMENT_POSITION_FOLLOWING"],
"_o": ["DOCUMENT_POSITION_CONTAINS"],
"AE": ["DOCUMENT_POSITION_CONTAINED_BY"],
"dq": ["DOCUMENT_POSITION_IMPLEMENTATION_SPECIFIC"]
}
@mathiasbynens
Copy link

// while this is more efficient:
b[e]||(b[e]=[])
// this is shorter:
b[e]=b[e]||[]

@jed
Copy link
Author

jed commented Sep 3, 2011

sure sure, and i can modify the base64 code to get rid of the map. the question is whether the approach (not the implementation) is viable at all, or if there's a better way to do it.

@p01
Copy link

p01 commented Sep 3, 2011

This approach is definitely viable. No doubt about it.
And as a general purpose hashing technique, this is pretty good: Fast, simple and quite compact already. Of course the implementation can be shortened.

Better in which way ? It depends on the use case. For a js1k prod, I would go for something more specific and therefore more compact.

@p01
Copy link

p01 commented Sep 4, 2011

Also, I believe the shortened example would actually be shortened to ( if the hashing is also done on the body AKA rk )

d.rk.XU(d.Ou("script")).src="URL"

@jed
Copy link
Author

jed commented Sep 5, 2011

then we would have to hash on every node. would make more sense to use the strings for any DOM node, even though it takes one more character.

@jed
Copy link
Author

jed commented Sep 5, 2011

i'm also thinking that .toString(36) may be a better way to generate keys as long as collisions aren't a problem, since it requires much less code.

@p01
Copy link

p01 commented Sep 5, 2011

We could hash the methods/properties name of HTMLElement under one namespace, making the example:

d.rk[N.XU](d.Ou("script")).src="URL" // N holds the hashed keys of HTMLElement

However, I'm not sure what is the use case for this: a new Uglify JS ?

Alas, .toString(36) has a pretty high chance of producing invalid identifiers ( starting by a digit ), but who knows, we might get lucky.

@jed
Copy link
Author

jed commented Sep 5, 2011

well, i think we could avoid invalid identifiers by limiting the integers to everything greater than '9z'.

(and i'm considering this for my own js1k project based on 140byt.es snippets.)

@jed
Copy link
Author

jed commented Sep 24, 2011

okay, so a big flaw in this approach is that in a significantly saturated namespace, there are bound to be keywords that overlap with existing javascript keywords like do and in. blarg.

@p01
Copy link

p01 commented Sep 24, 2011

How likely is that to happen ? Surely it's possible to fiddle with the formula e+=d*c.charCodeAt(--d) formula, or if that's really only these 2 keys ( 'do' and 'in' ), we could skip them altogether:

for(c in document){
    for(d=e=c.length;d;e%=3454)e+=d*c.charCodeAt(--d)

    e+=25711<e;  // skip the 'do' key
    e+=26990<e;  // skip the 'in' key

    e = a.slice(10)[e>>6] + a[e%64]

    b[e] || (b[e] = [])
    b[e].push(c)
  }

  return b

}("0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_$",{})

@jed
Copy link
Author

jed commented Sep 24, 2011

it's happening in the code i'm using, giving me errors like Uncaught SyntaxError: Unexpected token if. i guess i'll try fiddling the modulus until it works.

@p01
Copy link

p01 commented Sep 24, 2011

e+=26982<e // skip the "if" keyword :p
Actually, if you wanna go the "skip keywords" road, the modulo should be applied after the "skips", or the "skips" should be applied inside the loop computing e which would allow us to keep the modulo at 3456

@jed
Copy link
Author

jed commented Sep 27, 2011

by the way, i've rolled these ideas into namedrop.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment