@Me1000
Created August 17, 2012 23:57
var x = {
    name: "x"
};

var y = {
    name: "y",
    a: x
};

var z = {
    name: "z",
    a: x,
    b: y
};

x.b = y;
x.c = z;
y.b = z;
// When encoding x using https://gist.github.com/3373779 the output of the JSON is:
{
    "name": "x",
    "b": {
        "name": "y",
        "b": {
            "name": "z"
        }
    },
    "c": {
        "name": "z",
        "b": {
            "name": "y"
        }
    }
}
// with a tree that looks like:
//
//       x
//      / \
//     y   z
//    /     \
//   z       y
// An entirely different representation of the data you want to encode.
// Honestly, it seems like more times than not it would be frustrating to work with,
// because you can never get the original data back.
// I propose this format would be more useful:
{
    objects: {
        "$0": { "name": "x", "b": "__SERIALIZER__$1", "c": "__SERIALIZER__$2" },
        "$1": { "name": "y", "a": "__SERIALIZER__$0", "b": "__SERIALIZER__$2" },
        "$2": { "name": "z", "a": "__SERIALIZER__$0", "b": "__SERIALIZER__$1" }
    },
    version: "JSONGraphSerializer1.0"
}
// The main advantage here is that you can always get the original object graph back in memory.
// While this basic example does output a larger string, the size of the JSON
// grows linearly with the number of objects and references encoded.
// The circular-ref remover's output grows at a greater rate, depending on how
// deep you have to go to find the cycles in the graph.
// The disadvantage here is that if you're a crazy person and name an object
// attribute '__SERIALIZER__$0' then things get really messy.
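
// A rough sketch of an encoder/decoder pair for this format (the names encode/decode
// are illustrative, and only plain objects are handled -- no arrays, dates, etc.):
function encode(root) {
    var objects = {}, seen = [], count = 0;

    function idFor(obj) {
        var index = seen.indexOf(obj);
        if (index !== -1) return "$" + index;       // already encoded: reuse its id

        var id = "$" + count++;
        seen.push(obj);
        var copy = objects[id] = {};
        for (var key in obj) {
            var value = obj[key];
            copy[key] = (value && typeof value === "object")
                ? "__SERIALIZER__" + idFor(value)   // replace object values with reference markers
                : value;
        }
        return id;
    }

    idFor(root);
    return JSON.stringify({ objects: objects, version: "JSONGraphSerializer1.0" });
}

function decode(json) {
    var graph = JSON.parse(json).objects, live = {}, id, key, value;
    for (id in graph) live[id] = {};                // allocate every object first...
    for (id in graph) {                             // ...then wire up the references
        for (key in graph[id]) {
            value = graph[id][key];
            live[id][key] = (typeof value === "string" && value.indexOf("__SERIALIZER__") === 0)
                ? live[value.slice("__SERIALIZER__".length)]
                : value;
        }
    }
    return live["$0"];                              // "$0" is always the root object
}
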
@getify

getify commented Aug 18, 2012

more times than not it would be frustrating to work with, because you can never get the original data back.

I don't think this is a valid criticism of either my proposal in the linked gist (https://gist.github.com/3373779) or of JSON serialization in general. There are already a number of ways in which calling JSON.stringify() (without my proposed changes) on an object does not produce a serialization that can get you back to the original.

For instance, regular expressions get serialized to {}. Dates get serialized to an ISO string (which you have to take a special step to turn back into an equivalent Date object). And functions don't get serialized with their toSource() equivalent (probably for security reasons); they get dropped: a function-valued property is omitted from the JSON representation of its object altogether, while a function element in an array is kept, but set to null.
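
For reference, that behavior is easy to see with a plain JSON.stringify call (no replacer function):

JSON.stringify({ re: /abc/, when: new Date(0), fn: function () {} });
// → '{"re":{},"when":"1970-01-01T00:00:00.000Z"}'   (the function property is dropped)

JSON.stringify([function () {}, /abc/]);
// → '[null,{}]'   (the function element is kept, but as null)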

That's a strange inconsistency (between properties and array elements) if you ask me, but importantly, I kept that same precedent for handling circular refs. My aim was to have as little impact on existing behavior as possible while still improving the current situation.

Bottom line: I think JSON serialization is already not a 2-way safe operation, so the fact that mine also isn't is a moot argument at best, IMO. The way JSON.stringify() works now, your structure throws an error, so is that really better than some sort of sane handling?

@getify

getify commented Aug 18, 2012

Just added this note to explain more about my motivation for JSON serialization that ignores circular refs: https://gist.github.com/3373779#gistcomment-400333

@Me1000

Me1000 commented Aug 18, 2012

Yeah, I agree with you that JSON is already broken.
I'm just really hesitant to say we should throw away more data.

The interesting thing here is that all the limitations you mentioned above could be fixed in my proposal.
Since all the objects being encoded get shoved into a private data structure, you could easily append extra data about the type of object you're encoding.

If the ultimate goal is to serialize an object into a string, throwing away data seems like poor practice.
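
For example (just a sketch -- the type/data key names and the Date/RegExp encodings here are hypothetical), each entry in the objects table could carry a type tag alongside its data:

{
    objects: {
        "$0": { type: "Object", data: { "name": "x", "b": "__SERIALIZER__$1" } },
        "$1": { type: "Date",   data: "2012-08-18T00:00:00.000Z" },
        "$2": { type: "RegExp", data: { source: "^foo$", flags: "gi" } }
    },
    version: "JSONGraphSerializer1.0"
}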

@Me1000

Me1000 commented Aug 18, 2012

Your motivation for this (debugging) is interesting. As it happens, my proposal would also solve that issue… console.log(decode(encode(x))) and you have a live snapshot (duplicate) of x. We actually use something similar to this to duplicate objects in Cappuccino.
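
For example, with the hypothetical encode/decode sketched earlier in the gist:

var snapshot = decode(encode(x));
snapshot !== x;              // true: it's a fresh copy of the graph...
snapshot.b.a === snapshot;   // true: ...with its internal references intact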

@getify

getify commented Aug 18, 2012

The difference between our opinions is, I don't think what I described makes JSON "broken". IMHO, JSON is about creating a portable serialization of data. By portable, I don't just mean "over the wire", I mean between different platforms/languages. JSON is universal because it's data that makes as much sense in python as c++ as ruby as php as javascript. But clearly non-data (code logic, etc) is nowhere near portable across all those lines.

Functions (in this respect) are not data. Nor are regexes (they're more code than data, I think). If you were going to transport that JSON data somewhere else, the regexes and functions would probably not make any sense in that other context. They should be ignored by a data serialization task.

Given that, the next question would be: is a circular object ref a piece of data? I don't think so, not in the proper sense. It's metadata. And it's not metadata that likely means anything (or at least, not the same thing) on the receiving end. It should also be ignored in the data serialization.

PHP can serialize objects into a string for session persistence, and then reconstruct that object on the other end. But it doesn't store any special code data in that serialization. Only data. And it doesn't serialize any non-reconstitutable data, like "resource references" (database result list resources, etc). It ignores anything that can't be brought back in any useful way. JSON.stringify(), I think, rightly takes this same approach.


Now, there are clear reasons why other types of serialization (and more robust algorithms for it) are important. And all the ideas you suggest as useful would be arguments for those things.

But I don't think any of that belongs in JSON serialization. It's not that JSON is broken, it's that those things are fundamentally incompatible with JSON, and any attempt to force them in there is going to be proprietary and non-portable, at best.

@Me1000

Me1000 commented Aug 18, 2012

Alright, let me rephrase a bit... JSON is simply a notation for representing data.
But as we've seen, it's been abused considerably: it works so well for simple data, yet complex data isn't represented nearly as well, because the serialization isn't bidirectional. This new proposal leverages JSON such that the state of an object can be preserved. In fact, the same data could be represented using XML or a proprietary binary format (just like plists).

Now, does it make sense to serialize a method implementation of an object? Perhaps not, but I think serializing a regular expression does make sense. It's up to the decoder to decide what to do with that data. Basic data types like strings, arrays, dictionaries, dates, and regexes exist in a ton of languages. A decoder for this could be implemented in Objective-C tomorrow and work with almost ALL data being encoded from JS.

I don't think circular references are meta data. For a simple example:

var button = {};
var view = {
    subviews: [button]
};
button.controller = view;

Given the top-down approach, the serialization of view would throw away button.controller, but I can't think of a situation where that data wouldn't be important. Likewise, if you serialize button instead of view, you get an array with missing items... I can think of a lot of situations where that would confuse me...
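
For comparison, under the format proposed above, serializing view would come out roughly like this (a sketch; array handling is assumed here, since the example at the top only shows plain objects):

{
    objects: {
        "$0": { "subviews": ["__SERIALIZER__$1"] },
        "$1": { "controller": "__SERIALIZER__$0" }
    },
    version: "JSONGraphSerializer1.0"
}

Neither button.controller nor the array entry is lost; the decoder just wires "$1" back to "$0".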

So I think if you approach the problem with the understanding that you're only serializing data, not code/logic, then this proposal is superior to the current JSON.stringify(x) behavior... at least for complex data.


With all that said, I'm still not sure what I think about throwing away code. Granted, it's a bit beyond the scope of the original problem but there are a lot of really cool things that could happen if code were included. That said, I'm not sure it's that big of a deal.

I think ultimately we want to solve the same problem, we just have very different approaches. :)

I guess when it's all said and done I probably won't end up writing this because plists serve me quite well ;), but I do think it would be very valuable to have.

@getify

getify commented Aug 18, 2012

JSON is simply a notation for representing data.

Actually, that's not how Crockford defines it. From http://json.org:

JSON (JavaScript Object Notation) is a lightweight data-interchange format.

JSON is about representing data in a universal way so that it can be interchanged. Interchanged with platforms of all different language types. It's recognizing the fundamental data subsets that are true of all languages, and codifying a way for data to be interchanged between all of them (syntax differences notwithstanding).

I think your slightly different view of JSON, as just being a way to represent data, is why you're missing the bigger argument I was trying to make about "data" which is not, fundamentally, interchangeable.

For something to be represented in JSON, there must be a universally agreed-upon way to represent it, one that all JSON hosts would be able to do something meaningful with, in a non-onerous way.

I don't think there is such a "universal" representation for regexes, function code, or, importantly to this thread, even the concept of a "reference". Those are all things which are non-universal in their representations.

@getify

getify commented Aug 18, 2012

To expound on the previous comment...

I think serializing a regular expression does make sense. It's up to the decoder to decide what to do with that data .... I'm still not sure what I think about throwing away code. Granted, it's a bit beyond the scope of the original problem but there are a lot of really cool things that could happen if code were included.

Other data types which JSON supports are, basically, universal... strings, numbers, booleans, key/value object hashes, arrays, etc. Those are all data types which, more or less, work the same in almost all languages, which means that for language XYZ, there's an easy 1-to-1 (and, importantly, reversible) mapping between receiving an array specified in JSON and how that array should be represented in the receiving host language.

Regex on the other hand is not universal. There are many different flavors/syntaxes of regex, and so to say that I could take the JavaScript representation of a regex, package it into a JSON document, and ship it off to Visual Basic, and expect that Visual Basic would be able to interpret that regex syntax... that's impractical.

A JSON encoder/decoder alone could NOT be expected to translate the complex syntax differences and nuances between all the different regex standards. If you really needed to exchange regex logic between two different worlds, you'd have to write your own custom serialization/deserialization on both ends. But that definitely would not be the job of the built-in JSON support in all the various languages.

The exact same argument goes for function code, but even more strongly and obviously. How on earth could Visual Basic be expected to do anything useful with a snippet of JS that it received in a JSON document? To exchange code logic between two platforms, again you'd have to have some sort of custom serialization (perhaps compiling code to byte-code or something) that both sides could agree on, but it wouldn't even remotely be something that the core JSON could be expected to do.


To the main point of this thread: you've proposed a serialization approach for object references. Fine. I think it's a bit intrusive and heavy-handed, given that when I look at the JSON it produces, it's not very obvious to me how to reconstruct what it represented... but setting aside personal "readability" for a moment (because binary representation is also hard to "read" and yet it's perfectly valid), I think you're still missing a really important point.

What if a snippet of this JSON with these reference paths were to be sent to a JSON parser for a language that has no concept of object references? What should that JSON parser do with what your proposed serialization is instructing it to do? Should it create duplicate objects? If it were old-style C, should it use pointers (which are both like and unlike references in important ways...)?

Because JSON is supposed to be about the universal interchange of data, I don't see how object references could ever be a candidate for inclusion. They face similar sticky questions to those that have left regexes and functions out of the mix.

@getify

getify commented Aug 18, 2012

Given the top-down approach, the serialization of view would throw away button.controller, but I can't think of a situation where that data wouldn't be important. Likewise, if you serialize button instead of view, you get an array with missing items... I can think of a lot of situations where that would confuse me...

See the second point I made in this comment: https://gist.github.com/3373779#gistcomment-400333

@getify

getify commented Aug 18, 2012

BTW, this seems right up your alley: https://github.com/douglascrockford/JSON-js/blob/master/cycle.js

It's not suitable for me, for the reasons I express here: https://github.com/douglascrockford/JSON-js/issues/39#issuecomment-7845004
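
For reference, cycle.js adds JSON.decycle and JSON.retrocycle; a rough sketch of round-tripping the x/y/z graph from the top of this gist (output shown approximately):

var flat = JSON.stringify(JSON.decycle(x));
// roughly: '{"name":"x","b":{"name":"y","a":{"$ref":"$"},"b":{"name":"z","a":{"$ref":"$"},
//            "b":{"$ref":"$[\"b\"]"}}},"c":{"$ref":"$[\"b\"][\"b\"]"}}'

var back = JSON.retrocycle(JSON.parse(flat));
back.b.a === back;     // true: the cycles are reconstructed
back.c === back.b.b;   // true: shared references are preserved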
