Cypherlinks

I think this sounds awesome.

I'm leaving a comment!

Would this create a system similar to a graph database? It abstracts the data model into nodes, and nodes are linked. Am I correct?

Immutability + only linking to existing objects + subset replication + these very simple rules... this seems like an amazing distributed foundation. Are the links immutable as well? As with hyperlinks on the web, we need to think about what happens when an immutable object disappears. Perhaps this is much better, because there's a greater chance of replicas existing in the ether, immutability being key.

To @navaru's point do the links (edges) have meaning?

And then there's the hash collision problem. What if the hash had some scheme like the time since epoch appended, or something similar? It could still be calculated from the object (since you must know its time of origination), but it would be namespaced at millisecond resolution.

@navaru as long as you take the direction of the edges into account, it can only be a tree.
To make this work as a graph (where cycles, etc. are possible) you'd have to index the edges,
and then ignore where they came from.

For example: As a tree

// x -> y means that x includes the hash of y in its body.
// aka, x links to y
A -> B
C -> A
D -> A
D -> C

That would be the view of reality you'd get from reading D,
and then reading the documents that its hashes link to.
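Reading from D can be sketched as a depth-first traversal of outgoing links. This is a minimal sketch, assuming each document's outgoing cypherlinks are collected in a hypothetical `links` array (the real shape of a document would differ):

```javascript
// Assumed shape: hash -> document, each with a `links` array of hashes.
const docs = {
  B: { links: [] },
  A: { links: ['B'] },
  C: { links: ['A'] },
  D: { links: ['A', 'C'] }
}

// Everything reachable from `hash` by following outgoing links.
function reachable (hash, seen = new Set()) {
  if (seen.has(hash)) return seen
  seen.add(hash)
  for (const h of docs[hash].links) reachable(h, seen)
  return seen
}

// reachable('D') visits D, then A, B, C: D's "view of reality".
```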

If you ignored the directions, (you'd have to index the links separately), you'd get this:

// x <-> y means either x -> y or y -> x is true.
// aka, x links to y or y links to x.
A <-> B
C <-> A
D <-> A
D <-> C
//then, we have a cycle
D <-> C <-> A <-> D

However, it's rather difficult to ignore the direction.
If you have a document, you know about the outgoing links,
but unless you actually have the documents that are the sources of the incoming links
(which would enable you to create a document linking to those sources, making it a rooted tree),
you never know the total set of incoming links.
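The edge-indexing step can be sketched like this: scan the documents you do hold locally and record, for each hash, which documents point at it. The document shape and field names here are assumptions for illustration:

```javascript
// Assumed shape: hash -> document; any string field whose value is a
// known hash is treated as an outgoing cypherlink.
const docs = {
  B: {},
  A: { linksTo: 'B' },
  C: { parent: 'A' },
  D: { a: 'A', c: 'C' }
}

// Build an index of incoming links: for each doc, who points at it?
function indexIncoming (docs) {
  const incoming = {}
  for (const [hash, doc] of Object.entries(docs)) {
    for (const value of Object.values(doc)) {
      if (typeof value === 'string' && docs[value]) {
        (incoming[value] = incoming[value] || []).push(hash)
      }
    }
  }
  return incoming
}

// indexIncoming(docs) shows A is pointed at by C and D -- but only
// because we happen to hold C and D; the true total set of incoming
// links can never be known.
```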

@mbrevoort so if you are worried about an important link disappearing, you just copy it to your machine.
Since it's immutable, it doesn't matter who has it. You can even redistribute it, and anyone looking for it will
be able to verify it's correct.

Also, the links are generally meaningful, although that meaning is dependent on what the object represents.
Consider a blog post that links to images, raw text, the public key of the author, etc.: that is all data that is essential to displaying the post.

Incoming links, on the other hand, can be anything... a document could be linked to by comments, other blog posts, spam... anything. In the case of comments and other posts, the author of the original document is probably eager to know of documents that link to his post. In that case, they could be replicated to his machine via something like a git push!

Naturally, we will need to start using an expanded hash at some point...
Any suggestions on the best way to do so are highly welcome!

Sorry for the late response; my job is time-consuming. A few questions to wrap my head around some simple mechanics.

What would an object structure look like?

docA = {
  __hash: 'hashA'    // every doc created needs to have a unique hash
, __prev: null       // internal cypherlink
, otherDoc: 'hashN'  // external cypherlink
, _id: 'uuid'
, name: 'Smith'
}

update(docA, { name: 'John Smith' }) // => docA

{
  __hash: 'hashB'
, __prev: 'hashA'  
, otherDoc: 'hashN'
, _id: 'uuid'  
, name: 'John Smith'
}

A document MUST link to its previous version.
A document CAN link to other documents (or sources?).

"The links are generally meaningful, although that meaning is dependent on what the object represents." Can you provide a short example?

I want to draw a diagram; any detailed example would be useful, thanks.

@navaru a doc has a hash, but cannot contain its own hash. That is impossible, because you do not know what the hash is when you create the document.

Also, not all documents require "previous versions". For example, there is no previous version of a git commit;
a commit just is. By "meaning dependent on what the object represents" I mean that there are many possible types of objects, and an object may link to multiple different types of objects.

An example might be a blog post:

{
  title: 'first post',
  author: HASH_OF_PUBKEY,
  contents: HASH_OF_MARKDOWN_TEXT,
  date: new Date()
}

This document has two cypherlinks, one to the text of the document, another to the author's key...

A comment might look like this:

{
  commentOn: HASH_OF_POST,
  content: 'wat. i dont even.',
  author: HASH_OF_PUBKEY_OF_COMMENT_AUTHOR
}

This comment points to the original post,
and to its author.

Both of these would require separate signing documents, which might look like this:

{
  signer: HASH_OF_PUBKEY, //who does the signing
  signed: HASH_OF_DOC, //this is the value signed.
  signature: SIGNATURE................................... //this is the signature.
}

There are lots of other ways you could represent such data;
this is just an example. The idea here is to realize the core features
(hashes, signatures, links) and then experiment on the best ways to represent actual data.

How can I follow the progress of this idea? Are you going to be working on this in any other space, or just here for now?

I had a similar idea, in some way. What sucks on the internet is that everything is evolving: it gets deleted or changed. If you really want to link data together, there should be another layer on top that ensures everything is immutable and authentic; hashes, and merkle trees for compound objects, are the tools.

You should take a look at CCNx ( http://www.ccnx.org/ , described as the future of the internet: you request "data" and not a URL) and some ongoing implementations in JS (like: https://github.com/named-data/ndn-js ).
Basically, this is "storing" data in a distributed way on "routers" that also use caching. Data is cut into chunks and signed (how keys are passed around is not so clear to me).
This project is awesome for several reasons; I will give only two: 1/ of course you get immutability, but also 2/ it can be YOUR content that is distributed across all those nodes, and thus if your content is popular you benefit from all the caching infrastructure. You don't need the firepower of Google to handle the load.

Even more, leveldb would be a great fit as a backend for this. I was thinking about making a lightweight JavaScript implementation of some CCNx-inspired system, but it seemed like too much work in a field that I don't master. I would be happy to contribute to such a project, though.

A few links:

Regarding the CCNx implementation: the core is in C, some tools are in Java, and the format is XML...
Specifications exist for many things, so we should be aware of them and choose whether or not to use them (e.g. JSON Web Signature).
I was thinking of using JavaScript and JSON (with b64-encoded stuff) and a leveldb backend.

Heh, this is mad|strong science :)

This looks quite interesting.
