@mait
Last active July 8, 2023 20:27
Thinking about a 'meta' torrent file format.

Let's say I've downloaded a big file using a torrent. Then I add a very small file, like a subtitle, and create a new torrent file.

Now the two torrent files are completely different to a machine. Trackers and torrent clients treat them as different torrents. Of course, we don't need a duplicate of the original data file to seed both. But the seeders and leechers are split between the two torrent files. They don't know that they have exactly the same file, so the torrent client and tracker cannot connect people who share identical data. The share pool for the same file is split in two, which is inefficient: more seeders means more speed.

Let's say the original torrent file is 1.torrent.

[ file1 ]

Now I add a file and make a new torrent file, 2.torrent, that looks like this:

[
    [ file1 ] => This is 1.torrent.
    + file2
] => This is 2.torrent

Another person gets hold of 2.torrent and creates a new torrent file based on it. So we get 3.torrent.

[ 
    [
        [ file1 ] => This is 1.torrent.
        + file2
    ] => This is 2.torrent
    + file3
] => This is 3.torrent

So if you have 3.torrent, you are in the same share pool as the people who have 1.torrent or 2.torrent.
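
The nesting above can be modelled with a small data structure. This is only a sketch of the idea: the `MetaTorrent` class and its fields are hypothetical and do not correspond to any existing torrent format.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class MetaTorrent:
    """Hypothetical nested torrent: a base torrent plus the files added on top."""
    name: str
    added_files: Tuple[str, ...]
    base: Optional["MetaTorrent"] = None

    def all_files(self) -> List[str]:
        """Every file in this torrent, inherited files first."""
        inherited = self.base.all_files() if self.base else []
        return inherited + list(self.added_files)

    def lineage(self) -> List[str]:
        """This torrent and every torrent it is built on, newest first."""
        older = self.base.lineage() if self.base else []
        return [self.name] + older


t1 = MetaTorrent("1.torrent", ("file1",))
t2 = MetaTorrent("2.torrent", ("file2",), base=t1)
t3 = MetaTorrent("3.torrent", ("file3",), base=t2)

print(t3.all_files())  # ['file1', 'file2', 'file3']
print(t3.lineage())    # ['3.torrent', '2.torrent', '1.torrent']
```

A client holding 3.torrent could announce itself to every swarm named in `lineage()`, which is what keeps the share pool in one piece.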

What if there is a 4.torrent, a 5.torrent, or more in the near future?

We could query a torrent search engine, or use DHT or PEX:

"Please give me torrent list based on 3.torrent"

If there is an interesting new torrent, we can upgrade 3.torrent -> X.torrent. We don't need to touch the local files at all. Only the added files will be downloaded.
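
A rough sketch of that query and upgrade, assuming some service exposes an index of which torrent each torrent is based on. The `INDEX` dictionary stands in for a search engine or DHT lookup, and `based_on` and `files_to_fetch` are hypothetical helper names.

```python
# Hypothetical index: torrent name -> (base torrent name or None, files added).
INDEX = {
    "1.torrent": (None, ["file1"]),
    "2.torrent": ("1.torrent", ["file2"]),
    "3.torrent": ("2.torrent", ["file3"]),
    "4.torrent": ("3.torrent", ["file4"]),
    "5.torrent": ("4.torrent", ["file5"]),
}


def based_on(index, name):
    """Return every torrent that has `name` somewhere in its chain of bases."""
    result = []
    for candidate in index:
        parent = index[candidate][0]
        while parent is not None:
            if parent == name:
                result.append(candidate)
                break
            parent = index[parent][0]
    return result


def files_to_fetch(index, current, target):
    """Files added between `current` and `target`, oldest first."""
    added = []
    node = target
    while node is not None and node != current:
        base, files = index[node]
        added = files + added
        node = base
    if node is None:
        raise ValueError(f"{target} is not based on {current}")
    return added


print(based_on(INDEX, "3.torrent"))                     # ['4.torrent', '5.torrent']
print(files_to_fetch(INDEX, "3.torrent", "5.torrent"))  # ['file4', 'file5']
```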

If you know source code management tools like git, this idea is basically a 'git repository in one torrent file'.

git init
make 1.torrent
git commit
make 2.torrent
git commit
...

TL;DR

  • A torrent file can contain another torrent file.
  • We can keep the seeder/leecher pool as big as possible. Don't split us up if we have exactly the same contents.
  • If other torrents are based on a particular torrent, we can discover them.

Those are the key points.

How could this idea become real? Is it possible?

@kmag

kmag commented Dec 17, 2013

"How can a BT client find an earlier version of this torrent?" is probably not the question you're actually trying to solve. "How can a BT client discover more sources for the data represented by this torrent, given that a subset of that data is also present in other torrents?" is probably the problem you're trying to solve.

Torrent file modifications aren't a branchless phenomenon... A may give rise to B, but then one person might modify B to get C while someone else modifies B to get D, and a third person modifies A to get E. All of these could benefit by knowing about each other.

Advertising a cryptographic Merkle tree root (or other cryptographic hash, though Merkle trees have several advantages) for each file in the DHT would allow the downloaders of these files to find seeders or peers from other torrent swarms, if the Merkle tree roots are added to the per-file descriptions in the "files" section of the torrent.
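
As a minimal sketch of that lookup, the code below computes a Merkle root per file and indexes torrents by it. The 16 KiB leaf size, SHA-256, zero-hash padding to a power of two, and the in-memory dictionary standing in for the DHT are all assumptions made for illustration, not something the thread specifies.

```python
import hashlib
from collections import defaultdict

BLOCK_SIZE = 16 * 1024  # assumed leaf size
ZERO = bytes(32)        # assumed padding hash for incomplete trees


def merkle_root(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Merkle root over fixed-size blocks, padded to a power-of-two leaf count."""
    level = [hashlib.sha256(data[i:i + block_size]).digest()
             for i in range(0, max(len(data), 1), block_size)]
    while len(level) & (len(level) - 1):  # not yet a power of two
        level.append(ZERO)
    while len(level) > 1:
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]


def build_index(torrents):
    """Map each file's Merkle root to the set of torrents that contain it."""
    index = defaultdict(set)
    for torrent_name, files in torrents.items():
        for content in files.values():
            index[merkle_root(content)].add(torrent_name)
    return index


movie = b"movie bytes" * 5000
torrents = {
    "A": {"movie.mkv": movie},
    "B": {"movie.mkv": movie, "subs.srt": b"subtitles"},
    "C": {"film.mkv": movie, "subs.srt": b"subtitles", "extra.txt": b"notes"},
    "E": {"movie.mkv": movie, "poster.jpg": b"image"},
}
index = build_index(torrents)
swarms = sorted(index[merkle_root(movie)])
print(swarms)  # ['A', 'B', 'C', 'E']
```

Because the key is derived from file content alone, torrent C is found even though it renamed the file, and E is found even though it branched from A rather than from B.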

Plain SHA-256, SHA-3-256, or SHA-1 could be used, but Merkle trees have two advantages. First, the 64 MB granularity row of the Merkle tree (or a row at any other granularity that is a power-of-two number of kilobytes) can be requested from a peer and cryptographically verified without trusting that peer at all. Second, the granularity can be tweaked in a backwards-compatible way without changing the torrent file and (more importantly) without changing the Merkle tree root.
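
The verification step might look like the sketch below: a row received from an untrusted peer is hashed upward and compared against the known root. The tree layout (SHA-256, zero-hash padding, 1 KiB leaves in place of the much larger real sizes) is an assumption chosen to keep the example small.

```python
import hashlib

ZERO = bytes(32)  # assumed padding hash


def _pair_up(level):
    """Hash adjacent pairs to produce the next row up."""
    return [hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)]


def build_levels(data: bytes, leaf_size: int):
    """All rows of the tree: levels[0] is the leaf row, levels[-1] is [root]."""
    leaves = [hashlib.sha256(data[i:i + leaf_size]).digest()
              for i in range(0, max(len(data), 1), leaf_size)]
    while len(leaves) & (len(leaves) - 1):
        leaves.append(ZERO)
    levels = [leaves]
    while len(levels[-1]) > 1:
        levels.append(_pair_up(levels[-1]))
    return levels


def row_matches_root(row, root: bytes) -> bool:
    """True if hashing `row` upward reproduces `root`."""
    level = list(row)
    if not level or len(level) & (len(level) - 1):
        return False  # a valid row has a power-of-two length
    while len(level) > 1:
        level = _pair_up(level)
    return level[0] == root


data = bytes(range(256)) * 64                # 16 KiB of sample data
levels = build_levels(data, leaf_size=1024)  # 16 leaves
root = levels[-1][0]

print(row_matches_root(levels[0], root))  # True: 1 KiB granularity
print(row_matches_root(levels[2], root))  # True: 4 KiB granularity, same root
```

Both rows verify against the same root, which is the property that lets the granularity change without touching the torrent file.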

Using Merkle trees would also open the door to easily and verifiably advertising the availability of sub-ranges of files on the DHT, say at 64 MB granularity. Disk corruption and network corruption happen, and for large rare files, there very well may be another file out there that has only a few bits corrupted and has usable data that other downloaders could use... or the file has some header metadata changed by one person (maybe correcting the date of an MLK speech or something) while the rest of the data remains identical and identically aligned. If the 64 MB granularity row of the Merkle tree for that file is advertised via DHT and present in the torrent, then these "large and nearly identical" files can also be used as a source of data.

On a side note, anyone creating a file format should place user-editable metadata at the end of the file whenever possible, and the same goes for creators of metadata editing tools. That way, metadata differences don't affect the alignment of data and won't prevent partial cross-sharing of files that differ only in their user-edited metadata. Yes, one can deal with the data misalignment issue by using a rolling hash and breaking the file into blocks at bounded-but-irregular lengths, but this is much more complicated than just putting the user-editable metadata at the end of the file.
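
To illustrate the alignment point, the sketch below compares fixed-size block hashes of two nearly identical files. The 1 KiB block size and the sample metadata strings are made up for the example, and `shared_blocks` is a hypothetical helper.

```python
import hashlib


def block_hashes(data: bytes, block_size: int):
    """SHA-256 of each fixed-size block, in file order."""
    return [hashlib.sha256(data[i:i + block_size]).digest()
            for i in range(0, len(data), block_size)]


def shared_blocks(a: bytes, b: bytes, block_size: int):
    """Indices of blocks that are identical at the same offset in both files."""
    pairs = zip(block_hashes(a, block_size), block_hashes(b, block_size))
    return [i for i, (x, y) in enumerate(pairs) if x == y]


BLOCK = 1024
# 8 KiB of non-repeating payload (a big-endian counter).
payload = b"".join(i.to_bytes(4, "big") for i in range(2048))

# Metadata edited at the front: the length change shifts every block.
front = shared_blocks(b"date=1963;" + payload,
                      b"date=1963-08-28;" + payload, BLOCK)

# Metadata edited at the end: all payload blocks stay aligned.
end = shared_blocks(payload + b"date=1963;",
                    payload + b"date=1963-08-28;", BLOCK)

# A single corrupted bit only spoils the block that contains it.
damaged = bytearray(payload)
damaged[3 * BLOCK + 5] ^= 1
corrupt = shared_blocks(payload, bytes(damaged), BLOCK)

print(front)    # []
print(end)      # [0, 1, 2, 3, 4, 5, 6, 7]
print(corrupt)  # [0, 1, 2, 4, 5, 6, 7]
```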

@predakanga

@kmag That seems like an elegant solution - in the non-DHT sphere you could also have search engines that let you find other sources for a particular file using the Merkle tree.

My instinct was to say that that may be rolled in as a de-facto tracker protocol update (/sources as compared to /announce, /scrape), but it seems that more separation of concerns is appropriate there.

That said, it would be good to have that question solved from the start, and I do think that it should be something that the torrent client should be able to automate, outside of the DHT.

@funklord

I'm very glad to find this discussion, but I've noticed that the discussion has veered off into three distinct directions:

  1. A way for torrents to share swarms in a metadata-independent fashion (cryptographic solutions etc.).
    This is a very hard problem, and logic dictates that solutions may necessarily become inefficient in some cases.
  2. Torrents that auto-update content.
  3. Patching torrent content by creating a new format.

Obviously all 3 ideas have merit and should be pursued, but I'm particularly interested in number 3, because there are valid use cases where an existing torrent needs to be changed. This change needs to be as cheap as possible and reasonably easy to add to existing clients.

For example, say we have an existing torrent with lots of activity, but we want to change a couple of filenames, edit some small bits of binary data, add a few files, etc.

The previous post by @predakanga seems to be on the right track, so let me get this straight:
We create a new torrent with all the new data but we also add a non-standard extension with another torrent and describe which chunks are available from it.
A new "extended" torrent client can use both swarms for improved performance.
Standard clients will ignore the extension and only see a single torrent, which is still valid, but only has the new, much smaller swarm.
I'm not so sure a URI is a good idea unless it's signed, since in this case a torrent is still considered to be "final".
This kind of solution would also be acceptable to private trackers, since it doesn't rely on DHT etc.
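
A sketch of what such an extension could look like, assuming it lives outside the "info" dictionary so the info-hash (the SHA-1 of the bencoded "info" dictionary) is unaffected. The key name "x-base-torrents", its fields, the tracker URL, and the placeholder hashes are all hypothetical.

```python
import hashlib


def bencode(value) -> bytes:
    """Minimal bencode encoder for ints, bytes, str, lists, and dicts."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, str):
        return bencode(value.encode("utf-8"))
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings, emitted in sorted order.
        items = sorted(((k.encode("utf-8") if isinstance(k, str) else k, v)
                        for k, v in value.items()), key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def infohash(meta) -> bytes:
    """Swarm identifier: SHA-1 of the bencoded 'info' dictionary."""
    return hashlib.sha1(bencode(meta["info"])).digest()


info = {
    "name": "release-v2",
    "piece length": 262144,
    "pieces": bytes(80),  # placeholder for four 20-byte piece hashes
    "files": [{"length": 1000000, "path": ["file1"]},
              {"length": 2000, "path": ["file2"]}],
}
plain = {"announce": "http://tracker.example/announce", "info": info}

extended = dict(plain)
extended["x-base-torrents"] = [{   # hypothetical extension key
    "infohash": bytes(20),         # placeholder for the old torrent's info-hash
    "shared pieces": [0, 1, 2],    # pieces that lie entirely inside file1
}]

print(infohash(plain) == infohash(extended))  # True
```

Only pieces 0 to 2 are listed because the fourth piece straddles file1 and file2, so its hash cannot match any piece of the older torrent. That is the alignment cost of appending files to a piece-based format.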

You seem to have a much firmer grasp of the exact technical issues, such as how a typical modification affects data alignment (and therefore invalidates all subsequent chunks).
Any further insight on this would be greatly appreciated.

And, what would such a feature be called?
Nested torrents?
